Sabermetrics: Rethinking pitching statistics

It’s the first thing the announcer says when the pitcher takes the mound. “Justin Masterson went 11-15 last season,” or “Justin Verlander has 17 wins on the year.” For more than a century, wins and losses have been the first indicators most fans look to when they argue the merits of different pitchers. And they couldn’t be more meaningless as measuring sticks of pitching skill.

The very words “wins” and “losses” carry a certain gravitas. If a pitcher comes away with a big ‘W’ it means he carried the team on his back. If he ends up with an ‘L’ it means he didn’t do his job. “He” went out and faced the other team. “He” was the driving force in his team’s victory or defeat. And, of course, it was solely within “his” power to make sure that his teammates scored more runs than their opponents.

Of course, this is sheer lunacy. The starting pitcher is not the only player on his team. There are batters. There are fielders. Heck, there are even other pitchers! Not to mention that the other team also has batters and fielders and at least one other pitcher. Even if both starters throw complete games — a rarity in the modern era at any level — a minimum of 19 other players will shape the outcome of the game, as well as the coaches, the umpires, the wind, the size of the ballpark, the muddiness of the dirt and what the second baseman’s girlfriend said to him as he left for the stadium. You wouldn’t hold the author of the article on page three accountable for the entirety of this newspaper.

Put it another way: When looking at the final standings, would you conclude that the team that wins the most games necessarily had the best pitching staff? No? Then why give final standings even a second’s thought at the individual level?

The situation gets even hairier when relief pitchers come into play. The win goes to whichever pitcher is on record when his team takes the lead for good, and the loss to whichever pitcher is charged with the run that puts the other team ahead for good. So, if a starter pitches eight shutout innings and leaves with a 1-0 lead but the team’s closer coughs up the game in the top of the ninth, the starter gets nothing. And if our team rallies back to win in the bottom of the ninth, the ‘W’ goes not to our starting ace but to our inept closer, who was technically the pitcher of record when his team retook the lead (an event in which he had no hand). The so-called “no decision” is almost more insulting than an unjust ‘L’ as it implies that the starter had no impact on the game at all. All the while, an incompetent starter who gets shelled for 10 runs in five innings can be called a winner so long as his team scores 11 before he’s sent to the showers.

The next thing you’ll hear out of the color commentator’s mouth will probably be the pitcher’s earned run average (ERA), the number of runs scored for which the pitcher is held accountable divided by his innings pitched and multiplied by nine. Compared to wins and losses, ERA is a phenomenal tool for measuring pitching ability. But, of course, that isn’t saying much.
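For readers who want to see that arithmetic spelled out, here is a minimal Python sketch of the ERA calculation as described above. The function name and the sample line (30 earned runs over 60 innings) are made up for illustration, and innings are assumed to be passed as true fractions (6 1/3 innings as 6.333..., not the box-score shorthand 6.1).

```python
def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


# Hypothetical pitcher: 30 earned runs charged over 60 innings.
example_era = earned_run_average(30, 60.0)
print(f"ERA: {example_era:.2f}")
```

Everything the formula knows about the pitcher is packed into that one numerator, which is exactly where the trouble starts.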

As with most of the worst conventional baseball statistics, the biggest problem with ERA is its blind internalization of factors that are out of the player’s control. In 1999, Voros McCracken dropped a bombshell on the baseball world with his argument that pitchers have very little control over the destinies of batted balls that land within the field of play. Though this theory has become more nuanced over time — pitchers do have a large degree of control over the types of batted balls they induce and certain kinds of pitchers are better at inducing weak contact than others — the basic idea is that strikeouts, walks, hit-by-pitches and (to a lesser extent) home runs are the only outcomes of plate appearances that are truly under the pitcher’s direct control. The difference between a groundout and an infield hit or between a flyout and a bloop single is mostly up to the hitter, the fielders, the ballpark conditions and luck.

The term “earned run” also opens a Pandora’s box of arbitrary distinctions that would seem ridiculous if they were not so ingrained in the game. A run that scores thanks to a fielding error is not counted against the pitcher: if the third baseman bobbles a would-be groundout and the batter comes home later in the inning, that run should not have scored, so the pitcher is not to blame. But if a third baseman fails to field an identical ground ball simply because he could not get to it in time, the play is ruled a clean hit and therefore the pitcher’s fault. (This confusion is rooted in the near-existential insanity of fielding stats; we’ll get to that next week.)

Rather than using these superficial statistics, the best way to get a sense of a pitcher’s skill is to look at the numbers that are most under his control, namely strikeout, walk, home run and ground ball rates. Several statistics have been developed for MLB that use these variables to estimate pitchers’ true-talent ERAs. Unfortunately there are no such estimators for the Ivy League, but these same input numbers will still give you a better picture of how good the Bears’ pitchers are than the ones you’ll see featured in the box scores.
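One widely used MLB estimator of this kind is Fielding Independent Pitching (FIP). The column does not name it, so treat the sketch below as an illustration rather than the author's method. FIP uses only home runs, walks, hit-by-pitches and strikeouts (it ignores ground ball rate), plus a league constant chosen each season so that league-average FIP lines up with league-average ERA. The 3.10 used here is an assumed placeholder, and the sample stat line is hypothetical.

```python
def fielding_independent_pitching(
    home_runs: int,
    walks: int,
    hit_by_pitch: int,
    strikeouts: int,
    innings_pitched: float,
    league_constant: float,
) -> float:
    """FIP: an ERA-scaled estimate built only from outcomes the pitcher controls."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    weighted_events = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return weighted_events / innings_pitched + league_constant


# Hypothetical stat line and an assumed league constant of 3.10.
example_fip = fielding_independent_pitching(
    home_runs=10, walks=30, hit_by_pitch=5, strikeouts=100,
    innings_pitched=100.0, league_constant=3.10,
)
print(f"FIP: {example_fip:.2f}")
```

Note what is missing from the inputs: hits allowed, runs allowed, and anything else that depends on the defense.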

What does this mean empirically? Looking at Bruno’s current roster, the starkest manifestation of ERA’s fickleness is starting pitcher Anthony Galan ’14. After six starts in 2013, Galan has a 4.06 ERA, almost a three-run improvement on his 7.02 ERA from 2012. Yet he’s walked or beaned 20 batters against just 16 strikeouts so far this season, actually a downturn from his 26 free passes and 42 strikeouts last year. So how has he managed to put up better numbers than he did in 2012?

The answer lies in batting average on balls in play (BABIP), the proportion of batted balls hit inside the field of play that fall for hits. BABIP is the easiest way to get a sense of how exogenous factors have affected a pitcher’s performance (a small deviation from the league mean might be attributable to the player’s skill, but most outliers are the results of random chance). So far in 2013, opposing hitters have a relatively low .269 BABIP against Galan — in 2012, it was an eye-popping .402. For some perspective on how bad the breaks were for Galan last year and how much better his fortune has been this season, the Bears’ overall BABIP-against this year is .335 — which just so happens to be the midpoint between Galan’s two extremes.
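The column defines BABIP only in words. One common formulation, sketched below in Python, removes home runs and strikeouts from the count because neither puts a ball in the field of play, and adds sacrifice flies back in. The function name and the sample stat line are hypothetical. The last lines simply check the midpoint of the two Galan figures quoted above.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sacrifice_flies: int) -> float:
    """Share of balls put in the field of play that fall for hits."""
    balls_in_play = at_bats - strikeouts - home_runs + sacrifice_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical line: 50 hits (5 of them homers) in 200 at-bats,
# with 40 strikeouts and 5 sacrifice flies.
example_babip = babip(hits=50, home_runs=5, at_bats=200,
                      strikeouts=40, sacrifice_flies=5)

# Midpoint of Galan's BABIP-against in 2013 (.269) and 2012 (.402).
galan_midpoint = (0.269 + 0.402) / 2

print(f"Example BABIP: {example_babip:.3f}")
print(f"Galan midpoint: {galan_midpoint:.4f}")
```

The midpoint works out to roughly .3355, within a point of the team's .335 mark.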

It’s still too early in both the NCAA and MLB seasons for most of the statistics to be significant, and pitchers are known for being less consistent than their position-playing counterparts. But even once the numbers start to stabilize, don’t be fooled into thinking that the numbers you hear about on ESPN mean much at all.
