Tuesday, August 18, 2009

Explaining a Team's W-L Record

According to Baseball-Reference.com:

The Pythagorean Theorem of Baseball is a creation of Bill James which relates the number of runs a team has scored and surrendered to its actual winning percentage, based on the idea that runs scored/runs allowed is a better indicator of a team's (future) performance than a team's actual winning percentage. This results in a formula which is referred to as Pythagorean Winning Percentage....

There are two ways of calculating Pythagorean Winning Percentage (W%). The more commonly used, and simpler version uses an exponent of 2 in the formula.

W%=[(Runs Scored)^2]/[(Runs Scored)^2 + (Runs Allowed)^2]

More accurate versions of the formula use 1.81 or 1.83 as the exponent.

W%=[(Runs Scored)^1.81]/[(Runs Scored)^1.81 + (Runs Allowed)^1.81]

Expected W-L can then be obtained by multiplying W% by the team's total number of games played, then rounding off....

The rationale behind Pythagorean Winning Percentage is that, while winning as many games as possible is still the ultimate goal of a baseball team, a team's run differential (once a sufficient number of games have been played) provides a better idea of how well a team is actually playing. Therefore, barring personnel issues (injuries, trades), a team's actual W-L record will approach the Pythagorean Expected W-L record over time, not the other way around. Expected W-L is almost always within 3 games of actual W-L at the end of a season (although a recent exception is the 2005 and 2007 Arizona Diamondbacks, who both beat their expected W-L by 11 games). Deviations from expected W-L are often attributed to the quality of a team's bullpen, or more dubiously, "clutch play"; many sabermetrics advocates believe the deviations are the result of luck and random chance.
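
The quoted formula is easy to check by hand or in a few lines of code. What follows is a minimal Python sketch of it, not anything published by Baseball-Reference.com: the function names are my own, and the 800/700 run totals are a made-up team used only for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage for a given exponent (2, 1.81, 1.83, ...)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins_losses(runs_scored, runs_allowed, games, exponent=2.0):
    """Expected W-L: W% times games played, rounded to whole wins."""
    wins = round(pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games)
    return wins, games - wins


# Hypothetical team (not real data): 800 runs scored, 700 allowed, 162 games.
for exponent in (2.0, 1.81, 1.83):
    print(exponent, expected_wins_losses(800, 700, 162, exponent))
```

Subtracting the expected wins from a team's actual wins gives the deviation that the quoted passage attributes to bullpen quality, "clutch play," or luck.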

I agree with those who say that deviations reflect the quality of a team's bullpen. A more precise formula can be obtained by regressing winning percentage on two explanatory variables: RFA (runs scored/[runs scored + runs allowed]) and saves recorded by a team's bullpen. The result for the American League in 2008:

W-L percentage (expressed as a decimal fraction) = -0.44595 + 1.66556 x RFA + 0.002747 x saves

Adjusted R-squared: 0.899; standard error: 0.022 (i.e., 2.2 percentage points); t-statistics on the intercept and coefficients: -4.246, 7.319, 3.763 (all significant above the 0.99 level).

For reference, the average American League team (RFA = .506, saves = 41) compiled a W-L percentage of .510. (The AL beat the NL in interleague play, which is why the AL as a whole finished above .500.)
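
To make the equation concrete, here is a small Python sketch that plugs numbers into it. The coefficients and the league-average inputs are the ones reported above; the function and constant names are my own, and the 162-game schedule used for the per-save figure is an assumption. With the rounded inputs (.506 and 41) the equation returns roughly .509, which matches the .510 league average up to rounding.

```python
# Coefficients reported above for the 2008 American League regression.
INTERCEPT = -0.44595
COEF_RFA = 1.66556
COEF_SAVES = 0.002747


def rfa(runs_scored, runs_allowed):
    """RFA as defined above: runs scored / (runs scored + runs allowed)."""
    return runs_scored / (runs_scored + runs_allowed)


def predicted_win_pct(rfa_value, saves):
    """W-L percentage (decimal fraction) predicted by the regression equation."""
    return INTERCEPT + COEF_RFA * rfa_value + COEF_SAVES * saves


# League-average inputs quoted in the text (rounded values).
print(f"average AL team: {predicted_win_pct(0.506, 41):.3f}")

# Wins implied by one additional save, assuming a 162-game schedule.
print(f"wins per save: {COEF_SAVES * 162:.2f}")
```

For a particular team, multiply the predicted percentage by games played and compare the result with actual wins, just as with the Pythagorean estimate.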

According to the Pythagorean formula, the LA Angels were the lucky recipients of 11 extra wins in 2008; that is, the formula underestimates the Angels' 2008 wins by 11. The regression equation, on the other hand, underestimates the Angels' 2008 wins by only 2. Generally, the regression equation (indicated by blue) gives much better results than the Pythagorean formula (indicated by black):

[Chart: regression equation (blue) vs. Pythagorean formula (black)]

"Luck" is a catch-all term for unexplained variance. It shouldn't be thrown around as if it has real meaning. In this case, the evidence suggests that a decisive factor in a team's W-L record is the quality of its bullpen -- especially the quality of its closers.