Wednesday, August 22, 2007

Preseason Analysis Part I - Using Preseason Stats to Predict Regular Season Wins

As a Dolphins fan, I've found the preseason less than encouraging despite the 2 wins. The offensive line's run blocking sucks, to put it mildly, and the pass protection has been mediocre at best. Trent Green's accuracy is off, and the secondary hasn't been great either. But the sample sizes have been extremely small, which is the most important reason why I think preseason performance has little meaning. Many of the first-string QBs and RBs have fewer than 20 attempts. If one game is hard to predict because of the natural variance in performance, then one regular season game's worth of plays won't tell you a whole lot about a team. Then again, skill is skill and should show up to some extent in any game, regardless of its meaning. Depth is also important. So I've decided to examine the validity of my assumptions over the next few articles.

First up, can we use preseason efficiency stats to estimate regular season win totals?

For this experiment, I decided to use yards per rush and yards per pass stats only to keep things simple. Using unadjusted Value Over League Average (VOLA) to predict regular season win totals in 2006, a system with only Off. and Def. Pass and Rush Efficiency stats had a mean error of 1.541 games and an R² of 0.6224. In comparison, a system using all unadjusted VOLA inputs had a mean error of 1.233 games and an R² of 0.77047. So even with the reduction in detail, the retrodictive system is still pretty good.
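For anyone who wants to check numbers like these on their own data, here is a minimal sketch of the two measures. It assumes "mean error" means the mean absolute error in games, which may not be exactly how I computed it above, and the win totals in the example are made up purely for illustration.

```python
def mean_abs_error(predicted, actual):
    """Average absolute gap, in games, between predicted and actual wins."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def r_squared(predicted, actual):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up win totals for four teams, for illustration only.
actual_wins = [12, 8, 6, 4]
predicted_wins = [11, 9, 6, 5]

print("mean error:", mean_abs_error(predicted_wins, actual_wins))
print("R^2:", r_squared(predicted_wins, actual_wins))
```

A lower mean error and a higher R² both point to a better fit, which is the sense in which the all-inputs system beats the four-stat one.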

For the preseason efficiency stats, I wanted to stick with the performance of first stringers as much as possible. They're the ones that are going to be playing all season (hopefully). For offensive efficiency stats, this was pretty straightforward. I just took the yards per pass from the QB stats page on and the yards per rush from the RB stats page. Clearly this relies on my selection of who's first string. Sometimes the true starter was injured or holding out. If you want to know exactly whose stats I chose, feel free to e-mail me. On defense, I wanted to use first half stats only, but those would have been non-trivial to obtain. In the interest of just getting a rough draft of the idea out there, I just used overall yards per rush/pass. To get a VOLA, I just used the average of the efficiencies as the "league average". Again in the interests of time, I kept that calculation very rudimentary. To predict regular season win totals, I simply pretend that the preseason VOLA stats are the regular season VOLA stats and plug them into the retrodictive system. In other words, we're assuming that the VOLA stats at the end of preseason will be the same as at the end of the regular season, though it's not clear at all that a strong correlation exists. In an upcoming article, I will look at the correlation between preseason efficiency and regular season efficiency.
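The procedure above can be sketched roughly as follows. Everything numeric here is a placeholder: the team efficiencies are invented, and the intercept and coefficients are hypothetical stand-ins for the fitted regression weights, which aren't listed in this post. The sketch also assumes VOLA is the fraction above or below the plain average of all teams' efficiencies.

```python
# Invented yards-per-attempt figures for three teams; not real preseason data.
efficiencies = {
    "Team A": {"off_pass": 7.0, "off_rush": 4.5, "def_pass": 6.0, "def_rush": 3.5},
    "Team B": {"off_pass": 6.0, "off_rush": 4.0, "def_pass": 6.5, "def_rush": 4.0},
    "Team C": {"off_pass": 5.0, "off_rush": 3.5, "def_pass": 7.0, "def_rush": 4.5},
}

# Hypothetical regression weights (assumptions, not the fitted values).
# Defensive weights are negative: allowing more yards should cost wins.
INTERCEPT = 8.0
COEFFICIENTS = {"off_pass": 10.0, "off_rush": 5.0, "def_pass": -10.0, "def_rush": -5.0}


def league_average(stats, stat):
    """Plain average of one efficiency stat across all teams."""
    values = [team[stat] for team in stats.values()]
    return sum(values) / len(values)


def vola(value, average):
    """Fraction above (+) or below (-) the league average."""
    return (value - average) / average


def predict_wins(team_stats, stats):
    """Treat preseason VOLAs as regular season VOLAs in a linear model."""
    wins = INTERCEPT
    for stat, weight in COEFFICIENTS.items():
        wins += weight * vola(team_stats[stat], league_average(stats, stat))
    return wins


for name, team_stats in efficiencies.items():
    print(name, round(predict_wins(team_stats, efficiencies), 2))
```

With weights like these, a team that is exactly average in all four stats lands on the intercept, and every point of VOLA moves the prediction linearly.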

Because the sample sizes for offensive stats were small, some teams ended up with extremely high or low VOLAs. This did not mesh well with the regression coefficients, which resulted in some teams being predicted to win fewer than zero games and some to win more than 16 games. Take the predicted win totals themselves with a grain of salt. But let's look at how the system predicts the divisional standings.
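Since only the order within a division matters for the standings, an impossible win total doesn't change the ranking. Here is a small sketch of that idea using the AFC East numbers from this post. The clamping to the 0 to 16 range is an extra step added for illustration only; the totals listed in this post are the raw, unclamped ones.

```python
# Predicted AFC East win totals as listed in this post.
afc_east = {
    "New England": 6.9208,
    "Buffalo": 4.3031,
    "New York Jets": 4.0,
    "Miami": -0.42247,
}

SEASON_GAMES = 16


def standings(predicted):
    """Order teams from most to fewest predicted wins."""
    return sorted(predicted, key=predicted.get, reverse=True)


def clamp_wins(wins, games=SEASON_GAMES):
    """Pull an out-of-range prediction back into 0..games (illustrative only)."""
    return max(0.0, min(float(games), wins))


for rank, team in enumerate(standings(afc_east), start=1):
    print(rank, team, round(clamp_wins(afc_east[team]), 2))
```

Clamping only affects how a total is displayed. Ranking on the raw values keeps two out-of-range teams in the same division from being forced into a tie.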

2007 Predicted Final Standings Based on Preseason Stats through Week 3
AFC East

  1. New England, 6.9208 wins
  2. Buffalo, 4.3031
  3. New York Jets, 4
  4. Miami, -0.42247

New England tops the list thanks to roughly average pass offense and pass defense efficiency. The system ranks Buffalo too high; by most opinions it should be last.

AFC North

  1. Pittsburgh, 14.568 wins
  2. Cleveland, 9.0493
  3. Baltimore, 6.5127
  4. Cincinnati, 2.4849

Like Buffalo, Cleveland should probably be last, but the order is otherwise plausible.

AFC South

  1. Tennessee, 7.636 wins
  2. Houston, 7.2683
  3. Indianapolis, 6.3487
  4. Jacksonville, 6.161

Exactly the reverse order of what it should be. Indy's run defense is 26.255% above league average this preseason. Its pass offense is only 5.8% above average. Think that will last?

AFC West

  1. Oakland, 13.317 wins
  2. San Diego, 8.2215
  3. Kansas City, 5.0995
  4. Denver, 4.3164

Another case of one team being ranked too high instead of last. Oakland's run offense efficiency is 128.77% above league average. Lamont Jordan has had 8.4 ypc. Interesting that this and the rankings based on actual vs. expected wins in 2006 put Kansas City ahead of Denver, despite the near certainty that Larry Johnson's ACLs will spontaneously combust before Week 4.

NFC East

  1. Philadelphia, 23.678 wins
  2. Dallas, 11.756
  3. Washington, 9.7259
  4. New York Giants, 7.0968

Based on FO's prediction of Washington being on the rise, these rankings seem totally plausible.

NFC North

  1. Detroit, 10.856 wins
  2. Minnesota, 10.071
  3. Chicago, 6.7431
  4. Green Bay, 3.9523

Jon Kitna was right! They ARE going to win 10 games! Or not. Another division in reverse order of what it should be.

NFC South

  1. New Orleans, 9.145 wins
  2. Carolina, 8.435
  3. Atlanta, 8.2839
  4. Tampa Bay, -0.64007

0.8 ypc from Cadillac Williams would do that to Tampa Bay. The rankings in this division seem plausible.

NFC West

  1. Seattle, 17.836 wins
  2. San Francisco, 13.602
  3. St. Louis, 6.5272
  4. Arizona, 4.7789

Both Seattle and San Francisco benefit greatly from very strong offensive pass efficiency stats, but the ranking is plausible.

I'll refine and recalculate the system at the end of preseason and post the revised predictions. I'll also post predictions using pass and run offense efficiency stats based on entire team performance, rather than on one player at each position. But in addition to sample size issues, the opponent quality is much more varied from team to team, as they do not travel far for road games in the preseason. Expect a lot of noise in the projections because of it. In terms of regular season wins and losses, I don't think this system will be particularly accurate, but in terms of predicting division standings, preseason performance might yield interesting information.
