Wednesday, August 29, 2007

Preseason Analysis Part II - Correlation with Regular Season, More on Expected Wins

Previously on Preseason Analysis...
And now Part II of Preseason Analysis.

Preseason wins and losses are essentially meaningless, but maybe team performance, as measured by things like run efficiency and pass efficiency, still carries some information over into the regular season. Intuitively, first-string players are still playing first-string players, so some true reflections of skill are bound to show up. So I decided to take preseason box scores from 1997-2006 and see what the stats show about teams' regular season performance. Please note that separating out first-string offensive stats alone is too time-consuming (and possibly not that useful), so I'm just using total team stats for the game. Because of missing stats and the questionable usefulness of those stats, I've dropped punt returns, kick returns, and penalty first downs from my models.

Average preseason and regular season league averages
Stat    Preseason    Regular Season

Correlation of preseason league average with regular season league average
Stat    Corr. coef.    P-value

R=Rush, P=Pass, SR=Sack Rate, 3C=3rd Down Conv., PY=Penalty Yards, IR=Int. Rate, FR=Fum. Rate

Simply put, the preseason game is appreciably different than the regular season game. The first thing you notice is that the preseason favors defense. Yards per play is lower, sack rates are higher, fewer third downs are converted, and fumble rates are higher. On the other hand, interception rates are lower. More penalties are called. Perhaps coaches are more conservative on offense, saving most of their plays for the regular season. Maybe it's because coaches give almost all of their QBs playing time.

Meanwhile, pass yards per play, penalty yards per game, and fumble rate are the only three stats whose preseason averages show a noticeable correlation with regular season averages, and of those only penalty yards has a significant p-value. The p-value is essentially the probability that a correlation coefficient at least that extreme could be achieved with entirely random inputs. Usually, a stat is considered significant when its p-value is 5% or less. Given the small sample size of the preseason, we'll excuse the higher p-values of fumble rate and pass efficiency. Preseason average rushing efficiency, meanwhile, has essentially no correlation with regular season average rushing efficiency. If the preseason has little meaning on a league-wide level, then how does it fare on a team scope?
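To make the "entirely random inputs" reading of the p-value concrete, here is a minimal sketch that estimates it by shuffling. It is not how the numbers in this post were produced: the function names `pearson_r` and `permutation_p_value` are made up for illustration, and the ten league averages are hypothetical placeholders, not the actual 1997-2006 data.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Share of random shufflings whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials

# Hypothetical league averages for ten seasons (NOT the post's data).
preseason = [5.6, 5.7, 5.5, 5.8, 5.9, 5.7, 6.0, 5.8, 6.1, 6.0]
regular = [6.1, 6.0, 6.2, 6.3, 6.2, 6.4, 6.5, 6.3, 6.6, 6.4]

r = pearson_r(preseason, regular)
p = permutation_p_value(preseason, regular)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```

With only ten seasons to pair up, even a fairly large coefficient can show up by chance a noticeable share of the time, which is why the league-wide p-values run high.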

Correlation of preseason stats with regular season stats, Unadj. VOLA
Stat    Corr. coef.    P-value
IRT     0.118          0.036931

O=Offense, D=Defense, M=Made, A=Allowed, G=Given, T=Taken

Offensive performance seems to correlate better overall between preseason and regular season than defensive performance, with turnover rates being the only exception. Surprisingly, almost all of the p-values are below 5%, with fumble rate given being the only notable exception. In other words, it's highly unlikely that random inputs could create similar correlation coefficients, so it's safe to assume that overall team preseason performance means something, just not much. If your team does well in the preseason, that's great, but it's hardly a guarantee of success. If your team does poorly, it's really not all that much to sweat about. Of course, this meets our expectations because second-string and third-string players get playing time they won't get in the regular season. If someone wants to take the time to sort through the box scores to figure out the efficiency stats for first stringers, they can be my guest. It's questionable how much the correlation coefficients would actually improve. Similar results can be seen with the correlation coefficients of preseason stats with regular season wins.

Correlation of preseason stats with regular season wins, Unadj. VOLA
Stat    Corr. coef.    P-value

Out of curiosity, I decided to create a linear regression model of regular season win totals using the following preseason stats: pass efficiency, sack rates, and third down conversion rates. With 1997-2006 stats, I tested on each year in 1998-2006, using all previous years as training data. On average, the predicted win totals have a correlation of 0.30593 with the actual win totals, which is not very high. The yearly average of mean absolute error was 3.1519 games, about twice what it is when using regular season stats. The average R² was 0.67495, which took me by surprise a little. 67.495% of the variance is accounted for by this data, compared to 79% for the regular season stats? I was expecting 40-50% tops.
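The "test on each year, train on all previous years" setup can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`solve`, `fit_ols`, `predict`, `walk_forward_mae`) are hypothetical, the fit uses plain normal equations in pure Python rather than whatever tool produced the numbers above, and the team seasons are synthetic, generated with made-up coefficients.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular (enough distinct training rows)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Predicted wins for one team's feature row."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))

def walk_forward_mae(seasons):
    """seasons: {year: [(features, wins), ...]}. Returns {test_year: MAE},
    fitting on every earlier year and testing on the held-out year."""
    years = sorted(seasons)
    results = {}
    for test_year in years[1:]:
        train = [obs for y in years if y < test_year for obs in seasons[y]]
        coefs = fit_ols([f for f, _ in train], [w for _, w in train])
        errors = [abs(predict(coefs, f) - w) for f, w in seasons[test_year]]
        results[test_year] = sum(errors) / len(errors)
    return results

# Synthetic team seasons (NOT real box score data); coefficients are made up.
rng = random.Random(0)

def fake_team():
    pass_eff = rng.uniform(5.0, 7.0)       # pass yards per play
    sack_rate = rng.uniform(0.04, 0.10)    # sacks per dropback
    third_down = rng.uniform(0.30, 0.45)   # third down conversion rate
    wins = (8 + 2.0 * (pass_eff - 6.0) - 30.0 * (sack_rate - 0.07)
            + 20.0 * (third_down - 0.375) + rng.gauss(0, 2.5))
    return (pass_eff, sack_rate, third_down), wins

seasons = {year: [fake_team() for _ in range(30)] for year in range(1997, 2001)}
for year, mae in walk_forward_mae(seasons).items():
    print(year, round(mae, 3))
```

The first year is never a test year because there is nothing earlier to train on, which matches testing 1998-2006 with stats that start in 1997.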

What's more interesting about the model, however, is its ability to predict which teams will improve/decline the following season. In a manner similar to what I did here, I looked at which teams exceeded or fell short of their predicted win totals by more than the mean absolute error. Teams that outperformed their projected win total based on preseason stats are predicted to decline the next year, and teams that underperformed their projected win total are predicted to improve. Because there is some positive correlation between regular season and preseason stats, I expected some of the success using regular season stats to carry over. What I found, however, was that the model with preseason stats is slightly more accurate than the model with regular season stats. This might be a result of noise created by the extra inputs in the regular season stats model (e.g. kick and punt returns).

1998-2005 (predicting 1999-2006)
Accuracy predicting risers: 74%
Accuracy predicting fallers: 61.818%

2002-2005 (predicting 2003-2006)
Accuracy predicting risers: 67.857%
Accuracy predicting fallers: 67.742%
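The riser/faller rule described above boils down to a threshold check. Here is a minimal sketch of it, with several assumptions: `classify` and `accuracy` are hypothetical names, the cutoff is taken to be the 3.1519-game yearly average of mean absolute error (the actual cutoff may have been computed per year), "improve" is taken to mean more wins the following season, and the records passed to `accuracy` are invented for illustration.

```python
def classify(expected_wins, actual_wins, threshold):
    """Label a team by how far its actual wins missed the projection."""
    gap = expected_wins - actual_wins
    if gap > threshold:
        return "riser"    # underperformed the projection, predicted to improve
    if gap < -threshold:
        return "faller"   # outperformed the projection, predicted to decline
    return "neither"

def accuracy(records, label, threshold):
    """records: (expected, actual, next_actual) tuples.
    Returns the share of teams with the given label that moved the
    predicted way the next season, or None if no team got that label."""
    hits = total = 0
    for expected, actual, next_actual in records:
        if classify(expected, actual, threshold) != label:
            continue
        total += 1
        if label == "riser":
            hits += next_actual > actual   # assumption: improve = more wins
        else:
            hits += next_actual < actual   # assumption: decline = fewer wins
    return hits / total if total else None

MAE = 3.1519  # yearly average from the post; assumed here to be the cutoff

print(classify(10.005, 2, MAE))    # Oakland's numbers from this post
print(classify(4.9253, 10, MAE))   # Jets' numbers from this post
print(classify(8.0, 9, MAE))       # hypothetical team inside the error band

# Hypothetical (expected, actual, next season) records, for illustration only.
records = [(10.0, 4, 7), (11.0, 6, 5), (4.0, 10, 8), (5.0, 12, 9), (8.0, 9, 9)]
print(accuracy(records, "riser", 3.0), accuracy(records, "faller", 3.0))
```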

Preseason seems to have some useful meaning then. On the other hand, we're talking about a 6-game range that a team has to fall outside of to count as a faller/riser. If the projection is 8 wins (average), a team could be bad (5 wins) or very good (11 wins) and still be within the average error. The method predicts about 6-7 risers and 6-7 fallers every year, so it's accurately predicting 8-9 teams to improve/decline each year. That's pretty good, I think. Without further ado, here are the projected risers and fallers for 2007:


Projected risers:
  • Houston Texans (9.7623 expected wins vs. 6 actual wins)
  • Jacksonville Jaguars (12.371 vs. 8)
  • Oakland Raiders (10.005 vs. 2)
  • Dallas Cowboys (15.354 vs. 9)
  • New York Giants (11.979 vs. 8)
  • Tampa Bay Buccaneers (8.6278 vs. 4)

Based on the accuracy and what other projection models have shown, I'd pick Jacksonville, Oakland, Dallas, and Tampa Bay as the ones to actually improve.


Projected fallers:
  • New York Jets (4.9253 expected wins vs. 10 actual wins)
  • Baltimore Ravens (8.2891 vs. 13)
  • Kansas City Chiefs (2.9868 vs. 9)
  • Chicago Bears (8.6448 vs. 13)
  • New Orleans Saints (5.9049 vs. 10)
  • San Francisco 49ers (4.1093 vs. 7)
  • Seattle Seahawks (2.0842 vs. 9)

Of these, I'd pick the Jets, Ravens, Chiefs, and Bears to decline. It really could go either way with the 49ers and Seahawks. That division is chaos.

After jumping through some hoops, I do seem to have found some relevance to the preseason. But it's nothing you couldn't find using regular season performance. As intuition would tell you, preseason performance is only slightly indicative of regular season performance.

2001 BAL@PHI box score was missing.


Brian Burke said...

A couple thoughts.

At first I thought you might be on to something big here. I thought we could isolate pre-season "starting squad" stats by using known starting RB and QB individual stats as proxies for team proficiency. Then regress those stats onto following regular season win totals.

We might get a fairly good projection of regular season performance. But then I thought, that's a lot of work and it's not likely to beat 1) conventional wisdom or 2) last year's wins.

A second thought was we could look at the pre-season of "surprise" teams, such as last year's Saints, for indications they'd be very good in the regular season. If we saw similar patterns in other teams this pre-season, we might be able to make some "out on a limb" predictions.

Also, keep in mind a 4-game sample is affected very strongly by opponent strength.

Derek said...

I actually tried "starting squad" stats after week 3 and plugged those into the regular season win totals regression model. That's Part I, which I plan on redoing now that the preseason's over. The sample size is not even 4 games. It's closer to one and change because most of the starters don't play that long, so clearly opponent effects and "luck" are going to play an inordinately large role in the projections.
