Preseason Analysis Part II - Correlation with Regular Season, More on Expected Wins
Previously on Preseason Analysis...
And now Part II of Preseason Analysis.
Preseason wins and losses are essentially meaningless, but maybe team performance, as measured by things like run efficiency and pass efficiency, still carries some information over into the regular season. Intuitively, first-string players still face first-string players, so some true reflection of skill is bound to show up. So I took preseason box scores from 1997-2006 to see what the stats say about teams' regular season performance. Note that separating out first-stringer offensive stats is too time-consuming (and possibly not that useful), so I'm just using total team stats for each game. Because of missing stats and their questionable usefulness, I've dropped punt returns, kick returns, and penalty first downs from my models.

Preseason and regular season league averages

Stat  Preseason  Regular Season
R     3.8239     4.0708
P     5.5977     5.8866
SR    0.069901   0.068252
3C    0.37317    0.3777
PY    61.886     54.266
IR    0.026319   0.029888
FR    0.04069    0.031635

Correlation of preseason league average with regular season league average

Stat  Corr. coef.  P-value
R     -0.023       0.94972
P     0.53682      0.10961
SR    -0.03985     0.91297
3C    -0.16464     0.64945
PY    0.72912      0.016728
IR    0.052921     0.88456
FR    0.41608      0.23171
R=Rush, P=Pass, SR=Sack Rate, 3C=3rd Down Conv., PY=Penalty Yards, IR=Int. Rate, FR=Fum. Rate
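The post doesn't spell out how each rate is computed, so here is a minimal sketch of pooling per-play rates from team box scores, assuming common definitions: yards per carry for R, net pass yards per dropback (attempts plus sacks) for P, sacks per dropback for SR, interceptions per attempt for IR, and penalty yards per team-game for PY. The `TeamGame` fields and `league_averages` are hypothetical names, and fumble rate is left out because its denominator isn't given.

```python
from dataclasses import dataclass


@dataclass
class TeamGame:
    """One team's totals for one game (hypothetical field names)."""
    rush_att: int
    rush_yds: int
    pass_att: int
    pass_yds: int    # gross passing yards
    sacks: int       # times sacked
    sack_yds: int    # yards lost on sacks
    third_att: int
    third_conv: int
    ints: int        # interceptions thrown
    pen_yds: int     # penalty yards assessed against the team


def league_averages(games):
    """Pool totals over all team-games, then take ratios (assumed definitions)."""
    def tot(field):
        return sum(getattr(g, field) for g in games)

    dropbacks = tot("pass_att") + tot("sacks")
    return {
        "R": tot("rush_yds") / tot("rush_att"),                 # yards per carry
        "P": (tot("pass_yds") - tot("sack_yds")) / dropbacks,  # net yards per dropback
        "SR": tot("sacks") / dropbacks,                         # sacks per dropback
        "3C": tot("third_conv") / tot("third_att"),            # third-down conversion rate
        "IR": tot("ints") / tot("pass_att"),                   # interceptions per attempt
        "PY": tot("pen_yds") / len(games),                     # penalty yards per team-game
    }
```

Running it separately over one season's preseason games and regular season games would give one year's pair of values for each stat. Pooling (a ratio of sums) rather than averaging per-game ratios is a design choice here; it weights games with more plays more heavily.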
Simply put, the preseason game is appreciably different from the regular season game. The first thing you notice is that the preseason favors defense: rushing and passing yards per play are lower, sack rates are higher, fewer third downs are converted, and fumble rates are higher. On the other hand, interception rates are lower. Teams also rack up more penalty yards per game. Perhaps coaches are more conservative on offense, saving most of their plays for the regular season. Or maybe it's because coaches give almost all of their QBs playing time.
Meanwhile, pass yards per play, penalty yards per game, and fumble rate are the only three stats whose preseason averages correlate strongly with their regular season averages (r = 0.54, 0.73, and 0.42), but only penalty yards has a significant p-value. The p-value is essentially the probability that a correlation coefficient at least that extreme would arise from entirely random, uncorrelated inputs. Usually, a stat is considered significant at a p-value of 5% or less. Given the small sample size (only ten seasons of league averages), we'll excuse the higher p-values of fumble rate and pass efficiency. Preseason average rushing efficiency, meanwhile, has essentially no correlation with regular season average rushing efficiency. If the preseason has little meaning on a league-wide level, then how does it fare on a team level?

Correlation of preseason stats with regular season stats, Unadj. VOLA

Stat  Corr. coef.  P-value
RO    0.18156      0.0012547
RD    0.15345      0.0065266
PO    0.28506      2.8993e-07
PD    0.25502      4.8902e-06
SRM   0.10118      0.073865
SRA   0.24271      1.4125e-05
3CM   0.20165      0.00033031
3CA   0.17538      0.0018427
PY    0.22266      7.0842e-05
IRG   0.10879      0.054507
IRT   0.118        0.036931
FRG   0.047389     0.40343
FRT   0.11199      0.047738
O=Offense, D=Defense, M=Made, A=Allowed, G=Given, T=Taken
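The p-values above can be computed from r and the sample size n with the standard t-test for a Pearson correlation: t = r·sqrt((n-2)/(1-r²)) with n-2 degrees of freedom. This stdlib-only sketch uses the identity that the two-tailed p-value equals the regularized incomplete beta function I_x(df/2, 1/2) at x = df/(df + t²), evaluated with the usual continued-fraction (Lentz) method. The post doesn't say which tool produced its p-values, so treat this as an assumed reconstruction; the function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def correlation_p_value(r, n):
    """Two-tailed p-value for H0: no correlation, given r from n pairs."""
    df = n - 2
    if abs(r) >= 1.0:
        return 0.0
    t2 = r * r * df / (1.0 - r * r)
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t2))


# League-wide penalty yards: r = 0.72912 over ten seasons (1997-2006).
print(correlation_p_value(0.72912, 10))  # close to the table's 0.016728
```

The same function applied to the team-level rows, with n equal to the number of team-seasons, gives the much smaller p-values in the team table: the coefficients are weaker there, but the sample is far larger.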
Offensive performance seems to carry over from preseason to regular season better overall than defensive performance, with turnover rates being the only exception. Surprisingly, most of the p-values are below 5%: sack rate made (0.074) and interception rate given (0.055) just miss, and fumble rate given (0.40) is the only clear exception. In other words, it's highly unlikely that random inputs would produce correlation coefficients like these, so it's safe to assume that overall team preseason performance means something, just not much: no coefficient reaches 0.3. If your team does well in the preseason, that's great, but it's hardly a guarantee of success. If your team does poorly, it's really not all that much to sweat about. Of course, this matches our expectations, because second-string and third-string players get playing time they won't get in the regular season. If someone wants to take the time to sort through the box scores and figure out the efficiency stats for first-stringers, they can be my guest, though it's questionable how much the correlation coefficients would actually improve. Similar results show up when correlating preseason stats with regular season wins.

Correlation of preseason stats with regular season wins, Unadj. VOLA

Stat  Corr. coef.  P-value
RO    0.03494      0.53798
RD    0.049808     0.37983
PO    0.18824      0.00081724
PD    0.18979      0.00073831
SRM   0.087023     0.12445
SRA   0.14418      0.01065
3CM   0.14428      0.010596
3CA   0.21909      9.2999e-05
PY    0.054488     0.33663
IRG   0.11554      0.04107
IRT   0.029016     0.60908
FRG   -0.023628    0.67711
FRT   0.093994     0.096927
Out of curiosity, I built a linear regression model of regular season win totals using three preseason stats: pass efficiency, sack rate, and third-down conversion rate. With 1997-2006 stats, I tested on each year from 1998-2006, using all previous years as training data. On average, the predicted win totals have a correlation of 0.30593 with the actual win totals, which is not very high. The yearly average of mean absolute error was 3.1519 games, about twice what it is when using regular season stats. The average R² was 0.67495, which took me a little by surprise. About 67.5% of the variance accounted for by this data, compared to 79% for the regular season stats? I was expecting 40-50% tops.
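Training on all previous years and testing on the next one is a walk-forward (expanding-window) validation. Here is a minimal stdlib sketch, assuming each season is stored as per-team feature rows (for example the three preseason stats) plus actual win totals; `fit_ols`, `walk_forward`, and the `{year: (rows, wins)}` layout are hypothetical, and the post doesn't say what regression software was used. For each test season it fits ordinary least squares on every earlier season, then records the mean absolute error and the correlation between predicted and actual wins.

```python
import math


def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Augmented matrix [X^T X | X^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(M[row][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, k + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def walk_forward(seasons):
    """seasons: {year: (feature_rows, wins)}. Test each year on all earlier years.

    Returns a list of (year, mean_absolute_error, correlation) tuples.
    """
    years = sorted(seasons)
    results = []
    for i, year in enumerate(years[1:], start=1):
        X_train, y_train = [], []
        for past in years[:i]:
            X_train += seasons[past][0]
            y_train += seasons[past][1]
        beta = fit_ols(X_train, y_train)
        X_test, y_test = seasons[year]
        preds = [predict(beta, row) for row in X_test]
        mae = sum(abs(p - w) for p, w in zip(preds, y_test)) / len(y_test)
        results.append((year, mae, pearson_r(preds, y_test)))
    return results
```

Averaging the per-year MAE and correlation values from `walk_forward` gives summary numbers of the same kind as the 3.1519-game error and 0.30593 correlation quoted above. Because each fit only sees earlier seasons, none of the test-year outcomes leak into training.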
What's more interesting about the model, however, is its ability to predict which teams will improve/decline the following season. In a manner similar to what I did here, I looked at which teams exceeded or fell short of their predicted win totals by more than the mean absolute error. Teams that outperformed their projected win total based on preseason stats are predicted to decline the next year, and teams that underperformed their projected win total are predicted to improve. Because there is some positive correlation between regular season and preseason stats, I expected some of the success using regular season stats to carry over. What I found, however, was that the model with preseason stats is slightly more accurate than the model with regular season stats. This might be a result of noise created by the extra inputs in the regular season stats model (e.g. kick and punt returns).
1998-2005 (predicting 1999-2006)
Accuracy predicting risers: 74%
Accuracy predicting fallers: 61.818%
2002-2005 (predicting 2003-2006)
Accuracy predicting risers: 67.857%
Accuracy predicting fallers: 67.742%
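The riser/faller rule reduces to comparing each team's residual (actual minus predicted wins) with the mean absolute error, then checking next season's record. A minimal sketch, assuming win totals are kept in dicts keyed by team; `flag_teams` and `hit_rate` are hypothetical names, and counting a team with an unchanged record as a miss is an assumption here, since the post doesn't say how ties were scored.

```python
def flag_teams(predicted, actual, mae):
    """Split teams into projected risers and fallers for next season.

    A team that beat its projection by more than `mae` is expected to decline
    (faller); one that missed it by more than `mae` is expected to improve (riser).
    """
    risers, fallers = [], []
    for team, wins in actual.items():
        residual = wins - predicted[team]
        if residual > mae:
            fallers.append(team)
        elif residual < -mae:
            risers.append(team)
    return risers, fallers


def hit_rate(flagged, wins_now, wins_next, direction):
    """Share of flagged teams whose record moved the predicted way.

    direction is +1 for risers and -1 for fallers; an unchanged record
    counts as a miss (an assumption).
    """
    if not flagged:
        return float("nan")
    hits = sum(1 for t in flagged if (wins_next[t] - wins_now[t]) * direction > 0)
    return hits / len(flagged)
```

For example, with every team projected at 8 wins and an MAE of 3.1519, a 12-win team is flagged as a faller and a 4-win team as a riser, while a 9-win team is not flagged at all.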
So the preseason does seem to have some useful meaning. On the other hand, a team has to land outside a roughly 6-game band around its projection (plus or minus the mean absolute error) to count as a riser or faller. If the projection is 8 wins (average), a team could be bad (5 wins) or very good (11 wins) and still be within the average error. The method flags about 6-7 risers and 6-7 fallers every year, so it correctly calls about 8-9 teams to improve or decline each year. That's pretty good, I think. Without further ado, here are the projected risers and fallers for 2007:
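The 6-game band and the 8-9 figure follow from the 3.1519-game mean absolute error and the 1998-2005 accuracies (74% for risers, 61.818% for fallers), assuming equal numbers of risers and fallers each year:

```latex
\begin{aligned}
&\text{Not flagged if } |W - \hat{W}| \le \mathrm{MAE} \approx 3.15:\quad
  \hat{W} = 8 \;\Rightarrow\; 4.85 \le W \le 11.15
  \quad (\text{width } 2 \times 3.15 \approx 6.3 \text{ games})\\
&\text{Correct calls per season} \approx 0.74\,n_{\text{risers}} + 0.618\,n_{\text{fallers}}:\\
&\qquad n = 6:\ 4.44 + 3.708 = 8.148, \qquad n = 7:\ 5.18 + 4.326 = 9.506
\end{aligned}
```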
Risers
Based on the accuracy and what other projection models have shown, I'd pick Jacksonville, Oakland, Dallas, and Tampa Bay as the ones to actually improve.
Fallers
Of these, I'd pick the Jets, Ravens, Chiefs, and Bears to decline. It really could go either way with the 49ers and Seahawks. That division is chaos.
After jumping through some hoops, I do seem to have found some relevance to the preseason. But it's nothing you couldn't find using regular season performance. As intuition would tell you, preseason performance is only slightly indicative of regular season performance.
2001 BAL@PHI box score was missing.