Monday, July 16, 2007

The Predictive Ability of Averages

The predictions made by my models and others' are simply an expected level of performance given the averages. In other words, if a game were played an infinite number of times, we'd expect the average result to converge toward this number. Players don't play at the mean, however. They play above and below it. On any given Sunday, numerous unpredictable factors can influence performance levels and thus the outcome: weather, in-game injuries, the coaches' playcalls, bad seafood, a scorned woman (e.g. Mrs. Nick Harper), a player soliciting prostitution from a cop, a player soliciting prostitutes for opposing players. Over time, however, these factors should balance out, and everything but ability should be filtered out of the averages. So how predictive of the next week's performance are the averages?


To find out, I took the to-date averages as of weeks 3-16 for the years 1996-2006 and matched them with the in-game averages from weeks 4-17 of the same years. This matches my testing method for the prediction models, except that the models will start making predictions with week 3. You can't really expect good predictions from only 1 or 2 weeks of data, however. So the test is: how well do averages based on weeks 1 through W-1 correlate with the results in week W?

Correlation coefficients of averages up to week W-1 with performances in week W

Stat   Unadjusted   Adjusted for opponent
RO     0.14647      0.15258
RD     0.091264     0.11125
PO     0.16958      0.1335
PD     -0.0013236   0.069243
SRM    0.028981     0.036354
SRA    0.23358      0.22261
PR     0.026346     0.023044
PC     0.063592     0.066901
KR     0.048825     0.030787
KC     0.039768     0.0099349
3CM    0.12793      0.12153
3CA    0.066826     0.060486
PFD    0.048445     0.052096
PY     0.088783     0.083038
IRG    0.00046638   0.0038317
IRT    0.046033     0.025835
FRG    0.058733     0.039033
FRT    0.023268     0.002728
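The pairing behind the unadjusted column can be sketched in Python. This is a minimal illustration under stated assumptions, not the code that produced the table: the data layout (a hypothetical `weekly` dict mapping a team-season to its list of per-game values, week 1 first), the use of a simple mean of per-game values as the to-date average, the pooling of all team-weeks into a single correlation, and the helper names `to_date_pairs` and `pearson_r` are all assumptions. The opponent adjustment behind the second column is not shown.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def to_date_pairs(weekly, first_week=4, last_week=17):
    """Pair each team's average over weeks 1..W-1 with its week-W value.

    `weekly` (hypothetical layout) maps a team-season key to a list of
    per-game values, where index 0 is week 1. For simplicity, list position
    is treated as the week number, so bye weeks are ignored.
    """
    xs, ys = [], []
    for games in weekly.values():
        for w in range(first_week, min(last_week, len(games)) + 1):
            xs.append(mean(games[: w - 1]))  # to-date average, weeks 1..W-1
            ys.append(games[w - 1])          # in-game value for week W
    return xs, ys


# Made-up yards-per-carry numbers for two team-seasons, for illustration only.
weekly = {
    ("TeamA", 2006): [4.1, 3.8, 4.5, 3.9, 4.2],
    ("TeamB", 2006): [3.2, 3.6, 3.0, 3.4, 3.9],
}
xs, ys = to_date_pairs(weekly)
print(f"r = {pearson_r(xs, ys):.3f}")
```

With real data, each stat would get its own `weekly` dict covering every team-season from 1996-2006, and each row of the table would be one such pooled correlation.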



Here we see a fundamental problem with the prediction models: players don't play at their mean level, and thus the averages aren't terribly predictive of performance within a single game. The strongest correlation in the table, for SRA, is only about 0.23, and adjusting for opponent doesn't really help.

The yards per carry and yards per pass attempt stats are highly dependent on down-and-distance and the current scoring margin. Busted plays might cause huge shifts in "momentum," as Marcus Allen's 74-yard TD run and Joe Theismann's pre-halftime pick six in Super Bowl XVIII did. And if Don McNeal hadn't slipped as he stopped following the receiver in motion, maybe he would have tackled John Riggins for a 2-yard gain instead of letting him break off a 43-yard TD run, and the Dolphins might have won Super Bowl XVII. Little things like that greatly affect the in-game averages. Excluding that one play, Riggins averaged 3.324 yards on 37 carries, not a particularly good performance against a poor run defense (4.38 yards/carry allowed). Considering that he averaged a paltry 3.1 yards/carry that season, his odds of breaking off that one long play were probably very small. But Gibbs' play call put McNeal in position to slip up. Was it luck? I wouldn't say that. Actually, while reading up on the 2006 Cardinals in Baseball Prospectus, I came across this fitting quote: "Luck is the residue of design." Was it a repeatable event with predictive value? I have my doubts.
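The 3.324 figure is easy to check, and the check shows how much a single carry can move a game average. The sketch below assumes Riggins' widely reported Super Bowl XVII line of 38 carries for 166 yards; that box-score total is not stated in this post.

```python
# Riggins in Super Bowl XVII: 38 carries for 166 yards (box-score totals),
# including the 43-yard touchdown run.
carries, yards = 38, 166
long_run = 43

with_run = yards / carries                        # all 38 carries
without_run = (yards - long_run) / (carries - 1)  # the other 37 carries

print(f"with the 43-yard run:    {with_run:.3f}")     # 4.368
print(f"without the 43-yard run: {without_run:.3f}")  # 3.324
```

One carry out of 38 shifts the game average by more than a yard, and the full-game figure of about 4.37 lands almost exactly on the 4.38 yards/carry that defense allowed.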

So happenstance can make the statistics more retrodictive than predictive, judging by the correlations with next-game scoring margins, next-game averages, and season win totals. Sixteen games doesn't seem to be a large enough sample to filter all of the happenstance out of the stats.

1 comment:

Brian Burke said...

Couldn't agree more.

Unfortunately, averages are the best we have. It's simply the best way to measure a central tendency in naturally distributed metrics like those in football and sports in general.

That's one reason I prefer logit probability models to linear regression models. A logit model says "given these central tendencies, the win probability for each team is ..." It automatically allows for weekly deviations from the mean.

I've stated before a theory that a better rushing metric is the median yards per carry instead of the average.