Tuesday, October 16, 2007

Predictions 2007 Week 7 and Accuracy Report

The predictions are based on yards per play on rushing and passing plays (sacks and sack yards included), sack rates, third down conversion rates, penalty yards per game, and fumble and interception rates. The models are trained on the 1996-2006 seasons. Linear regression predicts the final score margin (home team points - away team points). Logistic regression estimates the probability that the home team will win.
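To make the two outputs concrete, here is a minimal sketch of how a linear model and a logistic model turn the same inputs into a margin and a win probability. The coefficients, the feature names, and the choice to express each stat as a home-minus-away difference are all assumptions made up for illustration; the fitted values from the 1996-2006 training data are not given in this post.

```python
import math

# Hypothetical coefficients for illustration only; the post does not publish
# the fitted values. Each feature is assumed to be (home stat - away stat).
LINEAR_COEFS = {"intercept": 2.5, "pass_ypp_diff": 4.0, "rush_ypp_diff": 1.5}
LOGIT_COEFS = {"intercept": 0.25, "pass_ypp_diff": 0.45, "rush_ypp_diff": 0.15}


def linear_score(coefs, features):
    """Intercept plus the weighted sum of the features."""
    return coefs["intercept"] + sum(
        coefs[name] * value for name, value in features.items()
    )


def predict_margin(features):
    """Linear regression output: predicted home points minus away points."""
    return linear_score(LINEAR_COEFS, features)


def predict_home_win_prob(features):
    """Logistic regression output: the linear score passed through a sigmoid."""
    return 1.0 / (1.0 + math.exp(-linear_score(LOGIT_COEFS, features)))


# A made-up matchup: the home team gains 0.5 more yards per pass play and
# 0.2 fewer yards per rush play than the visitor.
game = {"pass_ypp_diff": 0.5, "rush_ypp_diff": -0.2}
margin = predict_margin(game)
prob = predict_home_win_prob(game)
print(f"margin {margin:+.2f}, P(home win) {prob:.1%}")
```

Because the two models are fit separately, nothing forces a positive margin to line up with a probability above 50%.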

Also, I finally wrote a program to throw together some stats on the accuracy of my predictions. After the jump, you'll see accuracy for each of weeks 3 through 6 and overall accuracy for weeks 3-6.

Based on opponent-adjusted stats

Game       Predicted Final Score Margin  P(Home Team Wins), %
BAL @ BUF                       -0.2506               53.5347
NE @ MIA                        -4.5484               36.8913
NYJ @ CIN                        7.5352               75.2756
TEN @ HOU                        0.9169               54.1300
IND @ JAX                       -0.6118               46.5277
PIT @ DEN                        0.3152               48.1660
KC @ OAK                         0.7179               51.4972
MIN @ DAL                        8.7854               75.8699
SF @ NYG                        10.7808               81.4045
CHI @ PHI                        8.5234               76.2157
ARI @ WAS                        7.9347               73.5261
TB @ DET                        -4.5530               34.7391
ATL @ NO                        -0.5994               48.8374
STL @ SEA                        6.4203               70.9617


Based on unadjusted stats

Game       Predicted Final Score Margin  P(Home Team Wins), %
BAL @ BUF                       -4.1995               39.9591
NE @ MIA                        -5.8648               32.6904
NYJ @ CIN                        5.8973               71.7787
TEN @ HOU                        1.3829               57.3695
IND @ JAX                       -0.2578               48.5868
PIT @ DEN                       -1.5930               41.5125
KC @ OAK                         1.8714               52.4718
MIN @ DAL                        8.5611               74.9064
SF @ NYG                         9.8639               79.8292
CHI @ PHI                       10.5338               80.4633
ARI @ WAS                        5.8937               68.8291
TB @ DET                        -2.4938               38.6871
ATL @ NO                        -1.2882               47.0080
STL @ SEA                        7.4944               75.1640


Only 47% of intradivision games have been won by the home team so far this year, so home field has actually been a disadvantage in those games. Some regression to the mean is possible, so you might want to stay away from betting on IND @ JAX, KC @ OAK, and ATL @ NO, as the system thinks those games are pretty evenly matched.



10-18-07: Found a misprint in the accuracy numbers. Corrected them and cut out the weekly accuracies, which will still make an appearance at the end of each week. Sorry for the mistake.

Accuracy for all weeks, opponent-adjusted stats
Linear regression
Win prediction accuracy: 73.6842%
Mean absolute error: 11.4704 points
Correlation with result: 0.2929
% of games predicted as home team wins: 68.4211%
Logistic regression
Win prediction accuracy: 73.6842%
Correlation with result: 0.2946
% of games predicted as home team wins: 71.9298%
Win prediction accuracy when 20%≤P(home team wins)<30%: 100.0000%
Win prediction accuracy when 30%≤P(home team wins)<40%: 100.0000%
Win prediction accuracy when 40%≤P(home team wins)<50%: 62.5000%
Win prediction accuracy when 50%≤P(home team wins)<60%: 83.3333%
Win prediction accuracy when 60%≤P(home team wins)<70%: 78.5714%
Win prediction accuracy when 70%≤P(home team wins)<80%: 50.0000%
Win prediction accuracy when 80%≤P(home team wins)<90%: 66.6667%

Accuracy for all weeks, unadjusted stats
Linear regression
Win prediction accuracy: 64.9123%
Mean absolute error: 11.3041 points
Correlation with result: 0.3260
% of games predicted as home team wins: 66.6667%
Logistic regression
Win prediction accuracy: 66.6667%
Correlation with result: 0.3248
% of games predicted as home team wins: 64.9123%
Win prediction accuracy when 10%≤P(home team wins)<20%: 100.0000%
Win prediction accuracy when 20%≤P(home team wins)<30%: 100.0000%
Win prediction accuracy when 30%≤P(home team wins)<40%: 75.0000%
Win prediction accuracy when 40%≤P(home team wins)<50%: 28.5714%
Win prediction accuracy when 50%≤P(home team wins)<60%: 60.0000%
Win prediction accuracy when 60%≤P(home team wins)<70%: 83.3333%
Win prediction accuracy when 70%≤P(home team wins)<80%: 62.5000%
Win prediction accuracy when 80%≤P(home team wins)<90%: 50.0000%
Win prediction accuracy when 90%≤P(home team wins)<100%: 100.0000%
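
For anyone who wants to reproduce a report like the one above, here is a rough sketch of how the figures could be computed. It is not the actual program. The function names and the toy data are made up, and two definitions are assumptions: "correlation with result" is taken to be the Pearson correlation between the prediction and the outcome (actual margin for the linear model, a 1/0 home-win flag for the logistic model), and a tie is counted as a home loss.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def accuracy_report(pred_margins, pred_probs, actual_margins):
    """Summarize predictions; probabilities are percentages (0-100)."""
    n = len(actual_margins)
    home_won = [m > 0 for m in actual_margins]  # assumption: tie = home loss
    lin_pick = [m > 0 for m in pred_margins]    # linear model picks home?
    log_pick = [p >= 50 for p in pred_probs]    # logistic model picks home?
    report = {
        "linear_accuracy": 100 * sum(p == w for p, w in zip(lin_pick, home_won)) / n,
        "mean_abs_error": sum(abs(p - a) for p, a in zip(pred_margins, actual_margins)) / n,
        "linear_correlation": pearson(pred_margins, actual_margins),
        "linear_pct_home_picks": 100 * sum(lin_pick) / n,
        "logistic_accuracy": 100 * sum(p == w for p, w in zip(log_pick, home_won)) / n,
        "logistic_correlation": pearson(pred_probs, [1.0 if w else 0.0 for w in home_won]),
        "logistic_pct_home_picks": 100 * sum(log_pick) / n,
    }
    # Accuracy within each 10-point probability bin, keyed by the bin's lower edge.
    bins = {}
    for prob, pick, won in zip(pred_probs, log_pick, home_won):
        low = min(int(prob // 10) * 10, 90)
        hits, total = bins.get(low, (0, 0))
        bins[low] = (hits + (pick == won), total + 1)
    report["bin_accuracy"] = {low: 100 * h / t for low, (h, t) in sorted(bins.items())}
    return report


# Toy data, invented for the example (not real game results).
report = accuracy_report(
    pred_margins=[3.0, -2.0, 7.5, 1.0],
    pred_probs=[62.0, 41.0, 78.0, 55.0],
    actual_margins=[10, 3, 14, -7],
)
for name, value in report.items():
    print(name, value)
```

With only a handful of games in each probability bin, a single result moves a bin's accuracy by a lot, which is worth keeping in mind when reading the per-bin numbers.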

4 comments:

Anonymous said...

You have Buffalo and Denver with a >50% probability of winning their Week 7 games, but their Predicted Final Score Margins are negative (indicating a loss). Can you explain?

Derek said...

Sure, no problem. The predicted final score margin and predicted probability of winning are computed by two separate programs. They use the same data, but the math behind the algorithms is different enough that there will be the occasional disagreement between the two programs. In these cases, you could just interpret the numbers to say that the game could easily go either way. For betting people, it means, "stay away from betting on this game."
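
As a quick illustration of that kind of disagreement, the sketch below flags games where the sign of the predicted margin and the win probability point to different teams. The numbers are copied from the opponent-adjusted Week 7 table in the post; the function name and the 50% cutoff are just my choices for the example.

```python
# (game, predicted margin, P(home team wins) in percent), copied from the
# opponent-adjusted Week 7 table in the post.
week7_adjusted = [
    ("BAL @ BUF", -0.2506, 53.5347),
    ("NE @ MIA", -4.5484, 36.8913),
    ("NYJ @ CIN", 7.5352, 75.2756),
    ("TEN @ HOU", 0.9169, 54.1300),
    ("IND @ JAX", -0.6118, 46.5277),
    ("PIT @ DEN", 0.3152, 48.1660),
    ("KC @ OAK", 0.7179, 51.4972),
    ("MIN @ DAL", 8.7854, 75.8699),
    ("SF @ NYG", 10.7808, 81.4045),
    ("CHI @ PHI", 8.5234, 76.2157),
    ("ARI @ WAS", 7.9347, 73.5261),
    ("TB @ DET", -4.5530, 34.7391),
    ("ATL @ NO", -0.5994, 48.8374),
    ("STL @ SEA", 6.4203, 70.9617),
]


def models_disagree(margin, home_win_pct):
    """True when the margin's sign and the win probability pick different teams."""
    return (margin > 0) != (home_win_pct > 50)


toss_ups = [game for game, margin, pct in week7_adjusted
            if models_disagree(margin, pct)]
print(toss_ups)  # ['BAL @ BUF', 'PIT @ DEN']
```

Both flagged games have margins within a point of zero and probabilities within four points of 50%, which fits the "could easily go either way" reading.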

The one caveat I will throw out about Denver is that the system seems to overrate them greatly. With yards per play, small sample sizes can skew the averages, and Denver has run very few offensive plays this season (I think they're ranked in the bottom 10 of the league in that). That's why the power rankings had them so high until the loss to San Diego. Given that, you'd probably want to go with Pittsburgh.

Anonymous said...

Thanks for the response, Derek. That exactly answered my question.

The point you made in your caveat was interesting. I thought that your model's overrating of Denver in previous weeks was due to Denver outplaying their "true" ability (at least in terms of the stats that you use in your model). But you're saying that it was just the small sample size in their net passing yards per play stat (and/or other offensive passing stats that you use) that skewed their averages upward (e.g., maybe they had a couple big passing plays), and thus their predicted probability of winning.

Do you have more examples like the Denver one? For instance, maybe Kansas City's net passing yards allowed are skewed upward or downward because teams have mainly run the ball against them, so the number of plays that make up that stat is far below league average.

Also, I love the site and your detailed work. If you are interested, here are some other sites that use computer models to predict NFL games:

These all have good winning percentages (straight up):
http://www.bbnflstats.com/
http://footballpredictionnetwork.blogspot.com/
http://insider.espn.go.com/nfl/projections
http://www.mcs.sdsmt.edu/rwjohnso/nfl/nflrank.html
http://www.sportznutz.com/nfl/
http://www.toddhester.net/nfl07/index.html

These sites also have models, but they suck:
http://home.hawaii.rr.com/predictor/
http://www.cappersmall.com/betting/predictor.php

Derek said...

The small sample size was something that had occurred to me last week when Denver had their precipitous drop in rankings, but I didn't confirm it until today at NFL.com's stats page. Denver's ranked 30th in total plays from scrimmage. Buffalo and San Fran are 31st and 32nd respectively. KC is 14th.
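
To put a rough number on the small-sample effect, here is a tiny sketch showing how much one long gain moves a yards-per-play average at different play counts. The play counts, the 6.0-yard typical gain, and the 70-yard play are all invented for illustration; they are not Denver's actual numbers.

```python
def ypp_after_big_play(typical_ypp, plays, big_play_yards):
    """Yards per play when one of `plays` snaps is a single long gain."""
    total_yards = typical_ypp * (plays - 1) + big_play_yards
    return total_yards / plays


TYPICAL_YPP = 6.0  # hypothetical average gain on an ordinary play
BIG_PLAY = 70      # hypothetical single long gain, in yards

shifts = {}
for plays in (150, 250):
    shifts[plays] = ypp_after_big_play(TYPICAL_YPP, plays, BIG_PLAY) - TYPICAL_YPP
    print(f"{plays} plays: average moves by {shifts[plays]:+.3f} yards per play")
```

The same single play shifts the average noticeably more for the team with fewer snaps, which is the sense in which a low play count can inflate a yards-per-play ranking.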

One of my philosophies is that accuracy varies from year to year depending on the value of home-field advantage, which is surprisingly unstable. This year, an above-average number of games are being won by the home team, especially in interconference games, which are the most unstable in that respect. This probably means that every model will decline in accuracy in 2008. We're all using similar stats and methods.