Saturday, June 16, 2007

The Value of Homefield Advantage Part II - Weather

Editor's Note: 2005 Saints games and the ARI/SF Mexico City game are thrown out because of neutral site issues. The Saints played "home" games in a warm weather environment and a dome. They also had a faux home game in cold, grim New Jersey. The numbers have been subsequently corrected. 7/11/07

During the NFC Championship this year, they threw out some stat about dome teams being winless in road NFC championship games or something along those lines. And the Saints followed that trend by falling to the Bears 39-14. In the summer, players from cold-weather cities aren't used to the intense heat and humidity of places like Miami. In the winter, players from warm-weather cities aren't used to the icy winds, sleet, and snow of places like Green Bay and Cincinnati. Dome teams always seem to be at a disadvantage on the road when they don't have the luxury of A/C during games. Well, teams like the 1999 Rams were also built for the speed they could achieve on Astroturf. Anyway, the numbers bear out weather playing a factor in home field advantage. See how the spread accounts for different weather situations.

Average Result/Average Spread
Home \ Away    Cold               Warm               Dome
@ Cold         2.2983 / -2.4398   3.9104 / -2.7412   3.4324 / -2.4611
@ Warm         2.0283 / -1.9379   2.4495 / -2.4979   4.25 / -2.7147
@ Dome         0.3688 / -2.7079   2.2527 / -3.0157   2.7407 / -2.4198
% of games won by home team/% of games home team was favorite
Home \ Away    Cold                Warm                Dome
@ Cold         56.358% / 66.302%   64.792% / 71.637%   60.81% / 66.32%
@ Warm         59.109% / 64.689%   55.37% / 65.25%     61.92% / 68.36%
@ Dome         52.13% / 67.42%     56.41% / 69.11%     58.64% / 58.49%

Average Result/Average Spread, Weeks 1-8
Home \ Away    Cold                Warm               Dome
@ Cold         2.1404 / -2.4024    2.7937 / -2.8548   2.6522 / -1.9890
@ Warm         2.7115 / -2.2007    3.2378 / -2.2297   4.6964 / -3.1594
@ Dome         -0.3846 / -2.6625   2.5469 / -2.8111   0.4091 / -1.9222
% of games won by home team/% of games home team was favorite, Weeks 1-8
Home \ Away    Cold                Warm                Dome
@ Cold         57.192% / 67.619%   63.677% / 72.903%   56.52% / 65.93%
@ Warm         62.019% / 65.789%   60.84% / 64.86%     58.04% / 72.46%
@ Dome         50.77% / 63.75%     58.59% / 71.11%     51.52% / 55.56%

Average Result/Average Spread, Weeks 9-17
Home \ Away    Cold               Warm               Dome
@ Cold         2.4319 / -2.4717   4.8794 / -2.6471   4.1139 / -2.8824
@ Warm         1.5315 / -1.7401   1.7622 / -2.736    3.9122 / -2.4306
@ Dome         1.0132 / -2.7449   1.9931 / -3.198    4.3438 / -2.7896
% of games won by home team/% of games home team was favorite, Weeks 9-17
Home \ Away    Cold                Warm                Dome
@ Cold         55.652% / 65.182%   65.759% / 70.588%   64.56% / 66.67%
@ Warm         56.993% / 63.861%   50.61% / 65.60%     64.86% / 65.74%
@ Dome         53.29% / 70.41%     54.48% / 67.33%     63.54% / 60.66%

Note: The Houston Texans normally play with the roof closed, so they are considered a dome team. Idea from Football Outsiders. Arizona is still a warm weather team, though.

Weeks 1-8 cover roughly September and October, while Weeks 9-17 cover November through January. Warm-weather teams lose some home field advantage in the winter, and cold-weather teams gain some. Dome teams just get screwed against cold-weather teams: when they host one, home field advantage is worth next to nothing. The spread is mostly insensitive to these climate matchups, leaving it particularly inefficient in dealing with dome teams. Can this be corrected for? Let's assume that the same inefficiencies show up in my model.

Adding binary variables for the home field climate matchups to my model slightly increases its accuracy on 2001-2006, from 61.213% to 61.916%. The correlation coefficients of these variables with the final score margin are all very weak, ranging from -0.02 to 0.055. All three of the dome home variables have negative correlations; perhaps dome teams are just less talented on average because of the Lions and the Texans.
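As a sketch of how such matchup variables might be wired in, here's a hypothetical encoding: one binary indicator per (home climate, away climate) pairing, appended to the existing stat inputs. The climate labels and feature layout are my assumptions for illustration, not the exact encoding used in the model.

```python
import numpy as np

CLIMATES = ["cold", "warm", "dome"]  # hypothetical labels

def matchup_indicators(home_climate, away_climate):
    """One binary variable per (home, away) climate pairing -- 9 in all."""
    vec = np.zeros(len(CLIMATES) ** 2)
    i = CLIMATES.index(home_climate)
    j = CLIMATES.index(away_climate)
    vec[i * len(CLIMATES) + j] = 1.0
    return vec

def with_climate_features(stat_features, home_climate, away_climate):
    """Append the matchup indicators to the existing stat-based inputs."""
    return np.concatenate([np.asarray(stat_features, dtype=float),
                           matchup_indicators(home_climate, away_climate)])

# A cold team visiting a dome, with two made-up stat inputs:
row = with_climate_features([0.12, -0.05], "dome", "cold")
```

The regression then learns one weight per matchup, which is exactly the per-cell adjustment the tables above suggest.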

I also took the average results for each climate matchup and created a schedule difficulty score based on that. The score had 0.09735 correlation with win totals, suggesting that weather has a small overall effect on games.

Perhaps the climate factor is categorized too broadly. Gametime temperature and weather (wind, rain, etc.) could be compared with monthly average temperatures for both teams' home cities. NFL gamebooks dating back to 2002 are available online and include that sort of data. One could even calculate wind chill and heat index to account for wind speed and humidity. It's a project for the future.
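The wind chill part of that future project is straightforward; below is a sketch using the standard NWS (2001) wind chill formula, which takes temperature in degrees Fahrenheit and wind speed in mph and is only defined for temperatures at or below 50°F with winds of at least 3 mph.

```python
# NWS (2001) wind chill formula: valid for temps <= 50 F and wind >= 3 mph.
def wind_chill(temp_f, wind_mph):
    """Wind chill in deg F from air temperature (deg F) and wind speed (mph)."""
    if temp_f > 50 or wind_mph < 3:
        return temp_f  # outside the formula's stated range; leave unadjusted
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v
```

For example, 0°F with a 15 mph wind comes out to about -19°F, matching the NWS chart.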


The Value of Home Field Advantage Part I

So let's go back to basics. Really basic stuff.

How much is home field advantage actually worth? It's a straightforward question, but there are several ways to tackle it. From 1994-2006, home field advantage was worth about 2.6362 points on average, but the standard deviation of the results was about 14.0780 points. 69.91% of the games fell within one standard deviation of the mean; in other words, those games were within two touchdowns either way of the average result. As a side note, the outcomes fall roughly in line with a normal distribution, which suggests linear regression is a reasonable way to try predicting future outcomes. So home field advantage matters, but only to a small extent: home teams won 58.51% of games. Between two teams very close in talent, go with the home team, but if one team is clearly better than the other, you're better off picking the better team. That's pretty much the conventional wisdom, isn't it?
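These summary statistics can be reproduced from a list of game margins (home score minus away score) in a few lines; the margins below are made-up illustrative data, not the 1994-2006 results.

```python
import statistics

def hfa_summary(margins):
    """margins: home score minus away score, one entry per game."""
    mean = statistics.mean(margins)
    sd = statistics.pstdev(margins)  # population standard deviation
    within_1sd = sum(abs(m - mean) <= sd for m in margins) / len(margins)
    home_win_pct = sum(m > 0 for m in margins) / len(margins)
    return mean, sd, within_1sd, home_win_pct

# Made-up margins, NOT the 1994-2006 data:
mean, sd, within_1sd, home_win_pct = hfa_summary([3, -7, 14, 3, -21, 10, 6, -3])
```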

So how does a predictor like the spread do in terms of valuing home field advantage? The average spread from 1998-2006 was -2.5346, very close to the actual average. The standard deviation, however, was only 5.7783, while extreme outcomes (games decided by 20+ points) occur in 18.515% of games. There's very little incentive statistically to predict such large wins, though the outcome is more frequent than perhaps expected. In the original research, I ran a few experiments classifying games as big wins or close wins for either team, and the more classes I introduced, the worse classification accuracy became. For the 4-class problem, 28-32% accuracy was the best I could do. The prediction systems play the odds and thus have a tighter range of margins than the actual outcomes. Tightening the bounds of the actual range within the training data did not help the accuracy of the prediction systems I've implemented.

What's interesting to note is that the value of home field advantage fluctuates a fair deal from year to year, but reached a peak in 2005 and a deep, deep valley in 2006, which caused the accuracy of the spread and my prediction systems to similarly fluctuate, particularly on 2006. The chart below lays out all the specific numbers.

             Average Actual Result   Average Spread*   % of games won by home team   % of games home team was favorite
OVERALL**    2.6362                  -2.5346           58.51%                        66.859%

Average Actual Margin of Victory | Average Margin of Victory Predicted by Spread | Proportion of games won by favorite | Proportion of games in which favorite beat the spread

* Spread is negative when home team is favored.
** Spread covers 1998-2006, but the averages for actual outcomes are from 1994-2006.
*** Spreads for week 4 of 2000 could not be found and are not included in the 2000 spread stats.

Curiously, there's a correlation with the predictive performance of Football Outsiders' DVOA stats as well. In 2005, two-thirds of games were won by the team with the higher DVOA. In 2006, that number fell to 55.80%. Without more years of data, it's hard to say whether this is just a natural aberration, but there was something unusual about 2006. Was it rule changes, a change in how rules are enforced, a change in stadiums or playing fields? Is there any way to account for the natural variance from year to year? For prediction systems like linear regression, one could alter the bias coefficient to reduce the bias towards home teams, but there's no guarantee that it'll improve accuracy.

In what ways are the spread and other prediction systems being inefficient in dealing with home-field advantage? One obvious place to start is the weather.


Previous Works

This is an updated version of the "Previous Works" page from my research website:

Neural network quarterbacking: Michael Purucker

Michael Purucker began researching the use of neural networks for predicting NFL games in 1996. He used 5 basic statistics based on each team's performance over the previous 3 weeks: total yards gained minus total yards allowed, rush yards gained minus rush yards allowed, turnover margin (takeaways minus giveaways through fumbles and interceptions), time of possession, and victories. Of all the networks tried, a backpropagation network performed the best, achieving a 70.83% accuracy rate over weeks 14 and 15 of the 1994 season. Adding the Las Vegas spread, which predicts the winner and the margin of victory, improved the results to 75%. Two weeks, however, is a very small test set. In those 2 weeks, the BP network with the spread went 9 of 14 and 12 of 14 respectively. A three-game swing is a lot of variance, and there's nothing to guarantee it won't go 6 of 14 in some weeks. The rush yards statistic overlaps with the total yards statistic, and both compare a team's offense with its own defense by subtracting yards allowed from yards gained, which does not reflect the matchups that actually take place on the field. The victory input does not account for margin of victory, and close games can come down to random events that could have let either team win.

Neural Network Prediction of NFL Games

Joshua Kahn continued Purucker's study, testing statistics from the entire season in addition to the previous 3 weeks and eliminating the victories input. Kahn cites Purucker's system as being 60.7% accurate over time. Kahn's 3-week averages were 37.5% and 62.5% accurate over weeks 14 and 15 of the 2003 season respectively. Using season-long averages, Kahn achieved 75% accuracy in each of those two weeks, while the ESPN experts averaged 57% and 87% accuracy (~72% over the 2 weeks). The study has the same problem of an extremely limited test set, but the results demonstrate an interesting point: season-long averages yield a better predictor than 3-week averages. This makes sense given the larger sample sizes involved. Teams that start off the season 0-3 almost never make the playoffs, so a 3-game winning streak on the way to a 3-10 record does not reflect the team's quality. That the ESPN experts experienced a 30% swing in those 2 weeks, like the predictor using 3-week averages, could reflect how recent performance can skew human perception.

NFL Point-Spread Ratings

Roger Johnson uses only the Las Vegas spreads to formulate rankings for each team in the league and, based on those rankings, predicts the winner of each game in weeks 3-17. For 2003-2006, the system averaged about 64% accuracy, making it slightly less accurate than the simple "Las Vegas favorite wins" predictor: about 65% of the teams favored by Las Vegas in weeks 3-17 of the 1999-2006 seasons won.

NFL Computer Handicapper Home Page

Daniel Imamura’s “Computer Handicapper” takes more advantage of the data found in the box scores to produce efficiency ratings for various aspects of each team. Rather than just yards per game, many of the ratings are based on yards per play but also factor in turnovers and touchdowns. Using these metrics along with the Las Vegas spread, the handicapper predicts the winner and the margin of victory. Though primarily intended for use on betting with or against the spread, the system has been 55-68% accurate over the 2001-2006 seasons in simply predicting winners.

Football Outsiders

Football Outsiders has gone beyond box scores and into play-by-play data and their own game charting project to devise an array of new statistics, the centerpiece being Defense-adjusted Value over Average (DVOA). The idea behind DVOA is that yards-per-game statistics are lossy because the true value of the yards gained on an individual play varies with the context: the down, yardage to go, field position, time left in the game, and the current score margin. In DVOA, each play is categorized as a success or failure and assigned a value based on that context. Given the baseline rates of success and the success value for each play's context, a team's overall performance is assigned a value over average, which is then adjusted for the opponent's average performance. DVOA can be broken down into performance in any situation and by any subset of players, allowing for a very fine-grained evaluation of why a team is likely to win. The coarse team total DVOA, however, has been shown to be a good predictor as well. In weeks 3-16 of the 2004 season, the team with the higher total DVOA won 67.3% of games. After week 17, the accuracy fell to 65.625%, with only 7 of 16 week-17 games predicted correctly. In the last week of the season, many playoff positions have already been decided, so some teams rest their starters, which could skew the results. In 2005, the team with the greater DVOA won 66.67% of games, but in 2006, accuracy plummeted to 55.80%.

Two Minute Warning: Using NFL injury reports to predict winners

Injury reports categorize players as probable (P(playing) = 75%), questionable (50%), doubtful (25%), or out (0%). Using 1 - P(playing), an injury score is assigned to each player, and a team's score is the sum of the injury scores of all its players. The team with the lower injury score won 52% of games in 2001-2004. Teams with injury scores at least 3 points lower than their opponents' won 55% of games in that same time frame. Looking at changes in injury score from week to week, the team with the lower injury score "delta" won 55% of games. My original research included injury score inputs. In 2001-2006 (weeks 3-17 only), the team with the lower injury score won 51.947% of games, and the team with the lower delta won 52.036% of games. The obvious problem with the injury score is that it isn't weighted by player value, but it's not entirely clear how best to do that.
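The scoring scheme described above is simple enough to sketch directly; the report format (a list of player/status pairs) is an assumption for illustration.

```python
# P(playing) per injury-report status, as given above:
P_PLAYING = {"probable": 0.75, "questionable": 0.50, "doubtful": 0.25, "out": 0.0}

def injury_score(report):
    """report: (player, status) pairs; score = sum of 1 - P(playing)."""
    return sum(1.0 - P_PLAYING[status] for _, status in report)

# Hypothetical report: a probable QB, a questionable WR, and a tackle who is out.
score = injury_score([("QB1", "probable"), ("WR2", "questionable"), ("LT1", "out")])
```

With those three players listed, the team's score is 0.25 + 0.50 + 1.00 = 1.75.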


Friday, June 8, 2007

The Initial Research

The website built for the original research. Contains graphs, tables, and more complete descriptions.

Rather than predict the exact final score of games, which depends on a good deal of random factors, I tried to create a system that could predict the margin of victory/defeat for the home team. The idea is that the better team will win, and the better they are, the more points they'll win by. Again, random factors do play a part in the final score margin, but this at least reduces the number of possible outcomes to approximately 93: from 1994-2006, the maximum margin of victory was 49 points, and the maximum margin of defeat was 43 points. Essentially, given what we know about the two teams, I'm trying to estimate the expected margin of victory/defeat, i.e., the average of the possible outcomes weighted by their probability.

Using box scores from this site, I gathered the statistics I would use as inputs: rush offense vs. defense (yards per game), pass offense vs. defense (YPG), punt returns vs. return coverage (yards per return), sack rate vs. sack rate allowed (sacks per pass play), time of possession (season average), and turnover ratio (season cumulative). Rather than look at each metric individually, I wanted each input to compare a team's offense with its opponent's defense, in keeping with the output of "how much better do I expect this team to be." Football Outsiders being a major inspiration for the research, I didn't want to use YPG statistics, but I was forced to. To capture some of what they did, I took the statistics (except TOP and turnovers) and turned them into values over league average (VOLA).

Value over league average = (Team average - League average) / League average

For defensive measurements, where giving up fewer yards than league average is desirable, I simply multiplied VOLA by -1.

In a similar fashion, I tried adjusting for opponent quality, weighting a team's performance in each game by the opponent's performance over league average. If the opponents gain 150 yards per game rushing and the league average is 100 yards, holding them to 150 yards rushing is an average performance rather than below average. Each yard allowed is worth only 100/150 = 2/3 of a yard. Thus...

Adjusted VOLA = (Team's adjusted average - League average) / League average

If the equations are just stating the obvious, my apologies. To compare home team rush offense with away team rush defense, I merely subtracted the away team's rush defense AVOLA from the home team's rush offense VOLA. Same with the converse and for pass offense, pass protection, and punt returns.
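Here's a minimal sketch of those computations, with made-up numbers; for brevity it uses plain VOLA for the defense where the model actually used the opponent-adjusted AVOLA.

```python
def vola(team_avg, league_avg, defensive=False):
    """Value over league average; sign flipped for defense, where allowing
    fewer yards than league average is the good direction."""
    v = (team_avg - league_avg) / league_avg
    return -v if defensive else v

# Home rush offense vs. away rush defense as one model input (made-up YPG):
home_rush_off = vola(130.0, 110.0)                 # 20 yards above average
away_rush_def = vola(95.0, 110.0, defensive=True)  # allows 15 below average
matchup_input = home_rush_off - away_rush_def      # home team's edge on the ground
```

A positive matchup input means the home offense's edge over league average exceeds the away defense's edge, mirroring the subtraction described above.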

In addition to the statistics, I also used home field climate as an input. Based on Football Outsiders' work, I divided home field climate into 4 types: warm, cold, dome, and Denver (high altitude). For each team, there are 4 binary (0 or 1/true or false) variables representing the 4 possible climate types.

So we have all of these statistics. What do we do with them? As I was studying artificial intelligence when I did this, I went for methods such as artificial neural networks and support vector machines. The methods were tested on the years 2000-2006, one year at a time, training on all years previous to the test year. I also compared my results against the spread, which similarly represents the expected margin of victory/defeat.
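That walk-forward evaluation can be sketched roughly as follows, with ordinary least squares standing in for the neural networks and SVMs; the {year: (features, margins)} data layout is an assumption.

```python
import numpy as np

def walk_forward_accuracy(seasons, test_year):
    """Train on every season before test_year, score win/loss accuracy on it.

    seasons: {year: (X, y)} with one feature row per game in X and the
    home team's final margin in y (positive = home win)."""
    train_years = [yr for yr in seasons if yr < test_year]
    X_tr = np.vstack([seasons[yr][0] for yr in train_years])
    y_tr = np.hstack([seasons[yr][1] for yr in train_years])

    def add_bias(X):  # append a constant column for the intercept
        return np.hstack([X, np.ones((len(X), 1))])

    w, *_ = np.linalg.lstsq(add_bias(X_tr), y_tr, rcond=None)  # OLS fit
    X_te, y_te = seasons[test_year]
    preds = add_bias(X_te) @ w
    return float(np.mean(np.sign(preds) == np.sign(y_te)))

# Toy data where margin = 2 * feature, so the fit should be near-perfect.
seasons = {
    1999: (np.array([[1.0], [2.0], [-1.0], [3.0]]), np.array([2.0, 4.0, -2.0, 6.0])),
    2000: (np.array([[1.0], [-2.0]]), np.array([2.0, -4.0])),
}
acc = walk_forward_accuracy(seasons, 2000)
```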

All methods, including the spread, had the following weaknesses:

  1. 10-point barrier: Predictions were consistently off by at least 10 points on average from year to year. Only a couple of methods on a couple of years could get the mean absolute error down to 9.7-9.8 points. Interestingly, games in 1994-2006 were won by 11.329 points on average.
  2. Too many games classified as wins for home team: 58.51% of games were won by the home team from 1994-2006. Methods would regularly classify 65-80% of games as home team wins.
  3. Small range of predictions: The predictions usually ranged from about -10 to about 15, while the actual range of outcomes is -43 to 49. 27.681% of games from 1994 to 2006 were won by more than 15 points. My guess is that this inflated the error, though I haven't checked it for sure (as soon as I typed this, I put it on the to-do list). If you graph the actual outcomes vs. the predicted outcomes, it ends up looking like a parallelogram.
  4. Sensitive to yearly variations in home-field advantage: The average outcome of an NFL game from 1994 to 2006 was the home team winning by 2.63 points. 58.51% of those games were won by the home team. In 2006, however, only 53.125% of games were won by the home team, and the average result was the home team winning by 0.84 points. As a result, the performance of all methods, including the spread, was aberrantly poor. In 2005, the average result was the home team winning by 3.677 points, and the home team won 59.14% of games. As a result, the performance of all methods was aberrantly good.

The methods I used ran into the following problems:

  1. Some predictions close to zero: About 10-15% of the predictions were that the game would be won by less than one point. Furthermore, about half (5-8%) of those predictions were that the game would be won by less than half a point. As long as they're still not zero, they can still be used for win/loss prediction, but it just doesn't look good to say "Team A is predicted to win by 0.15 points." I'd be curious to see what the win/loss accuracy of these predictions is.
  2. Input stats have very low correlation to margin of victory/defeat: The problem with YPG stats is that they don't necessarily represent quality. Teams with comfortable leads run the ball more to eat up the clock, so the rushing metrics have the highest correlation to the margin. Correlation does not equal causation, however. It's possible the team used the passing game to rack up points quickly, and the defense took care of the rest. Conversely, teams that are behind will abandon the running game in favor of the passing game, which covers more ground in less time. Thus, the passing metrics have a low correlation to the margin.

    Other highly correlated inputs were the turnover ratios and the sack rate metrics.

Specific numbers are given here, but overall, the spread was clearly the best predictor in terms of the following metrics:

  • Win/loss accuracy
  • Average error (how many points off was it?)
  • Correlation of predictions with actual result
  • Proportion of games classified as home team wins.

Using the spread as an extra input, I could get better results in one or two areas in most years, but the spread was clearly carrying the load. Support vector machines with the spread did the best overall, giving stable predictions (unlike neural networks), but they're not easily interpretable models. Linear regression without the spread was 57-63% accurate, which was on the lower end of performance, but it's a model that is easily interpretable.

Where to go from here
For interpretability reasons, I'm going to be mainly experimenting with linear regression. The tradeoff in accuracy isn't worth it at this point.

If the spread's the best predictor, then I think trying to model the spread could yield some knowledge that leads toward better predictions. It's a simple matter of replacing the final score margin with the spread as the output I'm trying to predict. I'll be following up on this soon.

The bias towards the home team in all of the methods needs to be taken down a notch. In the case of linear regression, 3.2-3.6 points were being added in favor of the home team automatically (via bias term), which is up to a point more than what should be added on average. Rather than using the bias linear regression comes back with, I could use the actual average result for that year, so in 2006, when the average result was 0.84 points in favor of the home team, an extra 2 points wouldn't be added. The bias could also be adjusted for the home field climate type, eliminating the need for the binary variables.
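The proposed fix amounts to keeping the fitted slopes and swapping the learned intercept for the observed average home margin of the target year. A toy sketch (every number except 2006's 0.84-point average and the 3.2-3.6 bias range is made up):

```python
import numpy as np

def predict_margin(features, slopes, bias):
    """Predicted home margin: dot(features, slopes) + bias."""
    return float(np.dot(features, slopes) + bias)

slopes = [0.5, -0.2]     # made-up fitted coefficients
fitted_bias = 3.4        # learned intercept, in the 3.2-3.6 range noted above
year_avg_margin = 0.84   # 2006's actual average home margin

default_pred = predict_margin([1.0, 2.0], slopes, fitted_bias)
adjusted_pred = predict_margin([1.0, 2.0], slopes, year_avg_margin)
```

The same swap could be done per climate type, using each cell of the tables from Part II in place of a single league-wide average.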

Most importantly, better inputs are needed. All of the computing power in the world isn't going to help otherwise. From the box scores, I can also include kickoff returns and third down conversion rates. I'm also going to follow up on that soon. Other than that, I think something like Football Outsiders' DVOA statistics are necessary. The DVOA stats take the context of every play into account, filter out random noise, are adjusted for opponent quality, and break down into very specific situations and for specific players and groups of players. DVOA for the pass defense actually measures quality of the pass defense, unlike the YPG stats.

As long as this article is, I glossed over a good deal in terms of specific results. To put it shortly, we could do better. In a way, I spent several months discovering what I pretty much knew already: YPG stats aren't very valuable, warm-weather teams have trouble in cold weather. But it was nevertheless interesting to quantify things like home-field advantage (more on this coming). The research is going to need time to evolve, and my resources are limited. Can't be afraid to fail.



You know, they say the most successful pickup line is simply "Hi," so...


Welcome to the Football Prediction Network blog. As the name (part lack of creativity, part homage to GODZILLA 2000) subtly implies, I'll be delving into methods of predicting the outcomes of football games. For now, the focus will be exclusively on the NFL, though if your interest is college football or the Arena league, feel free to join me. As per the GODZILLA 2000 homage, the Network is open to other people who want to contribute research of their own. The blog's really meant as an open-source repository of knowledge; an open discourse will hopefully lead to more promising things. E-mail me if you're interested in sharing or starting some research of your own, and I can set you up as a blog author. The first week or so is going to be spent recapping what I've done over the last several months, and then we'll see what happens. It's going to be trial and error, and the best way to get past the error is to just start working. So without further ado...
