NFL Stats in Perl


If you play with simple averages long enough, you'll run into situations where a single value skews the result markedly. For example, the sequence 1, 2, 3, 4, 100 averages 22, but its middle value, the median, is just 3. In 2007, Philadelphia didn't score more than 16 points in a game for almost half the season, with one exception: a 56-point game against Detroit. In the language of statistics, that game is an outlier, and we want statistical methods that are resistant to such points.
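To see the effect concretely, here is a minimal Perl sketch that computes both the mean and the median of the sequence above. The helper names mean and median are illustrative and not taken from the author's program. sum comes from the core List::Util module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Arithmetic mean: every value, outliers included, pulls on the result.
# (Illustrative helper; assumes a non-empty list.)
sub mean {
    my @values = @_;
    return sum(@values) / @values;
}

# Median: the middle value of the sorted list, or the average of the
# two middle values when the list has an even number of entries.
sub median {
    my @sorted = sort { $a <=> $b } @_;
    my $mid = int(@sorted / 2);
    return @sorted % 2 ? $sorted[$mid]
                       : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

my @values = (1, 2, 3, 4, 100);
printf "mean   = %.2f\n", mean(@values);    # 22.00
printf "median = %.2f\n", median(@values);  # 3.00
```

The single value of 100 moves the mean from 2.5 (the mean of 1 through 4) to 22, while the median only moves from 2.5 to 3.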

Robust Statistics


Statistics that handle outliers well are called robust statistics. Although we could write our own routines to compute them, there is a Perl module that gets us most of the way there. It is named Statistics::Descriptive, and it is available from CPAN and in the Ubuntu repositories. With it you can calculate both the median and the trimmed mean (the average left after discarding a fixed fraction of the highest and lowest values) of a sequence of values. Both are useful in our search for reliable estimators of our game stats.
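In Statistics::Descriptive these values come from a Statistics::Descriptive::Full object: you load it with add_data, then call median and trimmed_mean. For readers without the module installed, the following pure-Perl sketch shows what a trimmed mean does. The trimmed_mean helper is hypothetical. Rounding the number of dropped values down, int(n * fraction), is an assumption made for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: sort the values, drop int(n * $fraction) of them
# from each end, and average what is left.
sub trimmed_mean {
    my ($fraction, @values) = @_;
    die "need at least one value\n" unless @values;
    die "fraction must be in [0, 0.5)\n" if $fraction < 0 || $fraction >= 0.5;
    my @sorted = sort { $a <=> $b } @values;
    my $trim   = int(@sorted * $fraction);
    my @kept   = @sorted[$trim .. $#sorted - $trim];
    return sum(@kept) / @kept;
}

my @values = (1, 2, 3, 4, 100);
# With 5 values and a 20% trim, one value is dropped from each end,
# leaving (2, 3, 4).
printf "trimmed mean (20%%) = %.2f\n", trimmed_mean(0.2, @values);  # 3.00
```

A trim fraction of 0 reduces this to the ordinary mean. As the fraction approaches one half, the result approaches the median, so the fraction controls how much resistance to outliers you buy.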

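The program that we will use to test our newfound ability to generate robust statistics is given here. For each team it reports the average point spread, the median point spread, the sample standard deviation of the spread, and the trimmed mean of the spread. The output labels the standard deviation StdErr, but the standard error of the mean would be smaller by a factor of the square root of the number of games played.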
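The per-team statistics can be sketched in plain Perl as follows. This is not the author's spread_stats.pl. The helper names, the 25% trim fraction, and the eight input spreads are assumptions chosen for illustration. With these inputs the results match the NE row of the output below, and the script prints the standard deviation and the standard error side by side.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Illustrative helpers; each assumes a non-empty list of spreads.
sub mean { return sum(@_) / @_ }

sub median {
    my @s = sort { $a <=> $b } @_;
    my $mid = int(@s / 2);
    return @s % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# Sample standard deviation (n - 1 in the denominator).
sub std_dev {
    my $m  = mean(@_);
    my $ss = sum(map { ($_ - $m) ** 2 } @_);
    return sqrt($ss / (@_ - 1));
}

# Standard error of the mean: the standard deviation divided by sqrt(n).
sub std_err { return std_dev(@_) / sqrt(scalar @_) }

# Drop int(n * $fraction) values from each end (assumed rounding rule).
sub trimmed_mean {
    my ($fraction, @values) = @_;
    my @s    = sort { $a <=> $b } @values;
    my $trim = int(@s * $fraction);
    return mean(@s[$trim .. $#s - $trim]);
}

# Hypothetical input: eight per-game spreads (team score minus opponent).
my @spreads = (24, 24, 31, 21, 17, 21, 21, 45);

printf "Spread  %6.2f\n", mean(@spreads);               # 25.50
printf "Median  %6.2f\n", median(@spreads);             # 22.50
printf "StdDev  %6.2f\n", std_dev(@spreads);            # 8.85
printf "StdErr  %6.2f\n", std_err(@spreads);            # 3.13
printf "TrMean  %6.2f\n", trimmed_mean(0.25, @spreads); # 22.50
```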

If we run the program using data good through November 1, 2007, we get the following results:

desktop:~/perl/nfl$ ./spread_stats.pl
Global Statistics:
Games  Home Wins Winning_Score Losing_Score Margin
116        64        27.07         15.01     12.06


Rank  Team    Spread   Median    StdErr TrMean
---------------------------------------------------
1     NE      25.50    22.5   +/-  8.85  22.50
2     IND     17.43    19.0   +/- 10.16  17.80
3     DAL      9.86    10.0   +/- 16.39  12.40
4     GB       5.86     6.0   +/-  8.61   5.20
5     NYG      5.12     9.0   +/- 14.57   8.50
6     PIT     13.29    21.0   +/- 13.44  14.60
7     TEN      4.00     3.0   +/-  6.68   2.80
8     JAC      3.00     6.0   +/- 13.22   4.60
9     DET     -3.14     7.0   +/- 20.74  -0.40
10    SD       6.14    11.0   +/- 22.15   5.80
11    SEA      4.14     3.0   +/- 17.25   4.60
12    BAL      0.71     2.0   +/- 10.72   0.00
13    CLE     -1.29     6.0   +/- 15.23   0.80
14    CAR     -1.57     3.0   +/- 15.14  -0.40
15    WAS     -1.57     2.0   +/- 22.74   0.60
16    KC      -1.57     2.0   +/- 11.03  -1.60
17    TB       1.62     1.0   +/- 14.61   2.00
18    PHI      3.14    -3.0   +/- 15.84   0.00
19    ARI     -1.43    -2.0   +/-  7.07  -0.40
20    NO      -4.29    -3.0   +/- 18.34  -4.00
21    BUF     -5.43    -1.0   +/- 15.38  -3.40
22    DEN     -9.14    -6.0   +/- 14.83  -5.80
23    HOU     -3.75    -4.0   +/- 14.73  -3.75
24    CHI     -4.62    -6.0   +/- 11.17  -4.75
25    MIN     -0.86    -3.0   +/- 10.49  -3.40
26    OAK     -2.57    -3.0   +/- 11.04  -4.20
27    CIN     -4.86    -6.0   +/-  9.91  -4.00
28    SF     -11.14   -18.0   +/- 11.19 -12.00
29    ATL     -8.29    -7.0   +/- 10.58  -9.40
30    NYJ     -8.25    -7.0   +/-  7.72  -7.75
31    MIA     -9.75    -6.5   +/-  7.83  -8.25
32    STL    -15.00   -16.5   +/- 10.49 -15.25
desktop:~/perl/nfl$
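The Global Statistics line summarizes every game in the data set. Its Margin column is the average winning score minus the average losing score (27.07 - 15.01 = 12.06). Here is a minimal sketch of that summary, assuming each game is stored as a [home score, away score] pair. The four games are made up for illustration, and ties are skipped because they have no winner or loser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum max min);

# Hypothetical game records: [home score, away score].
my @games = ([27, 20], [14, 17], [31, 10], [21, 13]);

my @decided   = grep { $_->[0] != $_->[1] } @games;   # skip ties
my $home_wins = grep { $_->[0] >  $_->[1] } @decided; # count in scalar context
my $win_avg   = sum(map { max(@$_) } @decided) / @decided;
my $lose_avg  = sum(map { min(@$_) } @decided) / @decided;

printf "%-5s %9s %13s %12s %6s\n",
    'Games', 'Home Wins', 'Winning_Score', 'Losing_Score', 'Margin';
printf "%-5d %9d %13.2f %12.2f %6.2f\n",
    scalar @decided, $home_wins, $win_avg, $lose_avg, $win_avg - $lose_avg;
```

For these four games the sketch reports 3 home wins, an average winning score of 24.00, an average losing score of 14.25, and a margin of 9.75.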

A few things stand out. New England has just won a huge game over the Washington Redskins, so its average spread (25.50) is larger than its median or trimmed mean (both 22.50). The New York Giants, who lost a couple of big games early but haven't lost since, have a median spread about 4 points higher than their average (9.0 versus 5.12). Pittsburgh, which often wins big but not always, has a much larger median spread than average spread (21.0 versus 13.29). San Diego, another good team that started slowly, likewise shows a much higher median (11.0) than average (6.14). Cleveland is another team whose median spread (6.0) is well above its average (-1.29).

On the flip side, San Francisco's median spread (-18.0) is well below its average (-11.14). It looks as if the 49ers had some initial success but have otherwise been getting worse.
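One way to make these comparisons systematic is to sort teams by the gap between their median and average spreads. The sketch below hard-codes those two columns from the table for the teams discussed above. The %spread hash and the gap helper are hypothetical and not part of spread_stats.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# [average spread, median spread], copied from the table above.
my %spread = (
    NE  => [ 25.50,  22.5 ],
    NYG => [  5.12,   9.0 ],
    PIT => [ 13.29,  21.0 ],
    SD  => [  6.14,  11.0 ],
    CLE => [ -1.29,   6.0 ],
    SF  => [-11.14, -18.0 ],
);

# Gap = median - mean. A large positive gap means a few lopsided losses
# drag the average down; a large negative gap means a few unusually
# good results prop it up.
sub gap { my ($team) = @_; return $spread{$team}[1] - $spread{$team}[0] }

for my $team (sort { gap($b) <=> gap($a) } keys %spread) {
    printf "%-4s %+6.2f\n", $team, gap($team);
}
```

This prints PIT, CLE, SD, and NYG with positive gaps, followed by NE and SF with negative ones.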