Simulating the Presidential Polls


One of the problems I have with media descriptions of polling is that they take a single poll and make no attempt to compare it to similar polls. So if a poll gives Obama a 2 point lead and the standard error of the poll is 2 points, they assume that Obama and McCain are tied. And rather than do any kind of simple trend analysis, they run with the naive point of view and try to make a story out of it.

The problem is that if Obama indeed had a 2 point lead and the error in that analysis was about 2 points, then McCain would be winning a lot more polls than he is at present. By my calculations, if that were the case, Obama would win about 76% of the polls and McCain about 24%. Instead, Obama is winning the overwhelming majority of polls. By my count, of the last 30 polls Obama has won 26, lost 2, and tied 2 (a win ratio of 90.0%, counting each tie as half a win). Over the last 70 polls, by my count, Obama has won 62, lost 5, and tied 3, for a win ratio of 90.71%. This winning ratio is not well explained by a 2 point lead with a 2 point standard deviation.
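The arithmetic above can be checked with a short Python sketch. It is not taken from sim_polls.pl. The function names are made up for illustration, and the model is an assumption: each candidate's share gets its own independent normal error with a 2 point standard deviation, so the lead has a standard deviation of 2 times the square root of 2. That reading reproduces the 76% figure; if the 2 points were instead the standard deviation of the lead itself, the same formula would give about 84%.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(lead, sd_per_candidate):
    """Chance that the leader comes out ahead in a single poll.

    ASSUMPTION (not confirmed by the post): each candidate's share has an
    independent normal error, so the lead has sd = sd_per_candidate * sqrt(2).
    """
    return normal_cdf(lead / (sd_per_candidate * math.sqrt(2.0)))


def win_ratio(wins, losses, ties):
    """Observed win ratio, counting each tie as half a win."""
    return (wins + 0.5 * ties) / (wins + losses + ties)


print(f"expected win share, 2 point lead: {win_probability(2.0, 2.0):.1%}")
print(f"observed, last 30 polls: {win_ratio(26, 2, 2):.2%}")
print(f"observed, last 70 polls: {win_ratio(62, 5, 3):.2%}")
```

Under the same assumption, win_probability(3.7, 2.0) comes out close to 90%, which is the size of lead needed to match the observed ratio.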

I have written a polling simulator named sim_polls.pl, whose source code is available here. If you run it with no parameters, it assumes a 2 percent advantage to Obama and a 2 percent standard deviation. You can pass different parameters to the program to simulate different advantages and different standard deviations of the advantage and then calculate how many polls a candidate should win given those variables. Any randomness is assumed to be normally distributed. Any systematic errors are ignored. These are all naive assumptions, but necessary at this point.

Sample results include:

./sim_polls.pl --verbose 3.7 2
This program does a Monte Carlo simulation of the polling of
the current election to determine if the media gabbing about
the polls is in any way representative of the data presented.
Currently - 8-24-2008 - using polls listed on pollster.com,
Barack Obama is leading by a score of 62-3-5 over the last 70
polls. This is a ratio of 90.71%.

Despite this, people continue to insist that the race is dead
even. This program simulates the distribution of 10000 occurrences
of 70 polls by assuming that Obama has a 3.7 percent lead
with a 2 percent standard deviation. These popularity
profiles are assumed to be normally distributed.


Greater total: 633265
Lesser total : 66735
Greater pct  : 90.4664285714286
Lesser pct   : 9.53357142857143
Greater max  : 70
Lesser max   : 19
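
For readers without Perl, the kind of simulation described above can be sketched in Python. This is not a port of sim_polls.pl: the function name, its parameters, and the way the noise is applied (an independent normal error on each candidate's share) are assumptions chosen so that the totals land near the figures printed above. The output labels simply mimic the ones in the sample run.

```python
import random


def simulate_polls(lead=2.0, sd=2.0, polls=70, runs=10000, seed=None):
    """Monte Carlo estimate of how often the leader wins a poll.

    ASSUMPTION: each candidate's share is drawn independently from a normal
    distribution with standard deviation `sd`; the leader's mean is `lead`
    points above the trailer's. Systematic errors are ignored. Exact ties
    essentially never occur with continuous draws, so they are not counted.
    """
    rng = random.Random(seed)
    greater_total = lesser_total = 0
    greater_max = lesser_max = 0
    for _ in range(runs):
        greater = 0
        for _ in range(polls):
            leader = rng.gauss(lead, sd)
            trailer = rng.gauss(0.0, sd)
            if leader > trailer:
                greater += 1
        lesser = polls - greater
        greater_total += greater
        lesser_total += lesser
        greater_max = max(greater_max, greater)
        lesser_max = max(lesser_max, lesser)
    total = runs * polls
    return {
        "greater_total": greater_total,
        "lesser_total": lesser_total,
        "greater_pct": 100.0 * greater_total / total,
        "lesser_pct": 100.0 * lesser_total / total,
        "greater_max": greater_max,
        "lesser_max": lesser_max,
    }


result = simulate_polls(lead=3.7, sd=2.0, seed=2008)
print(f"Greater total: {result['greater_total']}")
print(f"Lesser total : {result['lesser_total']}")
print(f"Greater pct  : {result['greater_pct']:.2f}")
print(f"Lesser pct   : {result['lesser_pct']:.2f}")
print(f"Greater max  : {result['greater_max']}")
print(f"Lesser max   : {result['lesser_max']}")
```

The exact counts depend on the random seed, so they will not match the run above digit for digit. The percentages should be close if the assumed noise model is right.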