SPREAD OF LOTTERY NUMBERS

This is not an attempt to find a statistical method to win the lottery; instead, it measures the "spread" of numbers within a lottery pick. It all began when the Mega Millions jackpot went over $200 million and several friends and I decided to throw in $5.00 each to buy some tickets. We are all numerically astute and realize the odds against winning, but it was payday and a full moon, so we took a chance.

To play, you first pick 5 numbers from the range of 1 to 52. Then you pick one Mega Ball number, also from the range of 1 to 52. If you hit the five regular numbers plus the Mega Ball number, you win the jackpot. Lesser combinations return lesser payments. In fact, our $40.00 worth of tickets returned total winnings of $2.00. So much for the early retirement plans!

I noticed that the 5 regular numbers that won (23, 25, 43, 46, and 49) seemed rather bunched up, with one cluster in the 20s and another in the 40s. To measure the spread, I took the standard deviation of these numbers (the population form, dividing by 5) and came up with 10.962. Is that a large standard deviation for the possible outcomes, or is it a reasonable spread?
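For readers who want to check the figure without UBasic, here is a minimal Python sketch of the same calculation. It is not the original program; it uses the standard library's `statistics.pstdev`, which divides by n, and that is the form that reproduces the 10.962 quoted above.

```python
import statistics

# The five regular winning numbers quoted in the article
winning = [23, 25, 43, 46, 49]

# Population standard deviation (divides by n, not n - 1)
spread = statistics.pstdev(winning)
print(f"{spread:.3f}")  # 10.962
```

The sample form, `statistics.stdev`, divides by n - 1 and would give a somewhat larger value for the same five numbers.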

I wrote a UBasic program which ran through all 2,598,960 possible combinations of 5 numbers drawn without replacement from 1 through 52. The program took the standard deviation of each set of 5 as it was drawn. If that standard deviation was less than the minimum found so far, or greater than the maximum found so far, the program recorded it as the new minimum or maximum, along with the five numbers which produced it. After running through all the possibilities, the program gave the following answers:

Maximum Standard Deviation: 24.260 for combination 1,2,3,51,52

Minimum Standard Deviation: 1.414 for combination 1,2,3,4,5

These are not the only combinations which will give those standard deviations, just the first ones to be recorded by the program. Compared to the endpoints of the possible range of values, the 10.962 standard deviation of the supposedly compressed sample I was testing doesn't look too bad. However, I still didn't know the mean or median standard deviation over the range of possibilities.
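The min/max search described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a port of the UBasic code: the function name `spread_extremes` and its parameters are made up for the example, and combinations are visited in the lexicographic order that `itertools.combinations` produces. To keep ties exact, the comparison is done on the integer 5·Σx² − (Σx)², which is 25 times the population variance, rather than on rounded floating-point values.

```python
from itertools import combinations
from math import sqrt

def spread_extremes(top=52, picks=5):
    """Return ((min_sd, combo), (max_sd, combo)), keeping the first combo found for each."""
    lo = hi = None  # each holds (integer key, combination)
    for combo in combinations(range(1, top + 1), picks):
        total = sum(combo)
        squares = sum(x * x for x in combo)
        # picks**2 times the population variance, kept as an exact integer
        key = picks * squares - total * total
        # Strict comparisons, so only the first combination at each extreme is kept
        if lo is None or key < lo[0]:
            lo = (key, combo)
        if hi is None or key > hi[0]:
            hi = (key, combo)
    return (sqrt(lo[0]) / picks, lo[1]), (sqrt(hi[0]) / picks, hi[1])
```

Calling `spread_extremes()` with the defaults walks all 2,598,960 combinations, which takes a few seconds in plain Python. Smaller values of `top` are handy for a quick check. By hand, 1,2,50,51,52 gives the same key as 1,2,3,51,52 (14714 in both cases), which is one example of the ties mentioned above.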

The program also randomly chose about one-tenth of the standard deviations generated and stored them in a text file. This text file was imported into Access to calculate the mean and standard deviation of the approximately 260,000 standard deviations chosen. I later modified the program to compute the values for all 2.6 million possibilities. The following table shows the results for the sample and then for the entire population:

          Test Cases    Mean    Std. Dev.
Sample       260,526  13.094        3.491
All        2,598,960  13.100        3.483

It seems that the combination which looked horribly bunched up to me was actually a fairly average distribution of the five terms. Its standard deviation, 10.962, lies only about 0.6 of a standard deviation below the mean of 13.100 shown above ((13.100 - 10.962) / 3.483 is about 0.61). Not very unusual at all.
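Both the one-in-ten sample and the full run can be approximated in Python with the sketch below. The function `spread_stats` and its `sample_rate` and `seed` parameters are hypothetical names chosen for this example; the original work used UBasic and Access. The sketch uses the population form for the standard deviation of the standard deviations. With hundreds of thousands of cases, dividing by n or by n - 1 should make no visible difference at three decimals.

```python
from itertools import combinations
from math import sqrt
import random

def spread_stats(top=52, picks=5, sample_rate=1.0, seed=None):
    """Return (count, mean, sd) of the per-combination standard deviations.

    sample_rate < 1.0 keeps each combination with that probability,
    loosely mimicking the one-in-ten sample described in the article.
    """
    rng = random.Random(seed)
    n = 0
    sum_sd = 0.0
    sum_var = 0.0
    for combo in combinations(range(1, top + 1), picks):
        if sample_rate < 1.0 and rng.random() >= sample_rate:
            continue
        total = sum(combo)
        squares = sum(x * x for x in combo)
        var = (picks * squares - total * total) / (picks * picks)
        sum_sd += sqrt(var)
        sum_var += var  # var is the square of this combination's SD
        n += 1
    mean = sum_sd / n
    # E[sd^2] - mean^2, clamped at zero to guard against rounding
    sd_of_sds = sqrt(max(0.0, sum_var / n - mean * mean))
    return n, mean, sd_of_sds

# Distance of the winning pick from the mean, using the figures in the table
z = (10.962 - 13.100) / 3.483
print(f"{z:.2f}")  # -0.61
```

`spread_stats()` covers every combination, while `spread_stats(sample_rate=0.1, seed=1)` draws a sample of roughly one-tenth. The sample size will vary with the seed, so it will not match the 260,526 cases in the table exactly.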

The extracted sample data was also imported into Minitab to check the distribution. A histogram of the results with the normal curve superimposed is shown below.



The distribution seems to be fairly normal but skewed a few points towards the high end.
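If you don't have a statistics package handy, a rough text histogram gives a similar view of the shape. The sketch below is only an illustration: the function names and the bin width of 1.0 are assumptions, and it bins every combination rather than the extracted sample.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def spread_histogram(top=52, picks=5, bin_width=1.0):
    """Count how many combinations fall into each standard-deviation bin."""
    counts = Counter()
    for combo in combinations(range(1, top + 1), picks):
        total = sum(combo)
        squares = sum(x * x for x in combo)
        sd = sqrt(picks * squares - total * total) / picks
        counts[int(sd // bin_width)] += 1  # bin index 0 covers [0, bin_width)
    return counts

def print_histogram(counts, bin_width=1.0, width=60):
    """Print one row per bin: lower edge, count, and a bar scaled to the tallest bin."""
    peak = max(counts.values())
    for b in sorted(counts):
        bar = "#" * round(width * counts[b] / peak)
        print(f"{b * bin_width:5.1f} {counts[b]:8d} {bar}")
```

Running `print_histogram(spread_histogram())` prints one row per bin, and any lopsidedness shows up as a longer tail on one side of the peak.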

The program code to generate the min/max and extract the sample file is shown at this link. The program which runs all 2.6 million combinations is at this link. If you don't have UBasic, you can get a free download at http://archives.math.utk.edu/software/msdos/number.theory/ubasic/.html

If you'd like to save a lot of typing, I'd be glad to send you the UBasic program file. Email me at PAVEL314@comcast.net

