PROBABILITY OF A BETTER SCORE
Someone posted a problem to one of the math newsgroups asking how to calculate a probability. This person had written a program to try to generate a certain number. This number could be pi or e or whatever; we'll call it the "target". After 25 runs of the program, one of the numbers generated was found to be considerably closer to the target than the rest. The question is: what are the odds of getting a closer answer by continuing to run the program?
Assuming the sample is approximately normally distributed, I came up with the following solution:
- Take the mean and standard deviation of the 25 numbers generated. These are shown on the graph below as x-bar and z.
- The target is indicated by the red line.
- The best guess of the 25 is indicated by the solid blue line; it is distance "d" from the target.
- Construct the broken blue line at distance "d" from the target, on the opposite side from your best guess.
- The probability of getting a better guess than the best one already generated is the area under the curve between the two blue lines, that is, from target-d to target+d. Knowing the mean and standard deviation of the sample, this can be calculated from a Standard Normal Distribution table.
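The steps above can be sketched in Python. This is only an illustration under the same normality assumption; the function name prob_closer and the list of outputs are made up for the example and are not from the original problem.

```python
from statistics import NormalDist, mean, stdev

def prob_closer(samples, target):
    """Estimate the chance that one more run lands closer to target
    than the best sample so far, assuming normally distributed outputs."""
    mu = mean(samples)
    sigma = stdev(samples)  # sample standard deviation (n - 1 denominator)
    # d is the distance from the target to the best guess so far
    d = min(abs(x - target) for x in samples)
    dist = NormalDist(mu, sigma)
    # Area under the fitted curve between target - d and target + d
    return dist.cdf(target + d) - dist.cdf(target - d)

# Hypothetical program outputs, invented for illustration only
runs = [14.2, 9.8, 17.5, 12.1, 4.0, 16.3, 19.0, 13.4]
p = prob_closer(runs, 3.14)
print(f"Estimated probability of a closer result: {p:.4f}")
```

The normal cumulative distribution function takes the place of the table lookup here, so no rounding of the z-scores is needed.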

You can feel fairly confident using the mean of the sample as the mean of the universe of outputs of this program, as it was described above. For a sample of 25 (24 degrees of freedom), the t-score of 1.71 is the one-tailed 95% value, which corresponds to a two-sided 90% confidence interval; the two-sided 95% value would be 2.06. Using 1.71, the 90% confidence interval for the true mean is:
Sample Mean +/- (1.71 * Sample Standard Deviation / (Number of Samples ^ .5))
Sample Mean +/- (1.71 * Sample Standard Deviation / (25 ^ .5))
Sample Mean +/- (1.71 * Sample Standard Deviation / 5)
Sample Mean +/- (.34 * Sample Standard Deviation)
For the original problem, assume that the 25 samples are normally distributed with a sample mean of 14.6 and a sample standard deviation of 4.24. That gives a 90% confidence interval for the universal mean of 14.6 +/- (.34 * 4.24), or 13.16 to 16.04.
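The interval arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name mean_confidence_interval is hypothetical, and the t-value of 1.71 is simply taken from the text rather than looked up by the code.

```python
import math

def mean_confidence_interval(sample_mean, sample_sd, n, t_value):
    """Return (low, high) for sample_mean +/- t_value * sample_sd / sqrt(n)."""
    half_width = t_value * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Numbers from the example: mean 14.6, standard deviation 4.24, 25 samples
low, high = mean_confidence_interval(14.6, 4.24, 25, 1.71)
print(f"{low:.2f} to {high:.2f}")
```

Because the code does not round 1.71 / 5 down to .34 first, its limits differ from the hand calculation by about 0.01 at each end.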
Now let's assume that our target is pi, roughly 3.14, and that our best guess was 4.00. The difference, d, is 4.00 - 3.14 or .86, so the "dotted line" value on the other side of pi would be 3.14 - .86 or 2.28. The best guess of 4.00 is 2.5 standard deviations below the mean: (14.6 - 4.00)/4.24 = 2.5. The lower limit of 2.28 is 2.9 standard deviations below the mean: (14.6 - 2.28)/4.24 = 2.9.
On the Standard Normal Distribution table, the value for 2.5 is .4938 and the value for 2.9 is .4981. Since we want the area between these two limits, we subtract: .4981 - .4938 = .0043. This tells us that there is only a 0.43% probability that any single run of our program will generate a number closer to pi than the best guess of 4.00 that we've already generated.
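As a cross-check on the table lookup, the same calculation can be sketched in Python with the normal distribution from the standard library. The variable names are chosen only for this illustration. The z-scores are not rounded to one decimal place here, so the result can differ slightly from the table value of .0043.

```python
from statistics import NormalDist

sample_mean, sample_sd = 14.6, 4.24   # sample statistics from the example
target, best = 3.14, 4.00             # pi (rounded) and the best guess so far

d = abs(best - target)                # distance from best guess to target
lower, upper = target - d, target + d

std_normal = NormalDist()             # mean 0, standard deviation 1
z_upper = (upper - sample_mean) / sample_sd   # about -2.5
z_lower = (lower - sample_mean) / sample_sd   # about -2.9
p = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)
print(f"z from {z_lower:.2f} to {z_upper:.2f}, probability = {p:.4f}")
```

The z-scores are negative because both limits lie below the mean; the table approach uses their absolute values and relies on the symmetry of the curve.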