Musical notes are vibrations in the air that have an audible pitch. For humans, these pitches, or frequencies, are in the range of roughly 20 to 20,000 cycles per second, or hertz (Hz). Two notes are an octave apart if their frequencies differ by a factor of two.
But we typically want more notes within an octave. One constraint sometimes employed is that the notes be "well-spaced". This means that any pair of adjacent notes sounds as far apart as any other pair of adjacent notes. Because the ear judges distance in pitch by the ratio of two frequencies rather than by their difference, a well-spaced scale is one in which every pair of adjacent notes has the same frequency ratio.
Now we jump into the mathematics...
Suppose we have 7 different notes within an octave. After 7 steps we require that the frequencies differ by a factor of 2. So, after one step, the notes differ by a factor of 2^(1/7), or about 1.104.
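The arithmetic above can be sketched in a few lines of Python. The function names step_ratio and scale_frequencies are made up for this illustration and are not part of the original page.

```python
def step_ratio(n):
    """Frequency ratio between adjacent notes when an octave is split into n equal steps."""
    return 2.0 ** (1.0 / n)


def scale_frequencies(base_hz, n):
    """Frequencies of the n + 1 notes from base_hz up to its octave, inclusive."""
    return [base_hz * 2.0 ** (k / n) for k in range(n + 1)]


# Seven equal steps per octave: each step multiplies the frequency by about 1.104.
print(round(step_ratio(7), 3))

# The last note of the scale lands exactly one octave (a factor of 2) above the first.
print(scale_frequencies(100.0, 7)[-1])
```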
Is this a good musical scale? It depends, of course, on the criteria we use to judge it.
One way of judging the worthiness of a well-spaced scale is seeing how close it comes to other simple ratios. A given note and its octave have frequencies in a 2:1 ratio. So - perhaps it would be nice if other notes had frequency ratios of 3:2 or 4:3 as well. Two notes with simple frequency ratios tend to sound pleasing when played together.
One reason that 12 steps per octave is prevalent is that it comes very close to hitting the 3:2 and 4:3 ratios. How close? Well, 2^(7/12) is about 1.498, and 1.5 would hit the 3:2 ratio exactly. So, 7 steps that are each 2^(1/12) apart come very close to a perfect 3:2 ratio. In fact, if the notes are short in duration, they can sound perfect. How short? Well, that depends on the frequency of the notes. Suppose the two notes are the D above Middle C, and the A above Middle C. These notes have frequencies of about 293.6 and 440 cycles per second, respectively.
Now, 293.6*1.5 = 440.4. A note at 440.4 Hertz (Hz) would be in a perfect 3:2 ratio with 293.6 Hz. But 440 is almost perfect - after one second, it is only 0.4 of a cycle behind. After 2 seconds it is 0.8 of a cycle behind, and after 2.5 seconds it is one full cycle behind. It is at this point that a trained human ear can hear that the two notes are not quite in a perfect 3:2 ratio. So - as long as the two notes are sounded for less than 2.5 seconds, they will sound perfect!
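The drift calculation can be checked with a short sketch. It uses the same rounded frequencies as the text, and the helper name seconds_until_one_cycle_apart is invented for this example.

```python
def seconds_until_one_cycle_apart(actual_hz, ideal_hz):
    """Time for a tone at actual_hz to fall one full cycle behind (or ahead of) ideal_hz."""
    drift_per_second = abs(ideal_hz - actual_hz)  # cycles of drift gained each second
    return 1.0 / drift_per_second


d_hz = 293.6               # D above Middle C, as rounded in the text
ideal_a_hz = d_hz * 3 / 2  # the A that would form a perfect 3:2 ratio with that D
actual_a_hz = 440.0        # the A actually used in the 12-step scale

drift_seconds = seconds_until_one_cycle_apart(actual_a_hz, ideal_a_hz)
print(f"ideal A: {ideal_a_hz:.1f} Hz, one cycle apart after {drift_seconds:.2f} s")
```

Lower notes drift more slowly, since the same ratio error corresponds to fewer cycles per second, which is why the answer depends on the frequency of the notes.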
How about a 4:3 ratio? Well, it turns out that 2^(5/12) is about 1.335, and a perfect 4:3 ratio is about 1.333. Again, 12 notes per octave gives us a great match.
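To make "how close" concrete for any target ratio and any number of steps, here is a minimal sketch. It reports the error in cents, a common unit in which an octave is 1200 cents; the original page does not use that unit, and the function name nearest_step is an assumption made for this example.

```python
import math


def nearest_step(ratio, n):
    """Find the step of an n-step equal scale that lies closest to a target ratio.

    Returns (k, approx_ratio, error_cents), where k is the number of steps,
    approx_ratio is 2^(k/n), and error_cents is how far approx_ratio sits
    above (positive) or below (negative) the target, in cents.
    """
    exact_steps = n * math.log2(ratio)  # where the target would fall, in fractional steps
    k = round(exact_steps)              # the closest whole step
    approx_ratio = 2.0 ** (k / n)
    error_cents = 1200.0 * (k / n - math.log2(ratio))
    return k, approx_ratio, error_cents


for target in (3 / 2, 4 / 3):
    k, approx, err = nearest_step(target, 12)
    print(f"target {target:.4f}: {k} steps gives {approx:.4f} ({err:+.2f} cents)")
```

With 12 steps, the 3:2 ratio is matched by 7 steps and the 4:3 ratio by 5 steps, each within about 2 cents.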
If we judge the number of well-spaced steps per octave by their ability to hit the 3:2 and 4:3 ratios, we would get something like the following:

The number on the X-axis is the number of equally-spaced notes per octave (N, in the remainder of this page). The number on the Y-axis is a metric I made up that measures how close the scale is to hitting the 3:2 and 4:3 ratios.
Notice that N=12 gives us the first high peak. Higher Ns tend to give rise to better metrics - no surprise, they have more chances to hit the magic numbers. So, perhaps we should penalize the metric for high N - after all, a piano with 27 keys per octave would be huge! This leads to a graph like the following:

Notice N=12 starts to look very good here!
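The page does not spell out the metric behind these graphs, so the following is only a sketch of one plausible way to build such a score, not the formula actually used. Everything here is an assumption: the error is measured in cents (1200 per octave), closeness is taken as 1 / (1 + mean error), and the penalty simply divides by N.

```python
import math


def mean_error_cents(n, ratios):
    """Average distance, in cents, from each target ratio to the nearest step of an n-step scale."""
    total = 0.0
    for ratio in ratios:
        exact_steps = n * math.log2(ratio)  # target position in fractional steps
        nearest = round(exact_steps)        # closest available step
        total += abs(nearest - exact_steps) * 1200.0 / n
    return total / len(ratios)


def closeness(n, ratios):
    """Hypothetical score: higher means the scale lands nearer the target ratios."""
    return 1.0 / (1.0 + mean_error_cents(n, ratios))


def penalized_closeness(n, ratios):
    """Hypothetical penalty for large N: divide the score by the number of steps."""
    return closeness(n, ratios) / n


TARGETS = (3 / 2, 4 / 3)
candidates = range(2, 61)
best_raw = max(candidates, key=lambda n: closeness(n, TARGETS))
best_penalized = max(candidates, key=lambda n: penalized_closeness(n, TARGETS))
print(best_raw, best_penalized)
```

Under these particular assumptions, the unpenalized score over N = 2 to 60 favors a large N (53), while the penalized score favors N=12. A different choice of error measure or penalty would change the details, so treat the numbers as illustrative only.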
But wait - there are other simple integer ratios, like 5:4 and 5:3, that are worth considering. If we repeat the above graph using the 5:4 and 5:3 ratios, we get a far different picture:

Hmmm - this is very different! If we considered the 5:4 and 5:3 ratios to be the most important, we might have 34 notes per octave! Note that our usual choice for N, 12, is one of the poorest values on this graph, and 19 is strong.
What if we start with the 4 simplest consecutive-integer ratios? After 2:1 (which is built into our model, though it need not be) we have 3:2, 4:3, 5:4, and 6:5. A metric using all four of these ratios gives the following graph:

Once again we see why N=12 is such a good choice. But what about, say, N=19? It scores pretty high as well, and it would be fascinating to hear.
Another interesting series of ratios is {6:5, 7:5, 8:5, 9:5}. A graph based on this metric looks like this:

Wow - very different! Now N=13, not N=12, is the best low choice. N=19 appears to score nicely here also, and N=31 is the best.
One more for the road. Let's base the metric on 4:3 and 5:3:

Again, N=12 looks strong, but N=19 isn't too shabby either! I think that for a wide range of reasonable modelling assumptions, N=19 is the second-best choice after N=12. I'd love to hear some good 19-tone music. If you know of any, let me know.
If you would like to play around with the Excel file that produced all these graphs, here it is.