I have been looking at the Kümmel Mandelbrot Benchmark V 0.53 MT and have found some possible improvements, pending verification.
Further improvements have been made. OK, this should be my final version for a while.
Well, a while has transpired, and now it's a new year.
Oops, yesterday's version had a bug in the X4 ThreadProc that prevented threads from rendering their first row correctly; rather than create a new version I just repaired it in situ.
I also include a table of results.
I have seen some strange results for the new AMD Phenom processor. Here is an attempt to make the X3 version run at more or less normal speeds on that processor.
Since my above attempt was successful, I am now trying to obtain maximum performance simultaneously on Core 2 Duo and Phenom with this.
Following an FASM thread, I have tried an SSSE3 version of memcpy(): memcpy.asm, with output in memcpy.txt.

Mandelbrot Benchmarks for 32- and 64-bit Code [explanation]
| Processor Model | Speed (MHz) | Cores/Socket | Sockets | HT | OS | FPU | SSE2 | quickman | 2T:X1 | 2T:X2 | 2T:X3 | 2T:X4 | MT:X1 | MT:X2 | MT:X3 | MT:X4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Core 2 Quad Q6600 | 2400 | 4 | 1 | NO | Server 2003 Standard x64 | 591.637 | 1281.249 | 643.1 | 710.424 | 1049.178 | 1463.424 | 1715.391 | 1374.585 | 1945.896 | 2798.593 | 3129.080 |
| Xeon E5310 | 2000 | 4 | 2 | NO | Vista U x64 | 1135.274 | 2437.131 | | 611.977 | 905.726 | 1260.500 | 1470.335 | 1194.030 | 1734.504 | 2399.564 | 2749.169 |
| AMD Phenom 9500 | 2500 | 4 | 1 | NO | Unknown | 635.082 | 1260.500 | | 623.935 | 1142.545 | 627.709 | 1541.304 | 1177.102 | 2072.169 | 1181.569 | 2725.107 |
| Core 2 Duo E6700 | 2660 | 2 | 1 | NO | XP x64 | 336.684 | 738.218 | 713.4 | 815.569 | 1207.928 | 1678.401 | 1958.135 | 809.210 | 1181.569 | 1636.496 | 1921.873 |
| AMD Phenom 9500 | 1666 | 4 | 1 | NO | Unknown | | | | 415.235 | 761.231 | 418.472 | 1029.234 | 789.210 | 1393.035 | 799.341 | 1834.341 |
| Athlon 64 FX-70 | 2600 | 2 | 2 | NO | Vista U x64 | 655.816 | 899.836 | | 521.077 | 622.687 | 672.448 | 897.243 | 979.838 | 1157.411 | 1224.556 | 1647.320 |
| Core 2 Mobile T7200 | 2000 | 2 | 1 | NO | Server 2003 Enterprise x64 | 250.482 | 547.899 | 532.4 | 609.581 | 900.487 | 1260.500 | 1470.335 | 603.964 | 883.224 | 1231.824 | 1429.821 |
| Core 2 Mobile T7200 | 2000 | 2 | 1 | NO | Server 2003 Standard x64 | 251.849 | 550.807 | 532.4 | 609.581 | 903.099 | 1255.417 | 1470.335 | 601.630 | 885.757 | 1231.824 | 1429.821 |
| Pentium D Presler 950 | 3400 | 2 | 1 | NO | XP x64 | 213.134 | 491.272 | 469.0 | 542.410 | 809.210 | 960.937 | 1186.070 | 530.624 | 785.229 | 932.864 | 1135.254 |
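
Because the machines above differ in clock speed and core count, raw figures are hard to compare directly. The following is a minimal sketch of one way to normalize a result, not something taken from the benchmark itself. The function names are hypothetical, and it assumes that an MT build keeps every core busy and that 2T denotes a two-thread build.

```c
/* Hypothetical helpers for comparing rows of the table; not part of the benchmark. */

/* Millions of iterations/second divided by MHz gives iterations per clock cycle,
   since both quantities are expressed in millions per second.
   Dividing by the total core count assumes all cores were busy (MT builds). */
double iterations_per_cycle_per_core(double miter_per_s, double mhz,
                                     int cores_per_socket, int sockets)
{
    return miter_per_s / (mhz * cores_per_socket * sockets);
}

/* Ratio of a multi-threaded result to the matching 2T result
   (assumption: 2T means a two-thread build of the same kernel). */
double mt_over_2t(double mt_miter_per_s, double two_t_miter_per_s)
{
    return mt_miter_per_s / two_t_miter_per_s;
}
```

For the Core 2 Quad Q6600 row, MT:X4 is 3129.080 at 2400 MHz on 4 cores, which works out to roughly 0.326 iterations per cycle per core, and MT:X4 divided by 2T:X4 (1715.391) is roughly 1.82.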

Table explanations:
Processor Model
Description of processor used
Speed (MHz)
Processor clock speed in megahertz
Cores/Socket
Number of cores per processor
Sockets
Number of processors used in machine
HT
ON = Hyperthreading enabled, OFF = Hyperthreading disabled, NO = Processor incapable of hyperthreading.
OS
Operating system used for benchmark
FPU
Millions of iterations/second for KMB_V0.53_MT_FPU.exe from Kümmel 32-bit benchmark
SSE2
Millions of iterations/second for KMB_V0.53_MT_SSE2.exe from Kümmel 32-bit benchmark
quickman
Millions of iterations/second for quickman 32-bit benchmark. To be comparable with the other benchmarks, a few steps were required:
2T:X1
Millions of iterations/second for KMB_V0.57_2T_X1.exe from Xorpd! 64-bit benchmark
2T:X2
Millions of iterations/second for KMB_V0.57_2T_X2.exe from Xorpd! 64-bit benchmark
2T:X3
Millions of iterations/second for KMB_V0.57_2T_X3.exe from Xorpd! 64-bit benchmark
2T:X4
Millions of iterations/second for KMB_V0.57_2T_X4.exe from Xorpd! 64-bit benchmark
MT:X1
Millions of iterations/second for KMB_V0.57_MT_X1.exe from Xorpd! 64-bit benchmark
MT:X2
Millions of iterations/second for KMB_V0.57_MT_X2.exe from Xorpd! 64-bit benchmark
MT:X3
Millions of iterations/second for KMB_V0.57_MT_X3.exe from Xorpd! 64-bit benchmark
MT:X4
Millions of iterations/second for KMB_V0.57_MT_X4.exe from Xorpd! 64-bit benchmark
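
To make the unit concrete, here is a minimal sketch in C of how a figure in millions of iterations/second could be obtained: count the escape-time iterations over a grid of points, then divide by the elapsed time. This is an illustration only. The actual benchmarks are hand-written assembly, and their exact counting convention, region, and iteration limit are not stated here, so the function names, the escape radius of 2, and the grid mapping below are all assumptions.

```c
/* Hypothetical illustration; not the benchmark's own code. */

/* Escape-time iteration count for the point c = (cr, ci).
   Iterates z = z*z + c from z = 0 until |z|^2 > 4 or max_iter is reached. */
unsigned mandel_iterations(double cr, double ci, unsigned max_iter)
{
    double zr = 0.0, zi = 0.0;
    unsigned n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;  /* real part of z*z + c */
        zi = 2.0 * zr * zi + ci;            /* imaginary part of z*z + c */
        zr = t;
        ++n;
    }
    return n;
}

/* Sum of iteration counts over a width x height grid whose top-left sample
   is (x0, y0) and whose samples are spaced evenly towards (x1, y1). */
unsigned long long render_iterations(double x0, double x1, double y0, double y1,
                                     unsigned width, unsigned height,
                                     unsigned max_iter)
{
    unsigned long long total = 0;
    for (unsigned j = 0; j < height; ++j) {
        double ci = y0 + (y1 - y0) * j / height;
        for (unsigned i = 0; i < width; ++i) {
            double cr = x0 + (x1 - x0) * i / width;
            total += mandel_iterations(cr, ci, max_iter);
        }
    }
    return total;
}

/* Convert an iteration total and elapsed wall time to millions of iterations/second. */
double mega_iterations_per_second(unsigned long long iterations, double seconds)
{
    return (double)iterations / seconds / 1e6;
}
```

A caller would time render_iterations() with whatever clock the platform provides and pass the total and the elapsed seconds to mega_iterations_per_second().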