Bandwidth: a memory bandwidth benchmark




Description

My program, called bandwidth, is an artificial benchmark for measuring memory bandwidth, useful for identifying a computer's weak areas.

Version 0.16

The latest version performs sequential memory access tests over a range of chunk sizes. It requires an AMD64 or Intel64 processor running 64-bit Linux. It runs its tests using both 64-bit transfers and 128-bit transfers, the latter via SSE2 instructions.
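To illustrate what a sequential read test over a range of chunk sizes can look like, here is a minimal C sketch using 64-bit transfers. It is not the actual bandwidth source: the names read_chunk_64 and sweep_chunk_sizes are hypothetical, the doubling schedule for chunk sizes is an assumption, and timing is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Read `bytes` bytes sequentially, one 64-bit word at a time.
   The pointer is volatile so the compiler cannot discard the loads;
   the sum is returned only so the result can be checked. */
uint64_t read_chunk_64(const uint64_t *chunk, size_t bytes)
{
    assert(bytes % sizeof(uint64_t) == 0);
    const volatile uint64_t *p = chunk;
    uint64_t sum = 0;
    for (size_t i = 0; i < bytes / sizeof(uint64_t); i++)
        sum += p[i];
    return sum;
}

/* Sweep chunk sizes from min_bytes to max_bytes, doubling each time
   (an assumed schedule), and read each chunk `passes` times.
   Returns the number of chunk sizes tested, or -1 if allocation fails. */
int sweep_chunk_sizes(size_t min_bytes, size_t max_bytes, int passes)
{
    assert(min_bytes >= sizeof(uint64_t));
    assert(min_bytes % sizeof(uint64_t) == 0);
    int tested = 0;
    for (size_t bytes = min_bytes; bytes <= max_bytes; bytes *= 2) {
        uint64_t *chunk = calloc(bytes / sizeof(uint64_t), sizeof(uint64_t));
        if (chunk == NULL)
            return -1;
        for (int pass = 0; pass < passes; pass++)
            (void)read_chunk_64(chunk, bytes);
        free(chunk);
        tested++;
    }
    return tested;
}
```

Small chunks stay inside the L1 cache, larger ones spill into L2 and then main memory, so a sweep like sweep_chunk_sizes(1024, 64 * 1024 * 1024, 100) touches each level in turn.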

Version 0.15

Version 0.15 uses preset chunk sizes to effectively test several types of memory:
  • Level 1 cache sequential read accesses
  • Level 1 cache sequential write accesses
  • Level 2 cache sequential read accesses
  • Level 2 cache sequential write accesses
  • Main memory sequential read accesses
  • Main memory sequential write accesses
  • Framebuffer sequential read accesses (Linux only)
  • Framebuffer sequential write accesses (Linux only)
  • String library routines

Change log

Version 0.16 enhancements

This version offers x86_64 assembly optimizations, particularly the use of SSE2 for 128-bit transfers.

Version 0.16 is 64-bit only as I haven't had time to finish improvements to the 32-bit assembly code.
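For readers unfamiliar with 128-bit SSE2 transfers, the following C sketch shows the idea with compiler intrinsics rather than hand-written assembly. It is an illustration only, not the routine used in bandwidth: the name write_chunk_128 is hypothetical, and the 64-bit fallback for compilers without SSE2 is my own addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Fill `bytes` bytes at dst with `value`, storing 128 bits at a time
   when SSE2 is available and 64 bits at a time otherwise.
   dst must be 16-byte aligned and bytes a multiple of 16. */
void write_chunk_128(void *dst, size_t bytes, uint8_t value)
{
    assert(bytes % 16 == 0);
    assert((uintptr_t)dst % 16 == 0);
#if defined(__SSE2__)
    __m128i v = _mm_set1_epi8((char)value);   /* 16 copies of value */
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < bytes / 16; i++)
        _mm_store_si128(&p[i], v);            /* one aligned 128-bit store */
#else
    uint64_t v = 0x0101010101010101ULL * value; /* 8 copies of value */
    for (size_t i = 0; i < bytes / 8; i++)
        memcpy((unsigned char *)dst + i * 8, &v, sizeof v);
#endif
}
```

Each iteration of the SSE2 loop moves 16 bytes with a single instruction, which is where the gain over 32-bit and 64-bit transfers comes from.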

Version 0.15 enhancements

I made two key enhancements to version 0.15:
  1. I rewrote core test routines in x86 assembly.
  2. I switched the code to use microsecond timing instead of whole seconds.
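As a sketch of what microsecond timing involves, the C fragment below reads the clock with gettimeofday() and converts a byte count and elapsed time into millions of bytes per second. Whether bandwidth itself uses gettimeofday() is an assumption on my part, and the names now_usec and mbytes_per_sec are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Current wall-clock time in microseconds, from gettimeofday(). */
uint64_t now_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}

/* Bandwidth in millions of bytes per second.
   Bytes per microsecond is numerically the same quantity. */
double mbytes_per_sec(uint64_t bytes, uint64_t elapsed_usec)
{
    assert(elapsed_usec > 0);
    return (double)bytes / (double)elapsed_usec;
}
```

A test would call now_usec() before and after its transfer loop and pass the difference to mbytes_per_sec(). With whole-second timing, a short test would round to zero or one second and the result would be meaningless.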

Results

The following table juxtaposes the results from the 32-bit variant of bandwidth and the 64/128-bit version.

The first interesting thing to notice is the difference in performance on my Lenovo Celeron 550 between 32-bit transfers and 64/128-bit transfers. It shows that if programmers went to the trouble of properly revising software to use 64-bit and 128-bit transfers, they could achieve speed-ups along the lines of what people originally hoped 64-bit processors would provide.

The second insight is that, although the AMD Quad Opteron has a very fast front-side bus, it is still up against the limits of DRAM, which can only accept write data so quickly.

Finally, you might observe that the lowly 1.3 GHz SU4100 processor wrote to main memory at 32 bits per transfer faster than the 2 GHz Celeron 550 did at 32 bits. Presumably this is not because it has a faster front-side bus (800 versus 533 MHz) but because the newer generation of DDR3 memory accepts writes faster.

All bandwidth values are in millions of bytes per second, rounded to 4 significant figures.

Transfer size | Make/model | CPU | CPU speed | Front-side bus speed | L1 read (MB/sec) | L1 write (MB/sec) | L2 read (MB/sec) | L2 write (MB/sec) | Main read (MB/sec) | Main write (MB/sec) | Main memory RAM type/speed | FB read (MB/sec) | FB write (MB/sec)
64/128 bits |  | AMD Quad Opteron 2352 | 2.1 GHz | 1000 MHz | 49700 (128 bit) |  |  |  |  | 2800. (128 bit) | PC2-5300? |  |
64/128 bits | Lenovo 3000 N200 | Intel Celeron 550 | 2.0 GHz | 533 MHz | 32000 (128 bit) | 14000 (64 bit) | 13000 (128 bit) | 9400 (64 bit) | 3500 (128 bit) | 2700 (128 bit). | PC2-5300 |  |
32 bits | Toshiba L505 | Intel T4300 | 2.1 GHz | 800 MHz | 7983 | 7573 | 6775 | 4873 | 2571 | 1874 | DDR2-800 |  |
32 bits | Lenovo 3000 N200 | Intel Celeron 550 | 2.0 GHz | 533 MHz | 7489 | 7125 | 6533 | 5007 | 2088 | 1290. | PC2-5300 | 23.05 | 100.2
32 bits | Toshiba A205 | Intel Pentium Dual T2390 | 1.86 GHz | 533 MHz | 7098 | 6734 | 7095 | 5675 | 2146 | 1255 | PC2-5300 | 23.36 | 84.50
32 bits | Toshiba A135 | Intel Core 2 Duo T5200 | 1.6 GHz | 533 MHz | 6086 | 5774 | 5241 | 3745 | 2000. | 1030. | PC2-4200 |  |
32 bits | Acer 5810TZ-4761 | Intel SU4100 | 1.3 GHz | 800 MHz | 4937 | 4682 | 4160. | 3013 | 1803 | 1682 | DDR3-1066 |  |
32 bits | Dell XPS T700r | Intel Pentium III | 700 MHz | 100 MHz | 2629 | 2284 | 2607 | 1630. | 448.5 | 163.7 | PC100 | 6.680 | 47.40
32 bits | IBM Thinkpad 560E | Intel Pentium MMX | 150 MHz | Up to 66 MHz | 500.7 | 75.49 | 520.6 | 74.81 | 86.64 | 74.32 | EDO 60ns; 50 MHz |  |
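The figures above are rounded to 4 significant figures. For anyone reproducing that presentation, here is one way to do the rounding in C. This is a sketch only: the function name round_sig4 is hypothetical and this is not necessarily how bandwidth formats its output.

```c
#include <assert.h>

/* Round a non-negative bandwidth value to 4 significant figures.
   The value is scaled into the range [1000, 10000), rounded to the
   nearest integer, then scaled back. */
double round_sig4(double mb_per_sec)
{
    assert(mb_per_sec >= 0.0 && mb_per_sec < 1e15);
    if (mb_per_sec == 0.0)
        return 0.0;
    double v = mb_per_sec;
    double unit = 1.0;  /* size of one step in the last kept digit */
    while (v >= 10000.0) { v /= 10.0; unit *= 10.0; }
    while (v < 1000.0)   { v *= 10.0; unit /= 10.0; }
    double kept = (double)(long long)(v + 0.5);  /* round half up */
    return kept * unit;
}
```

For example, 49712 becomes 49710 and 448.53 becomes 448.5, up to ordinary floating-point representation error in the fractional cases.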

Download

Commentary

One has certain expectations about the performance of different memory subsystems in a computer. My program confirms these.
  1. Reading is usually faster than writing.
  2. L1 cache writing is usually almost as fast as L1 cache reading.
  3. L2 cache reading is roughly as fast as L1 reading.
  4. L2 cache writing is usually slower than L1 writing.
  5. If the L2 cache is in write-through mode, then L2 writing will be very slow, more on par with main memory write speeds.
  6. L2 cache reading is usually faster than main memory reading.
  7. L2 cache writing is usually faster than main memory writing.
  8. Framebuffer writing, however, is usually faster than framebuffer reading.
  9. Framebuffer accesses are usually slower than main memory.

The main thing that reduces a computer's bandwidth is a write-through cache, be it L2 or L1. This is especially apparent in the Pentium 150 results above.
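The write-through effect can be spotted mechanically by comparing a cache's write bandwidth to main memory's write bandwidth. The C sketch below does that. The function name looks_write_through and the idea of a tolerance factor are my own assumptions for illustration, not part of bandwidth.

```c
#include <assert.h>
#include <stdbool.h>

/* Heuristic: a cache behaves like write-through if its write bandwidth
   is no more than `factor` times the main memory write bandwidth.
   Both bandwidths are in the same units (e.g. MB/sec). */
bool looks_write_through(double cache_write, double main_write, double factor)
{
    assert(main_write > 0.0 && factor >= 1.0);
    return cache_write <= main_write * factor;
}
```

With an assumed factor of 1.5, the Pentium MMX figures from the table (L1 write 75.49, L2 write 74.81, main write 74.32) are flagged for both cache levels, while the 32-bit Celeron 550 figures (L1 write 7125, main write 1290) are not.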

Author

Zack Smith, email.

Links




