Description
My program, called bandwidth, is an artificial benchmark for measuring memory bandwidth,
useful for identifying a computer's weak areas.
Version 0.16
The latest version performs sequential memory access tests using a range of chunk sizes.
It requires an AMD64 or Intel64 processor running 64-bit Linux.
Tests are performed with both 64-bit memory transfers and 128-bit SSE2 transfers.
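As a rough illustration of what a sequential read test with 64-bit transfers looks like, here is a minimal C sketch. It is not the code from bandwidth itself, whose core routines are written in assembly; the function name `read_chunk_64` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: read every 64-bit word of a chunk in order,
 * `passes` times over. The volatile qualifier forces the compiler to
 * issue every load; the returned sum is only a cheap way to check
 * that the loop really ran. */
uint64_t read_chunk_64(const uint64_t *chunk, size_t n_words, unsigned passes)
{
    const volatile uint64_t *p = chunk;
    uint64_t sum = 0;

    for (unsigned pass = 0; pass < passes; pass++)
        for (size_t i = 0; i < n_words; i++)
            sum += p[i];

    return sum;
}
```

Calling such a routine with chunk sizes ranging from a few kilobytes up to many megabytes moves the working set from L1 cache to L2 cache and finally to main memory, which is why a range of chunk sizes exposes each level's bandwidth.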
Version 0.15
Version 0.15 uses preset chunk sizes to effectively test
several types of memory:
- Level 1 cache sequential read accesses
- Level 1 cache sequential write accesses
- Level 2 cache sequential read accesses
- Level 2 cache sequential write accesses
- Main memory sequential read accesses
- Main memory sequential write accesses
- Framebuffer sequential read accesses (Linux only)
- Framebuffer sequential write accesses (Linux only)
- String library routines
Change log
Version 0.16 enhancements
This version offers x86_64 assembly optimizations, particularly
the use of SSE2 for 128-bit transfers.
Version 0.16 is 64-bit only as I haven't had time
to finish improvements to the 32-bit assembly code.
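To show what a 128-bit SSE2 transfer means in practice, here is a minimal sketch using the compiler's SSE2 intrinsics from `<emmintrin.h>` rather than hand-written assembly. The function name `read_chunk_sse2` is hypothetical, and the sketch uses unaligned loads for simplicity; it assumes an x86-64 compiler, where SSE2 is always available.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: read a chunk 128 bits (two 64-bit words) at a
 * time. Each loaded vector is added into an accumulator so the loads
 * cannot be optimized away. A trailing odd word, if any, is ignored. */
uint64_t read_chunk_sse2(const uint64_t *chunk, size_t n_words)
{
    __m128i acc = _mm_setzero_si128();

    for (size_t i = 0; i + 2 <= n_words; i += 2) {
        __m128i v = _mm_loadu_si128((const __m128i *)(chunk + i));
        acc = _mm_add_epi64(acc, v);
    }

    /* Combine the two 64-bit lanes into one checksum. */
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1];
}
```

Each loop iteration moves 16 bytes instead of 8, so the same chunk is covered in half as many load instructions as a 64-bit loop.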
Version 0.15 enhancements
Version 0.15 brought two key enhancements:
- I rewrote core test routines in x86 assembly.
- I switched the code to using microsecond timing instead of whole seconds.
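Microsecond timing can be sketched as follows. This assumes the POSIX `gettimeofday()` call as the clock source, which is one common choice on Linux and not necessarily the one bandwidth uses; the helper name `elapsed_usec` is hypothetical.

```c
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper: microseconds elapsed between two samples taken
 * with gettimeofday(). The tv_usec difference may be negative; the
 * tv_sec term compensates for that. */
int64_t elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000
         + (int64_t)(end->tv_usec - start->tv_usec);
}
```

In use, one would call `gettimeofday(&start, NULL)` before a test loop and `gettimeofday(&end, NULL)` after it. With whole-second timing, a test that finishes in a fraction of a second cannot be measured at all, so short tests on small chunks benefit most from the finer resolution.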
Results from 0.16
On my 2 GHz Celeron 550 with PC2-5300 memory, running 64-bit Linux, I get the following key values:
- Using 128-bit SSE2 transfers I get around 32 gigabytes per second reading from L1 cache.
- Using 64-bit transfers I can write to L1 cache at 14 gigabytes per second.
- Using 128-bit SSE2 transfers I get around 13 gigabytes per second reading from L2 cache.
- Using 64-bit transfers I can write to L2 cache at 9.4 gigabytes per second.
- Using 128-bit SSE2 transfers I can read from main memory at 3.5 gigabytes per second.
- Using 128-bit SSE2 transfers I can write to main memory at 2.7 gigabytes per second.
- Reading 64 bits per access: L1 is about 5 times faster than DRAM and L2 is about 3 times faster.
On an AMD Quad Opteron 2352 running at 2.1 GHz, the 128-bit sequential read
maxes out at 49.7 GB/second, and the maximum 128-bit sequential write speed
to main memory is 2.8 GB/second.
Results from 0.15
All bandwidth values are in millions of bytes per second,
rounded to 4 significant figures.
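The arithmetic behind these figures can be sketched in C. Bytes divided by elapsed microseconds gives millions of bytes per second directly. The function names `mb_per_sec` and `round_sig4` are hypothetical, and the rounding routine is one possible approach, not necessarily how the table values were produced.

```c
#include <stdint.h>

/* Bytes per microsecond is the same number as millions of bytes per
 * second, so no further scaling is needed. */
double mb_per_sec(uint64_t bytes, uint64_t usec)
{
    return (double)bytes / (double)usec;
}

/* Round a finite, positive value to 4 significant figures.
 * Returns 0 for anything that is not greater than zero. */
double round_sig4(double x)
{
    if (!(x > 0.0))
        return 0.0;

    double up = 1.0, down = 1.0;   /* powers of ten */
    while (x * up < 1000.0)
        up *= 10.0;                /* scale small values up */
    while (x / down >= 10000.0)
        down *= 10.0;              /* scale large values down */

    /* The scaled value now lies in [1000, 10000): round to an integer. */
    double digits = (double)(long long)(x * up / down + 0.5);
    return digits * down / up;
}
```

For example, a test that moves 2,088,000,000 bytes in 1,000,000 microseconds works out to 2088 MB/sec.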
| Make/model | CPU | CPU speed | L1 read MB/sec | L1 write MB/sec | L2 read MB/sec | L2 write MB/sec | Main read MB/sec | Main write MB/sec | Main memory RAM type/speed | FB read MB/sec | FB write MB/sec |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Lenovo 3000 N200 | Intel Celeron 550 | 2.0 GHz | 7489 | 7125 | 6533 | 5007 | 2088 | 1290. | PC2-5300 | 23.05 | 100.2 |
| Toshiba A205 | Intel Pentium Dual T2390 | 1.86 GHz | 7098 | 6734 | 7095 | 5675 | 2146 | 1255 | PC2-5300 | 23.36 | 84.50 |
| Dell XPS T700r | Intel Pentium III | 700 MHz | 2629 | 2284 | 2607 | 1630. | 448.5 | 163.7 | PC100 | 6.680 | 47.40 |
| IBM Thinkpad 560E | Intel Pentium MMX | 150 MHz | 500.7 | 75.49 | 520.6 | 74.81 | 86.64 | 74.32 | EDO 60ns; 50 MHz | N/A | N/A |
Download
Commentary
One has certain expectations about the performance
of different memory subsystems in a computer.
My program confirms these.
- Reading is usually faster than writing.
- L1 cache writing is usually almost as fast as L1 cache reading.
- L2 cache reading is roughly as fast as L1 reading.
- L2 cache writing is usually slower than L1 writing.
- If the L2 cache is in write-through mode, then L2 writing will be very slow, on par with main memory write speeds.
- L2 cache reading is usually faster than main memory reading.
- L2 cache writing is usually faster than main memory writing.
- The framebuffer is the exception to the first rule: writing to it is usually faster than reading from it.
- Framebuffer accesses are usually slower than main memory.
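The main thing that reduces a computer's
bandwidth is a write-through cache, be it L2 or L1.
This is especially apparent in the 150 MHz Pentium MMX
results above (the IBM Thinkpad 560E), where L1 and L2 write speeds
are no better than the main memory write speed.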
Author
Zack Smith, email.