Description
My program, called bandwidth, is an artificial benchmark for measuring memory bandwidth.
It is useful for identifying a computer's weak areas.
Version 0.16
The latest version performs sequential memory access tests using a range of chunk sizes.
It requires an AMD64 or Intel64 processor running 64-bit Linux.
It runs each test twice: once with 64-bit memory transfers and once with 128-bit transfers using SSE2 instructions.
Version 0.15
Version 0.15 uses preset chunk sizes to effectively test
several types of memory:
- Level 1 cache sequential read accesses
- Level 1 cache sequential write accesses
- Level 2 cache sequential read accesses
- Level 2 cache sequential write accesses
- Main memory sequential read accesses
- Main memory sequential write accesses
- Framebuffer sequential read accesses (Linux only)
- Framebuffer sequential write accesses (Linux only)
- String library routines
Change log
Version 0.16 enhancements
This version offers x86_64 assembly optimizations, particularly
the use of SSE2 for 128-bit transfers.
Version 0.16 is 64-bit only because I haven't had time
to finish improving the 32-bit assembly code.
Version 0.15 enhancements
I made two key enhancements to version 0.15:
- I rewrote core test routines in x86 assembly.
- I switched the code to microsecond timing instead of whole-second timing.
Results
The following table compares the results from the
32-bit variant of bandwidth with those from the 64/128-bit version.
The first interesting thing to notice
is the difference in performance on my Lenovo Celeron 550
between 32-bit transfers and 64/128-bit transfers.
It shows that if programmers took the trouble
to properly revise software to use
64-bit and 128-bit transfers, they could achieve speed-ups
along the lines of what people originally hoped 64-bit
processors would provide.
The second insight is that, although the AMD Quad Opteron
has a very fast front-side bus, it is still up against
the limits of DRAM, which can only accept write data
so quickly.
Finally, you might observe that the lowly 1.3 GHz
SU4100 processor wrote to main memory
with 32-bit transfers faster than the 2 GHz Celeron 550
did with 32-bit transfers (1682 versus 1290 MB/sec).
This is presumably not because the SU4100 has a faster
front-side bus (800 versus 533 MHz), but because
the newer generation of DDR3 memory writes quickly.
All bandwidth values are in millions of bytes per second,
rounded to 4 significant figures.
| Transfer size | Make/model | CPU | CPU speed | Front-side bus speed | L1 read MB/sec | L1 write MB/sec | L2 read MB/sec | L2 write MB/sec | Main read MB/sec | Main write MB/sec | Main memory RAM type/speed | Framebuffer read MB/sec | Framebuffer write MB/sec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 64/128 bits | | AMD Quad Opteron 2352 | 2.1 GHz | 1000 MHz | 49700 (128 bit) | | | | | 2800. (128 bit) | PC2-5300? | | |
| 64/128 bits | Lenovo 3000 N200 | Intel Celeron 550 | 2.0 GHz | 533 MHz | 32000 (128 bit) | 14000 (64 bit) | 13000 (128 bit) | 9400 (64 bit) | 3500 (128 bit) | 2700. (128 bit) | PC2-5300 | | |
| 32 bits | Toshiba L505 | Intel T4300 | 2.1 GHz | 800 MHz | 7983 | 7573 | 6775 | 4873 | 2571 | 1874 | DDR2-800 | | |
| 32 bits | Lenovo 3000 N200 | Intel Celeron 550 | 2.0 GHz | 533 MHz | 7489 | 7125 | 6533 | 5007 | 2088 | 1290. | PC2-5300 | 23.05 | 100.2 |
| 32 bits | Toshiba A205 | Intel Pentium Dual T2390 | 1.86 GHz | 533 MHz | 7098 | 6734 | 7095 | 5675 | 2146 | 1255 | PC2-5300 | 23.36 | 84.50 |
| 32 bits | Toshiba A135 | Intel Core 2 Duo T5200 | 1.6 GHz | 533 MHz | 6086 | 5774 | 5241 | 3745 | 2000. | 1030. | PC2-4200 | | |
| 32 bits | Acer 5810TZ-4761 | Intel SU4100 | 1.3 GHz | 800 MHz | 4937 | 4682 | 4160. | 3013 | 1803 | 1682 | DDR3-1066 | | |
| 32 bits | Dell XPS T700r | Intel Pentium III | 700 MHz | 100 MHz | 2629 | 2284 | 2607 | 1630. | 448.5 | 163.7 | PC100 | 6.680 | 47.40 |
| 32 bits | IBM ThinkPad 560E | Intel Pentium MMX | 150 MHz | Up to 66 MHz | 500.7 | 75.49 | 520.6 | 74.81 | 86.64 | 74.32 | EDO 60ns; 50 MHz | | |
Commentary
One has certain expectations about the performance
of different memory subsystems in a computer.
My program confirms these.
- Reading is usually faster than writing.
- L1 cache writing is usually almost as fast as L1 cache reading.
- L2 cache reading is roughly as fast as L1 reading.
- L2 cache writing is usually slower than L1 writing.
- If the L2 cache is in write-through mode, L2 writing will be very slow, closer to main memory write speeds.
- L2 cache reading is usually faster than main memory reading.
- L2 cache writing is usually faster than main memory writing.
- However, framebuffer writing is usually faster than framebuffer reading.
- Framebuffer accesses are usually slower than main memory.
The main thing that reduces a computer's
bandwidth is a write-through cache, be it L2 or L1.
This is especially apparent in the 150 MHz Pentium MMX
results above.
Author
Zack Smith, email.
Links
- Software
- FBUI: My in-kernel windowing system for Linux.
- FWE: my userspace windowing system.
- Bandwidth: My C-based benchmark to measure memory bandwidth.
- Foodist: My C++-based nutrition management program for Windows Mobile.
- Wanderlust: My C++/MFC based outline editor for Windows.
- Maxilla: My OpenGL-based VRML viewer for Windows.
- Anglify: my other-to-English translation program.
- My other stuff.
- Documentation