Stephane Eranian: HP
Robert Cohn and CK Luk: Intel
This tutorial is intended for researchers and developers who want to learn
more about Itanium/Linux facilities for program analysis. We discuss two
different approaches: hardware performance monitoring and software
instrumentation. The tutorial gives an under-the-covers look at the underlying
technology, at tools for program analysis, and at applications that exploit the
information these tools collect. It is useful for researchers who are designing
hardware performance monitoring features and want to understand how tools use
them, for people who want to write program analysis tools that use
instrumentation or hardware monitoring, and for end users who want to use tools
to understand program performance and behavior.
We begin with an overview of the performance monitoring unit (PMU) of the Itanium 2 processor. The PMU can monitor more than 100 hardware events and can report the precise data address and instruction pointer associated with cache misses and branch events. Support for accessing the PMU is integrated into the Linux kernel and is exposed through the perfmon API. Perfmon simplifies the use of the PMU by providing in-kernel buffering of samples and virtualization of counters. We present the design of perfmon and explain how to use it. We continue with two applications that use perfmon, pfmon and qprof. Pfmon is a command-line tool that configures the PMU and collects samples, and qprof generates profiles. We conclude the discussion of hardware performance monitoring by examining how the ispike optimization tool uses data collected by pfmon to make programs run faster.
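As a rough illustration of event-based sampling, the following sketch records
the instruction pointer of every N-th event into a buffer and then aggregates
the buffer into a profile. It is a toy model in plain Python, not the perfmon
API: the function names, the sampling period, and the trace of instruction
pointers are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def sample_events(event_ips, period):
    """Record the instruction pointer of every `period`-th event.

    Models a counter that is armed with `period` and produces one
    sample each time it runs down to zero (hypothetical helper).
    """
    samples = []
    countdown = period
    for ip in event_ips:
        countdown -= 1
        if countdown == 0:        # counter "overflow": take a sample
            samples.append(ip)    # buffered, not delivered one by one
            countdown = period    # re-arm the counter
    return samples

def build_profile(samples):
    """Aggregate buffered samples into a per-address histogram."""
    return Counter(samples)

# Hypothetical trace: the instruction pointer of each cache-miss event.
trace = [0x4000] * 6 + [0x4010] * 3 + [0x4020] * 1

samples = sample_events(trace, period=2)
profile = build_profile(samples)

for ip, hits in profile.most_common():
    print(f"{ip:#x}: {hits} samples")
```

A larger period lowers the sampling overhead but gives a coarser profile, which
is the basic tradeoff any sampling-based tool has to manage.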
Next, we present the Pin dynamic instrumentation tool. Pin provides a simple
but flexible API for transparently inserting new code into an application at
run time. The new code observes the behavior of the program and can be used to
write profilers, memory leak detectors, instruction trace generators, and
similar tools. We explain how to write instrumentation tools with Pin by
working through some examples. We conclude the discussion of software
instrumentation by describing the design of Pin, which uses a just-in-time
compiler to insert the new code efficiently. Finally, we discuss the tradeoffs
between software instrumentation and hardware performance monitoring. We
identify the types of program analysis each is best suited for and their
relative costs.
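The general idea of inserting observation code at run time can be sketched with
a toy interpreter. This is not the Pin API and does not reflect how Pin is
implemented: the three-instruction machine, run_instrumented, and
count_instructions are hypothetical names invented for this example. The sketch
only shows the split between an instrumentation routine, which runs once per
static instruction and decides what to insert, and the inserted analysis code,
which runs every time that instruction executes.

```python
def run_instrumented(program, instrument):
    """Run a toy program, instrumenting each instruction on first use.

    `instrument(pc, op, insert_call)` is called once per static
    instruction; callables passed to `insert_call` run before every
    dynamic execution of that instruction.
    """
    code_cache = {}               # pc -> list of inserted analysis calls
    reg, pc = 0, 0
    while True:
        op = program[pc]
        if pc not in code_cache:  # first time this instruction is seen
            hooks = []
            instrument(pc, op, hooks.append)
            code_cache[pc] = hooks
        for hook in code_cache[pc]:
            hook()                # inserted analysis code
        if op[0] == "set":        # reg = constant
            reg = op[1]
            pc += 1
        elif op[0] == "dec":      # reg = reg - 1
            reg -= 1
            pc += 1
        elif op[0] == "jnz":      # jump to target if reg != 0
            pc = op[1] if reg != 0 else pc + 1
        elif op[0] == "halt":
            return
        else:
            raise ValueError(f"unknown opcode {op[0]!r}")

counts = {"static": 0, "dynamic": 0}

def docount():
    """Analysis routine: runs once per executed instruction."""
    counts["dynamic"] += 1

def count_instructions(pc, op, insert_call):
    """Instrumentation routine: runs once per static instruction."""
    counts["static"] += 1
    insert_call(docount)

# Hypothetical program: a loop whose body executes three times.
program = [("set", 3), ("dec",), ("jnz", 1), ("halt",)]
run_instrumented(program, count_instructions)
print(counts)
```

Unlike sampling, this approach observes every executed instruction, so the
counts are exact, at the cost of running extra code on each one.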
Presenter information:
Robert Cohn is a Principal Engineer at Intel, where he works on
just-in-time compilation, dynamic instrumentation, and post-link optimization
in the Spike/Pin project. Previously, he worked for Digital and Compaq, where
he implemented profile-guided optimization in the product compiler and in the
Om post-link optimizer for Alpha. He was a lead developer for the Spike
post-link optimizer. Robert received a Ph.D. in Computer Science from Carnegie
Mellon in 1992.
Stephane Eranian is a senior research
scientist at Hewlett Packard Labs where he has been working on the port of
Linux to the IA-64 platform since 1998. He has made numerous contributions to
the Linux/ia64 kernel and related user-level programs. He is the main architect
of the Linux/ia64 kernel performance monitoring subsystem (perfmon). He is also
the creator of the pfmon tool that uses this subsystem to collect performance
information.
Before joining HP, Stephane
was working on his PhD at Chorus Systems (now Jaluna) in France. He holds a
D.E.A. (B.Sc degree) in Operating Systems from Universite PARIS 6, France and a
Doctorate (Ph.D degree) in Computer Science from Universite PARIS 7, France. He
is a member of USENIX and co-author of the book "IA-64 Linux
Kernel: Design and Implementation".
Chi-Keung (C-K) Luk is a Staff Engineer at the
Intel Massachusetts Microprocessor Design Center, where he currently works on
the Spike/Pin project. His research
interests include memory system performance, compiler optimization, binary
translation, and performance monitoring. He received his PhD from the
University of Toronto and was a visiting scholar at Carnegie Mellon University.
His dissertation, titled "Optimizing the Cache Performance of Non-Numeric
Applications", was nominated for the ACM Doctoral Dissertation Award in
2000. He has published numerous papers in top conferences and journals and
filed several patent applications. He served on the program committees of the
34th MICRO and the First ACM Workshop on Memory System Performance.
* Other names and brands may be claimed as the property of others