CGO ’04 Tutorial

 

Title: Software Instrumentation and Hardware Profiling for Intel® Itanium® Linux*

 

Stephane Eranian: HP

Robert Cohn and CK Luk: Intel

 

 

This tutorial is intended for researchers and developers who want to learn more about Itanium/Linux facilities for program analysis. We discuss two different approaches: hardware performance monitoring and software instrumentation. The tutorial gives an under-the-covers look at the underlying technology, the tools for program analysis, and the applications that exploit the information these tools collect. It is useful for researchers who are designing hardware performance monitoring features and want to understand how tools use them, for people who want to write program analysis tools that use instrumentation or hardware monitoring, and for end users who want to use tools to understand program performance and behavior.

 

We begin with an overview of the performance monitoring unit (PMU) of the Itanium 2 processor. Itanium allows monitoring of more than 100 hardware events and can give precise information about the data address and instruction pointer for cache misses and branch events. Support for accessing the PMU is integrated into the Linux kernel and is controlled through the perfmon API. Perfmon simplifies the use of the PMU by providing in-kernel buffering of samples and virtualization of counters. We present the design of perfmon and explain how to use it. We continue with two applications that use perfmon: pfmon and qprof. Pfmon is a command-line tool that configures the PMU and collects samples, and qprof generates profiles. We conclude the discussion of hardware performance monitoring by examining how the ispike optimization tool uses data collected by pfmon to make programs run faster.

 

Next, we present the Pin dynamic instrumentation tool. Pin provides a simple but flexible API for transparently inserting new code into an application at run time. The new code observes the behavior of the program and can be used to write profilers, memory leak detectors, instruction trace generators, and similar tools. We explain how to write instrumentation tools with Pin by working through some examples. We conclude the discussion of software instrumentation by describing the design of Pin, which uses a just-in-time compiler to insert new code into an application transparently and efficiently. Finally, we discuss the tradeoffs between software instrumentation and hardware performance monitoring. We identify the types of program analysis each is best suited for and their relative costs.

 

Presenter information:

 

Robert Cohn is a Principal Engineer at Intel, where he works on just-in-time compilation, dynamic instrumentation, and post-link optimization in the Spike/Pin project. Previously, he worked for Digital and Compaq, where he implemented profile-guided optimization in the product compiler and the Om post-link optimizer for Alpha. He was a lead developer for the Spike post-link optimizer. Robert received a Ph.D. in Computer Science from Carnegie Mellon in 1992.

 

Stephane Eranian is a senior research scientist at Hewlett-Packard Labs, where he has been working on the port of Linux to the IA-64 platform since 1998. He has made numerous contributions to the Linux/ia64 kernel and related user-level programs. He is the main architect of the Linux/ia64 kernel performance monitoring subsystem (perfmon). He is also the creator of the pfmon tool, which uses this subsystem to collect performance information.

 

Before joining HP, Stephane worked on his Ph.D. at Chorus Systems (now Jaluna) in France. He holds a D.E.A. (B.Sc. degree) in Operating Systems from Universite PARIS 6, France, and a Doctorate (Ph.D. degree) in Computer Science from Universite PARIS 7, France. He is a member of USENIX and co-author of the book "IA-64 Linux Kernel: Design and Implementation".

 

Chi-Keung (C-K) Luk is a Staff Engineer at the Intel Massachusetts Microprocessor Design Center, where he currently works on the Spike/Pin project. His research interests include memory system performance, compiler optimization, binary translation, and performance monitoring. He received his Ph.D. from the University of Toronto and was a visiting scholar at Carnegie Mellon University. His dissertation, titled "Optimizing the Cache Performance of Non-Numeric Applications", was nominated for the ACM Doctoral Dissertation Award in 2000. He has published numerous papers at top conferences and in journals and has filed several patent applications. He served on the program committees of the 34th MICRO and the First ACM Workshop on Memory System Performance.

 

* Other names and brands may be claimed as the property of others