PAPI++ as De Facto Standard Interface for Performance Event Monitoring at the Exascale

The Exa-PAPI project within the US Department of Energy’s Exascale Computing Project (ECP) is developing a new performance application programming interface, named PAPI++, by taking advantage of modern C++ programming language features. PAPI++ offers a standard interface and methodology for using low-level performance counters for hardware types found across the entire compute system, including CPUs, GPUs, on/off-chip memory, interconnects, the I/O system, and energy/power.

PAPI++ builds on classic PAPI functionality and strengthens its path to exascale with a more efficient and flexible software design, one that takes advantage of C++’s object-oriented nature while preserving low-overhead monitoring of performance counters and adding an extensive test suite.

In addition to developing a new C++ performance API from the ground up, the Exa-PAPI team is extending PAPI++ with performance counter monitoring capabilities for new and advanced exascale hardware. This extension also includes software-defined event (SDE) support for exposing performance-critical events that originate from different software layers, such as communication and math libraries.

The Impact of Exa-PAPI

By developing the new PAPI++ software package, which leverages modern C++ and carries the functionality of PAPI’s abstraction and unification layer into a more sustainable software design, the Exa-PAPI project is strengthening the high-level performance toolkits that use PAPI. Ultimately, everyone who uses PAPI++, whether directly or through end-user tools, will benefit from the innovations the Exa-PAPI team is making available through it.

Without the Exa-PAPI effort, the ECP community would lack a consistent, standard interface for monitoring performance events on next-generation ECP hardware, measuring power and energy, and exporting software-critical events from ECP libraries, all in a uniform way. Moreover, without PAPI++, software developers would have to use multiple APIs to access hardware counters from across the system, which hurts productivity. As a result, performance assessment and improvement across multiple vendor platforms would become exceedingly difficult.

Current Progress

PAPI’s new SDE API enables ECP software layers to expose software-defined events that performance analysts can use to form a complete picture of an application’s performance. Because software complexity is one of the fundamental issues the community will face at exascale, one of Exa-PAPI’s central goals is to close the gap between SDE monitoring and hardware performance counter monitoring. The design and development of the new SDE API is nearly complete and, thanks to strong feedback from various ECP teams, the Exa-PAPI team has successfully integrated SDEs into ECP applications (e.g., NWChemEx), math libraries (e.g., MAGMA-Sparse), and runtimes (e.g., PaRSEC) to export important internal behavior. Exa-PAPI’s new SDE functionality enables scientists to monitor the behavior of low-level linear algebra routines without expert knowledge of the simulation code’s full software stack.
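The idea behind software-defined events can be illustrated with a small, self-contained C++ sketch. It is not the PAPI SDE API: the names SdeRegistry, ToySolver, and the event name "toysolver::ITERATIONS" are hypothetical and exist only for this illustration. The sketch assumes a library that publishes the address of a counter it already maintains, so that a monitoring tool can read the value by name without knowing the library's internals.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical registry (NOT the PAPI SDE API): maps event names to
// counters that live inside a library.
class SdeRegistry {
public:
    // Library side: publish the address of an existing internal counter.
    // The counter must outlive every later call to read().
    void register_counter(const std::string& name, const std::int64_t* counter) {
        assert(counter != nullptr);
        counters_[name] = counter;
    }

    // Tool side: read the current value of a named event.
    // Returns false if no event with that name was registered.
    bool read(const std::string& name, std::int64_t& value) const {
        auto it = counters_.find(name);
        if (it == counters_.end()) return false;
        value = *it->second;
        return true;
    }

private:
    std::map<std::string, const std::int64_t*> counters_;
};

// Hypothetical iterative routine that counts its own iterations and
// exports that count as a software-defined event.
class ToySolver {
public:
    explicit ToySolver(SdeRegistry& registry) {
        registry.register_counter("toysolver::ITERATIONS", &iterations_);
    }

    // Halve the residual until it drops to the tolerance or below.
    void solve(double residual, double tolerance) {
        while (residual > tolerance) {
            residual *= 0.5;
            ++iterations_;
        }
    }

private:
    std::int64_t iterations_ = 0;
};

// Usage: the "tool" reads the event by name after the solver has run.
// Returns -1 if the event is unknown.
std::int64_t demo_iterations() {
    SdeRegistry registry;
    ToySolver solver(registry);
    solver.solve(1.0, 0.1);
    std::int64_t value = -1;
    registry.read("toysolver::ITERATIONS", value);
    return value;
}
```

Registering a pointer rather than pushing updates keeps the cost on the library side to an ordinary increment, which is in the spirit of the low-overhead monitoring described above; whether the actual SDE design works this way is an assumption of this sketch.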

PAPI 5.7, the latest version, was released in March. This version ships a new component, called “pcp,” that interfaces with Performance Co-Pilot (PCP). It enables PAPI users to monitor IBM POWER9 hardware performance events, particularly shared “NEST” events, without the need for root access. Version 5.7 also upgrades the previously read-only PAPI “nvml” component with write access to the information and controls exposed via the NVIDIA Management Library (NVML). The PAPI “nvml” component now supports both measuring and capping power usage on recent NVIDIA GPU architectures (e.g., V100).

Furthermore, the release adds power monitoring and performance counter monitoring support for recent Intel architectures such as Cascade Lake, Kaby Lake, Skylake, and Knights Mill. Finally, energy consumption on AMD Fam17h chips can now be measured via the PAPI “rapl” component.

Researchers (University of Tennessee)

Jack Dongarra, principal investigator (PI)
Heike Jagode, Co-PI
Anthony Danalis, Co-PI
Tony Castaldo
Frank Winkler
Gerald Ragghianti