Understanding the performance characteristics of exascale applications is necessary for identifying and addressing the barriers to achieving performance goals. This becomes more difficult as the architectures become more complex. The Performance Application Programming Interface (PAPI) provides library and application developers with generic and portable access to low-level performance counters found across the exascale machine, enabling users to see the relationships between software performance and hardware events. These relationships provide a critical step toward improving performance.
Performance metrics that track statistics such as processor usage are available on all computers, from smartphones and laptops to supercomputers. These metrics are important for identifying and removing the bottlenecks that keep software from running efficiently. The task grows harder as computers become more complex, which makes it particularly tricky in high-performance computing.
Each type of hardware component (processor, memory, or interconnect, for example) has its own vendor-provided interface for monitoring performance and power statistics and for configuring hardware parameters such as power caps. The Performance Application Programming Interface (PAPI), supported and extended by the Exascale Computing Project (ECP), consolidates these statistics behind a general interface that is portable across platforms. With PAPI, users can access these statistics through one library rather than relying on multiple vendor-specific products. Like other performance monitoring software, PAPI runs alongside the application, monitoring hardware and software events to gauge their impact on software performance.
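The idea of one portable interface sitting in front of several vendor-specific ones can be illustrated with a minimal C++ sketch. It is not PAPI's API: the class names (CounterBackend, FakeBackend, UnifiedMonitor), the "component:::event" naming scheme, and the event names are all assumptions made for illustration, and the backend returns canned values where a real one would query hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical abstraction: one backend per vendor-specific interface.
class CounterBackend {
public:
    virtual ~CounterBackend() = default;
    // Return the current value of a named event, e.g. "cycles".
    virtual std::int64_t read(const std::string& event) const = 0;
};

// Stand-in for a vendor interface; it serves canned values instead of
// querying real hardware.
class FakeBackend : public CounterBackend {
public:
    explicit FakeBackend(std::map<std::string, std::int64_t> values)
        : values_(std::move(values)) {}

    std::int64_t read(const std::string& event) const override {
        auto it = values_.find(event);
        if (it == values_.end()) {
            throw std::out_of_range("unknown event: " + event);
        }
        return it->second;
    }

private:
    std::map<std::string, std::int64_t> values_;
};

// Unified front end: callers address every event as "component:::event"
// (a naming scheme assumed for this sketch) and never touch a backend.
class UnifiedMonitor {
public:
    void add_component(const std::string& name,
                       std::unique_ptr<CounterBackend> backend) {
        assert(backend != nullptr);
        components_[name] = std::move(backend);
    }

    std::int64_t read(const std::string& qualified) const {
        const std::string sep = ":::";
        const auto pos = qualified.find(sep);
        if (pos == std::string::npos) {
            throw std::invalid_argument("expected component:::event");
        }
        auto it = components_.find(qualified.substr(0, pos));
        if (it == components_.end()) {
            throw std::out_of_range("unknown component");
        }
        return it->second->read(qualified.substr(pos + sep.size()));
    }

private:
    std::map<std::string, std::unique_ptr<CounterBackend>> components_;
};
```

In this sketch, supporting a new kind of hardware means writing one more backend class. Code that calls UnifiedMonitor::read stays unchanged, which is the portability property described above.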
A few key points, however, set PAPI apart from other performance monitoring software. PAPI allows users to register and monitor new software-defined events, in contrast to most vendor tools, which typically only monitor hardware. Furthermore, software-defined events introduced into a library can be incorporated into any application that uses the library, allowing developers to monitor how well each library works within a certain application. These features enable software and application teams, both within and external to ECP, to better understand and improve the performance and power usage of their own software layers, often serving as a foundational tool while providing uniformity across systems.
PAPI has become a ubiquitous tool in high-performance computing, often seen as a requirement, and is pre-installed on most systems. In addition to providing support for the latest hardware and software layers, ECP also improved PAPI’s sustainability by enabling integration into Spack and the Extreme-scale Scientific Software Stack (E4S), and ensuring software robustness through continuous integration and deployment. With the ongoing integration of new monitoring capabilities for advanced hardware and software technologies, PAPI is well-positioned to meet the emerging needs of the high-performance computing community, continuing to make an impact well beyond the ECP era.
The Exascale PAPI (Exa-PAPI) project is developing a new C++ PAPI (PAPI++) software package from the ground up that offers a standard interface and methodology for using low-level performance counters in CPUs, GPUs, on/off-chip memory, interconnects, and the I/O system, including energy/power management. PAPI++ builds on classic PAPI functionality and strengthens its path to exascale with a more efficient and flexible software design that takes advantage of C++’s object-oriented nature, preserves the low-overhead monitoring of performance counters, and adds an extensive test suite.
In addition to providing hardware counter-based information, the project is incorporating a standardizing layer for monitoring software-defined events (SDEs), which exposes the internal behavior of runtime systems and libraries, such as communication and math libraries, to applications. As a result, the notion of performance events is broadened from strictly hardware-related events to include software-based information.
Enabling the monitoring of hardware and software events provides more flexibility to developers when capturing performance information.
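The software-defined event idea can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the PAPI SDE interface: SdeRegistry, ToySolver, and the event name "toysolver::iterations" are hypothetical. A library registers a counter it already maintains, and an application or tool reads the live value by name without knowing the library's internals.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical registry of software-defined events (not the PAPI API).
class SdeRegistry {
public:
    // A library registers a pointer to a counter it already maintains.
    // The counter must outlive every later call to read().
    void register_counter(const std::string& name,
                          const std::int64_t* counter) {
        assert(counter != nullptr);
        counters_[name] = counter;
    }

    // An application or tool reads the current value by event name.
    std::int64_t read(const std::string& name) const {
        auto it = counters_.find(name);
        if (it == counters_.end()) {
            throw std::out_of_range("unknown event: " + name);
        }
        return *it->second;
    }

private:
    std::map<std::string, const std::int64_t*> counters_;
};

// Hypothetical library that exposes how many iterations its solver ran.
class ToySolver {
public:
    explicit ToySolver(SdeRegistry& registry) {
        registry.register_counter("toysolver::iterations", &iterations_);
    }

    // Copying would leave the registry pointing at the original object's
    // counter, so copies are disabled in this sketch.
    ToySolver(const ToySolver&) = delete;
    ToySolver& operator=(const ToySolver&) = delete;

    // Halve the residual until it is no longer above the tolerance.
    void solve(double residual, double tolerance) {
        while (residual > tolerance) {
            residual *= 0.5;
            ++iterations_;
        }
    }

private:
    std::int64_t iterations_ = 0;
};
```

Registering a pointer rather than pushing updates keeps the library's hot path to a plain increment. The cost of looking up and reading the event falls on the monitoring side, in keeping with the low-overhead goal stated earlier.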
In summary, the Exa-PAPI team is preparing PAPI to meet the challenges posed by exascale systems by (1) widening its applicability and providing robust support for exascale hardware resources; (2) supporting finer-grain measurement and control of power, thus offering software developers a basic building block for dynamic application optimization under power constraints; (3) extending PAPI to support SDEs; and (4) applying semantic analysis to hardware counters so that application developers can better make sense of the ever-growing list of raw hardware performance events that can be measured during execution.
The team will channel the monitoring capabilities of hardware counters, power usage, and SDEs into a robust PAPI++ software package. PAPI++ is meant to be PAPI’s replacement with a more flexible and sustainable software design.