This tutorial was held on April 12, 2021 as part of the 2021 ECP Annual Meeting.
At LLNL, we have developed a workflow that lets users automate application performance analysis and pinpoint bottlenecks in their codes. The workflow combines three open-source tools (Caliper, SPOT, and Hatchet) into a holistic suite for collecting, visualizing, and analyzing performance data.
In this tutorial, the presenters gave an overview of each tool and demonstrated how to profile applications with Caliper, how to visualize performance data in SPOT, and how to analyze that data programmatically with Hatchet.
Caliper is a performance analysis toolbox in a library. It builds performance profiling capabilities into HPC applications, making them available at runtime for any application run. This approach greatly simplifies profiling for application end users, who can enable performance measurements for regular program runs without the complex setup steps often required by specialized performance debugging tools.
SPOT is a web-based tool for visualizing application performance data collected with Caliper. It presents an application's performance data across many runs, so users can track performance changes over time, compare the performance achieved by different users, or run scaling studies across MPI ranks. With this high-level overview, users can also quickly identify data they may want to analyze in finer detail.
Hatchet is a Python-based tool for analyzing and visualizing performance data generated by popular profiling tools such as Caliper, HPCToolkit, and gprof.
With Hatchet, users can write small code snippets to answer questions such as: What speedup am I getting from using the GPUs? Which portions of my code are scaling poorly? What differences exist in using one MPI implementation over another?
To answer these questions, Hatchet provides operations (e.g., sub-selection, aggregation, arithmetic) to analyze and visualize calling context trees and call graphs from one or multiple executions.
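
The kinds of operations listed above can be illustrated with a minimal plain-Python sketch. It does not use Hatchet's API: the profile layout (a dictionary keyed by call path), the function names `subselect`, `aggregate_by_function`, and `divide`, and all timing values are hypothetical, chosen only to show what sub-selection, aggregation, and arithmetic on calling context data mean.

```python
from collections import defaultdict

# Hypothetical profiles: each maps a call path (root-to-leaf tuple of
# function names) to the time in seconds attributed to that node.
cpu_run = {
    ("main",): 1.0,
    ("main", "setup", "mpi_allreduce"): 0.5,
    ("main", "solve"): 8.0,
    ("main", "solve", "mpi_allreduce"): 4.0,
}
gpu_run = {
    ("main",): 1.0,
    ("main", "setup", "mpi_allreduce"): 0.5,
    ("main", "solve"): 2.0,
    ("main", "solve", "mpi_allreduce"): 4.0,
}


def subselect(profile, predicate):
    """Sub-selection: keep only the nodes for which predicate(path, time) holds."""
    return {path: t for path, t in profile.items() if predicate(path, t)}


def aggregate_by_function(profile):
    """Aggregation: sum time per function name across all calling contexts."""
    totals = defaultdict(float)
    for path, t in profile.items():
        totals[path[-1]] += t
    return dict(totals)


def divide(numerator, denominator):
    """Arithmetic: node-wise ratio over call paths present in both profiles."""
    return {
        path: numerator[path] / denominator[path]
        for path in numerator
        if path in denominator and denominator[path] != 0
    }


# "What speedup am I getting from using the GPUs?"
speedup = divide(cpu_run, gpu_run)

# "Which portions of my code dominate the run time?"
hot_nodes = subselect(cpu_run, lambda path, t: t >= 4.0)

# Total time per function, regardless of where it was called from.
per_function = aggregate_by_function(cpu_run)
```

Here `speedup` shows that only the `solve` node benefits from the GPU in this made-up data, while the `mpi_allreduce` calls are unchanged. Hatchet performs comparable operations on real profiles loaded from Caliper, HPCToolkit, or gprof output.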