2022 ECP Annual Meeting

Each year, the Exascale Computing Project, or ECP, holds an annual meeting to highlight the many technical accomplishments of its talented research teams and to provide a collaborative working forum. Through a combination of talks, workshops, tutorials, and gatherings for planning and co-design, the meeting fosters project understanding, team building, and continued progress.

Among the attendees are hundreds of researchers, domain scientists, mathematicians, computer and computational scientists, US high-performance computing (HPC) vendor participants, and project management experts who support ECP’s research focus areas of Application Development, Software Technology, and Hardware and Integration. Also present are members of ECP’s external advisory group, the ECP Industry and Agency Council, and the project’s sponsors and stakeholder program managers from the Department of Energy, with representation from the Advanced Scientific Computing Research program of the Office of Science and the Advanced Simulation and Computing program of the National Nuclear Security Administration.

Behind the collaborative spirit of the ECP Annual Meeting is inspiration born of a vision shared by technology leaders, industry luminaries, computational pioneers, and forward-thinking application, software, and hardware experts to push the boundaries of HPC to shape the nation’s future exascale ecosystem.

The objective of sharing the following content is to provide insights from the ECP 2022 Annual Meeting to the broad HPC community.


Affiliation Abbreviations:
  • AL: Ames Laboratory
  • AMD: Advanced Micro Devices
  • ANL: Argonne National Laboratory
  • BNL: Brookhaven National Laboratory
  • CAL: University of California, Berkeley
  • CBIT: Center for Biomedical Informatics and Information Technology
  • CMU: Carnegie Mellon University
  • CU Denver: University of Colorado Denver
  • CU: University of Colorado Boulder
  • DOD: Department of Defense
  • GE: General Electric
  • GFDL: Geophysical Fluid Dynamics Laboratory
  • HPE: Hewlett Packard Enterprise
  • ICL: Innovative Computing Laboratory
  • ISU: Iowa State University
  • LANL: Los Alamos National Laboratory
  • LBNL: Lawrence Berkeley National Laboratory
  • LLNL: Lawrence Livermore National Laboratory
  • MIT CCSE: MIT Center for Computational Science and Engineering
  • NASA: National Aeronautics and Space Administration
  • NERSC: National Energy Research Scientific Computing Center
  • NETL: National Energy Technology Laboratory
  • NOAA: National Oceanic and Atmospheric Administration
  • NREL: National Renewable Energy Laboratory
  • NU: Northwestern University
  • ORNL: Oak Ridge National Laboratory
  • PPPL: Princeton Plasma Physics Laboratory
  • PU: Purdue University
  • RU: Rice University
  • SBU: Stony Brook University
  • SFSU: San Francisco State University
  • SHI: Sustainable Horizons Institute
  • SLAC: SLAC National Accelerator Laboratory
  • SMU: Southern Methodist University
  • SNL: Sandia National Laboratories
  • TUD: Technical University – Dresden, Germany
  • UA: University of Alabama
  • UB: University at Buffalo
  • UC: University of Chicago
  • UDEM: University of Montreal
  • UIUC: University of Illinois
  • UO: University of Oregon
  • UofU: University of Utah
  • UT: University of Tennessee
  • UW: University of Wisconsin
  • VT: Virginia Polytechnic Institute and State University


2022 Annual Meeting Tutorials

Title | Date | Materials (Slides and/or Recordings) | Session Duration (Hours) | Organizers/Speakers
Performance Tuning with the Roofline Model on GPUs and CPUs

The HPC community is on a never-ending quest for better performance and scalability. Performance models and tools are an integral component in the optimization process as they quantify performance relative to machine capabilities, track progress towards optimality, and identify performance bottlenecks. The Roofline performance model offers an insightful and intuitive method for extracting the key computational characteristics of HPC applications and comparing them against the performance bounds of CPUs and GPUs. Its capability to abstract the complexity of memory hierarchies and identify the most profitable optimization techniques has made Roofline-based analysis increasingly popular in the HPC community. This 180-minute tutorial is centered around four components. First, we will introduce the Roofline model and discuss how changes in data locality and arithmetic intensity visually manifest in the context of the Roofline model. Next, we will introduce and demonstrate the use of Roofline analysis in NVIDIA Nsight Compute and discuss several real-world use cases of Roofline in the NERSC NESAP effort. After a short break, we will introduce and demonstrate the use of AMD’s new Roofline tool for analysis of AMD GPU-accelerated applications running at OLCF. Finally, we will conclude by introducing and demonstrating Intel Advisor’s capability for Roofline analysis of applications running on Intel GPUs and CPUs as well as Roofline use cases at ALCF.
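The core Roofline bound can be sketched in a few lines. This is a minimal illustration, not output from any of the tools named above: the machine peaks and kernel arithmetic intensities are hypothetical numbers chosen for readability, and the function names are made up for this sketch.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound: the lower of the compute ceiling and the
    bandwidth ceiling scaled by arithmetic intensity (FLOPs per byte)."""
    return min(peak_gflops, peak_bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / peak_bandwidth_gbs


# Hypothetical machine: 1000 GFLOP/s peak compute, 100 GB/s memory bandwidth.
PEAK_GFLOPS = 1000.0
PEAK_BANDWIDTH_GBS = 100.0

# Hypothetical kernels and their arithmetic intensities (FLOPs per byte).
# The triad value assumes 2 FLOPs per 24 bytes moved (three doubles).
kernels = {"stream_triad": 2.0 / 24.0, "stencil": 0.5, "dgemm": 50.0}

ridge = ridge_point(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
for name, intensity in kernels.items():
    bound = attainable_gflops(intensity, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
    regime = "memory-bound" if intensity < ridge else "compute-bound"
    print(f"{name:13s} AI={intensity:6.3f}  bound={bound:7.1f} GFLOP/s  ({regime})")
```

Raising a kernel's arithmetic intensity (for example through better data locality) moves it to the right along the sloped part of the roofline until it reaches the ridge point, after which only compute optimizations help.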

Date: Mon, May 2 | Duration: 3.5 hours | Organizers/Speakers: Samuel Williams (LBNL), Neil Mehta (LBNL), Noah Wolfe (AMD), Xiaomin Lu (AMD), JaeHyuk Kwack (ANL)
Using HDF5 Efficiently on HPC Systems

HDF5 is a data model, file format, and I/O library that has become a de facto standard for HPC applications to achieve scalable I/O and for storing and managing big data from computer modeling, large physics experiments, and observations. Several ECP applications are currently using or planning to use HDF5 for I/O. Many new features, such as caching and prefetching, asynchronous I/O, and log-structured I/O, have been developed in ECP so that HDF5 applications can take advantage of exascale storage subsystems.

The tutorial will cover various best practices for using HDF5 efficiently, including performance profiling of HDF5, I/O patterns that obtain good I/O performance, and effective use of new features such as asynchronous I/O, caching, prefetching, log-structured I/O, and DAOS. The tutorial also covers UnifyFS, which lets applications use distributed node-local storage as a single file system. The tutorial presenters will use code examples and demonstrate the performance benefits of efficient HDF5 usage.

Date: Mon, May 2 | Materials: Recording | Duration: 3.5 hours | Organizers/Speakers: Suren Byna (LBNL), Scot Breitenfeld (The HDF Group), Houjun Tang (LBNL), Huihuo Zheng (ANL), Jordan Henderson (The HDF Group), Qiao Kang, Neil Fortner (The HDF Group), Wei-keng Liao (NU), Michael Brim (ORNL), Kaiyuan Hou (NU), Dana Robinson (The HDF Group)
GPU Capable Sparse Direct Solvers

In this tutorial we illustrate the use of the sparse direct solvers and factorization-based preconditioners SuperLU and STRUMPACK on modern HPC systems with GPU accelerators. Sparse direct solvers rely on LU factorization and the accompanying triangular solution. They are indispensable tools for building various algebraic equation solvers. They can be used as direct solvers, as coarse-grid solvers in multigrid, or as preconditioners for iterative solvers. Particular benefits of direct solvers are their robustness and their efficiency when solving systems with multiple right-hand sides.
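The "factor once, solve many times" pattern behind the multiple right-hand-side benefit can be sketched as follows. This is a small dense LU with partial pivoting written in plain Python for illustration only; it does not use or imitate the SuperLU or STRUMPACK APIs, and real sparse solvers add ordering, symbolic analysis, and supernodal or multifrontal data structures on top of the same idea. The matrix and right-hand sides are made-up examples.

```python
def lu_factor(a):
    """Factor a square matrix (list of rows) as P*A = L*U.

    Returns (lu, perm): lu stores L below the diagonal (unit diagonal
    implied) and U on and above it; perm[i] is the original row index
    that ended up in row i."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest entry in column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using a factorization from lu_factor."""
    n = len(lu)
    # Forward substitution: L y = P b.
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Backward substitution: U x = y.
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


# Factor once (the expensive step), then reuse for several right-hand sides.
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
lu, perm = lu_factor(A)
solutions = [lu_solve(lu, perm, b) for b in ([3.0, 2.0, 3.0], [4.0, -1.0, 0.0])]
print(solutions)
```

Each additional right-hand side costs only the two triangular solves, which is why direct solvers are attractive when many systems share the same matrix.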

This tutorial will focus on the use of sparse direct solvers on GPU-capable HPC systems, either calling SuperLU/STRUMPACK directly, or through other solver libraries such as PETSc, Trilinos or MFEM. The focus will be on the hands-on session. Each participant will have a chance to work with SuperLU and STRUMPACK on a GPU cluster. We show how to get the best possible performance out of the solvers for different applications and different hardware platforms. Since a basic understanding of the underlying algorithms is required for performance tuning, this tutorial will also briefly introduce the algorithms used in SuperLU and STRUMPACK.

As we are approaching the exascale computing era, demand for algorithm innovation is increasingly high. Hence, we also discuss important advances in factorization-based solvers towards the development of optimal-complexity scalable algorithms, both in terms of flop count and data movement. We discuss several preconditioners based on incomplete sparse matrix factorization using rank-structured matrix approximations such as hierarchical matrix decomposition, block low-rank, and butterfly compression.

Date: Mon, May 2 | Materials: Recording | Duration: 3.5 hours | Organizers/Speakers: Sherry Li (LBNL), Yang Liu (LBNL), Pieter Ghysels (LBNL)
CMake Tutorial

An interactive tutorial on how to write effective build systems using CMake.

Topics include:
* Variables
* Creating libraries
* Modern CMake (usage requirements)
* Improving build times
* Install rules
* Importing and exporting targets
* Testing with CTest
* Reporting results to CDash
* Find modules
* Generator expressions
* MPI support
* CUDA support
* Fortran support
* Features from recent CMake releases

Date: Mon, May 2 | Duration: 3.5 hours | Organizers/Speakers: Zack Galbreath (Kitware, Inc.), Betsy McPhail (Kitware, Inc.), Julien Jomier (Kitware, Inc.)
E4S for ECP ST and AD teams

With the increasing complexity and diversity of the software stack and system architecture of high performance computing (HPC) systems, the traditional HPC community is facing a huge productivity challenge in software building, integration and deployment. Recently, this challenge has been addressed by software build management tools such as Spack that enable seamless software building and integration. Container-based solutions provide a versatile way to package software and are increasingly being deployed on HPC systems. The DOE Exascale Computing Project (ECP) Software Technology focus area is developing an HPC software ecosystem that will enable the efficient and performant execution of exascale applications. Through the Extreme-scale Scientific Software Stack (E4S) [https://e4s.io], it is developing a curated, Spack-based, comprehensive and coherent software stack that will enable application developers to productively write highly parallel applications that can portably target diverse exascale architectures. E4S provides both source builds through the Spack platform and a set of containers (base, full featured for three GPU architectures) that feature a broad collection of HPC software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC and AI/ML users. It provides container images, build manifests, and turn-key, from-source builds of popular HPC software packages developed as Software Development Kits (SDKs). This effort includes a broad range of areas including programming models and runtimes (MPICH, Kokkos, RAJA, Open MPI), development tools (TAU, PAPI), math libraries (PETSc, Trilinos), data and visualization tools (ADIOS, HDF5, ParaView), and compilers (LLVM), all available through the Spack package manager.
The tutorial will describe the community engagements and interactions that led to the many artifacts produced by E4S, and will introduce the E4S containers that are being deployed at the HPC systems at DOE national laboratories. Specifically, the participants will learn:

* How to install and use E4S containers.
* Using E4S Spack recipes on HPC systems and on cloud-based platforms.
* Using E4S build cache to speed up Spack based installs on bare-metal hardware and create optimized custom containers.
* Using the e4s-cl container launcher to substitute container-based MPI with the system MPI for efficient inter-node communication, using the MPICH ABI and Wi4MPI for Open MPI.

The presenters will discuss the recent efforts and techniques to improve software integration and deployment for HPC platforms, and describe recent collaborative work on reproducible workflows focused on common vis/analysis operations and workflows of interest to application scientists (AD) and ST developers.

Date: Mon, May 2 | Materials: Recording | Duration: 1.5 hours | Organizers/Speakers: Sameer Shende (UO)
Using Spack to Accelerate Developer Workflows

The modern scientific software stack includes thousands of packages, from C, C++, and Fortran libraries, to packages written in interpreted languages like Python and R. HPC applications may depend on hundreds of packages spanning all of these ecosystems. To achieve high performance, they must also leverage low-level and difficult-to-build libraries such as MPI, BLAS, and LAPACK. Integrating this stack is extremely challenging. The complexity can be an obstacle to deployment at HPC sites and deters developers from building on each other’s work.
Spack is an open source tool for HPC package management that simplifies building, installing, customizing, and sharing HPC software stacks. In the past few years, its adoption has grown rapidly: by end-users, by HPC developers, and by the world’s largest HPC centers. Spack provides a powerful and flexible dependency model, a simple Python syntax for writing package build recipes, and a repository of over 6,200 community-maintained packages. This tutorial provides a thorough introduction to Spack’s capabilities: installing and authoring packages, integrating Spack with development workflows, and using Spack for deployment at HPC facilities. Attendees will learn foundational skills for automating day-to-day tasks, as well as deeper knowledge of Spack for advanced use cases.

Date: Mon, May 2 | Materials: Recording | Duration: 3.5 hours | Organizers/Speakers: Todd Gamblin (LLNL), Greg Becker (LLNL), Richarda Butler (LLNL), Tamara Dahlgren (LLNL), Adam Stewart (UIUC)
TAU Performance System

The TAU Performance System is a powerful and highly versatile profiling and tracing tool ecosystem for performance analysis of parallel programs at all scales. Developed over the last two decades, TAU has evolved with each new generation of HPC systems and supports performance evaluation of applications targeting GPU platforms from NVIDIA, Intel, and AMD. The tutorial will demonstrate how to generate profiles and traces with TAU using profiling interfaces including Kokkos, OpenACC, OpenMP Tools Interface, CUPTI (CUDA), OpenCL, Intel Level Zero, AMD Rocprofiler, and Roctracer. TAU's 3D profile browser, ParaProf, and PerfExplorer tools will be used in the hands-on session to analyze the profile data. The tutorial will also show integration with the LLVM compiler toolchain, including the TAU plugin for selective, compiler-based instrumentation at the routine and file level. It will also demonstrate TAU's use in container-based environments using the Extreme-scale Scientific Software Stack (E4S) containers.
URL: http://tau.uoregon.edu

Date: Mon, May 2 | Duration: 1.5 hours | Organizers/Speakers: Sameer Shende (UO), Kevin Huck (UO)
AMD Software Tools for Exascale Computing

This tutorial is a set of live sessions demonstrating the capabilities of various software tools that support AMD CPU and GPU hardware. These tools will be available on OLCF's upcoming system, Frontier. The following Radeon Open Compute (ROCm) integrated tools will be presented in the AMD HPC Tools session:

• HPCToolkit and DynInst
• TotalView
• Score-P – Vampir
• TraceCompass and Theia
• Arm Forge

Date: Mon, May 2 | Materials: Recording | Duration: 3.5 hours | Organizers/Speakers: Bill Williams (TUD), Bert Wesarg, Giuseppe Congiu (ICL), Michel Dagenais (UDEM), Arnaud Fiorini (UDEM), Yoann Heitz (UDEM), John Mellor-Crummey (RU), Barton Miller (UW), Sameer Shende (UO), John Del-Signore (Perforce Software, Inc.), Beau Paisley, Louise Spellacy (Arm Ltd), Bill Burns (Perforce), Karlo Kraljic (HPE)
Performance Autotuning of ECP Applications with Gaussian Process-Based and Cloud Database-Enhanced GPTune Package

Agenda for the day: https://bit.ly/37MYJtf

GPTune is a multi-task, multi-fidelity, multi-objective performance autotuner, designed particularly for ECP applications that can involve many tuning parameters and require large core counts to execute. Compared to existing general-purpose autotuners, GPTune supports the following advanced features:
(1) dynamic process management for running applications with varying core counts, and a reverse communication-style interface to support multiple MPI vendors, (2) incorporation of coarse performance models and hardware performance counters to improve the surrogate model, (3) multi-objective tuning of computation, memory, accuracy and/or communication, (4) multi-fidelity tuning to better utilize the limited resource budget, (5) effective handling of non-smooth objective functions, (6) reuse of existing performance data or performance models for transfer learning across machines and sensitivity analysis, (7) checkpoints and reuse of historical performance data, and (8) a shared autotuning database, where multiple users can share performance data.
GPTune has been applied to a variety of parallel (distributed-memory, shared-memory, and GPU-based) applications, including math libraries (ScaLAPACK, PLASMA, SLATE, SuperLU_DIST, STRUMPACK, Hypre, and MFEM), fusion simulation codes (M3DC1 and NIMROD), and machine learning frameworks (CNN, GNN, and kernel ridge regression).
In this tutorial, we will go over the basic workflow of GPTune, including various algorithms, installation, application launching, and data analysis, as well as representative examples demonstrating the above-mentioned and under-development features. Through hands-on exercises, the participants will learn how to run GPTune examples and apply GPTune to their own applications.

Date: Mon, May 2 | Materials: Recording | Duration: 3.5 hours | Organizers/Speakers: Yang Liu (LBNL), Younghyun Cho (CAL), James Demmel (CAL), Sherry Li (LBNL)
Dense Linear Algebra and FFT Libraries: SLATE, MAGMA, heFFTe

This tutorial focuses on the SLATE and MAGMA linear algebra libraries and the heFFTe multidimensional FFT library. These libraries are designed to harness today's supercomputers with multicore CPUs and multiple GPU accelerators. The tutorial covers practical aspects in setting up matrices and calling the libraries in your application. No prior knowledge is required.

SLATE is a modern C++ library developed as part of ECP to replace ScaLAPACK for solving dense linear algebra problems on distributed systems. It supports multicore CPU nodes or hybrid CPU-GPU nodes, using MPI for communication and OpenMP tasks for node-level scheduling. It covers parallel BLAS, solving linear systems (LU, Cholesky, QR, symmetric indefinite), least squares, symmetric eigenvalue and SVD solvers. SLATE includes a ScaLAPACK-compatible interface to aid in transitioning existing applications.

MAGMA is a C library for accelerating dense and sparse linear algebra on a node using multiple GPUs. It covers a large part of LAPACK's functionality: LU, Cholesky, QR, symmetric indefinite, eigenvalue and singular value solvers. It has a batch component for solving many small problems in parallel, and tensor contractions for high-order finite element methods. The sparse component accelerates many iterative algorithms such as CG, GMRES, and LOBPCG. MAGMA also includes a Fortran interface.

The Highly Efficient FFTs for Exascale (heFFTe) library provides multidimensional Fast Fourier Transforms (FFTs) for Exascale platforms. FFTs are in the software stack for almost all ECP applications. The tutorial will cover heFFTe’s APIs, approach, performance, and use in applications. heFFTe leverages existing FFT capabilities, including 1-D FFTs, FFTMPI, and SWFFT.

Date: Mon, May 2 | Materials: Recording | Duration: 1.5 hours | Organizers/Speakers: Mark Gates (UT), Stan Tomov (UT)
Developing a Testing and Continuous Integration Strategy for your Team

A thorough and robust testing regime is central to the productive development, evolution, and maintenance of quality, trustworthy scientific software. Continuous integration, though much discussed, is just one element of such a testing regime. Most project teams feel that they could (and should) do a “better job” of testing. In many cases, designing and implementing a strong testing strategy can seem so daunting that it is hard to know where to start.

In this tutorial, which is aimed at those with beginner to intermediate levels of comfort with testing and continuous integration, we will briefly review the multiple motivations for testing, and the different types of tests that address them. We’ll discuss some strategies for testing complex software systems, and how continuous integration testing fits into the larger picture. Accompanying hands-on activities, available for self-study, will demonstrate how to get started with a very simple level of CI testing.

Date: Mon, May 2 | Materials: Recording | Duration: 1.5 hours | Organizers/Speakers: Greg Watson (ORNL), David Rogers (ORNL)
Overview and Use of New Features in the SUNDIALS Suite of Nonlinear and Differential/Algebraic Equation Solvers

SUNDIALS is a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers consisting of six packages which provide robust time integrators and nonlinear solvers. The suite is designed to be easily incorporated into existing simulation codes and to require minimal information from the user. The modular implementation allows users to easily supply their own data structures underneath SUNDIALS packages and to easily incorporate outside solver packages or user-supplied linear solvers and preconditioners. SUNDIALS consists of the following six packages: CVODE, solves initial value problems for ordinary differential equation (ODE) systems with linear multi-step methods; CVODES, solves ODE systems and includes sensitivity analysis capabilities (forward and adjoint); ARKODE, solves initial value ODE problems with explicit, implicit, or IMEX additive Runge-Kutta methods; IDA, solves initial value problems for differential-algebraic equation (DAE) systems with BDF methods; IDAS, solves DAE systems and includes sensitivity analysis capabilities (forward and adjoint); KINSOL, solves nonlinear algebraic systems.
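To make the phrase "explicit Runge-Kutta methods" concrete, here is a minimal sketch of the classical fourth-order Runge-Kutta scheme applied to a scalar initial value problem. It is written in plain Python for illustration and does not use the SUNDIALS or ARKODE APIs; it also uses a fixed step size, whereas production integrators adapt the step to meet error tolerances. The test problem y' = -2y is an assumption chosen because its exact solution is known.

```python
import math


def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one step of size h with classical RK4."""
    k1 = f(t, y)
    k2 = f(t + h / 2.0, y + h / 2.0 * k1)
    k3 = f(t + h / 2.0, y + h / 2.0 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(f, t0, y0, t_end, steps):
    """Integrate from t0 to t_end with a fixed number of equal steps."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    """Right-hand side of the test problem y' = -2 y."""
    return -2.0 * y


approx = integrate(decay, 0.0, 1.0, 1.0, steps=100)
exact = math.exp(-2.0)
print(f"RK4: {approx:.10f}  exact: {exact:.10f}  error: {abs(approx - exact):.2e}")
```

The user supplies only the right-hand-side function and initial condition, which mirrors the "minimal information from the user" design goal described above.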

This tutorial will include an overview of many of the new features added to SUNDIALS in the last 2-3 years, including greater GPU support especially for AMD and Intel GPUs, multirate time integrators, performance profiling and analysis, interfaces to new GPU-based solvers including some from MAGMA, oneMKL, and (soon) Ginkgo, a ManyVector capability, and a new, more complex and scalable GPU-enabled demonstration program that shows use of many of these features. For these features, SUNDIALS capabilities and target uses will be presented followed by a discussion of user interfaces for the features. All discussions will include examples.

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-831982.

Date: Mon, May 2 | Materials: Recording | Duration: 1.5 hours | Organizers/Speakers: Carol S. Woodward (LLNL), Cody J. Balos (LLNL), David J. Gardner (LLNL), Daniel Reynolds (SMU)
OpenMP Tutorial for ECP

While most HPC developers have OpenMP experience, many are not familiar with the latest OpenMP features and how to use them in their codes. Modern OpenMP can be used in creative ways to map parallelism to current and emerging parallel architectures. Yet it is much broader and more complex than OpenMP was even just three years ago, and a significant amount of training is needed if programmers are to be able to exploit it fully.

We propose a tutorial that describes recent and new features of OpenMP, with a focus on those features that have proven important for ECP. We will cover emerging programming styles and best practices for early adopters so that they can efficiently program accelerators, orchestrate work across CPUs, GPUs and the network, and take advantage of the memory hierarchy.

The OpenMP 5.0 specification was released at SC’18 with exciting new functionality. OpenMP 5.1 was released at SC’20. OpenMP 5.2, a revision for clarifications in the specification, was released at SC’21. Thus, it is important for developers to be aware not only of the current standard and what is available today, but also of what is coming next and what will be available in the exascale time frame.

Finally, we also want to use this opportunity to discuss with the application teams the areas where they can influence design choices, as well as the open challenges that we haven’t solved yet but will address in the future.

The tutorial is expected to cover the following topics:
* Overview of what is available today in OpenMP 5.1
* An overview of the accelerator programming model
* Examples on how to map data structures
* How to exploit parallelism in the target regions
* Latest features in OpenMP tasking
* More Fortran examples and specific features
* Hands-on exercises for OpenMP implementations: NVIDIA HPC SDK, IBM XL, HPE/Cray CCE, AMD ROCm, Intel oneAPI
* How to use OpenMP target and tasking constructs to manage and orchestrate work and communication between the CPUs and accelerators and inter-node communication
* Some examples of success stories on how applications have used OpenMP to scale on leadership class systems
* Other uses: using OpenMP with other frameworks like RAJA/Kokkos
* A deeper introduction to OpenMP 5.1 and a preview of the latest and advanced features in the new spec to manage memory for heterogeneous shared memory spaces, unified addresses/memories, deep copy, detach tasks, etc.

Date: Mon, May 2 | Materials: Recording | Duration: 3.5 hours | Organizers/Speakers: Vivek Kale (BNL), Dossay Oryspayev (BNL), Johannes Doerfert (ANL), Thomas Scogland (LLNL), Colleen Bertoni (ANL)
Transfer learning and online tuning with PROTEAS-TUNE/ytopt

ytopt is a machine-learning-based search software package developed within the ECP PROTEAS-TUNE project. It consists of sampling a small number of input parameter configurations of a given kernel/library/application, evaluating them, and progressively fitting a surrogate model over the input-output space until exhausting the user-defined time or the maximum number of evaluations. In this tutorial, we will present ytopt’s new transfer learning and online tuning capabilities. The transfer learning in autotuning is useful to reduce the tuning time when dealing with related autotuning tasks. This is achieved by training a model to learn the promising regions of the search space in one autotuning setting and using the learnt model to bias the search for the related autotuning settings. The online tuning is useful when high-performing parameter configurations need to be generated with severe tuning budget constraints. To that end, in the offline mode, ytopt trains a model with the high-performing configurations on various representative inputs; in the online mode, given a new unseen input, the trained model is used to generate high-performing configurations directly without any search.
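The sample, evaluate, and refit loop described above can be sketched in plain Python. This is an illustration of the general surrogate-guided search pattern only, not ytopt's implementation or API: the objective function is a made-up runtime model, the tuning parameter (a tile size) is hypothetical, and a crude nearest-neighbor predictor is swapped in for the learned surrogate model purely to keep the sketch short.

```python
import random


def objective(tile):
    """Hypothetical runtime model (lower is better), best at tile == 48."""
    return (tile - 48) ** 2 / 100.0 + 1.0


def surrogate_predict(history, tile):
    """Nearest-neighbor stand-in for a surrogate model: predict the cost
    of an untested configuration from the closest evaluated one."""
    nearest = min(history, key=lambda h: abs(h[0] - tile))
    return nearest[1]


def tune(search_space, max_evals, n_init=3, n_candidates=8, seed=0):
    """Evaluate up to max_evals configurations, guided by the surrogate."""
    rng = random.Random(seed)
    remaining = list(search_space)
    rng.shuffle(remaining)
    history = []
    # Start with a few random samples so the surrogate has data.
    for _ in range(min(n_init, max_evals)):
        tile = remaining.pop()
        history.append((tile, objective(tile)))
    # Repeatedly pick the most promising candidate and evaluate it.
    while len(history) < max_evals and remaining:
        candidates = rng.sample(remaining, min(n_candidates, len(remaining)))
        best = min(candidates, key=lambda t: surrogate_predict(history, t))
        remaining.remove(best)
        history.append((best, objective(best)))
    return min(history, key=lambda h: h[1]), history


best_config, history = tune(range(8, 129, 8), max_evals=10)
print("best (tile, runtime):", best_config)
```

In a transfer learning setting, the history from one tuning task would seed or bias the search for a related task instead of starting from purely random samples.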

The 90-minute tutorial will be divided into three 30-minute slots. The first slot will present an overview of the transfer learning and online tuning capabilities of ytopt. The second slot will present the transfer learning module using a representative autotuning example. The third slot will present the online tuning module using a representative autotuning example.

Date: Mon, May 2 | Materials: Recording | Duration: 1.5 hours | Organizers/Speakers: Prasanna Balaprakash (ANL), Xingfu Wu (ANL), Michael Kruse, Brice Videau (ANL), Paul Hovland (ANL), Mary Hall (UofU)
ECP Software for In Situ Visualization and Analysis

This 90-minute tutorial will cover ECP software for in situ visualization and analysis, which is an important capability for addressing slow I/O on modern supercomputers. The majority of the time will be spent on two products: Ascent and Cinema. For Ascent, an in situ library supported by ECP ALPINE, topics covered will include how to integrate into a simulation code, data model, build and linking issues, and an overview of capabilities. For Cinema, topics covered will include both examples of use cases and information on how to get started. Finally, we will give short presentations on other ECP software projects for in situ processing. One short presentation will be on Catalyst, which is the in situ form of the popular ParaView tool and which is also supported by ECP ALPINE. The other will be on VTK-m, which is a many-core visualization library used by Ascent and Catalyst and can also be directly incorporated into simulation codes.

Date: Mon, May 2 | Materials: Recording | Duration: 1.5 hours | Organizers/Speakers: Hank Childs (UO), Cyrus Harrison (LLNL), Terry Turton (LANL), David Rogers (ORNL), Kenneth Moreland, Berk Geveci (Kitware Inc.)
Automated Application Performance Analysis with Caliper, SPOT, and Hatchet

At LLNL, we have developed a workflow enabling users to automate application performance analysis and pinpoint bottlenecks in their codes. Our workflow leverages three open-source tools – Caliper, SPOT, and Hatchet – to provide a holistic suite for integrated performance data. In this tutorial, we will provide an overview of each of the tools, and demonstrate how to profile your applications with Caliper, how to visualize your performance data in SPOT, and how to programmatically analyze your data with Hatchet.

Caliper is a performance analysis toolbox in a library. It provides performance profiling capabilities for HPC applications, making them available at runtime for any application run. This approach greatly simplifies performance profiling tasks for application end users, who can enable performance measurements for regular program runs without the complex setup steps often required by specialized performance debugging tools.

SPOT is a web-based tool for visualizing application performance data collected with Caliper. SPOT visualizes an application’s performance data across many runs. Users can track performance changes over time, compare the performance achieved by different users, or run scaling studies across MPI ranks. With a high-level overview of an application’s performance, users are also quickly able to identify data that they might be interested in analyzing in finer-grained detail.

Hatchet is a Python-based tool for analyzing and visualizing performance data generated by popular profiling tools, such as Caliper, HPCToolkit, and gprof. With Hatchet, users can write small code snippets to answer questions such as: What speedup am I getting from using the GPUs? Which portions of my code are scaling poorly? What differences exist in using one MPI implementation over another? To answer these questions, Hatchet provides operations (e.g., sub-selection, aggregation, arithmetic) to analyze and visualize calling context trees and call graphs from one or multiple executions.
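The kind of question posed above ("What speedup am I getting from using the GPUs?") reduces to arithmetic over per-region timings from two runs. The sketch below shows that arithmetic with plain Python dictionaries; it deliberately does not use Hatchet's API, and the region names and timings are invented for illustration.

```python
def region_speedups(baseline, candidate):
    """Per-region speedup (baseline time / candidate time) for regions
    that appear in both runs."""
    return {
        region: baseline[region] / candidate[region]
        for region in baseline
        if region in candidate and candidate[region] > 0.0
    }


# Hypothetical inclusive times in seconds for a CPU-only and a GPU run.
cpu_run = {"main": 120.0, "solve": 80.0, "assemble": 30.0, "io": 10.0}
gpu_run = {"main": 45.0, "solve": 10.0, "assemble": 25.0, "io": 10.0}

speedups = region_speedups(cpu_run, gpu_run)

# List regions from least to most improved to spot what did not benefit.
for region, factor in sorted(speedups.items(), key=lambda kv: kv[1]):
    print(f"{region:10s} {factor:5.2f}x")
```

A tool that operates on calling context trees applies the same division node by node, which lets the comparison respect the call hierarchy rather than flat region names.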

Date: Mon, May 2 | Materials: Recording | Duration: 1.5 hours | Organizers/Speakers: David Boehme (LLNL), Stephanie Brink (LLNL), Olga Pearce (LLNL)
The SENSEI Generic In Situ Interface – A Tutorial

SENSEI, which is part of the DOE Exascale Computing Project (ECP) Data-Vis SDK, is an open-source, generic in situ interface that allows parallel simulations or other data producers to code-couple to parallel third-party endpoints. These endpoints may be applications or tools/methods, including user-written code in C++ or Python and a growing class of data-intensive capabilities accessible through Python, capabilities such as AI/ML. Once a data producer is instrumented with the SENSEI interface, changing to a different endpoint is as simple as modifying an XML configuration file with a text editor. SENSEI fully supports configurations where the simulation and endpoint are run at differing levels of concurrency. SENSEI will manage the potentially tricky process of partitioning and moving data from the M producer ranks to the N consumer ranks. The movement of data may be bidirectional, which means that a simulation sends data to an endpoint and has access to results computed from the endpoint. While initially designed and implemented for analysis, visualization, and other data-intensive in situ tasks, SENSEI's design and implementation support the coupling of arbitrary code. This tutorial will present an overview of SENSEI, along with several hands-on/follow-along coding examples to introduce you to key capabilities.
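The M-to-N redistribution problem mentioned above can be illustrated with a one-dimensional block decomposition. This is a simplified sketch under stated assumptions, not SENSEI's partitioning logic or API: it assumes a contiguous index range split evenly across ranks on both sides, and the function names are hypothetical.

```python
def block_range(rank, nranks, nitems):
    """Half-open index range [start, stop) owned by a rank under an
    even block decomposition; the first ranks absorb any remainder."""
    base, extra = divmod(nitems, nranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def transfer_plan(m, n, nitems):
    """List (producer, consumer, start, stop) messages needed to move
    nitems from M producer ranks to N consumer ranks."""
    plan = []
    for producer in range(m):
        p_start, p_stop = block_range(producer, m, nitems)
        for consumer in range(n):
            c_start, c_stop = block_range(consumer, n, nitems)
            lo, hi = max(p_start, c_start), min(p_stop, c_stop)
            if lo < hi:  # the two blocks overlap, so data must move
                plan.append((producer, consumer, lo, hi))
    return plan


# Example: 4 simulation ranks feeding 3 analysis ranks, 12 data items.
for message in transfer_plan(4, 3, 12):
    print(message)
```

With 4 producers and 3 consumers, some producers must split their block between two consumers, which is the bookkeeping an in situ interface takes off the application developer's hands.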

Date: Tue, May 3 | Materials: Recording | Duration: 1.5 hours | Organizers/Speakers: E. Wes Bethel (SFSU), Silvio Rizzi (ANL), Burlen Loring (LBNL)
Accelerating your application I/O with UnifyFS

UnifyFS is a user-level file system that is highly specialized for fast shared file access on HPC systems with distributed burst buffers. In this tutorial, we will present users with an introductory overview of the lightweight UnifyFS file system that can be used to improve the I/O performance of HPC applications. We will begin at a high level describing how UnifyFS works with burst buffers and how users can incorporate it into their jobs. Following this, we will dive into more detail on what kinds of I/O UnifyFS currently supports and what we expect it to support in the future. We will also discuss the interoperation of UnifyFS with HDF5 and MPI-IO.

Wed, May 4 | Recording | 1 | Michael Brim (ORNL), Cameron Stanavige (LLNL)
Tools for Floating-Point Debugging and Analysis: FPChecker and FLiT

With the proliferation of heterogeneous architectures, multiple compilers, and programming models, numerical inconsistencies and errors are not uncommon. This tutorial will show how to debug numerical issues using two tools, FPChecker and FLiT. FPChecker performs detection of floating-point exceptions and other anomalies, reporting the location of such errors (file and line number) in a comprehensive report. It can check for errors in CPUs (via the clang/LLVM compiler) and in NVIDIA GPUs. FLiT identifies the location of code that is affected by compiler optimizations and can produce inconsistent numerical results with different compilation flags. The tutorial will provide hands-on exercises to demonstrate the tools.
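The kinds of issues described above can be illustrated with a short, self-contained sketch. It does not use FPChecker or FLiT; the helper names `classify`, `sum_left`, and `sum_right` are hypothetical. It shows two things: floating-point addition is not associative, so an optimization that reorders operations can change a result, and anomalous values such as infinities, NaNs, and subnormals can be detected by inspecting a value.

```python
import math
import sys


def classify(x):
    """Label a float as 'nan', 'inf', 'subnormal', or 'normal' (zero counts as normal)."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf"
    if x != 0.0 and abs(x) < sys.float_info.min:
        return "subnormal"
    return "normal"


def sum_left(a, b, c):
    """Evaluate (a + b) + c, the order the source code spells out."""
    return (a + b) + c


def sum_right(a, b, c):
    """Evaluate a + (b + c), an order a reassociating optimization might choose."""
    return a + (b + c)


if __name__ == "__main__":
    # Same mathematical sum, different rounding depending on evaluation order.
    print(sum_left(0.1, 0.2, 0.3), sum_right(0.1, 0.2, 0.3))

    # Overflow, an invalid operation, and a value below the normal range.
    overflow = 1e308 * 10.0
    invalid = overflow - overflow
    tiny = 5e-324
    for value in (overflow, invalid, tiny, 1.0):
        print(value, classify(value))
```

The two sums differ only in the last bit here, but in long reductions or ill-conditioned computations the same effect can grow large enough to change a simulation's outcome, which is why locating the responsible code matters.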

Wed, May 4 | Recording | 0.75 | Ignacio Laguna (LLNL), John Jacobson (UofU), Cayden Lund (UofU), Ganesh Gopalakrishnan (UofU)
FFTX: Next-Generation Open-Source Software for Fast Fourier Transforms

The FFTX project has two goals:
(1) The development of a performance-portable, open-source FFT software system for modern heterogeneous architectures (i.e., GPUs) to provide a baseline capability analogous to FFTW for CPU systems.
(2) The support of application-specific optimizations corresponding to integrating more of the algorithms into the analysis / code generation process.
Our approach is based on code generation using Spiral, an analysis and code generation tool chain for FFTs and tensor algebra algorithms; an FFTX user API implemented in standard C++; and a factored design that allows FFTX / Spiral to be more easily ported across multiple platforms.

In this tutorial we will describe the current progress of FFTX towards achieving goals (1) and (2). We will give an overview of the FFTX approach to obtaining performance portability; the current FFTX APIs for 1D and 3D FFTs on GPU systems, and how to use FFTX; and a discussion of the co-design process by which we are undertaking to achieve our second goal.

Thu, May 5 | Recording | 1.5 | Phillip Colella (LBNL), Franz Franchetti (CMU), Patrick Broderick, Peter McCorquodale (LBNL)
Kokkos EcoSystem New Features Tutorial

This Kokkos Tutorial will teach attendees to leverage important new capabilities in the Kokkos EcoSystem. Attendees are assumed to have a basic understanding of the Kokkos EcoSystem; specifically, this tutorial will not be an introduction to capabilities covered in the Kokkos lectures (https://kokkos.link/the-lectures).

First, this tutorial will go into more detail on how the Kokkos EcoSystem now supports more asynchronous programming paradigms via execution space instances, which leverage concepts such as CUDA and HIP streams and SYCL queues. Specifically, we will demonstrate overlapping multiple kernels as well as memory copy operations. We will also demonstrate integration of ExecutionSpace instances with KokkosKernels.

Furthermore, the KokkosKernels team will provide an introduction to using its math kernels from within TeamPolicy parallel loops. The tutorial will also cover the use of the batched linear algebra interface in KokkosKernels.

Thu, May 5 | 1.5 | Christian Trott (SNL), Damien Lebrun-Grandie (ORNL), Siva Rajamanickam (SNL), Luc Berger-Vergiat (SNL)
ExaGraph: Combinatorial Methods for Enabling Exascale Applications

We are working on several applications in the domain of combinatorial scientific computing, spanning infection modeling, numerical analysis, and multi-omics analytics. Most of these applications are challenging to scale efficiently (in both the strong and weak cases) due to the irregular nature of the computation required to process sparse inputs in distributed memory. Therefore, it is vital to identify the trade-offs that can be exposed so that a user can select an option that yields the desired performance at the expense of quality/accuracy. Also, a number of combinatorial applications are memory-access intensive, and scaling up to larger inputs may require developing new algorithms or heuristics.

In this panel, the ExaGraph project team members across the DOE labs (PNNL, SNL, and LBNL) are going to discuss cutting-edge software and algorithmic innovations in the area of parallel combinatorial scientific computing. The topics of discussion include distributed-memory graph algorithms on extreme-scale architectures, scalable graph partitioning, and designing efficient graph algorithms expressed via sparse linear algebra primitives.

Fri, May 6 | 1.5 | Sayan Ghosh (PNNL), Marco Minutoli (PNNL), Mahantesh M. Halappanavar (PNNL), Aydin Buluc (LBNL), Alex Pothen (PU), Erik G. Boman (SNL)
Updates to ADIOS 2.8: Storage and in situ I/O

As concurrency and complexity continue to increase on high-end machines, I/O performance is rapidly becoming a fundamental challenge to achieving exascale computing. To address this challenge, wider adoption of higher-level I/O abstractions will be critically important. The ADIOS I/O framework provides data models, portable APIs, storage abstractions, in situ infrastructure, and self-describing data containers. These capabilities enable reproducible science, allow for more effective data management throughout the data life cycle, and facilitate parallel I/O with high performance and scalability. In this tutorial, participants will learn about the ADIOS concept and APIs, and we will show a pipeline of simulation, analysis, and visualization using both storage I/O and in situ data staging. Participants will also learn how to use state-of-the-art compression techniques with ADIOS. We will introduce the new BP5 file format and show new functionalities for efficiently handling many steps in a single dataset.

Finally, we will discuss how to use ADIOS on the LCFs for the best storage performance, in situ data analysis, and code coupling. This is a short, 2-hour-long tutorial that aims to teach the basic concepts, demonstrate the capabilities, and provide pointers for interested parties to incorporate ADIOS into their science applications.

Fri, May 6 | Recording | 3.5 | Scott Klasky (ORNL), Norbert Podhorszki (ORNL)
Debugging and Performance Profiling for Frontier

This half-day tutorial will walk through practical examples of debugging and performance profiling MPI programs on Crusher, an OLCF computer available now with nodes identical to Frontier. The tutorial will cover the range of tools provided by both the HPE Cray Programming Environment and the AMD ROCm Platform. In particular, topics will include the following.

* Debugging
- Interpreting error messages.
- Finding errors using Abnormal Termination Processing and the Stack Trace Analysis Tool.
- Using runtime debug logs.
- Using an actual debugger on parallel GPU programs: gdb4hpc with rocgdb.
* Performance Profiling
- Building and running experiments with the Cray Performance Tool.
- Interpreting results with pat_report and Apprentice2.
- Profiling and tracing MPI programs with rocprof.
This tutorial is for ECP developers that will target Frontier. Attendees should already be familiar with parallelism on GPU-accelerated distributed-memory computers. The tutorial assumes a working knowledge of Linux and MPI.

Fri, May 6 | Recording | 3.5 | Trey White (HPE), Stephen Abbott (HPE), Constantinos Makrides (HPE)
Porting PETSc-based application codes to GPU-accelerated architectures

The Portable, Extensible Toolkit for Scientific Computation (PETSc) provides scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization. In this tutorial, we will demonstrate, through case studies, how to port PETSc application codes to perform the computations on GPUs and to understand the expected performance gains. We will present examples illustrating how users can employ their preferred GPU programming model for application “physics” code, independent of the GPU programming model used for PETSc solvers, and we will examine an end-to-end case study that combines finite element assembly (via the libCEED library) with PETSc algebraic solvers, all running on GPUs.

Fri, May 6 | Recording | 1.5 | Richard Mills (ANL), Mark F. Adams, Matthew Knepley (UB), Todd Munson, Barry Smith (Flatiron Institute), Junchao Zhang (ANL), Jed Brown (CU)
How To Leverage New MPI Features for Exascale Applications

To prepare for the upcoming Exascale supercomputers, a number of new features have been designed and implemented in MPI to better support Exascale. In this tutorial we will focus on partitioned communication, MPI Sessions, and Open MPI’s support for user-level threading. Partitioned communication is a new interface designed for efficient and performant data transfer when using multithreaded and accelerator-based communication. We will explain the new partitioned communication interface as it exists in MPI 4.0, present examples with the interface in use, present proposed and in-development changes to these designs, and share insights based on experience converting codes to use partitioned communication. Next, we will present MPI Sessions, a new initialization scheme that allows for multiple instances of MPI to be used concurrently in a single process. This feature enables scientific libraries to easily maintain data separation by initializing an independent ‘session’ of MPI. Additionally, we will present the recently added support for user-level threading (ULT) libraries, which significantly improves the interaction between Open MPI and threading runtimes that rely on cooperative multitasking. In this session we will also showcase a hybrid application that relies on this functionality for correctness and performance and give insight into future development in this area.

Fri, May 6 | Recording | 1.5 | Matthew Dosanjh (SNL), Howard Prichard, W. Pepper Marts, Jan Ciesko (SNL)
Using Containers to Accelerate HPC

Within just the past few years, the use of containers has revolutionized the way in which industries and enterprises have developed and deployed computational software and distributed systems. The containerization model has gained traction within the HPC community as well with the promise of improved reliability, reproducibility, portability, and levels of customization that were previously not possible on supercomputers. This adoption has been enabled by a number of HPC Container runtimes that have emerged including Singularity, Shifter, Enroot, Charliecloud and others.

This hands-on tutorial aims to train users on the use of containers on HPC resources. We will provide a background on Linux containers, along with introductory hands-on experience building a container image, sharing the container, and running it on an HPC cluster. Furthermore, the tutorial will provide more advanced information on how to run MPI-based and GPU-enabled HPC applications, how to optimize I/O-intensive workflows, and other best practices. Users will leave the tutorial with a solid foundational understanding of how to utilize containers with HPC resources through Shifter and Singularity, as well as in-depth knowledge of how to deploy custom containers on their own resources.

Fri, May 6 | Recording | 1.5 | Andrew Younge (SNL), Sameer Shende (UO), Shane Canon

2022 Annual Meeting Breakouts, BoFs, and Panels

Title | Date | Session Type | Materials (Slides and/or Recordings) | Session Duration (Hours) | Organizers/Speakers
Advances in Science and Engineering Enabled by ECP Applications I

Three ECP Application Development projects (GAMESS, Combustion-PELE, EXAALT) temporarily turn their focus away from code development and optimization to highlight recent scientific and engineering accomplishments in their respective domains. The motivating science drivers and the anticipated scientific impact of exascale computing on their application areas will be discussed.

Tue, May 3 | Breakout | Recording | 2 | Mark Gordon, Jackie Chen (SNL), Danny Perez (LANL)
CEED: High-Order Methods, Applications and Performance for Exascale

This session will present a progress update on the research and development activities in the CEED co-design center, targeting state-of-the-art high-order finite-element algorithms for high-order applications on GPU-accelerated platforms. We will discuss the GPU developments in several components of the CEED software stack, including the MFEM, Nek, libCEED and libParanumal projects, and report performance and capability improvements in several CEED-enabled miniapps and applications on both NVIDIA and AMD GPU systems. We are interested in getting feedback from the whole community, so please join us if you'd like to learn more about GPU-performant high-order methods, or have questions for the CEED team.

Tue, May 3 | Breakout | Recording | 2 | Tzanio Kolev (LLNL), Paul Fischer (UIUC), Tim Warburton (VT), Misun Min (ANL), Noel Chalmers (AMD), Damon McDougall (AMD), Malachi Phillips (SNL), Jed Brown (CU)
ECP Broader Engagement Initiative

This session will begin with remarks by Dr. Julie A. Carruthers, Senior Science and Technology Advisor, Office of the Principal Deputy Director and Acting Director for the Office of Scientific Workforce Diversity, Equity, and Inclusion, DOE, who will speak about advancing policies and practices for promoting diversity, equity, and inclusion at DOE. We will then introduce the Broader Engagement Initiative, which has the mission of establishing a sustainable plan to recruit and retain a diverse workforce in the DOE HPC community by fostering a supportive and inclusive culture within the computing sciences at DOE national laboratories. We will describe key activities within three complementary thrusts: establishing a Workforce Development and Retention Action Group, creating accessible ‘Intro to HPC’ training materials, and establishing the Sustainable Research Pathways for High-Performance Computing (SRP-HPC) internship/mentoring program. We are leveraging ECP’s unique multi-lab partnership to work toward sustainable collaboration across the DOE community, with the long-term goal of changing the culture and demographic profile of DOE computing sciences. The session will include interactive (and fun!) activities so that attendees can learn more about the initiative’s goals and how each person can contribute to the success of the program.

Breakout discussion leaders:

William Godoy
Mark C. Miller
Slaven Peles
Damian Rouson
Terry Turton

Tue, May 3 | Breakout | 2 | Julie Carruthers (DOE), Ashley Barker (ORNL), Lois Curfman McInnes (ANL), Suzanne Parete-Koon (ORNL), Sreeranjani "Jini" Ramprakash (ANL), Mary Ann Leung (SHI), Daniel Martin (LBNL)
Facility Deployment of E4S at ALCF, OLCF, and NERSC

The Extreme Scale Scientific Software Stack (E4S) is a collection of open-source software packages for developing and running scientific software on HPC systems. The E4S software collection is a subset of Spack packages that are built and tested regularly. The E4S software stack is available as a Spack configuration (spack.yaml), container images in Docker and Singularity image formats, a Spack buildcache, and an AWS AMI.

OLCF, ALCF, and NERSC have been deploying the E4S software stack on their HPC systems. OLCF and NERSC have public-facing documentation on how to use the E4S stack at https://docs.olcf.ornl.gov/software/e4s.html and https://docs.nersc.gov/applications/e4s/. Among the three facilities, we have deployed E4S on Spock, Summit, Crusher, Arcticus, Cori, and Perlmutter.

In this talk, we will present an update on the facility deployment process and how one goes about deploying the E4S stack at a facility. Finally, we will discuss best practices for deploying software stacks and the challenges of building a large software stack such as E4S.

Tue, May 3 | Breakout | 1 | Shahzeb Siddiqui (LBNL), Jamie Finney (ORNL), Matt Belhorn (ORNL), Frank Willmore (ANL)
Checkpointing with VELOC: Challenges, Use Cases, Integration

Efficient checkpoint-restart is a capability that many ECP applications need. However, they rely on custom solutions that are burdened by many issues: (1) performance and scalability of the underlying parallel file system where the checkpoints need to be saved in a reliable fashion; (2) resource-efficiency concerns (e.g., extra memory overhead) needed to achieve high performance and scalability; (3) complexity of interacting with a heterogeneous storage hierarchy on future Exascale machines (e.g., many vendor-specific external storage options like burst buffers with custom APIs) that leads to loss of productivity. The ECP VeloC project aims to solve these issues through specialized middleware that delivers high performance and scalability with minimal overhead on resource utilization, while exposing a simple API that hides the complexity of interacting with a heterogeneous storage layer. Several ECP applications supported by VeloC and various other groups have been experimenting with VeloC in various checkpoint-restart scenarios tailored to their specific needs. This breakout session aims to highlight the latest features available in VeloC and to share the experience of VeloC users with the ECP community, in the hope of raising awareness of what benefits can be expected from VeloC and what the best way to integrate with it is.
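For readers new to the pattern, the following is a minimal single-process sketch of checkpoint-restart, of the kind that custom application solutions often start from. It is not the VeloC API: the function names `save_checkpoint`, `load_checkpoint`, and `run` are hypothetical, and the sketch writes to a local file rather than to a parallel file system or burst buffer.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then atomically rename it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach storage before the rename
    os.replace(tmp_path, path)  # readers see either the old or the new checkpoint


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, stop_after=None):
    """Accumulate a running sum, checkpointing after every step."""
    state = load_checkpoint(path, {"step": 0, "value": 0})
    while state["step"] < total_steps:
        state["value"] += state["step"]
        state["step"] += 1
        save_checkpoint(path, state)
        if stop_after is not None and state["step"] == stop_after:
            break  # simulate a failure partway through the run
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "state.json")
        print(run(ckpt, total_steps=10, stop_after=4))  # interrupted run
        print(run(ckpt, total_steps=10))                # restart resumes at step 4
```

Even this small version shows where the issues listed above come from: every checkpoint is a synchronous write to storage, and the application itself has to decide where and how often to write.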

Tue, May 3 | Breakout | Recording | 1 | Bogdan Nicolae (ANL), Salman Habib (ANL), Murat Keceli (ANL), Keita Terianishi (SNL)
The Programming Environments Landscape at the ASCR Facilities

The three ASCR computing facilities (ALCF, NERSC, and OLCF) are in the process of fielding their next generation of systems (Aurora, Perlmutter, and Frontier). All three will be based on GPU accelerators from different vendors, and those vendors are offering different “preferred” programming environments. However, there are a number of efforts underway to broaden the availability of those environments and provide an increased degree of commonality across the facilities to facilitate application portability.

This session will provide an integrated overview of the programming environments expected to be available at each of the facilities to give users a better sense of the tools that will be available to them, how they will be provided, and any special considerations in using them across different platforms.

Tue, May 3 | Breakout | Recording | 1 | David E. Bernholdt (ORNL), Jack Deslippe (LBNL), Scott Parker (ANL)
Everything Proxy Apps: Platform/Application Readiness and The Future

Tue, May 3 | Breakout | Recording | 2 | Jeanine Cook (SNL), Vinay Ramakrishnaiah (LANL), Ramesh Balakrishnan, Saumil Patel (ANL), Jamaludin Mohd Yusof (LANL), Alvaro Mayagoitia (ANL)
Compression for Scientific Data for ECP applications (part 1)

Large-scale numerical simulations, observations, and experiments are generating very large datasets that are difficult to analyze, store, and transfer. Data compression is an attractive and efficient technique to significantly reduce the size of scientific datasets. The overall goal of the breakout session will be to present the technologies developed in ECP for compression of scientific data, to present success stories of lossy compression for ECP applications, and to discuss open questions and future directions. Specifically, this breakout session will review the state of the art in lossy compression of scientific datasets and discuss in detail the compression technologies developed in ECP: 1) the SZ, ZFP, and MGARD compressors and 2) the Libpressio unified compression API and the Z-checker and Foresight error analysis tools. The breakout session will also present ECP application use cases in different science domains (e.g., Cosmology, Crystallography, Quantum Chemistry, Molecular Dynamics, and more). A large part of the presentations and discussion will be devoted to lossy compression error quantification, analysis, and understanding. The breakout session will examine how lossy compressors affect scientific data and post hoc analysis. The presentations will be given by compression technology and application experts. The breakout session will encourage discussions and interactions between the speakers and the audience.
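As a small illustration of what error quantification for lossy compression involves, the sketch below applies uniform scalar quantization with a user-chosen absolute error bound and then measures the maximum error, the root-mean-square error, and the PSNR of the reconstruction. This is a deliberately simple stand-in, not the algorithm used by SZ, ZFP, or MGARD, and not the Libpressio or Z-checker API; all function names are hypothetical.

```python
import math


def quantize(values, error_bound):
    """Map each value to an integer code using bins of width 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def reconstruct(codes, error_bound):
    """Recover approximate values from the integer codes."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


def error_metrics(original, decoded):
    """Return (max absolute error, RMSE, PSNR in dB); assumes non-constant data."""
    errors = [abs(a - b) for a, b in zip(original, decoded)]
    max_err = max(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    value_range = max(original) - min(original)
    psnr = float("inf") if rmse == 0.0 else 20.0 * math.log10(value_range / rmse)
    return max_err, rmse, psnr


if __name__ == "__main__":
    # Hypothetical smooth 1D field standing in for simulation output.
    data = [math.sin(0.1 * i) for i in range(100)]
    bound = 1e-2
    codes = quantize(data, bound)
    decoded = reconstruct(codes, bound)
    max_err, rmse, psnr = error_metrics(data, decoded)
    print("distinct codes:", len(set(codes)), "of", len(data), "values")
    print("max error:", max_err, "rmse:", rmse, "psnr (dB):", psnr)
```

Because each value is rounded to the nearest multiple of twice the bound, the pointwise error stays within the bound up to floating-point rounding. Production compressors add prediction, transforms, and entropy coding on top of this idea to reach much higher compression ratios.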

Wed, May 4 | Breakout | Recording | 2 | Franck Cappello (ANL), Sheng Di (ANL), Robert Underwood (ANL), Peter Lindstrom (LLNL), Pascal Grosset (LANL), Ben Whitney, Chun Hong Yoon (SLAC), Danny Perez (LANL), Houjun Tang (LBNL), Qian Gong (ORNL), Katrin Heitmann (ANL)
Advances in Science and Engineering Enabled by ECP Applications II

Three ECP Application Development projects (WDMApp, ExaWind, ExaBiome) temporarily turn their focus away from code development and optimization to highlight recent scientific and engineering accomplishments in their respective domains. The motivating science drivers and the anticipated scientific impact of exascale computing on their application areas will be discussed.

Wed, May 4 | Breakout | 2 | Amitava Bhattacharjee (PNNL), Michael Sprague (NREL), Kathy Yelick (LBNL)
Preparing AMReX and Applications for Frontier and Aurora (part 1)

The goal of this breakout session is for the community of AMReX users and developers within ECP to discuss new features, plans for future development, and readiness for Frontier and Aurora. The session will start with a brief overview of the AMReX framework with an emphasis on new developments and integrations, followed by updates from several AMReX-supported application development projects within the ECP community. Status on the early access systems for Frontier and Aurora will be emphasized. The session will end with an informal discussion among participants, with the goal of identifying areas for future improvement in the AMReX framework, as well as areas for future collaboration and integration.

Wed, May 4 | Breakout | Recording | 2 | Andrew Myers (LBNL), Erik Palmer (MIT CCSE), Michael Zingale (SBU), Jean Sexton (LBNL), Axel Huebl (LBNL), Roberto Porcu (NETL), Akash Druv, Marc Day (LBNL), Jon Rood (NREL)
Compression for Scientific Data for ECP applications (part 2)

Large-scale numerical simulations, observations, and experiments are generating very large datasets that are difficult to analyze, store, and transfer. Data compression is an attractive and efficient technique to significantly reduce the size of scientific datasets. The overall goal of the breakout session will be to present the technologies developed in ECP for compression of scientific data, to present success stories of lossy compression for ECP applications, and to discuss open questions and future directions. Specifically, this breakout session will review the state of the art in lossy compression of scientific datasets and discuss in detail the compression technologies developed in ECP: 1) the SZ, ZFP, and MGARD compressors and 2) the Libpressio unified compression API and the Z-checker and Foresight error analysis tools. The breakout session will also present ECP application use cases in different science domains (e.g., Cosmology, Crystallography, Quantum Chemistry, Molecular Dynamics, and more). A large part of the presentations and discussion will be devoted to lossy compression error quantification, analysis, and understanding. The breakout session will examine how lossy compressors affect scientific data and post hoc analysis. The presentations will be given by compression technology and application experts. The breakout session will encourage discussions and interactions between the speakers and the audience.

Wed, May 4 | Breakout | Recording | 2 | Franck Cappello (ANL), Sheng Di (ANL), Robert Underwood (ANL), Peter Lindstrom (LLNL), Pascal Grosset (LANL), Ben Whitney, Jullie Bessac (ANL), Katrin Heitmann (ANL), Chun Hong Yoon (SLAC), Danny Perez (LANL), Houjun Tang (LBNL), Qian Gong (ORNL)
Performance Tools for Emerging Exascale Platforms: Status and Challenges

The DOE’s emerging GPU-accelerated exascale platforms will have the potential to compute at enormous rates. To realize that potential, application developers will need to identify and ameliorate scaling bottlenecks and inefficiencies in their codes. Development of three performance tools has been funded as part of the Exascale Computing Project: Exa-PAPI, HPCToolkit, and TAU. In this breakout, the project teams will provide an update about the current status of these tools on ECP testbeds and the challenges ahead for these tools to meet the needs of application developers trying to harness the power of exascale platforms.

Thu, May 5 | Breakout | Recording | 2 | John Mellor-Crummey (RU), Xiaozhu Meng (RU), Heike Jagode (UT), Anthony Danalis (UT), Sameer Shende (UO)
NESAP Success Stories with ECP Apps

The NERSC Exascale Science Applications Program (NESAP) has partnered with 10 ECP application teams as well as additional ECP Software Technology projects to help prepare those teams for deployment on Frontier and Aurora, utilizing the NERSC Perlmutter system as a waypoint. In this breakout session, we will hear from the leads of multiple NESAP-ECP collaborations on the outcomes of the readiness activities, the science and performance results being achieved on Perlmutter, and the activities the collaboration is pursuing that target Frontier and Aurora directly.

Thu, May 5 | Breakout | Recording | 1 | Ronnie Chatterjee, Muaaz Awan (NERSC), Brandon Cook (LBNL), Rahul Gayatri, Neil Mehta (LBNL), Jack Deslippe (LBNL)
Preparing AMReX and Applications for Frontier and Aurora (part 2)

The goal of this breakout session is for the community of AMReX users and developers within ECP to discuss new features, plans for future development, and readiness for Frontier and Aurora. The session will start with a brief overview of the AMReX framework with an emphasis on new developments and integrations, followed by updates from several AMReX-supported application development projects within the ECP community. Status on the early access systems for Frontier and Aurora will be emphasized. The session will end with an informal discussion among participants, with the goal of identifying areas for future improvement in the AMReX framework, as well as areas for future collaboration and integration.

Wed, May 4 | Breakout | Recording | 2 | Andrew Myers (LBNL), Erik Palmer (MIT CCSE), Michael Zingale (SBU), Jean Sexton (LBNL), Axel Huebl (LBNL), Roberto Porcu (NETL), Akash Druv, Marc Day (LBNL), Jon Rood (NREL)
Performance of Particle Applications on Early Exascale Hardware

The ECP Co-design Center for Particle Applications (CoPA) addresses the challenges for particle-based applications to run on upcoming exascale computing architectures. Several applications from within CoPA will give updates on their progress and challenges preparing for Frontier, with emphasis on performance results from the OLCF Crusher testbed.

Wed, May 4 | Breakout | Recording | 2 | Stan Moore, Adrian Pope (ANL), Peter McCorquodale (LBNL), Michael Wall (LANL), Samuel Reeve (ORNL), Aaron Scheinberg (Jubilee Development)
On-Ramp to ECP Software and Applications

This session will provide potential users of ECP software and applications an overview of obtaining, building and getting help with ECP packages. ST and AD leadership will describe their philosophy and approach to enabling wide use of ECP software and applications. After this, PIs from ST and AD projects will provide concrete examples of how their projects make it easy for those outside ECP to use their code. The session will end with a Q&A time for the audience to get more details and provide feedback.

Wed, May 4 | Breakout | Recording | 2 | Mike Heroux (SNL), Andrew Siegel (ANL), Pete Bradley (Raytheon Technologies)
The OpenMP Ecosystem in LLVM - Features, Functionality, and Flags

The LLVM/OpenMP environment has developed beyond a functional implementation of the OpenMP specification. Among the features introduced as part of ECP we find zero-overhead debugging capabilities (e.g., assertions), interoperability with CUDA code (device + host), link-time-optimizations (LTO), just-in-time (JIT) compilation, offloading to virtual and remote GPUs, and much more.

In this talk we will give an overview of the ecosystem, what is available, and how it is used. From past experience we know that many features often go unrecognized even though they would greatly benefit application developers and software technology teams alike. This talk will therefore not only introduce the existing capabilities but also give an outlook on what is to come, as well as ways for people to get in touch and stay informed.

Wed, May 4 | Breakout | Recording | 2 | Joseph Huber (ORNL), Johannes Doerfert (ANL), Shilei Tian (SBU), Michael Kruse, Giorgis Georgakoudis (LLNL)
Best Practices #somycodewillseethefuture with BSSw Fellows

Better Scientific Software does not happen without learning and implementing BETTER – Planning, Development, Performance, Reliability, Collaboration, and Skills. But how do we learn to be BETTER? The Better Scientific Software (BSSw) Fellowship program fosters and promotes practices, processes and tools to improve developer productivity and software sustainability of scientific codes. BSSw Fellows are chosen annually to develop a funded activity that promotes better scientific software, such as organizing a workshop, preparing a tutorial, or creating content to engage the scientific software community. Their expertise is diverse and they are often leaders in their own community. Learn from BSSw Fellows and discover how their work will help your scientific software project be BETTER.

Wed, May 4 | Breakout | Recording | 1 | Jeff Carver (UA), Amy Roberts (CU Denver)
SUNDIALS User Experiences

The goal of this breakout session is to help connect SUNDIALS software developers with stakeholders from ECP applications and software technology projects. The session will start with an overview of the SUNDIALS suite emphasizing new features. The session will continue with brief presentations from SUNDIALS application users, including Don Willcox of the AMReX team, Steve DeWitt on the ExaAM team, and Lucas Esclapez on the Pele team. The goals of this session will be:

1. Share successes and failures of SUNDIALS uses among applications
2. Provide an overview of SUNDIALS for ECP-AD projects interested in using the software
3. Discuss software technology collaborations and identify features needed for further ST collaborations

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-831964.

Wed, May 4 | Breakout | Recording | 2 | Carol S. Woodward (LLNL), David J. Gardner (LLNL), Stephen DeWitt (ORNL), Lucas Esclapez, Don Willcox
Applications Integration

In this breakout, members of the ECP application integration teams will provide an overview and update on applications activities at the facilities, along with short talks on engagements with select application development projects outlining some key technical achievements and lessons learned. This will be followed by a short discussion of topics focused on applications integration, in which the participants can share their experiences on topics related to preparing applications for the pre-exascale and exascale platforms.

Thu, May 5 | Breakout | Recording | 1 | Scott Parker (ANL), Balint Joo, Deborah Bard (NERSC), Stephen Nichols (ORNL), Christopher J. Knight (ANL), Venkat Vishwanath, Murali Emani, John Gounley (ORNL), Yasaman Ghadar (ANL), Rahul Gayatri, Neil Mehta (LBNL), Abhishek Bagusetty (ANL), Hannah Ross, Matt Norman (ORNL)
US Federal Agency's Exascale Applications and Software Needs

This session will feature leaders from US Federal Agencies detailing their exascale needs and discussing ideas about working with ECP to move forward. This will be an opportunity for ECP projects to connect with potential users and understand how ECP software and applications might be used more widely.

The session will be structured as 3-5 short talks by agency leaders, followed by a roundtable discussion with ECP leadership.

Wed, May 4 | Breakout | Recording | 2 | Mike Heroux (SNL), Andrew Siegel (ANL), David Martin (ANL), Suzy Tichenor (ORNL), Jeff Durachta (NOAA/GFDL), Piyush Mehrotra (NASA), Frances Hill (DOD), Emily Greenspan (CBIT)
Crusher Early User Experiences

Crusher, the Test and Development System, was made available to ECP teams at the start of 2022, prior to the first Crusher Training Workshop on January 13. Since then, we have had two Crusher Hackathons, with more being planned for the future. This BoF session will provide a way to share some of the earlier Application and ST project experiences with the wider ECP community by showcasing a mixture of ECP teams chosen to represent a variety of different programming models and languages. We envisage four talks of 10-12 minutes each, followed by brief Q&A, in the first hour and a brief panel discussion for 30 minutes at the end of the session, with the panel composed of the four presenters augmented with facilities and potentially Vendor Center of Excellence staff.

Wed, May 4 | BoF | Recording | 2 | Balint Joo
Updates and Roadmap for the SYCL Community

With an increased reliance on accelerators such as graphics processing units (GPUs), achieving performance portability across a diverse set of computing platforms is non-trivial. SYCL is an open, single-source, multi-vendor, multi-architecture, high-level programming model that builds on the underlying abstractions of OpenCL and embraces modern features of the C++17 standard. This BoF provides updates on the salient features of the SYCL 2020 specification and on cross-platform abstractions using SYCL for GPUs from NVIDIA, AMD, and Intel. The GPU backend specification, covering details about the mapping of the platform, execution, and memory models and about interoperability, will be discussed. Some potential proposals that can steer adaptation of the upcoming SYCL Next specifications will also be discussed.

Wed, May 4 | BoF | 1 | Thomas Applencourt (ANL), Nevin Liber (ANL), Kevin Harms (ANL), Brandon Cook (LBNL), David E. Bernholdt (ORNL), Abhishek Bagusetty (ANL)
Early Experiences Preparing for Aurora

As Argonne National Laboratory prepares for the arrival of Aurora, application and software developers are busy readying their codes to enable new scientific breakthroughs via advances in simulation, data, and learning capabilities. Throughout the development cycle, ALCF and Intel staff have been engaged with developers, assisting with a myriad of tasks that include porting code to one of several GPU programming models, debugging issues in software and hardware, and helping to understand observed performance on the available Intel GPU testbeds. After a high-level overview of the Aurora system, presenters will discuss a range of topics focused on their experiences, lessons learned, and best practices for preparing codes for Aurora, as well as plans moving forward.

Wed, May 4 | BoF | Recording | 2 | Christopher J. Knight (ANL), Yasaman Ghadar (ANL), Abhishek Bagusetty (ANL), Ye Luo (ANL), Timothy J. Williams
Early Experience of Application Developers With OpenMP Offloading

The next generation of supercomputers will consist of heterogeneous CPU+GPU systems from three different hardware vendors. To target these systems, OpenMP is one possible programming model that can be used to take advantage of the massive parallelism available on GPUs. OpenMP is a portable programming model which will be supported at all three DOE computing facilities, with the ability to run on GPUs from Intel, NVIDIA, and AMD. In this session, developers who are using OpenMP to target GPUs currently in their applications (spanning a variety of domains and including Fortran and C/C++) will give short talks about how they are using OpenMP, any lessons learned or best practices discovered, and any feedback they have for any of the current OpenMP implementations. From these case studies, attendees can learn which features of the OpenMP specification current developers are using, any specific strategies they employed, and best practices developed. We are also looking for feedback on important OpenMP 5.x features applications developers plan to use or are using on current implementations.
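
For readers unfamiliar with the model, the following is a minimal sketch of what OpenMP GPU offloading can look like in C++. It is not taken from any of the applications presented in this session; the function name `saxpy` and its arguments are hypothetical and chosen only for illustration. The `target teams distribute parallel for` directive and the `map` clauses are standard OpenMP constructs. When no device is available, or when the compiler ignores the pragma, the loop simply runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): computes y[i] = a * x[i] + y[i].
// The loop is offloaded to a GPU when the compiler and runtime support it,
// and falls back to host execution otherwise.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    float* yp = y.data();
    const std::size_t n = x.size();

    // map(to: ...)     copies x to the device before the loop.
    // map(tofrom: ...) copies y to the device and back to the host afterwards.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Calling `saxpy(3.0f, x, y)` with every element of `x` equal to 1 and every element of `y` equal to 2 leaves every element of `y` equal to 5, whether the loop runs on a device or on the host.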

Wed, May 4 | BoF | 2 | JaeHyuk Kwack (ANL), Rahulkumar Gayatri (NERSC), Mauro Del Ben (LBNL), Austin Harris (ORNL), Reuben Budiardja (ORNL), Xingqiu Yuan (ANL), Jean-Luc Fattebert (ORNL), Ye Luo (ANL), John R. Tramm (ANL), Vivek Kale (BNL), Yun He (LBNL), Christopher Daley (LBNL), Yasaman Ghadar (ANL), Stephen Nichols (ORNL), Melisa Alkan (ISU), Buu Pham (AL), Dossay Oryspayev (BNL), Swaroop S. Pophale (ORNL), Tosaporn Sattasathuchana (ISU), Peng Xu (AL/ISU), Veronica Melesse Vergara (ORNL), Swen Boehm (ORNL), Nick Hagerty, Philip Thomas, Colleen Bertoni (ANL), Brandon Cook (LBNL), Meifeng Lin (BNL), Timothy J. Williams (ANL)
BSSw Fellowship BoF

This Birds-of-a-Feather session provides an opportunity to learn more about how BSSw Fellowship recipients and Honorable Mentions from 2018 to present are impacting better scientific software and engaging with the ECP community.

The Better Scientific Software (BSSw) Fellowship Program gives recognition and funding to leaders and advocates of high-quality scientific software. Since the program's launch in 2018, BSSw Fellowship alums have formed a diverse community of leaders, mentors, and consultants who increase the visibility of those involved in scientific software production and sustainability in the pursuit of scientific discovery.

Wed, May 4 | BoF | Recording | 1 | Elsa Gonsiorowski (LLNL), Hai Ah Nam (LBNL)
Stochastic Applications in HPC: Challenges and Solutions

Stochastic HPC applications face unique algorithmic and performance challenges. Parallel random number generation, frequent branching, non-unit-stride memory access patterns, load imbalance between threads, and reproducibility are just a few of the issues that stochastic applications can face when running on modern HPC architectures like GPUs. In this cross-cutting session, GPU application teams from a variety of scientific fields will give short presentations detailing the challenges they face stemming from stochastic methods and the strategies they are using to overcome them. While many of the talks place a particular emphasis on GPU performance, some discussions will also cover more general issues faced by stochastic methods in HPC, such as stochastic multiphysics convergence and pseudorandom number generation algorithms. Applications discussed will include OpenMC, QMCPACK, HACC, Mercury, Imp, and Shift.
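
To make the reproducibility challenge concrete, the following is a minimal sketch of one common approach, a stateless, counter-based random draw. It is not claimed to be the method used by any of the applications listed above. Each draw is determined only by a seed, a particle index, and a per-particle draw counter, so the result does not depend on which thread handles which particle or in what order. The names `mix64`, `uniform01`, and `tally` are hypothetical, and the SplitMix64-style mixing function is used purely for illustration, not as a vetted production generator.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// SplitMix64-style mixing step: scrambles a 64-bit value.
inline std::uint64_t mix64(std::uint64_t z) {
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Hypothetical stateless draw: a double in [0, 1) that depends only on
// (seed, particle, counter), never on thread identity or execution order.
inline double uniform01(std::uint64_t seed, std::uint64_t particle,
                        std::uint64_t counter) {
    std::uint64_t h = mix64(seed);
    h = mix64(h ^ particle);
    h = mix64(h ^ counter);
    // Keep the top 53 bits and scale by 2^-53.
    return static_cast<double>(h >> 11) * (1.0 / 9007199254740992.0);
}

// Sums `draws` random numbers for each particle. Particles may be visited
// in forward or reverse order to mimic different schedules.
std::vector<double> tally(std::uint64_t seed, std::size_t n_particles,
                          std::uint64_t draws, bool reverse_order) {
    std::vector<double> out(n_particles, 0.0);
    for (std::size_t k = 0; k < n_particles; ++k) {
        const std::size_t p = reverse_order ? n_particles - 1 - k : k;
        for (std::uint64_t c = 0; c < draws; ++c) {
            out[p] += uniform01(seed, p, c);
        }
    }
    return out;
}
```

Because each particle's draws are a pure function of its index and counter, `tally(seed, n, draws, false)` and `tally(seed, n, draws, true)` return identical vectors. The same property lets a parallel run reproduce a serial one regardless of how particles are distributed across threads.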

Thu, May 5 | BoF | Recording | 2 | John R. Tramm (ANL), Ye Luo (ANL), Salman Habib (ANL), Michael McKinley (LLNL), Steven Hamilton (ORNL), Nick Frontiere (ANL)
NNSA/ATDM Application BoF

Thu, May 5 | BoF | 1.5 | Marianne Francois (LANL), Robert Rieben (LLNL), Chris Malone (LANL), Curtis Ober (SNL)
Panel on Sustainability of ECP Software and Applications

DOE’s expectation is that ECP software and applications will be used long past the end of the ECP project. To ensure this, ECP leadership is putting mechanisms in place to continue support, maintenance and updates for the long term. In this session, ECP leadership will share their philosophy toward long-term sustainability and discuss concrete mechanisms to ensure longevity. In addition, members of the NITRD High End Computing Interagency Working Group will share the group's analysis of sustainability work across a number of federal agencies. The session will be arranged as a panel with short talks by each of the speakers, followed by a moderated discussion with the speakers and audience.

Wed, May 4 | Panel | Recording | 2 | Andrew Siegel (ANL), Mike Heroux (SNL), Hal Finkel (DOE), David Kepczynski (GE)
Application Experiences with Online Data Analysis and Reduction

This panel will present and discuss practical experiences with online data analysis and reduction methods from the perspectives of a range of ECP applications. Participants will hear about varied application motivations for online data analysis and reduction, the technologies that are being used to couple application components and to perform data reduction, and results achieved in different settings.

Wed, May 4 | Panel | Recording | 1 | CS Chang (PPPL), Ian Foster (UC), Scott Klasky (ORNL), Axel Huebl (LBNL), Robert Jacob (ANL), Arvind Ramanathan (ANL)
Panel on Revisiting Predictions from the IESP (and other Exascale) Workshops

DOE is on the verge of deploying its exascale systems, so what better time than now to revisit our exascale predictions from a decade ago? In this panel, prominent contributors to the early exascale reports will discuss their prediction hits, misses, and omissions. Our panelists will also venture their predictions for 2032. Audience questions are encouraged.

Tue, May 3 | Panel | 1.25 | Jeffrey Vetter (ORNL), Jack Dongarra (UT), Pete Beckman (ANL), Kathy Yelick (LBNL), Bob Lucas (Livermore Software Technology)
What Can Be Learned from Applying Team of Teams Principles to the ECP projects PETSc, Trilinos, xSDK, and E4S?

The ECP core mission is to develop a capable exascale computing ecosystem that accelerates scientific discovery and supports addressing critical challenges in energy, earth systems, materials, data, and national security. The very nature of this mission has drawn a wide range of talented and successful scientists with diverse backgrounds to work together in new ways toward this goal. In this breakout/panel session, we build on lessons learned from the “Team of Teams” and “Fostering a Culture of Passion and Productivity in ECP Teams” breakout sessions presented at past ECP Annual Meetings, as well as the Collegeville 21 whitepapers “The Community is the Infrastructure” and “Challenges of and Opportunities for a Large Diverse Software Team,” to discuss the experiences of the PETSc, Trilinos, xSDK, and E4S communities as framed by the construct “Team of Teams.” We consider how, why, and when each of these teams may or may not function as a Team of Teams and where the Team of Teams principles might benefit the projects. We present strategies centered around developing engaged and productive virtual software teams and offer a deeper dive into these communities. We explore how developing a capable exascale ecosystem depends on meeting technical, social, and cultural challenges.

Thu, May 5 | Panel | Recording | 2 | Reed Milewicz (SNL), David Moulton (LANL), Miranda Mundt (SNL), Todd Munson, Elaine Raybourn (SNL), Benjamin Sims (LANL), Jim Willenbring (SNL), Greg Watson (ORNL), Ulrike Yang (LLNL)