The National Nuclear Security Administration (NNSA) supports the development of open-source software technologies that are important to the success of national security applications and are externally impactful for the rest of the Exascale Computing Project (ECP) and the broader community. These software technologies are managed as part of a larger Advanced Simulation and Computing (ASC) portfolio, which provides resources to develop and apply these technologies to issues important to national security. The software technologies at Lawrence Livermore National Laboratory (LLNL) span programming models and runtimes (RAJA/Umpire/CHAI), development tools (Debugging @ Scale), mathematical libraries (MFEM), productivity technologies (DevRAMP), and workflow scheduling (Flux/Power).
The RAJA team provides software libraries that enable application and library developers to meet advanced architecture portability challenges. The project goals are to enable developers to write performance portable computational kernels and to coordinate complex heterogeneous memory resources among components in large integrated applications. The project provides three complementary and interoperable libraries: RAJA provides software abstractions that enable C++ developers to write performance portable numerical kernels, Umpire is a portable memory resource management library, and CHAI contains C++ “managed array” abstractions that enable transparent, on-demand data migration.
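As a simple illustration of how these libraries compose, the sketch below (the kernel and variable names are illustrative and not drawn from a specific application) applies a RAJA execution policy to a loop over a CHAI managed array; swapping the policy retargets the kernel without changing the loop body, while CHAI migrates the data on demand.

```cpp
#include "RAJA/RAJA.hpp"
#include "chai/ManagedArray.hpp"

// Illustrative kernel: scale a CHAI-managed array with a RAJA loop.
// RAJA::seq_exec runs on the host; an OpenMP or GPU policy could be
// substituted to retarget the kernel without rewriting the loop body.
void scale(chai::ManagedArray<double> x, double alpha, int n)
{
  RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, n),
    [=](int i) {
      x[i] *= alpha;  // CHAI moves the data to the active execution space as needed
    });
}
```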
Flux, LLNL’s next-generation resource manager, is a key enabler for many science workflows. Flux provides key scheduling capabilities for complex application workflows, such as MuMMi, which is used in cancer research; uncertainty quantification; Merlin, which is used for large-scale machine learning; recent COVID-19 drug design workflows; ECP ExaAM; and others. Flux is also a critical technology that enables the Rabbit I/O technology planned for El Capitan. Traditional resource managers, such as SLURM, lack the required scalability and flexible resource model.
The Debugging @ Scale project provides an advanced debugging, code-correctness, and testing tool set for exascale. The current capabilities include STAT, a highly scalable lightweight debugging tool; Archer, a low-overhead OpenMP data race detector; ReMPI/NINJA, a scalable record-and-replay tool and smart noise injector for the Message Passing Interface (MPI); and FLiT/FPChecker, a tool suite for checking floating-point correctness.
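To illustrate the class of defect Archer targets, the short example below (written for this document, not drawn from an LLNL code) contains a textbook OpenMP data race: multiple threads update a shared variable without synchronization.

```cpp
#include <cstdio>

// When compiled with OpenMP and run under a data race detector such as
// Archer, the unsynchronized update to `sum` is reported as a race.
int main()
{
  double sum = 0.0;
  #pragma omp parallel for
  for (int i = 0; i < 1000; ++i) {
    sum += i;  // data race: unsynchronized read-modify-write on a shared variable
  }
  std::printf("sum = %f\n", sum);
  return 0;
}
```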
The MFEM library is focused on providing high-performance mathematical algorithms and finite element discretizations to next-generation, high-order applications. This effort includes the development of physics enhancements in the finite element algorithms in MFEM and the MFEM-based BLAST Arbitrary Lagrangian-Eulerian code to support ASC mission applications, as well as the development of unique unstructured adaptive mesh refinement algorithms that focus on generality, parallel scalability, and ease of integration into unstructured mesh applications.
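As a rough sketch of the high-order, unstructured-mesh workflow described above (the mesh file name and polynomial order are placeholders), an MFEM-based code typically loads a mesh, refines it, and builds a high-order finite element space on it:

```cpp
#include <iostream>
#include "mfem.hpp"

int main()
{
  // Placeholder mesh file name; MFEM reads its own mesh format as well as several others.
  mfem::Mesh mesh("beam.mesh");
  mesh.UniformRefinement();  // MFEM also supports nonconforming adaptive refinement

  const int order = 3;  // high-order elements
  mfem::H1_FECollection fec(order, mesh.Dimension());
  mfem::FiniteElementSpace fespace(&mesh, &fec);
  std::cout << "Number of unknowns: " << fespace.GetTrueVSize() << std::endl;
  return 0;
}
```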
The DevRAMP team is creating tools and services that multiply the productivity of developers through automation. The capabilities include Spack, a package manager for high-performance systems that automates the process of downloading, building, and installing different versions of software packages and their dependencies, and Sonar, a software stack for performance monitoring and analysis that enables developers to understand how high-performance computers and applications interact. To deal with the complexity of packaging software for accelerated architectures, the Spack team has focused on enhancing robustness through testing and has completely reworked the concretizer, the NP-complete dependency solver at the core of Spack. The new concretizer is based on answer set programming, which allows Spack to solve complex systems of first-order logic constraints to optimize users’ build configurations. Spack is the foundation of the ECP’s Extreme-Scale Scientific Software Stack (E4S) and the delivery mechanism for all software in the ECP.
Flux is a next-generation workload management framework for HPC. It maximizes scientific throughput by scheduling scientific workloads as HPC users request them.
The workload manager is responsible for efficiently delivering the compute cycles of HPC systems to multiple users while managing diverse resource types—e.g., compute racks and nodes, central and graphics processing units (CPUs and GPUs), and multi-tiered disk storage.
Two technical trends are making even the best-in-class products increasingly ineffective on exascale computing systems. The first trend is the evolution of workloads for HPC. With the convergence of conventional HPC with new simulation, data analysis, machine learning (ML), and artificial intelligence (AI) approaches, researchers are ushering in new scientific discoveries. But this evolution also produces computing workflows—often comprising many distinct tasks interacting with one another—that are far more complex than traditional products can effectively manage. Second, hardware vendors have steadily introduced new resource types and constraints into HPC systems. Multi-tiered disk storage, CPUs and GPUs, power efficiency advancements, and other hardware components have gained traction in an era in which no single configuration reigns. Many HPC architectures push the frontiers of compute power with hybrid (or heterogeneous) combinations of processors. To realize a system’s full potential, the workload management software must manage and schedule extremely heterogeneous computing resources while accounting for the relationships among them.
The need to schedule work for modern workflows motivated the Exascale Computing Project (ECP) Flux project to create a highly scalable scheduling solution that supports high-performance communication and coordination across hundreds of thousands of jobs, which could not be accomplished with traditional HPC schedulers. Nearly all existing scheduling systems were designed when workflows were much simpler. These gaps have led users to develop their own ad hoc custom scheduling and resource management software or to use tools that perform only workflow management or only scheduling.
Flux manages a massive number of processors, memory, GPUs, and other computing system resources—a key requirement for exascale computing and beyond—using its highly scalable, fully hierarchical scheduling and graph-based resource modeling approach.
Thus, Flux provides first-class support for job coordination and workload management, which avoids the legacy issue of groups individually developing and maintaining ad hoc management software. With Flux, a job script with multiple tasks submitted on a heterogeneous HPC system remains simple, requiring only a few more lines within the script.
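Workflows can also drive Flux programmatically. The sketch below assumes the flux-core C API (flux_open, flux_job_submit) and elides the jobspec contents; it connects to the enclosing Flux instance and submits a single job.

```cpp
#include <cstdio>
#include <cstdint>
#include <flux/core.h>

int main()
{
  flux_t *h = flux_open(NULL, 0);  // connect to the local Flux broker
  if (!h) {
    std::perror("flux_open");
    return 1;
  }

  const char *jobspec = "...";  // jobspec JSON elided; normally generated by a workflow tool
  flux_future_t *f = flux_job_submit(h, jobspec, FLUX_JOB_URGENCY_DEFAULT, 0);

  flux_jobid_t id;
  if (f && flux_job_submit_get_id(f, &id) == 0)
    std::printf("submitted job %ju\n", (uintmax_t)id);

  if (f)
    flux_future_destroy(f);
  flux_close(h);
  return 0;
}
```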
Flux was the basis for an ECP ExaAM workflow (ExaConstit) that improved throughput by 4×. The Flux-based ML drug design workflow was part of an SC20 COVID-19 Gordon Bell finalist submission.
See https://flux-framework.org/ for more examples of the impact of this flexible resource management framework.
The ubiquity of heterogeneous architectures running at scale ensures a continued user base and an active user community for Flux. Flux is also part of the ExaWorks SDK, which brings together four seed workflow technologies, specifically Flux, Parsl, RADICAL, and Swift/T.
Umpire is a portable library for memory resource management. It provides a unified, high-level API in C++, C, and Fortran for resource discovery, memory provisioning, allocation, transformation, and introspection.
The Exascale Computing Project (ECP) developed Umpire to target the porting issues faced by legacy codes. Other projects that address this porting issue include the ECP’s RAJA and Copy Hiding Application Interface (CHAI) projects. Where other performance portability frameworks may require a larger up-front investment in data structures and code restructuring, Umpire is noninvasive and allows codes to separately adopt strategies for loop parallelism, data layout tuning, and memory management. Legacy applications need not adopt all three at once; they can gradually integrate each framework at their own pace and with a minimal set of code modifications.
Umpire leverages the abstraction mechanisms available in modern C++ (C++11 and higher) compilers, such as lambdas, policy templates, and constructor/destructor patterns (e.g., resource acquisition is initialization) for resource management. The objective is to provide performance portability at the library level without special support from compilers.
Targeting this level of the software stack gives US Department of Energy developers the flexibility to leverage standard parallel programming models, such as CUDA and OpenMP, without strictly depending on robust compiler support for these APIs. If the necessary features are unavailable in compilers, then library authors need not wait for these programming models to be fully implemented. These libraries allow applications to work correctly and to perform well even if some functionality from OpenMP, CUDA, the threading model, and other models is missing.
This capability is possible because a flexible allocation process backs the Umpire user interface: the interface is the same regardless of which resource is housing the memory. Key operations are supported; for example, allocations can be introspected, data can be modified and moved between Umpire allocators through an abstract operations interface, and custom allocation algorithms can be applied with code-specific strategies.
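A minimal sketch of this interface, assuming an Umpire build with a GPU-backed "DEVICE" resource, is shown below: the same Allocator API serves host and device memory, the operations interface copies data between them, and the introspection API reports the size of an allocation.

```cpp
#include <cstddef>
#include <iostream>
#include "umpire/ResourceManager.hpp"
#include "umpire/Allocator.hpp"

int main()
{
  constexpr std::size_t N = 1024;
  auto& rm = umpire::ResourceManager::getInstance();

  // "HOST" is always available; "DEVICE" exists only in GPU-enabled builds.
  umpire::Allocator host   = rm.getAllocator("HOST");
  umpire::Allocator device = rm.getAllocator("DEVICE");

  double* h_data = static_cast<double*>(host.allocate(N * sizeof(double)));
  double* d_data = static_cast<double*>(device.allocate(N * sizeof(double)));

  // The operations interface moves data between resources without the caller
  // naming a vendor API such as cudaMemcpy.
  rm.copy(d_data, h_data);

  // Introspection: query the size of a registered allocation.
  std::cout << rm.getSize(d_data) << " bytes allocated on the device\n";

  device.deallocate(d_data);
  host.deallocate(h_data);
  return 0;
}
```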
Exploiting the parallelism of the underlying hardware is essential for achieving high performance on today’s platforms.
Umpire is open source, and user guides and tutorials are available.