The National Nuclear Security Administration (NNSA) supports the development of open-source software technologies that are important to the success of national security applications and are externally impactful for the rest of the Exascale Computing Project (ECP) and the broader community. These software technologies are managed as part of a larger Advanced Simulation and Computing (ASC) portfolio, which provides resources to develop and apply these technologies to issues important to national security. The software technologies at Lawrence Livermore National Laboratory (LLNL) span programming models and runtimes (RAJA/Umpire/CHAI), development tools (Debugging @ Scale), mathematical libraries (MFEM), productivity technologies (DevRAMP), and workflow scheduling (Flux/Power).
The RAJA team provides software libraries that enable application and library developers to meet advanced architecture portability challenges. The project’s goals are to enable developers to write performance portable computational kernels and to coordinate complex heterogeneous memory resources among the components of large integrated applications. The project provides three complementary and interoperable libraries: RAJA supplies software abstractions that enable C++ developers to write performance portable numerical kernels, Umpire is a portable memory resource management library, and CHAI contains C++ “managed array” abstractions that enable transparent, on-demand data migration.
Flux, LLNL’s next-generation resource manager, is a key enabler for many science workflows. Flux provides key scheduling capabilities for complex application workflows, including MuMMI, which is used in cancer research; uncertainty quantification pipelines; Merlin, which is used for large-scale machine learning; recent COVID-19 drug design workflows; the ECP ExaAM project; and others. Flux is also a critical technology behind the Rabbit I/O technology planned for El Capitan. Traditional resource managers, such as SLURM, lack the scalability and flexible resource model that these workflows require.
The Debugging @ Scale project provides an advanced debugging, code-correctness, and testing tool set for exascale. Its current capabilities include STAT, a highly scalable lightweight debugging tool; Archer, a low-overhead OpenMP data race detector; ReMPI/NINJA, a scalable record-and-replay tool and smart noise injector for the Message Passing Interface (MPI); and FLiT/FPChecker, a tool suite for checking floating-point correctness.
The MFEM library is focused on providing high-performance mathematical algorithms and finite element discretizations to next-generation, high-order applications. This effort includes the development of physics enhancements in the finite element algorithms in MFEM and in the MFEM-based BLAST Arbitrary Lagrangian-Eulerian code to support ASC mission applications, as well as the development of unique unstructured adaptive mesh refinement algorithms that focus on generality, parallel scalability, and ease of integration into unstructured mesh applications.
The DevRAMP team is creating tools and services that multiply developer productivity through automation. Its capabilities include Spack, a package manager for high-performance systems that automates downloading, building, and installing different versions of software packages and their dependencies, and Sonar, a software stack for performance monitoring and analysis that enables developers to understand how high-performance computers and applications interact. To deal with the complexity of packaging software for accelerated architectures, the Spack team has focused on enhancing robustness through testing and has completely reworked the concretizer, the NP-complete dependency solver at the core of Spack. The new concretizer is based on answer set programming, which allows Spack to solve complex systems of first-order logic constraints to optimize users’ build configurations. Spack is the foundation of the ECP’s Extreme-Scale Scientific Software Stack (E4S) and the delivery mechanism for all software in the ECP.