Many application codes rely on high-performance mathematical libraries to solve the systems of equations generated during their simulations. Because these solvers often dominate the computation time of such simulations, the libraries must be efficient and scalable on upcoming complex exascale hardware architectures if the application codes are to perform well. The Portable, Extensible Toolkit for Scientific Computation/Toolkit for Advanced Optimization (PETSc/TAO) project delivers efficient mathematical libraries to application developers for sparse linear and nonlinear systems of equations, time integration, and parallel discretization. It also provides libEnsemble to manage the running of large collections of related simulations needed for numerical optimization, sensitivity analysis, and uncertainty quantification (the so-called "outer loop").
The Portable, Extensible Toolkit for Scientific Computation (PETSc) reflects a long-term investment in software infrastructure for the scientific community. The toolkit provides scalable solvers for nonlinear time-dependent differential and algebraic equations, as well as for numerical optimization. PETSc is also referred to as PETSc/TAO because it contains the TAO (Toolkit for Advanced Optimization) software library.
The overall strategy for accelerator support in PETSc/TAO is based on separation of concerns, a powerful computer science design principle that lets PETSc users who program in C/C++, Fortran, or Python employ their preferred GPU programming model, such as Kokkos, RAJA, SYCL, HIP, CUDA, or OpenCL. Supporting GPU devices required innovative solutions, as discussed in the paper "Toward Performance-Portable PETSc for GPU-based Exascale Systems".
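To make the separation-of-concerns idea concrete, here is a minimal Python sketch. It is not PETSc code (PETSc implements this idea in C through its Vec and Mat object types). The solver is written only against an abstract set of vector operations, and each storage backend supplies its own implementation of those operations. The names `VectorBackend`, `ListBackend`, `ArrayBackend`, and `richardson` are hypothetical. The standard-library `array` backend merely stands in for a GPU programming model such as Kokkos or CUDA.

```python
import math
from array import array
from typing import Protocol


class VectorBackend(Protocol):
    """The only vector operations the solver is allowed to use (hypothetical interface)."""

    def axpy(self, alpha: float, x, y): ...  # returns alpha * x + y

    def dot(self, x, y) -> float: ...


class ListBackend:
    """Reference backend that stores vectors as Python lists."""

    def axpy(self, alpha, x, y):
        return [alpha * xi + yi for xi, yi in zip(x, y)]

    def dot(self, x, y):
        return sum(xi * yi for xi, yi in zip(x, y))


class ArrayBackend:
    """Second backend using array('d'); a stand-in for a device-specific backend."""

    def axpy(self, alpha, x, y):
        return array("d", (alpha * xi + yi for xi, yi in zip(x, y)))

    def dot(self, x, y):
        return float(sum(xi * yi for xi, yi in zip(x, y)))


def richardson(backend: VectorBackend, matvec, b, x0, omega: float, iters: int):
    """Richardson iteration x <- x + omega (b - A x), using only backend operations."""
    x = x0
    for _ in range(iters):
        r = backend.axpy(-1.0, matvec(x), b)  # r = b - A x
        x = backend.axpy(omega, r, x)  # x = x + omega * r
    r = backend.axpy(-1.0, matvec(x), b)  # final residual
    return x, math.sqrt(backend.dot(r, r))
```

Swapping backends changes only the object passed in. The Richardson loop itself is untouched, and this is the property that lets a library add support for a new programming model without rewriting its algorithms.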
The ECP PETSc/TAO project added or augmented a plethora of capabilities in the toolkit. Work included:
Within the ECP, the PETSc/TAO team has been working with applications such as Chombo-Crunch, which addresses carbon sequestration, and the whole device model application (WDMApp) for fusion reactors. The US Department of Energy identified whole device modeling (WDM) as a priority for "assessments of reactor performance in order to minimize risk and qualify operating scenarios for next-step burning plasma experiments".
The production-quality PETSc/TAO toolkit illustrates a time-proven model for connecting users and developers. Together they have added capabilities and adapted the software to the unforeseen and radically different system architectures installed in datacenters over a period of decades. To keep the software current, the project issues releases twice a year.
PETSc/TAO is an exemplary case of good software practices that is worthy of study. The effort has successfully navigated the HPC landscape to earn both funding and extensive community support over many generations of supercomputers. All three (good software practices, funding, and community support) are essential and should serve as an object lesson for any aspiring software effort.
Algebraic solvers form the core computation of many scientific simulations. These are generally nonlinear solvers that use sparse linear solvers via Newton's method. Integrators are likewise central. PETSc/TAO is a scalable mathematical library that runs portably on everything from laptops to existing high-performance machines. The PETSc/TAO project is extending and enhancing the library to ensure that it performs well on exascale architectures. It is also delivering the libEnsemble tool to manage collections of related simulations for outer-loop methods, and working with exascale application developers to satisfy their solver needs.
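The structure of a nonlinear solve built on a sparse linear solver can be sketched in a few dozen lines of Python. This is an illustrative assumption, not PETSc code. In PETSc the corresponding components are SNES (nonlinear solvers) and KSP (Krylov linear solvers). The sketch solves a hypothetical 1D model problem: a finite-difference discretization of -u'' + u^3 = f with zero boundary values. Each Jacobian is stored in compressed sparse row (CSR) form, and each Newton correction is solved with unpreconditioned conjugate gradients.

```python
import math


def csr_matvec(indptr, indices, data, x):
    """Return y = A x for a matrix stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def cg(matvec, b, tol=1e-12, maxiter=500):
    """Unpreconditioned conjugate gradients for a symmetric positive definite system."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)
    rr = sum(ri * ri for ri in r)
    for _ in range(maxiter):
        if math.sqrt(rr) < tol:
            break
        Ap = matvec(p)
        alpha = rr / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = sum(ri * ri for ri in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def residual(u, f, h):
    """F(u) = A u + h^2 (u^3 - f), with A = tridiag(-1, 2, -1) and zero boundary values."""
    n = len(u)
    F = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        F.append(-left + 2.0 * u[i] - right + h * h * (u[i] ** 3 - f[i]))
    return F


def jacobian_csr(u, h):
    """CSR arrays for J(u) = A + diag(3 h^2 u_i^2), which is symmetric positive definite."""
    n = len(u)
    indptr, indices, data = [0], [], []
    for i in range(n):
        if i > 0:
            indices.append(i - 1)
            data.append(-1.0)
        indices.append(i)
        data.append(2.0 + 3.0 * h * h * u[i] ** 2)
        if i < n - 1:
            indices.append(i + 1)
            data.append(-1.0)
        indptr.append(len(indices))
    return indptr, indices, data


def newton(f, h, tol=1e-10, maxit=20):
    """Newton's method: solve J(u) du = -F(u) with CG, then update u += du."""
    u = [0.0] * len(f)
    for it in range(maxit):
        F = residual(u, f, h)
        norm = math.sqrt(sum(Fi * Fi for Fi in F))
        if norm < tol:
            return u, it, norm
        J = jacobian_csr(u, h)
        du = cg(lambda v: csr_matvec(*J, v), [-Fi for Fi in F])
        u = [ui + dui for ui, dui in zip(u, du)]
    F = residual(u, f, h)
    return u, maxit, math.sqrt(sum(Fi * Fi for Fi in F))
```

Every Newton step costs one sparse linear solve. That is why the linear solver usually dominates run time, and why its scalability largely determines the scalability of the whole simulation.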
There are no scalable "black box" sparse solvers or integrators that work for all applications, nor single implementations that work well across all problem sizes. Hence, algebraic solver libraries provide a wide variety of algorithms and implementations that can be customized for the application and range of problem sizes at hand. The PETSc/TAO team is currently focusing on enhancing the PETSc/TAO library to include scalable solvers that efficiently use many-core and GPU-based systems. This work includes adding support for the range of GPUs that will be deployed and for the Kokkos performance portability layer, optimizing the team's GPU-aware communications, implementing data structure optimizations to better use many-core and GPU-based systems, and developing algorithms that scale to higher concurrency and, ultimately, to exascale.
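One way a library can offer many algorithms is to select them by name at run time instead of hard-coding a single one. PETSc does this through its options database (for example, the `-ksp_type` and `-pc_type` options). The Python sketch below only imitates that idea. It uses a hypothetical registry, a hypothetical `solver_type` option key, and two classic stationary methods (Jacobi and Gauss-Seidel) on a small dense matrix.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]
Vector = List[float]
Solver = Callable[[Matrix, Vector, float, int], Vector]

SOLVERS: Dict[str, Solver] = {}  # hypothetical registry: name -> solver function


def register(name: str):
    """Decorator that adds a solver to the registry under a string key."""
    def wrap(fn: Solver) -> Solver:
        SOLVERS[name] = fn
        return fn
    return wrap


def residual_norm(A: Matrix, x: Vector, b: Vector) -> float:
    """Max-norm of b - A x."""
    return max(abs(bi - sum(aij * xj for aij, xj in zip(row, x))) for row, bi in zip(A, b))


@register("jacobi")
def jacobi(A, b, tol, maxit):
    x = [0.0] * len(b)
    for _ in range(maxit):
        # Every component is updated from the previous iterate.
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(len(b)) if j != i)) / A[i][i]
             for i in range(len(b))]
        if residual_norm(A, x, b) < tol:
            break
    return x


@register("gauss_seidel")
def gauss_seidel(A, b, tol, maxit):
    x = [0.0] * len(b)
    for _ in range(maxit):
        # Components are updated in place, so later rows see new values.
        for i in range(len(b)):
            s = sum(A[i][j] * x[j] for j in range(len(b)) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        if residual_norm(A, x, b) < tol:
            break
    return x


def solve(A: Matrix, b: Vector, options: dict) -> Vector:
    """Pick the algorithm from run-time options rather than hard-coding it."""
    method = options.get("solver_type", "gauss_seidel")
    if method not in SOLVERS:
        raise ValueError(f"unknown solver_type {method!r}; choose from {sorted(SOLVERS)}")
    return SOLVERS[method](A, b, options.get("tol", 1e-10), options.get("maxit", 1000))
```

Because the application never names a specific algorithm in its code, a user facing a new problem or problem size can try a different method by changing an option rather than editing and recompiling the application.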
The availability of systems with over 100 times the processing power of today's machines compels the use of these systems not just for a single simulation but within a tight outer loop of numerical optimization, sensitivity analysis, and uncertainty quantification. This requires a scalable library to manage a dynamic, hierarchical collection of running, possibly interacting, scalable simulations. The libEnsemble library directs such multiple concurrent simulations. In this area, the team is focused on developing libEnsemble, integrating it with the PETSc/TAO library, and extending PETSc/TAO to include new algorithms capable of using libEnsemble.
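The outer-loop pattern can be sketched as a manager that performs three steps repeatedly. It asks a generator for a batch of inputs, runs the corresponding simulations concurrently, and feeds the results back to the generator. The generator/simulator split is loosely modeled on libEnsemble's design, but every name here is hypothetical and none of it is libEnsemble's API. Threads stand in for what would really be separate simulation jobs, and a toy quadratic stands in for an expensive simulation. The generator uses a simple shrinking-grid search around the best point found so far.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def simulation(x: float) -> float:
    """Hypothetical stand-in for an expensive simulation returning an objective value."""
    return (x - 2.3) ** 2


def generator(history: List[Tuple[float, float]], batch: int, step: float) -> List[float]:
    """Propose a batch of inputs centered on the best point seen so far."""
    if not history:
        return [float(i) for i in range(batch)]  # initial coarse sweep
    best_x, _ = min(history, key=lambda h: h[1])
    half = batch // 2
    return [best_x + step * (k - half) for k in range(batch)]


def manager(rounds: int = 8, batch: int = 5, workers: int = 4):
    """Alternate between generating inputs and running simulations concurrently."""
    history: List[Tuple[float, float]] = []
    step = 1.0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            xs = generator(history, batch, step)
            ys = list(pool.map(simulation, xs))  # run the batch concurrently, order preserved
            history.extend(zip(xs, ys))  # results flow back to the generator
            step /= 2.0  # refine the search as the ensemble proceeds
    return min(history, key=lambda h: h[1]), len(history)
```

Keeping the generator ignorant of how simulations are launched, and the launcher ignorant of the optimization strategy, is what allows a real ensemble manager to swap in different outer-loop algorithms or execution back ends independently.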