Many application codes rely on high-performance mathematical libraries to solve the systems of equations generated during their simulations. Because these solvers often dominate the computation time of such simulations, these libraries must be efficient and scalable on the upcoming complex exascale hardware architectures for the application codes to perform well. The Portable, Extensible Toolkit for Scientific Computation / Toolkit for Advanced Optimization (PETSc/TAO) project delivers efficient mathematical libraries to application developers for sparse linear and nonlinear systems of equations, time integration, and parallel discretization. It also provides libEnsemble to manage the large collections of related simulations required by numerical optimization, sensitivity analysis, and uncertainty quantification (the so-called “outer loop”).

Project Details

Algebraic solvers (generally nonlinear solvers that use sparse linear solvers via Newton’s method) and time integrators form the core computation of many scientific simulations. PETSc/TAO is a scalable mathematical library that runs portably on everything from laptops to existing high-performance machines. The PETSc/TAO project is extending and enhancing the library to ensure that it performs well on exascale architectures, delivering the libEnsemble tool to manage collections of related simulations for outer-loop methods, and working with exascale application developers to satisfy their solver needs.
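The Newton-plus-sparse-linear-solve pattern described above can be sketched in plain Python. This is an illustration of the algorithm only, not the PETSc SNES API; the residual, Jacobian, and tolerances below are hypothetical examples:

```python
import numpy as np

def newton(residual, jacobian, x0, tol=1e-10, max_it=50):
    """Basic Newton iteration: at each step, solve J(x) dx = -F(x).
    In PETSc, the linear solve inside this loop is delegated to a
    (sparse, parallel) linear solver."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_it):
        f = residual(x)
        if np.linalg.norm(f) < tol:
            break
        dx = np.linalg.solve(jacobian(x), -f)  # dense solve, for illustration
        x = x + dx
    return x

# Hypothetical example system: F(x, y) = (x^2 + y^2 - 1, x - y),
# whose positive root is (1/sqrt(2), 1/sqrt(2)).
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
J = lambda v: np.array([[2*v[0], 2*v[1]], [1.0, -1.0]])
root = newton(F, J, [1.0, 0.5])
```

In production, the dense `np.linalg.solve` call is exactly what a library like PETSc replaces with scalable sparse and preconditioned iterative solvers.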

There are no scalable “black box” sparse solvers or integrators that work for all applications, nor single implementations that work well across all problem sizes. Hence, algebraic solver libraries provide a wide variety of algorithms and implementations that can be customized for the application and range of problem sizes at hand. The PETSc/TAO team is currently focused on enhancing the PETSc/TAO library with scalable solvers that efficiently use many-core and GPU-based systems. This work includes adding support for the range of GPUs that will be deployed and for the Kokkos performance portability layer, optimizing the team’s GPU-aware communication, optimizing data structures to better use many-core and GPU-based systems, and developing algorithms that scale to larger concurrency and provide scalability to the exascale.
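The idea of selecting among many solver implementations at run time, rather than hard-coding one, can be sketched with a small registry. This is in the spirit of PETSc’s options database (e.g., command-line options such as `-ksp_type` and `-pc_type`), but the registry, names, and solvers below are hypothetical, not the PETSc API:

```python
import numpy as np

# Toy solver registry: applications pick an implementation by name at run
# time instead of recompiling, mirroring how PETSc users switch solvers
# via options. Everything here is illustrative only.
SOLVERS = {}

def register(name):
    def wrap(fn):
        SOLVERS[name] = fn
        return fn
    return wrap

@register("jacobi")
def jacobi(A, b, iters=200):
    """Jacobi iteration for Ax = b (dense here, purely for illustration)."""
    D = np.diag(A)
    R = A - np.diagflat(D)
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        x = (b - R @ x) / D
    return x

@register("direct")
def direct(A, b):
    """Dense direct solve, standing in for a factorization-based method."""
    return np.linalg.solve(A, b)

def solve(A, b, options):
    """Dispatch to the implementation named in the options."""
    return SOLVERS[options["solver_type"]](A, b)
```

Swapping `{"solver_type": "jacobi"}` for `{"solver_type": "direct"}` changes the algorithm without touching application code, which is the customization property the paragraph above describes.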

The availability of systems with over 100 times the processing power of today’s machines compels their use not just for single simulations but within a tight outer loop of numerical optimization, sensitivity analysis, and uncertainty quantification. This requires a scalable library to manage a dynamic, hierarchical collection of running, possibly interacting, scalable simulations; the libEnsemble library directs such collections of concurrent simulations. In this area, the team is focused on developing libEnsemble, integrating it with the PETSc/TAO library, and extending PETSc/TAO with new algorithms capable of using libEnsemble.
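The manager/worker pattern behind such ensemble runs can be sketched with Python’s standard library. This is a minimal illustration only, not libEnsemble’s actual generator/simulator interface; libEnsemble dispatches real (often MPI-parallel) simulations with resource management, whereas the `simulate` function and parameter sweep here are hypothetical and run on threads purely for the sketch:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def simulate(params):
    """Stand-in for one expensive forward simulation (hypothetical)."""
    x = params["x"]
    return {"x": x, "objective": (x - 3.0) ** 2}

def run_ensemble(candidates, max_workers=4):
    """Manager loop: launch concurrent simulations and gather results as
    they finish, the way an outer-loop optimizer or uncertainty-
    quantification study consumes them."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(simulate, p) for p in candidates]
        for fut in as_completed(futures):
            results.append(fut.result())
    return results

sweep = [{"x": float(x)} for x in range(6)]
results = run_ensemble(sweep)
best = min(results, key=lambda r: r["objective"])  # outer-loop selection step
```

An outer-loop method would use `results` to propose the next batch of simulations; libEnsemble additionally handles dynamic, interacting, and hierarchical collections of such runs.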

Principal Investigator(s):

Todd Munson, Argonne National Laboratory


Argonne National Laboratory; Lawrence Berkeley National Laboratory

Progress to date

  • The PETSc/TAO team delivered PETSc/TAO version 3.19, which includes full support for the Kokkos version 4.0 performance portability layer and the Kokkos Kernels version 4.0 linear algebra library, the CUDA 12 and HIP backends, and GPU-aware message passing interface (MPI). The release also includes experimental support for stream-aware MPI from MPICH version 4.1. The team also optimized GPU-aware communication via PetscSF, implemented new pipelined Krylov methods, and improved GPU support for its algebraic multigrid solver.
  • The team delivered libEnsemble version 0.9.3, which includes an option to run libEnsemble in central or distributed configurations, as well as updated tests, examples, and documentation.
  • The team also completed preliminary testing and benchmarking to confirm that the GPU backends in PETSc/TAO are working correctly and delivering performance gains on Crusher, the Frontier test and development system. The team continues to benchmark libEnsemble on these machines.
  • The PETSc/TAO team shares overlapping membership with the Exascale Computing Project (ECP) Center for Efficient Exascale Discretizations (CEED) co-design center, and together they are working closely on common issues, including the use of high-order matrix-free discretizations (libCEED) and scalable mesh management techniques. Additionally, the team is collaborating with the AMReX co-design center on using PETSc/TAO solvers from AMReX applications. PETSc/TAO/libEnsemble is currently used by at least nine software components being developed by the ECP application teams.
  • The PETSc/TAO and libEnsemble software are included in the xSDK numerical libraries software development kit and are distributed with the Extreme-Scale Scientific Software Stack (E4S) metapackage.
