Many simulation and data analysis codes need to solve sparse systems of equations. The high-fidelity simulations run by exascale application teams involve large-scale multiphysics and multiscale modeling problems that generate highly ill-conditioned and indefinite systems, with which iterative methods struggle. The STRUMPACK/SuperLU project is delivering robust and scalable factorization-based algorithms that are indispensable building blocks for solving these numerically challenging problems.

Project Details

Scalable factorization-based methods are important components in solvers for ill-conditioned and indefinite systems of equations that arise in many exascale applications. The STRUMPACK/SuperLU project is producing robust and scalable factorization-based algorithms that can be used as direct solvers or preconditioners for linear systems of equations.

The team is delivering factorization-based sparse solvers that encompass two widely used algorithm variants: the supernodal SuperLU library and the multifrontal STRUMPACK library. The team is also adding scalable preconditioning functionality to the STRUMPACK library via hierarchical matrix algebra. Both libraries are applicable to a large variety of application domains. These scalable libraries are being enhanced to ensure that they will be performant on pre-exascale and exascale architectures.

Both SuperLU and STRUMPACK can be used as stand-alone solvers. More importantly for ECP applications, they are being used as critical subsolvers in higher-level solver libraries, for example as coarse grid solvers in a multigrid solver, subdomain solvers in a domain decomposition solver, block diagonal preconditioners in a Krylov iterative solver, or general approximate factorization preconditioners for an iterative solver.
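
To make the subsolver role concrete, the following is a minimal sketch of one of the configurations listed above: a block diagonal preconditioner inside a Krylov iterative solver, with each diagonal block factorized once by a direct method. It is plain Python on small dense matrices and does not use the SuperLU or STRUMPACK APIs. The function names (`cholesky`, `make_block_jacobi`, `pcg`) are hypothetical, and the dense Cholesky factorization is only a stand-in for the sparse factorization a real subsolver would perform. The sketch assumes a symmetric positive definite matrix so that conjugate gradients applies.

```python
import math


def cholesky(M):
    """Dense Cholesky factor L (lower triangular) with M = L L^T; M must be SPD."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def make_block_jacobi(A, block_size):
    """Factorize each diagonal block once; return a function applying M^{-1}."""
    n = len(A)
    factors = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        block = [row[start:stop] for row in A[start:stop]]
        factors.append((start, stop, cholesky(block)))

    def apply(r):
        z = [0.0] * n
        for start, stop, L in factors:
            # Direct solve with the stored factor of this block.
            z[start:stop] = cholesky_solve(L, r[start:stop])
        return z

    return apply


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, precond, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) <= tol * b_norm:
        return x, 0
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    # Illustrative test problem: 1D Laplacian (tridiagonal, SPD).
    n = 8
    A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
         for i in range(n)]
    b = [1.0] * n
    x, iters = pcg(A, b, make_block_jacobi(A, block_size=4))
    print("iterations:", iters)
```

The factorization cost is paid once per block, and every Krylov iteration then reuses the stored factors through cheap triangular solves. This is the usage pattern that makes a fast factorization and a fast triangular solve both matter for the higher-level solver.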

Principal Investigator(s):

Sherry Li, Lawrence Berkeley National Laboratory


Lawrence Berkeley National Laboratory, Oak Ridge National Laboratory, Stevens Institute of Technology

Progress to date

  • The team released SuperLU_DIST version 8.1.2. The new features include (1) a 3D communication-avoiding algorithm framework that reduces inter-process communication at the cost of selective memory duplication, (2) multi-GPU support for both NVIDIA GPUs and AMD GPUs, and (3) mixed-precision routines that perform single-precision LU factorization and double-precision iterative refinement. With the communication-avoiding 3D algorithms, sparse LU factorization is up to 27 times faster on 32,000 processes, and the sparse triangular solves are up to 7 times faster on 12,000 processes. The mixed single/double-precision solver achieves up to a 1.65x speedup over the pure double-precision solver. The details of the new release are documented in this paper:
  • The team released STRUMPACK version 7.0.1. The new features include (1) multi-GPU support for both NVIDIA GPUs and AMD GPUs; (2) a composite multifrontal solver that employs the HODBF format for large fronts, a reduced-memory version of the non-hierarchical Block Low-Rank format for medium-sized fronts, and a lossy compression format for small fronts, which makes it possible to solve sparse linear systems of dimension up to 2.7× larger than before and reduces memory consumption by 70% at the same execution time; and (3) use of MAGMA’s variable-size dense LU on GPU for the frontal matrix factorizations, which achieves up to a 6.9x speedup on an AMD MI100 GPU. The details of these results are documented in this paper:
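
The mixed-precision approach mentioned for SuperLU_DIST (single-precision LU factorization followed by double-precision iterative refinement) can be illustrated with a small sketch. The code below is not the SuperLU_DIST implementation or its API. It is a dense, pure-Python illustration in which single precision is emulated by rounding every stored value to IEEE float32, and all function names (`f32`, `lu_factor_single`, `lu_solve_single`, `solve_mixed`) are hypothetical.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_single(A):
    """Dense LU with partial pivoting; all stored values are rounded to single."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular")
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve_single(LU, perm, b):
    """Forward/backward substitution in emulated single precision."""
    n = len(LU)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # unit lower triangular solve
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):  # upper triangular solve
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y


def residual(A, x, b):
    """Residual b - A x, computed in double precision."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]


def solve_mixed(A, b, tol=1e-12, max_iter=20):
    """Single-precision factorization plus double-precision refinement.

    Returns (solution, number of refinement steps taken).
    """
    LU, perm = lu_factor_single(A)  # expensive step, done once in low precision
    x = lu_solve_single(LU, perm, b)
    b_norm = max(abs(v) for v in b) or 1.0
    for iteration in range(max_iter):
        r = residual(A, x, b)
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, iteration
        d = lu_solve_single(LU, perm, r)  # cheap correction with the same factors
        x = [xi + di for xi, di in zip(x, d)]  # accumulate in double precision
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.3], [1.0, 3.0, 0.7], [0.3, 0.7, 2.0]]
    b = [2.15, -4.65, -0.1]
    x, steps = solve_mixed(A, b)
    print("solution:", x, "refinement steps:", steps)
```

The design point is that the factorization, which dominates the cost, runs entirely in the cheaper precision, while only the residual and the solution update are kept in double precision. For a reasonably conditioned matrix each refinement step shrinks the error by several orders of magnitude, so a few steps typically recover double-precision accuracy; for very ill-conditioned matrices the refinement may stagnate or fail to converge.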
