Many simulation and data analysis codes need to solve sparse systems of equations. The high-fidelity simulations run by exascale application teams involve large-scale multiphysics and multiscale modeling problems that generate highly ill-conditioned and indefinite systems, on which iterative methods struggle to converge. The STRUMPACK/SuperLU project is delivering robust and scalable factorization-based algorithms that are indispensable building blocks for solving these numerically challenging problems.

Project Details

Scalable factorization-based methods are important components in solvers for ill-conditioned and indefinite systems of equations that arise in many exascale applications. The STRUMPACK/SuperLU project is producing robust and scalable factorization-based algorithms that can be used as direct solvers or preconditioners for linear systems of equations.
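The two roles described above, direct solver and preconditioner, can be illustrated with a small sketch. It uses SciPy's `splu` and `spilu`, which wrap the sequential SuperLU library; it does not use the distributed SuperLU_DIST or STRUMPACK interfaces. The test matrix (a shifted 1D Laplacian), its size, and the drop tolerance are assumptions chosen only to give a small indefinite system.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu, spilu, gmres, LinearOperator


def shifted_laplacian_1d(n, shift):
    """Return the 1D Laplacian minus shift*I (symmetric, indefinite for 0 < shift < 4)."""
    main = 2.0 * np.ones(n) - shift
    off = -np.ones(n - 1)
    return sp.diags([off, main, off], [-1, 0, 1], format="csc")


# Hypothetical test problem: a small indefinite sparse system with a known solution.
n = 200
A = shifted_laplacian_1d(n, shift=0.5)
rng = np.random.default_rng(0)
x_true = rng.standard_normal(n)
b = A @ x_true

# Role 1: direct solver. Factor once with sparse LU, then solve by triangular solves.
lu = splu(A)
x_direct = lu.solve(b)
direct_err = np.linalg.norm(x_direct - x_true) / np.linalg.norm(x_true)

# Role 2: preconditioner. An incomplete LU factorization approximates the inverse of A
# and is applied inside a Krylov method (GMRES here).
ilu = spilu(A, drop_tol=1e-4, fill_factor=10)
M = LinearOperator(A.shape, matvec=ilu.solve)
x_iter, info = gmres(A, b, M=M, restart=50, maxiter=200)
iter_res = np.linalg.norm(b - A @ x_iter) / np.linalg.norm(b)

print(f"direct solve relative error: {direct_err:.2e}")
print(f"GMRES exit flag: {info}, relative residual: {iter_res:.2e}")
```

The factorization is computed once and reused for every right-hand side, which is the main cost trade-off between the direct and the preconditioned iterative approach.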

The team is delivering factorization-based sparse solvers that encompass two widely used algorithm variants: the supernodal SuperLU library and the multifrontal STRUMPACK library. The team is also adding scalable preconditioning functionality to the STRUMPACK library via hierarchical matrix algebra. Both libraries are applicable to a large variety of application domains. These scalable libraries are being enhanced to ensure that they will be performant on pre-exascale and exascale architectures.
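Hierarchical matrix algebra relies on approximating off-diagonal matrix blocks by low-rank products. The following is a minimal sketch of that idea using a truncated SVD in NumPy; it is not STRUMPACK's compression algorithm. The kernel `1/|x - y|`, the point clusters, and the tolerance are assumptions chosen for illustration.

```python
import numpy as np


def compress_block(B, rel_tol):
    """Truncated SVD: return (U, V) with B ~= U @ V, keeping singular
    values larger than rel_tol times the largest one."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    rank = max(1, int(np.sum(s > rel_tol * s[0])))
    return U[:, :rank] * s[:rank], Vt[:rank, :]


# Hypothetical setup: two well-separated clusters of points on a line.
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(3.0, 4.0, 200)

# Interaction block between the clusters for the kernel 1/|x - y|.
B = 1.0 / np.abs(x[:, None] - y[None, :])

U, V = compress_block(B, rel_tol=1e-8)
rank = U.shape[1]
rel_err = np.linalg.norm(B - U @ V) / np.linalg.norm(B)

dense_entries = B.size
compressed_entries = U.size + V.size
print(f"rank {rank} of {min(B.shape)}, relative error {rel_err:.2e}")
print(f"storage: {compressed_entries} entries instead of {dense_entries}")
```

Storing the two factors costs `rank * (m + n)` entries instead of `m * n`, which is where the memory and time savings of low-rank formats come from when the rank is small.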

Both SuperLU and STRUMPACK can be used as stand-alone solvers. More importantly for ECP applications, they are being used as critical subsolvers in higher-level solver libraries, for example as coarse grid solvers in a multigrid solver, subdomain solvers in a domain decomposition solver, block diagonal preconditioners in a Krylov iterative solver, or general approximate factorization preconditioners for an iterative solver.
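As one illustration of the subsolver role, the sketch below builds a block diagonal (block Jacobi) preconditioner for GMRES, with each diagonal block factored by a sparse direct solver. It uses SciPy's `splu`, which wraps sequential SuperLU, rather than the distributed libraries. The helper `block_jacobi_preconditioner`, the test matrix, and the block size are hypothetical choices for this example.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu, gmres, LinearOperator


def block_jacobi_preconditioner(A, block_size):
    """Factor each diagonal block of A with sparse LU and return a
    LinearOperator that applies the block-diagonal inverse."""
    A = A.tocsc()
    n = A.shape[0]
    ranges = [(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    # One direct factorization per "subdomain" (diagonal block).
    factors = [splu(A[s:e, s:e].tocsc()) for s, e in ranges]

    def apply(r):
        r = np.asarray(r, dtype=float).ravel()
        z = np.empty_like(r)
        for (s, e), lu in zip(ranges, factors):
            z[s:e] = lu.solve(r[s:e])  # independent subdomain solves
        return z

    return LinearOperator(A.shape, matvec=apply, dtype=float)


# Hypothetical test problem: 1D Laplacian plus identity.
n = 240
main = 3.0 * np.ones(n)
off = -np.ones(n - 1)
A = sp.diags([off, main, off], [-1, 0, 1], format="csc")
b = np.ones(n)

M = block_jacobi_preconditioner(A, block_size=60)
x, info = gmres(A, b, M=M, restart=30, maxiter=100)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(f"GMRES exit flag: {info}, relative residual: {rel_res:.2e}")
```

The block solves are independent of one another, so in a parallel setting each block could be assigned to a different process or group of processes.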

Principal Investigator(s):

Sherry Li, Lawrence Berkeley National Laboratory


Lawrence Berkeley National Laboratory, Oak Ridge National Laboratory, Stevens Institute of Technology

Progress to date

  • The team released SuperLU_DIST version 6.4.0 with improved strong scaling and threading performance of the triangular solve, which is up to 4 times faster than version 5.x on more than 4,000 cores. The release also has a new software structure that allows the real and complex versions to be used simultaneously in one application. The prerelease version 7.0 contains communication-avoiding 3D algorithms: sparse LU factorization is up to 27 times faster on 32,000 processes, and the sparse triangular solves are up to 7 times faster on 12,000 processes. An additional 3–4× speedup was achieved when Nvidia GPUs were used.
  • The team released STRUMPACK version 5.0.0. It includes two new low-rank formats, Butterfly and Block Low Rank, which are used as distributed-memory preconditioners. It also improves the scalability of the hierarchical matrix algorithms: dense hierarchical matrix compression is up to 4.7 times faster on eight Cori-Haswell nodes, and hierarchical sparse factorization is up to 2.2 times faster on eight Cori-KNL nodes. The release also adds GPU support using CUDA and SLATE, resulting in over a 10× performance improvement compared with CPU-only code running on four Summit nodes.
