We want to draw attention to a large class of linear systems arising in engineering design computations for which no adequate scalable linear solvers currently exist. We also want to share recent progress in this area achieved within the Exascale Computing Project (ECP).
Nonlinear analyses in design computations are typically implemented as iterative methods that solve a linear system at each iteration. Solving these linear systems accounts for a major part of the overall computational cost and, without a proper parallel strategy, may become a computational bottleneck for the entire analysis.
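To illustrate why the linear solver dominates, here is a minimal sketch of Newton's method, one common iterative scheme of this kind, in which every iteration requires solving a linear system with the Jacobian. The function names `newton` and `gauss_solve` and the 2x2 test system are hypothetical, chosen only for illustration. The dense Gaussian elimination stands in for the sparse, often ill-conditioned solve that appears in real design computations.

```python
import math

def gauss_solve(A, b):
    """Dense Gaussian elimination with partial pivoting.
    Stand-in for the large sparse linear solver used in practice."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pick largest pivot
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def newton(F, J, x0, tol=1e-12, max_iter=50):
    """Newton iteration: each step solves J(x) dx = -F(x)."""
    x = list(x0)
    solves = 0
    for _ in range(max_iter):
        f = F(x)
        if max(abs(fi) for fi in f) < tol:
            break
        dx = gauss_solve(J(x), [-fi for fi in f])  # one linear solve per iteration
        solves += 1
        x = [xi + di for xi, di in zip(x, dx)]
    return x, solves

# Toy system: x^2 + y^2 = 4 and x = y, with root (sqrt(2), sqrt(2)).
F = lambda v: [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]]
J = lambda v: [[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]]
root, n_solves = newton(F, J, [1.0, 0.5])
print(root, math.sqrt(2.0), n_solves)
```

In a real analysis the Jacobian is large and sparse, so the cost of `gauss_solve` (here trivial) is what determines whether the whole loop scales.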
Design computations for complex engineering systems typically generate extremely sparse and ill-conditioned linear systems. The nature of these systems makes it challenging to solve them efficiently in a fine-grained parallel fashion. Iterative linear solvers, which can be parallelized effectively, do not perform well on such ill-conditioned problems. Traditional supernodal and multifrontal parallel strategies used in direct linear solvers are not effective for extremely sparse systems, because the dense blocks these methods create are too small to take advantage of fast dense linear algebra. These challenges impede the deployment of design computations to heterogeneous compute platforms that use hardware accelerators such as GPUs. This was also a major obstacle for the ExaSGD project, whose objective was to perform optimal power flow analysis for transmission grids at exascale.
Collaboration between the Software Technology and Application Development teams at the Exascale Computing Project, driven by use cases from the ExaSGD subproject, has identified several promising directions for developing scalable linear solvers that can handle ill-conditioned and very sparse systems. A combination of sound ordering, static pivoting (re)factorization, and iterative refinement has delivered the first meaningful speedup (2x) of optimal power flow analysis on a GPU compared to state-of-the-art CPU-based solvers. A hybrid direct-iterative linear solver showed a 3-10x speedup on a GPU when applied to Karush-Kuhn-Tucker (KKT) linear systems. Random matrix methods, such as the random butterfly transform, provide an alternative to pivoting and are suitable for fine-grained parallel implementations. Each of these approaches shows promise and may lead to an efficient general-purpose solver that meets the requirements of complex system design and runs efficiently on hardware accelerators. We will discuss the further mathematical research needed to address the needs of engineering design.
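The idea behind combining static pivoting with iterative refinement can be sketched on a small dense matrix. This is a minimal illustration under stated assumptions, not the ExaSGD or Ginkgo implementation: the function names are hypothetical, the pivot threshold `tau = sqrt(eps) * max|a_ij|` is an assumed choice, and a real solver would operate on sparse matrices after a fill-reducing ordering. The factorization never swaps rows; tiny pivots are replaced by `+/-tau`, and refinement with residuals computed from the original matrix removes the error introduced by that perturbation.

```python
import math
import sys

def lu_static_pivot(A, tau):
    """LU without row interchanges (static pivoting), stored in place.
    Pivots with |pivot| < tau are replaced by +/-tau (assumed rule)."""
    n = len(A)
    LU = [row[:] for row in A]
    perturbed = 0
    for k in range(n):
        if abs(LU[k][k]) < tau:
            LU[k][k] = tau if LU[k][k] >= 0 else -tau
            perturbed += 1
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]  # multiplier stored in the L part
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perturbed

def lu_solve(LU, b):
    """Forward substitution with unit-lower L, then back substitution with U."""
    n = len(LU)
    y = b[:]
    for i in range(n):
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x

def solve_static_pivot_ir(A, b, tol=1e-12, max_iter=20):
    # Hypothetical threshold: sqrt(machine epsilon) times the largest |a_ij|.
    tau = math.sqrt(sys.float_info.epsilon) * max(abs(a) for row in A for a in row)
    LU, perturbed = lu_static_pivot(A, tau)
    x = lu_solve(LU, b)
    bnorm = max(abs(bi) for bi in b)
    for it in range(max_iter):
        # Residual uses the ORIGINAL matrix, not the perturbed factors.
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        if max(abs(ri) for ri in r) <= tol * bnorm:
            return x, it, perturbed
        d = lu_solve(LU, r)  # correction from the perturbed factors
        x = [xi + di for xi, di in zip(x, d)]
    return x, max_iter, perturbed

# Zero leading pivot: plain LU without pivoting would divide by zero here.
A = [[0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]  # exact solution is [1, 1]
x, iters, perturbed = solve_static_pivot_ir(A, b)
print(x, iters, perturbed)
```

Because the pivot sequence is fixed in advance, the sparsity pattern of the factors does not depend on numerical values, which is what makes repeated refactorization within a nonlinear loop attractive on GPUs.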
The Ginkgo library has been essential for prototyping solutions for ill-conditioned and extremely sparse linear systems. It provides a modular, flexible, and portable framework for implementing linear solvers. We will use solutions implemented in Ginkgo to motivate a discussion of the computational and software development advances needed to support linear solver development on emerging hardware platforms.
- Slaven Peles (Oak Ridge National Laboratory)
- Xiaoye Sherry Li (Lawrence Berkeley National Laboratory)
- Hartwig Anzt (University of Tennessee)
- Kasia Swirydowicz (Pacific Northwest National Laboratory)
- Jonathan Maack (National Renewable Energy Laboratory)