Solving Multiphysics Problems at Scale on Today’s Most Powerful Supercomputers

By Scott Gibson

The mathematical library development portfolio of the Software Technology (ST) research focus area of the US Department of Energy’s (DOE) Exascale Computing Project (ECP) provides general tools to implement complex algorithms. These algorithms are designed to scale up for supercomputers so that ECP teams can then use them to accelerate the development and improve the performance of science applications on DOE high-performance computing architectures. One of the projects in the portfolio, called Data Transfer Kit (DTK) and ArborX, is the subject of the latest episode of ECP’s podcast, Let’s Talk Exascale.

From left, Damien Lebrun-Grandie and Stuart Slattery of Oak Ridge National Laboratory

The guests for this episode are Stuart Slattery and Damien Lebrun-Grandie of Oak Ridge National Laboratory (ORNL). Slattery is the originator of the DTK library and is ORNL principal investigator of the Co-Design Center for Particle Applications (CoPA). Lebrun-Grandie is lead developer of ArborX as well as a developer of DTK, CoPA, and a technology called Kokkos. The interview was conducted in Denver at SC19: The International Conference for High Performance Computing, Networking, Storage, and Analysis, which took place this past November.

The DTK library transfers computed solutions between grids and geometries with different domain decompositions on parallel accelerated architectures. ArborX provides performance-portable geometric search algorithms for the same class of architectures. Both DTK and ArborX are open-source technologies available on GitHub.
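To make the idea of transferring a computed solution between two grids concrete, here is a minimal serial sketch in Python. It is not DTK's API: the names `nearest_index` and `transfer_field`, the sample points, and the temperature-like values are all hypothetical, and the brute-force nearest-neighbor lookup stands in for the tree-based, distributed searches a production library would use.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def nearest_index(query: Point, points: List[Point]) -> int:
    """Return the index of the point closest to `query` (brute force)."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))


def transfer_field(
    source_points: List[Point],
    source_values: List[float],
    target_points: List[Point],
) -> List[float]:
    """Give each target point the value of its nearest source point."""
    return [source_values[nearest_index(p, source_points)] for p in target_points]


# Hypothetical data: a field known on one grid, needed on another grid.
source_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
source_values = [300.0, 350.0, 400.0]
target_points = [(0.1, 0.0), (1.6, 0.0)]

transferred = transfer_field(source_points, source_values, target_points)
print(transferred)
```

Nearest-neighbor assignment is only the simplest possible transfer operator; it is used here because it shows why the search step (finding which source data is relevant to each target location) is central to the problem.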

“DTK and ArborX sit inside of a larger software technology project in ECP, centered at Oak Ridge National Laboratory, where we’re developing a variety of accelerated libraries to use at exascale,” Slattery said. In terms of the libraries, “accelerated” refers to the GPUs that are in most of DOE’s supercomputers today.

DTK was initiated as part of the Consortium for Advanced Simulation of Light Water Reactors, a DOE Energy Innovation Hub. The objective of DTK was to enable different physics applications, each designed to run at scale on supercomputers, to be coupled together to solve multiphysics science problems. Slattery described the coupling in multiphysics simulations as code A calculating something for code B and vice versa, with the calculated information being shared between the codes on a regular basis.

Examples of multiphysics applications include coupling neutron physics with fluid dynamics to understand the operating behavior of nuclear reactors as well as the coupling of computational mechanics simulations with heat transfer calculations for the accurate modeling of 3D printing of metals.

Out of DTK’s ability to interpolate data between things such as two computational meshes or geometries came what Slattery referred to as “performance-portable technology.” “I mean an application which can run on the plethora of DOE architectures that we have, but do it at scale, and in a single codebase,” he added.

The DTK and ArborX team works within the Accelerated Libraries for Exascale (ALExa) project of ECP’s Mathematical Libraries effort. The team’s activities predominantly involve writing libraries that perform mathematical operations. So, rather than dealing directly with data input/output technologies, they provide tools that perform in-memory mathematical operations and interpolations, using technologies such as the Message Passing Interface (MPI).

“Because we are called a data transfer kit library, people think we’re just transferring data, but we are not,” Lebrun-Grandie said. “We are actually doing some real mathematical work when we determine which parts of the computational meshes or geometries we are coupling may be in contact with each other.”
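The kind of geometric work described here, finding which pieces of two meshes might touch, can be illustrated with a small Python sketch. It is a deliberately naive, serial example under stated assumptions, not ArborX's or DTK's interface: the function names and the sample boxes are hypothetical, each mesh cell is reduced to an axis-aligned bounding box, and every pair is tested, whereas a scalable library would use a tree structure to avoid the all-pairs comparison.

```python
from typing import List, Tuple

Corner = Tuple[float, float, float]
Box = Tuple[Corner, Corner]  # (minimum corner, maximum corner)


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes intersect or touch in all three dimensions."""
    (a_min, a_max), (b_min, b_max) = a, b
    return all(a_min[d] <= b_max[d] and b_min[d] <= a_max[d] for d in range(3))


def find_contact_candidates(source: List[Box], target: List[Box]) -> List[Tuple[int, int]]:
    """Return (source index, target index) pairs whose bounding boxes overlap."""
    return [
        (i, j)
        for i, s in enumerate(source)
        for j, t in enumerate(target)
        if boxes_overlap(s, t)
    ]


# Hypothetical bounding boxes for cells of two different meshes.
source_cells = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)), ((1.0, 0.0, 0.0), (2.0, 1.0, 1.0))]
target_cells = [((0.5, 0.5, 0.5), (1.5, 1.5, 1.5)), ((3.0, 3.0, 3.0), (4.0, 4.0, 4.0))]

pairs = find_contact_candidates(source_cells, target_cells)
print(pairs)
```

Overlapping bounding boxes only identify candidates; an exact geometric test on the underlying cells would normally follow to confirm actual contact.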

The large variety of computing architectures that the DTK and ArborX project must address is at once a big challenge and a source of opportunities. Adapting to different accelerators requires ingenuity with respect to programming for optimal performance and functionality. The library interfaces have to be designed to work efficiently with ECP applications and to enable execution at scale, Slattery said.

The DTK project had existed for about 5 years when it joined ECP at ECP’s start in 2016. The DTK team’s biggest innovation as a part of ECP has been the creation of ArborX, which provides all of the core pieces for searches and for communication between the different nodes on a supercomputer.

“We realized if we were able to extract ArborX from DTK that a lot of people would want to use ArborX outside of the context of multiphysics, in which DTK lives,” Slattery said. “Damien and his team did a pretty significant performance-engineering campaign in ArborX.” They focused on getting the tool to run at scale on all of the different accelerated architectures and thus were able to make it valuable to a much wider set of users than would have been the case had it remained in DTK, he added.

The availability of DTK and ArborX on GitHub should help the technology leave a legacy of enabling researchers to focus more on their science than on low-level algorithms. Slattery said this will be true not only for ECP applications but also for others.

Lebrun-Grandie said collaborations with others, such as the Kokkos team, provide a synergy that bridges different projects within ST and impacts a broad set of users through the open-source products.