Many scientific applications need scalable algorithms to run efficiently, must pass data between grids with different parallel distributions, or require reduced representations of high-dimensional data (for example, to reduce storage). The Accelerated Libraries for Exascale (ALExa) project provides technologies that address these needs for exascale applications, including applications written in Fortran.

Project Details

Complex scientific applications might need to combine results computed on different grids, each of which represents only part of the physics. Moreover, the simulation on each grid might be written in Fortran yet require access to scalable solvers written in C++. To address these issues and help applications make better use of exascale systems, the ALExa project is developing four components: the Data Transfer Kit (DTK), ArborX, Tasmanian, and ForTrilinos.

DTK transfers computed solutions between grids with different layouts on parallel accelerated architectures, allowing simulations to seamlessly combine results from different computational grids. The team is focused on adding new features that applications need and on ensuring that the library performs well on pre-exascale and exascale architectures.
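Moving least squares (MLS), one of the interpolation methods the ALExa team has GPU-accelerated for solution transfer, can be sketched in a few lines. The example below is a serial illustration under simplifying assumptions, not DTK's interface: points are one-dimensional, the local basis is linear (1, s - x), and the weight is a Wendland C2 function with a fixed support radius `h`. The names `mls_transfer` and `wendland` are hypothetical.

```python
import math


def wendland(r, h):
    """Wendland C2 weight: smooth, positive for r < h, zero for r >= h."""
    q = r / h
    if q >= 1.0:
        return 0.0
    return (1.0 - q) ** 4 * (4.0 * q + 1.0)


def mls_transfer(src_x, src_f, tgt_x, h):
    """Transfer values src_f at points src_x onto points tgt_x with MLS.

    For each target point x, fit f(s) ~ a + b*(s - x) by weighted least
    squares over source points within distance h; the transferred value is a.
    """
    out = []
    for x in tgt_x:
        # Accumulate the 2x2 weighted normal equations M [a, b]^T = rhs.
        m00 = m01 = m11 = r0 = r1 = 0.0
        for s, f in zip(src_x, src_f):
            w = wendland(abs(s - x), h)
            if w == 0.0:
                continue  # source point lies outside the support radius
            d = s - x
            m00 += w
            m01 += w * d
            m11 += w * d * d
            r0 += w * f
            r1 += w * f * d
        det = m00 * m11 - m01 * m01
        if abs(det) < 1e-14:
            raise ValueError(f"too few source points near {x}; increase h")
        # Cramer's rule for the constant coefficient a.
        out.append((r0 * m11 - m01 * r1) / det)
    return out
```

Because the local basis is linear, the fit reproduces linear fields exactly up to rounding: transferring f(s) = 2s + 1 from 11 evenly spaced points on [0, 1] to x = 0.37 with h = 0.25 yields approximately 1.74. A practical implementation would also need multidimensional points and a spatial search (the role a library such as ArborX can play) to find the source points inside each support radius.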

ArborX provides performance-portable geometric search algorithms, such as finding all objects within a certain distance (rNN) or a fixed number of closest objects (kNN). Its functionality is similar to that of the nanoflann and Boost.Geometry.Index libraries, but it targets high-performance computing environments. The team focuses on providing functionality required by other Exascale Computing Project (ECP) projects (e.g., ExaWind and ExaSky), including clustering algorithms. ArborX is a required dependency of DTK.
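The two query types can be illustrated with a small bounding volume hierarchy (BVH), the structure on which the team's indexing work is based. This serial sketch is not ArborX's API: `build_bvh`, `radius_search`, and `knn_search` are hypothetical names, the tree uses a simple median split along the longest box axis, and points are tuples of floats. What it shows is that a query skips every subtree whose bounding box is too far away.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lo: tuple                      # lower corner of the axis-aligned bounding box
    hi: tuple                      # upper corner of the bounding box
    items: Optional[List[int]]     # point indices (leaf nodes only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_bvh(points, idx=None, leaf_size=4):
    """Top-down BVH over a non-empty point list: split at the median along the longest axis."""
    if idx is None:
        idx = list(range(len(points)))
    dim = len(points[idx[0]])
    lo = tuple(min(points[i][d] for i in idx) for d in range(dim))
    hi = tuple(max(points[i][d] for i in idx) for d in range(dim))
    if len(idx) <= leaf_size:
        return Node(lo, hi, idx)
    axis = max(range(dim), key=lambda d: hi[d] - lo[d])
    idx = sorted(idx, key=lambda i: points[i][axis])
    mid = len(idx) // 2
    return Node(lo, hi, None,
                build_bvh(points, idx[:mid], leaf_size),
                build_bvh(points, idx[mid:], leaf_size))


def box_dist(node, q):
    """Euclidean distance from q to the node's box (0 if q is inside it)."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for l, h, x in zip(node.lo, node.hi, q)))


def radius_search(root, points, q, r):
    """rNN: sorted indices of all points within distance r of q."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        if box_dist(node, q) > r:
            continue  # the whole subtree is out of range
        if node.items is not None:
            out.extend(i for i in node.items if math.dist(points[i], q) <= r)
        else:
            stack += [node.left, node.right]
    return sorted(out)


def knn_search(root, points, q, k):
    """kNN: indices of the k points closest to q, nearest first."""
    best = []                                # max-heap of (-distance, index)
    heap = [(box_dist(root, q), 0, root)]    # min-heap of (box distance, tiebreak, node)
    counter = 1
    while heap:
        d, _, node = heapq.heappop(heap)
        if len(best) == k and d > -best[0][0]:
            break  # no remaining box can hold a closer point
        if node.items is not None:
            for i in node.items:
                di = math.dist(points[i], q)
                if len(best) < k:
                    heapq.heappush(best, (-di, i))
                elif di < -best[0][0]:
                    heapq.heapreplace(best, (-di, i))
        else:
            for child in (node.left, node.right):
                heapq.heappush(heap, (box_dist(child, q), counter, child))
                counter += 1
    return [i for _, i in sorted(best, key=lambda t: (-t[0], t[1]))]
```

For example, `knn_search(build_bvh(pts), pts, (0.5, 0.5), 5)` returns the indices of the five points nearest the center of the unit square. This sketch answers one query at a time and makes no attempt at the batched, parallel traversal a GPU implementation requires.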

Tasmanian constructs surrogate models with a low memory footprint, low cost, and optimal computational throughput, enabling optimization and uncertainty quantification for large-scale engineering problems as well as efficient multiphysics simulations. The team is focused on reducing GPU memory overhead and accelerating simulations that use the resulting surrogate models.
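Tasmanian builds its surrogates with sparse grids. As a simplified illustration of why adaptive hierarchical surrogates can have a small memory footprint, the one-dimensional sketch below builds a piecewise-linear interpolant from hat functions and refines a node only where its hierarchical surplus (the difference between f and the coarser interpolant at that node) exceeds a tolerance `tol`. It is not Tasmanian's API: `build_surrogate` and `evaluate` are hypothetical names, and the domain is fixed to [0, 1].

```python
def evaluate(terms, x):
    """Sum of surplus-weighted hat functions at x."""
    return sum(s * max(0.0, 1.0 - abs(x - c) / w) for c, w, s in terms)


def build_surrogate(f, tol, max_level=12):
    """Adaptive 1D hierarchical piecewise-linear surrogate of f on [0, 1].

    Returns (center, half_width, surplus) triples; a level-l node has half-width
    0.5**l, and its children sit at center -/+ half_width / 2 on level l + 1.
    """
    # Level 0: two boundary hats reproduce the linear interpolant of f(0), f(1).
    terms = [(0.0, 1.0, f(0.0)), (1.0, 1.0, f(1.0))]
    frontier = [0.5]  # the single level-1 node
    level = 1
    while frontier and level <= max_level:
        w = 0.5 ** level  # half-width of this level's hats
        next_frontier = []
        for c in frontier:
            # Hats of other nodes on this level vanish at c, so this is the
            # surplus with respect to all coarser levels.
            surplus = f(c) - evaluate(terms, c)
            terms.append((c, w, surplus))
            if abs(surplus) > tol:
                # Refine only where the coarser interpolant is inaccurate.
                next_frontier += [c - w / 2, c + w / 2]
        frontier = next_frontier
        level += 1
    return terms
```

For f(x) = max(0, x - 0.5), whose kink falls on a grid node, construction stops after five terms and reproduces f up to rounding. For f(x) = x^2, the surplus magnitude at level l is 4^-l, so with tol = 1e-3 refinement stops after level 5 (33 terms). Tasmanian's multidimensional sparse grids and GPU kernels are considerably more elaborate than this sketch.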

ForTrilinos provides a capability for automatically generating Fortran interfaces to any C/C++ library, as well as a seamless pathway for large and complex Fortran codes to access the Trilinos library through automatically generated interface code.

Principal Investigator(s):

Andrey Prokopenko, Oak Ridge National Laboratory

Progress to date

  • The team developed a performance-portable indexing structure based on a bounding volume hierarchy (BVH), with support for accelerators (GPUs) and distributed computation (Message Passing Interface, MPI). The team also designed a novel approach to clustering data on GPUs with the DBSCAN algorithm, achieving a 150× speedup on a single NVIDIA V100 GPU over a serial baseline implementation when finding halos in a 36-million-point problem provided by the ExaSky project.
  • The ALExa team developed GPU-accelerated moving least squares and spline interpolation algorithms. The ExaAM project uses these algorithms for multiphysics solution transfer between heat transfer, mechanics, and solidification codes, enabling coupled simulations of metal additive manufacturing processes.
  • The team enabled GPU-accelerated surrogate model simulations in Tasmanian, developed new algorithms for asynchronous surrogate construction that exploit extreme concurrency, and demonstrated a 100× reduction in the memory footprint of a sparse representation of neutrino opacities for the ExaStar project.
  • The team developed a SWIG/Fortran tool that automatically generates Fortran object-oriented interfaces and necessary wrapper code for any given C/C++ interface, demonstrated advanced inversion-of-control functionality that allows a C++ solver to invoke user-provided Fortran routines, and used this tool to provide Fortran access to a wide variety of linear and nonlinear solvers in the Trilinos library.
  • All GPU-accelerated algorithms are implemented for both CUDA and HIP environments, ensuring performance portability to NVIDIA and AMD architectures.
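For reference, the definitions behind DBSCAN clustering can be written as a short serial program. This sketch is not the team's GPU algorithm: it finds neighbors by brute force (O(n^2) distance checks, the step a spatial index such as a BVH would accelerate) and merges clusters with a union-find structure. The names `dbscan`, `eps`, and `min_pts` are chosen for illustration; label -1 marks noise.

```python
import math


def dbscan(points, eps, min_pts):
    """Serial DBSCAN via union-find. Returns one label per point; -1 is noise.

    A point is a core point if at least min_pts points (itself included) lie
    within distance eps. Core points within eps of each other share a cluster;
    a non-core point within eps of a core point joins that core point's cluster.
    """
    n = len(points)
    # Brute-force neighbor lists (each point counts as its own neighbor).
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Union every pair of neighboring core points.
    for i in range(n):
        if not core[i]:
            continue
        for j in nbrs[i]:
            if core[j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Number clusters in order of their first core point.
    labels = [-1] * n
    cluster_id = {}
    for i in range(n):
        if core[i]:
            labels[i] = cluster_id.setdefault(find(i), len(cluster_id))
    # Border points take the cluster of their first core neighbor, if any.
    for i in range(n):
        if not core[i]:
            for j in nbrs[i]:
                if core[j]:
                    labels[i] = labels[j]
                    break
    return labels
```

A border point within `eps` of core points from two different clusters is assigned here to the cluster of its lowest-index core neighbor; DBSCAN itself leaves that choice open, so other implementations may differ on such points.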
