Exascale Machine Learning Technologies

Project Details

The ExaLearn Co-design Center continues to advance how artificial intelligence (AI) and machine learning (ML) are developed to run on the world’s fastest supercomputers. In addition to providing scalable AI/ML tools that enhance Exascale Computing Project (ECP) applications, the center is improving the efficiency and effectiveness of US Department of Energy (DOE) leadership-class computing resources and large-scale experimental user facilities. ExaLearn focuses on four classes of learning problems: using ML to develop surrogate models, inverse solvers, control policies, and design strategies. Each class is being demonstrated on a different ECP application area through a focused co-design process that targets widely used learning methods, including deep neural networks, transformer methods, kernel and tensor methods, decision trees, ensemble methods, graphical models, and reinforcement learning.

To understand the limitations posed by constraints related to application development costs, application fidelity, performance portability, scalability, and power efficiency, ExaLearn has engaged directly with developers of ECP hardware, system software, programming models, learning algorithms, and applications. These collaborations have enhanced the program by:

  • Reducing the development risk of ML software for ECP application teams by investigating crucial performance trade-offs related to implementing and applying learning methods in science and engineering
  • Producing high-performance implementations of learning methods
  • Enabling simple, efficient integration of those methods with applications
  • Contributing to the co-design of effective exascale applications, software, and hardware.

To replace costly simulation methods, ExaLearn uses the latest techniques in generative adversarial networks and variational autoencoders to construct fast, accurate surrogate models. A notable application is computational cosmology, where fast neural network emulators replace complex N-body and hydrodynamics algorithms. ExaLearn is also applying ML-based inverse solvers to extract complex materials structure from neutron scattering data at Oak Ridge National Laboratory’s Spallation Neutron Source. For optimal control and steering of complex computer simulation workflows, ExaLearn provides scalable ML software for various ECP applications.
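The surrogate-model idea can be illustrated with a deliberately small sketch. It is not ExaLearn code and does not use generative adversarial networks or variational autoencoders; it swaps in a simple Chebyshev polynomial fit so the pattern (sample an expensive solver, fit a cheap emulator, check it on held-out points) fits in a few lines. The function `expensive_simulation` and all parameter values are hypothetical stand-ins.

```python
import numpy as np

def expensive_simulation(x):
    """Hypothetical stand-in for a costly solver (not an ExaLearn code)."""
    return np.sin(3.0 * x) + 0.5 * x**2

def fit_surrogate(xs, ys, degree=9):
    """Fit a cheap polynomial emulator to samples of the simulation."""
    return np.polynomial.Chebyshev.fit(xs, ys, degree)

# Run the "expensive" solver at a modest number of training points.
train_x = np.linspace(-1.0, 1.0, 40)
train_y = expensive_simulation(train_x)
surrogate = fit_surrogate(train_x, train_y)

# Compare the emulator with the solver on a denser grid of test points.
test_x = np.linspace(-1.0, 1.0, 200)
max_error = float(np.max(np.abs(surrogate(test_x) - expensive_simulation(test_x))))
print(f"max surrogate error on test points: {max_error:.2e}")
```

Once fitted, the surrogate can be evaluated in place of the solver wherever its error is acceptable; a neural network emulator follows the same sample, fit, and validate pattern at far larger scale.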

EXARL, a software framework that enables exascale reinforcement learning for science and benchmarking, is demonstrating and testing an initial use case for the temperature control of block copolymer self-annealing in light source experiments on DOE Leadership Computing facilities. In the design area, ExaLearn is tailoring reinforcement learning algorithms with physics-aware ML algorithms to develop interpretable ML models for use with graph-based models of atomic/molecular structure (e.g., generating novel electrolyte molecules and water cluster models). ExaLearn’s design and control groups also have created a reinforcement learning pipeline for graph-based networks.
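As a rough illustration of reinforcement learning for a control task, the sketch below trains a tabular Q-learning agent to steer a discretized temperature toward a set point. It does not use EXARL or model block copolymer annealing; the environment, the reward, and every constant (`TARGET`, `N_STATES`, the learning rates) are hypothetical choices made only to keep the example self-contained.

```python
import random

TARGET = 5            # desired temperature bin (hypothetical)
N_STATES = 11         # temperature bins 0..10
ACTIONS = (-1, 0, 1)  # cool, hold, heat

def step(state, action):
    """Apply an action, clamp to the valid range, return (next_state, reward)."""
    next_state = min(N_STATES - 1, max(0, state + action))
    return next_state, -abs(next_state - TARGET)

def greedy_action(q, state):
    """Index of the highest-valued action in this state (first one on ties)."""
    return max(range(len(ACTIONS)), key=lambda i: q[state][i])

def train(episodes=500, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES)  # random starting temperature
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = greedy_action(q, state)      # exploit
            next_state, reward = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted best next value.
            td_target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (td_target - q[state][a])
            state = next_state
    return q

def greedy_rollout(q, start, steps=15):
    """Follow the learned greedy policy and return the final state."""
    state = start
    for _ in range(steps):
        state, _ = step(state, ACTIONS[greedy_action(q, state)])
    return state

q_table = train()
print(greedy_rollout(q_table, 0), greedy_rollout(q_table, 10))
```

A table of values is enough here because the state space has only eleven bins; scientific control problems typically need function approximation and distributed training, which is the gap a framework such as EXARL is meant to address.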

To facilitate reproducible experiments by organizing and distributing data, ExaLearn has established and is populating a searchable catalog of ML training data. This system (https://petreldata.net/exalearn/) lets users browse, search, and access at high speed large quantities of data organized in forms suitable for training and testing ML models.

ExaLearn is succeeding in its goal to build a software tool set that can be applied to multiple problems within the DOE mission space, use exascale platforms directly, and provide essential components to an exascale workflow. This AI/ML tool set does not replicate capabilities easily obtainable from existing, widely available packages, and it builds in domain knowledge (e.g., physics, chemistry, biology) wherever possible. It quantifies uncertainty in its predictions and is designed to be interpretable, reproducible, and based on solid mathematical methods.

Principal Investigator(s):

Francis Alexander, Brookhaven National Laboratory

Collaborators:

Brookhaven National Laboratory, Argonne National Laboratory, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Oak Ridge National Laboratory, Pacific Northwest National Laboratory, Princeton University, Sandia National Laboratories
