Exascale Machine Learning Technologies

Project Details

The ExaLearn Co-design Center continues to advance how artificial intelligence (AI) and machine learning (ML) are developed to run on the world’s fastest supercomputers. In addition to providing scalable AI and ML tools that enhance Exascale Computing Project (ECP) applications, the center is improving the efficiency and effectiveness of US Department of Energy (DOE) leadership-class computing resources and large-scale experimental user facilities. ExaLearn focuses on four classes of learning problems: using ML to develop surrogate models, inverse solvers, control policies, and design strategies. Each class is demonstrated on a different ECP application area through a focused co-design process that targets common learning methods, including deep neural networks, kernel and tensor methods, decision trees, ensemble methods, graphical models, and reinforcement learning.

To understand the constraints imposed by application development costs, application fidelity, performance portability, scalability, and power efficiency, ExaLearn engages directly with developers of ECP hardware, system software, programming models, learning algorithms, and applications. These collaborations inform the program’s current goals, which are to:

  • reduce the development risk of ML software for ECP application teams by investigating crucial performance trade-offs related to implementing and applying learning methods in science and engineering,
  • produce high-performance implementations of learning methods,
  • enable simple, efficient integration of those methods with applications, and
  • contribute to the co-design of effective exascale applications, software, and hardware.

To replace costly simulation methods, ExaLearn is using the latest techniques in generative adversarial networks and variational autoencoders to construct fast, accurate surrogate models. Initial applications have been in computational cosmology, replacing complex N-body and hydrodynamics algorithms with fast neural network emulators. ExaLearn is also applying ML-based inverse solvers to back out complex materials structure from neutron scattering data at Oak Ridge National Laboratory’s Spallation Neutron Source. In the areas of optimal control and steering of complex computer simulation workflows, ExaLearn is providing scalable ML software for various ECP applications.

EXARL, a software framework that enables exascale reinforcement learning for science and benchmarking, is being demonstrated and tested on an initial use case: temperature control of block copolymer self-annealing in light source experiments, run on DOE Leadership Computing facilities. In the design area, ExaLearn is combining reinforcement learning algorithms with physics-aware ML algorithms to develop interpretable ML models for use with graph-based models of atomic/molecular structure (e.g., generating novel electrolyte molecules and water cluster models). ExaLearn’s design and control groups have also worked together to create a reinforcement learning pipeline for graph-based networks.

To facilitate reproducible experiments by organizing and distributing data, ExaLearn has established and is populating a searchable catalog of ML training data. This system (https://petreldata.net/exalearn/) holds large quantities of data organized in forms suitable for training and testing ML models, which can be browsed, searched, and accessed at high speeds.

ExaLearn is building a software tool set that can be applied to multiple problems within the DOE mission space, use exascale platforms directly, and provide essential components to an exascale workflow. This AI/ML tool set does not replicate capabilities easily obtainable from existing, widely available packages; instead, it builds in domain knowledge (e.g., physics, chemistry, biology) wherever possible. Its uncertainty quantification is intended to be predictive, interpretable, reproducible, and based on solid mathematical methods.

Principal Investigator(s):

Francis Alexander, Brookhaven National Laboratory

Collaborators:

Brookhaven National Laboratory, Argonne National Laboratory, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Oak Ridge National Laboratory, Pacific Northwest National Laboratory, Princeton University, Sandia National Laboratories
