By Scott Gibson
Hi, and welcome. This is where we explore the efforts of the Department of Energy’s (DOE’s) Exascale Computing Project (ECP)—from the development challenges and achievements to the ultimate expected impact of exascale computing on society.
As data prevalence, computational methods, and computer power have taken off in the last several years, so has machine learning, or ML, a subset of artificial intelligence. With ML, computers, software, and electronic devices can work in a manner similar to the human brain, performing natural language processing, making cars self-driving, personalizing our news feeds, and providing myriad other conveniences.
Moreover, ML technologies could be deeply significant in the realm of computational and experimental science and engineering, where breakthroughs that change our lives happen. ML is sure to play a part in the research that will be performed on the first exascale supercomputers deployed by DOE.
Not only are ML technologies creating inspiring new opportunities for scientific discovery, but they also hold promise for the design and use of exascale computers themselves. Both HPC for ML and ML for HPC appear to be on the horizon.
ECP initiated the ExaLearn co-design center in 2018 to harness ML. ExaLearn’s main goal is to provide exascale ML software for ECP applications, other ECP co-design centers, and DOE experimental facilities and leadership-class computing facilities.
ExaLearn, a collaboration of multiple DOE labs, is led by Frank Alexander, deputy director of the Computational Science Initiative at Brookhaven National Laboratory. An article titled “Co-design Center for Exascale Machine Learning Technologies (ExaLearn)” in The International Journal of High Performance Computing Applications by Alexander and ExaLearn collaborators provides an in-depth view of what ExaLearn is all about.
Our guest in this episode, Argonne National Laboratory materials scientist Logan Ward, is a researcher from the ExaLearn team. ExaLearn activities are varied. The team Logan is on does work involving intelligent workflows that use AI to decide which simulations to run. Examples of other types of ExaLearn research are making surrogates to operate in lieu of experiments and simulations, and reinforcement learning, which trains models to make a sequence of decisions.
We’ll get Logan’s perspective on the following: the broad context of the intersection of ML and HPC, ExaLearn’s different groups and main objectives, the specific work Logan’s involved in, challenges, successes, and more.
“The lines between what is machine learning and what is conventional HPC have blurred a lot. There are ways to use machine learning to tell you what simulations you should actually be running. There are ways of integrating machine learning directly into the simulation codes. And, all of these have really changed what it is to envision the high-performance computing applications of the future. There’s probably going to be machine learning at some level, and where ExaLearn comes in is in figuring out how that’s going to happen in a way that the broad HPC community can take into account.” —Logan Ward, assistant scientist at Argonne National Laboratory
(4:09) The broad context for the connection between ML and HPC
(5:35) A summary description of the ExaLearn Co-design Center
(6:50) Logan’s history with ExaLearn
(7:57) What Logan’s ExaLearn team is working on
(9:52) The challenges the team has faced
(12:29) The team’s successes
(13:45) The implications of what’s been accomplished
(15:25) What about exascale excites Logan
(17:02) The next steps
- The ExaLearn page on the ECP website
- A technical talk about the ExaLearn work of Logan Ward and colleagues
- IEEE paper: “Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing”
- Article: “Co-design Center for Exascale Machine Learning Technologies (ExaLearn)” in the International Journal of High Performance Computing Applications
- Previous ECP podcast episode featuring ExaLearn—Episode 63: Delivering Exascale Machine Learning Algorithms and Tools for Scientific Research