By Scott Gibson
Many people who follow the activities of the US Department of Energy’s (DOE) Exascale Computing Project (ECP) may already be familiar with the co-design process. It targets crosscutting algorithmic methods that capture the most common patterns of computation and communication (known as motifs) in ECP’s science applications.
ECP has six co-design centers, one of which is CODAR: Co-Design for Online Data Analysis and Reduction at the Exascale. Ian Foster, CODAR’s principal investigator, is a guest on the Let’s Talk Exascale podcast. The interview took place this past November in Denver at SC19: The International Conference for High Performance Computing, Networking, Storage, and Analysis.
Foster has been a scientist at Argonne National Laboratory (ANL) for thirty years and is director of the lab’s Data Science and Learning Division. He was excited to work on the eight-processor parallel computers of the early 1990s and experience the progression from terascale to petascale capabilities. Now he’s ready to help bring in the next big technological advance—exascale computing.
“I’m interested in the methods required to allow those systems to be used for challenging scientific problems,” Foster said.
Although exascale supercomputers will be much more powerful than today’s systems and will open the door to solving previously intractable science problems, the traditional approach of writing out all the data produced and analyzing it later won’t be possible.
“There just isn’t enough I/O capacity on those systems,” Foster said. “So we need to perform analysis online on the supercomputer as the simulation is being performed.”
Analyzing data in situ, or on the compute nodes, requires investigating new methods of data analysis and reduction and developing mechanisms to couple them with simulations. The CODAR team is addressing that need and integrating the new methods into ECP science applications.
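The idea of reducing data on the compute nodes rather than writing everything to storage can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `simulate_step`, `reduce_in_situ`, and `run` are hypothetical names, the "simulation" is just a sine wave, and the reduction is a simple min/max/mean summary. It is not CODAR's software or method, only a picture of the general pattern.

```python
import math


def simulate_step(step, n=1000):
    """Toy stand-in for one simulation time step: returns a field of n values."""
    return [math.sin(0.01 * i + 0.1 * step) for i in range(n)]


def reduce_in_situ(field):
    """Reduce a full field to three summary values while it is still in memory."""
    return {
        "min": min(field),
        "max": max(field),
        "mean": sum(field) / len(field),
    }


def run(num_steps=50, n=1000):
    """Couple the toy simulation with the reduction, keeping only summaries."""
    summaries = []
    for step in range(num_steps):
        field = simulate_step(step, n)            # full data exists only in memory
        summaries.append(reduce_in_situ(field))   # analysis runs alongside the simulation
        # The full field is dropped here; only the summary is retained.
    values_produced = num_steps * n
    values_kept = num_steps * 3                   # min, max, mean per step
    return summaries, values_produced / values_kept


if __name__ == "__main__":
    summaries, reduction_factor = run()
    print(f"kept {len(summaries)} summaries, reduction factor ~{reduction_factor:.0f}x")
```

In this sketch the reduction factor is fixed by how many values are kept per step; a real workflow would choose what to keep based on what the science requires.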
The team works closely with different groups across ECP, including application teams facing data-analysis and reduction problems, groups developing various system software components, and people developing compression methods. They also collaborate with vendors to understand how to implement these methods efficiently on their computers. In addition, feedback flows between CODAR and the vendors as they identify new requirements for specific areas such as system software.
CODAR collaborates closely with ECP’s WDMApp project, which aims to use exascale computing to provide a whole device modeling capability for magnetically confined fusion plasmas.
“That’s a fascinating application project scientifically, certainly, but technically, they need to couple together multiple simulation components and then they have to do reduction and analysis online,” Foster said. “We’ve developed mechanisms that allow them to run all components online and reduce their data by two orders of magnitude, at this point.”
When compressing data produced by a simulation, the idea is to keep the parts that are scientifically interesting and toss those that are not. However, every application and, perhaps, every scientist has a different definition of what “interesting” means in that context. So, CODAR has developed a system called Z-checker to let users evaluate the quality of a compression method.
“It allows people to check the quality of a compression method from potentially dozens of different perspectives,” Foster said. “And we’re starting to see that technique being deployed pretty widely across different applications.”
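To make "checking quality from different perspectives" concrete, here is a minimal sketch that compares original data with a lossy reconstruction using a few common measures: maximum absolute error, root-mean-square error, peak signal-to-noise ratio, and Pearson correlation. This is not Z-checker's interface. The function names `quantize` and `quality_report` are hypothetical, the choice of metrics is an assumption, and the "compressor" is a simple uniform quantizer standing in for a real error-bounded method.

```python
import math


def quantize(data, error_bound):
    """Toy lossy 'compressor': snap each value to a grid of width 2 * error_bound.

    The pointwise error stays within error_bound (up to floating-point rounding).
    """
    step = 2.0 * error_bound
    return [round(x / step) * step for x in data]


def quality_report(original, reconstructed):
    """Compare original and reconstructed data from several perspectives."""
    n = len(original)
    errors = [r - o for o, r in zip(original, reconstructed)]

    max_abs_error = max(abs(e) for e in errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # PSNR defined here relative to the value range of the original data.
    value_range = max(original) - min(original)
    psnr = float("inf") if rmse == 0 else 20.0 * math.log10(value_range / rmse)

    # Pearson correlation between original and reconstructed values
    # (assumes neither series is constant).
    mean_o = sum(original) / n
    mean_r = sum(reconstructed) / n
    cov = sum((o - mean_o) * (r - mean_r) for o, r in zip(original, reconstructed))
    var_o = sum((o - mean_o) ** 2 for o in original)
    var_r = sum((r - mean_r) ** 2 for r in reconstructed)
    pearson = cov / math.sqrt(var_o * var_r)

    return {
        "max_abs_error": max_abs_error,
        "rmse": rmse,
        "psnr_db": psnr,
        "pearson": pearson,
    }


if __name__ == "__main__":
    data = [math.sin(0.01 * i) for i in range(2000)]
    report = quality_report(data, quantize(data, error_bound=1e-3))
    for name, value in report.items():
        print(f"{name}: {value:.6g}")
```

Different scientists may care about different rows of such a report: one might need a strict bound on the worst-case error, while another cares mostly about preserving overall correlation.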
Foster envisions high-performance computing simulations moving from the traditional mode, in which a single simulation runs across an entire computer and outputs its data for later use, to a far more dynamic world in which many components may be running simultaneously.
“Reduction and analysis are being performed at the same time as the simulation,” he said. “In some cases, many different simulations are being performed simultaneously, and we want to be the people who provide the infrastructure that will make that possible. So, we see ourselves as ushering in a new approach to scientific computing that we think will become the norm on exascale computers.”