Reducing the Memory Footprint and Data Movement on Exascale Systems

04/16/20

By Scott Gibson

As computers have become increasingly powerful and capable of performing ever more computations per second, the technology for transferring data and moving it around the memory hierarchy has not kept pace. Consequently, one of the main hurdles to achieving high performance at exascale (the next big leap in computing) is limiting data movement between processing cores, memory banks, distributed compute nodes, and offline storage media. An effort called ZFP: Compressed Floating-Point Arrays, part of the US Department of Energy’s (DOE) Exascale Computing Project (ECP), is directed at surmounting that challenge.

Peter Lindstrom of Lawrence Livermore National Laboratory

ZFP is a compressed representation of the multidimensional floating-point arrays that are ubiquitous in high-performance computing. “It was designed to reduce the memory footprint and offline storage of large arrays as well as to reduce data movement by effectively using fewer bits to represent numerical values,” said Peter Lindstrom of Lawrence Livermore National Laboratory, who manages all activities related to the ZFP project, which has been folded into ECP’s ALPINE in-situ visualization project. The principal investigator of ALPINE/ZFP is James Ahrens of Los Alamos National Laboratory.
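To make “fewer bits per value” concrete, here is some back-of-the-envelope arithmetic. It is not ZFP code: it assumes a fixed budget of bits per value (as in ZFP’s fixed-rate mode), ignores any per-block overhead or padding, and uses a hypothetical 1024 × 1024 × 1024 grid of double-precision values.

```python
DOUBLE_BITS = 64  # an uncompressed IEEE double uses 64 bits per value


def storage_bytes(num_values, bits_per_value):
    """Bytes needed to store num_values at a fixed number of bits per value."""
    return num_values * bits_per_value // 8


def compression_ratio(bits_per_value):
    """How many times smaller than raw doubles a given bit budget is."""
    return DOUBLE_BITS / bits_per_value


n = 1024 ** 3  # hypothetical 1024^3 grid
raw = storage_bytes(n, DOUBLE_BITS)  # 8 GiB uncompressed
for rate in (16, 8, 4):
    size_gib = storage_bytes(n, rate) / 2 ** 30
    print(f"{rate} bits/value: {size_gib} GiB, {compression_ratio(rate)}x smaller")
```

Under these assumptions, a reduction of one order of magnitude corresponds to roughly 6.4 bits per value instead of 64.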

Lindstrom’s background is in scientific visualization and computer graphics; however, during the last ten to fifteen years, he has primarily focused on data compression and related techniques for reducing data movement.

He took time to share insights about ZFP on ECP’s podcast, Let’s Talk Exascale, during an interview this past November in Denver at SC19: The International Conference for High Performance Computing, Networking, Storage, and Analysis.

The ZFP project aims to provide a compressed multidimensional array primitive that can be used for computations. “To perform those computations, we oftentimes need random access to individual array elements,” Lindstrom said. “Doing that, coupled with data compression, is extremely challenging.”
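One way to reconcile compression with random access, and the idea behind ZFP’s fixed-rate mode, is to split the array into small, independently compressed blocks (4 × 4 × 4 values in 3D) that each compress to the same number of bits, so the location of any block can be computed directly. The sketch below assumes that layout with blocks stored in raster order, first dimension fastest, and ignores any padding or alignment; the function name is hypothetical.

```python
BLOCK_EDGE = 4  # blocks hold 4^d values in d dimensions


def block_bit_offset(index, shape, rate):
    """Bit offset of the compressed block that contains element `index`.

    Assumes fixed-rate storage: every block occupies exactly
    rate * 4**d bits, and blocks are laid out in raster order with the
    first dimension varying fastest (an illustrative layout).
    """
    d = len(shape)
    bits_per_block = rate * BLOCK_EDGE ** d
    # Number of blocks along each dimension (partial blocks are padded).
    blocks = [(n + BLOCK_EDGE - 1) // BLOCK_EDGE for n in shape]
    # Block coordinates of the requested element.
    coords = [i // BLOCK_EDGE for i in index]
    # Linearize the block coordinates, first dimension fastest.
    linear, stride = 0, 1
    for c, count in zip(coords, blocks):
        linear += c * stride
        stride *= count
    return linear * bits_per_block


# Element (5, 9, 2) of a 100^3 array at 8 bits/value lies in block 51,
# which starts 51 * 512 = 26112 bits into the stream.
print(block_bit_offset((5, 9, 2), (100, 100, 100), rate=8))
```

Because every block has the same size, reading one element means decoding only the 64 values of its block rather than the whole array.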

Although the ZFP team works with applications, its members don’t necessarily know how the loss introduced by lossy compression might affect the simulation code and the subsequent data analysis. “So we need to develop or work closely with the applications to understand what kinds of things matter to them in terms of how to maintain accuracy,” Lindstrom said.

One of the key developments of the ZFP effort is what the team calls variable-rate compressed arrays. As values change during a floating-point computation, the amount of storage needed to represent them in compressed form both shrinks and grows. “Maintaining where the information is stored when we’re having to shuffle it around and so forth is a great challenge, and that’s something that we’ve accomplished very recently,” he said. “It allows us to essentially provide much higher accuracy for the same bit budget, if you will.
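The bookkeeping problem Lindstrom describes can be illustrated with a toy model. This is not how ZFP’s variable-rate arrays are implemented, and the class and method names are hypothetical: it simply assumes compressed blocks are packed back to back, so a change in one block’s size moves every block stored after it.

```python
class VariableRateStore:
    """Toy bookkeeping for variable-size compressed blocks (hypothetical).

    Blocks are packed back to back in one bit stream, so locating block k
    requires an index of offsets derived from every block's current size.
    """

    def __init__(self, block_sizes):
        self.sizes = list(block_sizes)  # compressed size of each block, in bits

    def offsets(self):
        """Starting bit of each block: an exclusive prefix sum of the sizes."""
        result, pos = [], 0
        for size in self.sizes:
            result.append(pos)
            pos += size
        return result

    def total_bits(self):
        return sum(self.sizes)

    def resize_block(self, k, new_size):
        """Record that block k was re-encoded at a new size.

        Returns the change in size; every block after k shifts by this
        amount, which is what makes in-place updates hard to do efficiently.
        """
        delta = new_size - self.sizes[k]
        self.sizes[k] = new_size
        return delta


store = VariableRateStore([512, 300, 700])
print(store.offsets())      # [0, 512, 812]
store.resize_block(1, 450)  # block 1 grew by 150 bits after an update
print(store.offsets())      # [0, 512, 962]
```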

“We have shown in one case that in some of these numerical computations we can provide six to seven orders of magnitude higher accuracy than conventional floating-point representations for the same, or even less, amount of storage. So that’s one of the big highlights of this project so far.”

Lindstrom said that because the hardware of the forthcoming exascale machines will be unprecedented, ECP teams porting codes to these unconventional systems will have to ensure not only that the codes run but also that they run efficiently. He added that many of the applications were developed over the last several decades in programming languages such as Fortran, while ZFP itself is written in C and C++, so ZFP has to be made accessible from those older languages as well.

“So we’re having to develop new language bindings to make sure that they can take advantage of this from different programming languages as well as running on different kinds of accelerators like the GPUs that will be available to us,” Lindstrom said.

He said that over the next few years, the ZFP project will focus on fully parallelizing all components of ZFP and ensuring that it runs efficiently on the nation’s first exascale supercomputers: Aurora, at the Argonne Leadership Computing Facility, and Frontier, at the Oak Ridge Leadership Computing Facility (both coming in 2021); and El Capitan, at Lawrence Livermore National Laboratory (coming in 2023). He added that the team will develop new features in support of ECP applications and work toward filling existing gaps in ZFP functionality.

In Lindstrom’s view, the ZFP project’s enduring legacy will be that it opened up new ways of doing science that were not possible before.

He said ECP application teams have used ZFP to shrink their input/output (I/O) dumps considerably, so that they can store data at much higher resolution than they otherwise could, an important advance for severe-weather simulations, for example.

Researchers at the University of Wisconsin who are running high-resolution tornado simulations and want to analyze the data at sub-second resolution lack the I/O bandwidth and storage to write out all of the resulting data uncompressed. “By using ZFP to reduce the data by one or two orders of magnitude, that actually enables new insight into these tornado events,” Lindstrom said. “So that’s one example of what data compression can do for you. It’s not always about reducing the data, but may be providing higher fidelity, higher resolution, higher accuracy, and so forth for the same amount of storage.”