Computational science applications generate massive amounts of data from which scientists must extract information and visualize the results. Performing the visualization and analysis tasks in situ, while the simulation is running, can improve the use of computational resources and reduce the time that scientists must wait for their results. The ALPINE/ZFP project is delivering in situ visualization and analysis infrastructure and algorithms, including a data compression capability for floating-point arrays to reduce memory, communication, I/O, and offline storage costs.
Visualizations are some of the easiest and most impactful ways to convey information and explore scientific data. They aren’t limited to plots and graphs but include everything from weather radar maps to three-dimensional models of molecules.
As scientific datasets get larger, there is a growing need to minimize the amount of data saved to disk. The Exascale Computing Project (ECP) developed a set of products—ALPINE and zfp—to address this need. zfp provides lossy data compression where the user can control the accuracy and the amount of compression. ALPINE provides a set of tools to do visualization and analysis on simulations in situ—in other words, as the simulation is running on the supercomputer.
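zfp’s real codec uses block transform coding, but the core idea of accuracy-controlled lossy compression can be illustrated with a toy sketch. The example below (plain NumPy, not zfp’s actual algorithm) shows the tradeoff the user controls: a chosen error tolerance determines how coarsely values are quantized, shrinking the data while bounding the reconstruction error.

```python
import numpy as np

def lossy_compress(data, tolerance):
    """Toy stand-in for error-bounded lossy compression.

    Uniform quantization with step 2*tolerance guarantees the
    reconstruction error never exceeds the tolerance. zfp's real
    codec is far more sophisticated (block transforms, embedded
    bit-plane coding), but offers the same kind of error control.
    """
    step = 2.0 * tolerance
    quantized = np.round(data / step).astype(np.int16)  # 2 bytes vs 8
    return quantized, step

def lossy_decompress(quantized, step):
    return quantized.astype(np.float64) * step

data = np.linspace(0.0, 1.0, 1000)          # 8000 bytes as float64
quantized, step = lossy_compress(data, tolerance=1e-3)
restored = lossy_decompress(quantized, step)

assert np.max(np.abs(restored - data)) <= 1e-3  # user-controlled accuracy
assert quantized.nbytes < data.nbytes            # 4x smaller here
```

Tightening the tolerance improves accuracy at the cost of compression; loosening it does the opposite. This is the same knob zfp exposes in its fixed-accuracy mode.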
Under ECP, ALPINE built upon and improved long-standing visualization applications that were previously developed as part of Department of Energy projects—ParaView, Catalyst, and VisIt—in addition to developing a new in situ application, Ascent. ALPINE analysis algorithms can be used to filter data or to identify interesting features that require additional investigation and collect more data around those features, all in one go. This bypasses having to wait for a post hoc analysis before running another study.
Meanwhile, zfp provides a complementary capability. By integrating directly with data input and output tools, zfp can compress data as it is being written to disk. It can also compress data in memory while the simulation is running, rather than waiting to do so post hoc. These capabilities help make scientific simulations more efficient by enabling access to data at any given step.
ALPINE and zfp are both easily integrated with other ECP products and can be taken as-is or plugged into other applications, depending on the needs of each specific use case.
Open source and publicly released, ALPINE and zfp can both run on a variety of central processing units (CPUs) and graphics processing units (GPUs—a type of accelerator for improving computational efficiency). The two have been used for a diverse set of scientific applications, including earthquake simulations, studying electrons in a particle accelerator, and modeling a wind turbine farm. The teams anticipate the software will continue to be integrated and leveraged by the scientific community on this new generation of exascale computers.
Many high-performance simulation codes write data to disk to visualize and analyze it after the simulation is completed. Given exascale I/O bandwidth constraints, visualization and analysis must instead be performed in situ to fully use exascale resources. In situ data analysis and visualization selects, analyzes, reduces, and generates extracts from a scientific simulation while the simulation is running, overcoming the bandwidth and storage bottlenecks associated with writing the full simulation results to the file system. The ALPINE/ZFP project produces in situ visualization and analysis infrastructure that will be used by the exascale applications, along with a lossy compression capability for floating-point arrays.
The ALPINE development effort focuses on delivering exascale visualization and analysis algorithms that will be critical for exascale applications; developing an exascale-capable infrastructure for in situ algorithms and deploying it into existing applications, libraries, and tools; and engaging with exascale application teams to integrate ALPINE with their software. This capability will leverage existing, successful software (ParaView/Catalyst and VisIt) and a new lightweight infrastructure, Ascent. ALPINE capabilities will be integrated into these infrastructures for deployment in exascale science codes to address exascale challenges.
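The control flow behind the in situ approach can be schematized as follows. This is a plain-Python sketch of the pattern only—the field, callback, and extract contents are illustrative, not ALPINE’s or Ascent’s actual API: analysis runs inside the simulation loop and keeps small extracts while the full data is still in memory, instead of writing every full field to disk for post hoc processing.

```python
import numpy as np

def simulate_step(step, n=100_000):
    """Stand-in for one timestep of a simulation producing a large field."""
    rng = np.random.default_rng(step)
    return rng.standard_normal(n)

def in_situ_extract(field):
    """Reduce a full field to a tiny extract while it is still in memory."""
    return {"min": float(field.min()),
            "max": float(field.max()),
            "mean": float(field.mean())}

extracts = []
for step in range(10):
    field = simulate_step(step)              # full data exists only transiently
    extracts.append(in_situ_extract(field))  # keep a few numbers per step

# Post hoc, only the small extracts need to be read back,
# not 10 full fields of 100,000 values each.
```

In a real deployment the extract step would be an ALPINE algorithm or an Ascent pipeline (rendered images, isosurfaces, statistics) invoked by the simulation each timestep; the point here is only that the reduction happens before anything touches the file system.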
Overcoming the performance cost of data movement is also critical. With deepening memory hierarchies and dwindling per-core memory bandwidth due to increasing parallelism, even on-node data motion causes significant performance bottlenecks and is a primary source of power consumption. The ZFP software is a floating-point array primitive that mitigates this problem by using very high-speed, lossy (but optionally error-bounded) compression to significantly reduce data volumes and I/O times. The ZFP development effort focuses on (1) extending ZFP to make it more readily usable in an exascale computing setting by parallelizing it on both CPUs and GPUs while ensuring thread safety, (2) providing bindings for multiple programming languages, (3) adding new functionality, (4) hardening the software and adopting best practices for software development, and (5) integrating ZFP with a variety of exascale applications, I/O libraries, and software tools.