ALPINE/ZFP

Computational science applications generate massive amounts of data from which scientists must extract information and visualize the results. Performing the visualization and analysis tasks in situ while the simulation is running can improve the use of computational resources and reduce the time that scientists must wait for their results. The ALPINE/ZFP project is delivering in situ visualization and analysis infrastructure and algorithms, including a data compression capability for floating-point arrays to reduce memory, communication, I/O, and offline storage costs.

Project Details

Many high-performance simulation codes write data to disk to visualize and analyze it after the simulation is completed. Given the exascale I/O bandwidth constraints, this process must be performed in situ to fully use the exascale resources. In situ data analysis and visualization selects, analyzes, reduces, and generates extracts from a scientific simulation while the simulation is running to overcome bandwidth and storage bottlenecks associated with writing the full simulation results to the file system. The ALPINE/ZFP project produces in situ visualization and analysis infrastructure that will be used by the exascale applications, along with a lossy compression capability for floating-point arrays.
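As a rough illustration of the in situ pattern described above, the following minimal Python sketch runs a toy diffusion loop and keeps only small per-cycle summaries instead of the full field. It is not ALPINE, Ascent, or Catalyst code. The function names (`step`, `in_situ_extract`, `run`), the toy simulation, and the choice of min/max/mean as the extract are all assumptions made for illustration.

```python
# Toy illustration of in situ reduction; not ALPINE, Ascent, or Catalyst code.
# All names here (step, in_situ_extract, run) are hypothetical.
from statistics import mean


def step(field, alpha=0.1):
    """Advance a 1D diffusion-like field by one explicit time step (fixed ends)."""
    new = field[:]
    for i in range(1, len(field) - 1):
        new[i] = field[i] + alpha * (field[i - 1] - 2 * field[i] + field[i + 1])
    return new


def in_situ_extract(cycle, field):
    """Reduce the full field to a few summary values while it is still in memory."""
    return {"cycle": cycle, "min": min(field), "max": max(field), "mean": mean(field)}


def run(n_cells=1000, n_cycles=50, every=10):
    """Run the toy simulation, keeping only small extracts rather than full fields."""
    field = [0.0] * n_cells
    field[n_cells // 2] = 1.0  # initial spike in the middle
    extracts = []
    for cycle in range(1, n_cycles + 1):
        field = step(field)
        if cycle % every == 0:
            extracts.append(in_situ_extract(cycle, field))
    return extracts


extracts = run()
for e in extracts:
    print(e)
```

Each extract holds four numbers, whereas the full field holds 1,000 values per cycle. A production in situ pipeline would generate much richer extracts (images, isosurfaces, statistics), but the trade of full output for reduced data is the same.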

The ALPINE development effort focuses on delivering exascale visualization and analysis algorithms that will be critical for exascale applications; developing an exascale-capable infrastructure for in situ algorithms and deploying it into existing applications, libraries, and tools; and engaging with exascale application teams to integrate ALPINE with their software. This capability will leverage existing, successful software (ParaView/Catalyst and VisIt) and a new lightweight infrastructure, Ascent. ALPINE capabilities will be integrated into these infrastructures for deployment in exascale science codes to address exascale challenges.

Overcoming the performance cost of data movement is also critical. With deepening memory hierarchies and dwindling per-core memory bandwidth due to increasing parallelism, even on-node data motion causes significant performance bottlenecks and is a primary source of power consumption. The ZFP software is a floating-point array primitive that mitigates this problem by using very high-speed, lossy (but optionally error-bounded) compression to significantly reduce data volumes and I/O times. The ZFP development effort focuses on (1) extending ZFP to make it more readily usable in an exascale computing setting by parallelizing it on both CPU and GPU while ensuring thread safety, (2) providing bindings for multiple programming languages, (3) adding new functionality, (4) hardening the software and adopting best practices for software development, and (5) integrating ZFP with a variety of exascale applications, I/O libraries, and software tools.
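To make the idea of error-bounded lossy compression concrete, here is a minimal Python sketch that guarantees an absolute error tolerance by uniform quantization followed by a general-purpose lossless coder (`zlib`). This is a deliberately simple stand-in and is not ZFP's algorithm or API. The function names `compress` and `decompress`, the test signal, and the tolerance value are assumptions chosen for illustration.

```python
# Toy error-bounded lossy compressor: uniform quantization + zlib.
# This is NOT ZFP's algorithm; it only illustrates an absolute error tolerance.
import math
import struct
import zlib


def compress(values, tolerance):
    """Quantize each value to a multiple of 2*tolerance, then deflate the integers."""
    step = 2.0 * tolerance
    codes = [round(v / step) for v in values]
    payload = struct.pack(f"<{len(codes)}i", *codes)  # 32-bit signed integers
    return zlib.compress(payload)


def decompress(blob, tolerance):
    """Invert compress(): inflate, unpack the integers, and rescale."""
    step = 2.0 * tolerance
    payload = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(payload) // 4}i", payload)
    return [c * step for c in codes]


# A smooth test signal, counted below as 64-bit floats (8 bytes per value).
data = [math.sin(0.01 * i) for i in range(10_000)]
tol = 1e-3
blob = compress(data, tol)
restored = decompress(blob, tol)

raw_bytes = 8 * len(data)
max_error = max(abs(a - b) for a, b in zip(data, restored))
print(f"raw={raw_bytes} B, compressed={len(blob)} B, max error={max_error:.2e}")
```

Rounding to the nearest multiple of `2 * tolerance` keeps every reconstructed value within `tolerance` of the original, up to floating-point rounding. The sketch trades speed and compression quality for brevity, so it says nothing about ZFP's actual performance.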

Principal Investigator(s):

Jim Ahrens, Los Alamos National Laboratory

Collaborators:

Los Alamos National Laboratory, Lawrence Berkeley National Laboratory, Oak Ridge National Laboratory, Lawrence Livermore National Laboratory, Argonne National Laboratory, Sandia National Laboratories, University of Oregon, University of Utah, University of Leeds, Kitware Inc.

Progress to date

  • The ALPINE/ZFP team continued to expand the functionality of its core infrastructures: Ascent, VisIt, and ParaView/Catalyst. This included adding a new Catalyst adaptor for easier data description, adding derived quantities to Ascent, developing Exascale Computing Project (ECP) continuous integration workflows, and improving performance. The current focus is on integrating with key ECP clients and porting to ECP proxy architectures.
  • The team is continuing to develop a suite of in situ algorithms and capabilities that produce reduced data abstracts and visualizations needed by exascale partners. These include statistical feature exploration, Lagrangian flow analysis, importance-driven sampling, task-based feature detection, scalable topology, and optimal viewpoint. Algorithms leverage VTK-m for cross-platform portability and for integration into the ALPINE infrastructure.
  • The team developed new ZFP capabilities, including variable-rate CUDA compression support, a HIP backend, support for 4D arrays, and new C and Python application programming interfaces for interacting with ZFP’s C++ compressed-array classes. A significant code refactoring effort consolidated redundant code to simplify maintenance and development, allowing ZFP’s fixed-rate, variable-rate, and read-only array and view classes to share a common code base. New and updated Spack, Conda, and pip packages are also available.
