The ECP’s software portfolio includes a large collection of data management and visualization products that provide essential capabilities for compressing, analyzing, moving, and managing data. These tools are becoming even more important as the volume of simulation data grows faster than the ability to capture and interpret it.
Objective: Support efficient I/O and code coupling services
Exascale architectures will have complex, heterogeneous memory hierarchies, ranging from node-level caches and main memory to persistent storage behind the file system, that applications must use effectively to achieve their science goals. At the same time, exascale applications have increasingly complex data flows, from multiscale and multiphysics simulations that exchange data between separate codes to simulations that invoke data analysis and visualization services.
Principal Investigators: Scott Klasky, Oak Ridge National Laboratory
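The step-based coupling pattern described above, in which one code publishes data that another consumes without going through the file system, can be illustrated with a minimal sketch. The `Stage` class and its methods are hypothetical stand-ins for illustration only, not the API of any ECP product:

```python
from collections import deque

class Stage:
    """Toy in-memory staging area: a producer publishes named arrays
    per simulation step; a consumer drains completed steps in order."""
    def __init__(self):
        self._steps = deque()

    def put_step(self, step, variables):
        # variables: dict mapping variable name -> list of floats
        self._steps.append((step, dict(variables)))

    def get_step(self):
        # Returns (step, variables), or None when nothing is staged.
        return self._steps.popleft() if self._steps else None

# Producer (simulation) side: publish two steps of a field.
stage = Stage()
for step in range(2):
    field = [float(step + i) for i in range(4)]
    stage.put_step(step, {"temperature": field})

# Consumer (analysis) side: read steps back without touching storage.
step, data = stage.get_step()
print(step, data["temperature"])  # 0 [0.0, 1.0, 2.0, 3.0]
```

Real coupling services add parallelism, flow control, and transport selection (memory, network, or files); the sketch only shows the producer/consumer step model.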
Objective: Support efficient I/O, I/O monitoring and data services
Exascale applications generate massive amounts of data that need to be analyzed and stored to achieve their science goals. The speed at which the data can be written to the storage system is a critical factor in achieving these goals. As exascale architectures become more complex, with multiple compute nodes, accelerators, and heterogeneous memory systems, the storage technologies must evolve to support these architectural features.
Principal Investigators: Rob Ross, Argonne National Laboratory
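Since write speed is the critical factor named above, a simple way to make the concern concrete is to measure achieved write bandwidth. The following is a minimal, machine-dependent sketch using only the Python standard library; it measures one buffered-and-synced sequential write, not the behavior of any parallel file system:

```python
import os
import tempfile
import time

def write_bandwidth(nbytes, chunk=1 << 20):
    """Write nbytes of zeros in chunk-sized pieces to a scratch file,
    fsync it, and return the achieved bandwidth in MB/s."""
    buf = bytes(chunk)
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            written = 0
            while written < nbytes:
                n = min(chunk, nbytes - written)
                f.write(buf[:n])
                written += n
            f.flush()
            os.fsync(f.fileno())  # include time to reach the device
        elapsed = time.perf_counter() - start
        return nbytes / elapsed / 1e6
    finally:
        os.remove(path)

print(f"{write_bandwidth(8 << 20):.0f} MB/s")  # value depends on the machine
```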
Objective: Provide VTK-based scientific visualization software that supports shared-memory parallelism
As exascale simulations generate data, scientists need to extract information and understand their results. One of the primary mechanisms for understanding these results is to produce visualizations that can be viewed and manipulated. The VTK-m project is developing and deploying scientific visualization software capable of efficiently using exascale architectural features, such as the shared-memory parallelism available on many-core CPUs and GPUs.
Principal Investigators: Ken Moreland, Oak Ridge National Laboratory
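The shared-memory data parallelism mentioned above boils down to applying an independent per-element operation across a whole field at once. A minimal Python sketch of that pattern follows; it uses a thread pool as a stand-in for the device-level schedulers VTK-m actually targets, and `magnitude` is a hypothetical example operation, not a VTK-m worklet:

```python
from concurrent.futures import ThreadPoolExecutor

def magnitude(point):
    # Per-point operation: independent work, no shared mutable state,
    # so every point can be processed concurrently.
    x, y, z = point
    return (x * x + y * y + z * z) ** 0.5

points = [(1.0, 0.0, 0.0), (0.0, 3.0, 4.0), (2.0, 2.0, 1.0)]

# Map the operation over all points; the runtime schedules the work.
with ThreadPoolExecutor() as pool:
    mags = list(pool.map(magnitude, points))

print(mags)  # [1.0, 5.0, 3.0]
```

On a GPU or many-core CPU the same map pattern runs with thousands of concurrent lanes, which is why the per-element operations must avoid shared mutable state.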
Objective: Develop two software products: VeloC checkpoint restart and SZ lossy compression with strict error bounds
Long-running, large-scale simulations and high-resolution, high-frequency instrument detectors generate extremely large volumes of data at a high rate. While reliable scientific computing is routinely achieved at small scale, it becomes remarkably difficult at exascale, both because disruptions such as component failures become more frequent as the machines grow larger and more complex, and because of the big-data challenge itself.
Principal Investigators: Franck Cappello, Argonne National Laboratory
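The core idea behind error-bounded lossy compression, one of the two products named in the objective above, is that every reconstructed value must stay within a user-chosen bound of the original. The sketch below shows only that quantization idea in its simplest form; the real SZ compressor adds prediction and entropy coding, and the function names here are illustrative, not its API:

```python
def compress(values, eps):
    """Error-bounded quantization: replace each value by an integer bin
    index; bins of width 2*eps guarantee a max pointwise error of eps."""
    return [round(v / (2 * eps)) for v in values]

def decompress(indices, eps):
    """Map each bin index back to its bin center."""
    return [i * 2 * eps for i in indices]

data = [0.0, 0.013, 1.507, -0.496]
eps = 0.01  # user-specified absolute error bound
restored = decompress(compress(data, eps), eps)

# The strict bound holds for every element.
assert all(abs(a - b) <= eps for a, b in zip(data, restored))
```

The compression gain comes from the indices being small integers that encode far more compactly than the original floating-point values.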
Objective: Develop efficient, system-topology- and storage-hierarchy-aware HDF5 and UnifyFS parallel I/O libraries
In pursuit of more accurate modeling of real-world systems, scientific applications at exascale will generate and analyze massive amounts of data. A critical requirement for these applications to complete their science mission is the capability to access and manage these data efficiently on exascale systems. Parallel I/O, the key technology behind moving data between compute nodes and storage, faces monumental challenges from new application workflows as well as the increasingly complex memory and storage hierarchies of exascale systems.
Principal Investigators: John Wu, Lawrence Berkeley National Laboratory; Suren Byna, Ohio State University
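One classic parallel-I/O technique relevant to the paragraph above is aggregation: many small per-process buffers are combined into one contiguous buffer so the storage system sees a single large write instead of many small ones. The sketch below illustrates only that idea with an in-memory file; the function and its return shape are hypothetical, not part of HDF5 or UnifyFS:

```python
import io

def aggregated_write(f, rank_buffers):
    """Aggregation sketch: concatenate small per-rank buffers into one
    contiguous blob, issue a single large write, and return a per-rank
    (offset, length) index so readers can locate each piece."""
    offsets = {}
    blob = bytearray()
    for rank in sorted(rank_buffers):
        offsets[rank] = (len(blob), len(rank_buffers[rank]))
        blob += rank_buffers[rank]
    f.write(bytes(blob))  # one large sequential write
    return offsets

buffers = {0: b"aaaa", 1: b"bb", 2: b"cccccc"}
f = io.BytesIO()
index = aggregated_write(f, buffers)
print(index)  # {0: (0, 4), 1: (4, 2), 2: (6, 6)}
```

In a real collective-I/O implementation the aggregation itself is distributed across designated aggregator processes; the payoff is the same: fewer, larger, better-aligned writes.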
Objective: Deliver in situ visualization and analysis algorithms, infrastructure, and data reduction of floating-point arrays
Computational science applications generate massive amounts of data from which scientists need to extract information and visualize the results. Performing the visualization and analysis tasks in situ, while the simulation is running, can lead to improved use of computational resources and reduce the time scientists must wait for their results.
Principal Investigators: Jim Ahrens, Los Alamos National Laboratory
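The in situ idea described above can be sketched with an online statistic: the analysis consumes each step's data as it is produced, so the raw steps never need to be written out. The sketch below uses Welford's standard online mean/variance algorithm inside a stand-in simulation loop; the class and loop are illustrative only, not part of any ECP product:

```python
import math

class RunningStats:
    """Welford's online mean/variance: updated as each simulation
    step's data arrives, so raw steps can be discarded immediately."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        return math.sqrt(self.m2 / self.n) if self.n else 0.0

stats = RunningStats()
for step in range(5):          # stand-in for the simulation time loop
    field_value = float(step)  # stand-in for a freshly computed field
    stats.update(field_value)  # analyze in situ; no raw data written

print(stats.mean, stats.stddev())  # mean 2.0, stddev sqrt(2)
```

The same structure generalizes to in situ visualization: each loop iteration renders or reduces the current step instead of (or in addition to) updating a statistic.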