The Exascale Computing Project has concluded. This site is retained for historical reference.

DataLib

Exascale applications generate massive amounts of data that must be analyzed and stored to achieve their science goals. The speed at which data can be written to and retrieved from the storage system is a critical factor in achieving these goals. As exascale architectures grow more complex, with many compute nodes, accelerators, and heterogeneous memory systems, storage technologies must evolve to support these architectural features. The DataLib project focuses on three distinct and critical aspects of successful storage and I/O technologies for exascale applications: (1) enhancing and enabling traditional I/O libraries on pre-exascale and exascale architectures, (2) supporting a new paradigm of data services specialized for exascale codes, and (3) working closely with the Facilities to ensure the successful deployment of these tools.

Technical Discussion

The ability to efficiently store data to the file system is a key requirement for all scientific applications. The DataLib project provides both standards-based and custom storage and I/O solutions for exascale applications on upcoming platforms. The primary goals of this effort are to enable users of the Hierarchical Data Format 5 (HDF5) standard to achieve the levels of performance seen from custom codes and tools, to facilitate the productization and porting of data services and I/O middleware built on Mochi technologies, and to continue supporting application and Facility interactions through the DataLib technologies Darshan, Parallel Network Common Data Form (PnetCDF), and ROMIO.

HDF5 is the most popular high-level application programming interface (API) for interacting with the storage system on high-performance computers. The DataLib team is undertaking a systematic software development activity to deliver an HDF5 API implementation that achieves the highest possible performance on exascale platforms. By adopting the HDF5 API, the team is able to support the I/O needs of all the exascale applications that already use this standard.
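As an illustration of the kind of usage this work targets, the following is a minimal sketch of a parallel HDF5 write in which each MPI rank contributes one row of a shared dataset through the MPI-IO file driver with collective I/O. The file and dataset names and sizes are illustrative rather than taken from any particular ECP application, and error checking is omitted for brevity.

```c
/* Minimal parallel HDF5 sketch: each MPI rank writes one row of a shared 2D
 * dataset using the MPI-IO file driver and collective I/O. Illustrative only;
 * compile with an MPI-enabled HDF5 (e.g., h5pcc). */
#include <hdf5.h>
#include <mpi.h>

#define NCOLS 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Open the file collectively with the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Define a (nprocs x NCOLS) dataset: one row per rank. */
    hsize_t dims[2] = {(hsize_t)nprocs, NCOLS};
    hid_t filespace = H5Screate_simple(2, dims, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Select this rank's row in the file and describe the memory buffer. */
    hsize_t start[2] = {(hsize_t)rank, 0}, count[2] = {1, NCOLS};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(2, count, NULL);

    double buf[NCOLS];
    for (int i = 0; i < NCOLS; i++) buf[i] = rank + i * 1e-3;

    /* Collective write. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
    MPI_Finalize();
    return 0;
}
```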

The Mochi framework provides building blocks for user-level distributed data services, addressing performance, programmability, and portability. Mochi components are being used by multiple exascale library and application developers, and the team is engaging with these developers to customize data services for their needs.
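For a flavor of how a Mochi-based service is bootstrapped, the sketch below uses Margo, the Mochi component that combines Mercury RPC with Argobots threading, to start a bare service instance and print the address a client would connect to. It is a minimal illustration under the assumption of a standard Margo installation (margo.h), not code from any DataLib-provided service, and the transport string depends on how Mercury was built.

```c
/* Minimal Margo sketch: start a service instance and report its address.
 * Illustrative only; real Mochi services register RPC handlers and storage
 * backends on top of this. Link against margo (and its dependencies). */
#include <margo.h>
#include <stdio.h>

int main(void)
{
    /* Start a Margo instance in server mode over TCP; the protocol string
     * ("tcp" here) must match a transport available in the Mercury build. */
    margo_instance_id mid = margo_init("tcp", MARGO_SERVER_MODE, 0, -1);
    if (mid == MARGO_INSTANCE_NULL) {
        fprintf(stderr, "margo_init failed\n");
        return 1;
    }

    /* Print the address clients would use to reach this service. */
    hg_addr_t addr;
    char addr_str[128];
    hg_size_t addr_str_size = sizeof(addr_str);
    margo_addr_self(mid, &addr);
    margo_addr_to_string(mid, addr_str, &addr_str_size, addr);
    printf("service listening at %s\n", addr_str);
    margo_addr_free(mid, addr);

    /* Block until margo_finalize() is called (e.g., from an RPC handler). */
    margo_wait_for_finalize(mid);
    return 0;
}
```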

Darshan, Parallel netCDF, and ROMIO also continue to be important storage system software components. DataLib is extending Darshan to cover emerging underlying storage, such as the Intel Distributed Asynchronous Object Store (DAOS); enhancing Parallel netCDF to meet Exascale Computing Project (ECP) application needs; and making fundamental improvements in ROMIO to boost performance and address new requirements from underlying storage technologies, such as UnifyFS.
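Many ROMIO optimizations are exercised through standard MPI-IO hints rather than new APIs. The sketch below shows the general pattern of passing ROMIO hints (here, collective-buffering hints) through an MPI_Info object; the file name and hint values are illustrative, and unrecognized hints are simply ignored by other MPI-IO implementations.

```c
/* Sketch of passing ROMIO-specific hints through the MPI-IO interface.
 * romio_cb_write and cb_buffer_size are documented ROMIO hints; the values
 * here are illustrative, not recommendations for any particular system. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Request collective buffering for writes with a 16 MiB buffer. */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_write", "enable");
    MPI_Info_set(info, "cb_buffer_size", "16777216");

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "example.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* Each rank writes a small record at its own fixed offset, collectively. */
    char msg[64];
    snprintf(msg, sizeof(msg), "hello from rank %d\n", rank);
    MPI_Offset offset = (MPI_Offset)rank * (MPI_Offset)sizeof(msg);
    MPI_File_write_at_all(fh, offset, msg, (int)strlen(msg), MPI_CHAR,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```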

PRODUCT SUMMARY: ExaHDF5

The Motivation

The Hierarchical Data Format version 5 (HDF5) is the most popular high-level I/O library for scientific applications to write and read data files. The HDF Group released the first version of HDF5 in 1998. Over the past 20 years, numerous applications in scientific domains, as well as in finance, space technologies, and many other business and engineering fields, have used HDF5. It is the most widely used library for performing parallel I/O on existing high-performance computing (HPC) systems at US Department of Energy (DOE) supercomputing facilities.

Modern hardware has added more levels of storage and memory. Therefore, the ExaHDF5 team of the Exascale Computing Project (ECP) aimed to enhance the HDF5 library so that it can more effectively use the hardware capabilities of these new platforms. The ExaHDF5 team also added new capabilities to the HDF5 data management interface to support other formats and storage systems.

The Solution

To address changes in modern supercomputer storage systems and account for increases in the volume and complexity of application data, the ExaHDF5 team added new features to leverage commonly available capabilities in the latest exascale architectures.

The ExaHDF5 team has worked with numerous ECP application teams to integrate asynchronous I/O, subfiling (for example, within the Cabana framework), and caching. The Virtual Object Layer (VOL) and interoperability features with PnetCDF (Parallel Network Common Data Form) and ADIOS (Adaptable I/O System) open up the rich HDF5 data management interface to science data stored in other file formats. The team also created the dynamically pluggable Virtual File Driver (VFD) feature, which allows HDF5 I/O to be directed to new sources and destinations, including GPU memory and cloud storage.
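As a rough sketch of how the asynchronous I/O capability is used, the example below relies on the HDF5 event-set API (available in HDF5 1.13 and later) so that writes can overlap with computation. It assumes the asynchronous VOL connector is loaded (typically via the HDF5_VOL_CONNECTOR environment variable), and all names and sizes are illustrative.

```c
/* Sketch of HDF5 event-set (asynchronous) I/O. Requires HDF5 >= 1.13 with
 * the asynchronous VOL connector loaded; otherwise the *_async calls behave
 * synchronously. Error checking omitted for brevity. */
#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* Operations attached to this event set may return immediately, letting
     * the I/O overlap with the "compute" section below. */
    hid_t es = H5EScreate();
    hid_t file = H5Fcreate_async("async_example.h5", H5F_ACC_TRUNC,
                                 H5P_DEFAULT, fapl, es);

    hsize_t dims[1] = {1024};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate_async(file, "data", H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT, es);

    double buf[1024];
    for (int i = 0; i < 1024; i++) buf[i] = i;
    H5Dwrite_async(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                   buf, es);

    /* ... application computation can proceed here while I/O is in flight;
     * the buffer must remain valid until the event set completes. */

    H5Dclose_async(dset, es);
    H5Fclose_async(file, es);

    /* Block until every operation in the event set has completed. */
    size_t num_in_progress;
    hbool_t err_occurred;
    H5ESwait(es, H5ES_WAIT_FOREVER, &num_in_progress, &err_occurred);
    H5ESclose(es);

    H5Sclose(space);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```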

The Impact

The ExaHDF5 team has released these new features in HDF5 for broad deployment on HPC systems. The technologies developed to exploit the massively parallel storage hierarchies of pre-exascale and current exascale systems enhance performance and scalability, and the enhanced HDF5 software has been refactored to achieve efficient parallel I/O on these exascale (and other) systems. Owing to the popularity of HDF5, the greater versatility and performance will positively impact many DOE science and industrial applications.

The benefits span many applications, workloads, and users. NASA, for example, assigns HDF5 software the highest technology readiness level (TRL 9), a category reserved for actual systems that have been flight proven through successful mission operations. In the ECP ecosystem, numerous applications have depended on HDF5 for performing I/O.

Sustainability

HDF5's critical role and broad user community ensure that ExaHDF5 will support application needs far into the future. HDF5 and other subproducts released by the ExaHDF5 team are now readily available via the Extreme-Scale Scientific Software Stack (E4S) to any user wishing to run HDF5 on an HPC platform or with a cloud provider.

PRODUCT SUMMARY: PnetCDF

The Motivation

PnetCDF (Parallel netCDF) is a high-performance parallel I/O library for writing multidimensional, typed data in the portable NetCDF format. PnetCDF is widely used in the weather and climate communities, among other fields, for its high performance and standard format. Both computing facilities and vendors recognize PnetCDF’s role in computational science and provide it as a prebuilt module. The ECP included PnetCDF in the E4S software collection due to its importance.

Libraries such as HDF5, ROMIO, and PnetCDF have a long history with over a decade of production use, yet significant changes were needed to address the scale, heterogeneity, and latency requirements of upcoming applications. This motivated targeting specific use cases on pre-exascale systems, such as E3SM, which requires output at scale using the preferred netCDF-4/PIO/PnetCDF code path. The PnetCDF team continues to address concerns and maintain portability and performance, and may develop new capabilities as needs arise.

The Solution

The ECP PnetCDF development has taken place over the past 5 years with a level of effort sufficient to demonstrate performance and scalability as an integrated software component on new HPC platforms. The project has focused on ensuring broad use, assisting in performance debugging on new platforms, and augmenting existing implementations to support new storage models (e.g., burst buffers).

PnetCDF can be called directly by applications, performing its I/O through the ROMIO MPI-IO implementation, or used indirectly as an integrated part of popular libraries such as HDF5 and netCDF-4 (a minimal direct-use sketch follows the list below). Under the ECP, PnetCDF has developed several new capabilities:

  • Significant performance enhancements to both PnetCDF itself and the ROMIO MPI-IO implementation.
  • New “burst buffer” feature to use novel storage hierarchies.
  • Interoperability with HDF5 through a VOL component allowing HDF5 applications to read NetCDF formatted datasets.
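The following is a minimal sketch of direct PnetCDF use, in which each MPI rank collectively writes one row of a two-dimensional variable; file and variable names are illustrative and error checking is omitted. The burst-buffer capability listed above is, per the PnetCDF documentation, enabled through MPI_Info hints at file-creation time rather than through source changes.

```c
/* Minimal PnetCDF sketch: ranks collectively define and write a 2D variable
 * (one row per rank) in the CDF-5 file format. Illustrative only; compile
 * with an MPI compiler and link against -lpnetcdf. */
#include <mpi.h>
#include <pnetcdf.h>

#define NCOLS 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Create the file; NC_64BIT_DATA selects the CDF-5 large-variable format. */
    int ncid;
    ncmpi_create(MPI_COMM_WORLD, "example.nc", NC_CLOBBER | NC_64BIT_DATA,
                 MPI_INFO_NULL, &ncid);

    /* Define a (nprocs x NCOLS) double variable, then leave define mode. */
    int dimids[2], varid;
    ncmpi_def_dim(ncid, "row", (MPI_Offset)nprocs, &dimids[0]);
    ncmpi_def_dim(ncid, "col", NCOLS, &dimids[1]);
    ncmpi_def_var(ncid, "data", NC_DOUBLE, 2, dimids, &varid);
    ncmpi_enddef(ncid);

    /* Each rank writes its own row with a collective call. */
    double buf[NCOLS];
    for (int i = 0; i < NCOLS; i++) buf[i] = rank + i * 1e-3;
    MPI_Offset start[2] = {rank, 0}, count[2] = {1, NCOLS};
    ncmpi_put_vara_double_all(ncid, varid, start, count, buf);

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}
```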

Capability Development and Approach

The PnetCDF project followed two integration approaches. First, as a portable and widely used library, PnetCDF is routinely installed by facilities as part of the default software environment. PnetCDF’s facility and vendor partners recognize its utility to multiple applications. With PnetCDF’s focus on portability and minimal external dependencies, operators can typically install it with only a small amount of time and effort. Second, PnetCDF has a long track record of responsiveness and community involvement.

Consistent with this approach, PnetCDF has integrated with the following clients:

  • Characterizing PnetCDF in ECP applications via Darshan. Darshan is a lightweight I/O characterization tool that captures concise views of HPC application I/O behavior.
  • Integration into the Frontier exascale supercomputer software stack.
  • Integration into the Sunspot software stack (as a proxy for the Aurora exascale supercomputer).
  • Integration into E3SM, which is a fully coupled, state-of-the-science Earth system model capable of running global high-resolution configurations. The E3SM code has long relied on PnetCDF.

The Impact

The impact can be recognized via these integration efforts:

  • Darshan: The Darshan integration project enabled a much richer data capture capability via an enhanced PnetCDF module implementation. In particular, the DataLib team implemented fine-grained metrics for data I/O performed through the PnetCDF library and its associated NetCDF format. This provides important performance insight to users of PnetCDF, which is frequently used in the ECP ecosystem, particularly for atmospheric modeling.
  • Frontier: Facilities and their users value PnetCDF highly enough to include it in the software environment on the Frontier platform. This integration deploys PnetCDF on Frontier, making it available to ECP clients and to users beyond the ECP.
    PnetCDF also provides a validation suite for verifying previously installed versions of the library. The suite packages 105 (as of this writing) C and Fortran tests that create and read NetCDF datasets under various conditions, checking both that PnetCDF functions run and that the resulting data match known-good values. Using this suite against the installed PnetCDF library on Frontier, the team documented correct behavior but also identified (and worked around) an issue in the Lustre file system.
  • Sunspot: Benefits similar to those on Frontier, demonstrated on proxy hardware for the Aurora exascale supercomputer.
  • E3SM: E3SM developers regularly run climate experiments on cutting-edge platforms. They are now running on Frontier using PnetCDF as the default module.

Sustainability

The integration efforts demonstrate that PnetCDF will continue to be used successfully by application and library teams.

  • Darshan: The new PnetCDF capabilities have been integrated into the Darshan repository as of version 3.4.1, and they can be included in a Darshan build using standard Spack parameters. Because PnetCDF is a heavyweight library, it is appropriate to integrate this capability as an optional extension, which preserves the option of a leaner, simpler Darshan build.
  • Frontier: Extensive performance evaluation has been reported on the Frontier supercomputer at scale, comparing the performance of the HDF5 VOL connector against PnetCDF and the traditional HDF5 I/O path. The PnetCDF Darshan capabilities are accessed via an lmod module. Multiple versions are available, which demonstrates continued update and deployment activity.
  • Sunspot: As with Frontier, the PnetCDF Darshan capabilities are accessed via an lmod module. The software passed PnetCDF correctness testing, indicating a viable capability.
  • E3SM: By making PnetCDF the default I/O path on Frontier, the team has the potential to gain adoption from diverse user communities on Frontier. In addition, automatically ingesting the performance data into PACE opens the possibility of continuous performance improvement in a way that is transparent to users.

For more information and references:

See the ECP DataLib project for other data-related ECP projects. An overview can be found in “Datalib Innovates with Some of the Most Successful Storage and I/O Software in the DOE Complex.”

Principal Investigator(s)

Rob Ross, Argonne National Laboratory

Collaborators

Argonne National Laboratory, Los Alamos National Laboratory, Northwestern University
