DataLib

Exascale applications generate massive amounts of data that must be analyzed and stored to achieve their science goals. The speed at which this data can be written to and retrieved from the storage system is a critical factor in achieving these goals. As exascale architectures grow more complex, with many compute nodes, accelerators, and heterogeneous memory systems, storage technologies must evolve to support these features. The DataLib project focuses on three distinct and critical aspects of successful storage and I/O technologies for exascale applications: (1) enhancing and enabling traditional I/O libraries on pre-exascale and exascale architectures, (2) supporting a new paradigm of data services specialized for exascale codes, and (3) working closely with Facilities to ensure the successful deployment of DataLib tools.

Project Details

The ability to store data efficiently in the file system is a key requirement for all scientific applications. The DataLib project provides both standards-based and custom storage and I/O solutions for exascale applications on upcoming platforms. The primary goals of this effort are to enable users of the Hierarchical Data Format 5 (HDF5) standard to achieve the levels of performance seen from custom codes and tools, to facilitate the productization and porting of data services and I/O middleware built with Mochi technologies, and to continue supporting application and Facility interactions through the DataLib technologies Darshan, Parallel Network Common Data Form (Parallel netCDF), and ROMIO.

HDF5 is the most popular high-level application programming interface (API) for interacting with the storage system on high-performance computers. The DataLib team is undertaking a systematic software development activity to deliver an HDF5 API implementation that achieves the highest possible performance on exascale platforms. By adopting the HDF5 API, the team is able to support the I/O needs of all the exascale applications that already use this standard.

The Mochi framework provides building blocks for user-level distributed data services that address performance, programmability, and portability. Mochi components are being used by multiple exascale library and application developers, and the team is engaging with them to customize data services for their needs.

Darshan, Parallel netCDF, and ROMIO also remain important storage system software components. DataLib is extending Darshan to cover emerging storage systems, such as the Intel Distributed Asynchronous Object Storage (DAOS); enhancing Parallel netCDF to meet Exascale Computing Project (ECP) application needs; and making fundamental improvements to ROMIO that raise performance and address new requirements from underlying storage technologies, such as UnifyFS.
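Darshan characterizes I/O by intercepting calls at each I/O layer it supports and accumulating compact per-file counters rather than recording a full trace; covering a new interface means adding a module that does this for that interface. The Python sketch below illustrates only the counting idea. The function names (`characterize`, `posix_write`, `h5_write`) are hypothetical, the "writes" are stand-ins that touch no files, and real Darshan wraps C library calls (for example through LD_PRELOAD or link-time wrapping) rather than using decorators.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical counter table keyed by (module, file path). Like Darshan's
# per-file records, it stores aggregates, not a trace of every call.
counters = defaultdict(lambda: {"calls": 0, "bytes": 0, "time_s": 0.0})


def characterize(module: str):
    """Wrap an I/O function f(path, data) and accumulate counters for it."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(path, data):
            start = time.perf_counter()
            result = fn(path, data)
            rec = counters[(module, path)]
            rec["calls"] += 1
            rec["bytes"] += len(data)
            rec["time_s"] += time.perf_counter() - start
            return result
        return wrapper
    return decorator


@characterize("POSIX")
def posix_write(path: str, data: bytes) -> int:
    # Stand-in for a low-level write; returns the byte count, like write(2).
    return len(data)


@characterize("HDF5")
def h5_write(path: str, data: bytes) -> int:
    # Stand-in for a high-level dataset write that issues two low-level writes.
    half = len(data) // 2
    return posix_write(path, data[:half]) + posix_write(path, data[half:])
```

After `h5_write("out.h5", b"x" * 1000)`, the HDF5 record shows one call and the POSIX record shows two calls, each layer totaling 1000 bytes. Because the HDF5-layer timer encloses the POSIX calls it triggers, subtracting the POSIX time for a file from the HDF5 time gives a rough estimate of the overhead added by the higher layer.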

Principal Investigator(s):

Rob Ross, Argonne National Laboratory

Collaborators:

Argonne National Laboratory, Los Alamos National Laboratory, Northwestern University

Progress to date

  • A new Darshan module that captures basic characterization of HDF5 calls was implemented and is being tested. Initial results demonstrate that HDF5 overheads can be captured in typical HDF5 usage scenarios. This new capability enables faster performance debugging of ECP applications that use HDF5.
  • An analysis of I/O behavior for the Ristra application was performed using a Flexible Computational Science Infrastructure (FleCSI) synthetic I/O benchmark. Several improvements were made, including tuning of HDF5 collective metadata operations.
  • Working with the CODAR team, the project used Mochi components to accelerate development of the Chimbuko performance analysis service. This new prototype service was presented at the In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV) workshop held in conjunction with SC20.
  • The project team implemented a new HDF5 Virtual Object Layer (VOL) connector that stores data written through the HDF5 API in a log-structured format. The latest performance results indicate that this connector is competitive with Parallel netCDF for some key use cases.
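The general idea of a log-structured layout can be sketched as follows. This is not the VOL connector's actual on-disk format: the class names and in-memory lists are hypothetical stand-ins for a data log and its metadata index. It shows the basic trade-off: each write becomes a cheap append plus one index entry, with no coordination over final placement, and the cost moves to reads, which must replay the index to rebuild the logical dataset.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    """Metadata for one write: which dataset, where in it, where in the log."""
    dataset: str
    offset: int   # element offset within the logical dataset
    length: int   # number of elements written
    log_pos: int  # position of the first element in the data log


@dataclass
class LogStructuredStore:
    """Hypothetical log-structured layout: writes are appended, never placed."""
    data_log: list = field(default_factory=list)
    index: list = field(default_factory=list)

    def write(self, dataset: str, offset: int, values: list) -> None:
        # Record where the values land in the log, then append them.
        self.index.append(LogEntry(dataset, offset, len(values), len(self.data_log)))
        self.data_log.extend(values)

    def read(self, dataset: str, size: int, fill=0) -> list:
        # Rebuild the logical dataset by replaying entries in write order,
        # so later writes to the same region overwrite earlier ones.
        out = [fill] * size
        for e in self.index:
            if e.dataset != dataset:
                continue
            for i in range(e.length):
                if 0 <= e.offset + i < size:
                    out[e.offset + i] = self.data_log[e.log_pos + i]
        return out
```

For example, writing `[5, 6]` at offset 4, then `[1, 2, 3]` at offset 0, then `[9]` at offset 2 leaves the log as `[5, 6, 1, 2, 3, 9]` in arrival order, while `read("temp", 6)` reconstructs `[1, 2, 9, 0, 5, 6]`.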
