In pursuit of more accurate real-world systems modeling, scientific applications at exascale will generate and analyze massive amounts of data. To complete their science missions, these applications must be able to access and manage these data efficiently on exascale systems. Parallel I/O, the key technology behind moving data between compute nodes and storage, faces monumental challenges from new application workflows as well as from the memory, interconnect, and storage architectures considered in the designs of exascale systems. The ExaIO project is delivering the Hierarchical Data Format version 5 (HDF5) library and the UnifyFS file system to efficiently address these storage challenges.
To meet the storage requirements of exascale applications and enable them to achieve their science goals, parallel I/O libraries and file systems must handle files of many terabytes and deliver I/O performance well beyond what is achievable today. As the storage hierarchy expands to include node-local persistent memory and solid-state storage alongside traditional disk- and tape-based storage, data movement among these layers must become much more efficient and capable. The ExaIO project is addressing these parallel I/O performance and data management challenges by enhancing the HDF5 library and developing UnifyFS to use exascale storage devices.
HDF5 is the most popular high-level I/O library for scientific applications to write and read data files at supercomputing facilities, and it has been used by numerous applications. The ExaIO team is developing HDF5 features that address efficiency and other challenges posed by data management and parallel I/O on exascale architectures. The team is productizing previously prototyped HDF5 features and techniques, exploring optimizations on upcoming architectures, and maintaining and optimizing existing HDF5 features tailored for exascale applications. It is also adding new features, including transparent data caching in the multilevel storage hierarchy, topology-aware data movement for I/O, full single-writer/multiple-reader (SWMR) support for workflows, asynchronous I/O, and I/O from accelerator processors (i.e., GPUs).
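The goal of asynchronous I/O is to overlap computation with data movement so that an application does not stall while a write completes. The following is a minimal sketch of that general idea using only the Python standard library. It does not use HDF5 or its asynchronous interface; `compute_step`, `write_step`, `run_overlapped`, and the one-file-per-step layout are hypothetical names and choices made purely for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compute_step(step: int, n: int = 1024) -> bytes:
    """Stand-in for one simulation time step: produce n bytes of 'results'."""
    return bytes((step + i) % 256 for i in range(n))


def write_step(directory: str, step: int, payload: bytes) -> str:
    """Blocking write of one step's data to its own file (hypothetical layout)."""
    path = os.path.join(directory, f"step_{step:04d}.bin")
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    return path


def run_overlapped(directory: str, steps: int) -> List[str]:
    """Compute each step while the previous step's write runs in the background."""
    written: List[str] = []
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = None
        for step in range(steps):
            # This computation overlaps with the in-flight write, if any.
            payload = compute_step(step)
            if pending is not None:
                # Wait for the previous write before queuing the next one.
                written.append(pending.result())
            pending = io_pool.submit(write_step, directory, step, payload)
        if pending is not None:
            written.append(pending.result())
    return written


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    for p in run_overlapped(out_dir, steps=4):
        print(p)
```

A single background worker keeps writes ordered and bounds the amount of buffered data to one step; a real library would also have to manage buffer ownership and error reporting, which this sketch omits.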
The ExaIO team is developing UnifyFS, a user-level file system highly specialized for shared file access on high-performance systems with distributed node-local storage. Although distributed node-local storage offers significant performance advantages, it is extremely challenging to use for applications that operate on shared files. UnifyFS creates a shared file system namespace across the distributed storage devices in a job, greatly simplifying their use, and thereby addresses a major usability factor of pre-exascale and exascale systems. UnifyFS transparently intercepts I/O calls from applications and I/O libraries, so it integrates cleanly with applications and other software, including I/O and checkpoint/restart libraries.
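To make the interception idea concrete, the sketch below wraps Python's `open()` so that paths under a chosen prefix are redirected to a node-local directory while all other paths pass through unchanged. This is only an illustration of call interception by path prefix under stated assumptions: the `/unifyfs` prefix, `make_intercepting_open`, and the directory layout are hypothetical, and the sketch models neither UnifyFS's actual mechanism nor its cross-node shared namespace.

```python
import builtins
import os
import tempfile
from typing import Callable


def make_intercepting_open(prefix: str, local_root: str,
                           real_open: Callable = builtins.open) -> Callable:
    """Return an open()-like function that redirects paths under `prefix`.

    Paths outside the prefix are passed straight to the real open().
    """
    prefix = prefix.rstrip("/")

    def intercepted_open(path, mode="r", *args, **kwargs):
        if isinstance(path, str) and path.startswith(prefix + "/"):
            relative = path[len(prefix) + 1:]
            target = os.path.join(local_root, relative)
            # Create parent directories on demand for modes that may create a file.
            if any(flag in mode for flag in "wax"):
                os.makedirs(os.path.dirname(target), exist_ok=True)
            return real_open(target, mode, *args, **kwargs)
        return real_open(path, mode, *args, **kwargs)

    return intercepted_open


if __name__ == "__main__":
    node_local = tempfile.mkdtemp()  # stands in for node-local storage
    my_open = make_intercepting_open("/unifyfs", node_local)
    with my_open("/unifyfs/ckpt/rank0.dat", "w") as f:
        f.write("checkpoint data")
    print(os.listdir(os.path.join(node_local, "ckpt")))
```

Because the application keeps calling an `open()`-shaped function with ordinary paths, no application logic changes; only the prefix decides where the data lands.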