Researchers working with the ExaIO project have developed an asynchronous input/output (I/O) framework for HDF5 (Hierarchical Data Format version 5), a popular parallel I/O library used by many Exascale Computing Project (ECP) applications across scientific domains. Their method improves on synchronous I/O, in which compute resources sit idle while I/O operations move data to storage. Instead, background threads move the data to the memory or storage layer so that data storage, processing, and computation can proceed simultaneously. The research was published in the April 2022 issue of IEEE Transactions on Parallel & Distributed Systems.
Traditional disk-based storage cannot keep pace with today’s data volumes and processor speeds, creating a performance gap between fast memory and slow disk-based storage. Supercomputing architects have responded by adding multiple levels of nonvolatile storage devices to handle bursty I/O. However, moving data across the storage hierarchy is complex and can still take longer than data generation or analysis. Asynchronous I/O methods can significantly reduce the impact of I/O latency by allowing applications to schedule I/O early. This improves both observed I/O time and overall execution time, and it can allow users to store or analyze more data in the same duration. The ExaIO team’s approach supports all types of read, write, and metadata operations; manages data dependencies transparently and automatically; provides implicit and explicit modes for performing asynchronous I/O; and offers an interface for retrieving error information.
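The overlap described above can be illustrated with a minimal Python sketch. It is not the ExaIO implementation and does not use HDF5: it relies only on the standard library, and the `compute` and `write` functions, the step count, and the timings are hypothetical stand-ins. A single background worker thread performs the writes in submission order, which is one simple way to respect dependencies between writes, and errors raised in the background are retrieved when the application waits on each pending operation.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical costs, chosen only to make the overlap visible.
IO_SECONDS = 0.05
COMPUTE_SECONDS = 0.05
STEPS = 5


def compute(step):
    """Stand-in for one simulation step."""
    time.sleep(COMPUTE_SECONDS)
    return [step] * 4


def write(storage, step, data):
    """Stand-in for a blocking write to storage."""
    time.sleep(IO_SECONDS)
    storage.append((step, data))


def run_sync():
    storage = []
    start = time.perf_counter()
    for step in range(STEPS):
        data = compute(step)
        write(storage, step, data)  # computation waits for the write
    return storage, time.perf_counter() - start


def run_async():
    storage = []
    start = time.perf_counter()
    # One worker thread executes writes in the order they were submitted,
    # so a later write never overtakes an earlier one.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = []
        for step in range(STEPS):
            data = compute(step)
            # Schedule the write and return to computation immediately.
            pending.append(pool.submit(write, storage, step, data))
        # result() waits for completion and re-raises any exception
        # that occurred in the background thread.
        for future in pending:
            future.result()
    return storage, time.perf_counter() - start


sync_storage, sync_time = run_sync()
async_storage, async_time = run_async()
print(f"synchronous:  {sync_time:.2f} s")
print(f"asynchronous: {async_time:.2f} s")
```

In this sketch both runs store identical data, but the asynchronous run finishes sooner because every write except the last proceeds while the next step is being computed.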
The researchers tested their method on the Summit and Cori supercomputers, located at Oak Ridge and Lawrence Berkeley National Laboratories, respectively, using several representative I/O kernels and real application workloads. The workloads included Nyx from the ExaSky project and Castro from the ExaStar project, and the tests evaluated asynchronous I/O operations in the ECP software frameworks AMReX and Flash-X. The team has consulted with NASA’s Ames Research Center on adopting asynchronous I/O methods, and there is significant interest from the HDF5 user community. To reduce execution time, the researchers have been working with both ECP and non-ECP applications that spend a significant amount of time on I/O operations. They are also developing locality-aware I/O operations, testing asynchronous I/O with HDF5-based libraries such as h5py, and exploring caching and prefetching methods for broader applicability.
Houjun Tang, Quincey Koziol, Suren Byna, and John Ravi. “Transparent Asynchronous Parallel I/O Using Background Threads.” 2022. IEEE Transactions on Parallel & Distributed Systems (April). https://doi.org/10.1109/TPDS.2021.3090322.
The plot shows the performance advantage of asynchronous I/O for a Castro configuration. Compared with the original synchronous I/O, asynchronous I/O reduces the observed I/O time nearly fivefold when writing or reading five steps of data, effectively hiding almost all the I/O latency.
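One simplified way to see why a speedup close to five is plausible for five steps is a back-of-the-envelope model. It is an assumption for illustration, not the analysis from the paper: each background write overlaps the computation of the following step, so only the final write, plus any part of a write that outlasts the computation, is observed by the application. The function name and the timings below are hypothetical.

```python
def observed_io_time(steps, io_time, compute_time, asynchronous):
    """Time the application spends waiting on I/O in the simplified model."""
    if not asynchronous:
        # Every write blocks the application for its full duration.
        return steps * io_time
    # All writes but the last overlap the next step's computation; only
    # the portion that exceeds the computation time is exposed.
    exposed_per_step = max(0.0, io_time - compute_time)
    return (steps - 1) * exposed_per_step + io_time


# Hypothetical timings in seconds; computation is longer than I/O.
sync_io = observed_io_time(5, io_time=2.0, compute_time=10.0, asynchronous=False)
async_io = observed_io_time(5, io_time=2.0, compute_time=10.0, asynchronous=True)
print(sync_io / async_io)  # 5.0
```

Under these assumptions the speedup in observed I/O time equals the number of steps whenever computation takes at least as long as I/O, and it shrinks when I/O takes longer than the computation it overlaps.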