Software Enables Use of Distributed In-System Storage and Parallel File System

By Scott Gibson

For decades, high-performance computing (HPC) applications used a shared parallel file system for input/output (I/O), but the recent addition of new storage devices in systems—burst buffers, for example—has enabled the creation of a storage hierarchy. The hierarchy can be arranged such that in-system storage is the first level and the parallel file system is the second. For optimal performance, applications must change along with the architectures.

A software product from the Exascale Computing Project (ECP) called UnifyFS can provide I/O performance portability for applications, enabling them to use distributed in-system storage and the parallel file system.

Kathryn Mohror of Lawrence Livermore National Laboratory (LLNL) and Sarp Oral of Oak Ridge National Laboratory (ORNL) are co-principal investigators of ECP’s ExaIO project and leads of the UnifyFS effort. They are guests on the latest edition of the Let’s Talk Exascale podcast, which was recorded in Denver at SC19: The International Conference for High Performance Computing, Networking, Storage, and Analysis.

Fast and Easy

The UnifyFS product is a file system for burst buffers. It makes using burst buffers on exascale systems fast and easy.

“It is easy because it presents a shared namespace, or mountpoint, across burst buffers that an application can use just like it does the parallel file system,” Mohror said. “It can be used directly by applications or through higher-level libraries like HDF5, ADIOS, MPI-IO, PnetCDF, or checkpoint libraries like VeloC.”

From left, Sarp Oral of Oak Ridge National Laboratory and Kathryn Mohror of Lawrence Livermore National Laboratory

UnifyFS is fast because it is tailored specifically for HPC workloads, including checkpoint/restart and visualization output. “It’s also fast because an instance of UnifyFS is spun up specifically for your job; there is no contention for the resources from other jobs,” Mohror said. “UnifyFS is targeted for ECP systems and currently runs on Summit and Sierra. We are closely monitoring the developments for upcoming machines—for example, Frontier, El Capitan, and Aurora—and will be porting UnifyFS so it runs on them as well.”

Part of the ECP ExaIO project, led by Suren Byna at Lawrence Berkeley National Laboratory, UnifyFS is a joint development effort between LLNL and ORNL, with LLNL as lead. The team is composed of 10 members, 5 from ORNL and 5 from LLNL. The project began in 2017 and is slated to continue through the end of ECP and beyond.

UnifyFS supports applications, I/O libraries (HDF5), and checkpoint/restart libraries (VeloC) so that they run efficiently over distributed burst buffers, or distributed in-system storage. Applications or I/O libraries that use UnifyFS can gain performance and shared file support for distributed in-system storage.
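As a rough illustration of the shared-file case, the sketch below has every MPI rank write its own block of a single file through MPI-IO. The /unifyfs prefix is an assumed mount point for the job's UnifyFS instance, and the file name and block size are illustrative only; nothing else in the code is UnifyFS-specific.

```c
/* Sketch: all ranks write distinct blocks of one shared file via MPI-IO.
 * The /unifyfs path is an assumed per-job UnifyFS mount point. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One shared file in the in-system storage namespace. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/unifyfs/myapp/shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes 256 ints at its own offset in the shared file. */
    int block[256];
    for (int i = 0; i < 256; i++)
        block[i] = rank;

    MPI_Offset offset = (MPI_Offset)rank * sizeof(block);
    MPI_File_write_at(fh, offset, block, 256, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```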

“Without UnifyFS, it is very challenging to use the distributed storage,” Mohror said. “You would have to implement a lot of software to be able to use it efficiently for anything other than a local store. But UnifyFS makes the process as easy as using a parallel file system. Using UnifyFS is transparent to the application—you use a different mount point just as you would use the parallel file system. So instead of writing to /scratch, you would write to /UnifyFS.”
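In code, that swap is essentially a path change. The minimal sketch below uses ordinary POSIX calls and assumes a mount point of /unifyfs (the exact mount point is configured per job); the directory and file names are made up for illustration.

```c
/* Sketch: the same POSIX calls an application already uses for the parallel
 * file system, pointed at an assumed UnifyFS mount point instead of /scratch. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Before: const char *path = "/scratch/myapp/checkpoint.0"; */
    const char *path = "/unifyfs/myapp/checkpoint.0";  /* assumed mount point */

    int fd = open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    const char data[] = "checkpoint payload";
    if (write(fd, data, sizeof(data)) != (ssize_t)sizeof(data))
        perror("write");

    fsync(fd);   /* flush the data before it is read or the job ends */
    close(fd);
    return 0;
}
```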

Tailored Support

The UnifyFS team is working closely with the HDF5 I/O library team to ensure HDF5 is an optimal interface between the application and UnifyFS. “We are tailoring our support for HDF5 since many ECP applications use it,” Mohror said. “We are making sure full support and good performance are provided. That said, applications can use UnifyFS directly or through other I/O libraries as well. The main idea is that UnifyFS provides I/O performance portability for ECP applications.”
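For applications that go through HDF5, the same idea applies: only the file path needs to point at the UnifyFS namespace. The short sketch below writes a small dataset with the standard HDF5 C API; the /unifyfs prefix and the dataset layout are assumptions made for illustration, not something prescribed by UnifyFS.

```c
/* Sketch: HDF5 output directed at an assumed UnifyFS mount point.
 * Build with the HDF5 compiler wrapper, e.g. h5cc. */
#include <hdf5.h>

int main(void)
{
    int values[4] = {1, 2, 3, 4};
    hsize_t dims[1] = {4};

    /* Point HDF5 at the in-system storage namespace instead of /scratch. */
    hid_t file = H5Fcreate("/unifyfs/myapp/output.h5", H5F_ACC_TRUNC,
                           H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "values", H5T_NATIVE_INT, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, values);

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}
```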

Even though I/O is such an important aspect of application performance and portability, it doesn’t always get the attention it deserves. “Applications generate output that scientists analyze to arrive at new scientific insights, so the output is important, but it can take significant time to do the I/O,” Oral said. “Our goal is to create a seamless, high-performance bridge between new I/O system architectures and the way applications perform I/O in a way that is transparent to applications.”

Hierarchical storage systems are new, and they vary from system to system, posing a challenge to software design. “There is no telling with certainty what the storage of future systems will look like, so we have designed UnifyFS to be flexible enough to accommodate any kind of future in-system storage,” Oral said.

The UnifyFS project has completed the first 3-year phase of ECP and recently made available its 0.9 release, a beta version. “Over the past 3 years, we took a research prototype and modified it quite a bit,” Mohror said. “Some of the major changes we have made are to port it to run on Summit [at the Oak Ridge Leadership Computing Facility] and Sierra [at LLNL]. We added support to make it easier for users to integrate UnifyFS into their job scripts. And under the covers, we have been making quite a lot of changes to improve portability and speed. For example, in our 0.9 release, we improved the performance of read operations by as much as 37 percent.”

UnifyFS can provide ECP applications performance-portable I/O across changing storage system architectures, including the upcoming Aurora, Frontier, and El Capitan exascale machines. “It is critically important that we provide this portability so that application developers don’t need to spend their time changing their I/O code for every system,” Oral said. “Also, UnifyFS can provide performance above and beyond that of general-purpose file systems. The reason is that UnifyFS relaxes normal file system policies for performance. We make assumptions about what applications will do based on decades of experience with HPC I/O. Then instead of enforcing strict and expensive policies on I/O operations, we relax those policies and get better performance. For example, we assume that reads and writes on a file will not be done at the same time but will occur in distinct read and write application phases. This is true for the vast majority of applications. By doing this, we get much better performance than is possible with file systems that strictly enforce the policies.”
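The access pattern Oral describes, with distinct write and read phases on the same file, looks roughly like the sketch below. The /unifyfs path is again an assumed mount point, and the explicit fsync stands in for whatever synchronization an application or its I/O library performs between phases; the actual requirements depend on the libraries in use.

```c
/* Sketch of the assumed pattern: a write-only phase, a synchronization point,
 * then a read-only phase on the same file, never both at once. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/unifyfs/myapp/phase_data";  /* assumed mount point */
    char out[] = "data produced in the write phase";
    char in[64] = {0};

    /* Write phase: the file is only written, never read. */
    int fd = open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open for write"); return 1; }
    if (write(fd, out, sizeof(out)) < 0)
        perror("write");
    fsync(fd);          /* make the writes visible before any reads begin */
    close(fd);

    /* Read phase: the file is only read, never written. */
    fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open for read"); return 1; }
    if (read(fd, in, sizeof(in)) < 0)
        perror("read");
    close(fd);

    printf("read back: %s\n", in);
    return 0;
}
```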

The change in how policies are enforced is a departure from the way I/O has been done for decades in HPC. “The temporary in-system storage provides an opportunity to maintain high performance while taking advantage of relaxed POSIX consistency semantics, and UnifyFS can provide a seamless interface with this new in-system storage hardware layer and ECP applications,” Oral said.

The main focus for the UnifyFS team in the coming years will be to integrate their work with other ECP projects. “In particular, we are working closely with the HDF5 team to make sure that it is fully supported by UnifyFS,” Mohror said. “We are also working with the VeloC checkpoint/restart team. They will use UnifyFS for applications that write shared files, which are not supported by VeloC on its own. And we are looking for early adopters who want to have portable high-performance I/O on in-system storage. We will be very happy to talk to any application teams that are interested in working with us to improve their I/O on today’s systems and on the exascale platforms.”