Principal Investigator: Daniel Laney, Lawrence Livermore National Laboratory
This project combines two projects at Lawrence Livermore National Laboratory (LLNL). The first builds an ecosystem of reusable workflow components that enhance the end-to-end productivity of ASC HPC users and allow them to introduce modern data analytics technologies into their workflows. It concentrates on three areas prioritized by the user community: problem setup, simulation management, and data management and analytics. The project emphasizes an ecosystem of tools and libraries because no single overarching workflow system can meet the needs of users across the wide variety of LLNL domains. The team mitigates risk by giving the user community a set of components that captures best practices, common operations, and next-generation capabilities.
The second project addresses storage and I/O for next-generation supercomputing systems. It targets the design and implementation of a next-generation storage and I/O software stack, including work on checkpointing, user-level file systems, and burst buffer management. In particular, we will improve the Scalable Checkpoint/Restart Library (SCR) to support ASC applications and to exploit next-generation storage hierarchies, e.g., the node-local burst buffers on the CORAL systems Sierra and Summit. We will also continue our work on the Unify user-level file system framework to support emerging analytics I/O workloads.