Software Technology

Data and Visualization

VeloC: Very Low Overhead Transparent Multilevel Checkpoint/Restart

Principal Investigators: Franck Cappello, Argonne National Laboratory; Kathryn Mohror, Lawrence Livermore National Laboratory

This project provides an optimized checkpoint/restart library for applications and workflows. VeloC will increase programmer productivity by dramatically reducing the difficulty of handling varied and complex storage architectures and by reducing the need for manual performance and reliability optimizations. This multilevel checkpoint/restart environment will drastically reduce checkpoint/restart overhead and provide transparent optimizations for ECP-relevant systems.

• Provide a single API for data-structure-oriented and file-oriented checkpoint/restart
• Provide an active back end that moves checkpoints concurrently with, and asynchronously to, application execution
• Optimize multilevel checkpointing for the available CORAL and ECP-relevant systems with deep, complex storage hierarchies: local memory, NVM, local burst buffers, remote burst buffers, and parallel file systems
• Integrate VeloC into I/O libraries (HDF5, ADIOS, PnetCDF), advanced batch schedulers, and vendor data movement software
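
The multilevel, asynchronous idea in the goals above can be illustrated with a minimal Python sketch. It is not the VeloC API: the class `MultiLevelCheckpointer`, its methods, the file naming scheme, and the two directories standing in for a fast node-local tier and a slower parallel file system are all hypothetical names chosen for illustration. The application blocks only for the write to the fast tier, a background thread copies the checkpoint to the slower tier, and restart picks the newest version that survives on any tier.

```python
import os
import pickle
import queue
import shutil
import tempfile
import threading


class MultiLevelCheckpointer:
    """Toy two-level checkpointer (hypothetical; not the VeloC API)."""

    def __init__(self, local_dir, persistent_dir):
        # local_dir stands in for a fast tier (e.g. node-local storage),
        # persistent_dir for a slower, more resilient tier.
        self.local_dir = local_dir
        self.persistent_dir = persistent_dir
        os.makedirs(local_dir, exist_ok=True)
        os.makedirs(persistent_dir, exist_ok=True)
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._flush_loop, daemon=True)
        self._worker.start()

    @staticmethod
    def _name(version):
        return f"ckpt-{version:06d}.pkl"

    def checkpoint(self, state, version):
        """Block only for the fast-tier write, then queue the slow-tier copy."""
        path = os.path.join(self.local_dir, self._name(version))
        with open(path + ".tmp", "wb") as f:
            pickle.dump(state, f)
        os.replace(path + ".tmp", path)  # atomic rename: no partial checkpoints
        self._pending.put(version)

    def _flush_loop(self):
        """Background thread: copy queued checkpoints to the persistent tier."""
        while True:
            version = self._pending.get()
            try:
                src = os.path.join(self.local_dir, self._name(version))
                dst = os.path.join(self.persistent_dir, self._name(version))
                shutil.copyfile(src, dst + ".tmp")
                os.replace(dst + ".tmp", dst)
            except OSError:
                pass  # toy sketch: a real library would retry or report the error
            finally:
                self._pending.task_done()

    def wait(self):
        """Block until every queued flush has been attempted."""
        self._pending.join()

    def restart(self):
        """Return (version, state) for the newest checkpoint, or None if none exist."""
        best = None
        for level, directory in enumerate((self.local_dir, self.persistent_dir)):
            if not os.path.isdir(directory):
                continue  # this tier was lost
            for fname in os.listdir(directory):
                if fname.startswith("ckpt-") and fname.endswith(".pkl"):
                    version = int(fname[5:-4])
                    key = (version, -level)  # newest first; faster tier wins ties
                    if best is None or key > best[0]:
                        best = (key, os.path.join(directory, fname))
        if best is None:
            return None
        with open(best[1], "rb") as f:
            return best[0][0], pickle.load(f)


def demo():
    """Checkpoint twice, lose the local tier, and recover from the persistent tier."""
    base = tempfile.mkdtemp()
    ckpt = MultiLevelCheckpointer(os.path.join(base, "local"),
                                  os.path.join(base, "pfs"))
    ckpt.checkpoint({"step": 1}, version=1)
    ckpt.checkpoint({"step": 2}, version=2)
    ckpt.wait()                    # let the background flush finish
    shutil.rmtree(ckpt.local_dir)  # simulate losing node-local storage
    return ckpt.restart()
```

Calling `demo()` writes two checkpoints, waits for the background copies, deletes the local tier, and restarts from the copy on the persistent tier. The sketch leaves out most of what a production library must handle, such as more than two tiers, coordination across processes, and error reporting.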