An Introduction to HDF5 in HPC Environments Supporting Materials Webinar

In this presentation, we introduce the concepts and practices of data management based on HDF5. Our main goal is to help users with no previous HDF5 experience become productive in an HPC environment as quickly as possible. As a secondary goal, we want them to be aware of the resources that will let them take their mastery of HDF5 to the next level. Attendees with a working knowledge of C/C++, Fortran, or Python, plus basic MPI programming, will get the most out of this introduction.

We have organized this presentation into five sections. We begin with a few motivating examples and heuristics for mapping ideas to their manifestations in storage structures. We will mention viable solutions that do not use HDF5, but point out their “atomistic” character as opposed to HDF5’s holistic approach. We then show the fastest known path, in terms of user effort /and/ run time, for transforming in-memory structures into bytes in storage. Having seen HDF5 in action, we take a step back to reflect on our initial problem set and what HDF5 has to offer. We then make the transition into “proper” HPC with parallel HDF5. We will discuss the inevitable challenges of an environment in which there are many more moving parts above and below the HDF5 library. It’s all about finding balance, and we will present a few proven techniques that no HDF5 user should be without.

In the last section of this presentation, we will survey the supporting ecosystem around HDF5 and preview the intermediate topics that will be covered in a future event.