This Community BOF will provide an overview of the Data Management portfolio and the Data Analysis and Visualization portfolio under development as part of the Exascale Computing Project (ECP).
Changes in the hardware architecture of Exascale supercomputers will render current approaches to data management, analysis, and visualization obsolete, disrupting scientific workflows and making traditional checkpoint/restart methods infeasible. Exascale system concurrency is expected to grow by five or six orders of magnitude, yet system memory is expected to grow by only one order of magnitude, and I/O bandwidth and persistent storage capacity by only two. The reduced memory footprint per FLOP further complicates these problems, as does the move to a hierarchical memory structure. Scientific workflows currently depend on exporting simulation data off the supercomputer to persistent storage for post hoc analysis.
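As a rough illustration of the mismatch between these growth figures, the sketch below computes how far a resource would shrink per unit of concurrency. The function name and the "per unit of concurrency" framing are assumptions made for illustration; they are not ECP projections.

```python
def shrink_factor(concurrency_orders: int, resource_orders: int) -> int:
    """Factor by which a resource per unit of concurrency shrinks.

    Concurrency grows by 10**concurrency_orders while the resource
    grows by only 10**resource_orders.
    """
    return 10 ** (concurrency_orders - resource_orders)


# Growth figures quoted in the text (orders of magnitude).
MEMORY_ORDERS = 1
IO_ORDERS = 2

for concurrency_orders in (5, 6):
    memory_shrink = shrink_factor(concurrency_orders, MEMORY_ORDERS)
    io_shrink = shrink_factor(concurrency_orders, IO_ORDERS)
    print(
        f"concurrency x10^{concurrency_orders}: "
        f"memory per unit of concurrency shrinks {memory_shrink:,}x, "
        f"I/O bandwidth per unit of concurrency shrinks {io_shrink:,}x"
    )
```

Under these figures, memory per unit of concurrency falls by four to five orders of magnitude and I/O bandwidth by three to four.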
On Exascale systems, the power cost of data movement and the worsening I/O bottleneck will make it necessary for most simulation data to be analyzed in situ, that is, while the simulation is running. Furthermore, to meet power consumption and data bandwidth constraints, it will be necessary to sharply reduce the volume of data moved on the machine, and especially the volume exported to persistent storage. The combination of sharp data reduction and new analysis approaches heightens the importance of capturing data provenance (i.e., the record of what has been done to the data) to support validation of results and post hoc data analysis and visualization. In this BOF, we will provide an overview of the efforts in ECP to support efficient and portable data management, analysis, and visualization on Exascale platforms.
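The idea of in situ reduction with provenance capture can be sketched in a few lines. This is a toy example under stated assumptions: `simulate_step` is a hypothetical stand-in for a simulation kernel, and the summary statistics and provenance fields are illustrative choices, not the interface of any ECP tool.

```python
import math


def simulate_step(step: int, n: int) -> list[float]:
    """Hypothetical stand-in for one time step of a simulation."""
    return [math.sin(0.01 * i + 0.1 * step) for i in range(n)]


def reduce_in_situ(field: list[float]) -> dict:
    """Reduce a full field to a few summary values while it is in memory."""
    return {
        "min": min(field),
        "max": max(field),
        "mean": sum(field) / len(field),
    }


def run(num_steps: int, n: int) -> list[dict]:
    """Run the toy simulation, keeping only reduced data plus provenance."""
    summaries = []
    for step in range(num_steps):
        field = simulate_step(step, n)  # full data exists only in memory
        summary = reduce_in_situ(field)
        # Minimal provenance: what was done to the data, and to how much of it.
        summary["step"] = step
        summary["reduction"] = "min/max/mean"
        summary["source_values"] = len(field)
        summaries.append(summary)  # only the summary would be exported
    return summaries


if __name__ == "__main__":
    for record in run(num_steps=3, n=1000):
        print(record)
```

Each step exports a handful of values instead of the full field, and the provenance fields record which reduction produced them.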
The data management activities in ECP address the severe I/O bottleneck and the challenges of data movement by providing and improving storage system software; workflow support, including provenance capture; and methods of data collection, reduction, organization, and discovery. We will report on the efforts and status of the I/O libraries used by applications (HDF5, ADIOS, MPI-IO, and PnetCDF); the checkpointing library VeloC; the burst buffer file system UnifyFS; and the data management support software developed as part of the DataLib project. All of these approaches are being designed and implemented collaboratively within ECP to address the needs of users of Exascale platforms.
Data analytics and visualization (DAV) are capabilities that enable scientific knowledge discovery. The ECP DAV portfolio is building an exascale-capable ecosystem to support scientific discovery while addressing the challenges of analyzing, reducing, and visualizing data. In this section of the BOF, we will give an overview of the ECP in situ infrastructures, Ascent and ParaView/Catalyst, from the ALPINE project. ALPINE is also developing a suite of in situ algorithms to enable automatic feature detection, selection, and data reduction while the data are still resident in memory. These capabilities are built on top of VTK-m, the many-core visualization library used for cross-platform portability across the ECP DAV projects. We will cover SENSEI, an open-source, generic in situ interface that allows parallel simulations to code-couple to parallel third-party endpoints. We will also give an overview of interactive post hoc approaches that work on data extracts produced in situ to support post hoc scientific workflows through visualization tools such as ParaView, VisIt, and Cinema. We will report on the status of the ZFP and SZ compression capabilities under development as part of ECP. All of these I/O and DAV functionalities are available in the Data & Visualization Software Development Kit.
Attendees of our BOF will come away with an understanding of the different approaches they can use to tackle their data management needs, avoid I/O bottlenecks, and develop scientific workflows supported by both in situ and post hoc data analytics and visualization tools. Many of the software technologies covered will have in-depth BOFs later in the week.
- Terry Turton (LANL)
- Rob Ross (ANL)
- Jim Ahrens (LANL)