HPC System and Software Testing via Buildtest
This talk was given on April 15, 2021, as part of the 2021 ECP Annual Meeting.
An HPC environment is a tightly coupled system: a cluster of nodes and accelerators connected by a high-speed interconnect, a parallel file system, multiple storage tiers, a job scheduler, and a software stack that users rely on to run their workflows. Because these components are highly interdependent, it is essential to regularly test both the HPC system and its software stack. Software build frameworks such as Spack and EasyBuild have made significant progress in installing software packages on HPC systems, but there is little consensus on how to test the resulting systems and software.
In this talk, we presented buildtest (https://buildtest.readthedocs.io/en/devel/index.html), an acceptance testing framework for HPC systems. In buildtest, tests are written in YAML files called ‘buildspecs’, which buildtest processes into shell scripts. These tests can be run locally or through a job scheduler (Slurm, LSF, or Cobalt). Buildspecs follow a rich YAML structure that is defined in JSON Schema, which buildtest uses to validate them. Currently, buildtest supports two major schema types: script, for writing shell and Python scripts, and compiler, for single-source compilation tests.
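To make the buildspec-to-shell-script idea concrete, here is a minimal Python sketch. It is not buildtest's implementation: the function render_script, the REQUIRED_FIELDS check, and the exact field names (type, executor, description, env, run) are assumptions loosely modeled on a script-type buildspec, and the buildspec is written as a Python dict rather than YAML so the example needs only the standard library. The real framework validates against its JSON Schema files; the hand-written field check below merely stands in for that step.

```python
import shlex

# Hypothetical minimal set of required fields; buildtest itself validates
# buildspecs against JSON Schema rather than a hard-coded tuple like this.
REQUIRED_FIELDS = ("type", "executor", "run")


def render_script(name, test):
    """Render one script-type test entry into the text of a bash script."""
    missing = [field for field in REQUIRED_FIELDS if field not in test]
    if missing:
        raise ValueError(f"test '{name}' is missing fields: {', '.join(missing)}")
    if test["type"] != "script":
        raise ValueError(f"test '{name}': this sketch only handles type 'script'")

    lines = ["#!/bin/bash", f"# test: {name}"]
    if "description" in test:
        lines.append(f"# {test['description']}")
    # Export environment variables, quoting values so spaces survive.
    for key, value in test.get("env", {}).items():
        lines.append(f"export {key}={shlex.quote(str(value))}")
    lines.append(test["run"].rstrip())
    return "\n".join(lines) + "\n"


# A buildspec-like structure (assumed layout, shown as a dict instead of YAML).
buildspec = {
    "version": "1.0",
    "buildspecs": {
        "hello_world": {
            "type": "script",
            "executor": "generic.local.bash",
            "description": "print a greeting",
            "env": {"GREETING": "hello world"},
            "run": 'echo "$GREETING"',
        }
    },
}

scripts = {
    name: render_script(name, test)
    for name, test in buildspec["buildspecs"].items()
}
print(scripts["hello_world"])
```

In this sketch the executor field is only checked for presence; in buildtest the executor is what determines whether the generated script runs locally or is submitted to a scheduler such as Slurm, LSF, or Cobalt.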
We covered the core framework and its features, and showed how to write tests (i.e., buildspecs) using the script and compiler schemas. In addition, we presented a summary of the Cori test suite (https://github.com/buildtesters/buildtest-cori), which includes real tests for the Cori system at NERSC.
In January 2021, we deployed the Spack E4S 20.10 stack (https://docs.nersc.gov/applications/e4s/) on Cori for the NERSC user community. As part of this initiative, we test the E4S stack with the E4S testsuite (https://github.com/E4S-Project/testsuite), using buildtest in GitLab scheduled pipelines. We concluded the talk with a brief demo of buildtest and pointers to additional resources for getting started.