Reducing Technical Debt with Reproducible Containers

The IDEAS Productivity project, in partnership with the DOE Computing Facilities (ALCF, OLCF, and NERSC) and the DOE Exascale Computing Project (ECP), has resumed the webinar series on Best Practices for HPC Software Developers, which we began in 2016.

As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. The November webinar is titled Reducing Technical Debt with Reproducible Containers, and will be presented by Tanu Malik (DePaul University). The webinar will take place on Wednesday, November 4, 2020 at 1:00 pm ET.

Abstract:

Computational experiments can be challenging to reproduce; researchers have to choose between pursuing a fast-paced research agenda and developing well-organized, sufficiently documented, and easily reproducible software. As with fiscal debt, there are often tactical reasons to take on technical debt in scientific software, such as deferring documentation, organization, refactoring, and unit tests when pursuing a new idea or meeting a conference deadline. However, more often than not, researchers do not repay this technical debt, leading to irreproducible experiments.

The webinar will describe different levels of technical debt and quantify the cost of not repaying it. The presenter will introduce isolation in containers as a powerful mechanism for reducing portability debt and describe the limitations of current container tools. The presenter will then outline a vision of a reproducible container that aims to automate repayment of different types of technical debt, and will describe the current state of this vision through three tools. These tools use isolation, encapsulation, and monitoring to include necessary and sufficient content in the container, in terms of both software and data, and to describe the container's contents. Finally, the presenter will show results of using reproducible containers on domain science and HPC use cases, and provide guidance.