Tackling the Complex Task of Software Deployment and Continuous Integration at Facilities

By Scott Gibson

The guest on this episode of the Let’s Talk Exascale podcast is Ryan Adamson, a Core Operations Group leader at Oak Ridge National Laboratory. His role in the US Department of Energy’s (DOE) Exascale Computing Project (ECP) revolves around software deployment and continuous integration at DOE facilities.

Ryan Adamson of Oak Ridge National Laboratory

For ECP to succeed, the software the project is developing must be in place at the facilities and achieve speedup on the exascale machines. That means meeting figures of merit determined by ECP’s Application Development and Software Technology research focus areas and ECP’s director. “We don’t want to spend a lot of money and then not have these codes scale up, so we have a testing phase in the last year of the project to show that we’re making the improvements that we’re paying for,” Adamson said.

Software deployment and continuous integration at the facilities are highly complex. “Each of the scientific applications that we have depends on libraries and underlying vendor software,” Adamson said. “So managing dependencies and versions of all of these different components can be a nightmare. We’re embracing Spack, a package manager or app store, for scientific users to define what they want, whether this be compilers or libraries. Spack manages the entire build stack for you.”
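
To make the dependency problem concrete, here is a minimal Python sketch of the general idea of ordering a build so that every component is built after the things it depends on. It is an illustration only and does not represent how Spack works internally. The package names and the dependency map are hypothetical, and the ordering relies on Python's standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each package lists what must be built first.
# The names are illustrative only and are not taken from any real Spack recipe.
DEPENDENCIES = {
    "science-app": {"math-lib", "io-lib"},
    "math-lib": {"compiler"},
    "io-lib": {"mpi", "compiler"},
    "mpi": {"compiler"},
    "compiler": set(),
}


def build_order(dependencies):
    """Return a build order in which every package follows its dependencies."""
    # TopologicalSorter treats each mapping value as the set of predecessors
    # and raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, package in enumerate(build_order(DEPENDENCIES), start=1):
        print(f"{step}. build {package}")
```

Even in this toy case, the order depends on every entry in the map. A real scientific software stack adds versions, compiler choices, and vendor libraries on top, which is the bookkeeping Adamson describes handing off to a package manager.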

Establishing a common understanding of software deployment across the facilities depends on close collaboration and clear communication.

“Getting all of the six facilities—Argonne, Oak Ridge, NERSC at Berkeley Lab, Sandia, Livermore, and Los Alamos—all on the same page and supporting Spack and other tools in the same way is a challenge,” Adamson said. “So we decided to tackle that through a series of working group calls and quarterly meetups where we talk about the problems that we’re having at each of these facilities and the steps to take to rectify the situation. I think things are going really well.”

Deployment maturity currently varies across the facilities, but work is under way to bring all of them up to the proper level. “We work really closely with the Software Technology area of ECP,” Adamson said. “They’re the ones who are developing Spack. They’re responsible for container support and container delivery for these software applications. And right now, we’re getting them established with continuous integration at each of the facilities so that as they develop new codes, they can test them right away on these interesting pieces of hardware that we’re delivering as part of ECP to have a more real-time feedback loop concerning whether their products and solutions are working. So we’re really close to that, and we’re about to turn the corner.”

Code repositories are integral to the continuous integration process. “We have site-local GitLab repositories that ECP Application Development and Software Technology project teams can use right now,” Adamson said. “We want to federate all of those code repositories from one central location so that scientific users and applications developers can go to one spot to make their application changes. The changes get automatically tested at each of the facilities without having to require them to log in everywhere. We can do continuous integration and software deployment right now. But we’re working to make it better, and we expect to reach the level we’re striving for in the next couple of months.”

Adamson is collaborating with ECP’s Software Technology research focus area to identify the most important applications. “We have a set of liaisons that work with those project teams to integrate them with one continuous integration effort as well as get Spack tool kits and other items built for their projects,” he said.