Migrating to Heterogeneous Computing: Lessons Learned in the Sierra and El Capitan Centers of Excellence

The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), organizes the webinar series on Best Practices for HPC Software Developers.

As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. The October webinar is titled Migrating to Heterogeneous Computing: Lessons Learned in the Sierra and El Capitan Centers of Excellence, and will be presented by David Richards (Lawrence Livermore National Laboratory). The webinar will take place on Wednesday, October 13, 2021 at 1:00 pm ET.

Abstract:

The introduction of heterogeneous computing via GPUs from the Sierra architecture represented a significant shift in direction for computational science at Lawrence Livermore National Laboratory (LLNL), and therefore required significant preparation. The Sierra Center of Excellence (COE) brought employees with specific expertise from IBM and NVIDIA together with LLNL in a concentrated effort to prepare applications, system software, and tools for the Sierra supercomputer. To prepare for El Capitan, a new COE is currently operating in collaboration with HPE and AMD. This webinar will describe the operation of these COEs and document lessons learned, with the hope that others will be able to learn from both our success and intermediate setbacks. We describe what we have found to be best practices for managing the vendor collaborations, migrating algorithms and source code, working with the system software stack and tools, and optimizing application performance.