Aug
27
Fri
Variorum Lecture Series
Aug 27 @ 7:00 pm – 8:30 pm

1st Variorum Lecture Series August 2021

The Variorum team will present its first Variorum Lecture Series, where attendees will learn everything necessary to start using Variorum on various platforms to write portable power management code. The team will provide support through GitHub and the Variorum mailing list during and after the lecture series. The series will consist of two modules of 1.5 hours each. We will hold two sessions of each module to accommodate different time zones as well as attendee schedules.

    Module 1: August 6, 8:30am-10:00am PT / 11:30am-1:00pm ET, targeting US/European attendees, and August 20, 4:00pm-5:30pm PT / 7:00pm-8:30pm ET, targeting US/Asian attendees.
    Module 2: August 13, 8:30am-10:00am PT / 11:30am-1:00pm ET, targeting US/European attendees, and August 27, 4:00pm-5:30pm PT / 7:00pm-8:30pm ET, targeting US/Asian attendees.

What is Variorum?

Variorum is a production-grade, open-source, vendor-neutral software infrastructure for exposing low-level control and monitoring of a system’s underlying hardware features. It can easily be ported to different hardware devices, as well as different generations within a particular device. This allows users to manage power, performance and thermal information seamlessly across hardware from different vendors. More specifically, Variorum’s flexible design supports a set of features that may exist on one generation of hardware, but not on another. Variorum can also be included as part of the system software stack for power management, for example in runtime systems, resource managers, and profiling tools. At present, Variorum supports five platforms (IBM, Intel, AMD, ARM and NVIDIA) and a total of ten microarchitectures across these platforms.

Contents of the Lectures

Module 1: Introduction to Variorum

  • Challenges in Power Management and The HPC Power Stack
  • Understanding Power Management Knobs on Intel, IBM, NVIDIA, ARM, and AMD platforms
  • Variorum Library
    • Build, dependencies, and setup
    • Monitoring user applications non-intrusively
    • Vendor-neutral Variorum API across diverse architectures
    • Using Variorum for finer-grained monitoring, power capping, and management

Module 2: Integrating Variorum with System Software and Tools

  • The HPC Power Stack revisited: need for power management at various levels
  • GEOPM: job-level power management
  • Kokkos and Caliper: application and workflow power management
  • SLURM (Research Extensions): system-level power management
  • Upcoming Features in Variorum
  • The HPC Power Stack Roadmap

How to Attend

  • The lecture series is available to everyone, and participants are welcome to attend any/all sessions.
  • Registration is required but free of charge; the meeting link and password will be sent to registrants. See “Tickets” above.
  • Presenters will show in-depth demos during the lecture series. Presenters can provide support during and after the lecture series with setup and usage on supported architectures.

Presenters

  • Stephanie Brink, Tapasya Patki, Aniruddha Marathe and Barry Rountree (Lawrence Livermore National Laboratory)

Module 1

Module 2

Sep
14
Tue
Facility Testing of E4S via E4S Testsuite, Spack Test, and buildtest
Sep 14 @ 12:00 pm – 2:00 pm

Abstract:

The Extreme-scale Scientific Software Stack (E4S) is a collection of open-source software packages for running scientific applications, typically on HPC systems. E4S is a collection of spack packages that is built collectively with a fixed version of spack on a quarterly basis. So far, we have deployed E4S on Cori and Summit, and in the coming months we will have E4S deployed on Perlmutter, Spock, and Aurora. As part of the facility deployment, we need a mechanism to test the software stack and to increase test coverage so that the software is properly tested at each facility.

E4S Testsuite is a validation testsuite for E4S products: a collection of shell scripts, makefiles, and source code that runs tests on a facility-deployed spack stack. Spack recently added support for running standalone tests of spack packages via the spack test command, which lets package authors specify tests in their spack packages that can be run post-deployment.

Buildtest is an HPC testing framework designed to help facilities develop and run tests to validate their system and software stack. In buildtest, tests are written in YAML templates called buildspecs, which buildtest processes into shell scripts. Buildtest supports job submission to the Slurm, LSF, PBS and Cobalt schedulers. Buildtest supports a rich YAML structure for writing buildspecs that is validated by a JSON Schema. Furthermore, buildtest provides numerous commands to query test results, inspect a particular test, validate buildspecs and find buildspecs through its buildspec cache. In v0.10.0, buildtest added support for a spack schema that allows one to write buildspecs that leverage spack to install specs, manage spack environments, and run spack test.
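
To make the buildspec idea concrete, here is a minimal Python sketch of the general pattern described above: a declarative test description is rendered into a shell script. This is not buildtest's implementation or its schema. The dictionary layout, the field names (description, shell, run), and the function render_script are all hypothetical, and a plain dict stands in for a parsed YAML file.

```python
# Illustrative only: a hypothetical, simplified stand-in for a parsed buildspec.
# Real buildspecs are YAML files validated against buildtest's JSON Schema;
# the field names below are assumptions made for this sketch.
spec = {
    "hello_world": {
        "description": "print a greeting",
        "shell": "/bin/bash",
        "run": "echo hello",
    },
}


def render_script(name, test):
    """Render one test entry (hypothetical layout) as shell-script text."""
    lines = [
        f"#!{test['shell']}",                      # shebang from the 'shell' field
        f"# test: {name}",                         # record the test name
        f"# description: {test['description']}",   # carry the description along
        test["run"],                               # the command(s) to execute
    ]
    return "\n".join(lines) + "\n"


# One generated script per test entry, keyed by test name.
scripts = {name: render_script(name, test) for name, test in spec.items()}
print(scripts["hello_world"])
```

The point of the pattern is that test authors only describe what to run, while the framework owns how the script is generated and, in buildtest's case, how it is submitted to a scheduler.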

The presentation will provide a brief overview of buildtest commands and how to write tests in buildspecs, followed by a demo. The presentation will include an overview of the Cori testsuite, a repository that contains sanity tests for the Cori system, including E4S tests. GitLab is used to help automate execution of the tests, and the results are pushed to CDash for post-processing. The presentation will end with a summary of the E4S tests that are run on Cori, as well as current challenges.

Presenter: Shahzeb Mohammed Siddiqui (NERSC)

SLIDES

Sep
15
Wed
What I Learned from 20 Years of Leading Open Source Projects
Sep 15 @ 2:00 pm – 3:00 pm

The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), has resumed the webinar series on Best Practices for HPC Software Developers, which we began in 2016.

As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. The September webinar is titled What I Learned from 20 Years of Leading Open Source Projects, and will be presented by Wolfgang Bangerth (Colorado State University). The webinar will take place on Wednesday, September 15, 2021 at 2:00 pm ET. This webinar will start one hour later than the usual time.

Abstract:

Scientific software has grown from loose collections of individual routines working on relatively simple data structures to very large packages of hundreds of thousands to millions of lines of code, with dozens of contributors, and hundreds or thousands of users. In the process, the approaches to software development have also drastically changed: both the software packages and their development are professionally managed, with version control, extensive test suites, and automatic regression checks for every patch. Maybe more interestingly, the approaches to managing the *community* of software developers and users have also dramatically changed.

Having led two large open source software projects (the finite element package deal.II and the Advanced Solver for Problems in Earth's ConvecTion, ASPECT) for more than 20 years, the presenter will share lessons learned about both the technical management of scientific software projects and the social side of these projects.

Sep
23
Thu
Strategies for Working Remotely Panel Series – Training Virtualization
Sep 23 @ 3:00 pm – 4:15 pm

In response to the COVID-19 pandemic and transition to remote work, ECP and the IDEAS Productivity project launched the panel series Strategies for Working Remotely, which explores important topics in this area.

Abstract:

  • Many organizations abruptly transitioned from a primarily on-site to a primarily remote work experience last spring. However, organizations still have training needs that were once largely accomplished through in-person events such as workshops, hackathons, and tutorials. This panel shared what they learned during the past year in their efforts to bring more virtualization to what historically has worked for in-person training events. What worked well? What did not work? The panelists also discussed how those experiences will inform plans moving forward when organizations can safely offer in-person training again.

Panelists:

  • Kelly Barnes, The Carpentries
  • Helen He, NERSC, Lawrence Berkeley National Laboratory
  • Julia Levites, NVIDIA Corporation
  • Thomas Papatheodore, OLCF, Oak Ridge National Laboratory

Moderators:

  • Ashley Barker, Oak Ridge National Laboratory
  • Osni Marques, Lawrence Berkeley National Laboratory

Sep
24
Fri
Webinar: New Features in the HDF5 1.13.0 Release
Sep 24 @ 12:00 pm – 1:00 pm

This webinar will cover the major new features of the HDF5 1.13.0 release. It will cover pluggable virtual file drivers (VFDs) and changes to the virtual object layer (VOL), and show how to build and use the async, pass-through, and cache VOL connectors.

More information about the webinar, including registration, can be found here.

Oct
4
Mon
E4S at DOE Facilities with Deep Dive at NERSC
Oct 4 @ 12:00 pm – 3:00 pm

Abstract:

The Extreme-scale Scientific Software Stack (E4S) is a collection of open source packages for running scientific applications on high performance computing platforms. The E4S stack comes with 80+ applications including programming models, MPI, development tools such as HPCToolkit, TAU and PAPI, and math libraries, including PETSc and Trilinos. E4S is available for use via containers, buildcache, an AWS EC2 image, and facility-tuned spack environments in the form of spack.yaml. NERSC has deployed three versions of E4S (20.10, 21.02, 21.05) on the Cori system using the spack package manager. NERSC plans to use E4S as the vehicle for installing and supporting much of the software provided for users on Perlmutter.

E4S is an ECP-funded project that includes software products from Software Technology (ST) and Application Development (AD) teams. In this session, Mike Heroux (Director of Software Technology) will provide an overview of the ST focus area and the future roadmap of ECP and E4S.

The Software Deployment group is responsible for deploying ECP software at the DOE facilities by partnering with AD and ST projects to properly tune their software for each facility. This group is also responsible for providing CI infrastructure to help AD/ST teams automate their workflows. Ryan Adamson will provide an overview of the Software Deployment group, including current challenges and the future roadmap.

Sameer Shende will present the components of E4S, how to use E4S containers, replacing MPI in an E4S container with the host MPI, creating custom containers for your application, using E4S on AWS and DOE facilities, and building applications using E4S with a bare-metal installation. He will highlight the use of E4S on Cori and answer questions about applying E4S to your projects.

Shahzeb Siddiqui will present an overview of the E4S stacks installed at NERSC, in a session that mixes hands-on exercises with a walkthrough of the NERSC E4S documentation. Participants who have access to NERSC systems are encouraged to follow the hands-on session. We will conclude this session with an overview of E4S testing at NERSC and of building a Spack GitLab pipeline for nightly builds of E4S.

Agenda:

  • ST Overview (Mike Heroux)
  • Introduction to E4S (Sameer Shende)
  • Software Deployment at the Facilities (Ryan Adamson)
  • E4S at NERSC (Shahzeb Siddiqui)
  • Q&A

Oct
12
Tue
2021 HDF5 User Group Meeting
Oct 12 @ 10:00 am – 2:30 pm

The 2021 HDF5 Users Group (HUG) will be held virtually on October 12-14, 2021. More information about the agenda and registration (required) can be found here.

Oct
13
Wed
2021 HDF5 User Group Meeting
Oct 13 @ 10:00 am – 2:30 pm

The 2021 HDF5 Users Group (HUG) will be held virtually on October 12-14, 2021. More information about the agenda and registration (required) can be found here.

Migrating to Heterogeneous Computing: Lessons Learned in the Sierra and El Capitan Centers of Excellence
Oct 13 @ 1:00 pm – 2:00 pm

The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), organizes the webinar series on Best Practices for HPC Software Developers.

As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. The October webinar is titled Migrating to Heterogeneous Computing: Lessons Learned in the Sierra and El Capitan Centers of Excellence, and will be presented by David Richards (Lawrence Livermore National Laboratory). The webinar will take place on Wednesday, October 13, 2021 at 1:00 pm ET.

Abstract:

The introduction of heterogeneous computing via GPUs from the Sierra architecture represented a significant shift in direction for computational science at Lawrence Livermore National Laboratory (LLNL), and therefore required significant preparation. The Sierra Center of Excellence (COE) brought employees with specific expertise from IBM and NVIDIA together with LLNL in a concentrated effort to prepare applications, system software, and tools for the Sierra supercomputer. To prepare for El Capitan, a new COE is currently operating in collaboration with HPE and AMD. This webinar will describe the operation of these COEs and document lessons learned, with the hope that others will be able to learn from both our success and intermediate setbacks. We describe what we have found to be best practices for managing the vendor collaborations, migrating algorithms and source code, working with the system software stack and tools, and optimizing application performance.

Strategies for Working Remotely at the DOE Laboratories of the Future Workshop on Effective Teaming and Virtual Collaboration
Oct 13 @ 1:00 pm – 3:00 pm

Strategies for Working Remotely will be a topic of discussion in the next DOE Laboratories of the Future (LOTF) workshop. We hope you will join us!

DOE laboratories are globally recognized to be masters of science at scale, interdisciplinary research, and operating national user facilities. At the same time, the laboratory complex is entering its eighth decade of existence and retains vestiges of its World War II roots. As the stewards of this national treasure, it is our job to ensure the laboratories have the resources and structures to thrive for the next 70 years and beyond.

Purpose of Workshop: The next event in the DOE Laboratories of the Future (LOTF) workshop series will be focused on effective teaming across the DOE laboratories and how we can best integrate new tools and mechanisms for virtual collaboration. Panel speakers will address:

  • How do decision-making teams work together effectively?
  • How can virtual collaborative tools help stimulate innovative collaborations?
  • What are strategies for working remotely at the national labs?
  • What new models of collaborative teaming can be adopted?

Panelists:

  • Dr. Nancy Cooke, Arizona State University – Effective Teamwork for DOE Laboratories of the Future
  • Dr. Gary Olson, University of California, Irvine – Virtual Collaborative Scholarship
  • Dr. Elaine Raybourn, Sandia National Laboratories – Strategies for Working Remotely
  • Dr. Francesca Poli, Oppenheimer Science and Energy Leadership Program – Championing New Models of Flexibility to Enhance Scientific Impact

Moderator:

  • Susannah Howieson, Office of Strategic Planning and Interagency Coordination (SPAIC), Department of Energy