Jan
15
Wed
Refactoring EXAALT MD for Emerging Architectures
Jan 15 @ 1:00 pm – 2:00 pm

The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), has resumed the webinar series on Best Practices for HPC Software Developers, which we began in 2016.

As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. The next webinar is titled Refactoring EXAALT MD for Emerging Architectures, and will be presented by Aidan Thompson (Sandia National Laboratories), Stan Moore (Sandia National Laboratories), and Rahulkumar Gayatri (NERSC). The webinar will take place on Wednesday, January 15, 2020 at 1:00 pm ET.

Abstract:

As part of the DOE Exascale Computing Project, members of the EXAALT project are working to increase the accuracy, time, and length scales of molecular dynamics simulations of materials for fusion energy. These simulations rely on the SNAP machine-learning interatomic potential to accurately capture material properties. The SNAP kernel recursively evaluates a set of complex polynomial functions, requiring many deeply nested loops with irregular loop bounds. Last year, a worrisome trend in the SNAP force kernel was identified: with each new generation of emerging architectures, performance relative to theoretical peak was decreasing, particularly on GPUs. This webinar will discuss the approach used to rewrite the SNAP kernel from the ground up: using a more compact memory representation, refactoring the main loop, using sub-kernels to reduce pressure on GPU threads, and improving coalesced memory accesses on the GPU. This work has enabled a spectacular increase of roughly 10x in performance over the baseline implementation of the SNAP benchmark running on NVIDIA V100 GPUs. Extrapolated to the full machine, this predicts an increase of over 100x in the Figure of Merit over the baseline on the ALCF/Mira system, putting EXAALT on track to meet, and even exceed, its performance targets on exascale systems. The webinar will emphasize key strategies and lessons learned in code transitions for emerging architectures.
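The abstract mentions improving coalesced memory accesses through a more compact memory representation. The sketch below is not the SNAP or LAMMPS code; it is a generic C++ illustration, assuming one GPU thread per atom and `ncomp` hypothetical per-atom coefficients, of why data layout matters for coalescing. When consecutive threads read the same component, a component-major layout places those reads at neighboring addresses, while an atom-major layout spreads them `ncomp` elements apart. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Atom-major ("array of structures"): the ncomp values of one atom are contiguous.
inline std::size_t atom_major_index(std::size_t atom, std::size_t comp,
                                    std::size_t /*natoms*/, std::size_t ncomp) {
    return atom * ncomp + comp;
}

// Component-major ("structure of arrays"): one component for all atoms is contiguous.
inline std::size_t comp_major_index(std::size_t atom, std::size_t comp,
                                    std::size_t natoms, std::size_t /*ncomp*/) {
    return comp * natoms + atom;
}

// Distance (in elements) between what thread t and thread t+1 read for the
// same component, assuming thread t handles atom t.
template <typename IndexFn>
std::size_t thread_stride(IndexFn index, std::size_t comp,
                          std::size_t natoms, std::size_t ncomp) {
    return index(1, comp, natoms, ncomp) - index(0, comp, natoms, ncomp);
}

// Repack atom-major data into component-major order (done once, before the kernel).
std::vector<double> to_comp_major(const std::vector<double>& src,
                                  std::size_t natoms, std::size_t ncomp) {
    assert(src.size() == natoms * ncomp);
    std::vector<double> dst(src.size());
    for (std::size_t a = 0; a < natoms; ++a)
        for (std::size_t c = 0; c < ncomp; ++c)
            dst[comp_major_index(a, c, natoms, ncomp)] =
                src[atom_major_index(a, c, natoms, ncomp)];
    return dst;
}
```

For example, with `natoms = 1024` and `ncomp = 64`, `thread_stride(atom_major_index, 0, 1024, 64)` is 64, while `thread_stride(comp_major_index, 0, 1024, 64)` is 1, the contiguous access pattern that GPU memory coalescing favors.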

Feb
19
Wed
Introduction to Kokkos
Feb 19 @ 1:00 pm – 2:00 pm

The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), has resumed the webinar series on Best Practices for HPC Software Developers, which we began in 2016.

As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. The next webinar is titled Introduction to Kokkos, and will be presented by Christian Trott (Sandia National Laboratories). The webinar will take place on Wednesday, February 19, 2020 at 1:00 pm ET.

Abstract:

The Kokkos C++ Performance Portability Ecosystem is a production-level solution for writing modern C++ applications in a hardware-agnostic way. It is part of the US Department of Energy’s Exascale Computing Project, the leading effort in the US to prepare the HPC community for the next generation of supercomputing platforms. Kokkos is now used by more than a hundred HPC projects, and Kokkos-based codes are running regularly at scale on at least five of the top ten supercomputers in the world. In this webinar, we will give a short overview of what the Kokkos Ecosystem provides, including its programming model, math kernels library, tools, and training resources, before providing an overview of the Kokkos team’s efforts surrounding the ISO C++ standard, and how Kokkos both influences future standards and aligns with developments occurring in them. The webinar will include a status update on the progress in supporting the upcoming exascale-class HPC systems announced by DOE.

Mar
18
Wed
Testing: Strategies When Learning Programming Models and Using High-Performance Libraries
Mar 18 @ 1:00 pm – 2:00 pm

The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), has resumed the webinar series on Best Practices for HPC Software Developers, which we began in 2016.

As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. This webinar was titled Testing: Strategies When Learning Programming Models and Using High-Performance Libraries, and was presented by Balint Joo (Jefferson Lab). The webinar took place on Wednesday, March 18, 2020 at 1:00 pm ET.

Abstract:

Software testing is an invaluable practice, although the level of testing in scientific applications can vary widely, from no testing at all to full continuous integration (as discussed in earlier webinars of the HPC-BP series). In this webinar I will consider a specific case: the use of unit testing when developing a mini-app as an approach to learning about new programming models such as Kokkos and SYCL, or when using (or contributing to) high-performance libraries. I will illustrate with an example from Lattice QCD, focusing on the integration of the QUDA optimized library with the Chroma application. The webinar will focus on lessons learned and generally applicable strategies.
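The idea of learning a new programming model or library by unit-testing a mini-app against a trusted reference can be sketched in plain C++. This is a generic illustration, not code from Chroma or QUDA: `dot_reference` is a hypothetical simple reference, `dot_unrolled` is a hypothetical stand-in for a port to a new programming model or a call into an optimized library, and the test compares the two with a relative tolerance, because a reordered floating-point summation rarely matches bit for bit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference implementation: simple and easy to trust.
double dot_reference(const std::vector<double>& x, const std::vector<double>& y) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    return sum;
}

// Hypothetical "optimized" candidate: four partial sums, which changes
// the order of the floating-point additions.
double dot_unrolled(const std::vector<double>& x, const std::vector<double>& y) {
    double s[4] = {0.0, 0.0, 0.0, 0.0};
    std::size_t n = x.size(), i = 0;
    for (; i + 4 <= n; i += 4)
        for (std::size_t k = 0; k < 4; ++k) s[k] += x[i + k] * y[i + k];
    double sum = (s[0] + s[1]) + (s[2] + s[3]);
    for (; i < n; ++i) sum += x[i] * y[i];  // leftover elements
    return sum;
}

// Relative-tolerance comparison; exact equality is too strict once
// the summation order differs.
bool nearly_equal(double a, double b, double rel_tol = 1e-12) {
    double scale = std::max({1.0, std::fabs(a), std::fabs(b)});
    return std::fabs(a - b) <= rel_tol * scale;
}

// Mini-app style unit test: the candidate must agree with the reference.
void test_dot_matches_reference() {
    std::vector<double> x, y;
    for (int i = 0; i < 1001; ++i) {  // odd length exercises the leftover loop
        x.push_back(0.001 * i);
        y.push_back(1.0 - 0.0005 * i);
    }
    assert(nearly_equal(dot_unrolled(x, y), dot_reference(x, y)));
}
```

Calling `test_dot_matches_reference()` from a test driver aborts with a diagnostic if the two versions diverge. The tolerance of `1e-12` is an illustrative choice; in practice it should reflect the problem size and the working precision.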

Mar
25
Wed
DAOS: Next-Generation Data Management for Exascale
Mar 25 @ 12:00 pm – 1:00 pm

Abstract:

The Distributed Asynchronous Object Storage (DAOS) is an open-source, scale-out object store designed from the ground up for massively distributed Non-Volatile Memory (NVM). DAOS takes advantage of next-generation NVM technology, like Storage Class Memory (SCM) and NVM Express (NVMe), and is extremely lightweight since it operates end-to-end in user space with full OS bypass. DAOS offers a shift away from an I/O model designed for block-based and high-latency storage to one that inherently supports fine-grained data access and unlocks the performance of the next-generation storage technologies. This presentation introduced the key concepts behind DAOS and the software ecosystem enabling this technology. The presentation provided details on the DAOS deployment on Aurora and how applications can benefit from this new storage tier.

Organizers

  • Ray Loy (ALCF)
  • Yasaman Ghadar (ALCF)

Apr
3
Fri
Strategies for Working Remotely: Advice from Colleagues with Experience
Apr 3 @ 3:00 pm – 4:30 pm

Registration for this event is now closed.

Abstract: Working remotely has suddenly become a near-universal experience for staff members of research organizations, but for some it has been a way of life for years. In this panel discussion, we brought together five staff members of U.S. Department of Energy (DOE) laboratories, all members of the DOE Exascale Computing Project (ECP), with years of varied experience working remotely. Topics included advice to people just getting started with working remotely, challenges, unforeseen benefits, and opportunities to look for from this experience, with emphasis on issues faced by collaborating teams in computational research. Panelists made brief introductory comments followed by open discussion. We invite you to check out the slides, video, and Q&A document from the webinar below.

Moderator: Mike Heroux, Sandia National Laboratories

Panelists:

  • Mike Bernhardt, Oak Ridge National Laboratory
  • Lois Curfman McInnes, Argonne National Laboratory
  • Mark Miller, Lawrence Livermore National Laboratory
  • Kathryn Mohror, Lawrence Livermore National Laboratory
  • Elaine Raybourn, Sandia National Laboratories

Apr
15
Wed
Best Practices for Using Proxy Applications as Benchmarks
Apr 15 @ 1:00 pm – 2:00 pm

The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), has resumed the webinar series on Best Practices for HPC Software Developers, which we began in 2016.

As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. This webinar in the series was titled Best Practices for Using Proxy Applications as Benchmarks, and was presented by David Richards (Lawrence Livermore National Laboratory) and Joe Glenski (Hewlett Packard Enterprise). The webinar took place on Wednesday, April 15, 2020 at 1:00 pm ET.

Abstract:

Proxy applications have many uses in software development and hardware/software co-design. Because most proxies are easy to build, run, and understand, they are especially appealing for use in benchmark suites and studies. This webinar examined the role of proxy apps as benchmarks and explained why run rules and a figure of merit are essential for a proxy application to function as an effective benchmark. The presenters showed how to evaluate the fidelity of benchmarks as a model for actual workloads and provided tips on creating problem specifications and other run rules. The presenters discussed what DOE facilities are looking for when they assemble benchmark suites for use in procurements. Finally, the presenters explained how system vendors use their benchmark suites and what practices they view as most (and least) effective.

Apr
24
Fri
Strategies for Working Remotely: Challenges Faced by Parents Who are Working Remotely, and Overcoming Them
Apr 24 @ 3:00 pm – 4:30 pm

The IDEAS-ECP Productivity project has launched an informal panel series on working remotely, with the goal of covering a new topic every other Friday. The next installment in the series took place on April 24 and was titled “Challenges Faced by Parents Who are Working Remotely, and Overcoming Them”.

Abstract: Working remotely is challenging enough, but many are currently experiencing the unique complexities of parenting, transitioning to online school at home, and working productively while practicing social distancing in response to COVID-19. In the second installment of the IDEAS-ECP panel discussion series, we brought together four ECP staff members, a new staff member onboarding at a national laboratory, and a Montessori educator to share ideas and resources. Panelists made brief introductory comments followed by open discussion.

You can find the video of the panel discussion as well as additional resources that were shared during the webinar in the area below titled “Materials from the Webinar”.

May
13
Wed
Accelerating Numerical Software Libraries with Multi-Precision Algorithms
May 13 @ 1:00 pm – 2:00 pm

The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), has resumed the webinar series on Best Practices for HPC Software Developers, which we began in 2016.

As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. The May webinar was titled Accelerating Numerical Software Libraries with Multi-Precision Algorithms, and was presented by Hartwig Anzt (Karlsruhe Institute of Technology) and Piotr Luszczek (University of Tennessee). The webinar took place on Wednesday, May 13, 2020 at 1:00 pm ET.

Abstract:

With the rise of machine learning, more hardware manufacturers are introducing low-precision special function units in processor designs, often achieving up to an order of magnitude higher performance than the IEEE double precision that is typically used as working precision in scientific computing. At the same time, a rapidly expanding landscape of mixed- and multi-precision methods generates high-quality solutions that leverage the higher compute power of reduced precision. This webinar introduced the concept of floating-point formats and the IEEE standard. The speakers demonstrated how using an iterative or direct solver in lower precision impacts the solution quality. The speakers outlined several strategies that aim to preserve numerical stability and high solution quality while still computing, at least partially, in lower precision. The speakers presented several multi-precision algorithms that have proven particularly successful and elaborated on their realization and usage. The speakers also introduced open-source, production-quality multi-precision software packages and showed their integration and efficiency for scientific applications. The webinar focused on lessons learned and generally applicable strategies.
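One classic strategy of the kind described above is mixed-precision iterative refinement for a linear system Ax = b: the expensive O(n^3) LU factorization is done in single precision, while residuals and solution updates are computed in double precision. The sketch below is a minimal dense C++ version, assuming a small, well-conditioned, nonsingular matrix. It is a textbook technique, not necessarily one of the algorithms or packages presented in the webinar, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using MatrixF = std::vector<std::vector<float>>;

// LU factorization with partial pivoting, done entirely in float.
// lu holds unit-lower L (below the diagonal) and U; row k of PA is row piv[k] of A.
void lu_factor_float(const Matrix& a, MatrixF& lu, std::vector<std::size_t>& piv) {
    std::size_t n = a.size();
    lu.assign(n, std::vector<float>(n));
    piv.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        piv[i] = i;
        for (std::size_t j = 0; j < n; ++j) lu[i][j] = static_cast<float>(a[i][j]);
    }
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t p = k;  // largest pivot candidate in column k
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(lu[i][k]) > std::fabs(lu[p][k])) p = i;
        std::swap(lu[k], lu[p]);
        std::swap(piv[k], piv[p]);
        for (std::size_t i = k + 1; i < n; ++i) {
            lu[i][k] /= lu[k][k];
            for (std::size_t j = k + 1; j < n; ++j) lu[i][j] -= lu[i][k] * lu[k][j];
        }
    }
}

// Solve (LU) d = P r in float using the stored factors.
std::vector<float> lu_solve_float(const MatrixF& lu, const std::vector<std::size_t>& piv,
                                  const std::vector<double>& r) {
    std::size_t n = lu.size();
    std::vector<float> d(n);
    for (std::size_t i = 0; i < n; ++i) {  // forward substitution with unit L
        float s = static_cast<float>(r[piv[i]]);
        for (std::size_t j = 0; j < i; ++j) s -= lu[i][j] * d[j];
        d[i] = s;
    }
    for (std::size_t i = n; i-- > 0;) {  // back substitution with U
        float s = d[i];
        for (std::size_t j = i + 1; j < n; ++j) s -= lu[i][j] * d[j];
        d[i] = s / lu[i][i];
    }
    return d;
}

// Factor once in float, then refine: residual and update in double.
std::vector<double> solve_mixed(const Matrix& a, const std::vector<double>& b,
                                int max_iters = 10, double tol = 1e-14) {
    std::size_t n = a.size();
    assert(b.size() == n);
    MatrixF lu;
    std::vector<std::size_t> piv;
    lu_factor_float(a, lu, piv);
    std::vector<double> x(n, 0.0);
    for (int it = 0; it < max_iters; ++it) {
        std::vector<double> r(n);
        double rnorm = 0.0;
        for (std::size_t i = 0; i < n; ++i) {  // r = b - A x in double
            double s = b[i];
            for (std::size_t j = 0; j < n; ++j) s -= a[i][j] * x[j];
            r[i] = s;
            rnorm = std::max(rnorm, std::fabs(s));
        }
        if (rnorm <= tol) break;
        std::vector<float> d = lu_solve_float(lu, piv, r);  // cheap correction
        for (std::size_t i = 0; i < n; ++i) x[i] += d[i];   // accumulate in double
    }
    return x;
}
```

For example, with A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]] and b = [6, 10, 8] (exact solution [1, 2, 3]), `solve_mixed` should reach an error near double-precision roundoff after a few refinement steps, although the matrix is only ever factored in single precision. Refinement of this kind converges only when the matrix is not too ill-conditioned relative to the low precision; LAPACK's `dsgesv`, for instance, falls back to a double-precision factorization when refinement fails.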

May
21
Thu
Strategies for Working Remotely: Making the Transition to Virtual Software Teams
May 21 @ 3:00 pm – 4:30 pm

In response to the COVID-19 pandemic and the need for many to make an unplanned transition to remote work, the IDEAS-ECP project has launched the panel series Strategies for Working Remotely, which explores important topics in this area. The next panel discussion in the series was titled “Making the Transition to Virtual Software Teams”.

Abstract: Scientific software teams are now working remotely and collaborating virtually in response to COVID-19 social distancing practices. In many cases, teams were co-located, and their transition unplanned. As working remotely has suddenly become a near-universal experience for staff members of research organizations, many software teams are now functioning as completely virtual teams: geographically dispersed and interacting only through electronic communication rather than in person. In the third installment of this IDEAS-ECP panel discussion series, we brought together several staff members of DOE laboratories, who spoke about experiences in recent transitions from co-located and partially distributed software teams to fully virtual software teams. Topics included challenges, lessons learned, unforeseen benefits, and opportunities to look for from this experience. Panelists made brief introductory comments followed by open discussion.

Panelists:

  • Jay Jay Billings, ORNL
  • Mark Gates, University of Tennessee
  • Mahantesh Halappanavar, PNNL
  • Angela Herring, LANL
  • Axel Huebl, LBNL

Moderators:

  • Ashley Barker, ORNL
  • Elaine Raybourn, SNL

May
27
Wed
ALCF/ECP UPC++ Webinar
May 27 @ 12:00 pm – 3:00 pm

UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications

UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The UPC++ API offers low-overhead one-sided RMA communication and Remote Procedure Calls (RPC), along with futures and promises. These constructs enable the programmer to express dependencies between asynchronous computations and data movement. UPC++ supports the implementation of simple, regular data structures as well as more elaborate distributed data structures where communication is fine-grained, irregular, or both. The library’s support for asynchrony enables the application to aggressively overlap and schedule communication and computation to reduce wait times.

UPC++ is highly portable and runs on platforms from laptops to supercomputers, with native implementations for HPC interconnects. As a C++ library, it interoperates smoothly with existing numerical libraries and on-node programming models (e.g., OpenMP, CUDA).

In this webinar, hosted by DOE’s Exascale Computing Project and the ALCF, we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through basic algorithm implementations. We will also look at irregular applications and show how they can take advantage of UPC++ features to optimize their performance.

This training requires registration.