As we traverse the second half of the 2018 calendar year, the Exascale Computing Project (ECP) continues to execute confidently toward our mission of accelerating the delivery of a capable exascale computing ecosystem* to coincide with the nation’s first exascale platforms in the early 2020s.
Our efforts in the project’s critical research areas of application development, software technology, and hardware and integration, supported by about 1,000 researchers, scientists, vendor participants, and project management experts, further intensify as we make significant strides in addressing the four major challenges of exascale computing: parallelism, memory and storage, reliability, and energy consumption.
These four challenges were identified in the Mission Need statement for the ECP in March 2016 and must be addressed to bridge the capability gap between existing HPC and exascale HPC. Drawing upon the original descriptions in ECP’s Mission Need document, let me expand on these challenges just a bit.
Parallelism: Exascale systems will have parallelism (also referred to as concurrency) a thousand-fold greater than petascale systems. Developing systems and applications software is already challenging at the petascale, and increasing concurrency by a factor of a thousand will make software development efforts even more difficult. To mitigate this complexity, a portion of the project’s R&D investments will be on tools that improve the programmability of exascale systems.
Memory and Storage: In today’s HPC systems, moving data from computer memory into the CPU consumes the greatest amount of time (compared with basic math operations). This data movement challenge is already an issue in petascale systems, and it will become a critical issue in exascale systems. R&D is required to develop memory and storage architectures to provide timely access to and storage of information at the anticipated computational rates.
Reliability: Exascale systems will contain significantly more components than today’s petascale systems. Achieving system-level reliability, especially with designs based on projected reductions in power, will require R&D to enable the systems to dynamically adapt to a possible constant stream of transient and permanent failures of components and the applications to remain resilient, in spite of system and device failures, in order to produce accurate results.
Energy Consumption: To state the obvious, the operating cost of an exascale system built on current technology would be prohibitive. Through pre-ECP programs like Fast Forward and Design Forward and current ECP elements like PathForward, engineering improvements identified with the vendor partners have the potential to reduce the power required significantly. Current estimates indicate initial exascale systems could operate in the range of 20–40 megawatts (MW). Achieving this efficiency level by the mid-2020s requires R&D beyond what the industry vendors had projected on their product roadmaps.
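To put the power range quoted above in terms of efficiency, here is a rough back-of-the-envelope sketch in Python. It assumes a system sustaining exactly one exaflop (10^18 floating-point operations per second) and divides that rate by the 20 and 40 MW figures. The function and variable names are illustrative only and do not come from any ECP document.

```python
EXAFLOP = 1e18  # floating-point operations per second in one exaflop


def gflops_per_watt(flops_per_second: float, megawatts: float) -> float:
    """Return energy efficiency in gigaflops per watt."""
    watts = megawatts * 1e6          # megawatts -> watts
    return flops_per_second / watts / 1e9  # flops/W -> gigaflops/W


# Assumption: the system sustains exactly one exaflop at each power level.
efficiency_at_20mw = gflops_per_watt(EXAFLOP, 20)
efficiency_at_40mw = gflops_per_watt(EXAFLOP, 40)

print(f"At 20 MW: {efficiency_at_20mw:.0f} gigaflops per watt")
print(f"At 40 MW: {efficiency_at_40mw:.0f} gigaflops per watt")
```

Under these assumptions, the 20 to 40 MW range corresponds to roughly 50 down to 25 gigaflops per watt.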
How ECP Breaks It All Down—to Bring it All Together
ECP is a large, complex, visible, and high-priority DOE project. Managing a project as complex as the ECP requires an extraordinary, diverse team of dedicated professionals working in close collaboration. We are fortunate to have recruited such an experienced and widely respected team, from the leadership level all the way through the depths of this organization. The ECP’s principal and co-principal investigators, Control Account Managers (CAMs), researchers, and scientists span a research expertise spectrum that covers mathematics, energy sciences, earth sciences, nuclear science disciplines, computational chemistry, additive manufacturing, precision medicine, cosmology, astrophysics, metagenomics, and the entire range of software tools and libraries necessary to bring a capable exascale ecosystem online.
This chart depicts the Work Breakdown Structure of the ECP showing the logical segmentation of ECP’s projects under our key focus areas.
As with any large project, coordination, collaboration and communications are essential to keep us all working in harmony, and at the heart of this infrastructure is the ECP Deputy Director.
A New Member of the ECP Leadership Team
I am pleased to announce the selection of the new ECP Deputy Director, who replaces Stephen Lee. Stephen has decided to retire after a stellar 31-year career at Los Alamos National Laboratory (LANL). Effective August 7, 2018, Lori Diachin from Lawrence Livermore National Laboratory (LLNL) has taken over as the ECP’s Deputy Director.
Lori has been serving as the Deputy Associate Director for Science and Technology in the Computation Directorate at LLNL since 2017. She has been at LLNL for 15 years and previously worked at Sandia National Laboratories and Argonne National Laboratory. She has held leadership roles in HPC for over 15 years, with experiences ranging from serving as the Director for the Center for Applied Scientific Computing at LLNL to leading multi-laboratory teams such as the FASTMath Institute in the DOE SciDAC program and serving as the Program Director for the HPC4Manufacturing and HPC4Materials programs for DOE’s Office of Energy Efficiency and Renewable Energy and Office of Fossil Energy.
We are thrilled to have Lori joining our team, but I’d also like to say a few words about Lori’s predecessor, Stephen Lee. Not only has Stephen had an amazing career at LANL, he has been a significant contributor to the growth of the ECP. Stephen was dedicated to this effort from day one and approached his role as a team leader, a hands-on contributor, a brilliant strategist, and a mentor to many of the team members. Stephen was the architect of the ECP’s Preliminary Design Report, a critical, foundational document that was key to solidifying the credibility and conviction among project reviewers that ECP was determined to succeed and was moving forward as a well-integrated machine. I believe I speak for all the ECP team members when I say Stephen Lee will be missed and we wish him well in retirement.
We are extremely fortunate to have Lori taking over this role at such a critical time for the ECP. Lori brings the experience and leadership skills to drive us forward, and on behalf of the entire team, we welcome Lori to this important project role and we look forward to her leadership and contributions as she assumes the role of ECP Deputy Director.
Recent Accomplishments and Project Highlights
Along with this exciting news of announcing our new ECP Deputy Director, I recently sat for a video interview with Mike Bernhardt, our ECP Communications Lead, to talk about some of our most recent accomplishments. During that conversation we discussed the newest ECP Co-Design Center, ExaLearn, which is focused on Machine Learning (ML) Technologies and is led by Frank Alexander at Brookhaven National Laboratory. The ExaLearn announcement is timely, and the collaboration initially consists of experts from eight multipurpose DOE labs.
We also covered the recently published ECP Software Technology Capability Assessment Report. This is an important document that will serve both our own ECP research community and the broader HPC community. Clicking on the Capability Assessment Report on the ECP public website will give our followers a good overview of the document, an explanation from our Software Technology Director, Mike Heroux, and a link for downloading the report.
Another item we discussed is a recent highlight on the ExaSMR project. SMR stands for small modular reactor. This is a project aimed at high-fidelity modeling of coupled neutronics and fluid dynamics to create virtual experimental datasets for SMRs under varying operational scenarios. This capability will help to validate fundamental design parameters including the turbulent mixing conditions necessary for natural circulation and steady-state critical heat flux margins between the moderator and fuel. It will also provide validation for low-order engineering simulations and reduce conservative operational margins, resulting in higher power uprates and longer fuel cycles. The ExaSMR product can be thought of as a virtual test reactor for advanced designs via experimental-quality simulations of reactor behavior. In addition to the highlight document, ECP’s Scott Gibson sat down with the ExaSMR principal investigator, Steven Hamilton (ORNL), to discuss this highlight in more detail.
We wrapped up by chatting about the key role performance measurement plays for a project such as ECP, and we addressed ECP’s efforts in support of software deployment as it relates to the Hardware and Integration focus of ECP.
We hope you enjoy this video update and we encourage you to send us your thoughts on our newsletter and ECP Communications overall, as well as ideas on topics you’d like to see covered in the future.
We’re excited to see such strong momentum, and we sincerely appreciate the support of our sponsors, collaborators, and followers throughout the HPC community.
I look forward to meeting many of you at upcoming events during the second half of this year.
*The exascale ecosystem encompasses exascale computing systems, high-end data capabilities, efficient software at scale, libraries, tools, and other capabilities. This information is stated in the US Department of Energy document Crosscut Report, an Office of Science review sponsored by Advanced Scientific Computing Research, Basic Energy Sciences, Biological and Environmental Research, Fusion Energy Sciences, High Energy Physics, Nuclear Physics, March 9–10, 2017.
Argonne National Laboratory
The High-Tech Evolution of Scientific Computing
Realizing the promise of exascale computing, the Argonne Leadership Computing Facility is developing the framework by which to harness this immense computing power to an advanced combination of simulation, data analysis, and machine learning. This effort will undoubtedly reframe the way science is conducted, and do so on a global scale.
Lawrence Berkeley National Laboratory
Educating for Exascale: Berkeley Lab Hosts Summer School for Next Generation of Computational Chemists
Some 25 graduate and post-graduate students recently spent four intense days preparing for the next generation of parallel supercomputers and exascale at the Parallel Computing in Molecular Sciences (ParCompMolSci) Summer School and Workshop hosted by Berkeley Lab.
Held August 6–9 at the Brower Center in downtown Berkeley, the event aimed to “prepare the next generation of computational molecular scientists to use new parallel hardware platforms, such as the [US Department of Energy’s (DOE’s)] exascale computer arriving in 2021,” said Berkeley Lab Senior Scientist Bert de Jong, an organizer of the summer school and one of the scientists behind the DOE Exascale Computing Project’s NWChemEx effort. NWChemEx belongs to the less-talked-about but equally necessary half of building exascale systems: software.
Video Highlight: SLATE Project Aims to Provide Basic Dense Matrix Operations for Exascale
The objective of the Software for Linear Algebra Targeting Exascale (SLATE) project is to provide basic dense matrix operations in support of ECP’s efforts to build a capable exascale computing ecosystem. Jakub Kurzak of the University of Tennessee and SLATE shares insights about the project.
Podcast Episode 21: The Flexible Computational Science Infrastructure (FleCSI) Project: Supporting Multiphysics Application Development
The Flexible Computational Science Infrastructure (FleCSI) project provides a framework to support multiphysics application development. The leader of FleCSI, Ben Bergen of Los Alamos National Laboratory, is a guest on Let’s Talk Exascale.
Kathy Yelick Testifies on ‘Big Data Challenges and Advanced Computing Solutions’
Kathy Yelick, Associate Laboratory Director for Computing Sciences at Berkeley Lab and Principal Investigator for ECP’s ExaBiome project, was one of four witnesses testifying before the U.S. House of Representatives’ Committee on Science, Space and Technology on July 12. The discussion focused on big-data challenges and advanced computing solutions.
Scaling the Unknown: The CEED Co-Design Center
DEIXIS: Computational Science at the National Laboratories interviews Tzanio Kolev, director of the Center for Efficient Exascale Discretizations (CEED), one of five co-design centers in ECP, about the quest for exascale and the role of CEED in the effort.
Podcast Episode 20: Influencing the Evolution of the MPI Standard for Optimal Exascale Scientific Applications
The Exascale MPI project aims to influence the evolution of the Message Passing Interface (MPI), the de facto programming model for parallel computing, so that optimal exascale scientific applications can be developed. Project Principal Investigator Pavan Balaji and Co-Principal Investigator Ken Raffenetti of Argonne National Laboratory share insights in the Let’s Talk Exascale podcast.
New Simulations Break Down Potential Impact of a Major Quake by Building Location and Size
With unprecedented resolution, scientists and engineers are simulating precisely how a large-magnitude earthquake along the Hayward Fault would affect different locations and buildings across the San Francisco Bay Area.
US Once Again Leads the TOP500 Supercomputer List with ORNL’s Summit
The US Department of Energy’s Oak Ridge National Laboratory is home to the new Summit supercomputer. Summit took the number one spot on the TOP500 list with a performance of 122.3 petaflops on High Performance Linpack (HPL). Summit also took first place, and DOE’s other new system, Sierra at Lawrence Livermore National Laboratory, took second, in the High-Performance Conjugate Gradient (HPCG) Benchmark results, which provide an alternative metric for assessing supercomputer performance and are meant to complement the HPL measurement. Summit achieved 2.93 HPCG-petaflops and Sierra delivered 1.80 HPCG-petaflops.
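For readers who want to compare the two benchmarks, the short Python sketch below works only from the figures reported above. It computes Summit's HPCG result as a fraction of its HPL result, and Sierra's HPCG result relative to Summit's. The function and variable names are illustrative and are not part of either benchmark.

```python
def ratio(numerator_petaflops: float, denominator_petaflops: float) -> float:
    """Return one benchmark result as a fraction of another."""
    return numerator_petaflops / denominator_petaflops


# Figures as reported in the text above (petaflops).
SUMMIT_HPL = 122.3
SUMMIT_HPCG = 2.93
SIERRA_HPCG = 1.80

summit_hpcg_vs_hpl = ratio(SUMMIT_HPCG, SUMMIT_HPL)
sierra_vs_summit_hpcg = ratio(SIERRA_HPCG, SUMMIT_HPCG)

print(f"Summit HPCG as a share of its HPL result: {summit_hpcg_vs_hpl:.1%}")
print(f"Sierra HPCG relative to Summit HPCG: {sierra_vs_summit_hpcg:.1%}")
```

On these numbers, Summit's HPCG score is only a few percent of its HPL score, which illustrates why the two measurements are treated as complementary.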
GPU Hackathon 2018
The Computational Science Initiative at Brookhaven National Laboratory (BNL) will host its 2nd GPU Hackathon, dubbed “Brookathon 2018,” on September 17–21 at BNL.