Distinguished Argonne Fellow
ECP ANNOUNCEMENT MARKS AN IMPORTANT MILESTONE
Community interest in the Exascale Computing Project has grown significantly over the past few months coincident with our leadership team’s steady engagement with hundreds of government, academic, and private sector researchers throughout the nation to build the foundation of an exascale ecosystem.
The project is well underway with significant efforts in the development of exascale-ready applications and a robust exascale software stack, as well as critical integration efforts involving the ECP-funded co-design centers.
With this newsletter, I am pleased to announce we have now reached another important milestone with the funding of key hardware technology research foundational to our exascale imperative.
On June 15, Secretary of Energy Rick Perry announced research funding awards to six leading US technology companies as part of the ECP PathForward program. These awards mark an important milestone for the Exascale Computing Project and for the nation’s exascale imperative overall. We are making six awards totaling $258 million.
The funding is allocated over a 3-year contract period, and the companies receiving the awards will provide additional funding amounting to at least 40 percent of their total project cost. This brings the total investment for the PathForward program to at least $430 million.
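For readers who want to check the figures, the arithmetic behind the $430 million total can be sketched as follows. This is an illustrative calculation only: the function name is hypothetical, and it assumes the $258 million in awards covers at most 60 percent of total project cost, since the companies contribute at least 40 percent.

```python
# Illustrative arithmetic only; the two input figures come from the announcement.
DOE_FUNDING_MILLIONS = 258   # six awards, in millions of US dollars
MIN_VENDOR_SHARE = 0.40      # companies cover at least 40% of total project cost


def minimum_total_investment(doe_funding: float, vendor_share: float) -> float:
    """Return the smallest total cost consistent with the vendor share.

    Hypothetical helper: if vendors pay `vendor_share` of the total,
    the awards pay the remaining (1 - vendor_share).
    """
    return doe_funding / (1.0 - vendor_share)


total = minimum_total_investment(DOE_FUNDING_MILLIONS, MIN_VENDOR_SHARE)
vendor_contribution = total - DOE_FUNDING_MILLIONS

print(f"Total investment: at least ${total:.0f} million")
print(f"Company contribution: at least ${vendor_contribution:.0f} million")
```

Under these assumptions the companies' share works out to roughly $172 million on top of the $258 million in awards.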
PathForward is critical to moving hardware technology forward at an accelerated pace—beyond what the vendor or computer manufacturer roadmaps currently have scheduled.
The ECP employs a co-design process critical to the project’s success, bringing together a wide range of sources to address the four key challenges of exascale:
- Parallelism
- Memory and storage
- Reliability
- Energy consumption
The work funded under the PathForward program is strategically aligned to address those key challenges through:
- development of innovative memory architectures
- higher-speed interconnects
- improved reliability systems
- approaches for increasing computing power without prohibitive increases in energy demand
It is essential that private industry play a role in this work. The PathForward awards provide the framework for such research and enable industry participation in co-design activities with the ECP application development teams, the co-design centers, and the software technology projects.
The following US technology companies, listed in alphabetical order, are the PathForward award recipients:
- Advanced Micro Devices (AMD)
- Cray Inc. (CRAY)
- Hewlett Packard Enterprise (HPE)
- International Business Machines (IBM)
- Intel Corp. (Intel)
- NVIDIA Corp. (NVIDIA)
Lawrence Livermore National Laboratory issued the PathForward request for proposals June 16, 2016, and 14 vendors submitted responses on July 18, 2016. In the following months, many members of the ECP team, working through a formal selection committee and with LLNL procurement staff, evaluated the proposals and selected six of the responses for funding. I take this opportunity to thank everyone who took part in these demanding tasks for their hard work and high-quality contributions to the successful outcome of this part of the Exascale Computing Project.
We will update you on progress in all ECP-funded research areas as we move ahead.
The PathForward news release can be found here.
We hope you will continue to follow the progress of the Exascale Computing Project.
Director, the Exascale Computing Project
Making It All Work: Developing a Software Stack for Exascale
(This is an introductory-level article targeted for readers somewhat new to HPC and exascale.)
Without the invisible infrastructure called the software stack, even the world’s fastest computer wouldn’t compute much of anything. Sitting between the hardware and the applications that users interact with, the software stack is the computer’s plumbing, power grid, and communications network combined. It makes the whole system usable. “Without it,” said Rajeev Thakur, “you literally have nothing.”
Thakur is the director of the Software Technology focus area for the Exascale Computing Project (ECP), an effort by two US Department of Energy organizations to develop computing systems that are at least 50 times faster than the most powerful supercomputers in use today. Thakur is overseeing the creation of the software stack that will undergird the wide range of applications that will run on the new systems. It is a monumental task. Just 5 months into the effort, it already encompasses several dozen projects involving hundreds of researchers at research and academic organizations throughout the country.
The brain of a desktop computer is a single CPU; the exascale system will have more than a million CPUs. To run on the exascale system, applications must be written in parallel; that is, problems must be divided into millions of pieces that are solved separately. Then the resulting data must be exchanged among the million processor cores to get the final result. The software stack enables developers to write or adapt their applications to run efficiently on those millions of cores and orchestrates communication among them, while minimizing energy consumption and ensuring that the system can recover from any failures that occur. And it provides the means for application developers to write and read their data, to analyze the data that is produced, and to visualize the results.
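For readers new to parallel programming, the divide, solve, and combine pattern described above can be sketched in a few lines of Python. This is a toy illustration, not ECP software: the function names are hypothetical, a handful of threads stand in for processor cores, and production HPC codes typically rely on message passing (for example, MPI) across many nodes rather than a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Divide a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The work done on one piece of the problem."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, n_workers=4):
    """Solve each piece separately, then combine the partial results."""
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    # The "exchange" step: gather every worker's answer into one result.
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1000)), n_workers=4))
```

The final gathering step is trivial here; at exascale, exchanging data among a million cores is one of the hardest parts of the problem.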
The ECP software technology effort is developing all of this, some of it from scratch and some by modifying and enhancing existing software, such as large libraries of mathematical computations. All of it must be usable by any application, from a program that helps design a nuclear reactor to one that models the life cycle of a star.
The software stack developers work closely with the applications developers, trying to accommodate their sometimes disparate requirements. Applications aimed at machine learning, for example, are extremely data intensive. Others might use an atypical programming language or a distinctive format for their data. “That’s one of the challenges,” Thakur said. “We have dozens of applications right now. They use different languages and have different ways of expressing their problem in a way that can run across millions of processors.” The underpinning software must be designed so that as many of the applications as possible work optimally on the new system.
Speed is a paramount concern. “The problems are not so simple that you can just divide them among the million processors and they will merrily do their own thing,” Thakur said. “And programs tend to run slower unless you really pay attention to what’s going on.” The processors need to ask one another for information. And as they do, other processors may be doing the supercomputing equivalent of “twiddling their thumbs.” One processor waiting 10 seconds for an answer being computed by another sets up a chain reaction that might slow the whole computation by an hour—not exactly high-performance computing. So devising a software version of an orchestra conductor to manage communication among the processors efficiently is a crucial part of the software stack project.
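A simple model shows how a short wait can add up to an hour. The sketch below assumes a computation that proceeds in synchronized steps, so each step lasts as long as its slowest processor. The step count, worker count, and timings are made-up numbers chosen for illustration, and the function name is hypothetical.

```python
def synchronized_runtime(step_times):
    """Total runtime when all workers synchronize after every step.

    step_times[s][w] is the number of seconds worker w needs in step s.
    Because everyone waits for the slowest worker, each step costs
    max(times) rather than the average.
    """
    return sum(max(times) for times in step_times)


# Hypothetical workload: 360 steps on 4 workers, 1 second of work each.
N_WORKERS = 4
N_STEPS = 360
balanced = [[1.0] * N_WORKERS for _ in range(N_STEPS)]

# Same workload, but in every step one worker is held up for 10 extra seconds.
delayed = [[1.0] * (N_WORKERS - 1) + [11.0] for _ in range(N_STEPS)]

extra_seconds = synchronized_runtime(delayed) - synchronized_runtime(balanced)
print(f"Extra time from waiting: {extra_seconds / 3600:.1f} hour(s)")
```

In this model the three idle workers do no extra work, yet the whole computation still runs an hour longer. Reducing that idle time is the kind of coordination problem the communication layer of the software stack has to address.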
Managing the jobs of concurrent users is also a priority. As with any supercomputer, time on the exascale system will be shared by many users, some of them wanting to run their applications for days or weeks at a time. They will have to partition their problems into chunks of a few hours each, to run simultaneously with jobs of other users. Job scheduling software that comes built into the system allocates groups of processors and chunks of time. But it, too, will have to be adapted for the massively larger scale. So, too, will tools that diagnose and repair problems.
“All these things have to be sorted out,” Thakur said. “The scale makes it complicated. And we don’t have a system that large to test things on right now.” Indeed, no such system exists yet, the hardware is changing, and a final vendor or possibly multiple vendors to build the first exascale systems have not yet been selected.
“The computer vendors share their roadmaps so we know their plans for the future,” Thakur said, “but we have to do things in anticipation of what may be coming. So right now we are writing software that can be used for any of these potential systems. It gets very challenging.”
But if all goes according to plan, by 2021 exascale computing should be a reality. The plan is for these superfast systems to run applications addressing critical research areas such as oil and gas exploration, aerospace engineering, pharmaceutical design, and basic science, among others. And underneath, making it all work, will be a vast software infrastructure: the software stack.
PathForward Supports the Pursuit of Designs to Meet ECP Challenges
With the June 15 announcement of the PathForward award recipients, the Hardware Technology focus area joins the previously announced Application Development and Software Technology focus area projects as the third leg of the Exascale Computing Project’s (ECP’s) research and development (R&D) investment. The vendor-based PathForward projects provide a conduit for co-design collaborations with the Application Development and Software Technology focus area projects.
The portfolio of PathForward projects provides the ECP with new opportunities to:
- leverage vendor momentum in architecture R&D from the US Department of Energy’s (DOE’s) previous pre-ECP vendor investments in FastForward (for node-level designs) and in DesignForward (for interconnect and system-level designs)
- support holistic co-design through the ECP’s broad portfolio of Application Development and Software Technology projects
- provide formal support for technical reviews of PathForward deliverables based on lessons learned from the FastForward and DesignForward experiences
- take advantage of PathForward’s bridging of the gap between open-ended architecture R&D and advanced product development focused on the delivery of first-of-a-kind, capable exascale systems
- support interagency alignment and coordination to further the objective of the National Strategic Computing Initiative to increase the coherence of the technology base for both high-performance and data analytic computing
The ECP would like to acknowledge the efforts of more than 60 technical staff and managers from six national laboratories—Argonne, Los Alamos, Lawrence Berkeley, Lawrence Livermore, Oak Ridge, and Sandia—that culminated in these PathForward awards. Lawrence Livermore National Laboratory (LLNL) deserves special recognition for stepping up to lead the PathForward procurement process.
In early 2016, a team from the six labs began to draft the technical specifications to represent architectural solutions to the ECP’s technical challenges, although the Application Development and Software Technology focus area projects had not yet been selected. The PathForward procurement process was on a fast track and could not wait for the ECP’s portfolio in these other focus areas to be determined. The last 18 months consisted of a rapid sequence of events:
- March 2016—release of the PathForward draft technical specifications
- April 2016—PathForward vendor information meeting
- June 2016—release of the PathForward request for proposals
- August 2016—technical evaluation of PathForward proposals
- August–October 2016—buying team prioritization
- November–December 2016—statement-of-work negotiations with PathForward vendors
- January–May 2017—submission of contracts for LLNL and DOE reviews
The announcement of the PathForward funding can be found here.
Be sure to subscribe via the ECP website to receive our email updates.
Collaboration Software, Service Desk, Internal Communications Hub Online
The ECP Project Office ramps up to support the growing ECP research community
The Exascale Computing Project (ECP) Office provides overall project management and business services—such as financial, information technology (IT), and procurement services—for the hundreds of researchers and staff working on ECP projects. Located at national laboratories and universities throughout the country, team members must collaborate on complex projects with aggressive deadlines. Last November, the Project Office began providing team members accounts for Atlassian’s JIRA, a project tracking software application, and Confluence, a team collaboration software package.
ECP collaborators have since created more than 800 user accounts, and team members are using the complementary software tools for day-to-day project tasks and monthly reporting. In Confluence, team members can also review ECP-related news, events, blog postings, and useful links at the Internal Communications Hub.
Recently, the Project Office IT team introduced the ECP Service Desk to provide a main point of contact for ECP-related questions and issues. Requests such as opening a new account or reporting a system problem can be submitted through the ECP Support portal, available on the internal Service Desk site. The ECP Foreign Travel Request system is also accessible through the Service Desk. However, team members should note that they must also request travel through their home institution’s travel system for official approval.
Further, the Project Office is providing team members a video chat subscription through HipChat and the document preparation software LaTeX. For more information about any of these resources, team members can submit a request at the Support portal, email the IT team at firstname.lastname@example.org, or call 802-ECP-DESK.
Lab Partner Updates
Be Sure to Visit Our Partner Lab Websites
Remember that two of the ECP partner labs, Lawrence Berkeley National Laboratory and Los Alamos National Laboratory, have launched new exascale landing pages.
Berkeley Lab created a website highlighting the lab’s participation in the Exascale Computing Project. The site also describes other Berkeley Lab projects helping to prepare the DOE research community for computing in the exascale era.
LANL has a dedicated website to spotlight the lab’s work in many areas of exascale research, including its role as home of the ECP Co-Design Center for Particle-Based Methods: From Quantum to Classical, Molecular to Cosmological.
The other ECP core partner national laboratories—Argonne, Lawrence Livermore, Oak Ridge, and Sandia—are working to bring their exascale landing pages online.
Featured lab update
We’re pleased to spotlight a discussion with Jeff Nichols, associate laboratory director of Computing and Computational Sciences at Oak Ridge National Laboratory (ORNL) and a member of the ECP’s Lab Operations Task Force. Jeff provides an overview of ORNL’s role in collaboration with the ECP.
In Case You Missed It
SPECIAL TRAINING UPDATE
INTERMEDIATE GIT WEBINAR
The ECP IDEAS Productivity project, in partnership with several DOE Computing Facilities, is resuming the webinar series on Best Practices for HPC Software Developers, which we began last year. The next webinar, on July 12, covers intermediate Git. For more information or to register, please visit: https://www.exascaleproject.org/event/intermediate-git/.
On behalf of the ECP OpenMP project, you are invited to participate in an OpenMP tutorial on Wednesday, June 28, at 1:00 PM EDT.
NEWS WORTH NOTING
Department of Energy Awards Six Research Contracts Totaling $258 Million
U.S. Secretary of Energy Rick Perry announced that six leading U.S. technology companies will receive funding from the Department of Energy’s Exascale Computing Project (ECP) as part of its PathForward program, accelerating research necessary to deploy the nation’s first exascale supercomputers.
Read about PathForward.
An Exascale Software Discussion with HPC Veteran Dongarra
Jack Dongarra is perhaps best known for his development of the LINPACK benchmark application, which is used to evaluate high-performance computing (HPC) performance and rank supercomputers in the international Top500 list. Dongarra is also principal investigator on three of the 35 software development proposals funded for the first year of ECP.
Application Director Kothe Outlines Goals and Challenges
HPCwire talked to Doug Kothe, director of the ECP Application Development focus area, who discussed the goals of application development, including the integration of data analytics and modeling and simulation.
Messina Presents on PathForward at the HPC User Forum
HPCwire covered ECP Director Paul Messina’s presentation at the HPC User Forum, hosted by Hyperion Research in April. Messina discussed the status of PathForward contracts and other ECP updates.
Presentation on Small Modular Reactor Simulations at Exascale
At the HPC User Forum in April, Tom Evans of Oak Ridge National Laboratory presented “Coupled Monte Carlo Neutronics and Fluid Flow Simulation of Small Modular Reactors (ExaSMR),” covering application development for simulating these advanced nuclear reactors.