Oak Ridge Computing Director Nichols on What Will Matter at Exascale

Featured ECP Lab Partner Update

Oak Ridge National Laboratory (ORNL) in Tennessee is the largest US Department of Energy (DOE) science and energy lab and has been home to some of the world’s fastest supercomputers for over a generation. Currently, the 27-petaflop Titan system at ORNL is the nation’s most powerful supercomputer for open science research, and in 2018, the lab will stand up Summit, a system that will be at least five times more powerful than Titan.

“For years, our vision has been to provide the world’s most powerful open resources for computing, simulation, and data analytics at any scale,” said Jeff Nichols, associate laboratory director (ALD) of Computing and Computational Sciences.

And that means at exascale too.

One of the six DOE partner labs on the Exascale Computing Project (ECP), ORNL is also home to the ECP Project Office, where team members draw on years of experience coordinating complex, multi-institutional computing projects. The office leads overall project management; integration, business, project control and risk, information technology, and quality management; procurement; and communications.

“If you look at the focus areas of ECP—applications, systems, software, and hardware developments—we already have activities in all of these areas at Oak Ridge,” Nichols said.

Nichols is a member of ECP’s Lab Operations Task Force, which includes computing directors from each of the partner labs. By overseeing the computational workforce and computing resources at each of the labs, the task force supports the ECP Board of Directors, which is composed of the DOE lab directors.

To develop an exascale system, ECP will need to enlist existing resources and expertise. “We help deliver the resources and capabilities at our labs to support ECP,” Nichols said.

Hundreds of ECP members are staff from the DOE labs who work in computational and computer science and engineering.

“Each lab brings its particular strength,” Nichols said, citing ORNL’s leadership in scientific applications and accelerated computing.

ORNL’s Deputy ALD for Computing and Computational Sciences, Doug Kothe, is overseeing the development of 20 to 30 exascale application projects in his role as Application Development focus area director. The exascale-ready applications will range from industry- and infrastructure-oriented work such as advanced manufacturing and power grid planning to fundamental science in fields such as the chemical and materials sciences.

“The way we do science today has fundamentally changed over the past couple of decades,” Nichols said. “Twenty years ago, computers became fast enough that computation became predictive. Computational science could validate experiments and theories, could show people what to go look for in the lab. Five to 10 years ago, analytics came on the scene, allowing us to tackle data in a more cognitive way. Analytics is where you’re going to see a huge impact at exascale.”

In 2009, ORNL leadership decided to pursue a supercomputing architecture that combines CPUs with graphics processing units (GPUs), so that computational tasks can be delegated efficiently and complex problems solved faster. Titan was the first system of its magnitude to use GPUs, and Summit will likewise exploit a hybrid architecture.

“We’re on a well-defined path to be able to deploy exascale in the 2021 timeframe,” Nichols said. “At Oak Ridge, we’ve focused on accelerated architectures, which originally targeted efficiency by delivering more flops per watt, but accelerators like GPUs are also working well for analytics like A.I. [artificial intelligence] and machine learning. You want an exascale computer to be powerful enough in both compute- and data-intensive work to tackle the grand challenge problems.”