A Conversation with one of ECP’s Principal Investigators—Thom Dunning
Thom Dunning is a research professor in the Department of Chemistry at the University of Washington (UW) and a Battelle Fellow at the Northwest Institute for Advanced Computing, a joint endeavor of UW and the Pacific Northwest National Laboratory (PNNL). He spoke with Mike Bernhardt of ECP Communications at the ECP Second Annual Meeting in Knoxville, Tennessee, in February 2018. This is an edited transcript of that conversation.
Thom, your research with ECP is focused on open-source high-performance computational chemistry and, specifically, NWChem and the design and implementation of NWChemEx, the exascale-ready version of the software. Will you share with our listeners who might not be familiar with this package what it’s all about, who uses it, and why it’s so important?
Most people don’t really think about or realize that many of the phenomena that they see are actually controlled by molecular processes. This is also true for many of the energy challenges that the US Department of Energy [DOE] faces. Addressing these challenges requires an understanding of the underlying molecular processes. A simple example of such a challenge is the performance of automobile engines. The combustion of fuels produces the energy, and combustion is controlled by molecular processes. Northwest Chem can be used to predict the structures, energetics, and a number of other properties of the molecules that are important in understanding the combustion of fuel in an automobile engine.
But Northwest Chem can be used to study many other molecular systems also. Methane hydrates, which store methane at very high pressure under the sea floor, are simply big molecules. The boron equivalents of the Nobel Prize–winning carbon fullerenes are also molecular entities. Transport of nuclides in the soil is controlled by molecular processes. Northwest Chem can be used to study all of these challenges as well as aid the interpretation of data that is produced by many of DOE’s big user facilities.
Northwest Chem implements a broad range of molecular modeling methods, but most important in all of this is the fact that it implements very high-fidelity computational models that can predict a broad range of important molecular properties with an accuracy that rivals that obtained from experiment. In fact, our accuracy sometimes exceeds that available from experiment. This capability is extremely important when the phenomena of interest, that is, the molecular processes of interest, are either difficult or even impossible to study in the laboratory. These are some of the reasons that make Northwest Chem an integral part of the research and development programs in the US Department of Energy.
Let’s clear up something for our listeners. We always refer to it as NWChem, and you refer to it as Northwest Chem. I’m assuming that’s what you prefer?
Either works. I think half of the people refer to it as NWChem; the other half refer to it as Northwest Chem.
Why is it important that we prepare Northwest Chem for exascale?
Although Northwest Chem can model many molecular systems, there are many very large complex systems that lie at the heart of the challenges that the Department of Energy faces. Examples of these are the scientific challenges that we identified as driving problems in the Northwest ChemEx project. Number one is the molecular processes that control the response of plants to stress, especially to drought. Our lack of this type of understanding prevents us from designing plants for the production of biomass that can be grown on lands that are unsuitable for the production of food. Number two is the molecular processes that control the production of biofuels from biomass. Once you produce biomass-derived chemicals, efficient catalytic processes are needed to produce the biofuels. In this case, our lack of understanding of the molecular aspects of these catalytic processes prevents us from developing more energy-efficient processes for the conversion of biomass products into biofuels.
Both of these problems require an ability to model active sites, which is where all of the chemistry is taking place, on the order of 1,000 atoms. And we need to be able to do this at very high accuracy, while embedding the active site in a much larger environment of 100,000 or more atoms. We simply can’t do that with the computers that we have today. But tackling problems like that is not just a matter of having larger computers. The software must be designed to take advantage of these much more powerful computers, especially as we move to the exascale. To that end, we are redesigning Northwest Chem to exploit the characteristics that define exascale computers: extreme levels of concurrency and very deep memory hierarchies, as well as many other features present in exascale computers. Although Northwest Chem runs on the petascale computers of the day, it really has to be redesigned and reimplemented for the exascale computers of tomorrow.
This sounds like a pretty big task. Do you have many collaborators on this project?
You’re correct, Mike. Northwest Chem is a very large molecular modeling package. It’s about 4 million lines of code, which by current scientific standards is a large scientific code. So to tackle this task of redesigning and re-implementing Northwest Chem, we assembled a team of computational chemists, computer scientists, and applied mathematicians from six national laboratories and one university—Ames Laboratory, Argonne National Laboratory, Brookhaven National Laboratory, Lawrence Berkeley National Laboratory, Oak Ridge National Laboratory, Pacific Northwest National Laboratory, and, then finally, Virginia Tech.
We do have one ace up our sleeve in this effort. We’re currently developing a domain-specific compiler that will actually generate a lot of the code that goes into Northwest ChemEx. We call this compiler TAMM, for tensor algebra for many-body methods, and it actually builds on our earlier work on the tensor contraction engine, which was referred to as TCE. We developed TCE in the early 2000s to help us write the code involved in implementing the extremely complicated formulas for some of the high-level many-body methods that are implemented in Northwest Chem. We simply could not do this by hand; we would make too many mistakes and we would spend too much time debugging the code. So, we developed software to do this for us. Not only does TCE write the code, but it also actually enables us to retarget Northwest Chem for different computer architectures because TCE also knows about computers and how the code can be laid out on different kinds of architectures. TAMM will be even more effective and more efficient than TCE. It will also be much smarter.
Have you and your team been using any of the computer time made possible through the ECP system allocations?
We make heavy use of the available computer time. To date, our work has focused on the redesign of Northwest Chem, but we’ve also explored a number of alternate strategies for implementing the overall redesign as well as the redesign of the algorithms, and this work required access to the ECP computing allocations. We’ve also focused on development of an initial implementation of TAMM, the compiler that I mentioned earlier. The results of this work are very encouraging. We’re already seeing a significant improvement over TCE. To date, we’ve gotten over a 3x performance improvement. During the rest of the year, we’ll be making heavy use of the computer allocations in ECP because we’ll be focusing on implementation of Northwest ChemEx. As this work proceeds, we’ll not only be doing a series of calculations to understand the performance of the new code, but we’ll also be benchmarking Northwest ChemEx against Northwest Chem. This is something that’s going to become very challenging as the size of the molecules that Northwest ChemEx can tackle becomes much, much larger than what is feasible with Northwest Chem.
It sounds like you and your team are moving ahead pretty quickly, overcoming some anticipated barriers, and you’ve already mentioned some significant milestones. Other than the milestones, and not to downplay those, how do you measure your progress?
Measuring progress in a project like this is always somewhat of a challenge. I gave you one example of that where we developed a TAMM compiler that we could compare with the TCE compiler and measure the performance increase. Most of our challenges actually lie in the future. Now that we have the design effort completed and a wealth of ideas on how to implement the new code, much of this next year we will be focusing on the development of the code. And there’s where the rubber meets the road. We’re going to find answers to questions like: Have we effectively removed the bottlenecks to scalability with the redesign of Northwest Chem? Are the new algorithms performing well on Oak Ridge’s Summit, Argonne’s Theta, and NERSC’s [National Energy Research Scientific Computing Center’s] Cori? How well does the code generated by TAMM perform on these new computer systems? What’s the level of performance portability? There is a whole series of questions that we’ll begin to answer during this coming year.
That is fantastic. Is there anything that you’d like our listeners to know about the Northwest Chem effort that may be unclear to them when they hear about this project?
Northwest Chem is widely used in the computational chemistry community. There have been over 70,000 downloads of Northwest Chem over the past decade or so, and it’s been referenced in some 3,000 articles. So the scientists involved have a good understanding of exactly what Northwest Chem does. Now our challenge is reproducing that capability in Northwest ChemEx.
There is something that’s not specific to our project that I think is worth stressing at this point, and that is the importance of the investments in software, both application software and computing systems software, that are being made in the ECP along with the investments that are being made in the hardware. This is a hallmark of ECP that distinguishes it from almost all of the other major high-performance computing activities around the world. Many other countries have realized, to their chagrin, that just building a big computer doesn’t have value if you don’t have software that runs on that computer and does useful work. ECP is religiously avoiding falling into that trap and is making substantial investments in the software as well as in the hardware.