By Scott Gibson
Hi, and welcome. This is where we explore the efforts of the Department of Energy’s (DOE’s) Exascale Computing Project (ECP)—from the development challenges and achievements to the ultimate expected impact of exascale computing on society.
An ECP subproject called QMCPACK aims to find, predict, and control materials from first principles with predictive accuracy.
In this episode, we’ll get the big picture of QMCPACK from its principal investigator, Paul Kent, a distinguished research and development staff member at Oak Ridge National Laboratory.
Our topics: the broad context for QMCPACK, the essential need for computer simulations in materials research, the meaning and importance of making predictions about materials from first principles, and more.
Gibson: We want to start with the broad context or scientific basis for the QMCPACK subproject of ECP. Paul, what is a straightforward definition—I’ll say layperson definition—of quantum mechanics, and why is it being applied to materials?
Kent: So, one of the goals of this project that we’re really excited about is providing the capability for really accurate and reliable predictions of the fundamental properties of materials. The sorts of properties we’re interested in are things like: What is the structure of the material? How easy is it to form? Will it decompose? Does it conduct electricity? And things like, can we use it for energy storage? And lots of other properties.
And the thing is, to compute those properties, we need to work at the level of atoms and electrons. The electrons are particularly important for us since they form the bonds that give rise to the material structure and to the majority of the properties that we’re interested in. It’s really very simple at that point—to work at that scale and treat the behavior of electrons, we have to do quantum mechanics. So, quantum mechanics provides the rules of the road that apply at that scale and, importantly, to the behavior of electrons.
I should probably mention that besides the methods we’re using, applying quantum mechanics to materials is actually very established and proven to be very useful. So, despite all the angst about how to interpret quantum mechanics—for our purposes, it’s really a practical and useful method. For example, we could look at a battery electrode or a catalyst and learn something about the fundamentals of how they’re working and also get some insight into how they might be improved. So, these are methods that can really help us improve a lot of technology. For example, I’m looking at my laptop currently and maybe want a better display or a longer-lasting battery, or just a lighter laptop. And these are the things we can certainly help with. But, of course, there are bigger societal problems, things like long duration and reliable energy storage, how we generate energy, and all of these have got some element of materials or chemistry to them.
Gibson: Why are computer simulations essential in designing, optimizing, and understanding the properties of materials that could improve our use of energy?
Kent: Well, very simply, we are increasingly having the ability to get to properties that are difficult to get to from experiment. So, we can act as a complement in many cases. And, of course, a good day for a materials theorist is identifying a wholly new material or property that’s been overlooked by our experimental colleagues and doing so ahead of them. So, we’re very much working in concert with our experimental colleagues, but we want to really be able to guide the work and speed it up, make it much more efficient.
Gibson: Could you help the science-interested person understand what’s meant by “using robust quantum mechanics–based methods to describe materials truly from first principles”?
Kent: When it comes to making predictions for materials, there’s an obvious problem if your method needs some input from an experiment because then you’re completely stuck if you’re trying to predict a new material since you just don’t have the data that you need. Of course, more realistically, there’s also a problem there if the properties you’re interested in are very costly or time consuming to get. And the issue that we run into is that for the accuracies and the range of properties that we’re interested in today, a lot of the standard methods need that input.
So, doing quantum mechanics and first principles means doing so with a minimum of input from experiments, so a minimum of empiricism, and ideally, none beyond the specific composition of the material. And then such a scheme would, in principle, be both much more reliable but also unbiased. So, we can much more confidently explain how a system is working. For example, this would be particularly helpful if there are multiple competing explanations for what is going on in a particular material.
Gibson: What is special about the Monte Carlo family of approaches?
Kent: This project is dedicated to quantum Monte Carlo (QMC) approaches, and I should say a little bit about the equations we’re solving and what normally needs to be done. So, when it comes to solving Schrödinger’s equation, which is the equation that we need to solve to do quantum mechanics, all the established approaches trade off making approximations—sometimes very large ones—for reduced computational cost.
Of course, we’re all battling the same equations, and where QMC comes in is that it has the advantage of making only a very few approximations, and they’re all very small today, so this is a very accurate approach. But the key advantage is that we actually know what those approximations are, and they’re the kind that we can test and make smaller—at least if we have enough computational power. That’s where exascale comes in. And so, that means that we increasingly have the possibility of what I refer to as a pay more, get more approach. At least in principle, that’s the kind of deal that we’d like to make: use more computer power and get steadily more accurate results.
And so, this is a missing capability that we just don’t have in the field right now and one area of research that I think is really overdue. Of course, it’s going to be computationally costly—that’s the trade-off—but of course, this is where exascale has its role. That’s important because it’s going to put us in the position of being able to start providing guarantees on our predictions along the lines of, this result is good to three digits, and that’s something that we can’t do right now.
Gibson: In elevator-speech fashion, will you tell us what the QMCPACK team is doing?
Kent: So, first of all, I should say that we are a team of a bit over a dozen people, all contributing varying amounts. We have team members at Livermore, Sandia, Argonne, and Oak Ridge, and we also have a university partner at North Carolina State.
I would say there are two main areas we’re focused on. First, delivering what’s called a performance-portable application able to run 50× faster relative to where we started, a target we’ve heard about from other ECP projects as well. But also, we need to make sure that we can run on everything from a student’s laptop all the way to the largest machines—so, Summit, Perlmutter, Frontier, Polaris, and Aurora. And we need to be able to do this with minimal changes between these different platforms because that’s the only manageable and sustainable way that we can have a general science code. So, that’s one area.
The second has been to work really closely with a lot of the other ECP projects, particularly the SOLLVE project, as well as the vendors and the computational facilities, to help mature the software stack—so, things like the compilers and numerical libraries. And I can’t stress enough just how much this needs to be done. There’s been a lot of progress, but we still need more, and the project has still got some runway to work on that. And this is something that ECP has really helped facilitate, and it wouldn’t have happened otherwise.
Gibson: What does the QMCPACK team plan to accomplish with respect to making predictions about materials that can more effectively inform experimentation? I know that’s a very important aspect of your work.
Kent: I should mention that the science is done on other funded projects. But to mention some of the really big advantages that are really going to help with this—for example, recently we added something called the spin-orbit interaction, and this is important because it’s needed to tackle elements in, arguably, a third or maybe half of the periodic table. So, now that we have that, we can tackle many more topical materials. For example, so-called quantum materials that have exotic properties that we’re very interested in but are also very delicate to model.
And for example, there’s a recent review article in Nature that calls for the ability to get the magnetic structure of those materials from first principles much more reliably, and that’s something that we’re looking forward to doing soon.
A thing we can do now is find the structure—where the atoms go—with the method, at least for simple materials. This has been a long-term problem in the field, and we now have the algorithms that are good for that, at least for topical 2D nanomaterials, for example. And again, this is a place where we know input from more accurate methods is needed. There’s a call from the community, and we’ll be able to inform experiments much more than we were able to in the past. And, of course, to get there, we need improvements in the method, as well as the improvement in computational power.
Gibson: How will exascale computing broaden the range of what will be possible with QMC modeling and simulation?
Kent: Well, I think the big thing is increasing the variety of systems as well as their complexity, and that helps us get closer to experiment for a much wider range of materials. So, that’s very important—as I just mentioned, materials with elements from across the periodic table, for example.
Another way that we’ll be able to help is that we think soon people will start using data generated by the method as opposed to running the calculations themselves. And while this is still a little bit off, we’re thinking of producing databases of materials properties—at least simple material properties—just as has been done with cheaper and more approximate approaches. So, this could lead to directly improving another method. Or, these days, it’s the sort of method that we might feed into a machine learning or artificial intelligence–based scheme to do the upscaling, as we call it. So, there’s lots of exciting developments there that we can connect to.
Gibson: You have a great team of very accomplished people addressing some difficult challenges. ECP has been going for about 6 years or so. Can you highlight for us some of the obstacles QMCPACK has had to overcome?
Kent: So, I think there are two main sets of obstacles that we’ve had to deal with. First of all, we had to redesign the application, and I think the importance of application design isn’t talked about enough. And then secondly, all the work with the software stack.
To tackle the first issue: even at the beginning, we were the only [QMC] code that had production GPU capability—so, a specific implementation. But it targeted NVIDIA GPUs only, via CUDA, and it wasn’t a general implementation. So, we had to learn how to generalize this. We needed a new design and methods that were portable, but along the way we also made improvements to the algorithms, which are actually higher performing now in some cases.
One thing that we had to work out, which hadn’t been done before, was how to take care of all the data movement so that we could always run on the CPUs—for example, if a particular GPU implementation wasn’t available. And this is a really important feature for user productivity. For doing science, the application should always run; the question is then, at what speed? So, that’s one area.
And then the second area has mainly been working with the compilers—in particular, the so-called OpenMP target offload capability for GPUs from the C++ that QMCPACK is written in. And there, the main problem has been a lack of maturity, in the sense that the specification was several years ahead of the implementations.
We’ve been providing an awful lot of feedback. I hope we’ve always been polite but also pointed and persistent, since we found lots of gaps, bugs, and performance issues with just about every single compiler, whether open source or vendor. There have also been problems with numerical libraries, and so we’ve had our hands pretty full with this second topic area.
The good news, I would say, is that a lot of these problems have been fixed now, and of course once they’re fixed, they’re hopefully fixed for everybody in the ecosystem. So, if we trip a race condition in a runtime, as happened recently, once that’s gone, it’s hopefully gone for good.
I also mentioned OpenMP, but of course there are other technologies out there. And one thing about the design, and where it meets the compilers, is that long term we’d like to be using technologies like standard C++, and the new design should let us transition to that, or to anything else that arises, as these technologies mature.
Gibson: What successes would you like to highlight?
Kent: I think that the headline result is that we’re now starting to do production science with this new implementation, and we get decent performance with the open-source LLVM compiler, at least today when it’s targeting NVIDIA GPUs. And of course, this already gets us Summit, Perlmutter, and Polaris, and the very large install base of NVIDIA GPUs. For us, using open-source LLVM is very important, since now all the vendor compilers are derived from this, so they can see what was done. And of course, we now have proof of principle of this new design.
So, I’ve got confidence that in the future we’ll be able to get to the same situation with other architectures; of course, we still need to get there. One thing I should also mention is that this new design is much more flexible than the old one. In the event that someone has a special hardware feature that could have a lot of performance benefit, or maybe a special software library, we still have the flexibility to use it with limited changes and not a wholesale rewrite. So again, I think that’s a success in the flexibility of the application that we just didn’t have before.
Gibson: You know, sometimes during the course of work, you see exceptional performance from colleagues. Are there certain people that you’d like to mention who have gone above and beyond or brought to bear special skills and talents that have especially helped QMCPACK?
Kent: Yes, I would. Everyone plays a part on the team, but I would really like to highlight Ye Luo at Argonne and also Peter Doak at Oak Ridge, who did the majority of the new design implementation, as well as lots of other things, so they’ve been really tremendous.
Actually, I’d also like to highlight a few other ECP projects that have been really important. The SOLLVE project has been very important for the compiler work, but also, for example, the Spack project has been very helpful for us and a huge productivity boost. Spack is often described as a way to install applications, which is true, but we’ve been using it as a way to test a huge variety of compilers and libraries and all the dependencies that QMCPACK has. And that’s something that just wouldn’t have been practical before.
And just more generally, I think I should thank the compiler developers, both open source and the vendors. They’re really critical, and perhaps they don’t get the credit they deserve. We’ve seen an awful lot of improvement with time. And, of course, they’re dealing with a very complex topic. And software like the compilers is key to unlocking the hardware performance. That’s been very important for us, and it’s been terrific to see the improvements.
Gibson: The Frontier machine is here now, and so exascale has arrived. My question for you about Frontier is whether QMCPACK will be among the first applications to run on that machine?
Kent: Yes, it will, Scott. So, this is an exciting time, and, of course, we’re recording this late in July. And, at the moment, we’re running on the test and development system Crusher, while the OLCF is working to stand up Frontier for general availability. We have a very extensive test set, and those are all passing, at least most of the time. And I think if we actually tried to run on Frontier today, we’d be able to scale to the whole machine—a big milestone—but I don’t think we’d get the performance that we were after. So, it’s still a bit of a work in progress here, but we’re well on the way to having something usable for science.
I should add that, although Frontier is here now, we’re also very much looking forward to getting on the Aurora hardware, since running well on all different vendor GPUs is one of our goals.
Gibson: You alluded to work in progress. Will you tell us more about QMCPACK’s current activities?
Kent: Yes, well, within ECP, we need to meet our figure of merit, first of all—so, get that 50× performance increase from the reference runs we did at the start of ECP with the machines we had then. I think the main focus is on increasing the performance on AMD architectures at the moment. And then, since we’re transitioning to science production, another thing we’re focused on is really making sure that our new implementation is robust and really usable for science by general users. Again, that’s a very important step.
Gibson: Will you explain what you mean by figure of merit?
Kent: Yes. So, every project—including ours—has defined some measure of performance, and in our case, it’s essentially how fast we can do our science. So, we set up a target problem—it’s a material, nickel oxide, that we’re actually doing experiments on—at the start of ECP, and essentially we want to do calculations like those published in the literature [but with] 50× [higher performance]. So, we have an equation that captures all of that, and now shortly on Frontier, we hope to actually exceed that goal.
That doesn’t capture everything that we’re working on, but it does reflect a lot: the performance of the machines, as well as the improvements in the software.
Gibson: Paul, is there anything that I’ve not asked that you wish I would have?
Kent: Well, one thing I haven’t mentioned yet is that we’re a fully open-source code, and we welcome more users as well as contributors. QMCPACK is up on GitHub (https://github.com/QMCPACK/qmcpack/). We organize workshops and have a YouTube channel, for example, with recordings from the last one, as well as a virtual machine to try things out and so on.
So, we’re very happy to hear from anyone who’s interested in either the methods or learning more about the different software technologies and approaches we’ve been using.
Gibson: Well, this has been very informative. Thank you for being on the program.
Kent: Thank you.
This research is part of the DOE-led Exascale Computing Initiative (ECI), a partnership between DOE’s Office of Science and the National Nuclear Security Administration. The Exascale Computing Project (ECP), launched in 2016, brings together research, development, and deployment activities as part of a capable exascale computing ecosystem to ensure an enduring exascale computing capability for the nation.