Collaborative Strength Enables the EXAALT Project to Optimize GPU Performance


By Scott Gibson


From left, Danny Perez, Los Alamos National Laboratory; Rahul Gayatri, National Energy Research Scientific Computing Center

A dream of researchers is to cut the time it takes to develop new materials by using computer simulations almost exclusively. The outcomes could include better nuclear fuels, the right first-wall materials for fusion reactors, and new metals, semiconductors, and insulators, among other possibilities.

In this episode of the Let’s Talk Exascale podcast, we’ll have a look at a project called EXAALT, which refers to Exascale Atomistic capability for Accuracy, Length, and Time. It has the potential to bring atomistic materials predictions to the engineering scale. Materials design and synthesis could be demystified and, for the most part, done virtually.

To discuss EXAALT we’re joined by Danny Perez of Los Alamos National Laboratory and Rahul Gayatri, a project collaborator from the National Energy Research Scientific Computing Center (NERSC).

Our topics: Molecular dynamics simulations—what they are, why they’re important and advantageous, how they can be used in the design of materials, and their limitations. We’ll also have a project highlight.

Interview Transcript:

Gibson: Danny Perez and Rahul Gayatri, welcome!

Perez: Thanks, Scott.

Gayatri: Thanks, Scott.

Gibson: Why molecular dynamics simulations? Why are they important?

Perez: In the following, I’ll refer to molecular dynamics as MD for short. MD is very powerful because it’s a very fundamental approach. If you’re interested in understanding how a material behaves, you can just set up a virtual system, integrate the equations of motion of all of the atoms in your system, and see what happens. This is very powerful because you don’t have to know ahead of time what the atoms want to do. You just set the system up, let it go, and see what happens.
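As an illustration of what integrating the equations of motion looks like in practice, one widely used scheme is velocity Verlet, which advances the position and velocity of each atom of mass m_i over a small timestep using the interatomic forces (shown here for reference only; the specific integrators in EXAALT's production codes may differ):

$$
\mathbf{v}_i\!\left(t+\tfrac{\Delta t}{2}\right)=\mathbf{v}_i(t)+\frac{\mathbf{F}_i(t)}{2m_i}\,\Delta t,\qquad
\mathbf{r}_i(t+\Delta t)=\mathbf{r}_i(t)+\mathbf{v}_i\!\left(t+\tfrac{\Delta t}{2}\right)\Delta t,\qquad
\mathbf{v}_i(t+\Delta t)=\mathbf{v}_i\!\left(t+\tfrac{\Delta t}{2}\right)+\frac{\mathbf{F}_i(t+\Delta t)}{2m_i}\,\Delta t.
$$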

And, very often, a very wide range of phenomena will just spontaneously occur, and, oftentimes, these aren’t expected. So, even if you don’t know what the key physical steps are, molecular dynamics will show them to you. In that sense, you can think of MD as a numerical experiment where you gain fundamental insight into a material down to the atomic scale, where we get complete resolution and can see where each atom is at every point in time.

That is really a great complement to actual experiments, where we often have limitations in terms of resolution in space and time. In MD, we cannot simulate everything, but we really have full and complete information about what is going on in the material; and that is very powerful.

Gibson: First-principles calculations can be used for design and prediction in materials science. Will you speak to what molecular dynamics simulations can offer in that respect?

Perez: Materials design is kind of an outstanding goal in the community nowadays, to try to see if we could design materials purely computationally—so, get a sense of how a material would perform just from computer simulation without having to go to the lab and try different possibilities and so on. The ultimate idea would be to shorten the time it takes to develop new materials by being able to basically develop them from scratch on the computer.

And, as I said, the advantage with MD is that you can think of it as an experiment. So, if you want to, for example, design a material that would go into a nuclear reactor, typically these kinds of experiments are very expensive. You have to go into the lab, form your material, and then stick it in the reactor. It might stay there for a year, and then it comes out radioactive, so it’s very complicated to manipulate and so on. Instead, we could just simulate with molecular dynamics how the material will react to the environment inside a reactor. Then, potentially, we could limit the number of experiments that need to be done before a new material can be certified and, this way, speed up the whole process.

So, as I said, you can think of MD as some kind of virtual experiment that lets us explore on the computer how a material would react ahead of time without having to do the experiment or before we decide it’s actually worth doing the experiment.

Gibson: What are the limitations of molecular dynamics?

Perez: One of the main issues with molecular dynamics is that it’s quite expensive. We cannot directly model materials at the engineering scale because we’re resolving every single atom in the system. As you can imagine, that turns out to be quite expensive from a computational point of view. So, we have limitations both in the system sizes that we can simulate and also in the times that we can simulate.

For system sizes, if you take the largest computer out there, we can probably simulate something like a trillion atoms. That looks like a big number, but if you think of it as a volume in space, you see that, really, it’s only something like a one-micron cube of material. That is very small. So, that means you cannot simulate a car or a big chunk of a plane with molecular dynamics. But, still, on the scale of a micron, there’s enough you can learn about the types of defects in a material and so on.
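For a rough sense of that scale, assuming an illustrative solid-state number density of about 10^29 atoms per cubic meter (a typical order of magnitude, not a figure from the project), a trillion atoms occupy

$$
V \approx \frac{N}{\rho} = \frac{10^{12}}{10^{29}\ \mathrm{m^{-3}}} = 10^{-17}\ \mathrm{m^{3}},\qquad L = V^{1/3} \approx 2\ \mu\mathrm{m},
$$

that is, a cube only a micron or two on a side.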

The other limitation, which is more stringent in a sense, is the time scale. For the length scale, the bigger the computer, the larger the system you can simulate, so that’s kind of nice. You get that for free. Every time a new computer comes online, you can simulate a bigger system than you were able to before. Time scales are a bit different because there’s something inherently serial in integrating an equation of motion like this. You have to finish a time step before you start another time step, and that makes parallelization in time much more difficult. So, basically, in MD, no matter what computer you use, you’re basically stuck at sub-microsecond time scales. And that’s independent of how many atoms you simulate and what computer you use to simulate them. So, that’s a very short time for many applications, and, oftentimes, it’s way too short to even see the material start to react to external constraints like stress or pressure or temperature.
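To put rough numbers on that, assuming an illustrative MD timestep of about one femtosecond, even one microsecond of simulated time requires

$$
N_{\mathrm{steps}} = \frac{t_{\mathrm{sim}}}{\Delta t} = \frac{10^{-6}\ \mathrm{s}}{10^{-15}\ \mathrm{s}} = 10^{9}
$$

timesteps that must be executed one after another, regardless of how many processors are available, which is why a bigger machine does not by itself extend the reachable time scale.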

These are really the main limitations of molecular dynamics: the length and time scales that we can reach are fairly limited.

Gibson: As we’ve said, Danny, the project that you, Rahul, and the rest of your team are working on is called EXAALT. What is EXAALT doing to address those limitations you just explained?

Perez: Our goal in EXAALT is to develop a new generation of ultra-scalable algorithms that would let researchers exploit a very large computer in new and more flexible ways. As I said before, the natural way to use a large computer nowadays is to do a larger simulation than you were able to do before. That is useful in some cases, but it’s quite limiting in other cases. So, a big part of what we do is to try to push along two other axes at the same time. One of them is the time axis: to be able to do simulations on smaller systems but over much longer times. That requires a complete rethinking of how we organize the algorithms, how we distribute the computation on the machine, and what kind of physics we can exploit to increase the simulation time scale. The other axis that we’re pushing on is accuracy. To do a simulation, you need a basic model of how the atoms in the material interact, and these models are quite time-consuming to parameterize and to obtain. So, what we would like to do is enable the creation of new and improved materials models by leveraging the exascale, so that if a company comes to us with a new material composition for which we don’t have a model, we can turn to a large computer, get a model very quickly, and then do our MD simulations.

So, our work in EXAALT is kind of two-fold. We’re developing brand-new approaches and then we’re finding ways to integrate these approaches together so that somebody should be able to choose the kind of length, time, and accuracy that they’re interested in. And then we organize this in a code that’s highly optimized so that it can scale on very large computers.

Gibson: Tell me about EXAALT’s work relative to nuclear fuels and fusion energy. And, relatedly, in what areas will EXAALT bring about a technology step change—that is, a sudden significant improvement in the state of the art?

Perez: As I said, our focus in EXAALT is on nuclear materials, both for fusion and fission applications. For fusion applications, we’re trying to design or understand how the materials react in the walls of fusion reactors. If you remember what fusion is, it’s basically the nuclear reaction that happens at the core of a star, where you fuse hydrogen isotopes together and produce neutrons with a lot of energy and some helium. One of the issues in these first-wall materials of the reactor is that they’re exposed to very high temperatures, very large fluxes of neutrons, and the helium that comes out of that reaction.

It is known that all of this energy and all of these particles that come out from the plasma and impinge on the material really affect the microstructure of the material and also its properties. As you keep running your reactor, you see that your material is changing. It might become more brittle, or the thermal conductivity might decrease, and all of these things are bad. So, one of our objectives is to be able to use molecular dynamics simulations to understand all these complex changes that are occurring in the materials when they’re exposed to these plasma conditions, and to find ways to improve the materials, maybe by adding some alloying element or changing the microstructure of the material so that it’s more tolerant of defects.

The other aspect is nuclear fission, so more on the nuclear fuel side. In this case, we’re interested in understanding, again, how the nuclear fuel evolves with time. The fuel generates lots of neutrons, and it generates lots of transmutation elements when the nuclear reactions occur. You would have gases like xenon or krypton that are going to appear in the material with time, and these gases can form bubbles and, again, affect the microstructure of the material. So, what we would like to do is get the capability to simulate over a long time how these defects form, how they organize into bubbles, and, potentially, how these bubbles evolve and affect the properties of the material.

So, these two examples are really just motivating problems that we use as benchmarks to monitor our progress. But the capabilities that we develop are meant to be very general in the sense that they should be broadly applicable to any kind of hard materials. So, think about metals, semiconductors, insulators—so, hard, crystalline materials. There’s a broad range of applications in that space of problems, so you can think of, for example, developing new alloys for high-temperature turbines or jet engines or micro-electronic components such as memristors, for example, where the functionality of the device really depends on how its structure evolves with time. Or you can think of lightweight materials for cars, for example, and so on.

There are many examples where understanding how the structure of a material evolves with time down at the atomic scale is really essential. EXAALT right now is really concentrating on this class of problems in hard materials, the structural-materials part of materials science. But our aim is to build very general tools and also a very general computational infrastructure so that somebody else could come and extend it to their problem of choice, like soft matter or other problems. But at this point, our focus is really more on the hard-material side.

Gibson: How might molecular dynamics simulations help in the development of soft materials (for biology) and also in the development of hard materials for energy applications other than nuclear? I’m referring to superconductors, insulators, quantum, and such.

Perez: That’s a good question. Typically, soft materials are different enough in the way they behave that when you approach them computationally you end up with a very different set of tools. Basic molecular dynamics is the same, but, for example, if you want to look at how to extend the simulation time scale, that’s much more difficult, or it requires a different way of looking at the problem. EXAALT itself is really geared toward hard materials, but the methods that we implement in EXAALT have generalizations that could be applicable to softer materials. So, I think that down the line, this is definitely something that we aim to generalize to, but it’s not something that would be really straightforward to do. That’s more on a few years’ time scale, over which we will need to start maturing these ideas and start thinking about how to scale them up.

For quantum materials, that’s a very interesting question. In classical molecular dynamics, you’re really focused on the structure of the material, so on where the atoms are, and the electrons are kind of implicit in this whole formulation. As part of EXAALT, we have one kind of computational engine that explicitly resolves electrons, so you can start looking at chemical reactions and so on. But our focus is really on predicting the structural properties of materials more than on predicting the electronic properties. So, the good news is, in principle, if you have a nice, new, shiny quantum code and you want to couple it to EXAALT to do long time scales or large size scales, then we should be able to do that. But, at present, the cost of these calculations at the quantum level is so high that it is difficult to afford or even to consider doing the simulations, even at the exascale. So, I would say that this is the bleeding edge that we would like to push, but it will require some more fundamental development on the simulation side before these kinds of simulations become affordable.

Gibson: Let’s delve into a recent achievement. I understand EXAALT has made impressive performance improvements.

Perez: Yeah, one nice thing about EXAALT and working with ECP is that we have the chance to partner with many different organizations. One of our key partners in improving performance is NERSC [National Energy Research Scientific Computing Center], and Rahul, who is here on the podcast with me, is an application performance expert at NERSC; we work very closely with him and with the other collaborators that he will tell you about. So, I would like to turn this over to Rahul so he can give us a sense, at the technical level, of what we were able to achieve over the last few months.

Gibson: Great.

Gayatri: Thanks, Danny. As Danny mentioned, EXAALT is an ECP project, which means that it will be one of the forefront applications that will get compute time on the exascale machines. In order to make effective use of this high compute power, certain problems have been identified as figure-of-merit problems, against which the progress of the project is measured. Around a year and a half ago, the ECP figure of merit for EXAALT showed a downward trend: its performance relative to the peak performance achievable on a given piece of hardware was going down on the new architectures and GPUs that were arriving at that time. To solve this problem, we formed a collaboration with a few application engineers from NERSC, NVIDIA, and HPE, and we started working on improving this performance on GPUs.

The focus of this team was very much on improving the SNAP [Spectral Neighbor Analysis Potential] module on NVIDIA GPUs. There were a lot of optimization strategies that we tried out, and a few of the main ones stand out. The first is the strategy of kernel fission, where you break a single large kernel that handled all of the work for an atom into multiple smaller kernels, each concentrating on the completion of one stage of the algorithm for all the atoms. What this allowed us to do was optimize each individual kernel, because each of them might have different needs in how it is scheduled on the machine, and it allowed us to better utilize the resources of a GPU.

The one downside of the strategy was that we now needed additional storage, because we had to pass this atom-specific intermediate information between the kernels. But this was actually a good thing, because it led us to think about innovative solutions for optimizing our memory footprint. By the end, we had a lower memory footprint than even the original implementation.
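The sketch below illustrates the kernel-fission idea in minimal CUDA: a monolithic per-atom kernel is split into stage kernels that exchange per-atom intermediates through an extra global buffer. The kernel names, placeholder arithmetic, and buffer layout are purely illustrative and are not the actual EXAALT/SNAP kernels.

```cuda
#include <cuda_runtime.h>

// Before fission: one thread per atom runs every stage back to back, so the
// launch configuration and register budget must suit the whole kernel at once.
__global__ void monolithic(const double* x, double* f, int n_atoms) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_atoms) return;
    double tmp = 2.0 * x[i];     // "stage 1" (placeholder arithmetic)
    f[i] = tmp * tmp + 1.0;      // "stage 2" (placeholder arithmetic)
}

// After fission: each stage becomes its own kernel, so block size, occupancy,
// and register usage can be tuned per stage.
__global__ void stage1(const double* x, double* tmp, int n_atoms) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_atoms) tmp[i] = 2.0 * x[i];
}

__global__ void stage2(const double* tmp, double* f, int n_atoms) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_atoms) f[i] = tmp[i] * tmp[i] + 1.0;
}

int main() {
    const int n = 1 << 20;
    double *x, *tmp, *f;
    cudaMalloc((void**)&x,   n * sizeof(double));
    cudaMalloc((void**)&tmp, n * sizeof(double));   // extra storage introduced by fission
    cudaMalloc((void**)&f,   n * sizeof(double));
    cudaMemset(x, 0, n * sizeof(double));

    const int block = 256;
    const int grid  = (n + block - 1) / block;
    monolithic<<<grid, block>>>(x, f, n);   // original single-kernel version
    stage1<<<grid, block>>>(x, tmp, n);     // fissioned version, stage by stage
    stage2<<<grid, block>>>(tmp, f, n);
    cudaDeviceSynchronize();

    cudaFree(x); cudaFree(tmp); cudaFree(f);
    return 0;
}
```

The payoff is that each stage kernel can be given its own launch configuration and resource budget, at the cost of the intermediate buffers described above.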

Another strategy that we tried out, and which is beneficial across all GPUs, is to access data in a coalesced manner. What this means is that consecutive threads on a GPU access consecutive memory locations. This is usually beneficial on a GPU because it reduces the number of memory transfers that are necessary to execute a SIMD instruction and improves the overall arithmetic intensity of the application. While this is usually beneficial on GPUs, it should be avoided on CPUs because it might lead to false sharing and cache thrashing.
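As a hedged illustration of the same point, the two kernels below copy one coordinate per atom under two hypothetical layouts (these are not EXAALT's data structures). With an array-of-structures layout, consecutive threads read addresses 24 bytes apart, so a warp's loads split into several memory transactions; a structure-of-arrays layout lets consecutive threads read consecutive doubles, which the hardware coalesces into a few wide transactions.

```cuda
#include <cuda_runtime.h>

struct AtomAoS { double x, y, z; };   // array-of-structures layout

// Strided access: consecutive threads read addresses 24 bytes apart.
__global__ void copy_x_aos(const AtomAoS* atoms, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = atoms[i].x;
}

// Coalesced access: consecutive threads read consecutive doubles.
__global__ void copy_x_soa(const double* x, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = x[i];
}

int main() {
    const int n = 1 << 20;
    AtomAoS* atoms;
    double *x, *out;
    cudaMalloc((void**)&atoms, n * sizeof(AtomAoS));
    cudaMalloc((void**)&x,     n * sizeof(double));
    cudaMalloc((void**)&out,   n * sizeof(double));
    cudaMemset(atoms, 0, n * sizeof(AtomAoS));
    cudaMemset(x,     0, n * sizeof(double));

    const int block = 256, grid = (n + block - 1) / block;
    copy_x_aos<<<grid, block>>>(atoms, out, n);   // strided reads
    copy_x_soa<<<grid, block>>>(x, out, n);       // coalesced reads
    cudaDeviceSynchronize();

    cudaFree(atoms); cudaFree(x); cudaFree(out);
    return 0;
}
```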

And the last point that I want to mention among the optimizations we did was to use the fast cache memory that is available on NVIDIA GPUs, commonly known as shared memory, to reduce the number of reads and writes that we do to global memory. We made use of the bispectrum symmetry that is available in the SNAP algorithm and saved the intermediate results in shared memory. This improved our arithmetic intensity immensely and pushed the code into a more compute-bound regime, because we ended up spending fewer cycles accessing global memory compared with accessing shared memory.
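The following minimal sketch shows that pattern in CUDA. It is illustrative only, not the SNAP bispectrum kernel: the coefficient array and tile size are made up. Each block stages a tile of reused values in shared memory once and then reads them many times from fast on-chip memory instead of re-fetching them from global memory.

```cuda
#include <cuda_runtime.h>

#define TILE 256   // one tile of reused coefficients per block (illustrative)

__global__ void reuse_via_shared(const double* coeff, double* out, int n) {
    __shared__ double tile[TILE];

    // Load the tile from global memory once per block...
    if (threadIdx.x < TILE) tile[threadIdx.x] = coeff[threadIdx.x];
    __syncthreads();

    // ...then reuse it repeatedly from shared memory, so far fewer cycles are
    // spent waiting on global-memory traffic.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double acc = 0.0;
        for (int k = 0; k < TILE; ++k) acc += tile[k] * (double)(i + 1);
        out[i] = acc;
    }
}

int main() {
    const int n = 1 << 20;
    double *coeff, *out;
    cudaMalloc((void**)&coeff, TILE * sizeof(double));
    cudaMalloc((void**)&out,   n * sizeof(double));
    cudaMemset(coeff, 0, TILE * sizeof(double));

    reuse_via_shared<<<(n + TILE - 1) / TILE, TILE>>>(coeff, out, n);
    cudaDeviceSynchronize();

    cudaFree(coeff); cudaFree(out);
    return 0;
}
```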

A detailed presentation of all our optimizations is available in the IDEAS webinar series on best practices for HPC [high-performance computing] developers, in a webinar on refactoring the EXAALT MD code for emerging architectures.

So, by the end of all of our optimizations, we achieved a nearly 22x speedup for the ECP benchmark problem on an NVIDIA Volta GPU compared to the older GPU implementation on the same hardware. What this means on a bigger scale is that if we run this new implementation on the entire Summit machine, we get a roughly 350x speedup compared to the best performance on Mira [supercomputer]. This was a hugely successful collaboration, and it showed us how effective collaboration between different labs, vendors, and their engineers can help us achieve these high-performance goals.

Gibson: We’re grateful to our guests: Rahul Gayatri of NERSC and Danny Perez of Los Alamos National Laboratory. And we hope you’ll join us next time for Let’s Talk Exascale.