A conversation with Danny Perez of Los Alamos National Laboratory
Danny Perez of Los Alamos National Laboratory (LANL) spoke with Exascale Computing Project (ECP) Communications at SC17 in Denver. Perez is a member of the Exascale Atomistic Capability for Accuracy, Length, and Time (EXAALT) project team, led by Principal Investigator Arthur Voter, also of LANL. EXAALT specializes in molecular dynamics simulations of materials. This is an edited transcript of our conversation.
What is EXAALT all about?
We do simulations of materials where we resolve all of the atoms in the system so that we can basically make movies of how they move as a function of time. These kinds of simulations are called molecular dynamics. From these, we can infer a lot of very exciting things about how materials behave at the nanoscale. These simulations are basically virtual experiments where you set up the system, let the simulation run, and learn a lot about how materials evolve.
The main science that drives EXAALT is the understanding of materials in extreme conditions. We focus mostly on materials for nuclear energy, both on nuclear fuels in fission power plants and on the walls of fusion reactors. These are very demanding applications where the environment is quite harsh, so it’s imperative to have a good handle on how the materials react to ensure that they’re safe, they’re cheap, and they can perform over the whole lifetime of the power plant, for example. This is the main science driver of EXAALT.
Computationally, our goal is to develop a comprehensive molecular dynamics capability that can scale all the way up to the exascale. The user should be able to say “I’m interested in this kind of system size, timescale, and accuracy” and directly access that regime without being constrained by the usual scaling paths of current molecular dynamics codes. We aim to build a comprehensive capability and demonstrate it on these applications, but, really, it’s a very general framework that anybody else in materials science could come in and use.
How is your research important to advancing scientific discovery?
Molecular dynamics is a real workhorse in materials science and biology. If you look at computing centers across the country, a very large fraction of the cycles is actually used doing molecular dynamics simulations one way or another. We believe that at exascale, molecular dynamics will still be a large part of the workload. So it’s very important to start right now to build the codes that will take us up to these extreme scales.
What’s challenging about molecular dynamics is that if you just take the codes that we have today and run them on an exascale machine, you’ll be able to do much larger systems, but not longer times. In fact, simulation times have basically been stuck at the same level for the last 10 years. Our goal is to break this limit so that people can do long timescales, big systems, high-accuracy models, or whatever the science dictates. We think that could be kind of a revolution in the way people use molecular dynamics as we get to these very large computational scales.
What are the project’s major successes at this point?
The thing we’re most excited about is that a few months ago we had the first release of our integrated EXAALT package. Now there’s an open source code out there that everybody can try out. It integrates three big pieces of code that were developed mostly at Los Alamos and Sandia National Laboratories. I’m personally involved with the part that allows people to reach very long timescales. We have developed a so-called Accelerated MD module that allows you to do replica-based simulations where we use a lot of copies of the system to build very long trajectories in a way that’s scalable. This capability builds on top of a very well-known molecular dynamics code called LAMMPS that’s used across the materials sciences community. LAMMPS is very scalable along the spatial domain, so it is great to do huge systems. The last piece of EXAALT is a code called LATTE that is being developed at Los Alamos. It provides a very high-accuracy semi-empirical quantum simulation capability. So if people need to do these kinds of simulations to capture very complicated electronic effects, they can call the LATTE module out of EXAALT.
What we have done so far is integrate these codes in a coherent whole so the users can just dial in the regime they’re interested in, set up their system, and then launch EXAALT on a very large machine. Now anybody can go to our GitHub site, pull EXAALT, start using it, tell us what they think about it, and suggest improvements we can make in the future.
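The replica-based idea mentioned above can be illustrated with a toy model. The sketch below follows the time accounting of parallel replica dynamics, one well-known replica-based method; the interview does not say that this is the specific algorithm inside EXAALT, so treat it as an illustration only. It assumes that escape from a material state is a random event with a fixed rate, and it draws each copy's escape time from an exponential distribution instead of running real molecular dynamics. The function names and parameters are hypothetical.

```python
import random


def parrep_escape_time(rate, n_replicas, rng):
    """Return (simulated_time, wall_clock_time) for one escape event.

    Assumption: each replica's escape time is exponential with the given
    rate. This is a toy stand-in for running MD on a copy of the system.
    """
    # All replicas run side by side; we stop when the first one escapes.
    wall_clock = min(rng.expovariate(rate) for _ in range(n_replicas))
    # Every replica ran for `wall_clock`, so the trajectory is credited
    # with the sum of the time spent on all replicas.
    simulated = n_replicas * wall_clock
    return simulated, wall_clock


def mean_times(rate, n_replicas, n_trials=20000, seed=0):
    """Average simulated and wall-clock times over many escape events."""
    rng = random.Random(seed)
    sim_total = 0.0
    wall_total = 0.0
    for _ in range(n_trials):
        sim, wall = parrep_escape_time(rate, n_replicas, rng)
        sim_total += sim
        wall_total += wall
    return sim_total / n_trials, wall_total / n_trials


if __name__ == "__main__":
    sim_mean, wall_mean = mean_times(rate=2.0, n_replicas=16)
    print("mean simulated time per escape:", sim_mean)
    print("mean wall-clock time per escape:", wall_mean)
```

In this toy model the average simulated time per escape stays close to 1/rate no matter how many replicas are used, while the average wall-clock time shrinks roughly in proportion to the number of replicas. That is the sense in which many copies of the system can be traded for a longer trajectory.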
What have your ECP collaboration and integration activities been so far?
That has been a very interesting aspect of being part of the ECP. We have quite a number of collaborations with different kinds of efforts in the ECP. For example, we’re interacting with two of the co-design centers: the Co-Design Center for Particle Applications (CoPA) and the Co-Design Center for Online Data Analysis and Reduction at the Exascale (CODAR). With CoPA, the main idea is that they help us design really fast and portable kernels for our molecular dynamics. They identify the cycle-consuming parts of the code and then write efficient kernels that we can run on different architectures. CoPA has experts in designing proxy apps and benchmarking them on tons of different systems, so this really helps us get the most performance out.
With CODAR and the EZ [ECP’s “Fast, Effective, Parallel Error-bounded Exascale Lossy Compression for Scientific Data” project] teams, we’re working on data management aspects of the project, because our simulations can create huge amounts of data that we have to deal with somehow. With CODAR, we’re looking at innovative ways of using compression in real time. The objective there is to either store more stuff in memory or aggressively compress the results before we write to disk. In terms of data, we’re also working with the DataLib [ECP’s “Data Libraries and Services Enabling Exascale Science” project] team. They’re helping us develop distributed scalable databases that we’re going to use as a backend of our simulation code. Finally, we’re also working closely with the IDEAS [ECP’s code productivity team] project. They really helped us streamline our build systems, our deployment systems, and our overall development processes. We’re really happy to be a part of this ecosystem where we have access to tons of people with different expertise. It really has been great for us.
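To give a feel for what "error-bounded" lossy compression means in the discussion above, here is a minimal sketch based on uniform quantization. It is not the EZ or CODAR teams' algorithm; real error-bounded compressors typically add prediction and entropy-coding stages on top. The function names and the sample coordinates are hypothetical.

```python
def compress(values, error_bound):
    """Map each float to an integer bin index.

    Bins are 2 * error_bound wide, so the centre of the bin a value falls
    into is never more than error_bound away from the value itself.
    """
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def decompress(codes, error_bound):
    """Reconstruct approximate floats from the integer bin indices."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


if __name__ == "__main__":
    # Hypothetical atom coordinates; any list of floats would do.
    coords = [0.0, 0.123, -4.56, 1000.0007, 3.14159]
    eps = 0.01
    codes = compress(coords, eps)
    approx = decompress(codes, eps)
    worst = max(abs(a - b) for a, b in zip(coords, approx))
    print("integer codes:", codes)
    print("worst-case error:", worst, "for requested bound", eps)
```

The small integers produced by `compress` are much easier for a lossless encoder to shrink than raw floating-point values, and the user keeps a guarantee on how far any reconstructed value can drift from the original.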
Has your research taken advantage of any of the ECP’s allocation of computer time?
Yes. We are up and running at NERSC [National Energy Research Scientific Computing Center] and the Oak Ridge Leadership Computing Facility, and we’re preparing to set up our code at the Argonne Leadership Computing Facility. So far, we’ve basically run just to demonstrate that the code was built properly. We’re mostly going to use our allocations at these centers to do demonstrations at very large scale to show that we can scale up to thousands or tens of thousands of nodes. These will be coming next year, when we’re really going to unleash the code on the biggest machines we can find. In that respect, having access to these machines will be super useful.
What is most important about your project with respect to enhancing society and national security?
One of our main targets is to develop better materials, which is really crucial to society. If you just consider metals, hundreds of millions of tons are consumed in the United States every year. So, coming up with better materials is a really big drive, economically speaking. But as it turns out, developing a new material is really, really time-consuming. From the time a new class of material is developed in the lab until it hits market, we’re talking about 10 to 20 years. So that’s a remarkably slow process. A lot of effort has been directed at trying to harness computation to shrink this development cycle to something that’s more manageable, like something on the order of 3 to 5 years.
Exascale will really help with this objective in that with current platforms, we’re quite limited in the system sizes we can do or the timescales we can reach. So whatever simulation we do, we are still fairly disconnected from real conditions that materials scientists care about. So we still need to build models that extrapolate our simulation results up to the real world, and this takes a long time and is very error prone. We hope that exascale will give us the ability to run simulations directly in the conditions that are relevant to the applications. I think this will really help in terms of the design and testing of novel materials; that will impact scientific discovery, for sure, but also industrial research. And since we focus on materials in extreme conditions, our work has impact on the national security side of DOE’s mission as well.
What is next for EXAALT?
We’re quite excited to access very large machines as part of the big ECP allocation we have. So our first step next year is really to show that we can run at scale on current machines and demonstrate that we can get useful science out of our efforts. On our team, we have two domain scientists who are specialists in materials for nuclear fusion and nuclear fission. They will start deploying the code in more of a research-type setting and show that it can scale and can produce really useful results. That’s our first big to-do item for next year.
The second item is the second release of the EXAALT package, where we’re going to roll out a set of brand new methods that will further expand the scope of the simulations we can do. At this point, EXAALT can do huge systems for short times or rather small systems for very long times. Next year, we are going to add the capability to be somewhere in between, i.e., to do rather large systems but also long timescales. That will be a big step toward closing the gap that we have now in the simulation space that we can reach.