Providing Exascale-Class Multiphysics Simulation Capability to Multiple Science Domains
In this episode, Let’s Talk Exascale features Flash-X, an R&D 100 Award–winning software framework developed under the ExaStar project within the US Department of Energy’s Exascale Computing Project.
The Flash-X team is composed of members from Argonne National Laboratory, Oak Ridge National Laboratory, Michigan State University, the University of Chicago, the University of Illinois at Urbana-Champaign, and the RIKEN Center for Computational Science in Japan. The Flash-X lead is Anshu Dubey of Argonne, and she joins us, along with Flash-X team member J. Austin Harris of ORNL.
Anshu is a senior computational scientist with deep experience in design, architecture, and sustainability of multiphysics scientific software used on high-performance computing platforms.
Austin is a high-performance computing performance engineer and astrophysicist with experience as a software developer and scientific user for many of the codes in the ExaStar project.
Both Anshu and Austin are also members of the ECP ExaStar project team. Daniel Kasen from Lawrence Berkeley National Laboratory is the principal investigator of ExaStar.
We cover the following topics:
- The background or history leading up to Flash-X development
- The Flash-X framework capabilities and features
- A brief description of how Flash-X works
- The distinguishing factors that garnered Flash-X the R&D 100 Award
- What insights may come from running on exascale supercomputers
- The Flash-X software technology dependencies
- What the Flash-X team is working on now
[Scott] This is Let’s Talk Exascale from the Department of Energy’s Exascale Computing Project. I’m your host, Scott Gibson.
Flash-X, supported by ECP funding, is a software framework or computer application tool for simulating various physical phenomena. The Flash-X team recently received the prestigious R&D 100 Award. More on that shortly.
Prime examples of what Flash-X is used to simulate are supernovae, or star explosions, which can send out light bursts up to a billion times brighter than the sun. Studying supernovae has taught scientists a lot.
Supernovae are a tool to develop maps of the universe. The iron in our blood can be traced back to supernovae or similar cosmic explosions. And here’s the clincher: we are stardust. Almost all the elements in our body were made in a star, and many came by way of several supernovae.
Flash-X will simulate phenomena from astrophysics, computational fluid dynamics, or CFD, and cosmology. Astrophysics seeks to understand the universe and where we fit into it. CFD enables engineers, for example, to use mathematics and data structures to model gas and fluid flows to test designs in aerodynamics, aerospace, weather simulation, and more.
Before there was Flash-X, there was Flash, a product of the Flash Center for Computational Science at the University of Chicago. Flash has enabled a variety of scientific discoveries over the past decade.
Until recent years, developers were able to focus on parallel computing and the numerical aspects of coding. However, as technology has increasingly evolved toward heterogeneous supercomputing platforms, meaning systems that incorporate accelerators such as GPUs, in combination with CPUs, Flash needed to morph into Flash-X.
[Anshu] So you have different types of computational units within the same platform, and to be able to utilize them, you need to have other kinds of separations of concerns: parallel hierarchy, the ability to have different data layouts for different parts of the computation, and so on.
We needed to dig deeper and turn things into more fine-grained components. Flash, for example, could get away with components whose finest granularity was a single function. But because we were working with different data layouts and different control structures, we needed to go further down in the size of the components.
We did that by introducing macros with alternative definitions. The other part of it is that the data orchestration needs to be a lot more complex than it was in the distributed-memory version, where, you know, you do a bunch of things locally, then carry out a data exchange with your neighbors, then do another bunch of things locally, and so on.
Now, there is the possibility of running different types of computation in parallel, which means that you also need to think about data movement. So we needed to develop a runtime for data movement. And we needed to develop code-generator tools so that you could express this orchestration in a succinct way and have a translator generate the code automatically. It required deep changes to almost all parts of the code.
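To make the “macros with alternative definitions” idea concrete, here is a minimal sketch in Python of how a configuration-time tool might expand a macro invocation into a target-specific definition. The `@M` token, the macro names, and the definitions below are illustrative assumptions for this sketch, not the actual Flash-X macroprocessor syntax:

```python
import re

# Hypothetical macro database: each macro carries alternative
# definitions keyed by target, loosely mimicking the idea of
# fine-grained components chosen at configuration time.
MACROS = {
    "loop_begin": {
        "cpu": "do i = 1, n",
        "gpu": "!$omp target teams distribute parallel do\ndo i = 1, n",
    },
    "loop_end": {
        "cpu": "end do",
        "gpu": "end do\n!$omp end target teams distribute parallel do",
    },
}

def expand(template: str, target: str) -> str:
    """Replace each '@M name' token with that macro's definition for `target`."""
    def substitute(match):
        return MACROS[match.group(1)][target]
    return re.sub(r"@M\s+(\w+)", substitute, template)

# The same source template yields different emitted code per target.
template = "@M loop_begin\n  a(i) = b(i) + c(i)\n@M loop_end"
print(expand(template, "cpu"))
print(expand(template, "gpu"))
```

The point of the design is that the numerics in the template stay untouched while the control structure around them is swapped per platform.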
[Scott] We asked Anshu to unpack the Flash-X capabilities a little bit, explaining at a high level how Flash-X works and describing its main features …
[Anshu] So Flash-X actually has these components that carry their own meta information with them. And we have a tool that knows how to parse this meta information and configure the application. We have a special kind of unit, which we call the simulation unit, which is where we define our application. The tool that configures the application starts from this simulation unit, and then it recursively parses all of the requirements that it encounters as it is configuring the whole application.
All of these components also carry their own build snippets, which the tool puts together to generate a build. Eventually, what you end up with through this process is that all of your macros are translated; the code transformation happens; all units needed for that application instance are grouped together; a makefile is generated; and the code is configured. So all of the emitted code is compilable. This is then compiled, and you have an executable ready. And we work with a single executable.
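The recursive configuration Anshu describes can be sketched in a few lines. This is a hedged illustration assuming a hypothetical metadata table: the unit names and the format of the requirements are invented for the example and are not the real Flash-X Config syntax.

```python
# Hypothetical per-unit metadata: each unit lists the units it requires.
# Configuration starts from a "simulation" unit and recursively
# collects everything that unit (transitively) depends on.
UNIT_META = {
    "simulation/Supernova": ["physics/Hydro", "Grid"],
    "physics/Hydro": ["Grid"],
    "Grid": ["IO"],
    "IO": [],
}

def configure(unit: str, selected=None) -> set:
    """Recursively gather the set of units needed for `unit`."""
    if selected is None:
        selected = set()
    if unit in selected:          # already visited; avoids re-walking shared deps
        return selected
    selected.add(unit)
    for required in UNIT_META[unit]:
        configure(required, selected)
    return selected

print(sorted(configure("simulation/Supernova")))
```

A real configurator would additionally resolve macro definitions and assemble each selected unit’s build snippets into the makefile, as described above.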
[Scott] J. Austin Harris is an astrophysicist and performance engineer at the Oak Ridge Leadership Computing Facility and a member of the Flash-X team. He explained the physics aspect of Flash-X.
[Austin] The main feature of Flash-X from the science perspective is this ability to incorporate multiple disciplines of physics, multiple kinds of sub-codes, in a very easy, composable way. So Anshu mentioned that Flash started as an astrophysics code, which is very much a quintessential multiphysics problem.
And so, we have to have physics capabilities like neutrino radiation transport, which is a very computationally expensive thing to do. That’s something that’s really benefited from the Flash-X framework, where we can try to utilize some of these different runtime features, and certainly, the modularity aspect really helps there. We’re able to develop the neutrino-radiation-transport code in a separate realm entirely, with its own developers, and then actually integrating it into the Flash-X system becomes relatively simple. The same applies to our nuclear equation-of-state framework, which is also developed by its own developers and then brought in.
Then we’ve also built on existing features of Flash as part of Flash-X. Traditional Flash had nuclear burning, as Anshu mentioned, and we’ve expanded that into a more generalized framework with a nuclear burning code called X-net, which can run a generalized reaction network at the larger sizes needed to capture the entirety of the physics happening in some of the supernova simulations. Not to mention the hydrodynamics capability that Flash also had, something else we’ve built on by incorporating the Spark code for magnetohydrodynamics, which has been developed by team members at Michigan State.
That is to say, the plug-and-play nature of Flash-X allows us to pull the separate teams’ efforts into a single project. And that’s something that’s been notoriously hard to do over the years.
[Scott] Returning to the R&D 100 Award mentioned earlier, the Science and Technology Media Group R&D World oversees the R&D 100 Awards. Judging is by an outside panel of nearly 50 research-and-development experts from across the globe. Information from the awards program states that, quote, “for the past 60 years, it has been honoring great R&D pioneers and their revolutionary ideas in science and technology.” So what qualities garnered the R&D 100 Award for Flash-X?
[Anshu] I would say part of it is the sheer extent of capabilities that are incorporated into the code which make it possible to be used by multiple science domains quite seamlessly. The second part of it is … I think the particularly innovative part that we have in Flash-X is the performance portability layer.
And so, what happened was that the performance portability solutions developed in the community have largely depended on C++ and the features that C++ offers. Now, here we are with a legacy Flash code, a legacy Fortran code, and those solutions don’t really work very well for us. So the options were either to rewrite the whole code in C++, which is a tall order, or to figure out our own ways of solving the performance portability challenges.
We decided to tackle the performance portability challenge ourselves. Now, the other thing that we learned during our first two iterations of refactoring the code is that you need to do the refactoring gradually, which means that you really want to be able to make incremental changes and to verify, every time you’ve made a change, that you haven’t broken something; otherwise, the project just becomes intractable. So that informed the design of our tools.
And so, for the tools we designed that are now integrated within Flash-X to provide this performance portability layer, the philosophy has been that the tools themselves operate in a plug-and-play mode in some sense: you can choose to use as much of the tools as you want, and you can gradually increase your reliance on them. Each tool is individually simple, but they’re designed in such a way that together they deliver a comprehensive performance portability solution. And this happens in a completely language-agnostic way.
You can do this for Fortran, C, C++, even Python if you want to. I think that is really the most innovative part of Flash-X and was instrumental in winning the award, in addition to all the physics capabilities that we have.
[Austin] Yeah, speaking primarily from a domain-scientist, code-developer perspective, I want to double down on what she said about this incremental approach to performance portability. It is notoriously hard to get scientists who aren’t necessarily entrenched in the HPC community to write HPC code, even more so if it seems like a monumental task that’s just never going to happen. One of the biggest advantages of Flash-X from those code developers’ perspective is that they don’t have to buy in whole cloth. They can take in little pieces at a time. And that’s very attractive.
[Scott] Let’s paint a picture for the audio theater of the mind, if you will, as to how Flash-X helps scientists model supernovae and other physics phenomena.
[Austin] Supernovae are a heavily multiphysics problem in every sense of the word. There’s hydrodynamics, magnetic fields, intense gravity, nuclear equations of state that depend on the strong force, nuclear burning. I mean, I don’t think you could find a problem that ticks more boxes on the multiphysics aspect. And what that means in terms of challenges is that all of these different physics and science domains being included have different time scales, different computational requirements, and different methods for solving problems.
Typically, that means you need a way to interface them all so that they play nicely together. One of the challenges in doing that in the past has been getting it to scale well: because of all these competing time scales, scaling eventually starts to break down as you go to larger and larger simulations. Maintaining scalability has really been at the forefront for Flash-X; it’s dictated a lot of our choices in development and has allowed us to incorporate all of these different pieces of physics in a way that still maintains our ability to run at the largest scales, which is ever-present in our minds as Frontier starts to come up.
[Anshu] I would add that supernovae are just one of the physics phenomena. Because of the extensibility and composability features of the code, people from other domains can come along, build bits and pieces of their own physics to add to the code, and reuse some of the existing capabilities. Often they need to put in only a fraction of the effort that they would otherwise have to spend developing a code for their own domain.
So for Flash, about 5 or 6 years ago, I wrote a paper about the impact of investment in an extensible design. At the time, a back-of-the-envelope calculation showed that the availability of the infrastructure capabilities alone, not the physics, just the infrastructure capabilities, had saved the other domains that started using the code roughly 75 full-time equivalents of work. That’s a huge saving in terms of effort and reusability, and it speaks to how the investment in design pays off.
[Scott] What sorts of insights will the application enable when it’s run on exascale platforms such as ORNL’s Frontier or Argonne’s upcoming Aurora? And in what ways will scientists and researchers be able to leverage the Flash-X plug-and-play optimization?
[Anshu] One of the things is that the way we’ve designed the code, we can configure different applications that would run well. And as I mentioned before, people can add capability for one domain, and all of the other domains will benefit from this.
So it’s not just supernova-type simulations. For example, one of the postdocs working with me is using Flash-X to do simulations of bubble formation for understanding how cooling systems work. And his application will scale in similar ways to how we expect the supernova simulations to scale. But as far as insights about supernovae themselves, and the new scales at which we’re able to run, I think Austin should answer that.
[Austin] We have these systems coming online that are at a really mind-boggling scale when you think about how large they are. And that presents, you know, challenges, especially for running these large supernova simulations. One of the nice things about the plug-and-play customization of Flash-X is that we’re now able to run these simulations at scale on Frontier with one programming model and then have that same code also run on Aurora with an entirely separate software stack. I mean, they’re similar in a lot of ways, but they’re also very distinct.
And so that portability layer we have with Flash-X is really helpful there; it allows us to tune for the individual machines. I’ll also add that one of the nice things about these big machines is that it’s not always just about doing the same simulations the field has done before, only faster or bigger. The ECP project as a whole has really allowed us to advance the level of the science and physics in the code.
So the simulations are able to do things like expand the realism of the nuclear burning. They allow us to incorporate general relativity, which has always been a little bit out of reach in the past, and to use a more realistic treatment of neutrino–matter physics, which is what really drives the engine of the supernovae. All of these pieces have to come together to give us the full picture of how supernovae work and how they produce the elements that make up everything here on Earth. Everything heavier than iron is coming from supernovae or their remnants. And really understanding how that works requires this level of detail, which you have to be able to simulate, and simulate in a performant way. That requires a lot of expertise from the science teams, capitalizing on including that physics in a Flash-X architecture that performs at scale.
[Anshu] I would summarize that the scale of these machines and the additional capabilities that they offer, along with the additional capabilities we’ve added to Flash-X, will increase the fidelity of the simulations we are able to do, which means they are more realistic than ever before. And therefore, we will have greater confidence in the inferences we draw from them, and finer-grained inferences can be drawn from these simulations.
[Scott] Flash-X does depend on certain software technologies developed by ECP.
[Anshu] The hard dependencies are very few, in the sense that you can get Flash-X built and running with only compilers, MPI, and HDF5 available. That, of course, is not enough for exascale platforms. I mean, you can still run, but you will not be able to utilize all the resources the exascale platforms offer.
There are other dependencies that we have. For example, we need some mechanism by which data can be moved to devices, such as OpenMP or OpenACC or some other mechanism for offloading data. We definitely need OpenMP in order to have threading on the CPU side of the devices. We also have a dependency on asynchronous I/O, which comes through the HDF5 project. Then, too, we support two different types of meshing. These are both structured meshes, but one of them is AMReX, which is an ECP co-design center.
And AMReX allows more flexibility and more state-of-the-art development of the AMR technology that we depend upon. Certain types of physics also depend on other math libraries, such as PETSc and hypre, for example for the multigrid solves that we need for gravity, and so on. So those are the dependencies that we have. Did I miss anything?
[Austin] I would add that one of the very crucial dependencies we have is on a performant and up-to-date Fortran compiler. The Flang effort that’s been sustained by ECP is something we’re really going to have to rely on going forward. I’ll also mention some other, optional dependencies. For the multiphysics side, we need time integrators that can incorporate different pieces of physics, so we have soft dependencies on things like the SUNDIALS project, which provides different methods for time integration. Those are different avenues we’re exploring to see what works best for our project.
We also have heavy dependencies on linear algebra in different pieces of the code. Anshu mentioned some of the distributed linear algebra we do for elliptic solvers, but we also do MPI-rank-local linear algebra. There we rely quite heavily on the MAGMA project for performant implementations of things like matrix decomposition and matrix–matrix multiplication.
[Scott] The Flash-X team continues to work on the code and prepare for science runs.
[Anshu] At the code level, several of our performance portability tools are just now beginning to transition from prototype to production stages. So we are working on integrating them with the code and hardening that integration. Once they are fully integrated, we need to understand their performance characteristics, which means we will be running under different configurations in different circumstances, capturing performance data, and making sense of it.
[Austin] So having the code capability is one thing, right? And that’s what we’ve been working so hard on for the entirety of the project. But now that the code is reaching maturity, we’re really trying to turn our eyes toward how we use it. So there are things we have to do, like running preparatory models and deciding just exactly what it is that we should study.
Right now, we’re going through the process of doing some exploratory collapse simulations for core-collapse supernovae to try to nail down things like the resolution that we really need to capture all of the features we’re interested in, the size of the nuclear network that we need, how long we expect a full production science run will take. And so, these are the kind of things we’re learning here in the final stages.
And then the other thing we’re doing is some benchmarking and performance analysis of these codes in production so that we know where to look next in terms of eking out performance: whether there’s low-hanging fruit that lets us do these runs more economically, or whether certain things require us to actually dig in and look at different methods or more stringent performance analysis and optimization in particular areas of the code.
So that’s kind of where we’re at right now in terms of actually doing science. We’re getting excited to be able to actually do a production simulation; it’s been a long time coming. We like all the physics that we’re including, and we feel the code capability is there now. So it’s just a matter of deciding exactly what the limits of the capability of exascale are.
[Anshu] I would also like to add things that we are doing for future sustainability of the code, because, you know, we’ve invested so much in developing this tool. It wouldn’t be very helpful if we didn’t have a sustainability plan. So as part of that sustainability plan, there are two things that we’re doing.
One of them is to transition the whole management of the code to a community-based model through the formation of a sort of consortium or council, whatever you wish to call it, of members coming together: representatives from different domains, plus some people who take care of the infrastructure, putting in place policies for ongoing verification and validation of the code, for keeping the documentation up to the mark, and all of the other policies that are critical for the ongoing health and sustainability of the code.
And as part of it, we are heavily leveraging another of the ECP projects, the IDEAS project, which is, unusually enough, a project dedicated entirely to sustainability efforts. What we are doing with them is helping them develop tools that make the job of the people maintaining the code easier, such as those doing code reviews. We’re building with them things like a pull-request assistant, which mines and analyzes the code and tells you where your documentation or your testing infrastructure might have gone out of sync because of the new code you’re adding: basically, things that help keep the code up to date without being too taxing on the developers.
I think the investment in code design has been absolutely critical; it let us do long-term thinking, take a few chances, and do a bit of exploration of ideas that would work well. And that was possible because of sustained funding. What I want to say about the tool chain is that the tool chain we have developed is very usable beyond Flash-X; we’ve deliberately kept it not closely tied to Flash-X. So it’s applicable to any Fortran code or, for that matter, to anyone else who would like the option of gradually ramping on to performance portability. And we’d be happy to work with projects that want to do this.
[Scott] Thanks to Anshu Dubey and Austin Harris. And thank you for listening. Follow the Exascale Project on Twitter and visit exascaleproject.org.
Scott Gibson is a communications professional who has been creating content about high-performance computing for over a decade.