Modern cosmological observations carried out with large-scale sky surveys are unique probes of fundamental physics. They have led to a remarkably successful model for the dynamics of the universe and several breakthrough discoveries. Three key ingredients—dark energy, dark matter, and inflation—are signposts to further breakthroughs because all reach beyond the known boundaries of the Standard Model of particle physics. Sophisticated large-scale simulations of cosmic structure formation are essential to this scientific enterprise. They not only shed light on some of the biggest challenges in physical science but also rank among the very largest and most scientifically rich simulations run on supercomputers today. The ExaSky project is extending existing cosmological simulation codes to work on exascale platforms to address this challenge.
The universe is filled with dark unknowns: dark energy, dark matter, the mysterious neutrinos, and the question of how objects such as massive black holes form. Scientists think that about 85 percent of the mass in the universe is dark matter, which has conventional gravitational interactions, but they do not know what dark matter is made of. Dark energy, the mysterious agent causing the current acceleration of the universe's expansion rate, is even less understood.
Since scientists cannot experiment on the universe itself, they turn to computers to run virtual experiments. A suite of software codes solves the complex underlying equations that best model current knowledge and generates catalogues of objects, such as galaxies and galaxy clusters, that live within dark matter clumps. The ExaSky project's codes are designed to run on the world's fastest supercomputers, such as Frontier and Aurora, exascale machines that can perform a billion billion calculations per second. The codes simulate the universe under several different scenarios, which requires processing massive amounts of data, with the aim of producing the most extensive synthetic sky maps ever created.
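To get a feel for that rate of computing, the short sketch below works through some back-of-the-envelope arithmetic. The grid size, flop-count formula, and assumption of peak machine speed are illustrative choices for the example, not figures from the ExaSky codes.

```python
import math

# Back-of-the-envelope arithmetic for an exascale machine.
# All numbers below are illustrative assumptions, not ExaSky figures.
EXAFLOP = 1e18  # one exaflop/s: a billion billion (1e9 * 1e9) operations per second

# Suppose one long-range force step uses an FFT on a 10,000^3 grid,
# with an FFT costing roughly 5 * N * log2(N) floating-point operations.
n_grid = 10_000 ** 3
fft_flops = 5 * n_grid * math.log2(n_grid)

print(f"grid points:     {n_grid:.2e}")
print(f"flops per FFT:   {fft_flops:.2e}")
print(f"seconds at peak: {fft_flops / EXAFLOP:.4f}")
# Even a trillion-point FFT takes a fraction of a second at peak exascale
# rates; real codes run thousands of such steps plus much other physics.
```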
ExaSky drew on two major sets of computer codes that simulate a large range of physical processes, such as how billions of galaxies formed and arranged themselves into what is known as the cosmic web, from star birth to supernova death throes and everything in between. The challenge was to cover sizes of up to several gigaparsecs, the length scales probed by current and future cosmic surveys, while modeling galaxy formation-related physics down to kiloparsecs, a dynamic range of one part in a million. The codes predict details of the structure and properties of individual galaxies as well as how they interact with other galaxies and with dark matter via gravity. The ExaSky team updated these codes to make the best use of exascale computers' capabilities, rewriting the solvers and adding new physical processes, an effort that took seven years and many contributors. One of the codes, Nyx, was initially written for CPUs, and the team had to rewrite much of it to run on the GPUs that power exascale machines. The other main code, HACC, was already written for GPUs but was re-optimized and made much more powerful through the addition of several astrophysical modeling capabilities. Although HACC and Nyx use different methods, the two ExaSky codes have been shown to produce simulation results that agree at the 1% level, a major improvement over the previous state of the art.
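A simple way to picture this kind of code-to-code verification is to compare the matter power spectra measured from matched runs of the two codes. The sketch below assumes each code has written its spectrum to a two-column text file (the file names are hypothetical); it illustrates the general idea, not the project's actual verification pipeline.

```python
import numpy as np

# Hypothetical outputs: two-column files of wavenumber k and power P(k),
# measured from matched HACC and Nyx runs (file names are assumptions).
k_a, pk_a = np.loadtxt("hacc_pk.txt", unpack=True)
k_b, pk_b = np.loadtxt("nyx_pk.txt", unpack=True)

assert np.allclose(k_a, k_b), "spectra must be sampled at the same k values"

# Fractional difference between the two codes at each scale.
frac_diff = np.abs(pk_a / pk_b - 1.0)

tolerance = 0.01  # the 1% agreement level quoted in the text
ok = frac_diff < tolerance
print(f"scales within 1%: {ok.sum()} of {ok.size}")
print(f"worst disagreement: {frac_diff.max():.3%} at k = {k_a[frac_diff.argmax()]:.3f}")
```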
The ExaSky simulation program is now shedding light on some of the biggest questions in cosmology today: the mystery of dark matter and dark energy; the nature of the primordial fluctuations, that is, how the universe evolved from a relatively smooth, almost wrinkle-free state into the wiggles and bumps that gave rise to galaxies; and the determination of the mass of the mysterious neutrinos.
Cosmological simulations can represent a huge range of scales, from the size of the smallest galaxies out to distances of less than a fifth of the way to the edge of the observable universe. They also allow scientists to model what happens inside the central cores of energetic galaxies, where radiation is detected that does not come from stars. If it is the work of supermassive black holes, what are the processes that form them? Supercomputer models such as the ExaSky simulations can help scientists figure that out.
Analysis of all these data can be used to interpret sky survey observations made by both terrestrial and space-borne telescopes. The needed simulations of the past, present, and perhaps future of the universe are not possible without the power of exascale computing. The ExaSky project codes deliver levels of performance an order of magnitude greater than previous efforts; for HACC, the improvement is a factor of 270 over the code's performance when the project started. Coupled with next-generation sky surveys, the ExaSky simulations have already improved, and will continue to improve, scientists' understanding of the large-scale physical processes that drive the evolution of structure in the universe.
A new generation of sky surveys will provide key insights into questions raised by the current cosmological paradigm and enable new classes of measurements, such as those of neutrino masses. They could lead to exciting new results, including the discovery of primordial gravitational waves and modifications of general relativity. Existing supercomputers do not have the performance or memory needed to run the next-generation simulations required to meet the challenge posed by future surveys, whose timelines parallel those of the Exascale Computing Project. The ExaSky project extends the capabilities of the HACC and Nyx cosmological simulation codes to efficiently use exascale resources as they become available. The Eulerian AMR code Nyx complements the Lagrangian nature of HACC. The two codes are being used to develop a joint program for the verification of gravitational evolution, gas dynamics, and subgrid models in cosmological simulations run at very high dynamic range.
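Because HACC tracks particles while Nyx evolves fields on an adaptive mesh, comparing the two typically means depositing particle data onto a common grid first. The sketch below shows cloud-in-cell (CIC) deposition, a standard technique for this step; it illustrates the general approach and is not code from either project.

```python
import numpy as np

def cic_deposit(positions, box_size, n_grid):
    """Deposit unit-mass particles onto a density grid with cloud-in-cell
    weighting, spreading each particle over its 8 nearest cells."""
    rho = np.zeros((n_grid,) * 3)
    cell = positions / box_size * n_grid      # position in grid units
    i0 = np.floor(cell - 0.5).astype(int)     # lower neighbor cell index
    frac = cell - 0.5 - i0                    # fractional offset in [0, 1)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.where(dx, frac[:, 0], 1 - frac[:, 0])
                     * np.where(dy, frac[:, 1], 1 - frac[:, 1])
                     * np.where(dz, frac[:, 2], 1 - frac[:, 2]))
                idx = (i0 + [dx, dy, dz]) % n_grid  # periodic box
                np.add.at(rho, (idx[:, 0], idx[:, 1], idx[:, 2]), w)
    return rho / rho.mean() - 1.0  # density contrast delta

# Toy usage: 10,000 random particles in a 100 Mpc box on a 64^3 grid.
rng = np.random.default_rng(0)
delta = cic_deposit(rng.uniform(0, 100, (10_000, 3)), 100.0, 64)
print(delta.shape, delta.mean())
```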
To establish accuracy baselines, statistical and systematic error requirements are imposed on many cosmological summary statistics. The accuracy requirements are typically scale-dependent: large spatial scales are subject to finite-size effects, while small scales face several more significant problems, such as particle shot noise and code evolution errors, including subgrid modeling biases. Strict accuracy requirements have already been set by the observational requirements of US Department of Energy-supported surveys, such as Cosmic Microwave Background-Stage 4 (CMB-S4), the Dark Energy Spectroscopic Instrument (DESI), and the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST); these are typically sub-percent (statistical) over the range of well-observed spatial scales. Systematic errors must be characterized and, where possible, controlled to the percent level or better. The final challenge problem runs will be carried out with a new set of subgrid models, currently under active development, for gas cooling, UV heating, star formation, and supernova and active galactic nucleus feedback.
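As a concrete example of one small-scale error source, particle shot noise adds a constant term P_shot = V/N to a measured power spectrum. The sketch below estimates how large that term is relative to an assumed clustering amplitude; the box size, particle counts, and signal value are illustrative numbers chosen for the example, not project specifications.

```python
# Illustrative shot-noise arithmetic; every number here is an assumption
# chosen for the example, not an ExaSky specification.
box_size = 1000.0   # box side in Mpc
p_signal = 20.0     # assumed nonlinear P(k) in Mpc^3 at a small spatial scale

for n_side in (1024, 2048, 4096):
    n_particles = n_side ** 3
    p_shot = box_size ** 3 / n_particles   # constant shot-noise power, V / N
    print(f"{n_side}^3 particles: P_shot = {p_shot:.3f} Mpc^3 "
          f"({p_shot / p_signal:.2%} of the assumed signal)")
# Each doubling of n_side cuts the shot-noise floor by 8x, which is one
# reason small-scale accuracy targets translate into huge particle counts.
```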
The simulation sizes are set by the scales of the cosmological surveys. The challenge problem simulations must cover boxes of linear size up to several gigaparsecs, with galaxy formation-related physics modeled down to roughly 0.1 kiloparsecs, a dynamic range of one part in 10 million that improves on the current state of the art by an order of magnitude. Multiple box sizes will be run to cover the range of scales that must be robustly predicted. The mass resolution of the simulations in the smaller boxes will go down to roughly 1 million solar masses for the baryon tracer particles and about five times that value for the dark matter particles. The final dynamic range achieved depends on the total memory available on the first-generation exascale systems.
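The quoted mass resolutions translate directly into particle counts and memory. The sketch below works this out for an illustrative box; the cosmological parameters, box size, and bytes-per-particle figure are assumptions for the example, not project numbers.

```python
# Rough particle-count and memory arithmetic for a hydrodynamic box.
# Cosmological parameters, box size, and storage cost are assumptions.
RHO_CRIT = 2.775e11               # critical density in h^2 M_sun / Mpc^3
h = 0.68                          # assumed Hubble parameter
omega_b, omega_cdm = 0.049, 0.26  # assumed baryon / cold dark matter fractions

box_size = 500.0   # illustrative smaller box, in Mpc
m_baryon = 1.0e6   # baryon tracer particle mass, M_sun (from the text)
m_cdm = 5.0e6      # dark matter particle mass, about 5x that (from the text)

volume = box_size ** 3
n_b = omega_b * RHO_CRIT * h**2 * volume / m_baryon
n_cdm = omega_cdm * RHO_CRIT * h**2 * volume / m_cdm

bytes_per_particle = 48  # assumed: single-precision position, velocity, id
total = n_b + n_cdm
print(f"baryon particles:      {n_b:.2e}")
print(f"dark matter particles: {n_cdm:.2e}")
print(f"memory for particles:  {total * bytes_per_particle / 1e15:.3f} PB")
```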
The ExaSky science challenge problem will eventually comprise a small number of very large cosmological simulations run with HACC that simultaneously address many science problems of interest. Setting up the science challenge problem in turn requires multiple simulations: building subgrid models by matching against results from very high-resolution galaxy formation astrophysics codes via a nested-box simulation approach, running a medium-scale set for parameter exploration, and, based on these results, designing and implementing the final large-scale challenge problem runs on exascale platforms.
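The idea behind the nested-box approach can be seen with simple scaling arithmetic: at a fixed particle count, shrinking the box improves the mass resolution as the cube of the size ratio. The sketch below is a schematic of that trade-off with made-up numbers, not the project's actual box hierarchy.

```python
# Schematic of the nested-box trade-off: at a fixed particle count, mass
# resolution improves as the cube of the box-size ratio. All numbers here
# are made-up illustrative values, not the project's actual box hierarchy.
reference_box = 2000.0   # largest box, in Mpc
reference_mass = 1.0e9   # particle mass in that box, M_sun (assumed)

for level, box in enumerate((2000.0, 500.0, 125.0)):
    m_p = reference_mass * (box / reference_box) ** 3
    print(f"level {level}: {box:7.1f} Mpc box -> particle mass {m_p:.2e} M_sun")

# Each 4x reduction in box size buys a 64x finer mass resolution, letting the
# smallest boxes reach the regime resolved by galaxy formation codes, whose
# results are then distilled into subgrid models for the large boxes.
```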
Project simulations are classified into three categories: (1) large-volume gravity-only simulations with high mass and force resolution; (2) large-volume hydrodynamic simulations with high mass and force resolution, including detailed subgrid modeling; and (3) small-volume hydrodynamic simulations with very high mass resolution and medium-to-high force resolution, including subgrid modeling.
The first simulation set is targeted at observations of luminous red galaxies, emission-line galaxies, and quasars. These simulations are relevant to DESI, NASA's SPHEREx mission, end-to-end simulations for LSST, and modeling of the cosmic infrared background for CMB-S4. The second and main set of simulations will include hydrodynamics and detailed subgrid modeling, with the resolution and physics reach improving over time as more powerful systems arrive. The main probes targeted with these simulations are strong and weak gravitational lensing shear measurements, galaxy clustering, clusters of galaxies, and cross-correlations both internal to this set and with CMB probes, such as CMB lensing and thermal and kinematic Sunyaev-Zel'dovich effect observations. A set of smaller-volume hydrodynamic simulations will be performed in support of the program for convergence testing and verification and to develop and test a new generation of subgrid models based on results from high-resolution, small-effective-volume galaxy formation studies performed by other groups.