ECP Application Will Deliver Molecular Movies in Minutes

ExaFEL will enable quicker analysis of data from SLAC’s x-ray free-electron laser to help understand everything from viruses to materials.

By Lawrence Bernard

Amedeo Perazzo of the SLAC National Accelerator Laboratory is ECP ExaFEL subproject principal investigator.

Amedeo Perazzo has a guarantee for researchers who want to see a movie of molecular behavior: You can have data in 10 minutes.

Perazzo leads a team at the SLAC National Accelerator Laboratory to analyze the data generated by the Linac Coherent Light Source (LCLS). LCLS—an Office of Science User Facility operated for the US Department of Energy (DOE) by Stanford University—boasts an x-ray free-electron laser that produces 1 million pulses per second, each pulse lasting just quadrillionths of a second for the brightest x-ray images possible.

Using LCLS’s capability to image individual atoms at such a high speed, Perazzo looks forward to accessing data for analysis in a matter of minutes, rather than the several weeks it takes now.

Making complex data available to researchers this quickly is among the promises of exascale computing. Perazzo is principal investigator of a DOE Exascale Computing Project (ECP) subproject called ExaFEL, which provides code for analyzing x-ray laser data and could benefit a wide variety of research areas.

“Exascale computing allows us to process many more events in a much shorter amount of time,” Perazzo said. “We will be able to provide feedback in minutes, and researchers won’t have to wait. If you’re a researcher who needs to see how atoms behave, that’s a big benefit.”

Faster and Better Insight

In addition to doing things faster, exascale computing also allows scientists to “get better insight into data we’re collecting. This leads to better science,” said Perazzo, Controls and Data Systems Division director in the Technology and Innovation Directorate at SLAC. “Fast turnaround is necessary, and exascale will get us there. So exascale gives us more events faster, and better analysis and insight.”

One example of the better insight ExaFEL will provide on exascale computers is a significant improvement in the atomic detail of the reconstruction. ExaFEL achieves this by correcting the diffraction images for the actual beam spectral shape and for the crystal mosaic texture. The same capability makes it possible to determine a molecule’s dynamics in addition to its average electron density.

SLAC’s LCLS generates photons through a series of extremely quick pulses in a process called x-ray lasing. Through another process known as x-ray diffraction, those short pulses allow researchers to take a snapshot of what’s happening before the atoms are destroyed by the x-ray (known as diffraction-before-destruction).

“If it takes weeks to complete the image reconstruction, the researchers are flying blind; they don’t know what their data look like. Now we will be able to show them,” Perazzo said.

Light Source a Powerful Tool for Atomic Imaging

X-ray lasers are powerful tools that provide glimpses into fundamental processes in nature at the atomic level, imaging smaller particles of matter and capturing shorter time scales than any other technique. Researchers across the sciences use them to elucidate how atoms behave and move, relying on what are essentially stop-time movies provided by the x-ray pulses. Similar to the effect a strobe light has on dancers, x-ray lasers can create a moving picture of atoms and molecules.

However, producing such a picture takes an enormous amount of data: millions of gigabytes of disk space. Scientists estimate that the data flow will exceed a trillion bits per second and will require petabytes of online storage, far beyond what is currently available without exascale-level computing. Analyzing such a volume of data in real time is a challenge that exascale computing helps overcome.
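To put those figures in perspective, here is a rough back-of-the-envelope sketch. It assumes a steady data rate of exactly one trillion bits per second (the article says only that the rate will exceed this) and decimal units (1 GB = 10^9 bytes); the variable and function names are illustrative and are not part of ExaFEL.

```python
# Back-of-the-envelope arithmetic for the data rate quoted in the article.
# Assumptions (not from ExaFEL): a steady rate of exactly 1e12 bits/s and
# decimal units (1 GB = 1e9 bytes, 1 TB = 1e12 bytes, 1 PB = 1e15 bytes).

BITS_PER_BYTE = 8
DATA_RATE_BITS_PER_S = 1e12  # "a trillion bits per second"

bytes_per_second = DATA_RATE_BITS_PER_S / BITS_PER_BYTE


def volume_bytes(seconds):
    """Bytes accumulated after `seconds` at the assumed steady rate."""
    return bytes_per_second * seconds


def seconds_to_fill(capacity_bytes):
    """Seconds needed to accumulate `capacity_bytes` at the assumed rate."""
    return capacity_bytes / bytes_per_second


ten_minutes_tb = volume_bytes(10 * 60) / 1e12
hours_per_petabyte = seconds_to_fill(1e15) / 3600

print(f"Rate: {bytes_per_second / 1e9:.0f} GB/s")
print(f"Ten minutes of data: {ten_minutes_tb:.0f} TB")
print(f"Time to fill one petabyte: {hours_per_petabyte:.1f} hours")
```

Under these assumptions the stream amounts to about 125 GB per second, roughly 75 TB in ten minutes, and a full petabyte in a little over two hours, which is why petabytes of online storage come into play.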

The light source essentially replaces a camera lens with a supercomputer. The diffraction data cannot be interpreted until hundreds of thousands of images have been recorded. Only then can the data show what the researchers need for their applications.

“The only way to make a system look like a camera, where you are seeing exactly what is happening, is having a computer reconstruction as fast as possible,” Perazzo said. “The faster we go, the more computing power we need.”

The LCLS is being upgraded in fall 2022 and will operate at 1 MHz, or 1 million pulses per second. This high rate will require an ability to process massive amounts of data quickly, a need that only exascale computing resources can meet.

Technique Could Benefit Many Industries

ExaFEL focuses on two techniques, nanocrystallography and single-particle imaging, which can be used to study chemical reactions, biological processes, the formation of chemical bonds, and other topics in materials research. The algorithms required are so complex and the molecular images so numerous that only high-performance supercomputers can handle them.

But ExaFEL will benefit many other experimental techniques, Perazzo said. “The framework itself, the ability to scale the computation, the automation aspects, what we learn in ExaFEL for ECP, can all be used in other experiments besides the techniques we are focused on,” he said.

Indeed, one key activity is moving data off the network to the supercomputer. Data must move as quickly as possible so that analysis can begin as soon as possible. Materials science, biology—analyzing viruses such as the one that causes COVID-19—chemistry, and other scientific disciplines can benefit from LCLS data using exascale computing.

By imaging single nanoscale particles and revealing the timescales of chemical reactions in real time, x-ray lasers represent a new scientific frontier when coupled with the right computing power. Nanoscale particles have at least one critical dimension less than 100 nanometers and possess unique optical, magnetic, or electrical properties. Atoms, molecules, living cells, and particulates are all examples of matter at the nanoscale.

Single-particle imaging could enable scientists to develop new drugs to fight disease, components for next-generation computers, new damage-resistant aircraft materials, and customizable chemical reactions for clean and renewable sources of energy, to name a few potential benefits of ExaFEL.

Scaling Challenges Met

Some of the challenges the team has surmounted include porting its codes to GPUs, which provide most of the computing power of upcoming exascale machines; scaling these codes to millions of cores; and developing more sophisticated algorithms that handle realistic beam conditions and departures of the material structure from a perfectly regular lattice.

At DOE’s Oak Ridge National Laboratory, Frontier recently became the first supercomputer to breach the exascale barrier, boasting 1.1 exaflops of performance and exceeding the target threshold of a quintillion calculations per second. The system will enable researchers to develop critically needed technologies for the country’s energy, economic, and national security missions, helping address problems of critical importance to the nation that lacked realistic solutions as recently as five years ago.

After earning a PhD in particle physics from the University of Pavia, Italy, Perazzo did postdoctoral research at another US DOE national laboratory, Lawrence Berkeley National Laboratory. He started at SLAC as a software engineer, working on BaBar, ATLAS, and the Gamma-ray Large Area Space Telescope before joining LCLS. He is an avid soccer fan and player. He and his wife have two boys, and the family enjoys the San Francisco Bay area and all that the region offers. “You can go from the ocean to skiing and everything in between all in a short amount of time,” he said. “It’s wonderful.”

Perazzo is excited about the scientific possibilities exascale enables.

“We’re in the right place at the right time,” he said. “Computing culture is critical now, and science will benefit enormously from exascale computing.”

Use of the Linac Coherent Light Source (LCLS), SLAC National Accelerator Laboratory, is supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515.

This research is part of the DOE-led Exascale Computing Initiative (ECI), a partnership between DOE’s Office of Science and the National Nuclear Security Administration. The Exascale Computing Project (ECP), launched in 2016, brings together research, development, and deployment activities as part of a capable exascale computing ecosystem to ensure an enduring exascale computing capability for the nation.