ExaFEL and CoPA: Rapid Imaging of Molecular Systems

The Linac Coherent Light Source (LCLS) at the SLAC National Accelerator Laboratory is the world’s first hard X-ray free electron laser (XFEL) facility, using X-rays to take snapshots of atoms and molecules at a specific moment in time.

When XFEL beams scatter off a target, they produce a diffraction pattern (a pattern of light of varying intensities that can be used to study the target) before the radiation damages the target’s molecular structure. This makes XFELs uniquely suited to studying biological molecules, since an image of a molecule’s diffraction pattern can be captured before the molecule is destroyed. LCLS can therefore help scientists understand how atoms interact and move in everything from photosynthesis to the formation of chemical bonds.

A recent upgrade to LCLS, dubbed LCLS-II, enables the brightest X-ray images in the world, using a million X-ray pulses per second, each lasting quadrillionths of a second. This is a roughly 8,000-fold increase from LCLS’s 120 pulses per second, and the ultrahigh repetition rate and brightness give scientists enough resolving power to study the structure of individual molecules and the natural variations among them. With more rapid shots, more pixels per detector, and improvements in X-ray quality, LCLS-II collects significantly more data than its predecessor, which creates a need to accelerate the analysis process.
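As a quick check on the figures above, the ratio of the two pulse rates can be worked out directly. This is back-of-the-envelope arithmetic using only the two numbers quoted in the text; the variable names are illustrative, and the pulse-spacing lines assume evenly spaced pulses, which the article does not state.

```python
# Pulse rates quoted in the article (pulses per second).
lcls_rate = 120
lcls2_rate = 1_000_000

# How many times more pulses LCLS-II delivers each second.
fold_increase = lcls2_rate / lcls_rate

# Time between consecutive pulses in microseconds
# (assumption: pulses are evenly spaced).
lcls_gap_us = 1e6 / lcls_rate
lcls2_gap_us = 1e6 / lcls2_rate

print(f"Increase: about {fold_increase:,.0f}-fold")
print(f"Pulse spacing: {lcls_gap_us:,.0f} us (LCLS) vs {lcls2_gap_us:.0f} us (LCLS-II)")
```

The ratio comes to a little over 8,300, consistent with the rounded 8,000-fold figure.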

“All of this means that in the future, what used to be weeks [for data turnaround] could be years, unless we invest into innovative approaches to data analysis,” says Johannes Blaschke, an application performance scientist at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory. “It’s becoming necessary for certain experiments to require real-time data processing, because it will be impossible to make any decisions on a small subset of data, or to save all this data for post-processing.” This rapid feedback capability could inform on-the-fly adjustments to experiments, better utilizing LCLS resources.

High-performance computing (HPC) can help. Exascale workflows are able to process bursts of data in short periods of time, enabling efficient data processing at granular atomic resolution. The Exascale Computing Project’s (ECP’s) ExaFEL utilizes exascale computers to rapidly analyze X-ray diffraction data at SLAC, cutting analysis down from weeks to minutes, while keeping up with future data collection rates.

Collecting Molecular Images

To prevent computational data throughput from becoming the bottleneck in achieving useful experimental data, ExaFEL must deliver results as quickly as possible.

“We’re kind of used to finding workarounds to data transfer problems,” says Blaschke, the principal NERSC liaison to ExaFEL. A prime example is the Event Horizon Telescope’s first image of Sagittarius A*, the supermassive black hole at the center of the Milky Way, which was produced from observations made in 2017. Astronomers flew hard drives on commercial airplanes so that terabytes of data collected by various telescopes could be combined into the image. “These workarounds will not be tenable in the future as detectors and accelerators are getting more powerful.”

In stark contrast to cargo transport, ExaFEL—which provides scientists with data processing code for X-ray nanocrystallography and single-particle imaging—has a data turnaround time of mere minutes. These near-real-time results are important in assuring researchers that their experiment is not going awry, as LCLS operating time and the targets used are both costly.

Because the LCLS data rate is so high, only HPC systems can handle the vast amount of image processing and the complex algorithms required. ExaFEL can also help researchers make stop-motion molecular “movies” by stitching together snapshots of different stages of a molecular interaction at LCLS, a computationally challenging and extremely data-intensive process. ExaFEL’s enhanced workflow makes this possible in as little as ten minutes.

“I like to imagine this challenge has a certain opportunity as well,” says Blaschke. “A lot of analysis techniques were not considered in the past because they would require a stupendous amount of data, or large amount of compute. Having this available now means we can finally also start to ask scientific questions we might not have asked in the past, because it would not have been practical.”

All of this would be unimaginable in a pre-HPC environment. Blaschke describes ExaFEL as a particularly intersectional challenge, and ECP brought together diverse scientists—beamline scientists, X-ray scientists, crystallographers, computing experts, and more—to come up with solutions to complicated scientific problems for everyone’s benefit.

“In addition to computing scientists and domain scientists, we need mathematicians. If you’re going to develop a data analysis tool, it’s good to have a mathematician,” Blaschke says. “That’s why I really love working on problems like this—it is taking the state-of-the-art from one area, and helping to make the state-of-the-art in another area better.”

One area where mathematicians were instrumental to ExaFEL was fast Fourier transforms (FFTs).

A Computing Lens for Diffraction

A Fourier transform is a mathematical tool that breaks a function down into its constituent frequencies, similar to breaking a musical chord down into its individual notes and their intensities. In signal processing, Fourier transforms are one step in reconstructing the structure of a target from its X-ray diffraction pattern, much like simulating a lens that brings the diffracted light back together to produce the original image. To achieve this key step, ExaFEL depends on another ECP product, FFTX.
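The chord analogy can be made concrete with a short sketch. It uses a textbook recursive radix-2 FFT written in plain Python, not FFTX or any ExaFEL code, and the three-tone "chord" (frequencies, amplitudes, and sample count) is a made-up example chosen so that each tone falls exactly on one frequency bin.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

# Hypothetical "chord": three tones, given as frequency in Hz -> amplitude.
chord = {4: 1.0, 5: 0.5, 6: 0.25}

# One second of signal sampled at 64 Hz, so bin k corresponds to k Hz.
n_samples = 64
signal = [
    sum(amp * math.sin(2 * math.pi * freq * i / n_samples)
        for freq, amp in chord.items())
    for i in range(n_samples)
]

spectrum = fft(signal)

# Keep the positive-frequency bins that carry noticeable energy and
# rescale their magnitudes back to the amplitude of each tone.
notes = {}
for k in range(1, n_samples // 2):
    amplitude = 2 * abs(spectrum[k]) / n_samples
    if amplitude > 1e-6:
        notes[k] = round(amplitude, 6)

print(notes)
```

The transform picks the three "notes" back out of the mixed signal along with their relative loudness, which is the same kind of decomposition the article describes, here on a toy one-dimensional signal rather than a diffraction image.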

FFTX is one of the three main libraries and applications in ECP’s Co-design Center for Particle Applications (CoPA). Along with FFTX, CoPA’s products include the Cabana Particle Library and the PROGRESS/BML libraries for electronic structure solvers and quantum molecular dynamics algorithms. Together they offer a suite of particle application capabilities for molecular dynamics, fusion simulations, and more.

Standard FFT libraries cannot be easily scaled up to exascale hardware systems. FFTX not only meets this challenge, but also offers additional optimizations. While traditional vendor FFT libraries are often a “black box”—that is, they complete their task without any transparency of their inner workings—FFTX allows users to optimize the entire process that calls FFTs. Rather than forcing scientists to sequentially apply and optimize a set of operations, FFTX uses a code generation system to combine the FFT, any necessary linear operations, and an inverse FFT all into one process.
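The pattern described here (a forward FFT, a linear operation in frequency space, then an inverse FFT) can be sketched in a few lines. The example below is not FFTX and does not use its API: it runs the three stages one after another in plain Python, which is exactly the unfused sequence that FFTX's code generation is said to merge into a single optimized process. The function names, the signal, and the smoothing kernel are all illustrative assumptions.

```python
import cmath

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def ifft(spectrum):
    """Inverse FFT via the conjugation identity: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    conjugated = [value.conjugate() for value in spectrum]
    return [value.conjugate() / n for value in fft(conjugated)]

def fft_pipeline(signal, kernel):
    """Stage 1: forward FFT. Stage 2: pointwise multiply (a linear operator).
    Stage 3: inverse FFT. Together this is a circular convolution."""
    spectrum = fft(signal)
    kernel_spectrum = fft(kernel)
    filtered = [s * k for s, k in zip(spectrum, kernel_spectrum)]
    return [value.real for value in ifft(filtered)]

def direct_circular_convolution(signal, kernel):
    """Reference O(n^2) result computed without any FFT, for comparison."""
    n = len(signal)
    return [
        sum(signal[j] * kernel[(i - j) % n] for j in range(n))
        for i in range(n)
    ]

# Illustrative data: a short ramp smoothed by a two-point averaging kernel.
signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
kernel = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

fast = fft_pipeline(signal, kernel)
slow = direct_circular_convolution(signal, kernel)

# The two approaches should agree to within floating-point rounding.
max_error = max(abs(a - b) for a, b in zip(fast, slow))
print(f"largest difference: {max_error:.2e}")
```

In this unfused form, each stage writes a full intermediate array before the next stage reads it. Treating the three stages as one unit, as the article says FFTX does, gives an optimizer the chance to avoid that intermediate traffic.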

“The big advantage [of FFTX] is the integrated applications, where you’re combining FFTs with linear operators and optimizing that as a whole, reducing the communication with the computer and speeding it up,” says Peter McCorquodale, a computational scientist at Lawrence Berkeley National Laboratory, who leads the FFTX project. According to McCorquodale, this merging of capabilities speeds up the process by about a factor of four.

For scientists who only need to calculate an FFT, FFTX is plug-and-play. But running the full optimized pipeline of an FFT combined with linear operations requires some integration support from the FFTX team, which the ExaFEL project took advantage of.

In particular, the FFTX team did specific work to make the library usable for ExaFEL. Although the standard FFTX interface is written for C++, ExaFEL uses Python, so the team developed a custom Python interface for it. Other FFTX integrations include the ECP projects WarpX and NWChemEx, which address plasma accelerator and biofuel problems, respectively.

“These libraries are good not just for the ECP project, but they’re good for other projects—ECP or not—for the future,” says Susan Mniszewski, the principal investigator for CoPA.

Looking Ahead at Exascale Processing

ExaFEL is designed with scalability and portability in mind, making it adaptable to future HPC systems beyond ECP and beyond the exascale era of supercomputing.

With the ongoing increase in scientific data and imaging resolution, resources like ExaFEL and FFTX will continue to extend the types of systems scientists can image and study.


Figure 1: A schematic of ExaFEL’s workflow, demonstrating how it reconstructs the structure of a target from its diffraction pattern.