The Exascale Computing Project has concluded. This site is retained for historical reference.

ExaFEL

The SLAC Linac Coherent Light Source (LCLS) facility uses x-ray diffraction to image individual atoms and molecules to observe fundamental material processes. The near-real-time interpretation of molecular structure revealed by x-ray diffraction will require computational intensities of unprecedented scales coupled with a data path of unprecedented bandwidth. Detector data rates at light sources are advancing exponentially, and with the LCLS-II-HE upgrade, LCLS will increase its data throughput by three orders of magnitude by 2025. The objective of the ExaFEL project is to leverage exascale computing to reduce, from weeks to minutes, the time needed to analyze molecular structure x-ray diffraction data generated by LCLS.

Summary

Light source experiments using x-ray free-electron lasers (XFEL) allow scientists to observe the structure and dynamics of individual atoms and molecules in high resolution. Researchers use the results from these experiments to predict how various biomolecules and engineered materials interact and, ultimately, to create new technologies in the biological, materials, and energy sciences. However, light source experiments can be difficult to conduct due to high construction and operating costs for facilities, limited operating time on XFEL machines, and difficulty processing the large amount of data that these experiments generate. High-performance computing can significantly improve research efficiency at light source facilities by enabling rapid data analysis and enhanced resolution in diffraction pattern models. These improvements will help researchers more thoroughly understand fundamental chemical processes such as bonding and catalysis and help them apply this information for technological developments throughout the sciences.

The Exascale Computing Project’s ExaFEL application is built to enable near-real-time analysis of data from XFEL experiments to maximize the scientific output at billion-dollar facilities such as the SLAC National Accelerator Laboratory’s Linac Coherent Light Source (LCLS). ExaFEL supports efficient methods to rapidly identify molecular structures and reconstruct them in 3D so researchers can visualize how a structure is changing over time. These capabilities reveal structures and molecular interactions with atomic-level detail and enable real-time feedback and control over experiments.

LCLS users require an integrated combination of data processing and scientific interpretation, both of which demand intensive computational analysis. Traditional analysis methods typically take weeks to process data from light source experiments, thereby limiting researchers’ ability to iterate their experiments and extract the most value from scarce beam time. Furthermore, data throughput at light source facilities is increasing rapidly as these facilities become more sophisticated, exacerbating the need for improved analysis capabilities. For example, new additions to the SLAC LCLS facility will increase its data throughput by three orders of magnitude by 2025, rendering current analysis methods untenable.

The ExaFEL team has addressed this urgent computational need by creating an exascale-based data analysis workflow that reduces analysis times from weeks to minutes. The team designed new GPU-accelerated reconstruction algorithms that have improved calculation speeds by 1,000× since 2016 while also improving image fidelity. Furthermore, the ExaFEL application is designed to scale with the increasing computational demands of next-generation light source facilities, ensuring rapid data analysis and improved outcomes in future light source experiments.

ExaFEL enables researchers to modulate their experimental parameters during runs on advanced light source machines. This new capability will improve research efficiency at light source facilities and will accelerate scientific progress as a result. Using ExaFEL, researchers at facilities such as the upgraded LCLS will gain unprecedented insight into key unknowns such as the real-time functionality of biomolecules, fundamental interactions in quantum and nanoscale material dynamics, catalysis and photocatalysis for new chemical transformation and solar energy conversion processes, and beyond.

Technical Discussion

LCLS users require an integrated combination of data processing and scientific interpretation in which both aspects demand intensive computational analysis. The ultrafast x-ray pulses are used like flashes from a high-speed strobe light to produce “stop-action movies” of atoms and molecules. Data analysis must be performed quickly to allow users to iterate their experiments and extract the most value from scarce beam time. Enabling new photon science from the LCLS will require the near-real-time analysis (~10 min) of data bursts, requiring commensurate bursts of exascale-class computational intensities.

The high repetition rate and ultrahigh brightness of the LCLS make it possible to determine the structure of individual molecules, mapping out their natural variation in conformation and flexibility. Structural dynamics and heterogeneities, such as changes in the size and shape of nanoparticles or conformational flexibility in macromolecules, are the basis for understanding, predicting, and eventually engineering functional properties in the biological, material, and energy sciences. The ability to image these structural dynamics and heterogeneities by using noncrystalline-based diffractive imaging, including single-particle imaging (SPI) and fluctuation x-ray scattering, has been one of the driving forces behind the development of x-ray free-electron lasers (XFEL). However, diffractive imaging poses vital computational challenges: efficient data processing, the classification of diffraction patterns into conformational states, and the subsequent reconstruction of a series of 3D electron densities that allow researchers to visualize how the structure is changing.

The ExaFEL challenge problem is the creation of an exascale-based data analysis workflow for serial femtosecond crystallography. Here, the molecular structure is determined by merging the x-ray diffraction patterns from millions to billions of protein crystals exposed in random orientations. XFELs are uniquely suited for studying enzymatic reactions that involve biomolecules because the diffraction pattern is produced before the molecular structure is damaged by radiation. Fast reaction triggering with optical lasers or rapid mixing allows time progression to be observed, providing a vastly improved understanding of the reaction chemistry.
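As a rough illustration of what merging means here, the sketch below averages repeated intensity observations of the same reflection across many crystals. It is a deliberately simplified stand-in, not the ExaFEL pipeline: the function name `merge_reflections`, the `(hkl, intensity)` input format, and the sample values are all assumptions made for this example, and real merging also involves indexing, scaling, and partiality corrections that are omitted.

```python
from collections import defaultdict


def merge_reflections(observations):
    """Average repeated intensity observations of each reflection.

    `observations` is an iterable of ((h, k, l), intensity) pairs, one pair
    per reflection measured on one crystal. This format is hypothetical.
    Returns {hkl: (mean_intensity, number_of_observations)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hkl, intensity in observations:
        sums[hkl] += intensity
        counts[hkl] += 1
    return {hkl: (sums[hkl] / counts[hkl], counts[hkl]) for hkl in sums}


# Made-up observations from three crystals in different orientations.
observations = [
    ((1, 0, 0), 10.0),
    ((1, 0, 0), 14.0),
    ((0, 2, 1), 5.0),
]
merged = merge_reflections(observations)
for hkl, (mean_intensity, n_obs) in sorted(merged.items()):
    print(hkl, mean_intensity, n_obs)
```

Because each crystal contributes only a partial, noisy measurement, the averaged estimate for a reflection generally improves as more crystals are merged, which is one reason the number of patterns runs into the millions or billions.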

Exascale computing serves two roles in this regard. First, it will allow the diffraction pattern to be modeled in greatly enhanced detail, yielding atomic resolution fine enough to follow the path of single atoms reacting within a large molecular complex. Second, by streaming the experimental data to a supercomputing facility in real time, diffraction quality can be assessed in a matter of minutes. Such feedback into experimental decisions at the x-ray facility is critical because the x-ray beam and the biological sample are both limited and valuable resources. New GPU-accelerated software (nanoBragg) was developed to simulate diffraction patterns based on a physical model, and the remaining challenge will be to solve the inverse problem of adjusting the physical model to closely predict the observed data.

The ExaFEL stretch goal is to create an automated analysis pipeline for imaging single particles via diffractive imaging. This entails reconstructing a 3D molecular structure from 2D diffraction images by using the new multitiered iterative phasing (M-TIP) algorithm. In SPI, diffraction images collected from individual particles are used to determine molecular or atomic structure, even from multiple conformational states, or from nonidentical particles, under operating conditions.

Determining structures from SPI experiments is challenging because orientations and states of imaged particles are unknown, and images are highly contaminated with noise. Furthermore, the number of useful images is often limited by achievable single-particle hit rates, currently between 1 and 10% of the machine rate. The M-TIP algorithm introduces an iterative projection framework to simultaneously determine orientations, states, and molecular structure from limited single-particle data by leveraging structural constraints throughout the reconstruction, offering a potential pathway to increasing the amount of information that can be extracted from single-particle diffraction.
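To make the idea of an iterative projection framework concrete, the sketch below implements the classic error-reduction phase-retrieval loop in NumPy. This is not M-TIP: it assumes a single 2D image with known orientation, a known support region, and noise-free Fourier magnitudes, whereas M-TIP also has to determine orientations and states. The function name `error_reduction` and the synthetic test object are assumptions made for this example.

```python
import numpy as np


def error_reduction(magnitudes, support, n_iter=200, seed=0):
    """Alternate projections between two constraint sets.

    Fourier-space constraint: the estimate's Fourier magnitudes must match
    the measured ones. Real-space constraint: the estimate must be real,
    non-negative, and zero outside `support`.
    Returns the final estimate and the Fourier-magnitude error per iteration.
    """
    rng = np.random.default_rng(seed)
    # Start from a random guess that already satisfies the real-space constraint.
    estimate = rng.random(magnitudes.shape) * support
    errors = []
    for _ in range(n_iter):
        spectrum = np.fft.fftn(estimate)
        errors.append(float(np.linalg.norm(np.abs(spectrum) - magnitudes)))
        # Projection 1: keep the current phases, impose the measured magnitudes.
        spectrum = magnitudes * np.exp(1j * np.angle(spectrum))
        # Projection 2: back to real space, impose support and non-negativity.
        estimate = np.fft.ifftn(spectrum).real
        estimate = np.clip(estimate, 0.0, None) * support
    return estimate, errors


# Synthetic, made-up object: a small random patch inside a larger support.
rng = np.random.default_rng(1)
truth = np.zeros((32, 32))
truth[12:20, 10:18] = rng.random((8, 8))
support = np.zeros((32, 32), dtype=bool)
support[8:24, 8:24] = True
magnitudes = np.abs(np.fft.fftn(truth))

reconstruction, errors = error_reduction(magnitudes, support)
print(f"initial error {errors[0]:.3f}, final error {errors[-1]:.3f}")
```

The loop only shows the pattern of repeatedly enforcing what is measured (magnitudes) and what is known a priori (structural constraints). Error reduction can stagnate, and practical reconstructions typically rely on more elaborate update rules.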

Rapid feedback is crucial for tuning sample concentrations to achieve a sufficient single-particle hit rate, ensuring that adequate data are collected and available to steer the experiment. The availability of exascale computing resources and an HPC workflow that can handle incremental bursts of data in the analyses will allow for data analysis on the fly, providing immediate feedback on the quality of the experimental data while determining the 3D structure of the sample simultaneously.
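The effect of hit rate on collection time can be estimated with simple arithmetic. The sketch below assumes that the machine rate equals the 5 kHz data collection rate quoted in this section and that an experiment needs 100,000 single-particle hits; the target count and the function name `seconds_to_collect` are hypothetical values chosen for illustration, not ExaFEL figures.

```python
def seconds_to_collect(target_hits, machine_rate_hz, hit_rate):
    """Time needed to record `target_hits` useful images.

    `hit_rate` is the fraction of pulses that hit a single particle,
    so the useful image rate is machine_rate_hz * hit_rate.
    """
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be in the interval (0, 1]")
    if machine_rate_hz <= 0:
        raise ValueError("machine_rate_hz must be positive")
    return target_hits / (machine_rate_hz * hit_rate)


TARGET_HITS = 100_000      # hypothetical number of useful images required
MACHINE_RATE_HZ = 5_000.0  # assumed equal to the 5 kHz collection rate

for hit_rate in (0.01, 0.10):
    minutes = seconds_to_collect(TARGET_HITS, MACHINE_RATE_HZ, hit_rate) / 60.0
    print(f"hit rate {hit_rate:.0%}: about {minutes:.1f} minutes of beam time")
```

Under these assumptions, moving from the low end to the high end of the 1 to 10% range cuts the required beam time by a factor of ten, which is why feedback that arrives while the sample concentration can still be tuned is valuable.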

To demonstrate the scalability of the analysis pipeline, the ExaFEL team is progressively increasing the fraction of the machine used for reconstruction while keeping the total number of diffraction images constant. The goal is to distribute the same set of images over an increasing number of nodes, reducing the overall reconstruction time until the analysis can keep up with data collection rates (5 kHz).
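Holding the number of images fixed while adding nodes is a strong-scaling study. The sketch below shows how such measurements are commonly summarized as speedup and parallel efficiency, and how to check whether a given reconstruction time keeps pace with a 5 kHz collection rate. The node counts, timings, and image count are invented for illustration and are not ExaFEL results; the function names are likewise hypothetical.

```python
def strong_scaling_table(timings):
    """Summarize strong-scaling runs.

    `timings` maps node count -> wall-clock seconds for the same fixed
    set of images. Returns (nodes, speedup, efficiency) tuples relative
    to the smallest node count.
    """
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    rows = []
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows


def keeps_up(n_images, reconstruction_seconds, collection_rate_hz=5_000.0):
    """True if images are processed at least as fast as they are collected."""
    return n_images / reconstruction_seconds >= collection_rate_hz


# Made-up timings for reconstructing a fixed set of one million images.
N_IMAGES = 1_000_000
timings = {1: 800.0, 2: 410.0, 4: 220.0, 8: 130.0}

for nodes, speedup, efficiency in strong_scaling_table(timings):
    status = "keeps up" if keeps_up(N_IMAGES, timings[nodes]) else "falls behind"
    print(f"{nodes:>2} nodes: speedup {speedup:.2f}, "
          f"efficiency {efficiency:.0%}, {status}")
```

In this hypothetical data set, one million images take 200 s to collect at 5 kHz, so only the runs that finish within 200 s keep up with the experiment.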

Principal Investigator(s)

Amedeo Perazzo, SLAC National Accelerator Laboratory

Collaborators

SLAC National Accelerator Laboratory, Lawrence Berkeley National Laboratory, Los Alamos National Laboratory
