The Linac Coherent Light Source (LCLS) at SLAC uses x-ray diffraction to image individual atoms and molecules and to observe fundamental material processes. Interpreting the molecular structure revealed by x-ray diffraction in near real time will require computation at an unprecedented scale, coupled with a data path of unprecedented bandwidth. Detector data rates at light sources are advancing exponentially, and with the LCLS-II-HE upgrade, LCLS will increase its data throughput by three orders of magnitude by 2025. The objective of the ExaFEL project is to leverage exascale computing to reduce, from weeks to minutes, the time needed to analyze the molecular structure x-ray diffraction data generated by LCLS.
LCLS users require an integrated combination of data processing and scientific interpretation, both of which demand intensive computational analysis. The ultrafast x-ray pulses are used like flashes from a high-speed strobe light to produce “stop-action movies” of atoms and molecules. Data analysis must be performed quickly so that users can iterate on their experiments and extract the most value from scarce beam time. Enabling new photon science at LCLS will require near-real-time analysis (~10 min) of data bursts, which in turn demands commensurate bursts of exascale-class computation.
The high repetition rate and ultrahigh brightness of the LCLS make it possible to determine the structure of individual molecules, mapping out their natural variation in conformation and flexibility. Structural dynamics and heterogeneities, such as changes in the size and shape of nanoparticles or conformational flexibility in macromolecules, are the basis for understanding, predicting, and eventually engineering functional properties in the biological, material, and energy sciences. The ability to image these structural dynamics and heterogeneities by using noncrystalline-based diffractive imaging, including single-particle imaging (SPI) and fluctuation x-ray scattering, has been one of the driving forces behind the development of x-ray free-electron lasers (XFELs). However, diffractive imaging poses vital computational challenges: efficient data processing, the classification of diffraction patterns into conformational states, and the subsequent reconstruction of a series of 3D electron densities that show how the structure is changing.
The ExaFEL challenge problem is the creation of an exascale-based data analysis workflow for serial femtosecond crystallography. Here, the molecular structure is determined by merging the x-ray diffraction patterns from millions to billions of protein crystals exposed in random orientations. XFELs are uniquely suited for studying enzymatic reactions that involve biomolecules because the diffraction pattern is produced before the molecular structure is damaged by radiation. Fast reaction triggering with optical lasers or rapid mixing allows time progression to be observed, providing a vastly improved understanding of the reaction chemistry.
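The merging step described above can be illustrated with a deliberately simplified sketch. It assumes each diffraction pattern has already been indexed, so every measured intensity carries a Miller index (h, k, l), and it merges repeated measurements by plain averaging. The function name `merge_intensities` and the sample values are hypothetical, and real serial crystallography pipelines also apply per-crystal scaling, partiality corrections, and symmetry reduction, none of which is modeled here.

```python
from collections import defaultdict
from statistics import mean


def merge_intensities(observations):
    """Average repeated intensity measurements that share a Miller index.

    `observations` is an iterable of ((h, k, l), intensity) pairs, one per
    measured reflection, pooled over all crystals. This is a toy merge:
    no scaling, partiality, or symmetry handling (assumption for brevity).
    """
    grouped = defaultdict(list)
    for hkl, intensity in observations:
        grouped[hkl].append(intensity)
    return {hkl: mean(values) for hkl, values in grouped.items()}


# Hypothetical measurements pooled from several randomly oriented crystals.
observations = [
    ((1, 0, 0), 10.0),
    ((1, 0, 0), 14.0),
    ((0, 1, 1), 3.0),
    ((1, 0, 0), 12.0),
    ((0, 1, 1), 5.0),
]

merged = merge_intensities(observations)
for hkl, intensity in sorted(merged.items()):
    print(hkl, intensity)
```

Because each crystal contributes only a partial, noisy sample of the reflections, the averaged estimate for a given index improves as more crystals are merged, which is one reason such large numbers of patterns are collected.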
Exascale computing serves two roles in this regard. First, it will allow the diffraction pattern to be modeled in much greater detail, yielding atomic resolution fine enough to follow the path of single atoms reacting within a large molecular complex. Second, by streaming the experimental data to a supercomputing facility in real time, diffraction quality can be assessed in a matter of minutes. Such feedback into experimental decisions at the x-ray facility is critical because the x-ray beam and the biological sample are both scarce and valuable. New GPU-accelerated software (nanoBragg) was developed to simulate diffraction patterns based on a physical model, and the remaining challenge is to solve the inverse problem of adjusting the physical model so that it closely predicts the observed data.
The ExaFEL stretch goal is to create an automated analysis pipeline for imaging single particles via diffractive imaging. This entails reconstructing a 3D molecular structure from 2D diffraction images by using the new multitiered iterative phasing (M-TIP) algorithm. In SPI, diffraction images collected from individual particles are used to determine molecular or atomic structure, even from multiple conformational states, or from nonidentical particles, under operating conditions.
Determining structures from SPI experiments is challenging because the orientations and states of the imaged particles are unknown, and the images are highly contaminated with noise. Furthermore, the number of useful images is often limited by achievable single-particle hit rates, currently between 1% and 10% of the machine rate. The M-TIP algorithm introduces an iterative projection framework that simultaneously determines orientations, states, and molecular structure from limited single-particle data by leveraging structural constraints throughout the reconstruction. This offers a potential pathway to increasing the amount of information that can be extracted from single-particle diffraction.
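To make the effect of the hit rate concrete, the following back-of-the-envelope sketch estimates how long it takes to accumulate a given number of single-particle hits. The 5 kHz machine rate is borrowed from the data collection rate quoted elsewhere in this overview and is used here as an assumption; the target of 100,000 hits and the function names are purely hypothetical.

```python
def hits_per_second(machine_rate_hz, hit_rate):
    """Useful single-particle images per second at a given hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be a fraction between 0 and 1")
    return machine_rate_hz * hit_rate


def seconds_to_collect(target_hits, machine_rate_hz, hit_rate):
    """Beam time needed to accumulate `target_hits` useful images."""
    rate = hits_per_second(machine_rate_hz, hit_rate)
    if rate == 0:
        raise ValueError("no hits are collected at a zero hit rate")
    return target_hits / rate


MACHINE_RATE_HZ = 5_000  # assumption: pulse rate equal to the quoted 5 kHz
TARGET_HITS = 100_000    # hypothetical number of useful images wanted

for hit_rate in (0.01, 0.10):  # the 1% to 10% range quoted above
    seconds = seconds_to_collect(TARGET_HITS, MACHINE_RATE_HZ, hit_rate)
    print(f"hit rate {hit_rate:.0%}: about {seconds / 60:.1f} min of beam time")
```

The collection time is inversely proportional to the hit rate, so moving from the low end to the high end of the quoted range cuts the required beam time tenfold.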
Rapid feedback is crucial for tuning sample concentrations to achieve a sufficient single-particle hit rate, ensuring that adequate data are collected and available to steer the experiment. Exascale computing resources, together with an HPC workflow that can process incremental bursts of data, will allow data to be analyzed on the fly, providing immediate feedback on the quality of the experimental data while simultaneously determining the 3D structure of the sample.
To show the scalability of the analysis pipeline, the ExaFEL team is progressively increasing the fraction of the machine used for reconstruction while keeping constant the total number of diffraction images distributed across the nodes (a strong-scaling test). The goal is to distribute the images over an increasing number of nodes, reducing the overall reconstruction time until the analysis can keep up with data collection rates (5 kHz).
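The scaling study described above can be summarized with two simple quantities: the speedup relative to the smallest run and the parallel efficiency. The sketch below computes both from a table of wall-clock times, and also estimates how many nodes would be needed to match the 5 kHz collection rate if throughput scaled perfectly linearly with node count. All timings and the per-node throughput are hypothetical placeholders, not ExaFEL measurements, and the function names are illustrative.

```python
import math


def strong_scaling_table(timings):
    """Return (nodes, speedup, efficiency) rows for a fixed-size problem.

    `timings` maps node count to wall time in seconds for reconstructing
    the same fixed set of images. Speedup and efficiency are measured
    relative to the smallest node count in the table.
    """
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    rows = []
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows


def nodes_to_keep_up(collection_rate_hz, images_per_node_per_s):
    """Nodes needed to match the collection rate, assuming ideal scaling."""
    return math.ceil(collection_rate_hz / images_per_node_per_s)


# Hypothetical wall times (seconds) for the same image set.
timings = {64: 1024.0, 128: 512.0, 256: 320.0}

for nodes, speedup, efficiency in strong_scaling_table(timings):
    print(f"{nodes:4d} nodes: speedup {speedup:.2f}, efficiency {efficiency:.0%}")

# Hypothetical per-node throughput of 12.5 images per second.
print(nodes_to_keep_up(5_000, 12.5), "nodes to keep up with 5 kHz")
```

An efficiency that stays close to 100% as nodes are added indicates that the fixed image set is being divided without significant communication or load-imbalance overhead; a falling efficiency marks the point at which adding nodes no longer shortens the reconstruction proportionally.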