Exascale Day Article: Moving toward advances in brain mapping using exascale supercomputers

Argonne National Laboratory Early Science Program brain mapping research

By Linda Barney

There are many mysteries about the human brain. Creating comprehensive maps that identify the location of every neuron and every connection between neurons would help solve those mysteries.

To develop an accurate map of the connections between neurons that form the brain’s communication and signaling pathways, Dr. Nicola Ferrier, a senior computer scientist at Argonne National Laboratory, is leading a brain mapping project titled “Enabling Connectomics at Exascale to Facilitate Discoveries in Neuroscience.” The resulting map of neurons and their connections is called a connectome. Research on brain mapping and brain signals will provide researchers with important information about brain function, including how the brain ages, and could offer insights into conditions such as autism, diabetes, and stroke. The project uses high-energy X-rays and an electron microscope (EM) to acquire images, and requires state-of-the-art software and supercomputers to analyze all the data generated in the research. Ferrier states, “Our goal is to use the power of the future exascale Aurora supercomputer and neural network segmentation tools, which will allow us to map neurons in larger volumes of brain tissue.”

The project is a collaboration among computer scientists at Argonne National Laboratory, neuroscientists at the Kasthuri Lab at the University of Chicago, researchers at Harvard University and Princeton University, and corporate partners including Google. The research group is supported by the Argonne Leadership Computing Facility’s (ALCF) Aurora Early Science Program (ESP). The team is currently running its analysis codes on various supercomputers at ALCF, but the research will ultimately require the computational power of the future Aurora exascale supercomputer, which is expected to deliver in excess of two exaflops of peak double-precision compute performance. Aurora will be located at the U.S. Department of Energy’s (DOE) Argonne National Laboratory.

The science behind brain mapping

The brain mapping process begins with small samples of brain tissue from a fruit fly or mouse, stained with heavy metals to provide visual contrast. The tissue is cut into extremely thin 40 nm sections and imaged using Argonne’s powerful electron microscope. The microscope generates a collection of smaller images, or tiles, for each section; a single brain sample can yield thousands or tens of thousands of images.
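Some back-of-the-envelope arithmetic shows where those tile counts, and the need for supercomputing, come from. In the sketch below, the 40 nm section thickness is taken from the article, while the lateral pixel size, section area, and tile dimensions are assumed, illustrative values rather than the project’s actual imaging parameters.

```python
# Rough estimate of tile counts and raw data volume for EM connectomics.
pixel_nm = 4                     # assumed lateral resolution (nm/pixel)
section_um = 1000                # assumed section width/height (1 mm)
tile_px = 4096                   # assumed camera tile edge length (pixels)
section_thickness_nm = 40        # from the article

pixels_per_side = section_um * 1000 // pixel_nm       # 250,000 pixels
tiles_per_side = -(-pixels_per_side // tile_px)       # ceiling division
tiles_per_section = tiles_per_side ** 2               # ~3,800 tiles

sections_per_mm = 1_000_000 // section_thickness_nm   # 25,000 sections
bytes_per_section = pixels_per_side ** 2              # 8-bit grayscale

print(f"tiles per section: {tiles_per_section:,}")                # 3,844
print(f"raw data per section: {bytes_per_section / 1e9:.1f} GB")  # 62.5 GB
print(f"raw data per mm of depth: "
      f"{sections_per_mm * bytes_per_section / 1e15:.1f} PB")     # 1.6 PB
```

Even with these modest assumptions, a single millimeter of tissue depth produces petabytes of raw imagery, which is why the analysis pipeline is being built for exascale systems.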

The image tiles must be digitally stitched together to reconstruct each slice; the reconstructed slices are then stacked and aligned to produce a 3D volume. Each neuron must then be traced through the stack of images to reconstruct its structure. The reconstruction steps use a convolutional neural network to create a dataset of the underlying neural structures.
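As a concrete illustration of the alignment step, the sketch below estimates the translational offset between two overlapping images using phase correlation, a standard FFT-based registration technique. This is a minimal, self-contained illustration of the idea rather than the project’s actual alignment code; production EM stitching must also handle rotation, scaling, and nonlinear tissue distortion.

```python
import numpy as np

def phase_correlation_shift(a: np.ndarray, b: np.ndarray):
    """Estimate the integer (row, col) shift of image b relative to image a.

    Phase correlation: the normalized cross-power spectrum of two translated
    images is a pure phase ramp whose inverse FFT peaks at the shift.
    """
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    cross_power = np.conj(fa) * fb
    cross_power /= np.abs(cross_power) + 1e-12    # normalize magnitude
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Map peak indices to signed shifts (indices past N/2 wrap to negatives).
    shifts = [p if p <= s // 2 else p - s
              for p, s in zip(peak, correlation.shape)]
    return int(shifts[0]), int(shifts[1])

# Usage: a synthetic image shifted by (5, -3) pixels is recovered exactly.
rng = np.random.default_rng(0)
img = rng.random((256, 256))
shifted = np.roll(img, shift=(5, -3), axis=(0, 1))
print(phase_correlation_shift(img, shifted))      # -> (5, -3)
```

Running this kind of registration over thousands of tile pairs per section is part of what makes alignment a natural target for GPU acceleration.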

Figure 1. Left: Data from electron microscopy; grayscale with color regions showing segmentation. Right: Resulting 3D representation. (Image: Nicola Ferrier, Tom Uram and Rafael Vescovi, Argonne National Laboratory; Hanyu Li and Bobby Kasthuri, University of Chicago)

Human error can be introduced at many points in this process, including how the brain sample is stained and embedded in plastic, cut, and imaged, and how the neurons are traced. It is easy to trace neurons and their connections inaccurately through such a large volume of images. Once a dataset is created from the reconstructed slices, a human specialist such as a neuroanatomist must analyze the dataset to verify that the anatomy is correct. If errors are found, human-annotated corrections are made. What is needed is a way to reduce this human-intensive effort while increasing the accuracy of the connectome analysis.

Pre-Aurora computers and software used in the research

The Ferrier team currently runs its brain mapping analysis on the Theta and ThetaGPU supercomputers located at ALCF. Tom Uram, a computer scientist at ALCF, notes that ThetaGPU is a powerful system with 192 NVIDIA A100 GPUs. The Polaris supercomputer at ALCF is slated to come online during 2022, and the team plans to run its analysis on that system as well. Polaris is a Hewlett Packard Enterprise (HPE) testbed that incorporates several of the Aurora technologies and a similar architecture, giving users a platform for early exascale scaling and testing. The final goal is to run the brain mapping research on the future Aurora supercomputer, a significantly more powerful exascale system.

Uram states, “Researchers doing the analysis may be located globally and need access to the reconstructed data. A web-based tool called Neuroglancer can be used to make the reconstructed neuron data available on a web browser—think of it as Google Maps for brain data. Using the tool, researchers can perform remote analysis by panning through a 3D volume of neurons and can trace, connect, and analyze the data. This tool is especially useful during the human proofreading stage at the end of the analysis process.”
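Neuroglancer also ships a Python package that can serve data to the browser viewer, which gives a feel for how this remote access works. The snippet below is a minimal sketch based on the publicly documented neuroglancer Python API, using a random stand-in volume; the coordinate names, scales, and layer name are illustrative choices, not the project’s actual configuration.

```python
import neuroglancer
import numpy as np

# Serve the viewer locally; open the printed URL in a browser.
neuroglancer.set_server_bind_address('127.0.0.1')
viewer = neuroglancer.Viewer()

# Stand-in data: a random 8-bit volume where real EM data would go.
# The 40 nm z-scale mirrors the article; the 4 nm in-plane scale is assumed.
data = np.random.randint(0, 255, size=(64, 512, 512), dtype=np.uint8)
dimensions = neuroglancer.CoordinateSpace(
    names=['z', 'y', 'x'], units='nm', scales=[40, 4, 4])

with viewer.txn() as s:
    s.layers['em'] = neuroglancer.ImageLayer(
        source=neuroglancer.LocalVolume(data=data, dimensions=dimensions))

print(viewer)  # URL of the interactive browser viewer
```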

“The team’s technical work includes creating computer vision algorithms for aligning neighboring tissue slices, determining how images should be adjusted so they are correctly aligned and neurons can be traced properly. We are talking with Intel about how to speed up this process so that the code can run with high occupancy using Intel GPUs on Aurora,” states Uram. On Aurora, the team’s code will run mainly on the Intel Data Center GPUs (codenamed Ponte Vecchio). The Intel Xeon Scalable processors (codenamed Sapphire Rapids, with high bandwidth memory) will act as data pre-processors before moving data to the GPUs. Both the Intel CPUs and GPUs will be equipped with high bandwidth memory (HBM) designed to improve memory performance. Uram sees an opportunity to use HBM during the slice-alignment phase to keep sections resident in memory longer while they await segmentation.
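A quick capacity estimate suggests why that staging matters; see the sketch below. The per-GPU HBM figure is an assumed round number rather than a published Aurora specification, and the section size reuses the illustrative imaging parameters from the earlier estimate.

```python
# How many aligned sections could sit in GPU-local HBM at once?
hbm_gb = 128          # assumed HBM capacity per GPU (illustrative)
section_gb = 62.5     # per-section size from the earlier illustrative estimate

print(int(hbm_gb // section_gb), "full sections fit per GPU")    # -> 2

# With so few whole sections resident, staging smaller sub-regions of many
# neighboring sections is the more practical pattern for alignment.
subregion_gb = section_gb / 256   # e.g., a 16 x 16 tiling of each section
print(int(hbm_gb // subregion_gb), "sub-regions fit per GPU")    # -> 524
```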

Many of the brain mapping codes are written in C++, so the team will use Intel’s Data Parallel C++ (DPC++) compiler, a oneAPI implementation of SYCL, to expose parallelism and vectorization and make effective use of the Aurora GPUs.

The team traces neurons using Flood-Filling Networks (FFN), a neural network approach created by Google and implemented in TensorFlow. The team is optimizing this code for node-level performance, parallel efficiency, and input/output handling using Theta and Polaris at Argonne. There is also an effort within Intel to ensure that the optimized TensorFlow builds Intel delivers make effective use of the Intel GPUs.
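Google’s FFN code is openly available, and the sketch below is not that code; it is a minimal TensorFlow stand-in showing the general shape of a 3D convolutional network that labels voxels in an EM patch. All layer sizes here are arbitrary illustrative choices, and a real flood-filling network additionally feeds its own predictions back in as an input to grow one object at a time.

```python
import tensorflow as tf

def tiny_3d_segmenter(patch_shape=(32, 64, 64)):
    """A toy 3D CNN mapping an EM image patch to a per-voxel object mask."""
    inputs = tf.keras.Input(shape=(*patch_shape, 1))   # grayscale EM patch
    x = tf.keras.layers.Conv3D(16, 3, padding='same', activation='relu')(inputs)
    x = tf.keras.layers.Conv3D(16, 3, padding='same', activation='relu')(x)
    x = tf.keras.layers.Conv3D(32, 3, padding='same', activation='relu')(x)
    # One sigmoid output per voxel: probability it belongs to the object.
    outputs = tf.keras.layers.Conv3D(1, 1, activation='sigmoid')(x)
    return tf.keras.Model(inputs, outputs)

model = tiny_3d_segmenter()
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
```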

The team has also done some testing with Intel Distributed Asynchronous Object Storage (DAOS), which is the foundation of Intel’s exascale storage. The team’s work generates large volumes of data that are decomposed into sub-volumes, and retrieving data requires a read or write operation for every sub-volume. The team is exploring how to tune these read/write operations with DAOS for the best performance.
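Independent of DAOS itself, the access pattern being tuned looks roughly like the sketch below: the volume is decomposed into fixed-size chunks, each stored under its own key, so reading a region of interest becomes one get per overlapping chunk. The `store` object here is a hypothetical key-value interface (a plain dict in the usage example) standing in for whatever object store holds the data.

```python
import itertools
import numpy as np

CHUNK = (64, 64, 64)  # illustrative sub-volume (chunk) size in voxels

def chunk_keys(lo, hi):
    """Yield (i, j, k) grid indices of every chunk overlapping [lo, hi)."""
    ranges = [range(l // c, -(-h // c)) for l, h, c in zip(lo, hi, CHUNK)]
    return itertools.product(*ranges)

def write_volume(store, volume):
    """Decompose a full volume into chunks: one put per sub-volume."""
    for idx in chunk_keys((0, 0, 0), volume.shape):
        sl = tuple(slice(i * c, (i + 1) * c) for i, c in zip(idx, CHUNK))
        store[idx] = volume[sl].copy()   # hypothetical key-value "put"

def read_region(store, lo, hi):
    """Reassemble a region of interest: one get per overlapping chunk."""
    out = np.zeros(tuple(h - l for l, h in zip(lo, hi)), dtype=np.uint8)
    for idx in chunk_keys(lo, hi):
        chunk = store[idx]               # hypothetical key-value "get"
        origin = [i * c for i, c in zip(idx, CHUNK)]
        # Overlap between this chunk and the request, in global coordinates.
        g_lo = [max(o, l) for o, l in zip(origin, lo)]
        g_hi = [min(o + c, h) for o, c, h in zip(origin, CHUNK, hi)]
        src = tuple(slice(a - o, b - o) for a, b, o in zip(g_lo, g_hi, origin))
        dst = tuple(slice(a - l, b - l) for a, b, l in zip(g_lo, g_hi, lo))
        out[dst] = chunk[src]
    return out

# Usage: store a toy 128^3 volume in a plain dict, then fetch a sub-region.
store = {}
vol = np.random.randint(0, 255, size=(128, 128, 128), dtype=np.uint8)
write_volume(store, vol)
region = read_region(store, (10, 50, 90), (70, 120, 128))
assert np.array_equal(region, vol[10:70, 50:120, 90:128])
```

The chunk size is the main tuning knob: larger chunks mean fewer, bigger I/O operations, while smaller chunks reduce the amount of unneeded data read at region boundaries.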

Ferrier indicates, “The Intel-led, cross-industry oneAPI initiative will aid our work because the work we are currently doing on the ThetaGPU system will be transferrable to Aurora.”

Future brain mapping research requirements

The current methods used in brain mapping are complex and require extensive human effort. Being able to obtain detailed information about the brain’s neural connections is important for answering many unresolved questions about the brain.

“Having the power of the exascale Aurora supercomputer will allow us to do brain mapping research that is not currently possible. It is critical that we use the power of exascale supercomputers, AI, and neural networks to develop faster ways to do brain mapping research. Using these advanced tools, we want to do brain mapping research on multiple mouse brain samples to obtain data for comparative analysis,” states Ferrier.

The ALCF is a DOE Office of Science User Facility.