Exascale Solutions for Microbiome Analysis
Microbiomes are integral to the environment, agriculture, health, and biomanufacturing, but analyzing the DNA of these microbial communities is one of the most computationally demanding tasks in bioinformatics, requiring exascale computing and advanced algorithms.
Microorganisms are central players in climate change, environmental remediation, food production, and human health. They occur naturally as “microbiomes,” which are communities of hundreds or thousands of microbial species of varying abundance and diversity, each contributing to the function of the whole. Less than 1% of the millions of species of microbes in the world have been isolated and cultivated in the laboratory, and only a small fraction of those have been sequenced. Meanwhile, collections of microbial data are growing exponentially, representing an untapped wealth of information that could be used for environmental remediation or to manufacture novel chemicals and medicines.
“Metagenomics,” the application of high-throughput genome sequencing technologies to DNA extracted from microbiomes, is a powerful method for studying these communities. But assembly, the first step, has high computational complexity, akin to putting together thousands of puzzles from a jumble of their pieces. Following assembly, additional data analysis is needed to find families of genes that work together and to compare across metagenomes. The ExaBiome team from Berkeley Lab, Los Alamos, and the Joint Genome Institute is developing exascale algorithms and software to address these challenges and will work with the vendor community to co-design systems with the network and memory features needed for these and other large-scale analytics problems.