Analyzing the DNA of microorganism communities is one of the most computationally demanding tasks in bioinformatics, requiring exascale computing and advanced algorithms.
Microorganisms are important to agriculture, food production, human health, environmental remediation, and the development of industrial products. Yet, less than one percent of microbe species have been isolated and cultivated in a lab and even less have been genetically sequenced and analyzed. High-throughput sequencing technologies are leading to a surge in the collection of microbial data, giving scientists more information than ever before. Powerful data analytics can help reveal the functions of microbial species within large communities known as microbiomes—which is critical information for developing new energy, medical, and industrial applications.
The field of “metagenomics,” or the application of high-throughput genome sequencing of DNA extracted from microbiomes, is an intensely data-dense field. Analyzing the DNA of thousands of microbial species in a microbiome, is akin to constructing thousands of puzzles with the pieces mixed together, and then determining how genes across the puzzles work together. With exascale resources, scientists will be able to analyze over a million metagenomes in search of signature genes that could serve important functions in biomanufacturing and other applications.