Applying Graph Algorithms to Solve Key Science Problems of Importance to the Nation

Proteins are the workhorses of life. A protein's function can often be inferred from its sequence and structure. High-throughput sequencing of genomes and metagenomes has dramatically expanded the sequence data now available. However, the function of a large fraction of proteins remains unknown, so computational annotation of metagenomes has become a critical task. Constructing protein similarity graphs (networks) enables efficient function prediction, because proteins that cluster together tend to share function. The figure shows a rendering of a protein similarity network constructed from the PATRIC database, with 170K vertices (proteins) and 1.7 million edges; colors indicate clusters (or communities) of proteins. Only the largest connected component is shown in the figure. Credit: the ExaGraph co-design center, Exascale Computing Project.
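
To make the idea of a protein similarity graph concrete, here is a minimal Python sketch on toy data. It is not the ExaGraph pipeline: the protein IDs, similarity scores, function labels, the 0.5 threshold, and the helper names are all hypothetical, and the neighbor majority vote is a simple stand-in for function prediction (community detection itself is not shown). It builds a graph from pairwise scores, extracts the largest connected component, and transfers annotations from known neighbors.

```python
from collections import Counter


def build_similarity_graph(pairs, threshold):
    """Build an undirected adjacency dict from (a, b, score) triples.

    An edge is kept only when score >= threshold (threshold is an assumption).
    """
    graph = {}
    for a, b, score in pairs:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b and score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return a list of vertex sets, one per connected component."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # iterative depth-first search
            v = stack.pop()
            component.add(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


def predict_function(graph, known, protein):
    """Majority vote over annotated neighbors; None if no neighbor is annotated."""
    votes = Counter(known[w] for w in graph[protein] if w in known)
    if not votes:
        return None
    # Break ties deterministically: highest count first, then alphabetical label.
    return min(votes, key=lambda label: (-votes[label], label))


# Hypothetical toy input: pairwise similarity scores in [0, 1].
pairs = [
    ("P1", "P2", 0.90), ("P1", "P3", 0.80), ("P2", "P3", 0.85),
    ("P4", "P5", 0.90), ("P4", "P6", 0.80), ("P5", "P6", 0.90),
    ("P3", "P4", 0.60),  # weak bridge between the two triangles
    ("P7", "P8", 0.95),  # separate small component
    ("P1", "P7", 0.10),  # below threshold, so no edge
]
known = {"P1": "kinase", "P2": "kinase", "P5": "transporter"}

graph = build_similarity_graph(pairs, threshold=0.5)
largest = max(connected_components(graph), key=len)
predictions = {p: predict_function(graph, known, p) for p in ("P3", "P6", "P8")}
print(sorted(largest), predictions)
```

At the scale described in the text (about 170K vertices and 1.7 million edges), a dedicated graph library or parallel implementation would be the practical choice; the dictionary-of-sets representation here is only meant to show the structure of the computation.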