ADEPT introduced to improve large-scale bioinformatics data analysis

Researchers have introduced ADEPT, a novel domain-independent parallelization strategy that optimizes the Smith-Waterman algorithm for DNA and protein sequence alignment on the heterogeneous architectures and GPUs of petascale supercomputers. The work contributes to the Exascale Computing Project’s ExaBiome project, which develops scalable GPU-optimized tools for computational problems in metagenomics, specifically for studying microbial communities at unprecedented scale and fidelity. The team demonstrated ADEPT’s capabilities with two ExaBiome applications: MetaHipMer, a metagenome assembler, and PASTIS, part of the protein clustering pipeline. ADEPT ran up to 10× faster on GPUs than CPU-based and other GPU-based implementations and yielded 10–30% speedups in overall application performance, enabling large-scale bioinformatic data analysis and machine learning on genomic and proteomic data. The work was published in the September 2020 issue of BMC Bioinformatics.

Dynamic programming algorithms, typically used for DNA and protein sequence analysis, are computationally expensive, making them good candidates for GPU acceleration; however, their irregular data access patterns and communication make them challenging to optimize on GPUs. Multiple SIMD strategies for these algorithms are available for CPUs, but existing GPU-based strategies have been optimized either for a specific type of character, such as nucleotides or amino acids, or for a narrow group of application use cases (i.e., they are domain or application specific). ADEPT provides a unified GPU-based solution for these bioinformatics kernels and will play a significant role in exascale bioinformatics applications while inspiring new application development within the bioinformatics community to leverage accelerator-based computing. The development of ADEPT has enabled the analysis of environmental data sets at the Joint Genome Institute, for example, and provides the first production-quality tools of their kind that can effectively use DOE’s distributed-memory supercomputers.
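To make the dependency pattern behind these dynamic programming kernels concrete, the following is a minimal Python sketch of Smith-Waterman local alignment scoring. It is an illustration only, not ADEPT's implementation: the function names, the linear gap penalty, and the scoring values (match = 2, mismatch = -1, gap = -1) are assumptions chosen for the example. The second function fills the same matrix one anti-diagonal at a time, which shows why cells on a single anti-diagonal are independent of one another and could in principle be computed in parallel.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b, filled row by row.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Same recurrence, but visiting cells one anti-diagonal (i + j = d) at a time.

    Every cell on anti-diagonal d reads only from anti-diagonals d-1 and d-2,
    so the inner loop has no dependencies between its iterations.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):
        i_lo = max(1, d - m)   # keeps j = d - i within 1..m
        i_hi = min(n, d - 1)
        for i in range(i_lo, i_hi + 1):  # independent iterations
            j = d - i
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Because the scoring only compares characters for equality, the same sketch accepts nucleotide strings such as `"GATTACA"` and amino-acid strings such as `"MKV"` without modification. A production kernel would typically substitute a scoring matrix for protein data.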

 

M. G. Awan, J. Deslippe, A. Buluc, O. Selvitopi, S. Hofmeyr, L. Oliker, and K. Yelick. “ADEPT: A Domain Independent Sequence Alignment Strategy for GPU Architectures.” BMC Bioinformatics (2020) 21: 406. https://doi.org/10.1186/s12859-020-03720-1