Exascale MPI / MPICH

Efficient communication among the compute elements of high-performance computing systems is essential for simulation performance. The Message Passing Interface (MPI) is a community standard, developed by the MPI Forum, for programming these systems and handling the communication they require. MPI is the de facto programming model for large-scale scientific computing and is available on all large-scale systems. Most of the US Department of Energy’s (DOE’s) parallel scientific applications that run on pre-exascale systems use MPI. The goal of the Exascale-MPI project is to evolve the MPI standard to fully support the complexity of exascale systems and to deliver MPICH, a reliable, performant implementation of the MPI standard, for these systems.

Project Details

Although MPI will continue to be a viable programming model on exascale systems, the MPI standard and MPI implementations must address the challenges posed by the increased scale, performance characteristics, evolving architectural features, and complexity expected of exascale systems, and they must support the capabilities and requirements of the applications that will run on these systems.

Therefore, this project addresses five key challenges to deliver a performant MPICH implementation: (1) scalability and performance on complex architectures that include, for example, high core counts, processor heterogeneity, and heterogeneous memory; (2) interoperability with intranode programming models that have a high thread count, such as OpenMP, OpenACC, and emerging asynchronous task models; (3) software overheads that are exacerbated by lightweight cores and low-latency networks; (4) extensions to the MPI standard based on experience with applications and high-level libraries and frameworks targeted at exascale; and (5) topics that become more significant on exascale architectures, namely memory usage, power usage, and resilience.
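As a concrete illustration of challenge (2), the sketch below shows the hybrid MPI+OpenMP pattern MPICH must support: the program requests MPI_THREAD_MULTIPLE so that every OpenMP thread can issue its own MPI calls. This is a generic example rather than project code, and it assumes that paired ranks run the same number of OpenMP threads.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    /* Generic MPI+OpenMP sketch: each OpenMP thread exchanges a message with
     * the matching thread on a peer rank. Requires MPI_THREAD_MULTIPLE from
     * the MPI library and assumes both ranks in a pair use the same number
     * of threads. */
    int main(int argc, char **argv)
    {
        int provided, rank, size;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int peer = rank ^ 1;    /* pair ranks 0-1, 2-3, ... */
        if (peer < size) {
            #pragma omp parallel
            {
                int tid = omp_get_thread_num();
                int sendval = rank * 1000 + tid, recvval = -1;
                /* The thread id doubles as the message tag so that concurrent
                 * messages between the same pair of ranks stay distinct. */
                MPI_Sendrecv(&sendval, 1, MPI_INT, peer, tid,
                             &recvval, 1, MPI_INT, peer, tid,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
        }
        MPI_Finalize();
        return 0;
    }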

The MPICH development effort continues to address several key challenges, such as performance and scalability, heterogeneity, hybrid programming, topology awareness, and fault tolerance. Several additional features are being developed to support the exascale machines that will be deployed, including (1) support for multiple accelerator modes, with data transfers between GPU accelerators and the communication network handled even in cases in which native hardware support is lacking, and (2) offline and online performance tuning based on static and dynamic system configurations, respectively.
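For the GPU-related work in item (1), the usage model is that applications hand device memory directly to MPI and the library chooses the best available transfer path. The following minimal sketch, which assumes a GPU-aware MPICH build with CUDA and omits error checking, passes a cudaMalloc'd buffer straight to MPI_Send and MPI_Recv.

    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Sketch of GPU-aware communication: the device pointer is passed directly
     * to MPI, and the library moves the data using native hardware support or
     * an internal staging fallback. Assumes a GPU-aware MPICH build. */
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1 << 20;
        double *dev_buf;
        cudaMalloc((void **)&dev_buf, n * sizeof(double));
        cudaMemset(dev_buf, 0, n * sizeof(double));

        if (rank == 0)
            MPI_Send(dev_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(dev_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        cudaFree(dev_buf);
        MPI_Finalize();
        return 0;
    }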

This team will also produce a significantly larger test suite to stress-test various MPI use cases. In addition, the team will develop a test generation toolkit that profiles an application's MPI usage via the MPI profiling interface and automatically generates a simple test program that reproduces the application's MPI communication pattern, covering basic MPI features, sanitized iterative loops, memory buffer management, and incomplete executions. These activities will help improve the reliability and performance of the MPICH implementation and of other MPI implementations as they evolve.
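The MPI profiling interface that the toolkit relies on is the standard PMPI layer: a wrapper library intercepts MPI calls, records what it needs, and forwards each call to the underlying implementation. The sketch below is a generic example of that mechanism, not the project's toolkit; it counts MPI_Send calls and bytes sent and reports them at MPI_Finalize.

    #include <mpi.h>
    #include <stdio.h>

    /* Generic PMPI interception sketch: record how the application uses
     * MPI_Send, then forward the call to the real implementation through
     * its PMPI_ entry point. Link this wrapper ahead of the MPI library. */
    static long send_count = 0;
    static long bytes_sent = 0;

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        int type_size;
        PMPI_Type_size(datatype, &type_size);
        send_count++;
        bytes_sent += (long)count * type_size;
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }

    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d: %ld sends, %ld bytes\n", rank, send_count, bytes_sent);
        return PMPI_Finalize();
    }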

The team will continue to engage with the MPI Forum to ensure that future MPI standards meet the needs of the Exascale Computing Project (ECP) and broader DOE applications. To achieve good performance on exascale machines, the team plans to develop new MPI features for application-specific requirements, such as alternative fault tolerance models and reduction neighborhood collectives, either through inclusion in the standard or as extensions to the standard.
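Neighborhood collectives illustrate the kind of feature under discussion. The current standard already provides topology-driven operations such as MPI_Neighbor_allgather, shown in the sketch below on a periodic 2D Cartesian grid; the reduction-style neighborhood extensions mentioned above would follow the same pattern but are not part of the current standard.

    #include <mpi.h>

    /* Sketch of an existing MPI-3 neighborhood collective: each rank on a
     * periodic 2D Cartesian grid exchanges one integer with its four grid
     * neighbors in a single call. */
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int dims[2] = {0, 0}, periods[2] = {1, 1};
        MPI_Dims_create(size, 2, dims);   /* factor the ranks into a 2D grid */

        MPI_Comm cart;
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

        int rank;
        MPI_Comm_rank(cart, &rank);

        /* Send this rank's value to all four neighbors and receive one value
         * from each of them. */
        int sendval = rank, recvvals[4] = {-1, -1, -1, -1};
        MPI_Neighbor_allgather(&sendval, 1, MPI_INT,
                               recvvals, 1, MPI_INT, cart);

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }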

Principal Investigator(s):

Yanfei Guo, Argonne National Laboratory

Progress to Date

  • The Exascale-MPI team developed a high-performance, production-quality MPI implementation called MPICH. The team continues to improve the performance and capabilities of the MPICH software to meet the demands of ECP and broader DOE applications.
  • Some technical risks that were retired include scalability and performance over complex architectures and interoperability with intranode programming models that have a high thread count, such as OpenMP.
  • Support for communication by using GPU buffers was recently added to MPICH. MPICH supports GPUs from multiple vendors, including NVIDIA, AMD, and Intel.
