In a large scientific simulation, a computation running on one part of the system often needs to access or provide data to another part in order to make progress. The Partitioned Global Address Space (PGAS) programming model provides the appearance of shared memory accessible to all the compute nodes, while implementing this shared memory behind the scenes using physical memory distributed across the nodes and primitives such as remote direct memory access (RDMA). The Pagoda project, as part of the US Department of Energy (DOE) Exascale Computing Project (ECP), is developing a productive and performant PGAS programming system to be deployed on exascale supercomputers.
The Pagoda project is developing a programming system to support exascale application development using the PGAS model, with a focus on supporting applications exhibiting irregular data structures and communication. There are two components to Pagoda: (1) a portable, high-performance, global-address-space communication library (GASNet-EX) and (2) a C++ template library (UPC++) that provides convenient abstractions for accessing and using the global address space. Together, these components enable the agile, lightweight communications that occur in applications, libraries, and frameworks running on exascale systems.
Pagoda enables effective scaling by avoiding the overhead of long, branchy serial code paths, leveraging hardware-offload support for RDMA, and supporting efficient fine-grained communication for both single- and multi-threaded environments. The importance of these properties is amplified by application trends; many applications in the ECP require the use of adaptive meshes, sparse matrices, dynamic load balancing, or similar techniques. Pagoda's low-overhead communication mechanisms can maximize the injection rate and network utilization, tolerate latency through overlap, streamline unpredictable communication events, minimize synchronization, and efficiently support the small- to medium-sized messages arising in many applications. Pagoda complements other programming models, enabling developers to focus their efforts on optimizing performance-critical communication.
The Pagoda team is focusing on developing new features that will support application and library requirements unique to the ECP, along with performance improvements that will enable the ECP software stack to exploit the best-available communication mechanisms. These include novel features being developed by vendors, such as RDMA mechanisms offered by network hardware and on-chip communication between distinct address spaces. Recent work includes delivering accelerated communication of GPU-resident data for the systems on DOE's exascale roadmap.
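As a rough illustration of what GPU-resident communication looks like at the API level, the sketch below uses the UPC++ memory-kinds facility to allocate a buffer in GPU memory and copy host data into it with a single one-sided call. This is a hedged sketch, not an excerpt from the UPC++ documentation: it assumes a CUDA-capable device 0, and the segment size and buffer names are illustrative.

    #include <upcxx/upcxx.hpp>
    #include <cstddef>

    int main() {
      upcxx::init();
      std::size_t seg_size = 1 << 20;  // 1 MiB GPU segment (illustrative)
      // Open CUDA device 0 and carve a shared segment out of GPU memory.
      upcxx::cuda_device gpu(0);
      upcxx::device_allocator<upcxx::cuda_device> gpu_alloc(gpu, seg_size);
      auto gpu_buf = gpu_alloc.allocate<double>(1024);  // GPU-resident global_ptr
      // A host-side buffer in the ordinary shared segment.
      upcxx::global_ptr<double> host_buf = upcxx::new_array<double>(1024);
      // One-sided copy between memory kinds; offloaded to GPU-aware
      // network hardware where the system supports it.
      upcxx::copy(host_buf, gpu_buf, 1024).wait();
      gpu_alloc.deallocate(gpu_buf);
      upcxx::delete_array(host_buf);
      gpu.destroy();  // collective teardown; must precede finalize
      upcxx::finalize();
    }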
UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. The PGAS programming model provides an alternative to the popular Message Passing Interface (MPI) programming model, and for applications with fine-grained or irregular communication it can offer significant advantages in both performance and programmer productivity. These advantages translate into a programming model and communication library that can scale efficiently to potentially millions of processors, while still delivering high performance on smaller platforms.
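A minimal sketch makes the model concrete (variable names here are illustrative, assuming a standard UPC++ installation). Each process contributes one cell to the global address space, and any process can then read rank 0's cell with a one-sided get, with no cooperation from rank 0:

    #include <upcxx/upcxx.hpp>
    #include <iostream>

    int main() {
      upcxx::init();
      // Each process allocates one integer in the shared segment it
      // contributes to the global address space.
      upcxx::global_ptr<int> my_cell = upcxx::new_<int>(upcxx::rank_me());
      // Broadcast rank 0's global pointer so every process can address it.
      upcxx::global_ptr<int> root_cell = upcxx::broadcast(my_cell, 0).wait();
      // One-sided read of rank 0's cell; RDMA is used where available.
      int value = upcxx::rget(root_cell).wait();
      std::cout << "Rank " << upcxx::rank_me() << " read " << value << "\n";
      upcxx::barrier();
      upcxx::finalize();
    }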
UPC++ leverages the underlying GASNet-EX communication library to deliver efficient, low-overhead Remote Memory Access (RMA) and Remote Procedure Call (RPC) on HPC systems. As a result, UPC++ can exploit modern system capabilities such as RDMA offload and native on-chip communication between distinct address spaces.
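The two primitives can be sketched as follows; this is an illustrative example rather than code from the UPC++ documentation. The rput performs one-sided RMA into a landing zone on rank 0 with no receive-side code, while the rpc ships a lambda to a neighboring rank for execution there:

    #include <upcxx/upcxx.hpp>
    #include <iostream>

    int main() {
      upcxx::init();
      int me = upcxx::rank_me(), n = upcxx::rank_n();
      // Rank 0 allocates a landing zone; everyone learns its address.
      upcxx::global_ptr<int> zone;
      if (me == 0) zone = upcxx::new_array<int>(n);
      zone = upcxx::broadcast(zone, 0).wait();
      // One-sided RMA put: no receive-side code runs on rank 0.
      upcxx::rput(me, zone + me).wait();
      // Remote procedure call: run a lambda on the next rank over.
      upcxx::future<int> f =
          upcxx::rpc((me + 1) % n, [](int from) { return from * 2; }, me);
      std::cout << "Rank " << me << " got " << f.wait() << "\n";
      upcxx::barrier();
      upcxx::finalize();
    }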
The ECP project focused on three guiding principles, reflected in the efforts described below.
The ECP GASNet-EX effort modernized the 20-year-old GASNet-1 PGAS communication system, combining a major redesign of the GASNet-1 software interfaces with an overhaul of the implementation.
UPC++ provides the high-level productivity abstractions appropriate for PGAS programming, such as RMA, RPC, support for accelerators (e.g., GPUs), and mechanisms for aggressive asynchrony to hide communication costs.
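As one hedged illustration of the asynchrony mechanisms: UPC++ communication calls return futures that can be chained with completion callbacks and overlapped with independent local work. The local computation below is a toy placeholder.

    #include <upcxx/upcxx.hpp>
    #include <cmath>
    #include <iostream>

    int main() {
      upcxx::init();
      upcxx::global_ptr<double> cell =
          upcxx::new_<double>(1.0 + upcxx::rank_me());
      upcxx::global_ptr<double> remote = upcxx::broadcast(cell, 0).wait();

      // Issue the communication, attach a callback, and keep computing
      // while the transfer is in flight.
      upcxx::future<double> f =
          upcxx::rget(remote).then([](double x) { return std::sqrt(x); });
      double local = 0;  // overlapped local work (placeholder)
      for (int i = 1; i <= 1000; ++i) local += 1.0 / i;

      std::cout << "sqrt of remote value: " << f.wait()
                << ", local sum: " << local << "\n";
      upcxx::barrier();
      upcxx::delete_(cell);
      upcxx::finalize();
    }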
UPC++ has been demonstrated to deliver efficient, low-overhead RMA and RPC on HPC systems, including accelerated transfers to and from GPU memory. These demonstrations show that the UPC++ library delivers robust performance and scalability, even on large modern supercomputers; near-linear weak scaling of a distributed hash table is one example.
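In outline, such a distributed hash table is a small amount of UPC++ code. The sketch below is modeled on the style of the UPC++ programmer's guide but is not taken from it: names such as dht_insert and owner_of are illustrative, and a production version would typically wrap the local map in a upcxx::dist_object rather than a global variable. Each key is hashed to an owner rank, and an RPC performs the operation in the owner's local std::unordered_map.

    #include <upcxx/upcxx.hpp>
    #include <cassert>
    #include <string>
    #include <unordered_map>

    // Each rank's shard of the table (a dist_object in production code).
    std::unordered_map<std::string, std::string> local_map;

    // Hash the key to the rank that owns it.
    int owner_of(const std::string &key) {
      return std::hash<std::string>{}(key) % upcxx::rank_n();
    }

    upcxx::future<> dht_insert(const std::string &key, const std::string &val) {
      return upcxx::rpc(owner_of(key),
          [](const std::string &k, const std::string &v) { local_map[k] = v; },
          key, val);
    }

    upcxx::future<std::string> dht_find(const std::string &key) {
      return upcxx::rpc(owner_of(key),
          [](const std::string &k) { return local_map[k]; },  // "" if absent
          key);
    }

    int main() {
      upcxx::init();
      std::string key = "rank" + std::to_string(upcxx::rank_me());
      dht_insert(key, "hello").wait();
      upcxx::barrier();  // ensure all inserts are visible
      assert(dht_find(key).wait() == "hello");
      upcxx::finalize();
    }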
Numerous application use cases demonstrate that UPC++ can deliver high performance and portability on systems ranging from laptops to exascale supercomputers. Examples include the MetaHipMer2 metagenome assembler, the SIMCoV viral propagation simulation, NWChemEx TAMM, and graph computation kernels from ExaGraph. See the Berkeley Lab upcxx wiki for a list of notable applications, kernels, and frameworks using UPC++.
UPC++ enables the programmer to seamlessly compose the powerful productivity features of the C++ standard template library with RPCs, helping create understandable and maintainable code. This includes the use of C++ lambda expressions, which make it convenient to ship computation to remote data.
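For instance, UPC++ can serialize many standard library types, so a lambda together with a std::vector argument can be shipped to another rank in a single RPC. The example below is an illustrative sketch of this composition, not a documented recipe:

    #include <upcxx/upcxx.hpp>
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
      upcxx::init();
      int me = upcxx::rank_me(), n = upcxx::rank_n();
      std::vector<double> samples(8, 0.5 * (me + 1));
      // The lambda executes on the target rank; the vector argument is
      // serialized and shipped along with the call.
      double remote_sum = upcxx::rpc((me + 1) % n,
          [](const std::vector<double> &v) {
            return std::accumulate(v.begin(), v.end(), 0.0);
          },
          samples).wait();
      std::cout << "Rank " << me << " had its vector summed remotely: "
                << remote_sum << "\n";
      upcxx::finalize();
    }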
Ease of use and proven application performance mean that UPC++ and the GASNet-EX communication library are well positioned to maintain a large application portfolio and user base into the future.