SOLLVE: OpenMP for HPC and Exascale

By Johannes Doerfert, contributing writer

SOLLVE is an Exascale Computing Project (ECP) effort focused on defining and developing OpenMP capabilities for the high-performance computing (HPC) community. Numerous HPC applications rely on OpenMP for parallelization and accelerator offloading. Through SOLLVE, HPC application developers can effectively communicate with the OpenMP standardization body and various vendors to ensure that OpenMP will remain a performant, portable, and productive programming model in the exascale era.

The SOLLVE project is deeply engaged in efforts to provide a high-quality and complete OpenMP implementation in the LLVM Compiler Infrastructure Project. LLVM, an open-source collection of compiler and toolchain technologies, serves as a test bed for proposed OpenMP extensions (e.g., the interop directive in OpenMP 5.1) and as a vehicle to provide production-quality implementations of OpenMP features that could be exploited in vendor compilers built with LLVM technology. Doug Kothe, Director of the US Department of Energy’s (DOE’s) ECP, believes that “LLVM compiler technology is becoming the nexus for vendor and community compiler development and evolution.”

SOLLVE is one of several ECP efforts to improve the LLVM Compiler Ecosystem. As such, the SOLLVE work in LLVM is closely aligned with the LLVM/Flang development that is supported by the ECP Flang project, the LLVM improvements orchestrated through the ECP PROTEAS-TUNE project, and other ECP efforts that advance or use LLVM technologies.

As an ECP project, SOLLVE is a collaborative effort led by computer scientists and researchers at multiple DOE laboratories and US universities. As part of its overarching goal to provide an exascale-ready OpenMP, it aims to bridge the gaps between application developers, the OpenMP language committee, and implementers of OpenMP standards. According to Oscar Hernandez, a computer scientist at Oak Ridge National Laboratory (ORNL), “SOLLVE helps us prioritize the functionality needed in the OpenMP specification and the performance improvements needed in the compilers and runtimes based on application developers’ needs. To do this, we prototype extensions and optimizations to find solutions for application challenges and then propose these extensions to the OpenMP specification or commit them to the LLVM compiler as a reference implementation.”

The interaction between application teams and SOLLVE can accelerate the development of performant and portable OpenMP applications for exascale machines. Furthermore, the application developers’ feedback helps identify and prioritize shortcomings in the LLVM compiler that the SOLLVE team can address. These include missing OpenMP features, such as declare variant and the assume directive; missing compiler capabilities, such as math and complex arithmetic in OpenMP target regions compiled for GPUs; and performance problems in the compiler and runtimes.
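As a rough illustration of the features named above, the following sketch combines declare variant with an offloaded target region. The function names (scale_add, scale_add_gpu, axpy) are hypothetical and chosen only for this example. The code is written so that it also compiles and runs on the host when OpenMP or offloading is unavailable, in which case the pragmas are ignored or the region falls back to the CPU.

```c
#include <assert.h>

/* Hypothetical GPU-specialized variant. In a real code this body might use
   device-specific intrinsics; here it matches the base version. */
double scale_add_gpu(double a, double x, double y) { return a * x + y; }

/* Base version. When compiled for a GPU device, calls to scale_add are
   replaced by calls to scale_add_gpu. */
#pragma omp declare variant(scale_add_gpu) match(device = {kind(gpu)})
double scale_add(double a, double x, double y) { return a * x + y; }

/* y[i] = a * x[i] + y[i], offloaded if a target device is available. */
void axpy(int n, double a, const double *x, double *y) {
  assert(n >= 0);
  #pragma omp target teams distribute parallel for \
      map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; ++i)
    y[i] = scale_add(a, x[i], y[i]);
}
```

Keeping the call site unchanged and selecting the implementation through the context selector is what lets a single source file serve both host and device builds.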

Performance of different HPGMG versions.

A request for a feature or performance enhancement can be communicated directly to the SOLLVE team or to the OpenMP LLVM subproject. For example, Chris Daley, an HPC performance engineer at Lawrence Berkeley National Laboratory, reported that the High-Performance Geometric MultiGrid (HPGMG) application was spending one-third of its execution time allocating and deallocating memory on the GPU when compiled with the development version of LLVM 11. The SOLLVE team introduced a GPU memory manager that completely removed this overhead. The accompanying graph shows the performance of different HPGMG versions and the impact of the new memory manager. A similar impact was observed on the GridMini SU(3)×SU(3) benchmark. As shown in Figure 1, throughput improved dramatically when the benchmark was compiled using LLVM with the new memory manager (patched, red bars) in comparison with the baseline LLVM/Clang version (unpatched, blue bars). Its performance now matches that of CUDA for mid-sized and large inputs. As with other implementation improvements developed by SOLLVE, the memory manager was merged into the LLVM community compiler and runtime ecosystem, and it is available to vendors in the current development version and the upcoming LLVM 12 release.

 

Figure 1: Impact of the memory manager patch on the performance of the SU(3)×SU(3) benchmark in the GridMini code for the QCD ECP project. Courtesy of Meifeng Lin, a Brookhaven National Laboratory computational scientist.
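The article does not describe how the memory manager works internally. One general technique for removing repeated allocation cost is to cache freed blocks by size class and hand them out again on the next request. The sketch below illustrates only that general idea and should not be read as the LLVM implementation: all names are hypothetical, and host malloc/free stand in for the expensive device allocation calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_SHIFT 6    /* smallest size class: 2^6 = 64 bytes */
#define NUM_BUCKETS 16 /* largest cached class: 2^21 bytes    */

/* Header stored in front of each payload. A real device allocator would
   keep this metadata on the host; in-band storage is a simplification. */
typedef struct Block {
  struct Block *next; /* link while the block is parked in a free list */
  int bucket;         /* size class, or -1 if too large to cache       */
} Block;

static Block *free_lists[NUM_BUCKETS];
static size_t backend_allocs; /* number of calls to the slow allocator */

/* Stand-ins for the expensive device allocation and deallocation calls. */
static void *backend_alloc(size_t bytes) { ++backend_allocs; return malloc(bytes); }
static void backend_free(void *p) { free(p); }

/* Smallest size class that can hold `bytes`, or -1 if none can. */
static int bucket_for(size_t bytes) {
  size_t cap = (size_t)1 << MIN_SHIFT;
  for (int b = 0; b < NUM_BUCKETS; ++b, cap <<= 1)
    if (bytes <= cap) return b;
  return -1;
}

void *pool_alloc(size_t bytes) {
  int b = bucket_for(bytes);
  if (b >= 0 && free_lists[b]) { /* fast path: reuse a cached block */
    Block *blk = free_lists[b];
    free_lists[b] = blk->next;
    return blk + 1;
  }
  /* Slow path: round cacheable requests up to the class capacity. */
  size_t payload = b >= 0 ? (size_t)1 << (MIN_SHIFT + b) : bytes;
  Block *blk = backend_alloc(sizeof(Block) + payload);
  if (!blk) return NULL;
  blk->next = NULL;
  blk->bucket = b;
  return blk + 1;
}

void pool_free(void *p) {
  if (!p) return;
  Block *blk = (Block *)p - 1;
  assert(blk->bucket < NUM_BUCKETS);
  if (blk->bucket < 0) { /* oversized blocks go straight back */
    backend_free(blk);
    return;
  }
  blk->next = free_lists[blk->bucket]; /* park the block for later reuse */
  free_lists[blk->bucket] = blk;
}

size_t pool_backend_allocs(void) { return backend_allocs; }
```

With such a cache, an application that allocates and frees similarly sized buffers in a loop reaches the slow allocator only once per size class, which is the kind of pattern the HPGMG report describes. The sketch is not thread safe and never returns cached blocks to the backend.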

Even when standardization and initial implementations of OpenMP features are complete, the SOLLVE project’s efforts continue. With an ever-growing OpenMP validation and verification test suite, SOLLVE monitors how well compilers and runtime systems support OpenMP across pre-exascale and exascale test bed systems. This provides guidance to the implementers and facilities and ensures the portability of OpenMP with regard to compilers and systems.

To support the development of OpenMP applications, SOLLVE is actively improving the relevant documentation, tooling, insight, and debugging capabilities in and around the LLVM compiler framework and the associated OpenMP runtimes. Recent improvements that will be broadly available in the next LLVM release include easier setup of an OpenMP offload-capable compiler, integrated offload runtime profiling, informative error messages and runtime information, and compiler remarks that explain OpenMP-specific optimizations. Such remarks identify not only transformations that were performed but also ones that were missed and why. To make accessing the latest LLVM compiler easy, SOLLVE helps maintain an LLVM Spack package and installs an up-to-date prerelease version of LLVM/Clang on all ECP test beds and various other systems. These development builds allow ECP application teams to comment on the newest compiler features outside the regular 6-month release cycle.
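To make the remarks and runtime information more concrete, here is a small offloaded kernel together with one assumed way of requesting them. The command line and environment variable in the comments follow the LLVM OpenMP documentation as best understood here and should be checked against the installed release. The file name and the function sum_squares are hypothetical.

```c
#include <assert.h>

/*
 * Assumed build line (verify the flags for your Clang version):
 *   clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
 *         -Rpass=openmp-opt -Rpass-missed=openmp-opt example.c
 *
 * Runtime information about data mapped to the device is assumed to be
 * controlled by the LIBOMPTARGET_INFO environment variable; its accepted
 * values depend on the LLVM version.
 */

/* Sum of i*i for i in [0, n), computed in a target region. Without an
   available device the region runs on the host. */
long sum_squares(int n) {
  assert(n >= 0);
  long total = 0;
  #pragma omp target teams distribute parallel for \
      reduction(+: total) map(tofrom: total)
  for (int i = 0; i < n; ++i)
    total += (long)i * i;
  return total;
}
```

Calling sum_squares(10) returns 285 whether or not the region is offloaded, so the remarks can be inspected without changing program behavior.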

The SOLLVE project’s efforts in the OpenMP standard and implementations are complemented by training events and application engagement. Through regular (virtual) sit-downs with the HPC developers via hackathons, workshops, tutorials, and presentations at HPC venues, SOLLVE provides support from the planning phase of a project to the final performance tuning.

The full-spectrum approach taken by SOLLVE to drive OpenMP design, implementations, and adoption is best described by Barbara Chapman, professor at Stony Brook University and principal investigator for the project: “Interactions between the SOLLVE team and ECP application developers have been key to the rapid evolution of the OpenMP standard. Input from, and engagement with, the applications teams remains crucial as we continue our efforts to ensure that high-quality OpenMP implementations and tools are readily available on exascale platforms.”

Recent accomplishments by the SOLLVE team include the following.

  • The adoption of critical HPC features into the OpenMP 5.X standards and LLVM compiler toolchain. Together with vendors and others on the OpenMP Architecture Review Board, SOLLVE designed and integrated HPC-centric OpenMP features, such as the metadirective, interop directive, and dynamic context selectors, into the standards. From there, the SOLLVE team prototypes and implements these features in LLVM to make them available to users and to vendors that build their compilers on top of LLVM technology.
  • The ability to extract OpenMP-aware compile-time and runtime information from the LLVM compiler toolchain. Information about memory mapped to GPU devices, performed and missed optimizations, and time profiles is among the things that the LLVM compiler and OpenMP runtime provide to application developers. Both this capability and the LLVM OpenMP documentation page, which provides more information and a usage description, were SOLLVE-led efforts.
  • The availability of an up-to-date LLVM/Clang and SOLLVE verification and validation (V&V) test suite on various DOE facility machines. The former allows users to test the newest features and provide feedback, and the latter allows them to compare the OpenMP conformity of all available compilers. The V&V suite welcomes application use cases to ensure portability across compilers and systems.