Exascale systems are built from chips with many cores, less memory per core, and a range of architectures, which can reduce productivity for library and application developers who must write specialized software for each system. The Kokkos/RAJA project provides high-level abstractions for expressing the necessary parallel constructs, which are then mapped onto a runtime to achieve portable performance across current and future architectures. This frees developers who adopt these technologies from the burden of writing specialized code for each system.
The average full-time developer writes about 20,000 lines of usable code per year. Most applications require about 500,000 lines of their own, on top of relying on other libraries that can total millions more, such as mathematical libraries that perform particular kinds of calculations or visualization libraries that create scientific images. This is a problem in high performance computing (HPC), where a new system is deployed every few years and rewriting applications for each one is expensive.
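Using the figures in the previous paragraph, the cost of rewriting a single application can be roughly estimated. This is a back-of-the-envelope sketch that assumes porting proceeds at the same rate as writing new code and ignores the supporting libraries entirely:

```latex
\[
\frac{500{,}000\ \text{lines}}{20{,}000\ \text{lines per developer-year}} = 25\ \text{developer-years}
\]
```

Incurred again each time a new system arrives, that is roughly 25 developer-years per application, before any of its dependencies are counted.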
The Kokkos project of the Exascale Computing Project (ECP) provides a unique set of semantics that lets developers write their code independently of the HPC platform being used. Implemented in C++, a common and powerful programming language, the Kokkos model presents a common interface that hides the differences between systems. Developers can therefore write their code once and port it from platform to platform, saving significant time and money.
At least 1,200 developers from more than 150 institutions use Kokkos, across about 300 different HPC projects. Within ECP, more than half of the projects that use C++ rely on Kokkos for portability and longevity.
Part of Kokkos’s success comes from how easily it interfaces with other products, such as the math libraries and other tools developed by ECP. These integrations let users build larger applications with a much smaller time investment.
Kokkos has a strong history of engagement with the HPC community, providing tutorials and training materials to its wide range of users. Its legacy will continue beyond ECP, as there are currently no other solutions for C++ portability that operate at its scale.
Library and application developers face the challenge of inventing new parallel algorithms for many-core chips while also learning each architecture’s programming mechanisms and creating and maintaining specialized, performant code for each one. Adapting libraries and applications so they keep performing well as architectures evolve and grow more complex is a large time investment. The Kokkos/RAJA project aims to provide portable abstractions that developers can adopt to reduce or eliminate this overhead and improve their productivity.
Kokkos provides a C++ parallel programming model for performance portability, implemented as a C++ abstraction layer that includes parallel execution and data management primitives. RAJA provides C++ abstractions for parallel loop execution, with constructs to reorder, aggregate, tile, and partition loop iterations and to express complex loop-kernel transformations. RAJA’s companion projects Umpire and CHAI provide portable memory management and automatic data movement, respectively. Application and library developers can implement their code with Kokkos/RAJA, which maps their parallel algorithms onto the underlying execution mechanism by using existing parallel programming models such as OpenMP.
The Kokkos/RAJA team is focused on developing and optimizing backends for the Aurora and Frontier systems. These backends will ensure that libraries and applications built on the Kokkos/RAJA abstractions run, and achieve high performance, on these exascale systems without their developers having to change their code, even where an architecture requires its own custom programming mechanism.
Exascale computing systems are high performance computers that can perform at least 10¹⁸ calculations per second, such as Aurora at Argonne National Laboratory, Frontier at Oak Ridge National Laboratory, and the upcoming El Capitan at Lawrence Livermore National Laboratory. These systems require specialized software: code written for a standard HPC system does not transfer easily to exascale machines. Developers may need to rewrite their code, which presents complex challenges and a very large time investment.
RAJA, a software technology developed by the Exascale Computing Project (ECP), helps programmers make the necessary code transformations. RAJA lets developers write their code in C++, a versatile and commonly used programming language, and recompile it to run on different high performance computing systems without major changes to the source code. This gives developers flexibility: they can work in a familiar language and need to consider the backend details of a system only to the extent required for advanced performance tuning. For developers, retargeting code is largely a matter of substituting one C++ type in their code for another to indicate which system it will run on.
Used in at least eight ECP projects and many other Department of Energy laboratory applications, RAJA provides a significant return on investment when preparing codes for new computing platforms. In some cases, RAJA has enabled a more than 20-fold reduction in the effort needed to prepare applications for Lawrence Livermore’s upcoming El Capitan supercomputer.
RAJA will continue to be developed and to remain widely accessible after ECP closes out, extending its impact on Department of Energy computing platforms. Its use in university courses that teach parallel programming will also help train the next generation of computing experts.