The most challenging aspect of parallel programming is expressing the available parallelism and mapping it onto the potentially heterogeneous computational resources of the target system while achieving high performance. The PaRSEC task-based runtime supports the development of domain-specific languages and tools that simplify task-based programming and improve scientists’ productivity. It also provides a low-level runtime that executes the tasks by seamlessly leveraging the combined computing power of accelerators and manycore processors at any scale.

Project Details

PaRSEC helps application developers express dataflow parallelism by using domain-specific languages, tools, and maps, and it executes the resulting program on exascale systems with heterogeneous computational and memory resources. The team’s interaction with scientists focuses on building domain-specific languages that suit domain scientists’ needs and facilitate the expression of algorithmic parallelism with familiar constructs.

The runtime maps the resulting parallel tasks to the hardware and provides seamless support for heterogeneous architectures, accelerators, and data transfers between different memory hierarchies.
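
The dataflow idea described above can be illustrated with a minimal sketch. This is not PaRSEC code and does not use any PaRSEC API. The function `run_dataflow` and the task names are hypothetical, chosen only to show how a runtime can derive an execution order from declared data dependencies instead of from an order written by the programmer.

```python
from collections import deque

def run_dataflow(tasks):
    """Run tasks in dependency order (illustrative only, not the PaRSEC API).

    `tasks` maps a task name to (function, list of input task names).
    Each function receives the outputs of its inputs, in order.
    """
    # Count unmet inputs per task and record who consumes each output.
    pending = {name: len(deps) for name, (_, deps) in tasks.items()}
    consumers = {name: [] for name in tasks}
    for name, (_, deps) in tasks.items():
        for dep in deps:
            consumers[dep].append(name)

    # Tasks with no inputs can start immediately.
    ready = deque(name for name, count in pending.items() if count == 0)
    results, order = {}, []
    while ready:
        name = ready.popleft()
        func, deps = tasks[name]
        results[name] = func(*(results[d] for d in deps))
        order.append(name)
        # Completing a task releases the data its consumers wait on.
        for consumer in consumers[name]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return results, order

# Hypothetical four-task graph: two loads feed a multiply, which feeds an add.
tasks = {
    "load_a": (lambda: 2, []),
    "load_b": (lambda: 3, []),
    "multiply": (lambda a, b: a * b, ["load_a", "load_b"]),
    "add_one": (lambda x: x + 1, ["multiply"]),
}
results, order = run_dataflow(tasks)
print(results["add_one"], order)
```

In this toy version the ready queue is drained sequentially. A production runtime would instead hand ready tasks to worker threads or accelerators and move data between memories as needed, which is the part the text attributes to PaRSEC.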

The PaRSEC team focuses on (1) increasing programming flexibility by using domain-specific languages that benefit from optimized runtime components, architecture-aware coverage of all target architectures, and efficient data movement inside and outside a single memory hierarchy; (2) extending the programming system to new composable paradigms; and (3) providing a production-quality runtime with documentation, testing, packaging, and deployment. This work enables libraries and applications developed by the Exascale Computing Project (ECP) to use exascale systems efficiently in a pure dataflow programming environment. In that environment, domain scientists focus mainly on algorithmic aspects and leave the architectural details and optimizations, such as overlapping communication with computation and managing data movement, to the runtime that supports the programming paradigm.

Over the lifetime of the ECP, the PaRSEC team has drastically improved the runtime on multiple levels. At the lowest level, key elements were modularized and exposed for end-user control. Node-level task schedulers and GPU managers were designed to use hyperthreading to offload scheduling decisions. The communication subsystem was extended to take advantage of hardware support for remote memory access and to improve the general performance of distributed applications. Critical limitations in the internal representation used for task tracking and dependency tracking were removed by adopting scalable, efficient, open-addressing data structures suited to shared-memory parallelism on manycore architectures. Support for heterogeneous hardware was improved and now includes better memory management strategies, which allow problems many times larger than the available accelerator memory to be solved without a significant performance penalty. Proof-of-concept integrations with libraries and applications supported by the ECP show promising performance at large scale.
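
To make the memory management point concrete, the sketch below models an accelerator memory that holds fewer tiles than the problem contains. It is a toy under stated assumptions, not PaRSEC's implementation: the class `TileCache`, its least-recently-used eviction policy, and the tile sizes are all hypothetical, and the source does not say which eviction strategy PaRSEC uses.

```python
from collections import OrderedDict

class TileCache:
    """Toy model of a fixed-capacity accelerator memory (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # tile id -> data, ordered from least to most recently used
        self.resident = OrderedDict()
        self.transfers = 0  # number of host-to-device copies performed

    def get(self, tile_id, host):
        """Return a tile, copying it from host memory only if it is absent."""
        if tile_id in self.resident:
            self.resident.move_to_end(tile_id)  # mark as recently used
        else:
            if len(self.resident) >= self.capacity:
                # Assumed policy: evict the least recently used tile.
                self.resident.popitem(last=False)
            self.resident[tile_id] = host[tile_id]
            self.transfers += 1
        return self.resident[tile_id]

# Five tiles live on the host, but the device can hold only three at once.
host = {i: [i] * 4 for i in range(5)}
cache = TileCache(capacity=3)

total = 0
for tile_id in [0, 1, 2, 0, 3, 0, 4]:  # tile 0 is reused several times
    total += sum(cache.get(tile_id, host))

print(total, cache.transfers, list(cache.resident))
```

Seven accesses need only five transfers here because the frequently reused tile stays resident. Keeping reused data on the device while streaming the rest is one way a runtime can handle problems larger than accelerator memory without paying for a copy on every access.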

Principal Investigator(s):

Hartwig Anzt, University of Tennessee, Knoxville

Progress to date

  • The PaRSEC runtime has been continuously improved to support diverse exascale architectures and was integrated with other programming models/frameworks and with performance and correctness tools.
  • The team also designed programmatic interfaces to prefetch data on accelerators and to provide memory management advice to the accelerator engine, which improves scalability and performance.
  • The team dedicated effort to improving software quality and usability. Tutorial material and developer and user documentation were created to facilitate user adoption. To ensure that users have reliable access to all runtime system capabilities, continuous integration tools were streamlined in the development process.
  • PaRSEC enables other software to define dependencies between tasks in a portable way by taking over the scheduling and execution of computational tasks and data movement in hybrid environments. Users include the following.
    • MPQC: PaRSEC supports the MADNESS runtime (NWChemEx) and integrated heterogeneous-memory-aware algorithms via MADNESS and TiledArray.
    • DPLASMA: PaRSEC provides automatic support for heterogeneous environments for highly scalable and efficient dense linear algebra algorithms.
    • HiCMA: PaRSEC supports dynamic scheduling over heterogeneous resources for sparse and low-rank linear algebra and facilitates the implementation of multiprecision algorithms.
