The most challenging aspect of parallel programming is expressing the available parallelism and then mapping it onto the potentially heterogeneous computational resources of the target system while achieving high performance. The PaRSEC task-based runtime supports the development of domain-specific languages and tools that simplify task-based programming and improve scientists’ productivity, and it provides a low-level runtime that seamlessly leverages the combined computing power of accelerators and manycore processors at any scale when executing the tasks.
PaRSEC helps application developers express dataflow parallelism through domain-specific languages, tools, and maps, and it executes the resulting program on exascale systems with heterogeneous computational and memory resources. The team’s interaction with scientists focuses on building domain-specific languages that suit domain scientists’ needs and facilitate the expression of algorithmic parallelism with familiar constructs.
The runtime maps the resulting parallel tasks to the hardware and provides seamless support for heterogeneous architectures, accelerators, and data transfers between different memory hierarchies.
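To make the dataflow model concrete, the sketch below shows the core idea in a few lines: a task becomes ready only when all of its inputs have been produced, and completing a task releases its successors. This is a generic, single-threaded illustration, not PaRSEC’s API; the function `run_dataflow` and the task names are hypothetical. A real runtime would dispatch ready tasks concurrently to cores and accelerators and handle the data transfers itself.

```python
from collections import deque


def run_dataflow(tasks, deps):
    """Run tasks in a dataflow order (hypothetical single-threaded sketch).

    tasks: dict mapping a task name to a zero-argument callable (its body)
    deps:  dict mapping a task name to the names of the tasks whose
           outputs it consumes
    Returns the order in which the tasks were executed.
    """
    # Number of inputs each task is still waiting for.
    remaining = {name: len(deps.get(name, [])) for name in tasks}
    # Reverse edges: which tasks consume the output of each task.
    successors = {name: [] for name in tasks}
    for name, preds in deps.items():
        for pred in preds:
            successors[pred].append(name)

    # Tasks with no unsatisfied inputs can start immediately.
    ready = deque(name for name in tasks if remaining[name] == 0)
    order = []
    while ready:
        name = ready.popleft()
        tasks[name]()              # execute the task body
        order.append(name)
        # Completing a task satisfies one input of each successor.
        for succ in successors[name]:
            remaining[succ] -= 1
            if remaining[succ] == 0:
                ready.append(succ)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle: some tasks never became ready")
    return order


# Diamond-shaped graph: B and C both consume A's output; D consumes both.
log = []
tasks = {n: (lambda n=n: log.append(n)) for n in "ABCD"}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
order = run_dataflow(tasks, deps)
print(order)  # ['A', 'B', 'C', 'D']
```

In the diamond example, B and C have no dependency on each other, so a parallel runtime could execute them simultaneously on different resources; the sequential loop here only demonstrates how dependency counts drive readiness.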
The PaRSEC team focuses on (1) increasing programming flexibility by using domain-specific languages that benefit from optimized runtime components, architecture-aware support for all target architectures, and efficient data movement inside and outside a single memory hierarchy; (2) extending the programming system to new composable paradigms; and (3) providing a production-quality runtime with documentation, testing, packaging, and deployment. This work enables libraries and applications developed by the Exascale Computing Project (ECP) to use exascale systems efficiently in a pure dataflow programming environment, while domain scientists focus mainly on algorithmic aspects and leave architectural details and optimizations, such as overlapping communication with computation and managing data movement, to the runtime that supports the programming paradigm.
Over the lifetime of the ECP, the PaRSEC team has substantially improved the runtime at multiple levels. At the lowest level, key elements were modularized and exposed for end-user control. Node-level task schedulers and GPU managers were designed to exploit hyperthreading for offloading scheduling decisions. The communication subsystem was extended to take advantage of hardware support for remote memory access and to improve the general performance of distributed applications. Critical limitations in the internal representation used for task tracking and dependency tracking were removed by adopting scalable, efficient open-addressing data structures suited to shared-memory parallelism on many-core architectures. Support for heterogeneous hardware was improved and now includes better memory management strategies, which allow problems many times larger than the available accelerator memory to be solved without a significant performance penalty. Proof-of-concept integrations with ECP-supported libraries and applications show promising performance at large scale.
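For readers unfamiliar with open addressing, the sketch below shows the general technique applied to dependency tracking: each task key maps to a count of inputs not yet satisfied, stored in flat arrays and located by linear probing rather than through linked buckets. The text above does not describe PaRSEC’s actual layout, so everything here is an assumption: the class `DepTable`, its methods, the tuple-shaped task keys, and the 0.75 load-factor threshold are hypothetical. The sketch is single-threaded and omits deletion; a shared-memory runtime would instead update slots with atomic operations.

```python
class DepTable:
    """Hypothetical open-addressing (linear probing) table that maps a task
    key to the number of its inputs that are still unsatisfied."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=8):
        # Capacity is kept a power of two so that "& mask" replaces modulo.
        self._keys = [self._EMPTY] * capacity
        self._vals = [0] * capacity
        self._size = 0

    def _slot(self, key):
        """Return the slot holding key, or the empty slot where it belongs."""
        mask = len(self._keys) - 1
        i = hash(key) & mask
        while self._keys[i] is not self._EMPTY and self._keys[i] != key:
            i = (i + 1) & mask  # linear probing: try the next slot
        return i

    def _grow(self):
        """Double the capacity and re-insert every live entry."""
        live = [(k, v) for k, v in zip(self._keys, self._vals)
                if k is not self._EMPTY]
        self._keys = [self._EMPTY] * (2 * len(self._keys))
        self._vals = [0] * len(self._keys)
        self._size = 0
        for k, v in live:
            self.insert(k, v)

    def insert(self, key, count):
        """Register a task with its number of pending inputs."""
        if (self._size + 1) * 4 > len(self._keys) * 3:  # keep load <= 0.75
            self._grow()
        i = self._slot(key)
        if self._keys[i] is self._EMPTY:
            self._keys[i] = key
            self._size += 1
        self._vals[i] = count

    def satisfy(self, key):
        """Record one satisfied input; return True when the task is ready."""
        i = self._slot(key)
        if self._keys[i] is self._EMPTY:
            raise KeyError(key)
        self._vals[i] -= 1
        return self._vals[i] == 0


# Hypothetical task keys of the form (kernel, m, n, k).
table = DepTable()
for k in range(3):
    table.insert(("GEMM", 2, 1, k), 2)
first = table.satisfy(("GEMM", 2, 1, 0))   # one input still missing
second = table.satisfy(("GEMM", 2, 1, 0))  # all inputs present: ready
print(first, second)  # False True
```

Keeping keys and counts in contiguous arrays avoids per-entry allocation and pointer chasing, which is one reason open addressing is often preferred for large, frequently updated tables.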