“Forward” Projects Boost U.S. Leadership in Advanced Computing and Artificial Intelligence

11/07/24 — Caryn Meissner, Contributing Writer

A long-term investment strategy promotes technology innovation through government–industry partnerships to revolutionize high-performance computing and advance AI capabilities.

High-performance computing (HPC) has been an indispensable research tool for accessing physical realms that are difficult, or impossible, to reach through experiment alone. For several decades, the Department of Energy’s (DOE’s) Office of Science has deployed sophisticated HPC systems for solving the nation’s most pressing grand challenge problems in energy, climate change, and human health.[1] In addition, DOE’s National Nuclear Security Administration (NNSA) has adeptly applied HPC in support of key national security objectives, such as nuclear science and stockpile modernization and stewardship. Over time, HPC systems have become increasingly complex and capable, and as each new machine has come online, scientists and engineers have taken advantage of vast increases in compute power to accelerate scientific discoveries and engineering innovation.

In 2022, Oak Ridge National Laboratory’s (ORNL’s) Frontier machine, the first of DOE’s three planned exascale systems, debuted as the world’s first announced exascale platform—ushering in a new era of scientific and engineering HPC and opening avenues to scientific exploration never before achievable. Exascale systems, which can perform more than a quintillion floating-point operations per second (FLOP/s), can more realistically simulate the intricate processes involved in extremely complex applications used to study precision medicine, regional climate, additive manufacturing, biofuels production, materials discovery and design, and the fundamental forces of the universe.[2] Frontier and the more recently deployed Aurora exascale system at Argonne National Laboratory (ANL) achieved the #1 and #2 spots on the June 2024 Top500 list, respectively. (Notably, Frontier has topped this list since June 2022.) Later this year, El Capitan—NNSA’s first exascale system—sited at Lawrence Livermore National Laboratory (LLNL), is projected to achieve more than 2 ExaFLOP/s of computing capability.

The Fast-, Design-, and PathForward programs accelerated the development of critical technologies needed to deliver exascale computing capability to the nation. The Frontier exascale system (left) at Oak Ridge National Laboratory came online in 2022. The Aurora exascale system (middle) at Argonne National Laboratory has been deployed and is available to early science users. Later this year, El Capitan (right), NNSA’s first exascale system, sited at Lawrence Livermore National Laboratory, will come online to strengthen national security research.

The transition to exascale, which is 1,000 times faster than petascale, was far more than an evolutionary step in computing. The more unknowns in the mathematical calculations and the higher the complexity of the problems to be solved, the more compute power is needed to solve them, and early on, computational experts could see a problem on the horizon. With every generation of new HPC systems, fundamental principles such as Moore’s Law and Dennard Scaling were hitting their limits. “For years as technology advanced, processors became faster, yet costs remained constant,” says Terri Quinn, LLNL’s deputy associate director for HPC. Eventually, transistors could be manufactured at such small scales that they were becoming less efficient, generating more heat, and demanding more power during operation. Following the conventional approach to increasing processor speeds became impractical and prohibitive to achieving the desired three-orders-of-magnitude improvement in computing capability. The question became: how could computing capability be advanced without taking drastic steps, such as building a nuclear power plant, to produce sufficient power for exascale? The answer lay in a public–private partnership between government and industry to pioneer, accelerate, and deliver critical HPC technologies.

The core of this effort was a series of DOE Office of Science and NNSA co-sponsored programs, called FastForward, DesignForward, and PathForward, that were successively put in place over several years to spur innovation in exascale hardware and software research and development and help address key exascale challenges, such as energy efficiency, advanced processors and memory, reliability, resiliency, and interconnectivity. “Industry involvement and engagement was an essential component of getting us to exascale and enabling us to use it,” says Bronis de Supinski, chief technology officer for Livermore Computing, who served as a technical lead in all three programs and was the primary technical lead and control account manager for PathForward. Orchestrated as part of a longer-term vision and investment strategy that would become an integral part of the DOE Exascale Computing Initiative (ECI) and its hallmark Exascale Computing Project (ECP), these programs would drive the exascale innovations needed to support national interests, provide options for subsequent system procurements, and boost U.S. economic competitiveness.

Engagement: An Essential Ingredient

Referred to collectively as the *Forward (Star-Forward) programs, FastForward, DesignForward, and PathForward were instituted to bring in industry leaders alongside government experts early on to participate in what were seen as high-risk, long-term research and development (R&D) efforts. Typically, in the business computing sector, technology advancement is directed at market share, and development paths are relatively flexible, in that if an idea becomes unviable, new directions are quickly forged. This environment is a striking juxtaposition to the longer-range R&D needed to bring ever-more advanced HPC systems online for big science and engineering. These systems are 5 to 10 years in the making, and their delivery requires more rigid R&D follow-through to ensure that hardware and software milestones will be achieved.

Recognizing the need for vendor expertise and the departure from standard business drivers, the *Forward programs were strategic investments that would cover specific research, design, and engineering costs to help critical hardware and software technologies mature from ideas toward commercial products.[3] The DOE Office of Science’s Advanced Scientific Computing Research (ASCR) and NNSA’s Advanced Simulation and Computing (ASC) programs would fund up to 60 percent of the cost of the research, and vendors would contribute a cost share of at least 40 percent. By investing their own funds, industry participants earned the right to retain any intellectual property from the R&D programs. Hal Finkel, director of DOE’s Computational Science Research and Partnership Division, who while at ANL served as a technical representative (TR) during PathForward, says, “DOE’s general philosophy is to invest in computing technologies that are going to be first of a kind but not one of a kind. We want products that benefit the science and technology enterprise broadly and that encourage marketplace innovation because that is how we help advance our national competitiveness in computing.” On the flip side, DOE benefits from the investment in tangible ways. “DOE investments have to provide returns,” says Si Hammond, who served as a TR for PathForward during his time at Sandia National Laboratories and who is now a federal program manager for NNSA’s ASC Program. “The return on these programs was getting technology to mature and become available as products much sooner for high-performance computing than we would have otherwise, particularly with respect to exascale machines.”

Vendors for the *Forward programs were awarded funds through rigorous selection processes. Each one involved an initial “Request for Information” (RFI), in which vendors could provide details on how they could contribute toward the exascale goal given the projected timeframe to deployment and how they could help address any perceived technology gaps. This input gathering was followed by a formal “Request for Proposals” (RFP) and extensive reviews and evaluations of the proposals against key criteria. “The RFPs were a competitive process across many vendors,” says Matt Leininger, who leads LLNL’s Messaging Working Group for Advanced Technology Projects and served as a *Forward TR. “The participating DOE laboratories had their own subject matter experts review the proposals and provide feedback and technical recommendations to an overarching executive team who made the final determinations on funding and prioritization of the technical work.”

The *Forward programs were individually focused on key aspects of exascale hardware (plus related software in some cases) and built upon one another to bring about a holistic transformation of current state-of-the-art HPC architectures and systems engineering. “If we wanted to have an exascale computer by the early 2020s, then we needed to start years earlier to hit that timeframe,” says Quinn. Thus, while the groundwork for ECI was underway, the green light was given to have ASCR and ASC together fund preliminary investments that would accelerate exascale technology maturation in the interim. Quinn says, “FastForward was an offensive maneuver to tackle the problem early.”

Getting a Head Start

The FastForward program awards were announced in 2012 and funded five computing companies—AMD, IBM, Intel, NVIDIA, and Whamcloud (which later became part of Intel). Called FastForward 1, this first round of the *Forward program efforts provided $62.5 million to advance the development of the basic, yet critical, computing elements that would be needed for building exascale systems, including energy-efficient, low-power processors; various processor and memory designs; and storage and input/output (I/O) communication solutions. Ultimately, the technology designs and potential products were intended to reduce economic and manufacturing barriers to constructing systems capable of sustaining more than 1 ExaFLOP/s, including delivery of next-generation capabilities within a reasonable energy footprint.[4]

Beyond understanding the fundamental components needed for an exascale machine, R&D would also be needed to determine the overall infrastructure demands. Scott Atchley, chief technology officer for the National Center for Computational Science and the Oak Ridge Leadership Computing Facility, who served as the lab TR for AMD’s node architecture work, says, “FastForward 1 was focused on processor design; essentially, what would vendors need to do to make a processor for an exascale system. FastForward 2 broadened the scope to look at node (i.e., server) architectures and memory technologies. The first DesignForward program (DesignForward 1) tackled the challenges for designing and building an interconnect capable of scaling to exascale.” In 2013, DOE announced the DesignForward 1 program, awarding $25.4 million in funding to AMD, Cray, IBM, Intel, and NVIDIA for the design and evaluation of sophisticated energy-efficient, high-bandwidth interconnect networks, which tie together the hundreds of thousands of processors and minimize the time to move data among them, along with the software required to manage and access them. R&D also extended into improving rack design and power distribution, machine temperature regulation, and the overall conceptual design of a complete system. Later on, DesignForward 2 focused on whole system optimization incorporating elements from FastForward 1 and 2 as well as DesignForward 1.

Together, the FastForward and DesignForward programs brought the exascale picture into focus, and the successful outcomes of the initial efforts led to second rounds for both. In 2014, FastForward 2 funding awards provided $99.2 million for further development of extreme-scale supercomputer technology and specifically emphasized memory and node research. Atchley says, “The FastForward 2 goal was to take those exascale processor architectures designed during the first FastForward and figure out how to make an actual node with them.” In addition to AMD, IBM, Intel, and NVIDIA, Cray was selected to join the R&D effort for node research after focusing primarily on open network protocol standards as part of DesignForward 1. The following year, in 2015, DOE announced an additional $10 million for vendors—AMD, Cray, IBM, and Intel Federal—for DesignForward 2, which complemented and built upon the preliminary work in FastForward 1, FastForward 2, and DesignForward 1. Work was also done to understand and determine how suggested changes in system architectures could affect the scientific applications run on the next generation of supercomputers.

Ultimately, the FastForward and DesignForward programs brought the necessary momentum to make the final push to exascale. For example, FastForward 1 and 2 were instrumental in the research and advanced development of the AMD processor, accelerator, and node designs that became integral parts of advanced exascale components for Frontier and El Capitan. These programs also laid the groundwork for the Intel-developed Distributed Asynchronous Object Storage (DAOS) filesystem—a revolutionary, open-source I/O system designed from the ground up for massively distributed non-volatile memory that is deployed on Aurora to support data-intensive workloads.[5] Under DesignForward 1 and 2, work matured on the Slingshot interconnect—a high-performance network designed for exascale-era computers, which began at Cray and continued at HPE after the 2019 merger. The programs also enabled investigation into system architectures, including aspects of what would become the Cray Shasta system architecture, which debuted in 2018 and forms the basis of the HPE Cray EX supercomputer. However, it would be the final *Forward program that would bring several promising technologies to the exascale finish, blazing the “path forward” to exascale delivery.

ECP Brings It Home

The DOE ECI, a formalized, large-scale partnership between the Office of Science and NNSA, was formed in 2016 to accelerate research, development, acquisition, and deployment projects to deliver exascale computing capability to the DOE national laboratories by the early to mid-2020s. One of the main pillars of ECI was to stand up ECP, a seven-year-long, $1.8 billion, multi-institutional effort involving more than 2,000 computing and domain experts from all DOE research laboratories, academia, and industry partners focused on delivering specific applications, software products, and outcomes on DOE computing facilities. Under the aegis of ECP, DOE once again would invest in exascale technology innovation, this time through the PathForward program. With PathForward, the awards would be larger, the commitment more rigorous, and the stakes much higher. Quinn says, “The objectives for PathForward were based on what we learned from the previous FastForward and DesignForward programs, but PathForward was all about realizing the exascale goal, taking the concrete ideas and designs for all the components and determining how to make an actual product out of it.”

The PathForward RFP was released in 2016 and focused on ways to improve application performance and developer productivity while maximizing the energy efficiency, reliability, and resilience of exascale systems. “To make the systems cost effective, we needed to drive up the energy efficiency and drive down the energy consumption,” says de Supinski. “A lot of work was done in terms of integration and other solutions—making more efficient transistors, enabling shorter distances for data movement, and other concepts that significantly improved joules-per-compute capability delivered, and the data motion required.” Other big challenges were reliability and resiliency, ensuring systems with hundreds of thousands of components could continue to operate despite individual component failures.

To meet these challenges, six recipients of PathForward contracts—AMD, Cray, HPE, IBM, Intel, and NVIDIA—were announced in 2017. Their work covered a broad range of technologies including processors and their instruction sets, memory interfaces, I/O node design, advanced interconnects with software to drive them, and system architecture components. These investments—totaling more than $260 million of government funding paired with over $170 million of industry funding over three years—accelerated the development and commercial availability of exascale technologies and substantially enhanced the procurement landscape for early exascale systems.[6] The program was separate from, and initiated before, the RFP to build the exascale systems along with the associated non-recurring engineering (NRE) support (called the CORAL-2 procurement). “The idea is that the NRE is more closely tied to the system build whereas the longer lead-time work, like that done through PathForward, is related to ideas that may or may not come to fruition in the actual built systems,” says de Supinski. While the first three DOE exascale systems do not include technologies from all PathForward awardees, the research increased the competitiveness of several RFP responses.[7]

Overall, ECP’s PathForward activities were a stunning success, meeting all their performance objectives. The results directly impacted systems procured by DOE, including the Perlmutter and Crossroads pre-exascale systems deployed at NERSC and Los Alamos National Laboratory, respectively, and ensured the deployment of the three ECI-funded exascale-capable systems (Frontier, Aurora, and El Capitan). The program also met the goal of contributing to U.S. competitiveness as technologies advanced through PathForward have become part of product roadmaps and have been incorporated into the broader market. Notably, the HPC-enabled, Ethernet-compliant HPE Slingshot network was enhanced, partially through the continued development of the company’s Cassini network interface cards with the related software and the Rosetta application-specific integrated circuits, to provide improved workload performance and scalability. Slingshot has been combined with the flexible and customizable Shasta system architecture to accommodate the unique, heterogeneous build specifications of all three DOE exascale systems.

For AMD, PathForward funding of the processor designs and node architectures was essential to the success of Frontier, and early ideas have come full circle. Indeed, El Capitan’s compute nodes are powered by the cutting-edge AMD Instinct MI300A accelerated processing units (APUs), initially conceptualized during the FastForward and DesignForward programs. These advanced processors integrate a tightly coupled central processing unit (CPU) and graphics processing unit (GPU) into a single package, making for extremely fast compute with enhanced HPC and artificial intelligence (AI) acceleration. Mike Schulte, a senior fellow at AMD Research who participated in all the *Forward programs, says, “Our vision of the *Forward programs was bringing our CPUs, GPUs, high-bandwidth memory, and I/O together in the same package. And with the MI300A, we introduced the world’s first APU that was optimized for high-performance computing and machine learning.”

PathForward also helped fund Intel’s development of several core DAOS features that are deployed on the Aurora system, including non-volatile memory express solid state drive support, scalable service monitoring, and a more efficient way to wire up the DAOS service. “DAOS leads the top of the IO500 production rankings, and it was a technology that gained momentum and validation through the *Forward programs,” says Intel chief architect Olivier Franza. “Aurora will be among the first systems to use it and by far the largest.”[8] Moreover, PathForward funding allowed exploration of a new model that integrates storage nodes directly into the compute fabric without sacrificing storage resilience or reducing compute resources.[9] Says Franza, “These programs allow for the analysis of a lot of revolutionary ideas…and the support of government funding provides the stability to pursue them.”

Close Collaboration and “Proxy-imity”

One of the key advantages of the *Forward programs was the collaborative environment they created. TRs from the labs—and in the case of PathForward, additional ECP representatives—worked together with vendors to advance objectives. TRs helped negotiate the contracts, tracked milestones, and provided insight into the challenges and needs of DOE. Companies, in turn, looked to them for information on use cases, expected workloads, and feedback on deliverables and expectations. Interactions were facilitated through regular meetings, presentations, workshops, training sessions, and “hackathons” where colleagues could establish coding practices and work through issues. “Hackathons allowed government participants to collaborate with us and run their workloads on our simulators to obtain preliminary results. We would also give presentations on the progress we’d made on specific technologies,” says Simon Steely, a senior principal engineer at Intel. “This method of work was invaluable to understanding how workloads would operate using new approaches.”

One of the most prominent tools for enabling co-design and expediting technology advancements was to provide vendors with proxy apps—small, simplified codes that allow application developers to share important features of larger production applications without forcing collaborators to assimilate large and complex code bases. Proxy apps are frequently used to represent performance-critical kernels, programming model use cases, communication patterns, or essential numerical algorithms.[10] “Proxy apps allowed DOE to use examples of our code to help focus and prioritize discussions with vendor research teams, which allowed developers and users to narrow down what aspects of the code needed to be focused on,” says Atchley. The utility of proxy apps was fully realized with PathForward. Through ECP, a curated collection of proxy apps was developed for use by application teams, co-design centers, and vendors to drive collaborations and achieve exascale solutions. Gabe Loh, a senior fellow at AMD Research who participated in all the *Forward programs, says, “The proxy apps were not just pieces of code to represent applications, they acted as vehicles for co-design and collaboration. They were useful for us to understand everything from the underlying physics or science being modeled to more algorithmic, code, and memory access needs. They also served as a mechanism for facilitating discussions between us and experts at the DOE national labs to troubleshoot issues and iterate on potential solutions.”
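To make the idea of a proxy app concrete, the following is a minimal, hypothetical sketch and not one of the ECP proxy apps. It assumes the "performance-critical kernel" of some larger code is a three-point Jacobi stencil, and it reports a rough FLOP rate for that kernel alone. The function names, problem size, and FLOP-counting convention are all illustrative assumptions.

```python
import time


def jacobi_step(u):
    """One Jacobi sweep of a 1D three-point stencil; boundary values stay fixed."""
    new = u[:]  # copy so the two boundary entries are carried over unchanged
    for i in range(1, len(u) - 1):
        # Counted here as 3 floating-point operations: 2 adds and 1 divide.
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


def run_proxy(n=10_000, sweeps=20):
    """Run the kernel repeatedly; return final state, FLOP count, elapsed seconds."""
    u = [0.0] * n
    u[0] = 1.0  # hypothetical "hot" left boundary
    start = time.perf_counter()
    for _ in range(sweeps):
        u = jacobi_step(u)
    elapsed = time.perf_counter() - start
    flops = 3 * (n - 2) * sweeps  # interior points only
    return u, flops, elapsed


if __name__ == "__main__":
    _, flops, elapsed = run_proxy()
    if elapsed > 0:
        print(f"{flops} FLOPs in {elapsed:.4f} s "
              f"({flops / elapsed / 1e6:.2f} MFLOP/s)")
```

A real proxy app would typically be written in a compiled language and would also exercise the communication patterns and programming models the article mentions. The point of the sketch is only that a few dozen lines can stand in for the memory-access and arithmetic behavior of a much larger code base.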

The benefits of these government–industry collaborations directly impact technology innovation. AMD senior fellow Mike Schulte says, “A lot of the work on our open-source software stack was facilitated through the co-design paradigm. The ability to get early feedback and have conversations about what was needed, what was useful, and what the pain points were, helped drive our overall software strategy.” Yet, the benefits extend beyond technology advancement; they also provide an exceptional platform for another key aspect of HPC prosperity: building the workforce pipeline. Hammond says, “These programs are a magnet for bringing talent to DOE laboratories as the work is exciting and unusual. Our teams get to work with several companies at the same time, rather than just one specific group. Opportunities like this are rare outside the labs.” Hammond’s sentiments are echoed by vendors. Mike Ignatowski, AMD senior fellow and *Forward collaborator, adds, “Long lead-time projects like the *Forward programs give stability to the HPC field. These programs have provided a space for those interested in HPC to focus on career growth.”

The Takeaways

Looking to the future, the success of ECP and the *Forward programs provides valuable lessons learned. As an example, the overall structure and funding paradigm offered participants something that isn’t typically found in traditional R&D approaches—the breathing room to take a chance on exploratory concepts. Scott Pakin, a senior computer scientist at Los Alamos National Laboratory who served as a TR during the *Forward programs and led the ECP’s Hardware Evaluation project, says, “For some vendors, especially the smaller ones, the *Forward programs enabled them to try new ideas, in particular ones they may not have tried otherwise. For the larger vendors, the programs provided an opportunity to explore concepts earlier than originally planned because they prioritized the work we were asking them to do.” Matthew Bolitho, director of architecture at NVIDIA, agrees, “PathForward gave us a mandate to do work now on technologies that we were thinking about in the future and helped us get it done.” IBM fellow Jim Sexton notes that this approach also provided a degree of “free energy” to spur innovation. “With the *Forward programs, we had time to think of ideas without having to be focused on what we were going to sell in the next three months. The programs did a great job at helping many different players think through long-term opportunities and innovation in a way to develop advanced computing at scale.”

Companies also recognize the benefit of these programs for understanding the challenges that lie ahead. Loh says, “HPC often acts as a kind of crystal ball to what may be coming down the technology pipeline, so you can glean interesting insights into where the broader industry might be headed. Developing technology to address HPC problems has great trickle-down effects into all other types of products. The impact radius of the *Forward investments extends beyond just the HPC realm.” Schulte adds, “One of the key takeaways is how impactful the *Forward programs were on our overall HPC and AI competitiveness…AMD not only created great systems for the Department of Energy, but greatly enhanced U.S. overall competitiveness in high-performance computing, AI, and energy-efficient computing.”

The *Forward programs set an extremely ambitious energy efficiency goal of 20 megawatts per ExaFLOP for the vendors. While all the vendors significantly improved the energy efficiency of their chips, which saves energy at every data center in the world, AMD was able to meet and exceed its goal. Al Geist, project director of the Frontier project and ECP chief technical officer, points out, “When Frontier was deployed in 2022, it was not only the first exascale computer in the world; it was also the most energy-efficient computer in the world, achieving the #1 spot on the Green500 list.” Thanks to AMD’s aggressive work in the *Forward programs, Frontier used only 14.5 megawatts per ExaFLOP, an improvement of more than 400 percent over previous-generation systems.
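The power figures above can be restated in GFLOP/s per watt, the efficiency unit used by the Green500 list. The short sketch below does that conversion for the 20-megawatt goal and the 14.5-megawatt Frontier figure quoted in the text. It assumes "per ExaFLOP" means per 10^18 FLOP/s of sustained performance, and the function names are illustrative.

```python
def gflops_per_watt(megawatts_per_exaflop):
    """Convert megawatts per ExaFLOP/s into GFLOP/s per watt."""
    flops_per_second = 1e18                  # 1 ExaFLOP/s
    watts = megawatts_per_exaflop * 1e6      # megawatts to watts
    return flops_per_second / watts / 1e9    # FLOP/s per watt to GFLOP/s per watt


def power_saving_vs_goal(actual_mw, goal_mw):
    """Fraction of the goal's power budget left unused at equal performance."""
    return 1.0 - actual_mw / goal_mw


goal_mw = 20.0      # *Forward target quoted in the text
frontier_mw = 14.5  # Frontier figure quoted in the text

print(f"Goal:     {gflops_per_watt(goal_mw):.1f} GFLOP/s per watt")
print(f"Frontier: {gflops_per_watt(frontier_mw):.1f} GFLOP/s per watt")
print(f"Power below goal: {power_saving_vs_goal(frontier_mw, goal_mw):.1%}")
```

Under these assumptions, the 20-megawatt goal corresponds to 50 GFLOP/s per watt, and 14.5 megawatts per ExaFLOP corresponds to roughly 69 GFLOP/s per watt, or 27.5 percent less power than the goal for the same performance.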

The complex software, applications, and technologies that have come out of the *Forward programs, and ECP, are truly monumental, and their implications on the world and society are just beginning. Technical work on ECP ended in March 2024, and at its completion could boast development of 25 applications, more than 70 software products, and an integrated HPC software stack containing more than 100 libraries, all of which are deployable on exascale systems. Hammond says, “We have only just begun science runs on Frontier, Aurora is still in early stages of deployment, and we are awaiting El Capitan to come on. It’s difficult to imagine everything that these systems will be able to show us in 2 to 3 years, but just the early results from Frontier show absolutely exquisite science that we could never have otherwise done and was hard to imagine even 5 or 6 years ago.”

By looking beyond conventional technology roadmaps and developing strong government–industry partnerships, ECP and the *Forward programs have provided a template for such programs in the future. Chris Zimmer, head of ORNL vendor engagements, says, “The success of the *Forward programs is needed again in the post-exascale era and that is why we are creating a series of vendor RFPs called ‘New Frontiers’ that will start in late 2024 and run through 2030.” Increased evaluation of AI capabilities, including AI accelerators, is also an area teeming with possibilities. Hammond says, “This is an immensely rich space for the national labs and the vendors to find new paths for keeping us at the crest of technology, but also pushing forward with what we can do with scientific computing as AI evolves.” ECP and the *Forward programs have shown that by expanding partnerships with industry, the depth and breadth of expertise and resources across the HPC community can be woven together in a way that yields mutual benefit and can move the needle on technology innovation from evolutionary to revolutionary. NVIDIA’s Bolitho says, “We need to build off the momentum the *Forward programs provided.”

In retrospect, the *Forward programs, sustained over 8 years, were a novel approach to R&D, and their implications are manifold. “The *Forward programs have demonstrated that by being focused far enough into the future, we can have a broad and high-impact influence on industry for the portion of the market we represent. Not many other U.S. agencies invest in companies like this; DOE is really a driver,” says Quinn. “A little investment today can make an enormous difference in the future.”

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-TR-867165.

[1] https://www.energy.gov/science/ascr/advanced-scientific-computing-research

[2] https://www.exascaleproject.org/about/

[3] https://science.osti.gov/-/media/ascr/ascac/pdf/meetings/202004/ASCR_at_40_History-ASCAC_APRIL_23_2020.pdf

[4] https://asc.llnl.gov/exascale/fast-forward

[5] https://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/high-performance-storage-brief.pdf

[6] https://science.osti.gov/-/media/ascr/ascac/pdf/meetings/202004/ASCR_at_40_History-ASCAC_APRIL_23_2020.pdf

[7] https://www.osti.gov/servlets/purl/1845203

[8] https://www.intel.com/content/www/us/en/newsroom/news/architecting-future-supercomputing.html#gs.bkqwit

[9] https://www.osti.gov/servlets/purl/1845203

[10] https://www.exascaleproject.org/research-project/proxy-apps/