By Scott Gibson
Hi. In this podcast we explore the efforts of the Department of Energy’s (DOE’s) Exascale Computing Project (ECP)—from the development challenges and achievements to the ultimate expected impact of exascale computing on society.
This is the third episode in a series on Frontier, the nation’s first exascale supercomputer. Our guest is Rick Griffin of Oak Ridge National Laboratory (ORNL). Rick is a veteran electrical engineer who has helped install more than fifteen supercomputing systems at ORNL. With each design of electrical infrastructure for a new system, he says, comes greater know-how that can be applied to the next system. His latest electrical design challenge involves the unprecedented power requirements of the Oak Ridge Leadership Computing Facility’s Frontier—the nation’s first exascale-class supercomputer. I interviewed Rick on September 29, 2021.
Our topics: An overview of designing the electrical infrastructure for Frontier, the value of communication and open-mindedness in finding solutions to challenges, Rick’s assessment of progress, and more.
Gibson: We welcome Rick Griffin to the program. Rick, please give us a synopsis of your role as electrical engineering specialist at ORNL and a description of what your responsibilities are in preparing for Frontier, the nation’s first exascale supercomputer.
Griffin: Thank you, Scott. As an electrical engineering specialist, I am primarily responsible for working with customers to generate different configurations of IT equipment and room layouts for different IT systems. We get information on IT equipment from vendors and use that to generate reasonably detailed layouts. With that information and information from other sources, we can size the electrical and cooling systems that will be needed to operate a system. That allows us to identify the quantity and sizes of equipment and how big an impact each system will have on our facilities. For larger systems, this early effort is needed to ensure that we don’t purchase a system that is too big for the available space or that uses too much power or cooling.
During the procurement process for the IT equipment, I’ll recommend requirements for the IT equipment electrical systems that incorporate lessons learned from years of experience with high-performance computing and then review submittals to see if these requirements are addressed. Preparation of the facility for a new IT system begins next, and to support that effort I generate statements of work describing what the design-build contractor is to do. I then review proposals and estimates from the contractor and iterate with the contractor over requirements and design approaches that will provide a design and installation that optimize reliability, safety, and efficiency. This iterative process continues throughout the construction phase of the project to accommodate changes and incorporate better ideas. When the installation is completed and in operation, I help investigate any problems that occur. The solutions and fixes that come out of these investigations get added to my list of lessons learned for the next project. Basically, this is what I did to prepare for Frontier.
Gibson: What’s involved in designing the electrical infrastructure for Frontier?
Griffin: First, we had to determine the maximum power the machine and its support systems would draw. We knew early on it would be in the 30MW range so we assumed a 40MW distribution system would be needed. Next, we had to determine how to get that much power from our existing substation while making sure the power available from the substation after Frontier would be accessible for future IT and other site loads. This effort resulted in a separate construction project to install two new 13.8kV overhead power lines from the substation to the computer facility and to modify the substation to accommodate these new feeders.
The facility where Frontier is being installed had two 13.8kV feeders supplying it, but these two feeders didn’t have the capacity to supply the megawatts required for Frontier and other facility support and IT loads. With a good idea of the amount of power we would have to supply, we identified office space around the computer room where Frontier would be located that would have to be converted to utility space to house unit substations to supply power to the computers. Several things had to be considered in preparing the requirements for the electrical distribution system. Equipment and circuits were configured to maximize the availability of what is essentially a Tier 1 system. For example, a 480V/5000A busway was used instead of 11 conduits each with a total of 110 cables between each unit substation and switchboard that distribute power to the computer racks. This minimizes the number of connections and cable exposure.
Another example is the use of individual cables from switchboards to computer racks to supply power instead of a plug-in busway. If a cable has a fault, it affects one circuit and can be isolated and repaired without having to take down a group of racks. We did extensive modeling of the distribution system to minimize arc flash hazards and collateral damage in the event of a fault, to ensure that voltage drop wouldn’t be a problem, and to optimize coordination between overcurrent devices. Another consideration that had to be addressed was the routing of cables from switchboards to computer racks in the ceiling space of the computer room. Most of the cooling water piping is under a 3-foot raised floor, and all the electrical distribution, except for a few circuits, is overhead. With a total of 77 compute racks and four 150A circuits per rack, we had a total of 308 circuits to route to the computers. This routing had to be done in a way that would allow expansion of the number of racks for future upgrades or new systems.
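Editor's note: the figures quoted in this answer can be checked with a little arithmetic. The Python sketch below is only an illustration, not part of ORNL's design modeling. It uses the numbers Griffin gives (30MW expected load, 40MW distribution, a 480V/5000A busway, 77 racks with four circuits each). It assumes the busway is a balanced three-phase circuit and that 480V is the line-to-line voltage, which the interview does not state. The function name `three_phase_mva` is hypothetical.

```python
import math

# Figures quoted in the interview
EXPECTED_LOAD_MW = 30      # early estimate of the machine's draw
DISTRIBUTION_MW = 40       # assumed size of the distribution system
BUSWAY_VOLTS = 480         # busway voltage rating
BUSWAY_AMPS = 5000         # busway current rating
COMPUTE_RACKS = 77
CIRCUITS_PER_RACK = 4


def three_phase_mva(volts: float, amps: float) -> float:
    """Apparent power in MVA of a balanced three-phase circuit.

    Assumption (not stated in the interview): the circuit is three-phase
    and `volts` is the line-to-line voltage, so S = sqrt(3) * V * I.
    """
    return math.sqrt(3) * volts * amps / 1e6


# Number of rack circuits that have to be routed overhead
total_circuits = COMPUTE_RACKS * CIRCUITS_PER_RACK

# Nameplate apparent power of one busway under the three-phase assumption
busway_mva = three_phase_mva(BUSWAY_VOLTS, BUSWAY_AMPS)

# Margin of the distribution system over the expected load
headroom = DISTRIBUTION_MW / EXPECTED_LOAD_MW - 1

if __name__ == "__main__":
    print(f"Rack circuits to route: {total_circuits}")
    print(f"Busway rating: {busway_mva:.2f} MVA")
    print(f"Distribution headroom: {headroom:.0%}")
```

Under those assumptions, the script reproduces the 308 circuits mentioned above, puts one busway at roughly 4.16 MVA, and shows that the 40MW distribution system leaves about a third of margin over the 30MW estimate.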
Gibson: Your work no doubt entails a lot of problem-solving. How do you approach tapping into the expertise of your staff to find solutions?
Griffin: Basically, it takes a lot of communicating with cognizant people to solve problems. We have a small number of engineers supporting the Frontier project and each of us is very familiar with activities and requirements outside our disciplines. Another aspect of working together is the notion that the best idea is the best idea. We sideline our egos, pride, and emotions and concentrate on providing the customer with the best system we can. So, if I’ve worked on a design for 6 months and someone walks in the door with a better idea, I am all for it. Of course, there are a lot of other considerations but the best idea is the best idea.
Some time back the word synergy was used a lot, and I think it really applies to how we get things done. For example, when I distribute designs, requirements, etc. I expect to get comments back and if they pick apart what I have done, then so be it. I fix my problems and move on and there are no hard feelings. I tell everyone—especially the craft people—if they see something I have done wrong or have a suggestion as to how it can be done better, please tell me.
Gibson: How do you feel about progress and accomplishments at this stage of the preparation for Frontier?
Griffin: I feel very good. The great advantage we have on this project is that we can work very closely with the design-build contractors, IT personnel, and vendors and make changes along the way that improve the final product. We have done quite a bit of that. We also have an approach to quality assurance that is paying dividends. As our contractors complete their work, our ORNL craft personnel go behind them and check their work. This has really improved the quality of our installations and, as a bonus, gives the facility craft people a chance to get experience with the systems before they are turned over to the lab. We are currently installing computer racks, and everything is going as planned.
Gibson: Is there anything I haven’t asked that you wish I would have?
Griffin: Yes, and that question would be, ‘What makes this project different?’ To that I would say the following: a long history of successful installations of large computer systems; a success-oriented relationship with our customers, contractors, vendors, and project personnel; a very good familiarity with our facilities and the components we install; an aggressive lessons-learned process; effective communications among customers, contractors, vendors, and project engineering personnel; and an expedited decision-making process.
Gibson: Thank you, Rick.
Griffin: Enjoyed talking to you.
- Pioneering Frontier: Rick Griffin: Keeping the Power on
- The Pioneering Frontier article series
- The Road to Exascale
- Exascale Computing’s Four Biggest Challenges and How They Were Overcome
Frontier Construction Features:
- 09/29/21—Stunning Specs: What’s Inside the Nation’s First Exascale Supercomputer Facility?
- 05/20/21—OLCF Announces Storage Specifications for Frontier Exascale System
- 12/11/20—Building an Exascale-Class Data Center
- 09/23/20—Powering Frontier and complementary Photo Story
- 01/26/20—Making Room for Frontier