A Conversation with Charlie Catlett of Argonne National Laboratory and the University of Chicago
Charlie Catlett is director of the Urban Center for Computation and Data at Argonne National Laboratory and the University of Chicago. He spoke with ECP Communications at the ECP Second Annual Meeting in Knoxville, Tennessee, in February 2018. This is an edited transcript.
Your ECP research is centered around the use of advanced computing technology to explore science-based approaches to sustainable operation of cities. What does that mean?
Well, I would even step a little bit back or up from operation and think about the long-term design and planning issues around cities and, of course, how those impact operations. Our work in the Exascale Computing Project is exciting to me because I started around 8 years ago interacting with urban designers and planners and the city officials that they were working with, and I found that so many decisions such as a bike lane program or a new rapid bus transit or a new campus of buildings are made based on heuristics rather than actual modeling and looking at data. And in interacting with these city officials, they told me yes, these are rules of thumb because we don’t have a better way to do it. I’m excited about exascale because we’re able to bring models together finally that relate to these urban systems—a transportation model, a social economic model, the urban canopy and the urban heat flow through the urban canyon, building models, grid models. So we’re able to bring these existing tools and look at how we could put them together in a way that would allow these designers, planners, and officials to say, okay, if we put this major investment in place, it’s likely to have this effect on congestion and this effect on our electric grid, and so we can sort of plan for it. And that’s important because these decisions that cities are making are decisions that are going to be with us for decades. There is tremendous demand for better modeling. Sometimes you make the right choice, and sometimes you make the wrong choice, but you don’t often know until a few decades later that you made the wrong choice.
I saw somewhere the term urbanization, which I wasn’t familiar with. Can you explain that for me?
Well, I’m a computer scientist, not an urbanist, but I hang out with them. So to me, urbanization is a couple of things. In the Western world, or I’d say developed economies, it’s a combination of influx into the cities that we have and refurbishment or rezoning, rebuilding of parts of our city as the city evolves. In developing economies, it’s all of that plus the expansion geographically of cities in a planned way, as you see in some cities, but in many countries and economies, that expansion’s happening in an unplanned way. So we have unplanned settlements outside of major cities in Africa or India. Urbanization is a bigger set of challenges in the developing world than in the developed world, but it’s still a challenge for us in US and European cities and Japan.
So you’re running these computer models. Why do you need exascale?
There are a couple of reasons that we came to the exascale table with these models. One is that some of the models that we want to run already require exascale resources to run at the scales we need. An example of that is the urban atmosphere. We want to know about the heat transfer between buildings in a downtown area, and we want to know how that air and heat flow through the buildings. And if we want to use a climate code, the resolution is way too coarse for that to happen. So we’ve got to scale down to a much higher resolution.
The second reason we have to scale those codes is that we want to couple that urban temperature gradient and flow with building energy models, and so we would really like, ideally, to know that from the 1st to the 10th floor, this is what the temperature looks like outside the building. From the 10th to the 20th, this is what it looks like, because buildings are not managed as a monolithic thing but on a floor-by-floor and zone-by-zone basis. So if you want to manage the HVAC in a building, you really want at least some decent resolution as to what’s hitting that building on the outside.
And then there are the other codes, like the building energy demand codes. That’s not an exascale code if you’re just running it to model a single building. Even for a large building, it’s not an exascale code. If you want to model 100,000 buildings and you want to model their shading and the heat radiation between those buildings before you even start to think about district cooling and heating, then you’ve got 100,000 objects that are interacting in some way, and it starts to become an HPC [high-performance computing], if not an exascale, code. That’s our focus with the Exascale Computing Project. By the way, we’re a seed project, and that means that we received some startup funding—and our decision working with exascale leadership was to really focus, in this first year that we just finished, on the data flow between the models to understand not just the rates of flow or the volume of flow but the content of that. In other words, what do I need to provide from a building energy model to a transportation model or vice versa? What is the content of that information that’s flowing between models and then what’s the volume? What’s the flow?
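To give a rough sense of why 100,000 interacting buildings starts to look like an HPC problem, the sketch below counts the worst-case number of building pairs. It assumes, pessimistically, that every building could shade or radiate heat to every other one; a real code would presumably consider only nearby neighbors. The function name is hypothetical and is not taken from any of the project's models.

```python
def max_pairwise_interactions(n_objects: int) -> int:
    """Upper bound on distinct pairs among n objects: n * (n - 1) / 2."""
    return n_objects * (n_objects - 1) // 2


# A single building has no partners; 100,000 buildings have billions of
# candidate pairs before any neighbor pruning is applied.
single_building = max_pairwise_interactions(1)
district = max_pairwise_interactions(1_000)
city = max_pairwise_interactions(100_000)

print(f"1 building: {single_building} pairs")
print(f"1,000 buildings: {district:,} pairs")
print(f"100,000 buildings: {city:,} pairs")
```

The count grows with the square of the number of buildings, which is why a code that is cheap for one building becomes expensive at city scale.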
In addition to the data flow, the other thing that we’ve been focusing on with exascale is that these models all run at very different rates. So you’ve got an urban atmosphere model that may take an hour to model 10 minutes’ worth of flow. And meanwhile, in that same hour, you could model a week’s worth of building energy. So we need to understand how to couple these models and match their run rates through a combination of asynchronous sharing with a file system and adjusting the amount of resources each one gets. So if a code runs 10 times faster than another one, we put 10 times as many cores on the slower one so that they can sort of run at roughly the same rate. The challenge is when a code runs 1,000 times faster than another one, and that’s where this data hub comes in.
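The rate-matching idea can be sketched with the figures quoted above. This is a minimal illustration, not the project's scheduler: it assumes ideal linear scaling with core count (real codes rarely scale that cleanly), and the function names, model labels, and the 1,009-core budget are all hypothetical.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 simulated minutes

# Simulated minutes produced per wall-clock hour at the same baseline
# resources, using the figures from the interview.
baseline_rates = {
    "atmosphere": 10.0,                   # 10 minutes of flow per hour
    "building_energy": MINUTES_PER_WEEK,  # a week of energy per hour
}


def speed_ratio(fast_rate: float, slow_rate: float) -> float:
    """How many times faster one model runs than another."""
    return fast_rate / slow_rate


def core_shares(rates: dict, total_cores: float) -> dict:
    """Split cores in inverse proportion to baseline speed.

    Assumes ideal linear scaling, so rate * cores ends up equal for
    every model and they advance through simulated time together.
    """
    weights = {name: 1.0 / rate for name, rate in rates.items()}
    total_weight = sum(weights.values())
    return {name: total_cores * w / total_weight for name, w in weights.items()}


ratio = speed_ratio(baseline_rates["building_energy"], baseline_rates["atmosphere"])
shares = core_shares(baseline_rates, total_cores=1009)

print(f"Building energy runs about {ratio:.0f}x faster than the atmosphere model")
for name, cores in shares.items():
    print(f"{name}: {cores:.1f} cores")
```

With these numbers the two models differ by roughly a factor of 1,000, which is the regime where balancing cores alone stops being practical and an intermediary such as the data hub becomes attractive.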
I’m glad you brought up the speed term here. I think of this project as really being more about the handling of massive quantities of data than turning something around fast, which is what we typically think of with an exascale application. Is that a good assumption?
You know, I think that’s fair, but I would add to that. So it is a lot about the movement of data and understanding what needs to move between these models.
Let me explain where the timeliness comes in. The first time I thought about doing coupled urban models I was sitting in a room with a bunch of different groups that were designing a new second downtown for Chicago. It actually didn’t get built, but they were designing it. There was a well-respected urban design firm. There was a firm that did storm water and surface engineering. There was a firm that did transportation. There was another one that did CFD [computational fluid dynamics] flow between buildings to look at heat islands and things like that. So all these different experts from different fields related to urban design were there, and the way that they coupled their work together was to come together every 4 to 6 weeks, show each other PowerPoint slides, and then when they were done, the chief architect on their team would say something like, “Well, I wonder what would happen if we moved this bridge from here to here or if we made this tower a 30-story residential instead of a 5-story shopping mall.” And there would be those “what-if” discussions, and there would be a handful of those, and the answer at the end of the day was, okay, we’re all going to go off and come back in 4 to 6 weeks and give you the answer to that what-if. And I actually posed a question: “Could you imagine if you had a way to computationally model what you’re going off to do in a way that’s coupled together so that those four or five questions of what-if could be answered in the meeting room, or at least within a day or two?”
It made me think of Larry Smarr. Early on, Larry would say one of the reasons we want HPC is because we can get answers faster, which means we can ask more questions. And it was with that sort of, you know, Larry’s rule, the Smarr law, in mind that I thought, wow, what if these guys could ask a dozen questions in their meeting and a dozen more and come back a week later? Then wouldn’t we have better designs because we would have explored the solution space in a much more effective way? So that’s part of why we want exascale—to be able to run these models, maybe not interactively but to be able to run some scenarios. And maybe that team in the room with their dozen questions would send us a dozen questions in terms of scenarios, and we could run them and give them some insight from the models.
It sounds like a pretty daunting task when you put it that way. About how many researchers do you have, or could you talk about collaborators that you have on the project?
One of the really exciting things about the Exascale Computing Project is that all of us from Argonne, from Oak Ridge, from Berkeley, et cetera, have the opportunity to look across the DOE [US Department of Energy] laboratories and find groups that are doing the best research in each of these areas. So our team includes people here at Oak Ridge who are doing social economic modeling and transportation. We’ve got some of the best climate modeling folks at Argonne working on that piece. The Berkeley folks are working with NREL [National Renewable Energy Laboratory] very closely on the building models. So if I were doing this project at Argonne, that’d be great. I could reach across Argonne and get all those pieces, but to be able to reach across the entire lab complex and pull in those pieces, it just gives us a much more powerful team of people we can draw on. And we see that growing as well. We’ve got a number of folks out in universities who we would love to be able to pull into the project. We’re working out how to do that. So to me, it’s exciting. The Exascale Computing Project, from my point of view, has given me the ability to pursue this dream with a bunch of other people who have the expertise to make it happen.
Are you taking advantage of any of the computer allocations that have been made available through the ECP?
We are. Most of last year, we were really looking at the data flows. This year we’re starting to do the benchmarking on the codes, and today in our breakout, we heard from all of our different teams and looked at how they had used the different resources at the labs in the program to do some initial benchmarking. One of our tasks this year is to do enough benchmarking that we can really nail down the baseline of where we are today and validate the projected capability that we want to have with exascale. And to do that, we really need to have those benchmarks in place. All of our teams, working on their specific codes, are at various stages of doing their respective benchmarking with different size codes or different size problems, and we saw some of those results here at the annual meeting. We see more as they’re coming out. We just put in a request for a fairly large allocation for the team that we will be burning through in the next few months.
Are there any significant accomplishments that you can talk about at this point? Any milestones?
Yeah. I think one thing that’s been really important is that we now have gone from the conceptual idea of modeling, or of coupling these models, to knowing exactly what we want to move between the models and having a very precise picture of what that’s going to need from a technical point of view. And based on that, we can now start to do the design of that interconnection between the models, which will rely on a combination of messaging as well as using a data hub. And we’ve gotten to the point now where we can start to design that hub in an informed way based on the specific models that we’ve chosen.
And the other thing I would say that is happening right now is we’ve gone from being the junior seed project last year, just trying to get our feet wet, to having defined our problem space and our potential solutions, to the point that we can now meaningfully interact with the other exascale project teams. As an example, we found out this week the guys right down the hall from me doing CANDLE [CANcer Distributed Learning Environment], Rick Stevens’ group, are doing a similar idea of a data hub. So we’re going to be looking at their architecture to see if it would work for our problem, and we’re going to look at that with help from the CODAR [Co-Design Center for Online Data Analysis and Reduction] group, who are doing some similar things. At this meeting in particular, our second, now that we’ve got a year and more under our belt, we feel like we’ve learned enough to be able to know how to look for help throughout the rest of the project and start to pull in those tools. This is going to accelerate what we’re doing.
From this discussion, the potential impact on the livability of cities and the people there sounds evident. But could the effects of the ECP Urban project be more than just neater, cooler buildings, more-efficient traffic flows, and things like that?
Yeah, there are some questions that cities still have to grapple with, like these other questions of infrastructure, that are relying on heuristics. An example would be what happened in Chicago over the last 7 or 8 years: hundreds of miles of new bike lanes. So if you rewind and say what should we do in terms of bike lanes, the arguments would be along the lines of, well, if you put a bike lane in, it’s reducing the amount of space that you have to move cars through, so aren’t you going to increase congestion? If you put a bike lane in, will people actually use it? What we found in Chicago is that putting lots and lots of miles of bike lanes in has not caused more congestion. It has caused more people to be riding bikes. It has actually improved bike safety. So you’ve got more people riding bikes just because you made this infrastructure decision about bike lanes, and you can think of that as a transportation decision. But it’s really also about transforming cities to promote healthy activity like biking.
The other way that we’re looking at all of this is we’ve chosen to start with four models: transportation, social economic, buildings, and urban atmosphere. There are lots of other models about cities. There are disease flow models or disease spread models. There are certainly electric grid models and water models. And we are looking at urban heat flow, but there’s also urban air quality. And what we find in any major city, probably in smaller cities too, but certainly in any major city, are pockets of that city that have a particularly challenging air quality problem. It may be because of the topology of the city or prevailing winds. Maybe it’s because of the location relative to pollutant generators like factories, or even restaurants can produce air issues. And we want to be able to work with cities to also say what do you do if you’re going to try to improve the air quality, let’s say, on the Southwest side of Chicago? What do you do? It’s a different question than what Beijing or Paris or London would ask, not that they wouldn’t ask this question. But they’re also asking daily questions like how do we improve the regional air quality. The use of odd-even license plates is one approach that they’ve used. And these are policies aimed at getting the average to be lower in terms of pollutants, but they don’t get at the neighborhood-level air quality problems. So if we can have a computational model that gives us resolution at the block level of the flow of different pollutants and how that relates to traffic and weather and other activities, we can start to give cities the tools that they need to develop policies to improve the air quality in places like the Southwest side of Chicago.
We’ve covered a lot of information here today. Is there anything in particular about your project that you’d like to share with the listeners before we wrap up? Are there any misunderstandings about what you’re working on or the potential benefits that you’d like to just get out there?
Well, I’d like to maybe give a little context to what we’re doing and also say we’re not doing SimCity. To say that we’re doing SimCity would be sort of like telling the guys designing aircraft wings that they’re doing flight simulators. SimCity would be great if we had a SimCity capability that had HPC behind it. Most of the time, we don’t really need to look at that broad of a question. We’re really looking at a district and specific questions in that district.
The context for this exascale project from my point of view is that we are not only developing a capability for interacting with and assisting urban designers (those might be architecture firms or commissioners for urban planning and development), but we are also building a tool for those folks to help them answer their questions. There are other things that we’re drawing upon for the Exascale Computing Project, one of which is a fairly long history of working with the City of Chicago and other cities on pulling data from different sources into a place where you can start to look for data that will apply to your project. So for our project in Chicago, outside of the exascale computing, because we’ve got relationships with the City of Chicago and with ComEd and Exelon, we actually have energy data, anonymized energy data that will allow us to, in fact, validate some of our models with 2 years’ worth of 15-minute resolution energy data for all the buildings that we’re modeling.
The other thing that we’re drawing on, in Chicago initially, but we’re doing it in other cities as well, is the Array of Things project. In Chicago, we’re putting in hundreds of these devices that are measuring air quality and general environmental data and also are using vision processing to look for urban flooding and look at pedestrian and vehicle flow. Those will be a tremendous source of data for calibrating our models and for validating those models. And because we’re coming in with the exascale program at the sort of front end of the deployment of the Array of Things in Chicago, we are specifically choosing locations around the areas that we’re going to be modeling. That’s the North Branch Framework area in Chicago, up the Chicago River: a 600-acre area with 20,000 buildings and major rezoning and redevelopment happening. We’re able to put measurement points in there specifically driven by the models that we want to do in order to be able to look at what’s the right way to rezone and design that area to be a healthy place in the next century.