A conversation with Rick Stevens of Argonne National Laboratory
An audio podcast of this interview is also available.
Rick Stevens, associate laboratory director for computing, environment, and life sciences at Argonne National Laboratory, spoke with Exascale Computing Project (ECP) Communications at SC17 in Denver. Stevens is the principal investigator for the ECP project called CANcer Distributed Learning Environment (CANDLE), which is developing computational methods targeted to precision medicine for cancer treatment. We talked to him about various details of the project and what it has accomplished so far. This is an edited transcript of our conversation.
What is CANDLE all about?
It has to do with building a scalable deep-learning environment that can be applied to a variety of problems in cancer, initially. CANDLE is designed to run on the big machines that we have at the US Department of Energy (DOE). The goal is to have an easy-to-use environment that can take advantage of the full power of these big systems to search through large combinations of deep-learning models to find optimal models for making predictions in cancer. Eventually, we’ll use the same environment in many other areas of DOE research, such as materials science, cosmology, or climate analysis.
What will CANDLE enable researchers to accomplish that they cannot today?
Right now, we can run individual deep-learning models on the nodes of supercomputers. However, it’s very difficult to sweep through thousands or tens of thousands of model configurations to find the best-performing models, to record all of those results in a database, and to visualize them. So the CANDLE environment will really enable individual researchers to scale up their use of DOE supercomputers on deep learning in a way that’s never been possible before.
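Editor’s note: the kind of sweep described in this answer can be illustrated with a minimal Python sketch. It is not CANDLE code and does not use CANDLE’s interfaces. The function `toy_validation_loss`, the parameter names, and the grid values are all hypothetical stand-ins chosen for illustration. The sketch evaluates every configuration in a small grid, records each result in an in-memory SQLite database, and then queries for the configuration with the lowest loss.

```python
import itertools
import sqlite3


def toy_validation_loss(learning_rate, hidden_units, dropout):
    """Hypothetical stand-in for training one model and returning its validation loss."""
    return (learning_rate - 0.01) ** 2 * 1e4 + abs(hidden_units - 128) / 128 + dropout


def sweep(grid):
    """Evaluate every configuration in the grid and record each result in a database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE runs "
        "(learning_rate REAL, hidden_units INTEGER, dropout REAL, loss REAL)"
    )
    configurations = itertools.product(
        grid["learning_rate"], grid["hidden_units"], grid["dropout"]
    )
    for lr, units, drop in configurations:
        loss = toy_validation_loss(lr, units, drop)
        db.execute("INSERT INTO runs VALUES (?, ?, ?, ?)", (lr, units, drop, loss))
    db.commit()
    return db


def best_run(db):
    """Return the recorded configuration with the lowest loss."""
    query = (
        "SELECT learning_rate, hidden_units, dropout, loss "
        "FROM runs ORDER BY loss ASC LIMIT 1"
    )
    return db.execute(query).fetchone()


# Illustrative grid: 3 x 3 x 3 = 27 configurations.
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "hidden_units": [64, 128, 256],
    "dropout": [0.0, 0.2, 0.5],
}
db = sweep(grid)
print(best_run(db))
```

In a real sweep, each evaluation would be a full training run, and the runs would be distributed across many nodes rather than executed in a single loop. That is the part that becomes hard at the scale of thousands or tens of thousands of configurations.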
Is this research area taking advantage of some of the computer time available through the ECP?
Yes, we have a significant allocation as part of the ECP to use for development of the algorithms, but also to do some science. So as we debug our algorithms and debug the software stacks, assuming we have enough time left over, we’ll use our allocation to do real science.
What new collaborations have you developed through this project?
CANDLE is great in the collaboration sense because it involves Argonne, Oak Ridge, Lawrence Livermore, and Los Alamos, as well as the National Cancer Institute (NCI) and the Frederick National Laboratory for Cancer Research. The four DOE labs plus the NCI lab are working together on software infrastructure that plugs into the frameworks, and those labs are also partnering on different cancer research problems. They’re bringing those cancer problems as the test cases for this new environment.
What milestones has CANDLE reached so far?
We made a major software release in July, and before that, we released seven benchmark problems that represent our design targets. In the last couple of months, we’ve also released state-of-the-art problems that use this environment to advance learning research in cancer. Further, we’ll do another major software release in the spring. Another accomplishment is that we have the system running now at Argonne, the National Energy Research Scientific Computing Center, Oak Ridge National Laboratory (ORNL), and the National Cancer Institute. So we have a large base of people who are collaborating on developing the software, testing it, and benchmarking it. I have a great set of collaborators in Fred Streitz of Lawrence Livermore National Laboratory, Gina Tourassi of ORNL, Frank Alexander of Brookhaven National Laboratory and formerly of Los Alamos National Laboratory (LANL), and Marian Anghel of LANL. We’ve been able to bring together people at each of the labs who are really thinking about deep learning and how it’s going to apply to problems in DOE.
How do you think a project such as CANDLE will affect research within DOE?
I believe that in the next 5 years, the number of research projects at DOE that are using machine learning to augment simulation will increase dramatically to take advantage of the large-scale data that DOE collects. That growth will require hundreds of people at each laboratory to come up to speed on deep learning, including advanced network types and research methods, so that they can run these large-scale deep-learning problems on the supercomputers. Deep learning has not been the traditional application type for these supercomputers, and it has different requirements from traditional simulation applications in terms of how it uses data, how it produces outputs, and how you need to scale up and use the resources.
What do you hope to achieve during calendar year 2018?
I think we will have made major progress in different areas of cancer research and in predictive oncology, that is, being able to predict which types of drugs would be most appropriate for given tumor types. We are using deep learning to analyze millions of medical records and to extract that data in a form that we can then compute on, and I think we’ll make major progress in that area. ORNL is already showing really positive results. We also want to ultimately steer simulation with deep-learning supervisors, where the deep-learning system is actually analyzing the data as the supercomputer is running and steering the computation in a way that humans could never do. The system can absorb more information about what’s happening in the simulation. It can make decisions about where to take that simulation.
Could CANDLE achieve its objectives without exascale?
No. All of these problems are really aimed at big, overarching challenges that require exascale, and we’re climbing up the exascale mountain, so to speak. The first machine that will get us substantially closer to that will be Summit at Oak Ridge. It will have thousands of GPUs and be extremely well suited to deep learning. But we’re also using the Intel processors at Argonne and Berkeley to advance this, so the system is not specific to any given architecture. It’s designed to run the problems across multiple architectures, but it’s definitely forward-looking toward exascale. Many of the problems we have in our imagination that we want to do over the next few years will absolutely require exascale.
Is there anything you would like people to know about CANDLE that we’ve not discussed?
It’s pretty amazing how much interest there is in applying high-performance computing and deep learning at scale to a problem like cancer. You can talk a lot about materials, cosmology, or even climate, and people are interested in those. But when we start talking about problems like cancer, which affects basically everybody, the level of interest is just off the charts, and it’s not just interest in DOE or our NCI collaborators. It’s interest from the vendors. So every one of the major vendors that provides computing capability—whether it’s Cray, Intel, NVIDIA, IBM, HPE, AMD, ARM, and so on—all of them are interested in trying to figure out how they can help make this kind of problem work extremely well for the entire community. It really is a problem that touches people in a completely different way than most of what we’ve worked on in the past.