Let’s Talk Exascale Code Development: CANcer Distributed Learning Environment (CANDLE)


By Scott Gibson

Clockwise, from bottom left: Harry Yoo, Thomas Brettin, and Venkatram Vishwanath of Argonne National Laboratory and the CANDLE project.

Hi and welcome to episode 91 of the Let’s Talk Exascale podcast. This is where we explore the efforts of the Department of Energy’s Exascale Computing Project—from the development challenges and achievements to the ultimate expected impact of exascale computing on society.

And this is the fourth in a series of episodes based on work aimed at sharing best practices in preparing applications for the upcoming Aurora exascale supercomputer at the Argonne Leadership Computing Facility.

The series is highlighting achievements in optimizing code to run on GPUs. We are also providing developers with lessons learned to help them overcome any initial hurdles.

This time we focus on the computer codes used in a project called CANDLE, which stands for CANcer Distributed Learning Environment. It is addressing three significant science challenge problems in cancer research, and we’ll hear about those shortly. The emphasis of the work is on machine learning; in particular, it builds on a single scalable deep neural network, or DNN, code that also bears the name CANDLE. The project is developing highly efficient DNNs optimized for the unique architectures provided by exascale-class computing platforms such as the upcoming Aurora and Frontier systems.

The CANDLE project is a collaborative effort between the U.S. Department of Energy and the National Cancer Institute (NCI), involving Argonne, Lawrence Livermore, Los Alamos, and Oak Ridge National Laboratories.

The guests for the program are Thomas Brettin, Venkatram Vishwanath, and Harry Hyunseung Yoo of Argonne National Laboratory and the CANDLE project.

Our topics: an overview of the project’s three challenges, how CANDLE will benefit from exascale computing systems, the role of ECP in CANDLE development, and more.

Interview Transcript

Gibson: Tom, will you begin things for us by telling us about CANDLE?

Brettin: Yeah, sure. CANDLE is a project that started about five years ago through DOE in partnership with the National Cancer Institute of the National Institutes of Health. And we identified three key challenges facing the cancer community that might be solved in the near future using the Department of Energy’s exascale computing systems. We outlined these challenges; there are three of them.

The first was the drug-response problem: to be able to predict how a tumor would respond to a particular small molecule.

The second challenge was called the RAS pathway problem, and the goal there was to understand the molecular basis of key protein interactions in the RAS/RAF pathways. Mutations in the RAS/RAF pathway are present in almost 30 percent of cancers, and yet the molecular mechanism is not really well understood.

The last challenge was what we called the treatment-strategy challenge: to automate the analysis and extraction of information from millions of cancer records to determine optimal treatment strategies and perhaps also to predict the likelihood of recurrence. So overall, CANDLE embraced these three challenges with a focus on machine learning, asking how we could solve them while building a single scalable neural-network framework that would support computational investigations into all three problems.

Gibson: How will CANDLE benefit from DOE’s upcoming exascale systems?

Brettin: CANDLE will benefit in a number of ways—primarily, though, in our ability to increase the complexity of the computational experiments that we run against the three challenge problems I just described.

Gibson: Tom, what role has the Exascale Computing Project played with respect to CANDLE development?

Brettin: Well, that’s a good question. It’s actually played a couple of key roles. I think one important role that often might be overlooked is that the Exascale Computing Project facilitated the partnering of four national labs on the CANDLE development. And that was really nice, because it brings the strength of multiple national labs to bear on both the development of CANDLE and the scientific challenges that it’s attempting to solve in cancer.

And on a second front, what the Exascale Computing Project has done for us is give us target hardware platforms that are going to allow us to actually take our modeling down to the individual level. And what I mean by the individual level is along the lines of what we might commonly refer to as precision medicine. That is the ability to tailor certain AI algorithms to an individual so that we’re now really achieving the dream of precision medicine. I don’t think that would be possible on previous systems.

Gibson: All right. This one is for Venkatram. Why do you believe exascale supercomputers like Argonne’s Aurora system will be well equipped for an application like CANDLE?

Vishwanath: So, thanks, Scott, for that question. Systems such as Aurora provide the unique capability of enabling one to run simulation, data, and learning together on one platform to [address] really tough and challenging questions in a wide number of scientific domains, including the CANDLE project here. So, they provide the hardware architecture, as well as the software environments, in which we can run these workloads. This will enable us to take outputs from simulation and feed them to an AI, or to use AI to better steer simulations or run problems much faster. Additionally, it will give us the kind of storage and architecture capabilities to really process vast amounts of data, such as those required in the CANDLE project. So, it brings together the software and hardware features that couple simulation, data, and learning to help with novel discoveries here.

Gibson: Harry, can you tell us about the preparations for Aurora that are being pursued at Argonne through the ALCF’s Aurora Early Science Program?

Yoo: Yes. Among the CANDLE benchmarks, we are using the Uno model. The model aims to predict tumor response to both single drugs and paired drug combinations. The model learns from data that we prepared by combining molecular features of tumor cells, drug descriptors, and cell growth data from multiple data sources. We implemented the deep-learning model in TensorFlow and PyTorch, and the code is publicly available. The model is already running at scale on DOE supercomputers like Theta and Summit. As the first step, we wanted to understand the performance on the current hardware, such as Intel KNL and NVIDIA GPUs.
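To make that description concrete, here is a minimal, illustrative PyTorch sketch of a drug-response model in this spirit. It is not the actual, publicly available Uno benchmark code; the class name, feature sizes, and layer widths are placeholders chosen only to show the general pattern of encoding each feature type separately before a shared regression head.

```python
# Illustrative only: a simplified drug-response regressor in the spirit of Uno.
# The class name, feature sizes, and layer widths are placeholders, not the
# actual CANDLE benchmark configuration.
import torch
import torch.nn as nn

class DrugResponseNet(nn.Module):
    def __init__(self, n_cell_features=942, n_drug_features=4000, hidden=1000):
        super().__init__()
        # Encoder for tumor-cell molecular features (e.g., gene expression)
        self.cell_encoder = nn.Sequential(
            nn.Linear(n_cell_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Encoder for drug descriptors, shared by both drugs in a pair
        self.drug_encoder = nn.Sequential(
            nn.Linear(n_drug_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Regression head over the concatenated embeddings -> growth/response
        self.head = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, cell, drug1, drug2):
        c = self.cell_encoder(cell)
        d1 = self.drug_encoder(drug1)
        d2 = self.drug_encoder(drug2)  # pass zeros to represent a single-drug case
        return self.head(torch.cat([c, d1, d2], dim=1))

model = DrugResponseNet()
pred = model(torch.randn(8, 942), torch.randn(8, 4000), torch.randn(8, 4000))
print(pred.shape)  # torch.Size([8, 1])
```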

So, we profiled the code with Intel VTune and NVIDIA nvprof to find the potential bottlenecks that might slow down the execution of the application. So, we also measured the current throughput, the number of samples processed per second.
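The throughput number mentioned here, samples processed per second, can be measured with a few lines of framework-level timing code before any profiler is attached. The sketch below is a generic example, not the CANDLE measurement code; all names are illustrative, and it assumes the data loader yields more than `warmup` batches.

```python
# A generic way to measure training throughput (samples per second) around a
# PyTorch training loop; all names are illustrative.
import time
import torch

def measure_throughput(model, loader, optimizer, loss_fn, device, warmup=5):
    model.train()
    start, n_samples = None, 0
    for step, (x, y) in enumerate(loader):
        if step == warmup:                   # start timing after warmup iterations
            if torch.cuda.is_available():
                torch.cuda.synchronize()     # flush pending GPU work before timing
            start = time.perf_counter()
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        if start is not None:
            n_samples += x.size(0)
    if torch.cuda.is_available():
        torch.cuda.synchronize()             # include in-flight GPU work in the timing
    return n_samples / (time.perf_counter() - start)
```

The same number can then be compared before and after each optimization pass, alongside the profiler output.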

To enable the execution [on Intel hardware], we used Intel’s extensions for TensorFlow and PyTorch. The extensions use DPC++ internally so that they can utilize the Intel GPU optimally. So, through multiple iterations of both the early-access hardware and the frameworks, we identified bottlenecks and found room to improve. We are still in this process, and we are eager and happy to see the progress.
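For readers who want a feel for what this looks like in code, the following is a hedged sketch of enabling Intel-GPU (“xpu”) execution with the Intel Extension for PyTorch; the TensorFlow path is analogous. It requires an XPU-enabled build of the extension, exact APIs and device names can vary across versions, and the tiny model and tensors are placeholders rather than the CANDLE setup.

```python
# A hedged sketch of Intel GPU ("xpu") execution with the Intel Extension for
# PyTorch. Requires an XPU-enabled build of the extension; APIs and device
# names may differ across versions. Model and tensors are placeholders.
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex  # registers the "xpu" device

model = nn.Sequential(nn.Linear(942, 1000), nn.ReLU(), nn.Linear(1000, 1)).to("xpu")
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# ipex.optimize applies operator- and layout-level optimizations for Intel
# hardware; a dtype such as torch.bfloat16 can optionally be passed for
# mixed precision.
model, optimizer = ipex.optimize(model, optimizer=optimizer)

x = torch.randn(64, 942, device="xpu")
y = torch.randn(64, 1, device="xpu")

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
```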

Gibson: Okay. Thank you very much. What are some of the challenges in developing deep-learning applications for exascale?

Yoo: Well, the model itself is hardware-agnostic, which means that as long as the framework is running, the model should run. However, we are supposed to run it on brand-new hardware, which has several new features and different configurations compared to the existing ones. So, the first challenge was how to optimize the extensions of popular frameworks like TensorFlow and PyTorch for the Intel GPU, and after that, how to optimize the code to make the most of the new hardware features.

The execution plan will be different, and previous bottlenecks may not be bottlenecks anymore because of fundamental changes in the GPU, or maybe because of the surrounding system configuration, like the network, memory, storage, and so on. So, we find new bottlenecks and then we can improve on them [iteratively].

Gibson: What tools or resources would you like to highlight for us as having been especially helpful in preparing CANDLE for exascale systems?

Yoo: Well, I would like to start with the profiler. Intel VTune tells you where the bottlenecks are and how well the hardware is being utilized, so you can see what’s going on inside and then fine-tune the parameters to maximize throughput.

Resource-wise, the ALCF workshop was very useful and informative. During the workshop, I learned more about the systems and their configuration, useful tools like Intel VTune and Advisor, and also the best practices.

And we used the Uno model to understand the performance characteristics of AI accelerators like Cerebras, SambaNova, Graphcore, and Groq. Each machine has its own specialization and unique capabilities. We were curious how we could pair an exascale machine with these accelerators. So, for example, we may be able to run massive simulations on the exascale machine that feed data to an accelerator to train a model, and then execute large-scale inference on specialized hardware like Groq. Or the exascale machine can train thousands of models at the same time, and then an accelerator can validate the models and guide the large-scale training, and so on. So, the exascale machine itself is interesting, and we are also interested in the combination of all those machines.

Gibson: Very good. Tom, what are the next steps for CANDLE?

Brettin: The next steps for CANDLE focus on large-scale language models. In recent years, what we’ve been looking at are large ensembles of smaller models, in which each model might represent an individual. But with the emergence of the large transformer models, we’re shifting our attention to these models and getting them to run at scale. At first we’ll be looking at one of the new machines at the ALCF, Polaris, as a stepping stone to the two larger machines, Frontier and Aurora.

Gibson: Finally, Venkatram, do you have lessons learned or advice that may help researchers in preparing their code for GPU-accelerated exascale systems?

Vishwanath: We definitely have several pieces of advice, and I would say one of the key things is to start small. Pick your code and port it onto the systems to ensure that you have sufficient functionality running there. Again, you have to remember that these systems are the first of their class at scale and in some cases have really new and novel features. So, the key is to take the code and ensure that you have sufficient functionality and that the code runs on the system. That then leads to the ability to start benchmarking and making it more performant. You have a wide array of profilers that give you insight into what the bottlenecks in the code might be, or whether you perhaps require a new algorithm so that you can fully leverage the capabilities of these architectures.

I would say two key things are to ensure you have sufficient functional coverage so that the code runs, and then to make it performant so that you are able to achieve the required time to solution and the convergence criteria. And I would say that on systems such as these, you start small. You ensure that you fully utilize the testbeds that we have at ALCF to prepare you for Aurora. Ensure that your code is functional and performant at a small scale. As next steps, you would start scaling it up and running at larger scales.

The existing systems that you have access to, such as the Theta supercomputer and the new Polaris system that is being deployed, are excellent systems to help prepare deep-learning codes to really scale up and to identify any application optimizations that are needed. And I would say that on systems such as Aurora you’ll find an environment very similar to what you’re used to on existing systems. So, this is a good way for you to develop and port applications and prepare for Aurora.

Gibson: This has been really informative. Thanks to each of you for being on Let’s Talk Exascale.

Vishwanath: Thank you so much.

Gibson: We’d like to hear from you, the listener. Feel free to send comments and suggestions for subjects you’d like us to cover or people you’d like us to interview. The email address is [email protected]. Thanks for joining us, and so long for now, from Let’s Talk Exascale.
