Running efficient and scalable deep learning applications on leadership computing systems, including future exascale supercomputers, requires effective use of popular deep learning frameworks such as TensorFlow, Horovod, and PyTorch. This ESP Webinar covered the basics of when to use these frameworks, how to build and deploy models on HPC systems, and how to get good performance. Deep learning workloads on HPC systems also require care when scaling to multi-node jobs, and these systems offer opportunities to run hyperparameter searches. The presenters discussed techniques for profiling deep learning workloads on HPC systems and for resolving bottlenecks.
- Haritha Siddabathuni Som (ALCF)
- Ray Loy (ALCF)
- Yasaman Ghadar (ALCF)