Machine Learning with TensorFlow, Horovod and PyTorch on HPC

When:

September 4, 2019, 12:00 pm – 1:00 pm (UTC-05:00)

Contact:

Yasaman Ghadar

Training Event

Abstract

Running efficient and scalable deep learning applications on leadership computing systems, including future exascale supercomputers, requires effective use of popular deep learning frameworks such as TensorFlow and PyTorch, together with distributed training tools such as Horovod. This ESP Webinar covered the basics of when to use these tools, how to build and deploy models on HPC systems, and how to get good performance. Deep learning workloads on HPC systems require care when scaling to multi-node jobs, and these systems also offer opportunities to run hyperparameter searches. The presenters discussed techniques for profiling deep learning workloads on HPC systems and for resolving bottlenecks.

Organizers

Haritha Siddabathuni Som (ALCF)
Ray Loy (ALCF)
Yasaman Ghadar (ALCF)

Webinar materials

Slides
Video