
Podcast: This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Episode: Networking Optimizations for Multi-Node Deep Learning on Kubernetes with Erez Cohen - #345

Category: Technology
Duration: 00:34:00
Publish Date: 2020-02-05 11:33:06
Description:

Today we conclude our KubeCon ’19 series with Erez Cohen, VP of CloudX & AI at Mellanox. In our conversation, we discuss:

  • Erez’s talk, “Networking Optimizations for Multi-Node Deep Learning on Kubernetes,” in which he covers the networking problems, and their solutions, that surfaced during the effort to reduce training time.
  • NVIDIA’s recently announced acquisition of Mellanox, and what the two companies hope the relationship will yield.
  • The evolution of technologies such as RDMA, GPUDirect, and SHARP, Mellanox’s solution for improving the performance of MPI collective operations, which is also available in NVIDIA’s NCCL collective communications library.
  • How Mellanox is enabling Kubernetes and other platforms to take advantage of these technologies.
  • Why we should care about networking in deep learning, even though training is generally considered a compute-bound process.

The complete show notes for this episode can be found at twimlai.com/talk/345.
