|
Description:
|
|
Today we conclude our KubeCon ’19 Series joined by Erez Cohen, VP of CloudX & AI at Mellanox. In our conversation, we discuss:
- Erez’s talk, “Networking Optimizations for Multi-Node Deep Learning on Kubernetes,” in which he discusses networking problems and solutions discovered during the journey to reduce training time.
- NVIDIA’s recent acquisition of Mellanox, and what that relationship is expected to yield.
- The evolution of technologies like RDMA, GPUDirect, and SHARP, Mellanox’s solution for improving the performance of MPI operations, which can be found in NVIDIA’s NCCL collective communications library.
- How Mellanox is enabling Kubernetes and other platforms to take advantage of the various technologies mentioned above.
- Why we should care about networking in Deep Learning, which is inherently a compute-bound process.
The complete show notes for this episode can be found at twimlai.com/talk/345. |