Modern deep learning models are growing at an exponential rate, with parameter counts climbing from millions into the billions. For models of this size, distributed training is what makes training within hours practical.
Optimizing DLRM by Using PyTorch with oneCCL Backend
Article • February 5, 2021
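As a rough illustration of the distributed-training setup the article's title refers to, the sketch below shows how a PyTorch process group might be initialized with the oneCCL ("ccl") backend before wrapping a model in DistributedDataParallel. It is a minimal, hypothetical example: the bindings import name is an assumption (the package has shipped as torch_ccl and later as oneccl_bindings_for_pytorch), and the RANK/WORLD_SIZE environment variables are assumed to be set by the launcher (e.g. mpirun or torchrun).

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Importing the oneCCL bindings registers the "ccl" backend with
# torch.distributed. The package name below is an assumption; depending on
# the release it is installed as torch_ccl or oneccl_bindings_for_pytorch.
import oneccl_bindings_for_pytorch  # noqa: F401


def main():
    # RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT are expected to be
    # provided by the process launcher.
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))

    # Initialize the process group with the oneCCL collective backend.
    dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)

    # A stand-in model; in practice this would be DLRM or another large model.
    model = nn.Linear(1024, 1024)
    model = DDP(model)

    # ... training loop would go here ...

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```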