Speed Up Training 8X Using PyTorch with the oneCCL Backend
Train Complex Models in Hours, Not Days, with Distributed Training
Fig. 1. Software stacks for PyTorch DistributedDataParallel. CCL is one of the communication backend options.
As today’s deep learning models grow, their parameter counts increase by orders of magnitude, and training them in a reasonable amount of time calls for distributed training. This article recounts how a research team distributed its Deep Learning Recommendation Model (DLRM) using PyTorch with different communication backends, achieving an 8.5X speedup when training AI topologies with the Intel oneAPI Collective Communications Library (oneCCL) backend. Designed for compatibility and developer productivity, oneCCL helps developers and researchers train new, more complex models by providing optimized communication patterns for distributing training across multiple nodes.
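The sketch below illustrates the general pattern of selecting the CCL backend for PyTorch DistributedDataParallel; it is not the research team's actual training script. It assumes the Intel oneCCL bindings for PyTorch are installed (the import name has varied across releases, e.g. torch_ccl in older versions), that a launcher such as mpirun or torchrun sets the usual rank and rendezvous environment variables, and it uses a placeholder linear model in place of DLRM.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Importing the Intel oneCCL bindings registers the "ccl" backend with
# torch.distributed. The package name has changed across releases
# (older releases used "torch_ccl"); adjust the import to match your install.
import oneccl_bindings_for_pytorch  # noqa: F401


def main():
    # RANK/WORLD_SIZE (and MASTER_ADDR/MASTER_PORT) are typically provided
    # by the launcher; defaults here allow a single-process dry run.
    dist.init_process_group(
        backend="ccl",
        rank=int(os.environ.get("RANK", 0)),
        world_size=int(os.environ.get("WORLD_SIZE", 1)),
    )

    # Placeholder model standing in for DLRM; wrapping it in DDP makes
    # gradient synchronization across ranks go through oneCCL allreduce.
    model = nn.Linear(128, 1)
    ddp_model = DDP(model)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    criterion = nn.MSELoss()

    for _ in range(10):  # toy training loop with random data
        inputs = torch.randn(32, 128)
        targets = torch.randn(32, 1)
        optimizer.zero_grad()
        loss = criterion(ddp_model(inputs), targets)
        loss.backward()   # gradients are allreduced here via the CCL backend
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Because the backend is selected by a single string in init_process_group, switching between Gloo, MPI, and CCL for comparison runs requires no changes to the model or the training loop itself.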