Efficient Inference and Training of Large Neural Network Models

The memory consumption and computational cost of state-of-the-art deep neural network models are growing dramatically, so efficiency matters for both inference and training. In this talk, we present our progress on this topic. First, we introduce LTP, which uses token pruning to accelerate inference. Then we discuss staged training for transformers and TASC, both designed to accelerate training. Finally, we show our solutions for efficiently implementing large recommendation models: we systematically apply quantization in DQRM, and we leverage sparsity in DLRM to better support hot embeddings. Our methods achieve strong performance and generalize well.
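To make the pruning idea concrete, below is a minimal PyTorch sketch of threshold-based token pruning, assuming LTP refers to learned token pruning for transformers. In the learned variant, the threshold is a trainable per-layer parameter; here it is a fixed value, and the function name and shapes are illustrative rather than taken from the actual LTP code.

```python
import torch

def prune_tokens(hidden, attn_probs, threshold=0.01):
    """Drop tokens whose mean attention score falls below a threshold.

    hidden:     (batch, seq_len, dim) token representations
    attn_probs: (batch, heads, seq_len, seq_len) attention probabilities
    """
    # Token importance = mean attention each token receives,
    # averaged over heads and query positions.
    importance = attn_probs.mean(dim=(1, 2))   # (batch, seq_len)
    keep = importance >= threshold             # boolean keep-mask per token
    # Single-example case for clarity; batched code would pad instead.
    return hidden[:, keep[0], :]

hidden = torch.randn(1, 8, 16)                         # 8 tokens, dim 16
attn = torch.softmax(torch.randn(1, 2, 8, 8), dim=-1)  # 2 attention heads
print(hidden.shape, "->", prune_tokens(hidden, attn, threshold=0.1).shape)
```

Later layers see fewer tokens, so attention and feed-forward cost shrink without retraining the rest of the network from scratch.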
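Similarly, here is a hypothetical sketch of the quantization idea for recommendation models: an INT8 embedding table with per-row scales, which shrinks the embedding tables that dominate a DLRM-style model's memory footprint by roughly 4x relative to FP32. The class and the simple symmetric scheme are illustrative assumptions, not the actual DQRM implementation.

```python
import torch

class QuantizedEmbedding(torch.nn.Module):
    """INT8 embedding table with a per-row dequantization scale."""

    def __init__(self, weight: torch.Tensor):
        super().__init__()
        # Per-row absolute max sets the scale for symmetric INT8.
        scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        self.register_buffer("qweight", torch.round(weight / scale).to(torch.int8))
        self.register_buffer("scale", scale)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        # Dequantize only the rows that are actually looked up.
        return self.qweight[idx].float() * self.scale[idx]

emb = QuantizedEmbedding(torch.randn(1000, 64))  # ~4x smaller than FP32
vecs = emb(torch.tensor([3, 42, 999]))
print(vecs.shape)  # torch.Size([3, 64])
```

Because lookups touch only a few rows per query, dequantizing on the fly adds little compute while the full table stays compressed in memory.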

