oneAPI DevSummit for AI 2023

Efficient Inference and Training of Large Neural Network Models

The memory consumption and computational cost of state-of-the-art (SOTA) neural network models are increasing dramatically, and efficient deep learning techniques can be applied to both inference and training. In this talk, we present our progress on quantization and sparsification. First, we show our solutions for efficiently implementing large recommendation models, where we systematically apply quantization in DQRM with the assistance of oneAPI. Then we discuss our efforts to compress large language models (LLMs) and diffusion models. Finally, we introduce TASC, which is designed to accelerate distributed training. Our methods achieve excellent performance and obtain decent generalization ability.

Speaker(s)

Zhen Dong - Postdoc Student/University of California, Berkeley

Zhen Dong received his B.S. from Peking University in 2018 and his Ph.D. from the University of California, Berkeley in 2022. He is currently a postdoc at UC Berkeley working with Prof. Kurt Keutzer. Zhen received the Outstanding Graduate Award at Peking University and the distinguished Berkeley University Fellowship.
His research interests include efficient deep learning, quantization, model compression, and hardware-software co-design.

Prof. Kurt Keutzer - Professor/University of California, Berkeley

Kurt Keutzer is a Professor of EECS at UC Berkeley, specializing in Deep Learning. His algorithms reduced training time for ImageNet and BERT. His “Squeeze” DNNs are ideal for mobile apps. Kurt was CTO at Synopsys and recognized for his contributions to Electronic Design Automation. He’s an experienced entrepreneur, investor, and advisor to over 30 startups, with recent exits including DeepScale (co-founder) and BabbleLabs (investor).

Coleman Hooper - PhD Student/University of California, Berkeley

Coleman is a graduate student at UC Berkeley, CA, USA, pursuing a PhD in Electrical Engineering, affiliated with the Specialized Computing Ecosystems (SLICE) lab and with Berkeley AI Research (BAIR). His research interests are in hardware acceleration and hardware-software co-design for machine learning with a particular focus on NLP applications. Previously, he received a B.S. degree in Electrical Engineering from Harvard University, MA, USA.
