Sheng Shen

Ph.D. student in BAIR, EECS at the University of California, Berkeley.

Berkeley, California
Email: sheng.s@berkeley.edu
Google Scholar
Github
Twitter
Linkedin

At Berkeley, I am advised by Prof. Kurt Keutzer and Prof. Trevor Darrell. I also work closely with Prof. Dan Klein and Prof. Michael Mahoney. Prior to Berkeley, I received my bachelor's degree in computer science from Peking University, advised by Prof. Xuanzhe Liu. I received the Lotfi A. Zadeh Prize for my research.
My research focuses on compute-optimal (multimodal) language modeling, including efficient training and tuning methods, model compression techniques, and vision-language models.

Preprints

  • Aligning Large Multimodal Models with Factually Augmented RLHF
  • Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models

Publications (selected)

  • K-LITE: Learning Transferable Visual Models with External Knowledge NeurIPS 2022
  • Staged Training for Transformer Language Models ICML 2022
  • How Much Can CLIP Benefit Vision-and-Language Tasks? ICLR 2022
  • Multitask prompted training enables zero-shot task generalization ICLR 2022
  • Learned Token Pruning for Transformers KDD 2022
  • Reservoir Transformers ACL 2021
  • ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning AAAI 2021
  • Noisy Self-Knowledge Distillation for Text Summarization NAACL 2021
  • PowerNorm: Rethinking Batch Normalization in Transformers ICML 2020
  • Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers ICML 2020
  • Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT AAAI 2020
  • Pragmatically Informative Text Generation NAACL 2019 short
  • Ermes: Emoji-Powered Representation Learning for Cross-Lingual Sentiment Classification WWW 2019

Experience

    Google, Student Researcher
    Advised by Le Hou and Denny Zhou, Mar. 2023 - Aug. 2023

    Microsoft, Research Intern
    Advised by Zhewei Yao and Chunyuan Li, Feb. 2022 - Aug. 2022

    Allen Institute for Artificial Intelligence, Research Intern
    Advised by Iz Beltagy, Matthew Peters, and Jesse Dodge, May 2021 - Aug. 2021

    Facebook AI Research, Research Intern
    Advised by Douwe Kiela and Michael Auli, May 2020 - Dec. 2020

    Berkeley AI Research, Junior Specialist II
    Advised by Prof. Kurt Keutzer, Prof. Dan Klein, and Prof. Michael Mahoney, Jun. 2019 - May 2020

    Tencent AI Lab, Research Intern
    Advised by Yaliang Li and Wei Fan, Apr. 2018 - Sept. 2018

    University of Illinois at Urbana-Champaign, Research Intern
    Advised by Prof. Aditya Parameswaran, Jun. 2017 - Sept. 2017

Teaching

    CS267 Applications of Parallel Computers, Spring 2023
    CS282 Deep Neural Networks, Fall 2022