"Keynote Title: ""Adaptive Inference in Transformers""
Speaker Biography
Xifeng Yan is a professor at the University of California, Santa Barbara, where he holds the Venkatesh Narayanamurti Chair in Computer Science. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2006 and was a research staff member at the IBM T. J. Watson Research Center from 2006 to 2008. His current research explores foundation models in artificial intelligence, leverages these models for knowledge discovery, and develops cross-disciplinary applications. His work has been widely cited. He has received numerous honors, including the NSF CAREER Award, IBM Invention Achievement Award, ACM SIGMOD Dissertation Runner-Up Award, IEEE ICDM 10-Year Highest Impact Paper Award, 2022 PLDI Distinguished Paper Award, 2022 VLDB Test of Time Award, and first place in the Amazon SocialBot Grand Challenge 5. His team created the first Transformer-based time series forecasting model, initiating a new research direction in the field.
Abstract
Transformer-based large language models (LLMs) have achieved remarkable success across both language and vision tasks, with their impact now extending into robotics—for example, through vision-language-action (VLA) models for robotic manipulation. Despite these advances, many open questions remain. In this talk, I will focus on one fundamental question: Do all tokens require the same amount of computation within a Transformer? I will share insights into this question and present preliminary approaches to adaptive inference, in which different tokens are generated using varying numbers of Transformer layers. In practice, many layers can be skipped automatically without compromising output quality. The overarching goal is to demonstrate how such methods can enhance the efficiency of Transformer-based models and improve their applicability to domains beyond LLMs.
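To make the idea of adaptive depth concrete, below is a minimal sketch of early-exit ("layer-skipping") inference in PyTorch. It is an illustrative assumption of one possible mechanism, not the method presented in the talk: the class name AdaptiveDepthStub, the per-layer halting heads, and the confidence threshold are all hypothetical, and a real system would train the halting heads rather than use them as initialized.

```python
# A hypothetical sketch: run a stack of Transformer layers, but stop once a
# per-layer halting head reports enough confidence, skipping the remaining layers.
import torch
import torch.nn as nn

class AdaptiveDepthStub(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=6, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        # One scalar "confidence" head per layer; in practice these would be trained.
        self.halt_heads = nn.ModuleList(nn.Linear(d_model, 1) for _ in range(n_layers))
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        # x: (batch, seq, d_model). Refine the representation layer by layer and
        # exit early when the halting score for the last position is high enough.
        layers_used = 0
        for layer, head in zip(self.layers, self.halt_heads):
            x = layer(x)
            layers_used += 1
            confidence = torch.sigmoid(head(x[:, -1])).mean().item()
            if confidence > self.threshold:
                break  # remaining layers are skipped for this step
        return x, layers_used

model = AdaptiveDepthStub()
hidden = torch.randn(1, 8, 64)   # one sequence of 8 token embeddings
out, used = model(hidden)
print(f"layers used: {used} / {len(model.layers)}")
```

With untrained halting heads the model will typically use all layers; the sketch only illustrates the control flow by which compute per token can vary.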
"