Fuchun Sun - Knowledge-Guided Tactile VLA

January 28, 2026

IEEE Robotics and Automation Society

Why It Matters

Embedding tactile sensing and physical knowledge into VLA systems promises more reliable and generalizable robotic behavior and better sim-to-real transfer, both crucial for industrial automation and safe real-world interaction.

Summary

Fuchun Sun outlines a knowledge-guided approach to embodied vision-language-action (VLA) agents that integrates tactile sensing and physical awareness with large language models (LLMs). He argues that tactile feedback closes the semantic–physics gap by enabling fine force control, collision detection, and perception of material properties, capabilities that are critical for tasks such as assembly where vision alone fails. Sun proposes a three-part agent (perception, cognition, action) supported by a physical-digital simulation that tokenizes tactile, geometric, and dynamic properties to improve policy transfer from simulation to the real world. He also highlights LLMs as planners and generalizers that support scenario and policy adaptation and sequence sub-tasks in complex embodied workflows.
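The summary does not say how the simulation's tokenizer is built. As a purely illustrative sketch, assuming a common multimodal design in which each modality has its own linear projection to a shared token width plus a per-modality type embedding, geometric, physical, and tactile features could be merged into one token sequence as below. `ModalityProjector`, `make_projector`, `tokenize`, `MODALITIES`, the feature sizes, and the random initialization are all hypothetical and are not taken from the talk.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]

# Hypothetical modality order used to lay out the token sequence (an assumption).
MODALITIES = ("geometry", "physics", "tactile")


@dataclass
class ModalityProjector:
    """Hypothetical linear map from one modality's raw features to a shared token width."""
    weights: List[List[float]]  # token_dim rows, each of length in_dim
    type_embedding: Vector      # offset that marks which modality a token came from

    def __call__(self, features: Vector) -> Vector:
        if len(features) != len(self.weights[0]):
            raise ValueError("feature size does not match projector input size")
        projected = [sum(w * x for w, x in zip(row, features)) for row in self.weights]
        return [p + e for p, e in zip(projected, self.type_embedding)]


def make_projector(in_dim: int, token_dim: int, rng: random.Random) -> ModalityProjector:
    """Randomly initialised projector; in a trained system these weights would be learned."""
    scale = 1.0 / in_dim ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(token_dim)]
    type_embedding = [rng.gauss(0.0, 0.02) for _ in range(token_dim)]
    return ModalityProjector(weights, type_embedding)


def tokenize(observation: Dict[str, List[Vector]],
             projectors: Dict[str, ModalityProjector]) -> List[Vector]:
    """Project every per-modality feature vector and concatenate them into one sequence."""
    tokens: List[Vector] = []
    for modality in MODALITIES:
        for features in observation.get(modality, []):
            tokens.append(projectors[modality](features))
    return tokens


# Illustrative usage with made-up feature values.
rng = random.Random(0)
token_dim = 8
projectors = {
    "geometry": make_projector(3, token_dim, rng),  # e.g. a contact point (x, y, z)
    "physics": make_projector(2, token_dim, rng),   # e.g. (stiffness, friction)
    "tactile": make_projector(4, token_dim, rng),   # e.g. a flattened 2x2 taxel patch
}
observation = {
    "geometry": [[0.01, 0.02, 0.00], [0.03, 0.02, 0.01]],
    "physics": [[500.0, 0.4]],
    "tactile": [[0.0, 0.2, 0.7, 0.1]],
}
sequence = tokenize(observation, projectors)
print(len(sequence), len(sequence[0]))  # 4 8
```

The type embedding lets a downstream model tell a tactile token from a geometric one even after all tokens share the same width; this is one standard choice among several, not necessarily the one used in the talk.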

Original Description

Speaker Biography

Dr. Fuchun Sun is a Tenured Professor in the Department of Computer Science and Technology at Tsinghua University, where he also serves as the Director of the Intelligent Robotics Center at the Institute of Artificial Intelligence and Deputy Director of the Committee of Tenured Professors. He is currently the Vice Chairman of the Chinese Association for Artificial Intelligence (CAAI) and an Executive Director of the Chinese Association for Automation (CAA). His research focuses on robotic perception, skill learning, cross-modal learning, and intelligent control. Dr. Sun has led teams to win championships in the Autonomous Grasp Challenges at IROS in 2016 and 2019, and at ICRA in 2015 and 2024. He was elected IEEE Fellow and CAAI Fellow in 2019, and CAA Fellow in 2020. He received the Excellent Doctoral Dissertation Award of China from the Chinese Ministry of Education (2000) and the Choon-Gang Academic Award from Korea (2003), and was recognized as a Distinguished Young Scholar by the National Natural Science Foundation of China in 2006. He has served as Editor-in-Chief of Cognitive Computation and Systems and AI and Autonomous Systems, and as an Associate Editor for IEEE Transactions on Fuzzy Systems.

Abstract

The Vision-Language-Action (VLA) paradigm has significantly advanced robotic control through Internet-scale pre-training. However, its application to real-world manipulation, particularly tasks that require high precision in contact-rich scenarios or involve complex dynamics, is often limited by a lack of fine-grained physical grounding. To address this, we propose a Knowledge-Guided Tactile VLA framework that enhances conventional vision-language-action models with robust physical reasoning capabilities through tactile sensing and world modeling. Our Unified Digital Physics System (UDPS) integrates tactile perception with physical knowledge priors via a novel tokenization scheme that encodes geometry, physics, and tactile cues into a unified representation. Cross-domain alignment distilled from geometric invariances substantially improves sim-to-real transfer for contact-rich manipulation. In addition, physical tokens enable the modeling of complex, dynamic physical processes, including soft-body deformation and contact transitions. The framework is rigorously validated on two demanding tasks: precision 3C (computer, communication, and consumer electronics) assembly and humanoid handkerchief dancing. In 3C assembly, UDPS uses tactile feedback as a position offset during sim-to-real transfer and achieves sub-millimeter precision in connector mating zero-shot. For handkerchief manipulation, the physical tokens model complex fabric dynamics, enabling stable rhythmic motions through whole-body coordination. These results demonstrate the importance of integrating physical knowledge and tactile sensing for solving complex, contact-rich manipulation tasks in real-world environments without real-world fine-tuning.
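The abstract states that UDPS takes tactile feedback as a position offset but does not give the computation. One simple way such an offset could be obtained, shown here only as a hedged sketch, is to take the pressure-weighted centroid of a tactile array, measure its displacement from the sensor centre, convert it to millimetres with an assumed taxel pitch, and shift the simulation-trained insertion target by that amount. The function names (`contact_centroid`, `tactile_position_offset`, `corrected_target`), the contact `threshold`, the `taxel_pitch_mm` and `gain` parameters, and the sign convention (column index maps to x, row index to y, and the gripper moves opposite to the measured offset) are all assumptions for illustration, not details of UDPS.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def contact_centroid(pressure: Grid, threshold: float = 0.05) -> Optional[Tuple[float, float]]:
    """Pressure-weighted (row, col) centroid of taxels above threshold, or None if no contact."""
    total = sum_r = sum_c = 0.0
    for r, row in enumerate(pressure):
        for c, p in enumerate(row):
            if p > threshold:
                total += p
                sum_r += p * r
                sum_c += p * c
    if total == 0.0:
        return None
    return sum_r / total, sum_c / total


def tactile_position_offset(pressure: Grid, taxel_pitch_mm: float,
                            threshold: float = 0.05) -> Tuple[float, float]:
    """Offset (dx, dy) in mm of the contact centroid from the sensor's geometric centre."""
    centroid = contact_centroid(pressure, threshold)
    if centroid is None:
        return 0.0, 0.0  # no contact detected: apply no correction
    center_r = (len(pressure) - 1) / 2.0
    center_c = (len(pressure[0]) - 1) / 2.0
    # Assumed sensor-frame convention: column index -> x, row index -> y.
    dx = (centroid[1] - center_c) * taxel_pitch_mm
    dy = (centroid[0] - center_r) * taxel_pitch_mm
    return dx, dy


def corrected_target(sim_target_xy: Tuple[float, float], pressure: Grid,
                     taxel_pitch_mm: float, gain: float = 1.0) -> Tuple[float, float]:
    """Shift a simulation-trained (x, y) target in mm to cancel the measured in-hand offset."""
    dx, dy = tactile_position_offset(pressure, taxel_pitch_mm)
    return sim_target_xy[0] - gain * dx, sim_target_xy[1] - gain * dy


# A 5x5 tactile patch with a single contact one taxel to the right of centre.
patch = [[0.0] * 5 for _ in range(5)]
patch[2][3] = 1.0
print(corrected_target((10.0, 20.0), patch, taxel_pitch_mm=0.5))  # (9.5, 20.0)
```

A weighted centroid interpolates between taxels, so the estimated offset can be finer than the sensor pitch; a learned tactile model could replace it without changing how the offset is applied to the target.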