Interview with Xiang Fang: Multi-Modal Learning and Embodied Intelligence
Summary
In this interview, PhD candidate Xiang Fang discusses his multi‑modal learning research at NTU, covering efficient video understanding, out‑of‑distribution detection for trustworthy AI, and embodied intelligence for vision‑language navigation. He highlights a standout project that adapts biological reaction‑diffusion patterns to fuse video and text, illustrating his interdisciplinary, mathematically driven approach. Looking ahead, he aims to build unified vision‑language‑action models that handle incomplete inputs, stay robust in the wild, and remain efficient enough for real‑time deployment. Fang's background in geological engineering and competitive mathematics fuels his drive to create AI agents that can both see and act in the physical world.