Interview with Xiang Fang: Multi-Modal Learning and Embodied Intelligence

AIhub | Jan 20, 2026

Summary

In this interview, PhD candidate Xiang Fang discusses his multi‑modal learning research at NTU, covering efficient video understanding, out‑of‑distribution detection for trustworthy AI, and embodied intelligence for vision‑language navigation. He highlights a standout project that adapts biological reaction‑diffusion patterns to fuse video and text, illustrating his interdisciplinary, mathematically driven approach. Looking ahead, he aims to build unified vision‑language‑action models that handle incomplete inputs, remain robust in the wild, and stay efficient enough for real‑time deployment. Fang's background in geological engineering and competitive mathematics fuels his drive to create AI agents that can both see and act in the physical world.
