

Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation (RA-L 2026)

UZH Robotics and Perception Group • January 14, 2026

Why It Matters

Rapid, data‑driven policy adaptation cuts development cycles and improves safety for autonomous drones operating in unpredictable environments.

Key Takeaways

  • Differentiable simulation enables policy adaptation to unseen disturbances within 5 seconds
  • A residual dynamics model, learned online from real-world data, continuously refines the low-fidelity simulator
  • Reduces hovering error by up to 81% versus L1-MPC
  • Outperforms the PPO-based adaptive tracking controller DATT under large disturbances (55% error reduction)
  • Works with vision-based policies without explicit state estimation

Summary

The paper, "Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation" (RA-L 2026), introduces an online learning framework that lets quadrotor controllers adapt to unknown disturbances within seconds during real-world deployment.

The method first trains a policy in a low-fidelity, fully differentiable dynamics model using analytical gradients. During flight, real-world data are used to learn a residual dynamics model, which is injected back into the simulator, enabling an alternating cycle of model updates and policy refinement that is both sample-efficient and computationally light.
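The alternating cycle can be sketched in miniature. The snippet below is a hypothetical 1D point-mass illustration, not the paper's quadrotor model: a residual acceleration is fit from "real-world" transitions, and a feedback gain is then refined by differentiating a simulated rollout of the corrected model. Hand-coded forward sensitivities stand in for autodiff through a differentiable simulator; all names and constants here are illustrative assumptions.

```python
# Hypothetical sketch of the adapt-online loop on a 1D double integrator.
# Alternates between (a) fitting a residual acceleration from observed
# transitions and (b) refining a feedback gain by differentiating a
# simulated rollout of the refined model.

DT = 0.05  # simulation step [s]

def nominal_step(x, v, u):
    # low-fidelity model: double integrator, explicit Euler
    return x + v * DT, v + u * DT

def real_step(x, v, u, wind=1.5):
    # "real world": same dynamics plus an unmodeled constant wind acceleration
    return x + v * DT, v + (u + wind) * DT

def fit_residual(transitions):
    # (a) residual dynamics: mean acceleration error over (v, u, v_next) triples
    errs = [(v_next - v - u * DT) / DT for (v, u, v_next) in transitions]
    return sum(errs) / len(errs)

def rollout_cost_and_grad(kp, kd, residual, x0=1.0, v0=0.0, steps=40):
    # (b) simulate the refined model under u = -kp*x - kd*v and accumulate
    # d(cost)/d(kp) analytically via forward sensitivities dx, dv
    x, v = x0, v0
    dx, dv = 0.0, 0.0
    cost, dcost = 0.0, 0.0
    for _ in range(steps):
        u = -kp * x - kd * v
        du = -x - kp * dx - kd * dv
        x, dx = x + v * DT, dx + dv * DT
        v, dv = v + (u + residual) * DT, dv + du * DT
        cost += x * x
        dcost += 2 * x * dx
    return cost, dcost

# one adaptation cycle: fly with the nominal gains, fit the residual,
# then take gradient steps on kp through the refined simulator
kp, kd = 2.0, 1.0
x, v, transitions = 1.0, 0.0, []
for _ in range(20):
    u = -kp * x - kd * v
    x_next, v_next = real_step(x, v, u)
    transitions.append((v, u, v_next))
    x, v = x_next, v_next
residual = fit_residual(transitions)  # recovers the wind term
for _ in range(100):
    cost, grad = rollout_cost_and_grad(kp, kd, residual)
    kp -= 0.01 * grad
```

Because the gradient flows through every step of the rollout, a handful of first-order updates suffices, which is the sample-efficiency argument the paper makes against zeroth-order RL updates such as PPO's.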

Experimental results show up to 81% hovering-error reduction compared with L1-MPC and 55% compared with the PPO-based deep adaptive tracking controller DATT. The approach maintains performance in large-disturbance scenarios where prior methods fail, adapts vision-based policies without explicit state estimation, and scales to larger quadrotors with different mass and thrust characteristics.

By collapsing the gap between simulation and reality, the technique accelerates deployment of robust autonomous aerial systems, reduces reliance on exhaustive system identification, and opens pathways for adaptive control in other robotics domains.

Original Description

Learning control policies in simulation enables rapid, safe, and cost-effective development of advanced robotic capabilities. However, transferring these policies to the real world remains difficult due to the sim-to-real gap, where unmodeled dynamics and environmental disturbances can degrade policy performance. Existing approaches, such as domain randomization and Real2Sim2Real pipelines, can improve policy robustness, but either struggle under out-of-distribution conditions or require costly offline retraining.

In this work, we approach these problems from a different perspective. Instead of relying on diverse training conditions before deployment, we focus on rapidly adapting the learned policy in the real world in an online fashion. To achieve this, we propose a novel online adaptive learning framework that unifies residual dynamics learning with real-time policy adaptation inside a differentiable simulation. Starting from a simple dynamics model, our framework refines the model continuously with real-world data to capture unmodeled effects and disturbances such as payload changes and wind. The refined dynamics model is embedded in a differentiable simulation framework, enabling gradient backpropagation through the dynamics and thus rapid, sample-efficient policy updates beyond the reach of classical RL methods like PPO. All components of our system are designed for rapid adaptation, enabling the policy to adjust to unseen disturbances within 5 seconds of training.

We validate the approach on agile quadrotor control under various disturbances in both simulation and the real world. Our framework reduces hovering error by up to 81% compared to L1-MPC and 55% compared to DATT, while also demonstrating robustness in vision-based control without explicit state estimation.
Reference:
J. Pan, J. Xing, R. Reiter, Y. Zhai, E. Aljalbout, and D. Scaramuzza, "Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation," IEEE Robotics and Automation Letters (RA-L), 2026.
PDF: https://rpg.ifi.uzh.ch/docs/RAL26_Pan.pdf
Project page: https://rpg.ifi.uzh.ch/lotf/
Code: https://github.com/uzh-rpg/learning_on_the_fly
More info on our research in Drone Racing:
https://rpg.ifi.uzh.ch/research_drone_racing.html
More info on our research in Agile Drone Flight:
https://rpg.ifi.uzh.ch/aggressive_flight.html
More info on our research on Machine Learning:
https://rpg.ifi.uzh.ch/research_learning.html
Affiliations:
J. Pan, J. Xing, R. Reiter, Y. Zhai, E. Aljalbout, and D. Scaramuzza are with the Robotics and Perception Group, Dep. of Informatics, University of Zurich, https://rpg.ifi.uzh.ch/