
Actor-Critic MPC: Differentiable Optimization Meets Reinforcement Learning for Agile Flight (TRO'25)

UZH Robotics and Perception Group • January 19, 2026

Why It Matters

By combining model‑based predictability with reinforcement‑learning flexibility, ACMPC delivers robust, data‑efficient control for complex aerial robots, accelerating real‑world adoption of autonomous flight.

Key Takeaways

  • Hybrid actor-critic architecture integrates a differentiable MPC for agile flight
  • The differentiable MPC supplies prior knowledge of the dynamics before any training data is collected
  • A cost-map neural network learns the observation-to-cost mapping for the MPC
  • ACMPC significantly outperforms standard MPC in out-of-distribution scenarios
  • Model predictive value expansion boosts sample efficiency during training

Summary

The paper presents Actor‑Critic Model Predictive Control (ACMPC), a hybrid framework that merges a differentiable MPC module with an actor‑critic reinforcement‑learning architecture to achieve agile flight in highly nonlinear quadrotor systems.

By embedding a dynamics model directly into the MPC, the agent receives prior knowledge before any data is collected, while a deep cost‑map network translates raw observations into the MPC’s cost function. Experiments demonstrate that ACMPC maintains robustness in out‑of‑distribution conditions and adapts to substantial variations in system parameters without retraining, outperforming both model‑free RL and conventional MPC.
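The structure described above can be sketched in a toy form: a small "cost-map" network maps the observation to MPC cost parameters, and a differentiable MPC then optimizes a short control sequence under known dynamics. This is a minimal illustration, not the paper's implementation: the 1-D double-integrator dynamics, the linear cost-map, and all sizes and names below are made-up stand-ins, and analytic gradients are replaced by finite differences for brevity.

```python
import numpy as np

H, DT = 10, 0.1                          # MPC horizon and timestep (illustrative)
A = np.array([[1.0, DT], [0.0, 1.0]])    # 1-D double-integrator dynamics
B = np.array([[0.0], [DT]])

def cost_map(obs, W, b):
    """Tiny linear 'network': observation -> (goal position, control weight)."""
    out = W @ obs + b
    return out[0], np.exp(out[1])        # learned reference, positive effort weight

def rollout_cost(x0, u, goal, r):
    """Quadratic tracking cost of one H-step rollout under the known dynamics."""
    x, c = x0.copy(), 0.0
    for k in range(H):
        x = A @ x + (B * u[k]).ravel()
        c += (x[0] - goal) ** 2 + r * u[k] ** 2
    return c

def mpc(x0, goal, r, iters=200, lr=0.2, eps=1e-5):
    """'Differentiable' MPC: descend on the control sequence (finite differences
    stand in for the analytic gradients a real differentiable solver provides)."""
    u = np.zeros(H)
    for _ in range(iters):
        base = rollout_cost(x0, u, goal, r)
        grad = np.zeros(H)
        for k in range(H):
            up = u.copy(); up[k] += eps
            grad[k] = (rollout_cost(x0, up, goal, r) - base) / eps
        u -= lr * grad
    return u[0]                          # apply only the first action, MPC-style

obs = np.array([0.0, 0.0])               # position, velocity
W = np.array([[1.0, 0.0], [0.0, 0.0]])   # untrained illustrative weights
b = np.array([1.0, -2.0])
goal, r = cost_map(obs, W, b)
action = mpc(obs, goal, r)
print(goal, action > 0)                  # MPC accelerates toward the learned goal
```

Because the MPC is differentiable, gradients from the RL loss can in principle flow through the solver into the cost-map weights, which is what lets the cost function be learned end to end.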

On the split‑S track, ACMPC’s success rate remained high as dynamics parameters were perturbed, whereas standard MPC degraded sharply. The authors also introduce Model Predictive Value Expansion, leveraging MPC predictions to refine the critic’s value function, which yields markedly better sample efficiency. Visualizations of the learned value function reveal rapid shifts toward upcoming gates, producing emergent mode‑switching behavior that traditional MPC cannot replicate.
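The value-expansion idea can be illustrated with a short sketch: instead of a 1-step TD target, the critic target is assembled from the MPC's H-step predicted rewards plus a bootstrapped value at the end of the horizon. The rewards, discount factor, and function names below are illustrative stand-ins, not the paper's code.

```python
GAMMA = 0.99  # illustrative discount factor

def td1_target(r0, v1):
    """Standard 1-step TD target: immediate reward plus discounted next value."""
    return r0 + GAMMA * v1

def mpve_target(predicted_rewards, v_terminal):
    """H-step target built from the MPC's predicted reward sequence, plus the
    critic's bootstrapped value at the end of the prediction horizon."""
    target = 0.0
    for k, r in enumerate(predicted_rewards):
        target += (GAMMA ** k) * r
    return target + GAMMA ** len(predicted_rewards) * v_terminal

# Example: the MPC predicts 5 steps of reward 1.0; terminal value set to 0
# to isolate the reward terms.
rewards = [1.0] * 5
print(td1_target(rewards[0], 0.0))   # 1.0
print(mpve_target(rewards, 0.0))     # sum of 0.99**k for k=0..4 ≈ 4.901
```

Grounding the target in several model-predicted steps rather than a single observed transition is what drives the sample-efficiency gain the authors report.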

These results suggest a pathway to safer, more interpretable autonomous flight controllers that require less hand‑tuning and can generalize across changing environments, opening opportunities for deployment in commercial drones, delivery services, and other high‑risk robotics applications.

Original Description

A key open challenge in agile quadrotor flight is how to combine the flexibility and task-level generality of model-free reinforcement learning (RL) with the structure and online replanning capabilities of model predictive control (MPC), aiming to leverage their complementary strengths in dynamic and uncertain environments. This paper provides an answer by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an actor-critic RL framework. This integration allows for short-term predictive optimization of control actions through MPC, while leveraging RL for end-to-end learning and exploration over longer horizons. Through various ablation studies, conducted in the context of agile quadrotor racing, we expose the benefits of the proposed approach: it achieves better out-of-distribution behavior, better robustness to changes in the quadrotor’s dynamics and improved sample efficiency. Additionally, we conduct an empirical analysis using a quadrotor platform that reveals a relationship between the critic’s learned value function and the cost function of the differentiable MPC, providing a deeper understanding of the interplay between the critic’s value and the MPC cost functions. Finally, we validate our method in a drone racing task on different tracks, in both simulation and the real world. Our method achieves the same superhuman performance as state-of-the-art model-free RL, showcasing speeds of up to 21 m/s. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out-of-distribution behavior.
Reference:
A. Romero, E. Aljalbout, Y. Song, D. Scaramuzza, "Actor-Critic Model Predictive Control: Differentiable Optimization Meets Reinforcement Learning for Agile Flight," IEEE Transactions on Robotics, 2025.
PDF: https://rpg.ifi.uzh.ch/docs/TRO25_ACMPC_Romero.pdf
Code: https://github.com/uzh-rpg/acmpc_public
For more info about our research on:
Agile Drone Flight: https://rpg.ifi.uzh.ch/aggressive_flight.html
Drone Racing: https://rpg.ifi.uzh.ch/research_drone_racing.html
Machine Learning: https://rpg.ifi.uzh.ch/research_learning.html
Affiliations:
A. Romero, E. Aljalbout, Y. Song, and D. Scaramuzza are with the Robotics and Perception Group, Dep. of Informatics, University of Zurich, and Dep. of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
https://rpg.ifi.uzh.ch/