
The paper presents Actor‑Critic Model Predictive Control (ACMPC), a hybrid framework that merges a differentiable MPC module with an actor‑critic reinforcement‑learning architecture to achieve agile flight in highly nonlinear quadrotor systems. By embedding a dynamics model directly into the MPC, the agent receives prior knowledge before any data is collected, while a deep cost‑map network translates raw observations into the MPC’s cost function. Experiments demonstrate that ACMPC maintains robustness in out‑of‑distribution conditions and adapts to substantial variations in system parameters without retraining, outperforming both model‑free RL and conventional MPC. On the split‑S track, ACMPC’s success rate remained high as dynamics parameters were perturbed, whereas standard MPC degraded sharply. The authors also introduce Model Predictive Value Expansion, leveraging MPC predictions to refine the critic’s value function, which yields markedly better sample efficiency. Visualizations of the learned value function reveal rapid shifts toward upcoming gates, producing emergent mode‑switching behavior that traditional MPC cannot replicate. These results suggest a pathway to safer, more interpretable autonomous flight controllers that require less hand‑tuning and can generalize across changing environments, opening opportunities for deployment in commercial drones, delivery services, and other high‑risk robotics applications.

The video introduces a multitask reinforcement‑learning framework that trains a single, generalist controller for quadrotors capable of handling stabilization, high‑speed racing, and velocity‑tracking commands. By partitioning sensor inputs into shared and task‑specific observations, the system feeds each through a common...

The paper introduces a novel online learning framework—Rapid Policy Adaptation via Differentiable Simulation (RA‑L 2026)—that lets quadrotor controllers adjust to unknown disturbances in seconds during real‑world deployment. The method starts with a low‑fidelity, fully differentiable dynamics model to train a policy...