
AI Pulse

AI

Training a Unitree G1 to Walk W/ Reinforcement Learning

December 19, 2025

Harrison Kinsley

Why It Matters

By showing that a hobbyist can reliably bridge simulation and reality for a commercial humanoid, the video signals a democratization of advanced robot locomotion, accelerating research and commercial applications in autonomous service robots.

Summary

The video chronicles a creator’s effort to teach a Unitree G1 humanoid to walk using reinforcement‑learning techniques, emphasizing the progression from simulator‑to‑simulator validation (Sim2Sim) to real‑world deployment (Sim2Real). After years of attempting Sim2Real transfer, the presenter finally succeeded thanks to improved actuator quality and a more faithful simulation environment.

Key technical takeaways include the choice of MJLab as the simulation platform, the shift from an implicit to an explicit PD controller to avoid privileged data, and the practical training setup—running dozens of parallel environments on an RTX 4090, with policies converging after roughly 5,000 to 50,000 iterations depending on terrain complexity. The presenter also highlights the importance of matching observation spaces, noting that the real G1 lacks direct linear‑velocity measurements, which complicates transfer.
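The two control‑loop points above, an explicit PD controller acting on policy outputs and an observation vector without base linear velocity, can be sketched roughly as follows. Gains, joint counts, and the observation field layout are illustrative assumptions, not the values used in the video:

```python
import numpy as np

def explicit_pd_torque(q_target, q, qd, kp=25.0, kd=0.5):
    """Explicit PD control: map the policy's joint-position targets to
    motor torques in the training loop itself, so the same computation
    can be reproduced on the real robot. Gains are placeholder values."""
    q_target = np.asarray(q_target, dtype=float)
    q = np.asarray(q, dtype=float)
    qd = np.asarray(qd, dtype=float)
    return kp * (q_target - q) - kd * qd

def build_observation(q, qd, gyro, gravity_vec, last_action):
    """Proprioception-only observation: joint positions/velocities, IMU
    angular velocity, projected gravity, and the previous action -- but
    no base linear velocity, mirroring the constraint that the real G1
    cannot measure it directly. Field layout is a hypothetical example."""
    return np.concatenate([q, qd, gyro, gravity_vec, last_action])
```

Keeping the simulated observation restricted to signals the real robot can actually produce is what makes the learned policy transferable; any simulator-only quantity left in the observation becomes privileged data the hardware cannot supply.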

Notable moments include the creator’s exuberant reaction when the robot first balanced on its own, a shout‑out to Kevin Zakka (author of MJLab) for community support, and a candid discussion of the robot’s blind navigation: it struggles on stairs and relies solely on proprioception. The policy, while imperfect, demonstrates stable walking on uneven foam pits and modest rough‑terrain handling, underscoring both the promise and the current limits of the approach.

The broader implication is that Sim2Real pipelines, when carefully aligned with real‑world control loops, can now be leveraged by individual developers rather than only large labs. This lowers the entry barrier for deploying learning‑based locomotion on affordable humanoids, paving the way for more complex tasks such as household assistance, provided sensor gaps (e.g., onboard linear‑velocity estimation) are addressed.

Original Description

Using mjlab and PPO to train the Unitree G1 humanoid to walk inside and outside
Neural Networks from Scratch book: https://nnfs.io
Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Reddit: https://www.reddit.com/r/sentdex/
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Instagram: https://instagram.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex