In a world of PPO-everything for reinforcement learning, I've been tinkering with SAC for training a quadruped gait. This gait is trained purely on CPU (training on one of the Dell GB10s) on a single environment. Any particular run is obviously slower than PPO on an RTX Pro 6000 with 8092 envs, if you already know the exact hyperparams/reward function for your PPO setup... but, if we're honest with ourselves, we know we usually spend days tuning our PPO algo and fighting it to do what we want. In contrast, SAC has been a breath of fresh air, very amenable to changing the reward function to tune behavior. So far, my first attempts at tuning have consistently just worked immediately, rather than 15 different variations of reward hacking only to find previously tuned behaviors got lost in the process. There is also FastSAC, which I've not yet tried, but it can potentially speed things up and bring scale back into the equation.

My main pain point in getting SAC to work for gait was actually getting it to learn to step. It seems SAC is not as good as PPO at significant exploration on its own. I ended up starting with a sinusoidal gait (basically just a rule to make the legs swing) as training wheels, then blended it out through training as phase 1, then began smoothing things out after that. I think if we look at end-to-end dev time rather than any particular run that finally managed to work, SAC may actually be the "faster" algorithm to train. Quadruped gaits are inherently easier than bipedal, and maybe there are areas where SAC falls short, but I'll definitely be spending more time with SAC.
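The "training wheels" idea could be sketched roughly like this: blend an open-loop sinusoidal reference with the policy's action, annealing the reference to zero over training. This is a minimal sketch under my own assumptions; the joint count, frequency, phase layout, and linear schedule are all hypothetical, not the actual setup described above.

```python
import numpy as np

def sinusoidal_gait(t, n_joints=12, freq=1.5, amp=0.3):
    """Hypothetical open-loop reference: joints swing on a sine,
    with alternating phase so legs move out of step with each other."""
    phases = np.tile([0.0, np.pi], n_joints // 2)[:n_joints]
    return amp * np.sin(2.0 * np.pi * freq * t + phases)

def blend_weight(step, blend_steps=1_000_000):
    """Linearly anneal the reference's influence from 1 -> 0 over blend_steps."""
    return max(0.0, 1.0 - step / blend_steps)

def blended_action(policy_action, t, step):
    """Early in training the reference dominates (training wheels);
    by the end the policy acts entirely on its own."""
    w = blend_weight(step)
    ref = sinusoidal_gait(t, n_joints=policy_action.shape[0])
    return w * ref + (1.0 - w) * policy_action
```

Once the weight hits zero the reference is gone entirely, so later phases (smoothing, etc.) train against the raw policy output.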
it's deeply unfortunate that incentives have aligned such that all these frontier AI labs and other tech companies are way better off staying private and the general public cannot invest in them despite desperately wishing they could.
still very much in r&d, but one of my fav projects being worked on at Lucky is these agents that make creating the scene, and specifying what you actually want the robot(s) to do, faster. just such a cool concept.
I’ve joined Lucky Robots as director of eng & AI to help build the robotics simulator I wish we had right now. If you’re someone who likes to make robots do cool stuff or want to shape a sim being built...
a more in-home-acceptable hand-centric policy. sim2sim shown; it does work sim2real, but I need to improve localisation w/ lidar. Changes: good hand rotation, torso/head/camera actually face where we intend to go, semi-decent walking, crouch, and general pose. Still all PPO https://t.co/hK5F0sJuJL
further reward crafting for the hand-centric (tm) PPO model. Closing in on what I want. Need to slow down max velocities, and ideally I want more stepping/stride + less tiptoe on the left foot. almost can't believe this is actually just RL! https://t.co/ro56yI8yaV
some improvements to the hand-centric policy: it crouches really well and even crouch-walks. we can definitely pick stuff up off the floor with this, I think. I'd like to penalize torso velocity beyond a certain speed. steerable dynamic hand rotation. I initially always...
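A torso-velocity penalty that only kicks in beyond a speed cap might look like the sketch below. The threshold, weight, and quadratic shaping are my assumptions for illustration, not the actual reward term used:

```python
import numpy as np

def torso_speed_penalty(torso_lin_vel, max_speed=1.0, weight=2.0):
    """Penalize only the speed in excess of max_speed (hypothetical values);
    zero cost below the cap, so slow movement stays unpunished."""
    speed = float(np.linalg.norm(torso_lin_vel))
    excess = max(0.0, speed - max_speed)
    return -weight * excess ** 2
```

The dead zone below the cap matters: a penalty on all velocity tends to discourage movement altogether, while a capped one shapes only the behavior you actually want to suppress.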
we do a bit of reaching. modifying the mjlab velocity demo reward function to do custom behaviors is so satisfying because the base function just seems to learn so darn fast. It only takes a few minutes to start seeing...
New video is out, teaching a Unitree G1 humanoid to walk using reinforcement learning (PPO). First time I've ever got sim2real to actually work with robotics, sharing what I've learned and testing out how good the policy actually is by...