In a world of PPO-everything for reinforcement learning, I've been tinkering with SAC for training a quadruped gait. This gait is trained purely on CPU (training on one of the Dell GB10s) on a single environment. Any particular run is obviously slower than PPO on an RTX Pro 6000 with 8092 envs, if you already know the exact hyperparams/reward function for your PPO setup... but, if we're honest with ourselves, we know we usually spend days tuning our PPO algo and fighting it to do what we want. In contrast, SAC has been a breath of fresh air, very amenable to changing the reward function to tune behavior. So far, my first attempts at tuning have consistently just worked immediately, rather than 15 different variations of reward hacking only to find previously tuned behaviors got lost in the process. There is also FastSAC, which I've not yet tried, but it can potentially speed things up and bring scale back into the equation.

My main pain point in getting SAC to work for gait was actually getting it to learn to step. It seems SAC is not as good as PPO at significant exploration on its own. I ended up starting with a sinusoidal gait (basically just a rule to make the legs swing) as training wheels, then blended it out through training as phase 1, then began smoothing things out after that. I think if we look at end-to-end dev time rather than any particular run that finally managed to work, SAC may actually be the "faster" algorithm to train. Quadruped gaits are inherently easier than bipedal, and maybe there are areas where SAC falls short, but I'll definitely be spending more time with SAC.
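The "training wheels" idea could be sketched roughly like this: blend an open-loop sinusoidal reference with the policy's action, annealing the reference to zero over training. This is a minimal sketch under my own assumptions; the joint count, frequency, phase layout, and linear schedule are all hypothetical, not the actual setup described above.

```python
import numpy as np

def sinusoidal_gait(t, n_joints=12, freq=1.5, amp=0.3):
    """Hypothetical open-loop reference: joints swing on a sine,
    with alternating phase so legs move out of step with each other."""
    phases = np.tile([0.0, np.pi], n_joints // 2)[:n_joints]
    return amp * np.sin(2.0 * np.pi * freq * t + phases)

def blend_weight(step, blend_steps=1_000_000):
    """Linearly anneal the reference's influence from 1 -> 0 over blend_steps."""
    return max(0.0, 1.0 - step / blend_steps)

def blended_action(policy_action, t, step):
    """Early in training the reference dominates (training wheels);
    by the end the policy acts entirely on its own."""
    w = blend_weight(step)
    ref = sinusoidal_gait(t, n_joints=policy_action.shape[0])
    return w * ref + (1.0 - w) * policy_action
```

Once the weight hits zero the reference is gone entirely, so later phases (smoothing, etc.) train against the raw policy output.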
it's deeply unfortunate that incentives have aligned such that all these frontier AI labs and other tech companies are way better off staying private and the general public cannot invest in them despite desperately wishing they could.
still very much in r&d, but one of my fav projects being worked on at Lucky is these agents that make creating the scene, and specifying what you actually want the robot(s) to do, faster. just such a cool concept.
I’ve joined Lucky Robots as director of eng & AI to help build the robotics simulator I wish we had right now. If you’re someone who likes to make robots do cool stuff or want to shape a sim being built...
a more in-home-acceptable hand-centric policy. sim2sim shown; it does work sim2real, but I need to improve localisation w/ lidar. Changes: good hand rotation, torso/head/camera actually face where we intend to go, semi-decent walking, crouch, and general pose. Still all PPO https://t.co/hK5F0sJuJL
further reward crafting for the hand-centric (tm) PPO model. Closing in on what I want. Need to slow down max velocities, and ideally I want more stepping/stride + less tiptoe on the left foot. almost can't believe this is actually just RL! https://t.co/ro56yI8yaV
some improvements to the hand-centric policy: it crouches really well and even crouch-walks. we can definitely pick stuff up off the floor with this, I think. I'd like to penalize torso velocity beyond a certain speed. steerable dynamic hand rotation. I initially always...
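A torso-velocity penalty that only kicks in beyond a speed cap might look like the sketch below. The threshold, weight, and quadratic shaping are my assumptions for illustration, not the actual reward term used:

```python
import numpy as np

def torso_speed_penalty(torso_lin_vel, max_speed=1.0, weight=2.0):
    """Penalize only the speed in excess of max_speed (hypothetical values);
    zero cost below the cap, so slow movement stays unpunished."""
    speed = float(np.linalg.norm(torso_lin_vel))
    excess = max(0.0, speed - max_speed)
    return -weight * excess ** 2
```

The dead zone below the cap matters: a penalty on all velocity tends to discourage movement altogether, while a capped one shapes only the behavior you actually want to suppress.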
we do a bit of reaching. modifying the mjlab velocity demo reward function to do custom behaviors is so satisfying because the base function just seems to learn so darn fast. It only takes a few minutes to start seeing...
New video is out, teaching a Unitree G1 humanoid to walk using reinforcement learning (PPO). First time I've ever got sim2real to actually work with robotics, sharing what I've learned and testing out how good the policy actually is by...