
New RoboReward Dataset and Models Automate Robotic Training and Evaluation
Why It Matters
Automated reward modeling cuts costly human labeling, accelerating robot skill acquisition and lowering development expenses for the robotics industry.
By Ingrid Fadelli · Edited by Gaby Clark · Fact‑checked by Robert Egan · Published January 15, 2026

The authors introduce a dataset and evaluation for general‑purpose reward models for robotics. Credit: Lee et al.
The advancement of artificial intelligence (AI) algorithms has opened new possibilities for developing robots that can reliably tackle a variety of everyday tasks. Training and evaluating these algorithms, however, typically requires extensive effort, as humans still need to manually label training data and assess model performance in both simulations and real‑world experiments.
Researchers at Stanford University and UC Berkeley have introduced RoboReward, a dataset for training and evaluating AI algorithms for robotics applications, specifically vision‑language models (VLMs) that serve as reward models.
Their paper, published on the arXiv preprint server, also presents RoboReward 4B and 8B, two new VLMs that were trained on this dataset and outperform other models introduced in the past.
“From our experience, training autonomous robots is expensive, because it requires lots of human labor and intervention,” Tony Lee, first author of the paper, told Tech Xplore.
“One part of the process is that a human often must watch and label many robot rollouts as successes or failures. We wanted to study whether vision‑language models (VLMs) can automate part of the training, serving as reward models that score how well a robot performed a task, enabling us to train robot policies with less manual human supervision.”
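In concrete terms, a reward model of this kind watches a rollout video, reads the plain‑language task description, and returns a score that can stand in for a human's success‑or‑failure judgment. The short Python sketch below illustrates the idea; the function names, reward scale, and success threshold are illustrative assumptions for this article, not the team's actual interface.

```python
from typing import List

def vlm_reward(frames: List, task: str) -> float:
    """Placeholder for a vision-language reward model: given a rollout
    video (a list of frames) and a plain-language task description,
    return a scalar score in [0, 1] indicating how well the task went."""
    # In practice this would run inference with a trained reward model;
    # here it simply returns a dummy value so the sketch stays runnable.
    return 0.0

def label_rollouts(rollouts: List[dict], task: str, threshold: float = 0.5) -> List[dict]:
    """Replace manual success/failure labels with VLM-scored rewards."""
    for rollout in rollouts:
        score = vlm_reward(rollout["frames"], task)
        rollout["reward"] = score
        rollout["success"] = score >= threshold  # instead of a human judgment
    return rollouts
```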

The authors trained a general‑purpose vision‑language reward model that closes much of the performance gap to human‑given rewards for real‑world robot training, as shown on two tasks (open drawer and pick‑and‑place monkey). Credit: Lee et al.
RoboReward: The new dataset and benchmark
RoboReward, the new dataset compiled by Lee and his colleagues, contains many videos of real robots attempting various real‑world tasks, paired with plain‑language task descriptions and progress scores. The dataset is designed for training VLMs to judge how well robots completed specific tasks.
“We built this dataset by augmenting existing success‑heavy robot demonstration datasets with realistic failures and near‑misses,” explained Lee.
“We then trained VLMs on this data, so they can watch a robot’s video carrying out a task (as given by the task description) and output a high‑quality reward signal during training. Best of all, we open‑sourced everything in the spirit of open science, including the dataset, evaluation set, trained models, and the leaderboard.”
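Based on the article's description, each training example pairs a robot video with a plain‑language task description and a progress score, with success‑heavy demonstration data augmented by failures and near‑misses. The sketch below shows what such a record might look like; the field names, file paths, and example scores are hypothetical, not taken from the released dataset.

```python
from dataclasses import dataclass

@dataclass
class RewardRecord:
    video_path: str        # clip of a real robot attempting the task
    task_description: str  # plain-language instruction for the episode
    progress_score: float  # how far the robot got, e.g. 0.0 (failure) to 1.0 (success)

# Success-heavy demonstration data augmented with failures and near-misses,
# so the reward model sees the full range of outcomes.
records = [
    RewardRecord("episodes/drawer_001.mp4", "open the drawer", 1.0),   # success
    RewardRecord("episodes/drawer_047.mp4", "open the drawer", 0.4),   # near-miss
    RewardRecord("episodes/monkey_012.mp4", "pick up the toy monkey and place it in the bin", 0.0),  # failure
]
```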
Using the RoboReward dataset, Lee and his colleagues trained two new general‑purpose VLMs for robotics applications. Both models were found to perform remarkably well, allowing robots to rapidly and reliably acquire new skills without continuous human feedback.
“We also introduced RoboRewardBench, a human‑verified evaluation suite,” said Lee. “This benchmark shows that today’s frontier models are still not reliable when used as automatic reward models across embodiments and scenes, as physical reasoning remains a big challenge for these models.”
“RoboReward 4B and 8B were found to outperform much larger state‑of‑the‑art VLMs (e.g., Gemini Robotics‑ER 1.5) on reward accuracy. Our trained model also closes much of the performance gap to training with human‑provided rewards in real‑world robot experiments.”

Contributing to the advancement of robotics vision‑language models
The RoboReward dataset and the models developed by this research team are open‑source and can be accessed on the team’s website.
In the future, they could guide the development of new models that teach robots to complete tasks via reward‑based processes, while also allowing developers to reliably assess their algorithms.
“We hope that this work will inform the development of new general‑purpose reward models for robotics that can automate parts of robot training,” added Lee.
“Looking ahead, we want to extend reward modeling to longer‑horizon tasks and make automated reward models more reliable and calibrated when deployed for real‑world training. We also hope RoboReward motivates broader improvements in large vision‑language models, so they become better at physical reasoning and understanding fine‑grained spatial and temporal details.”
Reference
Tony Lee et al., RoboReward: General‑Purpose Vision‑Language Reward Models for Robotics, arXiv 2026. DOI: 10.48550/arxiv.2601.00675.