DexWild-System is a portable, high-fidelity, and embodiment-agnostic data collection system that efficiently gathers dexterous human hand data across many environments. With a team of untrained data collectors, we gathered 9,290 demonstrations across 93 diverse environments at 4.6× the speed of teleoperation.
DexWild aligns both the observation space and the action space between humans and robots. Palm-mounted cameras provide a wide field of view that sees the task and environment clearly while capturing little embodiment-specific information, which lets us effectively co-train on data collected from different embodiments. We convert all hand poses to a 17-dimensional joint-angle space and all wrist poses to relative actions in 6D space.
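As a concrete illustration of the wrist half of this action space, the sketch below computes a relative 6D action (3D translation plus 3D axis-angle rotation) between consecutive wrist poses. It assumes poses arrive as 4x4 homogeneous transforms; the function names and example values are illustrative, not the released DexWild code.

# Hedged sketch: relative 6D wrist actions from consecutive wrist poses.
import numpy as np
from scipy.spatial.transform import Rotation as R

def relative_wrist_action(T_prev: np.ndarray, T_curr: np.ndarray) -> np.ndarray:
    """Express the current wrist pose in the previous wrist frame and
    return a 6D action: 3D translation + 3D axis-angle rotation."""
    T_rel = np.linalg.inv(T_prev) @ T_curr               # relative 4x4 transform
    delta_t = T_rel[:3, 3]                               # relative translation
    delta_r = R.from_matrix(T_rel[:3, :3]).as_rotvec()   # relative rotation (axis-angle)
    return np.concatenate([delta_t, delta_r])

# Example: a 5 cm forward step with a 10-degree yaw rotation.
T0 = np.eye(4)
T1 = np.eye(4)
T1[:3, :3] = R.from_euler("z", 10, degrees=True).as_matrix()
T1[:3, 3] = [0.05, 0.0, 0.0]
print(relative_wrist_action(T0, T1))  # -> [0.05, 0, 0, 0, 0, ~0.175]

Because the action is expressed relative to the previous wrist frame rather than in a robot-specific world frame, the same representation applies to human wrists and robot end effectors alike.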
Trained on DexWild data, our policies transfer zero-shot to a new robot arm and few-shot to a new robot hand. This is because DexWild data is embodiment-agnostic: the cameras capture only task-relevant information, and the human hand data can be retargeted to any end effector. We train on the LEAP V2 Advanced hand and Xarm6 arm, yet transfer to Franka Panda arms and LEAP Hands. This lets us deploy on entirely new robots, in unseen environments, and with never-before-seen objects.
Policies trained on DexWild data generalize to completely unseen environments: indoor, outdoor, crowded, and cluttered scenes under different lighting conditions. Our models demonstrate strong performance in these settings, which we largely attribute to the diversity and quality of the DexWild dataset.
DexWild also facilitates effective transfer of skills from the spraying task to the pouring task, requiring no additional robot data!
💪 and 🦾 indicate the number of human and robot demonstrations, respectively.
*pouring task: 0 | spray task: 388
Scaling robot data is challenging: it is expensive, tedious, and often confined to a single robot, a single environment, and a limited set of objects. As a result, policies trained solely on robot data tend to overfit to specific embodiments or settings and struggle to generalize. We show that even with varied environments, robot-only training fails to generalize to new scenes and objects. Cotraining with DexWild introduces a harder, more diverse learning problem, forcing the policy to build a better understanding of the task and scene, which results in more robust, generalizable behavior.
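One simple way to realize such cotraining is to mix the small robot dataset with the much larger in-the-wild human dataset at a fixed per-batch ratio. The sketch below shows this with stand-in tensors and an assumed 50/50 sampling ratio; the dataset contents and ratio are illustrative, not the exact DexWild training recipe.

# Hedged sketch: balanced robot/human cotraining batches in PyTorch.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset, WeightedRandomSampler

robot_data = TensorDataset(torch.randn(500, 10))    # stand-in for robot demonstrations
human_data = TensorDataset(torch.randn(9000, 10))   # stand-in for in-the-wild human demonstrations
combined = ConcatDataset([robot_data, human_data])

# Weight samples so each source contributes ~50% of every batch,
# regardless of how many demonstrations it holds.
weights = torch.cat([
    torch.full((len(robot_data),), 0.5 / len(robot_data)),
    torch.full((len(human_data),), 0.5 / len(human_data)),
])
sampler = WeightedRandomSampler(weights, num_samples=len(combined), replacement=True)
loader = DataLoader(combined, batch_size=256, sampler=sampler)

for (batch,) in loader:
    pass  # policy update on the mixed robot/human batch would go here

Because both sources share the aligned observation and action spaces described above, the policy sees them as one dataset; only the sampling weights distinguish the embodiments.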
@article{tao2025dexwild,
  title={DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies},
  author={Tao, Tony and Srirama, Mohan Kumar and Liu, Jason Jingzhou and Shaw, Kenneth and Pathak, Deepak},
  journal={Robotics: Science and Systems (RSS)},
  year={2025}
}