DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies

DexWild enables generalization to unseen objects, environments, and robot embodiments by scaling up human demonstrations collected with a low-cost, mobile capture system. With 9,290 demonstrations across 93 environments, our co-training framework leverages both human and robot data to learn robust policies that generalize across indoor, outdoor, and cluttered scenes and across diverse tasks, achieving a 4× higher success rate in unseen environments and 5.8× better embodiment transfer than training on robot data alone.

Let's go on an adventure together…


A peaceful morning stroll—until we spot a messy table! Time to tidy up.

What lovely flowers! Let’s place them carefully into the vase.

Back to work—this bathroom needs a good clean!

I'm thirsty! I’ve never poured a drink before*… but I’ve seen it done—and handled similar bottles!

*pouring a drink — unseen in robot data, seen in human data

Let's try a new Robot Hand*—this one works just as well! I can barely tell a difference....

*LEAP Hand — LEAP V2 Advanced + human + few-shot LEAP Hand data

Laundry night! Let's fold it up neatly.

Time to recharge—ready to do it all again tomorrow!

DexWild-System

DexWild System Photo
DexWild-System Hardware
Example Demonstration

DexWild-System is a portable, high-fidelity, and embodiment-agnostic data collection system capable of efficiently gathering dexterous human hand data across many environments. We employ a team of untrained data collectors and collect 9,290 demonstrations across 93 diverse environments at 4.6× the speed of teleoperation.

We scale up data collection in diverse environments...

DexWild Aligns Humans and Robots

DexWild aligns both the observation space and the action space between humans and robots. We take advantage of palm-mounted cameras, which provide a wide field of view that sees the task and environment clearly but captures little embodiment-specific information. This lets us co-train effectively on data collected from different embodiments. We convert all hand poses to a 17-dimensional joint-angle space and all wrist poses to relative actions in 6D space.
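As a rough illustration of this alignment, here is a minimal sketch in Python (using scipy) of converting absolute wrist poses into relative 6D wrist actions expressed in the previous wrist frame. The function name, array shapes, and axis-angle rotation encoding are our own assumptions for illustration, not the DexWild implementation.

import numpy as np
from scipy.spatial.transform import Rotation as R

def wrist_poses_to_relative_actions(positions, quaternions):
    """Convert absolute wrist poses into relative 6D actions.

    positions:   (T, 3) wrist translations in the world frame
    quaternions: (T, 4) wrist orientations as (x, y, z, w) quaternions
    Returns:     (T-1, 6) array of [translation delta, axis-angle delta],
                 both expressed in the wrist frame at the previous step.
    """
    actions = []
    for t in range(len(positions) - 1):
        R_t = R.from_quat(quaternions[t])
        R_next = R.from_quat(quaternions[t + 1])

        # Translation delta, rotated into the wrist frame at time t.
        dp_local = R_t.inv().apply(positions[t + 1] - positions[t])

        # Rotation delta relative to the wrist frame at time t, as axis-angle.
        drot = (R_t.inv() * R_next).as_rotvec()

        actions.append(np.concatenate([dp_local, drot]))
    return np.stack(actions)

Because both human and robot trajectories pass through the same relative representation, the policy never sees absolute, embodiment-specific wrist frames.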

Human
LEAP V2 Advanced
LEAP Hand

DexWild enables X-Embodiment Transfer

Training with DexWild data, our policies are capable of zero-shot transfer to a new robot arm and few-shot transfer to a new robot hand. This is because DexWild data is embodiment-agnostic: the cameras only capture task-relevant information, and the human hand data can be retargeted to any end effector. We train on the LEAP V2 Advanced hand and the xArm6, yet can transfer to Franka Panda arms and LEAP Hands. This enables us to deploy on entirely new robots, in unseen environments, and with never-before-seen objects.
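As a sketch of what such retargeting can look like, the snippet below fits joint angles of a target hand so that its fingertips match the recorded human fingertips. This is a generic fingertip-matching formulation with assumed interfaces (the fingertip_fk callable and the 17-joint default), not the exact DexWild retargeting pipeline.

import numpy as np
from scipy.optimize import minimize

def retarget_to_hand(human_fingertips, fingertip_fk, n_joints=17):
    """Fit target-hand joint angles so its fingertips match the human's.

    human_fingertips: (5, 3) human fingertip positions in the wrist frame.
    fingertip_fk:     callable mapping an (n_joints,) joint-angle vector
                      to the (5, 3) fingertip positions of the target hand
                      (an assumed interface for illustration).
    """
    def cost(q):
        diff = fingertip_fk(q) - human_fingertips
        return np.sum(diff ** 2)

    result = minimize(cost, x0=np.zeros(n_joints), method="L-BFGS-B")
    return result.x  # joint angles in the shared 17-dimensional space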

New-Hand

New-Arm

DexWild Generalizes to Completely Unseen Environments

Policies trained on DexWild data generalize to completely unseen environments: indoor, outdoor, crowded, and cluttered scenes under different lighting conditions! Our models demonstrate strong performance in these environments, which we largely attribute to the diversity and quality of the DexWild dataset. DexWild also facilitates effective transfer of skills from the spraying task to the pouring task, requiring no additional robot data!

💪 and 🦾 indicate the number of human and robot demonstrations, respectively.

All Videos Autonomous 1×


Task: Florist 💐

💪 1,545 | 🦾 236


Task: Clothes Folding 👕

💪 1,124 | 🦾 290


Task: Pouring 🍾

💪 621 | 🦾 0*

*pouring task: 0 | spray task: 388


Task: Spraying 🔫

💪 3,000 | 🦾 388


Task: Toy Cleanup 🧸

💪 3,000 | 🦾 370


Why Training on Robot Data Alone Doesn't Work

Scaling robot data is challenging: it is expensive, tedious, and often confined to a single robot, a single environment, and a limited set of objects. As a result, policies trained solely on robot data tend to overfit to specific embodiments or settings and struggle to generalize. We show that even with varied environments, robot-only training fails to generalize to new scenes and objects. Co-training with DexWild introduces a harder, more diverse learning problem, forcing the policy to better understand the task and scene. This results in more robust, generalizable behavior.
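One simple way to set up such co-training is to draw every batch from the human and robot datasets at a fixed mixing ratio. The PyTorch sketch below illustrates the idea; the 50/50 ratio and the function name are illustrative assumptions, not the exact DexWild recipe.

import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def make_cotraining_loader(human_dataset, robot_dataset,
                           human_fraction=0.5, batch_size=256):
    """Build a loader whose batches mix human and robot samples.

    human_fraction is the expected share of human samples per batch
    (0.5 here is an illustrative choice, not the paper's setting).
    """
    combined = ConcatDataset([human_dataset, robot_dataset])

    # Weight each sample so its dataset contributes the target fraction,
    # regardless of how many demonstrations that dataset contains.
    n_h, n_r = len(human_dataset), len(robot_dataset)
    weights = torch.cat([
        torch.full((n_h,), human_fraction / n_h),
        torch.full((n_r,), (1.0 - human_fraction) / n_r),
    ])

    sampler = WeightedRandomSampler(weights, num_samples=len(combined),
                                    replacement=True)
    return DataLoader(combined, batch_size=batch_size, sampler=sampler)

Sampling this way keeps the much larger human dataset from drowning out the robot data while still exposing the policy to its diversity.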

All Videos Autonomous 1×, Trained with Robot Data Only

BibTeX

@article{tao2025dexwild,
      title={DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies},
      author={Tao, Tony and Srirama, Mohan Kumar and Liu, Jason Jingzhou and Shaw, Kenneth and Pathak, Deepak},
      journal={Robotics: Science and Systems (RSS)},
      year={2025}}

Acknowledgements

We would like to thank Yulong Li, Hengkai Pan, and Sandeep Routray for thoughtful discussions. We would also like to thank Andrew Wang for setting up compute and Yulong Li for helping with the robot system setup. Lastly, we thank Hengkai Pan, Andrew Wang, Adam Kan, Ray Liu, Mingxuan Li, Lukas Vargas, Jose German, Laya Satish, and Sri Shasanka Madduri for helping collect data. This work was supported in part by AFOSR FA9550-23-1-0747 and an Apple Research Award.

You've reached the end...