ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback


Sirui Chen*1
Chen Wang*1
Kaden Nguyen1
Li Fei-Fei1
C. Karen Liu1


Stanford University

* Equal contribution








Abstract

Recent progress in imitation learning from human demonstrations has shown promising results in teaching robots manipulation skills. To further scale up training datasets, recent works start to use portable data collection devices without the need for physical robot hardware. However, due to the absence of on-robot feedback during data collection, the data quality depends heavily on user expertise, and many devices are limited to specific robot embodiments. We propose ARCap, a portable data collection system that provides visual feedback through augmented reality (AR) and haptic warnings to guide users in collecting high-quality demonstrations. Through extensive user studies, we show that ARCap enables novice users to collect robot-executable data that matches robot kinematics and avoids collisions with the scenes. With data collected from ARCap, robots can perform challenging tasks, such as manipulation in cluttered environments and long-horizon cross-embodiment manipulation. ARCap is fully open-source and easy to calibrate; all components are built from off-the-shelf products.




AR Feedback

ARCap sends visual and haptic feedback when the virtual robot violates constraints.
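As an illustration of the feedback logic described above, here is a minimal sketch (hypothetical, not ARCap's actual implementation) of how constraint violations on the virtual robot could be mapped to visual and haptic warnings; the function name and the split between warning channels are assumptions:

```python
# Hypothetical sketch of the AR feedback logic: given the virtual robot's
# state, decide which warnings to trigger for the user.

def check_feedback(joint_angles, joint_limits, in_collision):
    """Return which warnings to trigger for the current virtual-robot state."""
    # Kinematic constraint: every joint must stay within its limits.
    out_of_limits = any(
        not (lo <= q <= hi) for q, (lo, hi) in zip(joint_angles, joint_limits)
    )
    return {
        "visual_warning": out_of_limits or in_collision,  # e.g. tint AR robot red
        "haptic_warning": in_collision,                   # e.g. vibrate controller
    }

# Example: the second joint exceeds its limit, so only the visual warning fires.
limits = [(-1.0, 1.0), (-0.5, 0.5)]
print(check_feedback([0.2, 0.7], limits, in_collision=False))
```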



Portable System Design

ARCap is portable and fits in a single backpack.



Test-time Calibration

With the ARCap Unity app, test-time hand-eye calibration reduces to aligning the virtual robot with the actual robot.
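Aligning the virtual robot with the real one amounts to estimating a rigid transform between the headset frame and the robot base frame. A minimal sketch of this idea, using the standard Kabsch least-squares alignment over corresponding 3D points (an assumption for illustration, not ARCap's actual calibration routine):

```python
import numpy as np


def rigid_align(P, Q):
    """Least-squares rigid transform (R, t) such that R @ P + t ~= Q.

    P, Q: 3xN arrays of corresponding points in the two frames
    (e.g. virtual-robot keypoints in the headset frame and the same
    keypoints on the real robot). Standard Kabsch/SVD solution.
    """
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cp, cq = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    H = (P - cp) @ (Q - cq).T            # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

Three or more non-collinear correspondences suffice; in practice the points would come from clicking matching locations on the AR overlay and the physical robot.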



Diffusion Policy trained on collected data

Manipulation in cluttered scenes

Policy trained on 30 minutes of data collected with ARCap, without any teleoperation data.

Long-horizon manipulation with a different embodiment

Policy trained on 60 minutes of data collected with ARCap, without any teleoperation data.

Cross-embodiment bimanual manipulation

Policy trained on 60 minutes of data collected with ARCap, without any teleoperation data.



User Study

We invited 20 users, with varying levels of familiarity with robot learning and AR/VR, to collect data with both ARCap and DexCap.
Most users found ARCap's visual and haptic feedback helpful.
By combining the data from all 20 users, we can train an autonomous policy.



Citation

@article{chen2024arcap,
  title={ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback},
  author={Chen, Sirui and Wang, Chen and Nguyen, Kaden and Fei-Fei, Li and Liu, C Karen},
  journal={arXiv preprint arXiv:2410.08464},
  year={2024}
}




Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project. It was adapted to be mobile responsive by Jason Zhang for PHOSA. The code can be found here.