ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback


Sirui Chen*1
Chen Wang*1
Kaden Nguyen1
Li Fei-Fei1
C. Karen Liu1


Stanford University

* Equal contribution








Abstract

Recent progress in imitation learning from human demonstrations has shown promising results in teaching robots manipulation skills. To further scale up training datasets, recent works start to use portable data collection devices without the need for physical robot hardware. However, due to the absence of on-robot feedback during data collection, the data quality depends heavily on user expertise, and many devices are limited to specific robot embodiments. We propose ARCap, a portable data collection system that provides visual feedback through augmented reality (AR) and haptic warnings to guide users in collecting high-quality demonstrations. Through extensive user studies, we show that ARCap enables novice users to collect robot-executable data that matches robot kinematics and avoids collisions with the scenes. With data collected from ARCap, robots can perform challenging tasks, such as manipulation in cluttered environments and long-horizon cross-embodiment manipulation. ARCap is fully open-source and easy to calibrate; all components are built from off-the-shelf products.




AR Feedback

ARCap sends visual and haptic feedback when the virtual robot violates constraints.
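As an illustration of the feedback logic described above, here is a minimal sketch (hypothetical, not ARCap's actual implementation) of how constraint violations on the virtual robot could be mapped to visual and haptic warnings; the function name and the split between warning channels are assumptions:

```python
# Hypothetical sketch of the AR feedback logic: given the virtual robot's
# state, decide which warnings to trigger for the user.

def check_feedback(joint_angles, joint_limits, in_collision):
    """Return which warnings to trigger for the current virtual-robot state."""
    # Kinematic constraint: every joint must stay within its limits.
    out_of_limits = any(
        not (lo <= q <= hi) for q, (lo, hi) in zip(joint_angles, joint_limits)
    )
    return {
        "visual_warning": out_of_limits or in_collision,  # e.g. tint AR robot red
        "haptic_warning": in_collision,                   # e.g. vibrate controller
    }

# Example: the second joint exceeds its limit, so only the visual warning fires.
limits = [(-1.0, 1.0), (-0.5, 0.5)]
print(check_feedback([0.2, 0.7], limits, in_collision=False))
```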



Portable System Design

ARCap is portable and fits in a single backpack.



Test-time Calibration

With the ARCap Unity app, test-time hand-eye calibration reduces to aligning the virtual robot with the actual robot.
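Aligning the virtual robot with the real one amounts to estimating a rigid transform between the headset frame and the robot base frame. A minimal sketch of this idea, using the standard Kabsch least-squares alignment over corresponding 3D points (an assumption for illustration, not ARCap's actual calibration routine):

```python
import numpy as np


def rigid_align(P, Q):
    """Least-squares rigid transform (R, t) such that R @ P + t ~= Q.

    P, Q: 3xN arrays of corresponding points in the two frames
    (e.g. virtual-robot keypoints in the headset frame and the same
    keypoints on the real robot). Standard Kabsch/SVD solution.
    """
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cp, cq = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    H = (P - cp) @ (Q - cq).T            # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

Three or more non-collinear correspondences suffice; in practice the points would come from clicking matching locations on the AR overlay and the physical robot.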



Diffusion Policy trained on collected data

Manipulation in cluttered scenes

Policy trained on 30 minutes of data collected with ARCap, without any teleoperation data.

Long-horizon manipulation with a different embodiment

Policy trained on 60 minutes of data collected with ARCap, without any teleoperation data.

Cross-embodiment bimanual manipulation

Policy trained on 60 minutes of data collected with ARCap, without any teleoperation data.



User Study

We invited 20 users, with varying levels of familiarity with robot learning and AR/VR, to collect data with both ARCap and DexCap.
Most users found ARCap's visual and haptic feedback helpful.
By combining the data from all 20 users, we can train an autonomous policy.



Citation

@article{chen2024arcap,
  title={ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback},
  author={Chen, Sirui and Wang, Chen and Nguyen, Kaden and Fei-Fei, Li and Liu, C Karen},
  journal={arXiv preprint arXiv:2410.08464},
  year={2024}
}




Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project. It was adapted to be mobile responsive by Jason Zhang for PHOSA. The code can be found here.