
New Apple study teaches robots to act by watching first person videos of humans

In a new paper entitled “Humanoid Policy is Human Policy,” Apple researchers suggest an interesting way of training humanoid robots. It involves wearing an Apple Vision Pro.

Robot see, robot act

This project is a collaborative effort between Apple, MIT, Carnegie Mellon University, the University of Washington, and UC San Diego. It examines how first-person footage of people manipulating objects could be used to train general-purpose robot models.

The researchers collected over 25,000 human demonstrations and 1,500 robot demonstrations (a dataset called PH2D) and fed them into a unified AI policy that could then control a humanoid robot in the physical world.

According to the authors:

Training manipulation policies for humanoid robots with diverse data enhances robustness and generalization across tasks and platforms. Learning solely from robot demonstrations is labor-intensive and requires expensive teleoperated data collection, which is difficult to scale. This paper investigates an easier-to-scale data source: egocentric human demonstrations, which can be used as cross-embodiment data for robot training.

What is their solution? Let the human show you the way.

Cheaper and faster training

The team developed an Apple Vision Pro application that captures video using the device’s bottom-left camera and uses Apple’s ARKit to track 3D head and hand motion.
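Conceptually, each captured frame pairs the camera image with synchronized head and hand poses. Here is a minimal sketch of what such a per-frame record might look like; the field names, joint count, and file layout are illustrative assumptions, not the app’s actual schema.

```python
# Illustrative per-frame record for an egocentric capture: each video
# frame is paired with synchronized head and hand tracking data.
# Field names and the 21-keypoint hand layout are assumptions for
# illustration, not taken from the paper or the app.

from dataclasses import dataclass, field

@dataclass
class CaptureFrame:
    timestamp: float   # seconds since the start of the recording
    image_path: str    # path to the saved RGB frame
    head_pose: list    # 6-DoF head/camera pose from tracking
    left_hand: list = field(default_factory=list)   # 3D hand keypoints
    right_hand: list = field(default_factory=list)

# One frame at ~30 fps, with 21 assumed keypoints per hand.
frame = CaptureFrame(
    timestamp=0.033,
    image_path="frames/0001.jpg",
    head_pose=[0.0] * 6,
    left_hand=[[0.0, 0.0, 0.0]] * 21,
    right_hand=[[0.0, 0.0, 0.0]] * 21,
)
```

A sequence of such records, one per video frame, is all a demonstration needs to contain for the kind of training described here.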

To explore a cheaper option, the team also 3D-printed a mount that attaches a ZED Mini stereo camera to other headsets, such as the Meta Quest 3, offering similar 3D tracking at a much lower cost.

The result was a setup that let them record high-quality demonstrations in seconds, a pretty big improvement over traditional robot tele-op methods, which are slower, more expensive, and harder to scale.

And here’s one last interesting detail: since people move way faster than robots, the researchers slowed down the human demos by a factor of four during training, just enough for the robot to keep up without needing further adjustments.
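The temporal rescaling described above amounts to stretching each demonstration’s timeline. Here is a minimal sketch, assuming demos are stored as (timestamp, pose) pairs; the function and data names are illustrative, not from the paper.

```python
# Illustrative sketch: slowing a recorded human demonstration by a
# constant factor so a robot can physically keep up. The demo format
# (a list of (timestamp, pose) tuples) is an assumption.

SLOWDOWN = 4.0  # humans move roughly 4x faster than the robot

def slow_down(demo, factor=SLOWDOWN):
    """Stretch the timestamps of a (t, pose) trajectory by `factor`.

    Poses are left untouched -- only the playback timeline is rescaled,
    so the robot traverses the same motion at a quarter of the speed.
    """
    return [(t * factor, pose) for t, pose in demo]

demo = [(0.0, "grasp"), (0.25, "lift"), (0.5, "place")]
slowed = slow_down(demo)
# the same motion now spans 2.0 s instead of 0.5 s
```

Because only timestamps change, no retargeting of the poses themselves is needed for this step.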

The Human Action Transformer, or HAT

HAT is the key to the entire study. It is a model trained on both robot and human demonstrations expressed in a common format.

Rather than splitting the data into two sources (human and robot), HAT learns a single policy that generalizes across both embodiments. This makes the system more flexible and more data-efficient.
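One way to picture this cross-embodiment training is a single dataset in which human and robot demonstrations are mapped into one shared state–action schema before training. The sketch below illustrates that idea; the field names and mapping are assumptions for illustration, not the paper’s actual format.

```python
# Illustrative sketch of a shared cross-embodiment format: human and
# robot frames are mapped into the same schema, then pooled into one
# training set for a single policy. All names and fields here are
# assumptions, not the paper's actual representation.

from dataclasses import dataclass

@dataclass
class SharedStep:
    embodiment: str     # "human" or "robot", kept only as metadata
    head_pose: list     # 6-DoF head/camera pose
    wrist_poses: list   # left + right wrist poses
    hand_targets: list  # fingertip keypoints or gripper command

def to_shared_step(source, raw):
    """Map a raw human or robot frame into the shared schema."""
    if source == "human":
        return SharedStep("human", raw["head"], raw["wrists"], raw["fingertips"])
    return SharedStep("robot", raw["head"], raw["wrists"], raw["gripper"])

# Pool both sources into one dataset; the policy itself never
# branches on where a sample came from.
human_frames = [{"head": [0.0] * 6, "wrists": [0.0] * 12, "fingertips": [0.0] * 10}]
robot_frames = [{"head": [0.0] * 6, "wrists": [0.0] * 12, "gripper": [0.0] * 2}]
dataset = (
    [to_shared_step("human", f) for f in human_frames]
    + [to_shared_step("robot", f) for f in robot_frames]
)
```

The design choice worth noting: once both sources share one schema, adding more cheap human demonstrations enlarges the same training set the robot policy learns from, which is exactly what makes the approach scale.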

Compared to traditional methods, this shared training approach helped the robot perform more challenging tasks.

PH2D dataset sizes compared to traditional methods

The study is interesting and worth reading for those who are interested in robotics.

Do you find the idea of a humanoid robot in your home exciting, terrifying, or even pointless? Tell us in the comments.



