Researchers at Cornell University have developed a new robotic platform powered by artificial intelligence called RHyME, which allows robots to be taught tasks by watching just one how-to video.
Some robots are finicky. In the past, robots have required precise, step-by-step instructions to complete basic tasks, and they tend to give up when things go wrong, such as after losing a screw or dropping a tool. RHyME, however, could speed up the development and deployment of robotic systems by reducing the amount of time, energy, and money required to train them. "That's not how humans do tasks. We look at other people as inspiration," said Kushal Kedia, a doctoral candidate in computer science.
Kedia's paper, "One-Shot Imitation under Mismatched Execution," will be presented in May at the Institute of Electrical and Electronics Engineers International Conference on Robotics and Automation in Atlanta.
Home robotic assistants are still far away because they lack the intelligence to navigate the physical world and its many contingencies. Researchers like Kedia train robots with what are essentially how-to videos – human demonstrations of different tasks in a laboratory setting. The hope with this approach, a branch of machine learning called "imitation learning," is that robots will learn a sequence of tasks faster and adapt to real-world situations.

"Our work is like translating French to English — we're translating any given task from human to robot," said Sanjiban Choudhury, an assistant professor of computer science and the paper's senior author.
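In its simplest textbook form, imitation learning treats demonstrations as supervised data: fit a policy that maps observed states to the expert's actions. The sketch below illustrates that idea only; the data, the linear policy, and the 1-D setup are all hypothetical and are not the paper's method.

```python
# Minimal behavior-cloning sketch (illustrative; not RHyME itself).
# Hypothetical 1-D demonstrations where the expert's action is 2x the state.
states = [0.5, 1.0, -1.5, 2.0, -0.25]
actions = [2.0 * s for s in states]

# Treat imitation as supervised regression: closed-form least-squares
# fit of a linear policy a = w * s to the demonstration pairs.
w = sum(s * a for s, a in zip(states, actions)) / sum(s * s for s in states)

def policy(state):
    """Cloned policy: imitate the expert on unseen states."""
    return w * state
```

Real systems replace the linear map with a learned model over video frames, but the training signal is the same: match what the demonstrator did.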
That translation task is still a challenge. Humans move too fluidly for a robot to track and mimic, and robots need a lot of video for training. Worse, video demonstrations – of, say, picking up a napkin or stacking plates – must be performed slowly and with perfect precision, because any mismatch between the video and the robot's actions has historically been fatal to robot learning, the researchers said. "Our thinking was, 'Can we find a principled way to deal with this mismatch between how humans and robots do tasks?'" Choudhury said.
RHyME, the team's answer, is a scalable approach that makes robots less finicky and more adaptable. It trains a robot to use its own memory and connect the dots when performing tasks it has viewed only once. For example, shown a video of a human fetching a mug and placing it in a sink, a RHyME-equipped robot will search its bank of videos for similar actions – like grasping a cup or lowering an object – and draw on them. The researchers said RHyME lets robots learn multi-step sequences while significantly reducing the amount of robot data needed for training: it requires just 30 minutes of robot data. In a lab setting, robots trained with the system achieved a more than 50% increase in task success compared with previous methods.
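The "search its bank of videos for similar actions" step can be sketched as nearest-neighbor retrieval over clip embeddings. Everything below is a hypothetical illustration: the clip labels, the 3-D feature vectors, and cosine similarity as the matching score are assumptions for the sketch, not details from the paper.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical bank of the robot's own experience: clip label -> embedding.
robot_bank = {
    "grasp_cup":    [0.9, 0.1, 0.0],
    "lower_object": [0.1, 0.9, 0.2],
    "open_drawer":  [0.0, 0.2, 0.9],
}

# Hypothetical embedding of one segment of the human how-to video
# (the human picking up the mug).
human_segment = [0.85, 0.15, 0.05]

# Retrieve the robot clip most similar to the human motion, so the robot
# can stand in its own experience for the demonstration it saw only once.
best = max(robot_bank, key=lambda name: cosine_sim(human_segment, robot_bank[name]))
```

Here the segment of the human reaching for the mug retrieves the robot's own grasping clip, which is the "connect the dots" behavior the article describes, reduced to a toy lookup.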