
DeepMind’s Quest for Self-Improving Table Tennis Agents


It is rare that a day passes without a new robotic platform being developed by an academic lab or commercial startup somewhere in the world. Humanoid robots, in particular, appear capable of helping us in factories and, in the future, in homes and hospitals. To be truly useful, these machines need sophisticated “brains” to control their robotic bodies. Programming robots traditionally involves experts spending countless grueling hours scripting complex behaviors and tuning parameters, such as controller gains or motion-planning weights, to achieve the desired performance. Machine learning (ML) is promising, but it still requires substantial human oversight and re-engineering whenever a robot must learn a new complex behavior. At Google DeepMind, we asked: how can we enable robots to learn and adapt more holistically and continuously, reducing the bottleneck of expert intervention for every significant improvement or new ability?

The answer to this question has been the driving force behind our robotics work. We are exploring paradigms in which two robotic agents improve by playing against each other. This is a step beyond systems that are preprogrammed or only narrowly adaptive; instead, we are looking at agents that can acquire new skills on the job. Building on our previous ML work with systems such as AlphaGo and AlphaFold, we turned our attention to the demanding sport of table tennis.

Table tennis as a testing platform

Table tennis is a great example of a highly dynamic and constrained environment that encapsulates some of the hardest challenges in robotics. Mastering it demands a number of difficult skills. It is not just about perception: it also involves extremely precise control to intercept the ball at the correct angle and speed, and strategic decision-making to outmaneuver the opponent. These elements make it a perfect domain for developing and testing robust learning algorithms that can handle real-time interaction, complex physics, high-level reasoning, and adaptive strategies, all of which transfer directly to applications such as manufacturing and, potentially, unstructured home environments.

The Self-Improvement Challenge

Standard approaches to machine learning often fall short when it comes to enabling continuous, autonomous learning. Imitation learning, where robots learn by mimicking an expert, requires collecting a large number of human demonstrations for each skill or variation. Similarly, reinforcement learning, where agents are trained through trial and error guided by rewards or penalties, requires human designers to carefully engineer complex mathematical reward functions that capture the desired behavior for each multifaceted task, and then to adapt them whenever the robot needs to learn a new skill or improve further. Both of these well-established methods have traditionally involved substantial human involvement, particularly when the goal is for the robot to keep improving beyond its initial programming. We therefore posed a direct challenge to our team: can robots learn and enhance their skills with minimal or no human intervention inside the learning-and-improvement loop?
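To make the reward-engineering burden concrete, here is a minimal, hypothetical sketch of the kind of hand-tuned reward function such a task might require. The terms and weights are invented for illustration; they are not DeepMind's actual reward design.

```python
# Hypothetical hand-engineered reward for a table-tennis return.
# Every term and weight below is illustrative and would need manual tuning.

def rally_reward(hit_ball: bool, landed_on_opponent_side: bool,
                 ball_speed: float, distance_to_target: float) -> float:
    """Combine several hand-tuned terms into a single scalar reward."""
    reward = 0.0
    if hit_ball:
        reward += 1.0                        # reward making contact at all
    if landed_on_opponent_side:
        reward += 2.0                        # reward a legal return
        reward += 0.1 * ball_speed           # faster returns are harder to counter
        reward -= 0.5 * distance_to_target   # penalize missing the aimed-for spot
    return reward
```

Each new skill (say, adding topspin) typically forces new terms and re-tuned weights, which is exactly the human bottleneck described above.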

Learning Through Competition – Robot vs. Robot

We explored an approach that mirrors the AlphaGo strategy: agents learn by competing against themselves. We set two robot arms to play table tennis against each other, a simple but powerful idea. As one robot improves its strategy, its opponent is forced to adapt and improve in turn, creating an ever-escalating cycle of skill.
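The escalating cycle can be sketched as a toy loop. Everything here (the `Policy` class, the logistic match model, the update rule) is a hypothetical stand-in for the real training infrastructure, meant only to show the structure of competitive self-play.

```python
import math
import random

class Policy:
    """Hypothetical stand-in for a learned table-tennis policy."""
    def __init__(self):
        self.skill = 0.0

    def update(self, won: bool):
        # Illustrative update: the loser adapts more aggressively than
        # the winner consolidates, so both keep climbing.
        self.skill += 0.5 if won else 1.0

def play_match(a: Policy, b: Policy) -> bool:
    """Return True if `a` wins; higher skill wins more often (logistic model)."""
    edge = a.skill - b.skill
    return random.random() < 1.0 / (1.0 + math.exp(-edge))

agent_a, agent_b = Policy(), Policy()
for _ in range(100):
    a_won = play_match(agent_a, agent_b)
    agent_a.update(a_won)
    agent_b.update(not a_won)
# After the loop, both agents have improved relative to where they started.
```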




To enable the extensive training these paradigms require, we engineered a fully autonomous table-tennis system. The setup could operate continuously, with automated ball collection and remote monitoring and control, allowing us to run experiments over extended periods without direct involvement. As a first step, we used reinforcement learning to train a single agent, deployed independently on both robots, to play cooperative rallies. A few additional hours of fine-tuning in the real robot-versus-robot setup produced a policy that could sustain long rallies. We then tackled the robot-versus-robot game in a competitive setting.

The cooperative agent did not work well out of the box in competitive play. This was expected: in cooperative play, rallies settle into a narrow region of the table, limiting the distribution of balls the agent learns to return. Our hypothesis was that continued competitive training would slowly broaden that distribution as the robots were rewarded for beating their opponents. Training through competitive self-play in the real world, however, was not without challenges. The model's limited capacity made it hard to expand the ball distribution: the agent struggled to learn to handle new shots without forgetting old ones. We quickly hit a local minimum in training in which, after a brief rally, one robot would strike an easy winner that the other could not return.

While robot-versus-robot competition remains a difficult problem to solve, our team also looked into a related question:

How the robot could compete with humans

During the early stages, humans were better at keeping the ball in play, which increased the variety of shots the robot could learn from. We developed a policy architecture consisting of low-level skill policies with detailed skill descriptors and a high-level controller that chooses among them. We also developed a zero-shot simulation-to-real technique that lets the system adapt to unseen opponents in real time. In a user study, the robot won all of its matches against beginners and about half of its matches against intermediate players, while losing all of its matches against the most advanced players. This places the robot at a solidly amateur human level. With these innovations, and a better starting point than cooperative play provided, we are well positioned to return to robot-versus-robot training.
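The two-level architecture can be sketched as follows. The skill names, the speed-range descriptors, and the first-match selection rule are all invented for illustration; the real system's descriptors and controller are far richer.

```python
# Hedged sketch of a two-level policy: low-level skills annotated with
# descriptors, and a high-level controller that picks one per incoming ball.

from dataclasses import dataclass

@dataclass
class SkillPolicy:
    name: str
    # Descriptor: which incoming ball speeds (m/s) this skill handles well.
    min_speed: float
    max_speed: float

    def handles(self, ball_speed: float) -> bool:
        return self.min_speed <= ball_speed <= self.max_speed

SKILLS = [
    SkillPolicy("forehand_loop", 2.0, 6.0),
    SkillPolicy("backhand_block", 4.0, 10.0),
    SkillPolicy("lob_return", 0.5, 3.0),
]

def high_level_controller(ball_speed: float) -> str:
    """Choose the first low-level skill whose descriptor covers the ball."""
    for skill in SKILLS:
        if skill.handles(ball_speed):
            return skill.name
    return "safe_block"  # fallback skill for balls no descriptor covers
```

A mid-speed ball (5 m/s) selects `forehand_loop`; a very fast ball outside every descriptor falls back to `safe_block`.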



AI Coach: VLMs Enter the Game

Another intriguing idea we explored leverages the power and flexibility of vision-language models (VLMs), such as Gemini. Could a VLM act as a coach, observing a player robot and giving it guidance for improvement?


This project yielded an important insight: VLMs can be used for explainable robot policy search. That insight led us to develop the SAS Prompt (Summarize, Analyze, Synthesize), which enables iterative learning of robot behavior by leveraging a VLM's abilities to retrieve, reason, and optimize in order to synthesize new behavior. Our approach can be seen as an early example of a family of explainable policy-search methods implemented entirely within a VLM. There is also no reward function: the VLM infers the reward directly from the observations and the task description. In effect, the VLM becomes a coach that continually analyzes a student's performance and suggests how to improve.
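One iteration of the summarize-analyze-synthesize loop might look like the sketch below. `call_vlm` is a hypothetical stand-in for a real vision-language-model API, and the prompt wording is invented for illustration; only the three-stage structure reflects the idea described above.

```python
# Illustrative sketch of one Summarize-Analyze-Synthesize (SAS) coaching step.

def call_vlm(prompt: str) -> str:
    # Placeholder: a real system would query a VLM such as Gemini here.
    return f"[VLM response to: {prompt[:40]}...]"

def sas_iteration(task_description: str, observations: str, behavior: str) -> str:
    """One coaching step: summarize, analyze, then synthesize new behavior."""
    summary = call_vlm(
        f"Summarize the robot's performance.\nTask: {task_description}\n"
        f"Observations: {observations}")
    analysis = call_vlm(
        f"Analyze why the behavior fell short of the task.\n"
        f"Summary: {summary}\nCurrent behavior: {behavior}")
    new_behavior = call_vlm(
        f"Synthesize an improved behavior parameterization.\n"
        f"Analysis: {analysis}")
    return new_behavior

# Iterating the step: each cycle the "coach" refines the "student" behavior.
behavior = "initial swing parameters"
for _ in range(3):
    behavior = sas_iteration("return the ball to the far corner",
                             "ball landed mid-table", behavior)
```

Note that no numeric reward appears anywhere: the VLM's judgment of the observations against the task description plays that role.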


Towards Truly Learned Robotics: An Optimistic Perspective

For the future of robotics, it is important to move beyond the limitations of traditional programming and ML techniques. We are developing methods that enable autonomous self-improvement and reduce the need for human effort. Our table-tennis project explores pathways towards robots that can learn and refine complex skills on their own. While these approaches present significant challenges (stabilizing robot-versus-robot learning and scaling VLM-based coaching are formidable tasks), they offer a unique opportunity. We are confident that research in this area will lead to machines that are more adaptable and capable of learning the skills required to operate safely and effectively in our unstructured world. The journey may be long, but the potential reward of intelligent and helpful robot partners is worth it.

We are grateful to the Google DeepMind Robotics Team, and in particular to David B. D’Ambrosio, Saminda Abeyruwan, Laura Graesser, Atil Iscen, Alex Bewley, and Krista Reymann, for their significant contributions to this work.







