Google DeepMind today announced Gemini Robotics, bringing Gemini and AI into the physical world with new models that can "perform a wider range of real-world tasks than ever before."
The goal is to build general-purpose robots, with CEO Sundar Pichai adding that Google has "always thought of robotics as a helpful testing ground for translating AI advances into the physical world."
Gemini Robotics is a vision-language-action (VLA) model built on Gemini 2.0 "with the addition of physical actions as a new output modality for the purpose of directly controlling robots."
Going in, Google has “three principal qualities” for robotic AI models:
- Generality: "able to adapt to different situations"
- Gemini Robotics is “adept at dealing with new objects, diverse instructions, and new environments,” including “tasks it has never seen before in training” by leveraging Gemini’s underlying world understanding.
- Dexterity: "can do things that people can do with their fingers and hands, like carefully manipulating objects."
Google also announced Gemini Robotics-ER ("embodied reasoning"), a vision-language model with enhanced spatial capabilities for "understanding the world in ways needed for robotics," which allows roboticists to connect it to their existing low-level controllers.
When shown a coffee mug, the model can intuit an appropriate two-finger grip for picking it up by its handle and a safe trajectory for approaching it.
These models run on various robot form factors (including bi-arm and humanoid robots), with trusted testers like Agile Robots.