
How robots learn: A brief, contemporary history


For decades, roboticists envisioned creating machines as sophisticated as the human body, yet their practical achievements often fell short. Ambitions to develop advanced humanoid robots frequently resulted in simpler, specialized devices like robotic arms for manufacturing or household gadgets such as robotic vacuum cleaners.

The ultimate goal for many was a science-fiction-style robot: one capable of navigating diverse environments, adapting autonomously, and interacting safely and helpfully with humans. Such robots could transform lives by assisting individuals with mobility challenges, alleviating social isolation, or performing hazardous tasks unsuitable for people. From a business perspective, they promised an endless supply of labor without wages. Despite these lofty aims, repeated setbacks left many investors in Silicon Valley cautious about backing humanoid robotics.

That landscape is rapidly evolving. Although fully capable humanoid robots remain under development, investment has surged dramatically, with $6.1 billion poured into humanoid robotics in 2025, quadruple the amount invested in 2024.

This surge is driven by breakthroughs in how robots learn to perceive and interact with their surroundings.

From Rule-Based Programming to AI-Driven Learning

Consider a robot designed solely to fold laundry. Traditionally, engineers would painstakingly program explicit instructions: identify fabric type, locate the collar, fold sleeves precisely, and adjust for any twists or rotations. This rule-based approach quickly becomes unwieldy as the number of scenarios grows exponentially, requiring exhaustive preprogramming to handle every possible variation.

Starting around 2015, a paradigm shift occurred. Instead of hardcoding rules, researchers began using digital simulations where robots learned through trial and error, receiving rewards for successful actions and penalties for failures. This reinforcement learning approach, similar to how AI mastered complex games, allowed robots to improve by experimenting with millions of iterations.
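The trial-and-error loop described above can be sketched in a few lines. The following is a minimal, purely illustrative example (a toy one-dimensional "walk to the goal" task, not any real robotics environment): the agent tries actions, receives a reward only when it succeeds, and gradually learns which action to prefer in each state.

```python
import random

# Toy sketch of reinforcement learning by trial and error. The environment
# (a 6-cell track with a goal at the far end) and all names are illustrative
# assumptions, not taken from any real robotics stack.

N_STATES = 6          # cells 0..5; the goal is cell 5
ACTIONS = [-1, +1]    # step left or step right

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[state][action_index]: learned estimate of long-term reward
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore occasionally; otherwise exploit the best known action
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
            reward = 1.0 if nxt == N_STATES - 1 else 0.0  # reward only at goal
            # Q-learning update: nudge the estimate toward the observed
            # reward plus the discounted value of the next state
            q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q

q = train()
# After training, "step right" should score higher than "step left" everywhere
policy = ["right" if q[s][1] >= q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

Real robotic reinforcement learning works on vastly larger state and action spaces and uses neural networks instead of a lookup table, but the reward-driven update is the same basic mechanism.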

The launch of ChatGPT in 2022 accelerated this transformation. Large language models (LLMs), trained on vast text datasets, predict subsequent words in sentences rather than relying on trial and error. Robotics adapted these models to process visual inputs, sensor data, and joint positions, enabling robots to generate precise motor commands multiple times per second.

This shift toward AI models trained on extensive data has proven effective across various robotic functions, whether engaging in conversation, navigating environments, or executing intricate tasks. Coupled with strategies like deploying imperfect robots to learn directly in their operational settings, this approach has reignited Silicon Valley’s enthusiasm for ambitious humanoid robotics projects.


Early Social Robots: The Case of Jibo

Jibo: A Pioneer in Social Robotics

In 2014, MIT researcher Cynthia Breazeal unveiled Jibo, a faceless, armless robot resembling a modern lamp. Designed as a social companion for families, Jibo raised $3.7 million through crowdfunding, with early units priced at $749.

Jibo could introduce itself and perform simple dances to entertain children, but its capabilities were limited. The vision was for Jibo to evolve into a personal assistant managing schedules, emails, and storytelling. Despite a loyal user base, the company ceased operations in 2019.

One critical limitation was Jibo’s rudimentary language processing, which lagged behind contemporaries like Apple’s Siri and Amazon’s Alexa. These early voice assistants relied on scripted responses, resulting in repetitive and mechanical interactions, an obstacle for a robot intended to foster social connection.

Since then, AI-driven natural language generation has revolutionized machine communication. Today’s voice interfaces are far more engaging, though they introduce new challenges, such as unpredictable or inappropriate responses. For instance, some AI-powered toys have controversially provided children with unsafe information, highlighting the need for robust content moderation.


Advances in Robotic Manipulation: OpenAI’s Dactyl

Dactyl: Training Dexterous Robot Hands in Simulation

By 2018, the robotics community widely embraced training robots through simulation-based trial and error. OpenAI’s Dactyl project focused on teaching a robotic hand to manipulate small cubes marked with letters and numbers, such as rotating a cube to display a specific face.

A major hurdle was the “reality gap”: robots trained in virtual environments often struggled when transferred to the physical world due to subtle differences in lighting, texture, and material properties.

To overcome this, OpenAI employed domain randomization: creating millions of simulated environments with varying conditions like friction, lighting, and color. This diversity helped Dactyl generalize its skills to real-world tasks. Within a year, Dactyl advanced to solving Rubik’s Cubes, achieving success rates of 60% overall and 20% on complex scrambles.
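The core idea of domain randomization can be illustrated with a toy sketch. Here, instead of lighting and texture, a single "friction" value is randomized, and the controller setting is chosen by its success rate across all the randomized worlds rather than in one fixed simulator. The physics, names, and numbers are illustrative assumptions, not OpenAI's actual setup.

```python
import random

# Minimal sketch of domain randomization: evaluate a controller across many
# simulated worlds with randomized properties, so the chosen setting does
# not overfit one simulator's exact parameters.

def slide_distance(force, friction):
    # Crude toy "physics": how far a block slides for a given push
    return max(force - friction, 0.0)

def success(force, friction, target=1.0, tol=0.25):
    # The push succeeds if the block lands near the target distance
    return abs(slide_distance(force, friction) - target) <= tol

def randomized_worlds(n, seed=0):
    rng = random.Random(seed)
    # Each world draws its own friction, standing in for the many
    # lighting/texture/material variations randomized in practice
    return [rng.uniform(0.2, 0.6) for _ in range(n)]

def best_force(worlds, candidates):
    # Pick the push force with the highest success rate across ALL worlds
    return max(candidates, key=lambda f: sum(success(f, mu) for mu in worlds))

worlds = randomized_worlds(200)
candidates = [0.5 + 0.1 * i for i in range(15)]   # forces 0.5 .. 1.9
force = best_force(worlds, candidates)
rate = sum(success(force, mu) for mu in worlds) / len(worlds)
print(round(force, 1), rate)
```

A controller tuned against a single world would pick whatever force matched that one friction value; selecting against the randomized set favors a setting that tolerates the whole range, which is exactly the property that transfers across the reality gap.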

Although OpenAI paused its robotics division in 2021, it has recently resumed efforts, reportedly focusing on humanoid robots.


Google DeepMind’s Vision-Language Robotics

RT-2: Integrating Internet-Scale Visual Data for Robotic Control

In 2022, Google DeepMind embarked on an ambitious project, recording humans performing 700 distinct tasks, from opening jars to handling snack bags, to train a foundational robotics model.

The initial model, RT-1, translated visual inputs and robotic joint positions into motor commands, achieving a 97% success rate on familiar tasks and 76% on novel ones.
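A key trick behind treating motor control like language modeling is discretizing continuous commands into tokens, so a sequence model can predict them the way an LLM predicts words. The sketch below shows that tokenization idea in isolation; the bin count, joint ranges, and arm are illustrative assumptions, not RT-1's actual configuration.

```python
# Sketch of the action-tokenization idea behind models like RT-1: continuous
# motor commands are mapped to discrete bins ("tokens") and back. The bin
# count and the hypothetical 3-joint arm below are illustrative assumptions.

N_BINS = 256

def to_token(value, lo, hi):
    # Map a continuous command (e.g. a joint angle) to one of N_BINS tokens
    value = min(max(value, lo), hi)
    return round((value - lo) / (hi - lo) * (N_BINS - 1))

def from_token(token, lo, hi):
    # Invert the mapping: token index back to a continuous command
    return lo + token / (N_BINS - 1) * (hi - lo)

# A hypothetical 3-joint arm command, each angle in radians within [-3.14, 3.14]
command = [0.5, -1.2, 2.8]
tokens = [to_token(v, -3.14, 3.14) for v in command]
decoded = [from_token(t, -3.14, 3.14) for t in tokens]
print(tokens)
print(decoded)
```

Once commands are tokens, the model's job looks just like next-word prediction: given the visual input and the tokens emitted so far, predict the next action token, many times per second.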

Its successor, RT-2, expanded training beyond robotics-specific data to include diverse internet images, enabling the robot to better understand object locations and contexts. This allowed commands like “Place the soda can next to the photo of a musician” to be executed accurately.

In 2025, Google DeepMind further merged large language models with robotics through the Gemini Robotics model, enhancing natural language comprehension and command execution.


Practical Warehouse Automation: Covariant’s RFM-1

RFM-1: Collaborative Robotic Arms in Logistics

Emerging from former OpenAI engineers in 2017, Covariant focused on pragmatic robotics: developing AI-powered arms for warehouse automation. Their platform, based on foundation models akin to Google’s, was deployed in facilities operated by companies like Crate & Barrel.

By 2024, Covariant introduced RFM-1, a robotic arm capable of interactive collaboration. For example, after being shown multiple sleeves of tennis balls, it could sort them as instructed and even request guidance on optimal gripping methods when uncertain.

While impressive, RFM-1 still faced challenges. In a demonstration involving kitchen items, it struggled to “return the banana” to its original spot, mistakenly handling other objects first. This highlighted limitations in understanding new concepts without sufficient training data.

Covariant’s founders were later recruited by Amazon, which licenses the company’s robotics technology for use in its extensive network of over 1,300 U.S. warehouses.


Humanoid Robots Enter the Workforce: Agility Robotics’ Digit

Digit: Functional Humanoids in Real-World Applications

Recent investments prioritize humanoid robots designed to operate seamlessly in human environments, avoiding costly infrastructure changes required for non-humanoid machines.

Agility Robotics’ Digit exemplifies this trend. With a utilitarian design featuring exposed joints and a non-anthropomorphic head, Digit is deployed by Amazon, Toyota, and logistics firms like GXO to handle tasks such as moving shipping totes.

Despite progress, Digit’s capabilities remain limited: it can lift up to 35 pounds, and increasing strength adds battery weight and reduces operational time. Additionally, safety standards for humanoids are more stringent due to their mobility and proximity to humans.

Agility combines simulation training methods with integration of Google’s Gemini models to enhance adaptability, reflecting a decade of iterative experimentation culminating in scalable, practical humanoid robots.
