Unitree founder disputes VLA consensus and backs video-trained robot models

Unitree Robotics: Rethinking AI and Embodied Models in Robotics

Beyond Hardware: A New Vision from Unitree’s Founder

While Unitree Robotics has traditionally been recognized for its expertise in robot hardware, founder Wang Xingxing introduced a fresh perspective at the recent World Robot Conference (WRC). His keynote shifted focus from physical machines to the underlying algorithms and large-scale models that drive robotic intelligence.

Challenging the Status Quo: Critique of the VLA Framework

Wang openly criticized the prevailing vision-language-action (VLA) framework, which many embodied robots currently rely on. He bluntly described this architecture as “relatively simplistic” and argued that the field tends to blame its shortcomings on insufficient and low-quality data, when the model design itself is the larger problem.

According to Wang, the industry’s heavy reliance on amassing vast datasets, whether through real-world robot interactions or simulated environments, has not yielded the expected breakthroughs. Instead, he advocates a fundamental redesign of embodied models, arguing that current architectures lack the necessary sophistication and integration.

Reevaluating Data and Model Development Strategies

Contrary to popular belief, Wang suggests that the key to advancing robotics lies not in collecting more data but in enhancing model architecture. He pointed out that many companies focus excessively on foundational datasets, overlooking the potential of more unified and intelligent model designs.

Previously, Unitree emphasized its hardware capabilities, leading some to assume the company was less invested in AI development. Wang pushed back on this notion, saying that Unitree’s AI models are substantial, albeit smaller than those of major tech giants. He stressed that innovation does not require massive budgets or large teams, highlighting Unitree’s agile approach to model creation.

Innovating with Video-Driven Models

Unitree is exploring video-based model training, a method gaining traction after Google’s recent release of a world model trained on video data. This technique involves generating videos of robots performing tasks, such as tidying a room, and then using these videos to instruct real robots to replicate the actions.

Wang believes this video-centric approach could surpass VLA methods in efficiency and speed, despite the high computational demands of producing high-resolution video content. To address these challenges, he envisions deploying large-scale, cost-effective distributed computing clusters tailored for robotics, especially in environments like factories where multiple robots operate simultaneously and require low-latency communication.

Unitree’s Public Presence vs. Practical Applications

Unitree’s robots have gained visibility through performances at high-profile events such as Lunar New Year Galas and dynamic demonstrations at the World Artificial Intelligence Conference. Meanwhile, newer companies focus on practical household and industrial tasks like screwing bolts, folding laundry, and making beds.

Some critics dismiss Unitree’s robots as mere showpieces, but Wang firmly disagrees. He explains that deploying robots for everyday factory or home use remains a significant challenge at present, so demonstrations and performances serve as realistic milestones while the technology matures.

The Road to Versatile, Multifunctional Robots

Wang envisions future robots that transcend simple chores, evolving into adaptable, general-purpose machines capable of diverse roles, such as serving tea in a factory setting and then performing on stage. This multifunctionality represents a key goal for Unitree’s ongoing research and development efforts.

When Will Robotics Experience Its “ChatGPT” Moment?

Asked about the timeline for a breakthrough in robotics comparable to AI’s recent leaps, Wang projected a horizon of three to five years for initial significant progress, with a broader wave of embodied intelligence transforming the field within a decade.

He imagines a future where humanoid robots freely navigate spaces and respond seamlessly to human commands, a defining moment that would signal the arrival of truly intelligent, embodied machines.
