
More than a hardware play: Robot Era’s founder sets the record straight on the company’s true ambitions


Chen Jianyu, an assistant professor at Tsinghua University's Institute for Interdisciplinary Information Sciences, founded Robot Era in August 2023. On July 7, the company revealed that it had raised RMB 500 million (USD 70 million) in a Series A funding round led by CDH VGC. Houxue Capital, Meridian Capital, Xianghe Capital, and Fore joined the round, as did earlier investors Tsinghua and Crystal Stream.

Although less than two years old, Robot Era has already released a number of hardware products, including dexterous hands, wheeled platforms, and humanoid robots. As a result, some have mistakenly classified the company as a robotics hardware manufacturer, or even simply as a maker of dexterous hands. But that is not what Chen envisions for Robot Era.

Chen set his goal nearly a decade ago, after he first saw AlphaGo: to build an intelligent robot that could be used for a variety of purposes. In his view, a robot must be not only a mechanical form but also have a "brain" that can adapt to different real-world environments.

“Building the brain and body may seem difficult, but for me it’s a very natural choice, because I can do both,” Chen stated.

Chen's interdisciplinary training is rare among founders in the embodied-intelligence space: his academic background spans both intelligent control and physical systems. In 2011, he joined Tsinghua University's Department of Precision Instruments, a pioneer of bipedal humanoid robot research in China. He then pursued a PhD at the University of California, Berkeley, focusing on model predictive control (MPC), end-to-end learning, and reinforcement learning, fundamental elements of robotic intelligence today.

Chen's algorithmic work is probably better known than his hardware designs. He developed DWL, an advanced learning framework for humanoid robots that earned a nomination at the Robotics: Science and Systems (RSS) conference. His team also presented VPP, an embodied AI framework built on generative world models, which was highlighted at the International Conference on Machine Learning (ICML).

In a three-hour interview with 36Kr, about half of the time was spent discussing algorithms and what Chen calls the "brain." Yet his focus is on building a complete hardware-software stack.

On the software side, Robot Era has developed ERA-42, a vision-language-action (VLA) model that fuses perception with generative capabilities, allowing robots to interpret their environment and anticipate events in real time.

On the hardware side, the company is developing modular platforms that can be reconfigured into bipedal or wheeled forms depending on the task. Because robotics supply chains have not yet matured, Robot Era designs and manufactures its own foundational parts: joint modules, control units, motors, and reducers. This two-pronged approach explains the company's rapid hardware development. Robot Era has three commercial products on the market: the XHand 1, the Q5, and the STAR1.

Chen frequently refers to a favorite concept: "laying eggs along the road." The idea is to release each component as a separate product, which helps recoup costs and reduces financial pressure. It also generates real-world data that can be fed back into the company's research loop.

By June, Robot Era had delivered more than 200 units, with hundreds more in production. Nine of the world's top ten tech companies, including Haier Smart Home and Lenovo, are among its clients. The following transcript has been edited and consolidated for clarity and brevity.

36Kr: Given that your academic background includes both robotics and AI, did you consider focusing on one area when you launched Robot Era? Or was it never really a question?

Chen Jianyu: There was never a real question. I made two early judgments:

  • Do robots need both a brain and a body? Absolutely. A robot without a brain is just scrap metal, and a brain without a body is not a robot. We need both to commercialize properly.
  • Are we able to do both? Yes. It's a natural choice for me because I've done both.

I began my own journey with hardware and mechatronics. In grad school, I moved to systems integration and control. I've worked on AI for robots for nearly a decade; it all started around the AlphaGo era, in 2016 or 2017.

36Kr: How did the rise of large AI models in 2022 change your direction?

CJ: We’ve gone through several phases.

The first phase came with ChatGPT's launch in 2023, when we used the LLM as a robot's brain. It had to plan how the robot would use its sensors, identify targets, and perform steps. It worked surprisingly well. We published the first paper in the world on integrating large language models (LLMs) with humanoid robots.

Then we addressed the alignment between high-level language plans and low-level reinforcement-learning policies. The second phase, inspired by Google's work, focused on VLA models. We were the first in China to replicate RT-2. We recognized its limitations and developed our own solution: a two-system VLA architecture that combines a "slow" cognitive system with a "fast" action system for fine-motor coordination.
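The fast/slow split Chen describes can be illustrated with a toy control loop. This is only a hedged sketch with invented names (`SlowPlanner`, `FastController`), not Robot Era's actual architecture: a deliberative system replans at a low rate, while a reactive system issues an action on every tick toward the latest plan.

```python
class SlowPlanner:
    """Deliberative ("slow") system: expensive, so it runs only every `period` ticks."""
    def __init__(self, period: int):
        self.period = period

    def plan(self, observation: float) -> float:
        # Stand-in for heavyweight cognitive reasoning: choose a new setpoint.
        return observation + 1.0


class FastController:
    """Reactive ("fast") system: a cheap proportional step toward the current plan."""
    def act(self, state: float, target: float) -> float:
        return 0.5 * (target - state)  # P-control step


def run(ticks: int) -> float:
    slow, fast = SlowPlanner(period=10), FastController()
    state, target = 0.0, 0.0
    for t in range(ticks):
        if t % slow.period == 0:           # slow system replans rarely
            target = slow.plan(state)
        state += fast.act(state, target)   # fast system acts on every tick
    return state
```

The design point is that the two loops run at different rates: the fast controller keeps tracking the most recent target even while the slow planner is "thinking."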

We published the PAD framework in September 2024, introducing the idea of world-model fusion. Later that year, we released the VPP architecture, which merged pretrained video-prediction models with PAD. Both were accepted at ICML.

We introduced UP-VLA in January of this year. It is a unified learning model that integrates policy learning, understanding, and prediction: it can predict future frames and generate precise joint-level robot actions. It's like a brain that is always anticipating what's next.

36Kr: It sounds like you have the software foundations down pat. What’s missing?

CJ: Data. LLMs benefited from pre-existing corpora; robots don't have one. Waymo recently released a massive driving dataset, but even that is a small fraction of what robot training requires.

Robotics also lacks the infrastructure that autonomous driving has for generating massive amounts of real-world data. We would have to collect data for thousands of years to match the ChatGPT corpora.

36Kr: Do you use teleoperation to create training data?

CJ: We use a hybrid method. We begin with large video datasets to pretrain generalist models, then fine-tune on high-quality teleoperation data. Relying on teleoperation data lets us avoid costly real-world trial-and-error.
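The hybrid recipe (broad pretraining on plentiful but noisy video-derived data, then fine-tuning on a small set of high-quality demonstrations) can be sketched with a toy one-parameter model. Every name and number here is an illustrative assumption, not Robot Era's pipeline:

```python
def sgd_step(w: float, target: float, lr: float) -> float:
    # One gradient step on squared error, nudging w toward the target.
    return w + lr * (target - w)

def train(w: float, data: list, lr: float) -> float:
    for x in data:
        w = sgd_step(w, x, lr)
    return w

# Stage 1: large, cheap, noisy corpus (video pretraining), centered near 1.0.
video_data = [0.8, 1.2, 0.9, 1.1] * 25   # 100 samples
# Stage 2: few, expensive, precise samples (teleoperation fine-tuning).
teleop_data = [2.0, 2.0, 2.0]

w = train(0.0, video_data, lr=0.1)       # pretrained weight settles near 1.0
w = train(w, teleop_data, lr=0.3)        # pulled toward the demonstrations
```

The point of the two stages is the same as in the interview: the cheap corpus gets the model into a sensible region, so only a handful of expensive demonstrations are needed to adapt it.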

36Kr: What type of data is useful?

CJ: Variety is key. If you only train with clean, ideal data, your model will not be able to handle messy or hazardous situations.

When pouring water, you can't always use the same cup in the same position. Water behaves differently when poured from different cups, at different angles, against different backgrounds. Multidimensional variation is needed to train models that generalize.
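One common way to obtain that multidimensional variation is to cross every axis of a task when collecting or generating episodes. A minimal sketch for the pouring example, with hypothetical axis names and values (not Robot Era's actual data schema):

```python
from itertools import product

# Illustrative variation axes for a "pour water" task.
cups = ["mug", "paper_cup", "glass"]
angles_deg = [15, 30, 45, 60]
backgrounds = ["kitchen", "lab", "warehouse"]

# Cross all axes so no single dimension is held fixed during training.
episodes = [
    {"cup": c, "angle_deg": a, "background": b}
    for c, a, b in product(cups, angles_deg, backgrounds)
]

assert len(episodes) == 3 * 4 * 3  # 36 distinct conditions
```

In practice the axes would be far richer (lighting, clutter, object mass, grasp pose), but the principle is the same: vary several dimensions jointly rather than one at a time.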

36Kr: How important is it that a robot looks human?

CJ: Very. The humanoid is a powerful foundation that you can scale down to other forms, and many components can be reused in different configurations.

We're not building humanoid robots as an end in themselves, but as a tool. The human form lets us leverage vast amounts of video data, which aligns with our method of learning directly from large-scale datasets of human behavior.

36Kr: Some founders claim that the brain is not important as long as the body is good. What do you think about this?

CJ: To train the AI, you need a body first, then data collection, then learning, so the brain necessarily comes along more slowly than the body.

As a startup, however, we believe that monetization is a process. Our dexterous-hand products are already profitable, and unit economics will improve as we scale up production. We're also commercializing our machines and, eventually, our platforms and models.

36Kr: Could a company that focuses only on the brain outperform you?

CJ: I doubt it. Without a commercialization cycle, there is no resource inflow. And if you train on multiple third-party platforms, you have to integrate data pipelines into each one, which makes it difficult to scale.

36Kr: The VLA framework is hot, but critics say it's fragmented and data-limited. What do you think?

CJ: Right now, the "L" is too dominant. The models are trained as language models first, then extended to vision, and then to action.

But evolution didn't happen that way. Control came first, then vision, then language. Even monkeys can perform dexterous actions without speaking.

Now we've reversed the order. Many robotic tasks do not require language; you want a robot that can act, not just talk. We are now investigating joint pretraining across language, vision, and action.

36Kr: Some companies layer tasks or functions to structure robot brains. Do you agree with that method?

CJ: People split models horizontally or vertically by task type. The problem with segmentation is that it prevents synergy. Even if you create 1,000 task-specific models, nothing emerges.

Our goal is unification. When fine-tuned on a vertical task, our generalist model outperforms small specialized models and learns faster.

36Kr: What is the role of reinforcement learning in all of this?

CJ: Today, most VLA models do not use reinforcement learning. They are trained offline by watching others.

You can learn ping-pong by watching videos and having a coach demonstrate a few moves, but you might still play poorly.

Reinforcement learning means watching, learning, and then practicing in real time. This matters especially for physically grounded tasks; you can't master their complexity without it.

36Kr: You said earlier that there is a disconnect between what you are actually building and how others perceive Robot Era. Do you still think that way?

Chen Jianyu: Absolutely. People haven't seen everything yet. We've created a system that is quite comprehensive, and it's universal. Some people haven't figured that out, or perhaps we haven't communicated it clearly enough.

36Kr: What do you mean when you say "system"?

CJ: Hardware and software.

Think of the hardware as Lego blocks. We have developed all the small components in-house, including joint modules, motors, and gearboxes, and made everything modular and compatible.

Our robotic hand, for example, is a modular product that can be used with different robot types. Even the joints can easily be removed and reconfigured to create a completely different robot.

We also have software that is general-purpose. It can adapt quickly and easily to different tasks and forms. One stack for all.

36Kr: Is it easy to expand to other robot forms once that base is in place?

CJ: Very easy. We can build any robot shape you can imagine using the same foundational parts.

36Kr: It sounds like you’re modularizing your entire robot. How would you define the product form?

CJ: In the future, the humanoid is likely to be the most popular form. We build different shapes because needs vary by scenario.

When a task involves stairs, we use bipedal robots. Wheeled robots are better suited to flat terrain. In a 3C plant, where you're replacing a single station, you may only need a torso.

36Kr: What is your current shipment volume?

CJ: More than 200 units. Our customer base is diverse: nine of the world's top ten tech companies are our customers. Some buy dozens at a time and put them to use immediately.

36Kr: How do you choose which scenarios to target?

CJ: There are two filters: high value and reusability.

Jobs with high wages for human workers count as high value, and we look for such tasks that robots can perform. Currently, we focus on two types of robots: industrial and service.

Our industrial robot combines mobility, strength, and intelligence. Our service robot is smaller and more focused on aesthetics and interaction, which matters in service-related industries.

36Kr: How smart are your robots currently?

CJ: There are two levels: demo-grade and product-grade.

Demo-grade robots can perform tasks such as driving screws, scanning barcodes, and scooping water with high success rates.

The requirements for product-grade robots used in manufacturing are more stringent. In logistics settings, our robots can now locate labels, scan codes, and sort items with high success rates, and they are being used in real-world environments.

36Kr: What's the next exciting scenario beyond logistics?

CJ: Manufacturing. It requires more precision.

Logistics is mainly moving boxes and is relatively straightforward. Manufacturing demands fine motor skills and the use of tools: flipping parts, applying labels, or working with custom tooling.

36Kr: Are most humanoid robot parts off the shelf or custom?

CJ: Buying customized parts is too expensive, so we design internally, right down to the motor level. All of our motors, gears, and control boards are self-designed. We own the blueprints.

36Kr: What jobs in factory settings can't yet be replaced by robots? And what happens if they can all be replaced one day?

CJ: In principle, any repetitive task can be automated; it's just still difficult to do right now.

These jobs will eventually be replaced, and that will bring massive social change. But I think it's a positive thing. Robots can do the dirty, dangerous, and boring jobs that young people don't want.

This allows people to do more meaningful jobs while improving efficiency. Everything becomes cheaper.

Robots will become a brand-new type of consumer endpoint, at a scale somewhere between smartphones and cars. In five years' time, families may own one or two robots as companions or for service.

36Kr: What does the latest funding round mean for you?

Chen Jianyu: It's about preparing for the future. Commercialization is just beginning, but the competition will be fierce.

Funding for robotics today is small compared to electric vehicles and large models. We'll eventually need manufacturing and AI at a much larger scale. We're still in the early stages.

36Kr: How long before robots become commonplace in the home?

CJ: Gradually. In three to five years' time, you will probably see early forms of the product in high-net-worth households.

The mass market's requirements will be more stringent: the product will need to be affordable and highly generalizable.

36Kr: Haier, Midea, and others have been talking about robotics for many years. How do you see large corporations entering the robotics space?

CJ: There's always a balance between competition and cooperation. We can supply hardware to internet giants, and we could provide software to traditional manufacturers.

36Kr: Every automaker has an autonomous-driving group, and many are now looking at end-to-end architectures. Will the automotive industry pivot toward robotics?

CJ: Robotics is a natural extension of smart vehicles, but some will not make the leap.

Big companies are cautious about new bets. Their current investments are small, roughly equal to ours in personnel.

36Kr: LLMs have converged around just a few key players. Will robotics follow a similar pattern?

CJ: No. Robotics is fragmented.

LLMs can be used by everyone: release one, and everyone can use it instantly. That centralizes power.

Robots are hardware: local, messy, and diverse. This makes the market more open and forgiving. There will be many players.

KrASIA Connection features translated and adapted content originally published by 36Kr. This article was written for 36Kr by Qiu Xiaofen.
