
SenseTime is stepping up its efforts to develop embodied intelligence in partnership with ACE Robotics.


SenseTime, a Chinese artificial intelligence company that has been around for over a decade, is well-versed in the cyclical nature of technological change.

The company was one of the first to commercialize computer vision at scale, emerging from a lab at The Chinese University of Hong Kong during the rise of visual AI. But B2B has never been an easy business, and SenseTime, like many of its peers, often had to deal with clients who had highly customized needs and lengthy development cycles.

OpenAI’s ChatGPT reshaped the industry around large language models, and SenseTime gained momentum by leveraging its early lead in computing power. According to the company’s 2024 annual report, its generative AI business generated RMB 2.4 billion (USD 336 million) in revenue, with its share of total income rising from 34.8% to 63.7%, making it SenseTime’s most important business line.

After three years of rapid progress with large models, however, a more practical question looms. Beyond narrow applications, how can AI become a force that changes the way we live and work? This question is at the heart of SenseTime’s next chapter.

As embodied intelligence becomes the next frontier, a new player has entered the race: ACE Robotics, led by Wang Xiaogang, SenseTime’s co-founder and executive director, who now serves as chairman of ACE Robotics.

Wang stated in an interview with 36Kr that ACE Robotics wasn’t born out of hype, but out of necessity. It aims to address real-world problems through a new human-centric research paradigm: building a “brain” that understands the laws of the physical world and delivering integrated hardware-software solutions for real-world applications.

The broader industry is also shifting. A year ago, embodied-intelligence firms were still experimenting with mobility and stability; some have now secured contracts worth hundreds of millions of RMB to bring robots into factories in Shenzhen, Shanghai, and Suzhou.

The shift in AI toward physical intelligence matters greatly, especially as the industry comes under increasing pressure to deliver real results.

SenseTime reported a loss of RMB 1.162 billion (USD 163 million) in the first half of 2025, narrowing 50% year-on-year even as its R&D expenditure continued to rise. The company is now pursuing more grounded, sustainable paths to growth.

Wang said the breakthrough will not come from a leap toward artificial general intelligence (AGI), but from robots that can learn reusable skills through real-world interaction and solve tangible physical problems. The following transcript has been edited and consolidated for clarity and brevity.

Why did SenseTime choose to create ACE Robotics and move into embodied intelligence this year?

Wang Xiaogang: This decision is based on two factors: industrialization and the technological paradigm.

From a business perspective, embodied intelligence represents a market worth tens or even hundreds of trillions of RMB. Jensen Huang, the founder of Nvidia, has said that one day everyone will own one or more robots. Their numbers could surpass smartphones, and their unit value could rival automobiles.

SenseTime has traditionally focused on B2B software, so expanding into integrated hardware-software operations was a natural step toward scale. Years of experience in vertical industries have given us a thorough understanding of user needs, and our ability to deploy in real-world scenarios gives us a competitive edge over other embodied AI startups.

From a technical standpoint, traditional embodied intelligence has a weak point. Hardware has improved rapidly, but the “brain,” the robot’s decision-making model, has not. This is because most approaches have been machine-centric: they start from the robot’s body and train a model on data collected from it, expecting it to generalize. It can’t. Just as humans and animals can’t share one brain, robots with different morphologies, whether with dexterous hands, claws, or multiple arms, cannot share a universal model.

What is the technical approach of ACE Robotics?

WX: We’re proposing an entirely new human-centric paradigm. We begin by studying how humans interact with their physical environment: essentially, how we move, grasp, and manipulate. We collect multimodal data using wearable devices and third-person cameras to record complex, commonsense human behavior.

We feed this data into a model so it can understand both human behavior and physics. A mature world model can then guide hardware design and ensure that a robot’s form fits its intended environment.

In recent months, companies like Tesla and Figure AI have shifted their focus to learning from first-person camera footage. These approaches, however, capture only visual information and do not integrate critical signals such as force, touch, or friction, which are the keys to multidimensional interaction.

Vision may allow a robot to dance or shadowbox, but it struggles with contact-heavy tasks like moving a glass or tightening screws.
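To make the idea of multidimensional interaction data more concrete, here is a minimal, hypothetical sketch of what a single recorded sample combining vision with contact signals might look like. The field names, shapes, and structure are illustrative assumptions of ours, not ACE Robotics’ actual data format.

```python
# Hypothetical sketch of one multimodal recording sample combining
# first-person video, third-person video, hand poses, and contact signals.
# All field names and shapes are illustrative assumptions only.
from dataclasses import dataclass
from typing import List

@dataclass
class ContactReading:
    timestamp_s: float                 # time of the reading within the episode
    fingertip_forces_n: List[float]    # per-fingertip normal force, in newtons
    touch_active: bool                 # whether a tactile sensor registered contact

@dataclass
class InteractionSample:
    episode_id: str
    ego_video_path: str                # first-person (wearable) camera clip
    third_person_video_path: str       # fixed third-person camera clip
    hand_poses: List[List[float]]      # per-frame 3D hand keypoints
    contacts: List[ContactReading]     # force/touch stream, absent in vision-only data
    task_label: str                    # e.g., "move a glass", "tighten a screw"

# A vision-only pipeline would drop the `contacts` stream entirely,
# which is the gap Wang describes for contact-heavy tasks.
sample = InteractionSample(
    episode_id="demo-0001",
    ego_video_path="ego/demo-0001.mp4",
    third_person_video_path="fixed/demo-0001.mp4",
    hand_poses=[[0.0] * 63],           # 21 keypoints x 3 coordinates per frame
    contacts=[ContactReading(0.0, [0.4, 0.3, 0.0, 0.0, 0.1], True)],
    task_label="move a glass",
)
print(sample.task_label, len(sample.contacts))
```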

Our human-centric approach has already been validated. The EgoLife dataset, developed by a team led by Professor Liu Ziwei, contains over 300 hours of first-person and third-person data. It addresses the industry’s biggest pain point: most existing datasets capture only trivial actions and are insufficient for learning complex behavior.

Public data shows China’s embodied intelligence market will surpass RMB 800 billion (USD 112 billion) in 2024, with hundreds of startups in the field. How does ACE Robotics position itself?

WX: We are not only interested in building models; we also want to deliver integrated hardware-software solutions that solve real problems within defined scenarios.

Many existing hardware products don’t meet real-world requirements. We work closely with our partners to create customized designs.

Consider quadruped robots. Traditional models mount cameras too low, making it hard to detect traffic signals or navigate intersections. In partnership with Insta360, we developed a 360-degree panoramic camera module to address this limitation.

In addition to waterproofing, we are also tackling high computing costs and limited battery life, which are major obstacles to outdoor and industrial deployment.

How do these collaborations work in practice?

WX: Our strength is the “brain”: the models, along with navigation and operation capabilities.

SenseTime previously specialized in large-scale software but lacked standardized edge products. ACE Robotics has adopted an ecosystem model built on prior investments in hardware and component manufacturers: we define design standards, co-develop hardware with partners, and keep our models open, offering base models and training resources.

SenseTime has deep roots in autonomous driving and security. Which of these capabilities can be transferred to robotics?

WX: R&D and safety standards are the key areas. Massive datasets are essential for the continuous improvement of both autonomous driving and robotics, and our validated “data flywheels” significantly increase iteration speed. The rigorous safety and data quality frameworks developed for autonomous driving can directly improve the reliability of robotics.

Our SenseFoundry platform already includes hundreds of modules originally designed for fixed-camera city management. Once linked to mobile robots, these capabilities transfer seamlessly, moving from static monitoring to dynamic operation.

How do you see SenseTime’s evolution from visual AI to embodied intelligence?

WX: We trace AI’s evolution from 1.0 to 3.0.

By 2014, we were in the AI 1.0 era, defined by visual recognition. Machines began to outperform humans, but intelligence came from manual labeling: tagging images to simulate cognitive abilities. Each application needed its own dataset because labeled data is limited and task-specific, so intelligence was only as good as the human labor behind it. Models were small and did not generalize across scenarios.

Then came the 2.0 era, in which large models transformed everything. The key difference was data richness: the text, poetry, and code on the internet represent centuries of human wisdom, far more diverse than images alone. This collective intelligence allowed large models to learn capabilities that generalize across industries and domains.

As online data becomes saturated, the marginal gains of this approach diminish.

The AI 3.0 era is defined by embodied intelligence and direct interaction with the physical world. Reading text and images is no longer enough to truly understand physics and human behavior; AI must interact with the world itself. Tasks such as cleaning or delivering packages require real-time intelligence, and direct interaction opens up new paths to growth.

How does ACE Robotics’ Kairos 3.0 differ from other systems such as OpenAI’s Sora and World Labs’ Marble?

WX: Kairos 3.0 has three components: multimodal understanding and fusion, a synthetic network, and behavioral prediction.

The first component fuses diverse inputs, including not only images, video, and text, but also camera positions, 3D object trajectories, and tactile data. This allows the model to understand the physics of movement and interaction in the real world.

For example, in collaboration with Nanyang Technological University, the model learned to infer camera positions from a single image. It can also predict the motion of a robotic arm from visual changes, which helps it better understand physical interactions.

The second component, the synthetic network, generates videos of robots performing different manipulation tasks, swapping robot types, or altering environmental elements such as tools, objects, or room layouts.

The third component, behavioral prediction, allows the model to predict a robot’s next action given an instruction, completing the loop from understanding to execution.
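As a rough way to picture how these three components could fit together, here is a minimal, hypothetical pipeline sketch. The class and method names are placeholders of our own, not the actual Kairos 3.0 interfaces.

```python
# Hypothetical sketch of a three-stage world-model pipeline, mirroring the
# components described above: multimodal understanding and fusion, a
# synthetic (generative) network, and behavioral prediction.
# All names and signatures are illustrative assumptions.

class MultimodalEncoder:
    def fuse(self, images, video, text, camera_pose, trajectories, tactile):
        """Fuse heterogeneous inputs into a single scene representation."""
        return {"scene": (images, video, text, camera_pose, trajectories, tactile)}

class SyntheticNetwork:
    def generate(self, scene, robot_type, environment_edits):
        """Generate synthetic videos of a given robot type performing the task
        in a modified environment (different tools, objects, layouts)."""
        return {"synthetic_video": (scene, robot_type, environment_edits)}

class BehaviorPredictor:
    def next_action(self, scene, instruction):
        """Predict the robot's next action for a natural-language instruction,
        closing the loop from understanding to execution."""
        return {"action": f"step toward: {instruction}"}

def run_pipeline(inputs, instruction):
    # 1) understand and fuse the multimodal inputs
    scene = MultimodalEncoder().fuse(**inputs)["scene"]
    # 2) synthesize variations for training data augmentation
    SyntheticNetwork().generate(scene, robot_type="quadruped", environment_edits={})
    # 3) predict the next action for the given instruction
    return BehaviorPredictor().next_action(scene, instruction)

if __name__ == "__main__":
    dummy = dict(images=[], video=[], text="", camera_pose=None,
                 trajectories=[], tactile=[])
    print(run_pipeline(dummy, "pick up the cup on the table"))
```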

How does the human-centric method improve data efficiency, generalization, and multimodal integration?

WX: It combines environmental data collection with world modeling.

By “environment,” we mean real human living and working spaces. Unlike autonomous driving, which focuses on roads, or underwater robotics, we model how humans interact with their environments. This yields higher data efficiency and more authentic inputs.

We also integrate human ergonomics, touch, and force, which are essential for rapid learning and missing from machine-centric approaches.

When do you anticipate that human-centric systems will be widely adopted?

WX: Quadruped robots, or robotic dogs, will be the first large-scale application.

Most current quadruped robots still rely on remote control or pre-programmed routes. Our system gives them autonomous navigation and spatial intelligence. With ACE Robotics’ navigation technology, they can coordinate through a control platform, respond to multimodal inputs, and follow Baidu Maps instructions. They can identify people in need of help, record license plate numbers, and detect anomalies.

Linked to our SenseFoundry platform, these robots can recognize fights, garbage, unrestrained pets, or unauthorized drones and send real-time data back to control centers.

Combined with cloud-based management, this will allow inspection and monitoring to scale up quickly. We expect widespread deployment in industrial environments within one to two years.

What other applications do you think are worth watching?

WX: The next major frontier for commercialization will be warehouse logistics.

Unlike factories, warehouses share consistent operational patterns. As online shopping grows, front-end logistics hubs need to automate sorting and packaging. Traditional robot-centric data collection cannot handle the vast variety of SKUs, but large-scale environmental data allows us to generalize our models and scale efficiently.

Home environments will be the key direction in the future, but safety remains a major concern. Household robots will need to manage collisions and ensure the safety of surrounding objects, much as autonomous driving must evolve from Level 2 to Level 4 autonomy.

There is progress, though. Figure AI, for example, is partnering with real estate funds that manage millions of apartment layouts to gather environmental data, bringing embodied intelligence closer to the home.

KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Huang Nan for 36Kr.


