AI走出屏幕：具身智能全面落地，机器人正在接管物理世界 | AI Steps Off the Screen: Embodied Intelligence Is Taking Over the Physical World

2026-06-15 编译员：编译员时事新闻

中文

2026年上半年，如果你只看ChatGPT和Claude的消息，会觉得AI战场还在聊天框里。但如果你把视线挪开屏幕，会发现另一件事正在发生——AI正在长出身体。

上周有三条消息合成了一条主线：

第一条。大晓机器完成了天使+轮融资，累计数亿美元，成为具身智能领域最快独角兽。他们的自研世界模型Kairos（4B参数）登顶全球四大具身智能榜单，成绩超越了28B参数的竞品。4B参数打败28B参数，这在语言模型世界里几乎不可能，但在具身智能领域，参数不是唯一标准。理解“杯子掉下去会碎”比理解“莎士比亚十四行诗”需要更少的参数量，但需要更多的因果推理。

第二条。清华持股的光象科技发布了工业机器人Phi-Bot X1，在蔚来汽车焊接产线上连续作业21.5小时零失误。一周完成部署。这家公司成立才一年多。他们的逻辑是：不是造一个万能机器人，而是造一个在一个具体场景里做到完美的机器人。焊接、拧螺丝、搬零件——每个动作都有明确的物理反馈，出错概率比聊天机器人低两个数量级。

第三条。改装版宇树G1机器人Pemba登顶了厄瓜多尔6200米的钦博拉索火山，下一站是珠穆朗玛峰。这帮人给机器人装了防寒外壳、减重框架和应急降落伞。目标不是登山本身，而是验证机器人在极端环境下的作业能力。

三条消息指向的是同一条路：AI从数字世界走向物理世界，走的不是模型变大的路，是模型变“硬”的路。物理AI的核心命题不是更大的上下文窗口，而是机械臂扭力的实时反馈、触觉传感器在零下20度的可靠性、视觉识别在粉尘环境里的鲁棒性。

华为联合高校发布的SpaceMind拿了李飞飞团队VSI-Bench的70.6分（接近人类79%平均水平），纯RGB输入，不依赖深度摄像头。清华UniCM气候预测模型登上Nature子刊，ENSO预测提前期提到19个月。这些都不是你在ChatGPT里能体验到的东西。

所以真正的故事不是“具身智能融资很热”，而是AI正在经历从语言到物理的范式迁移。你在手机上调ChatGPT参数的时候，有些机器人已经在产线上连续干了21个小时没停过，它们不需要更多的参数，它们只需要更多的螺丝。

English

In the first half of 2026, if you only watch ChatGPT and Claude headlines, the AI battlefield looks like it’s stuck inside chat windows. But look away from the screen, and something else is happening — AI is growing a body.

Three stories last week form a single narrative:

One: Daxiao Robotics closed an angel+ round, bringing its total funding to hundreds of millions of dollars and making it the fastest company in embodied intelligence to reach unicorn status. Its in-house Kairos world model (4B parameters) topped four global embodied intelligence benchmarks, outperforming 28B-parameter competitors. 4B beating 28B is almost unheard of in language models, but in embodied AI, parameter count isn’t the only yardstick. Understanding “a dropped cup will shatter” takes fewer parameters than understanding a Shakespeare sonnet, but more causal reasoning.

Two: Guangxiang Technology (Tsinghua-backed) released the Phi-Bot X1 industrial robot, which ran for 21.5 hours straight on a NIO welding line with zero errors. Deployment took one week, and the company is barely a year old. Their logic: don’t build a general-purpose robot; build one that is flawless in one specific scenario. Welding, screw fastening, part handling: each action has clear physical feedback, and the error rate is two orders of magnitude lower than a chatbot’s.

Three: A modified Unitree G1 robot named Pemba summited Ecuador’s 6,200m Chimborazo volcano. Next target: Mt. Everest. The team added cold-resistant housing, a lightweight frame, and emergency parachutes. The goal isn’t mountaineering — it’s validating extreme-environment operational capability.

Same road, three markers: AI is moving from the digital world to the physical one, and the route runs not through bigger models but through “harder” ones, hard in the physical sense. Physical AI’s core challenges aren’t larger context windows; they are real-time torque feedback from a robotic arm, tactile sensors that stay reliable at -20°C, and visual recognition that stays robust in dusty environments.

SpaceMind, released by Huawei together with university partners, scored 70.6 on VSI-Bench, the benchmark from Fei-Fei Li’s team (approaching the human average of 79%), using RGB input alone with no depth camera. Tsinghua’s UniCM climate model was published in Nature Machine Intelligence and extends the ENSO prediction lead time to 19 months. None of this is something you can experience inside ChatGPT.

The real story isn’t “embodied AI is hot” — it’s that AI is undergoing a paradigm shift from language to physics. While you’re tweaking ChatGPT parameters on your phone, some robots are on factory lines running 21 hours nonstop. They don’t need more parameters. They need more screws.