Open-Source Models Catch Up to GPT-4 Level: Llama 4 Ultra Performance Review | Open Source Closes the Gap: Llama 4 Ultra Benchmark Analysis

2026-06-28 Compiled by: The Compiler llama meta open-source benchmark

Compiler's note: Open source has caught up with closed source, and the day came sooner than many expected.

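Performance Is Now on Par

Meta has released Llama 4 Ultra, which matches GPT-4o on the MMLU general-capability benchmark and surpasses it on both code generation (HumanEval) and mathematical reasoning (GSM8K). This is the first time an open-source model has broadly matched a top closed-source model across several mainstream benchmarks.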

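Clear Cost Advantage

Llama 4 Ultra can be self-hosted: a server with two A100 GPUs is enough to run the quantized version, and inference costs roughly 1/10 as much as GPT-4o API calls. For high-concurrency scenarios such as customer-service bots and content moderation, the cost gap is substantial.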
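The cost comparison above comes down to simple arithmetic, which can be sketched as follows. This is a minimal illustration, not a measurement: the function name `self_hosted_cost_per_million_tokens` and every number below (server price, throughput, API price) are hypothetical placeholders that do not come from the article or from any vendor's price list. Substitute your own figures before drawing conclusions.

```python
def self_hosted_cost_per_million_tokens(server_cost_per_hour, tokens_per_second, utilization=1.0):
    """Cost to generate one million tokens on a self-hosted server.

    server_cost_per_hour: all-in hourly cost of the server (hardware, power, ops).
    tokens_per_second:    aggregate generation throughput when the server is busy.
    utilization:          fraction of each hour the server is actually serving traffic.
    """
    if server_cost_per_hour < 0:
        raise ValueError("server_cost_per_hour must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return server_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical placeholder inputs, chosen only to make the arithmetic visible.
SERVER_COST_PER_HOUR = 4.0          # assumed hourly cost of a two-A100 server
TOKENS_PER_SECOND = 1000.0          # assumed batched throughput of the quantized model
API_COST_PER_MILLION_TOKENS = 10.0  # assumed API price per million tokens

for utilization in (1.0, 0.5, 0.1):
    self_hosted = self_hosted_cost_per_million_tokens(
        SERVER_COST_PER_HOUR, TOKENS_PER_SECOND, utilization
    )
    ratio = self_hosted / API_COST_PER_MILLION_TOKENS
    print(f"utilization={utilization:.0%}  self-hosted={self_hosted:.2f}  ratio to API={ratio:.2f}")
```

Because the server costs the same whether it is busy or idle, the per-token cost scales inversely with utilization. That is one way to read the emphasis on high-concurrency workloads: the advantage of self-hosting depends on keeping the hardware busy.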

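Limitations

Context window is still shorter than Gemini 1.5 Pro's (128K vs. 1M)
Multimodal capability is comparatively weak
Requires strong in-house deployment skills, a high barrier for non-technical teams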

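Impact on the Industry

The moat around closed-source APIs is shrinking. Over the next 12 months the capability gap may narrow further, and competition will shift to non-model factors such as service quality, response speed, and compliance guarantees.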

Deskless Daily · The Compiler

Open Source Closes the Gap: Llama 4 Ultra Benchmark Analysis

Meta’s Llama 4 Ultra matches GPT-4o on MMLU and surpasses it on HumanEval (code) and GSM8K (math). This is the first time an open-source model has broadly matched a top closed-source model across multiple mainstream benchmarks.

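Cost advantage: a self-hosted quantized version, which runs on a server with two A100 GPUs, costs roughly 1/10 as much as GPT-4o API calls for inference. For high-concurrency use cases such as customer-service bots and content moderation, the economics are compelling.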

Limitations: a shorter context window (128K vs. Gemini 1.5 Pro’s 1M), weaker multimodal capability, and high deployment complexity for non-technical teams.

Deskless Daily