Summary

This survey presents a unified view of latent space as a computational substrate for language-based models, arguing that many critical internal processes are more naturally carried out in continuous latent space than in explicit token-level generation. The authors organize the field into five sequential perspectives: Foundation, Evolution, Mechanism (Architecture, Representation, Computation, Optimization), Ability (Reasoning, Planning, Modeling, Perception, Memory, Collaboration, Embodiment), and Outlook. The work positions latent space not merely as an implementation detail but as a general computational paradigm for next-generation intelligence.

Prerequisites

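  • Explicit vs. latent space computation: the limitations of token-level autoregressive generation (linguistic redundancy, discretization bottleneck, sequential inefficiency, semantic loss) are the starting point of the survey's argument
  • Diffusion models and continuous representations: the survey discusses the computational advantages of continuous spaces at length, so familiarity with the representation space of diffusion models helps in following its argument
  • Chain-of-thought vs. latent reasoning: the survey contrasts explicit reasoning chains with latent reasoning, so understanding the shortcomings of CoT is the natural entry point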
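
The contrast between explicit reasoning chains and latent reasoning can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not a method taken from the survey: the embedding table `E`, the weights `W`, and the functions `step`, `explicit_rollout`, and `latent_rollout` are all hypothetical names, and a single `tanh` layer stands in for a full model.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 16, 8
E = rng.normal(size=(VOCAB, DIM))                # toy token embedding table (assumption)
W = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)   # toy model weights (assumption)

def step(x):
    """One toy model step: map an input vector to a hidden state."""
    return np.tanh(W @ x)

def explicit_rollout(x0, n_steps):
    """Token-level loop: decode each hidden state to a token id, then re-embed it."""
    x, tokens = x0, []
    for _ in range(n_steps):
        h = step(x)
        tok = int(np.argmax(E @ h))   # discretize: keep only the best-scoring token
        tokens.append(tok)
        x = E[tok]                    # only that token's embedding is carried forward
    return tokens, x

def latent_rollout(x0, n_steps):
    """Latent loop: feed the continuous hidden state straight back in."""
    x = x0
    for _ in range(n_steps):
        x = step(x)
    return x

tokens, x_explicit = explicit_rollout(E[3], 4)
h_latent = latent_rollout(E[3], 4)
```

In the explicit loop, everything passed between steps must be one of the `VOCAB` rows of `E`; in the latent loop, the full `DIM`-dimensional vector is passed along without being decoded.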

Core Idea

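Although modern language models are commonly understood in terms of token sequences, many critical processes (reasoning, planning, perception) are carried out more efficiently in continuous latent space. Computation in explicit space faces four structural limitations: linguistic redundancy (natural language is an inefficient encoding), the discretization bottleneck (continuous thought is forced into discrete tokens), sequential inefficiency (output must be generated token by token), and semantic loss (high-dimensional concepts are distorted when compressed into vocabulary space). The survey argues that “latent space as a native computational substrate” is the main direction in which current model architectures are evolving, not an exception.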
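
The discretization bottleneck and semantic loss can be made concrete by measuring what is lost when a continuous vector is snapped to its nearest vocabulary embedding. This is an illustrative sketch only: the random table `E` and the helpers `nearest_token` and `discretization_error` are hypothetical and do not come from the survey.

```python
import numpy as np

def nearest_token(h, E):
    """Return the index of the embedding row closest to h (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(E - h, axis=1)))

def discretization_error(h, E):
    """Distance between h and the token embedding it would be replaced by."""
    return float(np.linalg.norm(h - E[nearest_token(h, E)]))

rng = np.random.default_rng(0)
E = rng.normal(size=(32, 8))        # toy vocabulary of 32 embeddings (assumption)

h_on_token = E[5].copy()            # a state that coincides with a token
h_between = 0.5 * (E[5] + E[9])     # a state "between" two tokens

err_on = discretization_error(h_on_token, E)
err_between = discretization_error(h_between, E)
```

A state that coincides with a token embedding is recovered exactly, while a state lying between two tokens incurs a nonzero error once it is forced onto a single vocabulary entry. Passing the vector forward in latent space avoids that projection.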

Results

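This is a survey paper, so it reports no experimental numbers of its own. Its organizational contributions are:

  • A unified five-perspective framework (Foundation / Evolution / Mechanism / Ability / Outlook)
  • Four main technical lines identified (Architecture, Representation, Computation, Optimization)
  • Seven capability domains mapped (Reasoning, Planning, Modeling, Perception, Memory, Collaboration, Embodiment)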

Limitations

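  • Unstated: the survey's scope is extremely broad (embodied AI, multimodality, and more), possibly at the cost of depth; the large author group (50+ people) may lead to inconsistent viewpoints; the boundary of what counts as “latent space” may be blurry in the text (the abstract already cautions about the difference between latent space in vision models and in language models, but the distinction is hard to maintain in practice)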

Reproducibility

  • Code: N/A (survey paper)
  • Datasets: N/A
  • Compute: N/A

Insights

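  1. Latent space for embodied AI: the survey includes Embodiment among its capability domains, indicating that latent-space research is extending toward robotics and physical interaction, closely in line with the current direction of VLA models
  2. A paradigm shift from “thinking in words” to “thinking in concepts”: the survey's core stance echoes Karpathy's LLM Wiki vision, in which language is only the interface and reasoning takes place in a higher-dimensional continuous space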
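  3. Timing: published in April 2026, the survey arrives amid a surge of interest in reasoning models (e.g., OpenAI o-series, DeepSeek R1) and in latent reasoning, and offers a systematic view for organizing the field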

Connections

Raw Excerpt

“Latent space is rapidly emerging as a native substrate for language-based models. While modern systems are still commonly understood through explicit token-level generation, an increasing body of work shows that many critical internal processes are more naturally carried out in continuous latent space than in human-readable verbal traces.”