Summary

Anthropic Engineering frames “context engineering” as the natural successor to prompt engineering: the focus shifts from crafting the words of a prompt to managing the entire token state available to an LLM during inference. The article covers context rot, attention budgets, system prompt altitude, tool design, just-in-time context retrieval, and message history pruning strategies.

Key Points

  • Context engineering: optimizing the full token state (system prompt + tools + examples + message history + retrieved data) for desired agent behavior — broader than prompt engineering
  • Context rot: as context grows, LLM recall and reasoning quality degrade; treat context as a finite resource with diminishing marginal returns
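  • Attention budget: transformer attention relates every token to every other token, so n tokens form n² pairwise relationships (O(n²)); doubling the context quadruples the relationships and spreads attention thinner across them, leaving models “stretched thin” at long contexts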
  • System prompt altitude: Goldilocks zone — not so specific that it’s brittle, not so vague that it provides no signal; organize with XML/Markdown sections
  • Tools: minimal viable set; tools should be self-contained and have non-overlapping functionality; if a human can’t decide which tool to use, neither can an LLM
  • Just-in-time context: agents maintain lightweight identifiers (file paths, query templates) and dynamically load data at runtime rather than front-loading everything
  • Message history: summarize completed subtask steps rather than retaining raw tool call/response chains; strip boilerplate; prune early successful steps
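
The message-history point above can be sketched in Python. This is a minimal sketch under stated assumptions, not the article's implementation: the `Message` type, the `subtask` tag, the `completed` set, and the `summarize` callback are hypothetical names introduced for illustration. In practice `summarize` would probably be a model call; here it is any function that returns a string.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Message:
    role: str                      # "user", "assistant", or "tool"
    content: str
    subtask: Optional[str] = None  # hypothetical tag grouping one tool-call chain


def compact_history(
    messages: List[Message],
    completed: Set[str],
    summarize: Callable[[str, List[Message]], str],
) -> List[Message]:
    """Replace the raw messages of each completed subtask with one summary message."""
    compacted: List[Message] = []
    summarized: Set[str] = set()
    for msg in messages:
        if msg.subtask in completed:
            if msg.subtask not in summarized:
                # Collect the whole chain once and emit a single summary in its place.
                chain = [m for m in messages if m.subtask == msg.subtask]
                compacted.append(Message("assistant", summarize(msg.subtask, chain)))
                summarized.add(msg.subtask)
            continue  # drop the raw tool call/response messages
        compacted.append(msg)  # untagged or still-active messages pass through unchanged
    return compacted
```

Messages that belong to an unfinished subtask stay verbatim, since the agent may still need their details; only finished chains collapse to a summary, positioned where the chain first appeared.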

Insights

The “context rot” concept is the key empirical finding this article builds on. The mechanism (n² attention relationships spread thin) gives a principled explanation for why LLM performance degrades with context length: the decline is a gradient, not a hard cliff. The “just-in-time context” framing generalizes Claude Code’s approach (targeted queries rather than loading entire codebases) to agentic systems broadly: agents should act like humans navigating a file system, not like LLMs trying to memorize a library. The “smallest possible set of high-signal tokens” principle is the operational formula for context engineering.
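
The just-in-time idea can be illustrated with a small Python sketch. It assumes the lightweight identifiers are file paths and that a simple character cap stands in for a token budget; the `LazyContext` class, its method names, and the `max_chars` parameter are hypothetical and not taken from the article or from Claude Code.

```python
from pathlib import Path


class LazyContext:
    """Holds file paths as lightweight references; reads contents only on request."""

    def __init__(self, paths, max_chars=2000):
        self.refs = [Path(p) for p in paths]  # identifiers only, no file contents
        self.max_chars = max_chars            # hypothetical cap standing in for a token budget
        self.loaded = {}                      # path -> text actually placed in context

    def index(self) -> str:
        """Cheap listing of references that can always stay in context."""
        return "\n".join(str(p) for p in self.refs)

    def load(self, path) -> str:
        """Read one referenced file when it is needed, truncated to the cap."""
        p = Path(path)
        if p not in self.refs:
            raise KeyError(f"unknown reference: {p}")
        text = p.read_text(encoding="utf-8")[: self.max_chars]
        self.loaded[p] = text
        return text
```

Only the output of `index()` needs to sit in the prompt up front; the agent calls `load()` for the one file a step requires, so the context holds a few targeted reads instead of the whole directory.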

Connections

Raw Excerpt

Good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome. Context, therefore, must be treated as a finite resource with diminishing marginal returns.