Summary

Anthropic’s canonical guide to building effective LLM agents, distilled from working with dozens of production teams. The central argument: complexity should be justified by measurable improvement. Most tasks need only an augmented single LLM call or a simple workflow; true autonomous agents are warranted only when the problem is open-ended and the step count can’t be predicted. The guide describes five composable workflow patterns and three core principles.

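Anthropic's guide to building agents, distilled from a large number of production cases. Core argument: added complexity must be justified by a measurable improvement in results. It defines five workflow patterns and three core principles, and recommends starting with the simplest solution.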

Key Points

  • Workflows vs agents: workflows have fixed, predictable steps (more controllable); agents dynamically decide steps (more flexible, higher risk)
  • Five workflow patterns: (1) Prompt chaining — sequential LLM calls; (2) Routing — classify input, dispatch to specialist; (3) Parallelization — split subtasks, aggregate results; (4) Orchestrator-workers — central LLM dynamically assigns subtasks; (5) Evaluator-optimizer — generator + critic in a loop
  • When to use agents: open-ended tasks where step count can’t be predicted and model judgment is trusted; sandbox extensively before deploying
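  • Three principles: maintain simplicity in the agent's design; prioritize transparency by explicitly showing the agent's planning steps; carefully craft the agent-computer interface (ACI) through thorough tool documentation and testing. Human oversight at checkpoints complements these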
  • Tool design matters: invest as much in “agent-computer interfaces” as in human-computer interfaces — tool schemas need the same prompt engineering as system prompts
  • Frameworks caveat: use LLM APIs directly when possible; framework abstractions often obscure prompts/responses and encourage premature complexity
  • Real-world validation: on SWE-bench Verified, Claude agents can solve real GitHub issues from the pull request description alone, but human review remains essential
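
A minimal Python sketch of the first three workflow patterns, written for this note rather than taken from the guide. `LLM` is a hypothetical stand-in for whatever function sends a prompt to a model and returns its text; `chain`, `route`, and `parallel` are illustrative names, not part of any SDK.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical model interface: prompt text in, completion text out.
LLM = Callable[[str], str]


def chain(llm: LLM, text: str, prompts: List[str]) -> str:
    """Prompt chaining: each call's output becomes the next call's input."""
    result = text
    for prompt in prompts:
        result = llm(f"{prompt}\n\n{result}")
    return result


def route(
    llm: LLM,
    text: str,
    handlers: Dict[str, Callable[[str], str]],
    default: str,
) -> str:
    """Routing: one call classifies the input, then a specialist handles it."""
    label = llm(f"Classify as one of {sorted(handlers)}:\n\n{text}").strip().lower()
    # Fall back to the default handler if the model returns an unknown label.
    return handlers.get(label, handlers[default])(text)


def parallel(llm: LLM, prompts: List[str]) -> List[str]:
    """Parallelization: run independent subtasks concurrently, in input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(llm, prompts))
```

Each pattern is a few lines of ordinary code around plain model calls, so prompts and responses stay visible, which is the concern behind the frameworks caveat.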

Insights

The “optimize single LLM calls first” advice runs against the instinct to build multi-step systems. The insight is that a single call augmented with retrieval and in-context examples often matches agentic performance on well-scoped tasks, at a fraction of the latency and cost.
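
A rough sketch of what an augmented single call can look like, assuming a toy keyword-overlap retriever in place of a real search index. `retrieve` and `build_prompt` are hypothetical helpers made up for this note; the resulting string would be sent to the model in one call.

```python
from typing import List, Tuple


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Toy retrieval: rank documents by how many query words they share."""
    words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, docs: List[str], examples: List[Tuple[str, str]]) -> str:
    """Assemble retrieved context and in-context examples into one prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"Context:\n{context}\n\nExamples:\n{shots}\n\nQ: {query}\nA:"
```

Everything the model needs arrives in one request, so there is no loop to debug and only one round trip to pay for.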

The absolute filepaths example from SWE-bench is a vivid illustration of ACI (agent-computer interface) engineering: the model made mistakes with relative paths after changing directories, so the tool schema was changed to require absolute paths — eliminating an entire class of agent errors through interface design rather than prompting.
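
The same idea can be sketched in Python. This is an illustration under assumptions, not the tool used in the SWE-bench work: `READ_FILE_TOOL`, `require_absolute`, and `read_file` are hypothetical names, and the definition is only shaped like a JSON Schema style tool description.

```python
from pathlib import Path

# Hypothetical tool definition in a JSON Schema style. The field names are
# illustrative and not tied to a specific SDK.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": (
        "Read a UTF-8 text file. `path` MUST be absolute, "
        "for example /repo/src/app.py, never src/app.py."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Absolute file path."}
        },
        "required": ["path"],
    },
}


def require_absolute(path: str) -> Path:
    """Reject relative paths so the result never depends on the current directory."""
    p = Path(path)
    if not p.is_absolute():
        raise ValueError(f"path must be absolute, got {path!r}")
    return p


def read_file(path: str) -> str:
    """Tool implementation: validate first, then read."""
    return require_absolute(path).read_text(encoding="utf-8")
```

The description tells the model what is expected and the validation enforces it, so a mistaken relative path produces a clear error instead of silently reading the wrong file.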

The evaluator-optimizer pattern resembles LLM-based test-driven development: generate code, run tests, feed failures back to the generator. The “two signs of good fit” heuristic (LLM responses improve when given feedback, and the LLM can provide that feedback itself) is a practical test for whether the loop adds value.
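
A minimal sketch of that loop, assuming the generator and the critic are passed in as plain callables. The `Generator` and `Evaluator` signatures and the `max_rounds` cap are assumptions made for illustration; in practice each callable would wrap a model call or a test run.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces for the two roles.
Generator = Callable[[str, Optional[str]], str]     # (task, feedback) -> draft
Evaluator = Callable[[str, str], Tuple[bool, str]]  # (task, draft) -> (accepted, feedback)


def evaluator_optimizer(
    task: str,
    generate: Generator,
    evaluate: Evaluator,
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Regenerate until the evaluator accepts the draft or rounds run out."""
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        accepted, feedback = evaluate(task, draft)
        if accepted:
            return draft, True
    # Return the last draft along with a flag saying it was never accepted.
    return draft, False
```

The round cap is a design choice: it bounds cost and latency when the critic never accepts, and the returned flag lets the caller escalate to a human.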

Connections

Raw Excerpt

The most successful implementations weren’t using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns. You should consider adding complexity only when it demonstrably improves outcomes.