Summary

Anthropic engineers share best practices for designing tools that AI agents can use effectively via MCP (Model Context Protocol). The core shift is to treat tools as contracts between deterministic systems and non-deterministic agents, which calls for a different design mindset than traditional APIs. The article covers a workflow of prototyping, evaluation, and iterative improvement in which agents themselves generate evaluation tasks and analyze the results.

Key Points

  • Tools are contracts between deterministic systems and non-deterministic agents — agents may call, skip, or misuse tools unpredictably
  • Start with a quick prototype, wrap it in a local MCP server, and test manually before running formal evaluations
  • Generate evaluation tasks using agents: prompt-response pairs grounded in realistic data rather than oversimplified sandbox environments
  • Strong eval tasks require multiple tool calls (potentially dozens); weak tasks are single-step or trivial
  • Avoid overly strict verifiers that reject correct responses due to formatting differences
  • Tools ergonomic for agents also tend to be intuitive for humans
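
The points about evaluation tasks and verifiers can be sketched in a few lines of Python. This is a minimal illustration, not the method from the article: the `EvalTask` schema, the two-call threshold for a "strong" task, and the normalization rules in `lenient_match` are all assumptions made for this note.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EvalTask:
    """One prompt-response pair for evaluating a tool set (hypothetical schema)."""
    prompt: str
    expected_response: str
    # Names of the tools a good solution is expected to call, in order.
    expected_tool_calls: list[str] = field(default_factory=list)

    def is_multi_step(self, min_calls: int = 2) -> bool:
        # Single-step tasks count as weak; the threshold of 2 is an assumption.
        return len(self.expected_tool_calls) >= min_calls


def normalize(text: str) -> str:
    """Lowercase, drop thousands separators and most punctuation, collapse spaces."""
    text = text.lower().strip()
    text = re.sub(r"(?<=\d),(?=\d)", "", text)  # "1,200" becomes "1200"
    text = re.sub(r"[^\w\s.]", " ", text)       # "$", quotes, etc. become spaces
    text = re.sub(r"\s+", " ", text)
    return text.strip().rstrip(".")


def lenient_match(expected: str, actual: str) -> bool:
    """Accept a response that contains the expected answer as whole words,
    ignoring case, currency symbols, and thousands separators."""
    pattern = r"(?<!\w)" + re.escape(normalize(expected)) + r"(?!\w)"
    return re.search(pattern, normalize(actual)) is not None
```

With this sketch, an expected answer of "1,200 USD" is accepted when the agent replies "The total is $1200 USD.", while an expected answer of "200 USD" is rejected for the same reply because the match must fall on word boundaries. Decimal values and unit conversions are not handled; a real verifier would need rules suited to its own data.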

Insights

The observation that “tools most ergonomic for agents also tend to be surprisingly intuitive for humans” suggests that agent-friendliness and human-friendliness converge — good tool design is about clarity of purpose and interface, regardless of who (or what) is using the tool. The use of agents to generate their own evaluation datasets is a self-referential but practical approach to bootstrapping evaluation at scale.

Connections

Raw Excerpt

Tools are a new kind of software which reflects a contract between deterministic systems and non-deterministic agents… This means fundamentally rethinking our approach when writing software for agents.