This article was generated by AI analysis
Created: 2026-03-26 Source: https://x.com/itsolelehmann/status/2033919415771713715
Summary
Applying Karpathy’s “autoresearch” method (iterative AI-driven self-improvement via small changes tested against a scoring checklist) to Claude skill prompts. One landing page copy skill went from a 56% to a 92% pass rate with zero manual work. The method: define what “good” looks like as a yes/no checklist, then let an agent loop through test → score → keep/revert until quality converges.
Key Points
- Karpathy’s autoresearch: instead of making improvements by hand, let an AI agent make them in a loop
- Each iteration: try one small change → test → score → keep if better, revert if worse
- The only human input: a yes/no checklist defining what “good output” means
- Applied to Claude skills (prompt files): agent tests the skill, scores output, modifies the skill prompt
- Result: landing page skill 56% → 92% pass rate, zero manual work
- Works on “anything you can measure and improve”
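The keep/revert cycle described in the bullets above can be sketched as a greedy loop. This is a minimal illustration, not the original author's implementation: `autoresearch_loop`, `toy_mutate`, `toy_score`, and the rule strings are all hypothetical names. In a real setup the mutation step would presumably be an agent editing the skill prompt, and the score would come from running the skill and grading its output against the checklist. Here both are replaced by deterministic stand-ins so the example runs on its own.

```python
import itertools
from typing import Callable, List, Tuple


def autoresearch_loop(
    prompt: str,
    mutate: Callable[[str], str],
    score: Callable[[str], float],
    iterations: int,
) -> Tuple[str, float, List[float]]:
    """Greedy loop: try one small change, keep it only if the score improves."""
    best_prompt, best_score = prompt, score(prompt)
    history = [best_score]
    for _ in range(iterations):
        candidate = mutate(best_prompt)      # try one small change
        candidate_score = score(candidate)   # test and score it
        if candidate_score > best_score:     # keep if better
            best_prompt, best_score = candidate, candidate_score
        # otherwise revert: best_prompt is simply left unchanged
        history.append(best_score)
    return best_prompt, best_score, history


# Hypothetical stand-ins for the agent and the checklist grader.
BAD_RULE = "Use as much jargon as possible."
WANTED_RULES = [
    "State the main benefit in the headline.",
    "Include exactly one call to action.",
    "Keep sentences under 20 words.",
]
CANDIDATE_RULES = [WANTED_RULES[0], BAD_RULE, WANTED_RULES[1], WANTED_RULES[2]]
_rule_cycle = itertools.cycle(CANDIDATE_RULES)


def toy_mutate(prompt: str) -> str:
    """Append the next candidate rule (a real agent would propose the edit)."""
    return prompt + "\n" + next(_rule_cycle)


def toy_score(prompt: str) -> float:
    """Fraction of wanted rules present; any bad rule zeroes the score."""
    lines = prompt.splitlines()
    if BAD_RULE in lines:
        return 0.0
    return sum(rule in lines for rule in WANTED_RULES) / len(WANTED_RULES)


final_prompt, final_score, history = autoresearch_loop(
    "Write landing page copy.", toy_mutate, toy_score, iterations=8
)
print(f"{history[0]:.0%} -> {final_score:.0%}")
```

Because a change is kept only on a strict improvement, the recorded score never decreases, and harmful or redundant edits are discarded automatically.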
Insights
This is loosely analogous to gradient descent for prompts. It is not gradient-based (it is closer to greedy hill climbing), but the structure is similar: small perturbations are evaluated against a loss function (the checklist), and only the steps that reduce the loss are kept. The critical insight is that defining how quality is scored is the only hard intellectual work; the iteration is fully automatable. This could be applied to any system prompt, any eval suite, or any workflow where “good” can be defined as a set of observable criteria.
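To make the "checklist as loss function" idea concrete, here is a small sketch of how a yes/no checklist could be turned into a number. It rests on assumptions not stated in the source: the three checks, the sample outputs, and the function names are hypothetical, and "pass rate" is taken to mean the fraction of outputs that pass every check. In practice the checks might be judged by a model rather than by simple string tests.

```python
from typing import Callable, Dict, List

Check = Callable[[str], bool]

# Hypothetical yes/no checks for landing page copy.
CHECKLIST: Dict[str, Check] = {
    "has a call to action": lambda text: "sign up" in text.lower(),
    "is at most 40 words": lambda text: len(text.split()) <= 40,
    "has no exclamation marks": lambda text: "!" not in text,
}


def pass_rate(outputs: List[str], checklist: Dict[str, Check]) -> float:
    """Fraction of outputs that pass every check (assumed definition)."""
    if not outputs:
        return 0.0
    passed = sum(
        all(check(text) for check in checklist.values()) for text in outputs
    )
    return passed / len(outputs)


def loss(outputs: List[str], checklist: Dict[str, Check]) -> float:
    """The quantity the loop tries to reduce: 1 minus the pass rate."""
    return 1.0 - pass_rate(outputs, checklist)


# Hypothetical outputs from four runs of the skill.
SAMPLES = [
    "Ship faster with less busywork. Sign up today.",  # passes all checks
    "The best tool ever! Sign up now!",                # fails: exclamation marks
    "Our platform helps teams plan their work.",       # fails: no call to action
    "Plan your week in minutes. Sign up free.",        # passes all checks
]

print(f"pass rate: {pass_rate(SAMPLES, CHECKLIST):.0%}")
```

Each check is a plain predicate, so adding or tightening a criterion changes the score without touching the loop that consumes it.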
Connections
Raw Excerpt
My landing page copy skill went from passing its quality checks 56% of the time to 92%. With zero manual work at all. The agent just kept testing and tightening the prompt on its own.