This note was generated by AI analysis
Created: 2024-12-17
Summary
EN: PromptWizard is Microsoft’s automated prompt optimization system that uses a two-stage feedback loop: first refining the instruction itself, then jointly optimizing the instruction with few-shot examples. It dramatically reduces the number of LLM API calls needed compared to prior approaches (69 vs 18,600 for PromptBreeder) while achieving competitive or better performance on MMLU, GSM8k, and other benchmarks.
ZH: PromptWizard 是微軟推出的自動化提示優化系統,採用兩階段反饋迴路:先優化指令本身,再聯合優化指令與少樣本示例。相比 PromptBreeder 的 18,600 次 API 呼叫,它僅需 69 次,同時在 MMLU、GSM8k 等基準測試上達到相當或更優的表現。
Key Points
- Two-stage optimization: Stage 1 refines the instruction text using self-critique; Stage 2 jointly optimizes instruction + few-shot examples
- Efficiency: ~69 API calls vs 18,600 for PromptBreeder, roughly a 270x reduction (18,600 / 69 ≈ 269.6)
- Performance: Competitive or better results on MMLU, GSM8k, and other benchmarks
- Uses the target LLM itself as the optimizer (self-evolution via feedback)
- No gradient access required — pure black-box optimization via generation + evaluation
- Microsoft Research project; designed for practical deployment at scale
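The two-stage, black-box loop described in the points above can be sketched as follows. This is a minimal illustration under stated assumptions, not PromptWizard's actual code or API: `two_stage_optimize`, `toy_mutate`, `toy_refine`, and `toy_score` are all hypothetical names, and the toy functions stand in for what would be LLM calls (candidate generation, self-critique, and evaluation on a benchmark batch) in a real system.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # hypothetical (question, answer) pair


def two_stage_optimize(
    instruction: str,
    examples: List[Example],
    mutate: Callable[[str], List[str]],
    refine: Callable[[str], str],
    score: Callable[[str, List[Example]], int],
    rounds: int = 3,
    shots: int = 2,
    seed: int = 0,
) -> Tuple[str, List[Example], int]:
    """Illustrative loop: optimize the instruction first, then instruction + examples."""
    rng = random.Random(seed)

    # Stage 1: refine the instruction alone (no few-shot examples yet).
    best_instr = instruction
    best_score = score(best_instr, [])
    for _ in range(rounds):
        for candidate in mutate(best_instr):
            candidate = refine(candidate)  # stand-in for a self-critique step
            s = score(candidate, [])
            if s > best_score:
                best_instr, best_score = candidate, s

    # Stage 2: score instruction and examples together so interactions count.
    best_shots: List[Example] = []
    for _ in range(rounds):
        cand_shots = rng.sample(examples, k=min(shots, len(examples)))
        for candidate in [best_instr] + mutate(best_instr):
            s = score(candidate, cand_shots)
            if s > best_score:
                best_instr, best_shots, best_score = candidate, cand_shots, s
    return best_instr, best_shots, best_score


# Toy stand-ins (assumptions): a real system would call an LLM here.
def toy_mutate(instr: str) -> List[str]:
    return [instr + " Think step by step.", instr + " Follow the examples."]


def toy_refine(instr: str) -> str:
    return " ".join(instr.split())  # only normalizes whitespace


def toy_score(instr: str, shots: List[Example]) -> int:
    s = 0
    if "step by step" in instr:
        s += 3
    if shots:
        s += 2
    if shots and "examples" in instr:
        s += 3  # bonus only when instruction and examples work together
    return s


if __name__ == "__main__":
    data = [("1+1", "2"), ("2+3", "5"), ("4*2", "8")]
    print(two_stage_optimize("Solve the math problem.", data,
                             toy_mutate, toy_refine, toy_score))
```

Only generation and scoring are used, so no gradient access is needed. In the toy metric, the "Follow the examples." variant gains nothing in Stage 1 and only pays off once examples are present in Stage 2, which is the kind of interaction a joint search can pick up.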
Insights
- The dramatic reduction in API calls makes automated prompt optimization economically viable: 18,600 calls at GPT-4 prices are prohibitive for a single optimization run, while 69 calls are affordable
- Self-evolution (using the model to critique and improve its own prompts) is powerful but carries the risk of optimizing for the model’s own blind spots
- The joint optimization of instruction + examples is the key insight — treating them as separate levers misses their interaction effects
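The cost argument in the first insight can be made concrete with a little arithmetic. The call counts come from this note; the per-call price below is a placeholder assumption chosen only for illustration, not an actual GPT-4 price, and the helper names are hypothetical.

```python
PROMPTWIZARD_CALLS = 69       # from the note
PROMPTBREEDER_CALLS = 18_600  # from the note
ASSUMED_USD_PER_CALL = 0.03   # hypothetical placeholder, not a real price


def reduction_factor(baseline_calls: int, new_calls: int) -> float:
    """How many times fewer calls the new method needs."""
    return baseline_calls / new_calls


def total_cost(calls: int, usd_per_call: float) -> float:
    """Total spend for one optimization run at a flat per-call price."""
    return calls * usd_per_call


if __name__ == "__main__":
    factor = reduction_factor(PROMPTBREEDER_CALLS, PROMPTWIZARD_CALLS)
    print(f"Reduction factor: about {factor:.0f}x")
    for name, calls in [("PromptWizard", PROMPTWIZARD_CALLS),
                        ("PromptBreeder", PROMPTBREEDER_CALLS)]:
        cost = total_cost(calls, ASSUMED_USD_PER_CALL)
        print(f"{name}: {calls} calls, ${cost:.2f} at the assumed price")
```

Whatever the real per-call price is, the ratio between the two totals stays the same, so the roughly 270x gap carries over directly to cost.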
Connections
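- Connects to the Claude prompt library: it collects manual techniques of the kind PromptWizard automates
- Relates to DSPy (same vault): DSPy also automates optimization of LLM pipelines, but it does so by compiling declarative programs with optimizers that bootstrap demonstrations and search over prompts, rather than through a self-critique loop on a single prompt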
- The efficiency angle echoes SkillsBench: focused, well-crafted context dramatically outperforms exhaustive but unfocused approaches
Raw Excerpt
“PromptWizard uses just 69 LLM API calls to converge on optimized prompts, compared to 18,600 for PromptBreeder — making automated prompt optimization practical rather than academic. The key is treating instruction and examples as a joint optimization target, not independent variables.”