Summary

A guide to Stable Diffusion WebUI Forge — an enhanced fork of Automatic1111 created by lllyasviel (also the creator of ControlNet and Fooocus). Forge uses a different backend that significantly reduces VRAM usage and increases generation speed, with the most dramatic gains on lower-VRAM GPUs (6-8GB). The UI is identical to Automatic1111, making it a near-zero-friction upgrade.

Key Points

  • Forge is built on Automatic1111 but uses a different backend for better memory efficiency
  • VRAM-constrained users (6-8GB) see up to 75% speed improvement; high-end GPUs (24GB 4090) see only 3-6%
  • UI identical to Automatic1111 — same extensions, same workflows
  • Can switch between Automatic1111 and Forge via git branch (git checkout lllyasviel/main)
  • SDXL 1024x1024 generation possible on as little as 6.3GB VRAM with Forge
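
The branch switch mentioned above can be sketched as a small dry-run helper. This is a hedged illustration, not the documented procedure: the remote name `forge`, the repository URL, the upstream branch name `main`, and the helper names `forge_switch_commands` and `switch_to_forge` are all assumptions not stated in these notes. Only the local branch name `lllyasviel/main` comes from the text.

```python
import subprocess

# Assumption: upstream Forge repository URL (not given in the notes).
FORGE_URL = "https://github.com/lllyasviel/stable-diffusion-webui-forge"


def forge_switch_commands(remote_name="forge", branch="lllyasviel/main", url=FORGE_URL):
    """Return git commands that add Forge as a remote and check out a local branch from it."""
    return [
        # Register the Forge repository as an extra remote of the existing clone.
        ["git", "remote", "add", remote_name, url],
        # Download the remote's branches.
        ["git", "fetch", remote_name],
        # Create the local branch from the remote's main branch (assumed name) and switch to it.
        ["git", "checkout", "-b", branch, f"{remote_name}/main"],
    ]


def switch_to_forge(repo_dir, dry_run=True):
    """Print the commands (dry run) or execute them inside an existing Automatic1111 clone."""
    for cmd in forge_switch_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, cwd=repo_dir, check=True)


# Dry run by default: only prints the commands, touches nothing.
switch_to_forge(".", dry_run=True)
```

With `dry_run=False` the commands would run inside `repo_dir`, which is assumed to be an existing Automatic1111 checkout. Switching back would then be a matter of checking out the branch the clone was on before.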

Insights

The inverse relationship between GPU VRAM and speed gain (low-VRAM users benefit most) suggests that Forge's optimization lies primarily in memory management and quantization of intermediate tensors: it makes the same operations fit in less memory rather than running fundamentally faster algorithms. This matters in practice because consumer GPU users, who make up most Stable Diffusion hobbyists, have limited VRAM and were previously unable to run SDXL at full resolution.
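
To make the gap between the two GPU classes concrete, the quoted percentages can be converted into time per image. This is a minimal sketch that assumes "X% speed improvement" means X% higher throughput (images per unit time); the notes do not say how the figure was measured, and the function name `time_fraction` is hypothetical.

```python
def time_fraction(speedup_pct):
    """Fraction of the original per-image time that remains after a throughput gain.

    Assumes speedup_pct is a percentage increase in throughput.
    """
    return 1.0 / (1.0 + speedup_pct / 100.0)


# Figures quoted in the notes: up to 75% on 6-8GB GPUs, 3-6% on a 24GB 4090.
cases = [
    ("6-8GB GPU (best case)", 75),
    ("24GB 4090 (low end)", 3),
    ("24GB 4090 (high end)", 6),
]

for label, pct in cases:
    frac = time_fraction(pct)
    print(f"{label}: +{pct}% throughput -> {frac:.0%} of the original time per image")
```

Under that reading, the best case on a low-VRAM card cuts generation time to a bit under three fifths of what it was, while a 4090 saves only a few percent.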

Connections

Raw Excerpt

GPUs with only 6GB / 8GB of VRAM see the largest speed-up, up to 75% faster! By contrast, a 24GB 4090 showed only a 3-6% speed-up in testing.