When comparing GPUs for Stable Diffusion, the headline metric is iterations per second (it/s) — how many denoising steps the GPU completes each second. Higher is faster, but the number only becomes meaningful when you translate it into actual workflow time, and there's a bigger caveat: VRAM tier often matters more than raw it/s, because it decides whether you can run a model at all. This article explains the it/s math and why VRAM is the first thing to check.
It pairs with our Stable Diffusion local build guide and the broader MLPerf explainer, and builds on our guide to how much VRAM you need.
The it/s Math
An image is generated over a set number of denoising steps (iterations). If a GPU does 20 it/s and you run 30 steps, that image's denoising takes roughly 30 ÷ 20 = 1.5 seconds. Double the it/s and you halve that time. So it/s translates directly to per-image denoising time, which is useful when you generate many images and want to estimate throughput: at 1.5 seconds each, 100 images take about 150 seconds of denoising.
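As a minimal sketch of this arithmetic, the Python helper below converts it/s into denoising seconds per image and across a run. The function names and the 100-image run are illustrative assumptions, and the result covers only the denoising steps, not model loading or image decoding.

```python
def denoise_seconds(steps: int, its_per_sec: float) -> float:
    """Approximate denoising time for one image: steps / (it/s)."""
    if its_per_sec <= 0:
        raise ValueError("its_per_sec must be positive")
    return steps / its_per_sec


def run_seconds(images: int, steps: int, its_per_sec: float) -> float:
    """Approximate denoising time for images generated one at a time."""
    return images * denoise_seconds(steps, its_per_sec)


if __name__ == "__main__":
    # The example from the text: 30 steps at 20 it/s.
    print(denoise_seconds(30, 20))    # 1.5
    # Doubling it/s halves the time.
    print(denoise_seconds(30, 40))    # 0.75
    # Hypothetical run of 100 images at the same settings.
    print(run_seconds(100, 30, 20))   # 150.0
```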
Why VRAM Often Matters More
Here's the caveat that overrides raw speed: VRAM determines what you can run at all. Modern models (SDXL, Flux), higher resolutions, larger batches, and extra features (ControlNet, upscaling) all consume VRAM. If a model and its workflow don't fit in your card's VRAM, you either hit slow fallbacks (offloading to system RAM) or can't run it at all, and at that point a high it/s on paper is irrelevant. A card with enough VRAM at moderate it/s beats a faster card that can't fit your model. This is why VRAM tier is the first thing to check, as our SD build guide also recommends.
How to Read SD Benchmarks
- Confirm the model fits first: check the card has enough VRAM for the models and workflows you'll run (SDXL/Flux at your resolution, with your extensions) before comparing speed.
- Then compare it/s for throughput: among cards that fit your workflow, higher it/s means faster batches — translate it to per-image seconds for your step count.
- Match the benchmark to your model: it/s for SD 1.5 differs from SDXL or Flux, so compare like-for-like on the model you actually use.
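The three checks above can be sketched as a filter followed by a sort. Everything in this example is hypothetical: the card names, VRAM sizes, it/s figures, and the 12 GB requirement are placeholders rather than benchmark results, so substitute numbers measured on the model you actually run.

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    vram_gb: float
    its_per_sec: float  # must be measured on the same model for every card


def rank_cards(cards, required_vram_gb, steps):
    """Drop cards that cannot fit the workflow, then sort the rest by speed.

    Returns (name, denoising seconds per image) pairs, fastest first.
    """
    fitting = [c for c in cards if c.vram_gb >= required_vram_gb]
    fitting.sort(key=lambda c: c.its_per_sec, reverse=True)
    return [(c.name, steps / c.its_per_sec) for c in fitting]


if __name__ == "__main__":
    # Placeholder figures for illustration only, not real benchmarks.
    cards = [
        Card("Card A", vram_gb=8, its_per_sec=25.0),
        Card("Card B", vram_gb=16, its_per_sec=15.0),
        Card("Card C", vram_gb=24, its_per_sec=20.0),
    ]
    for name, seconds in rank_cards(cards, required_vram_gb=12, steps=30):
        print(f"{name}: {seconds:.2f} s per image")
```

In this made-up data, Card A has the highest it/s but is dropped because it cannot fit the assumed 12 GB workflow, which is the reason for checking VRAM before speed.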
Frequently Asked Questions
What does iterations-per-second mean in Stable Diffusion?
It's how many denoising steps the GPU completes per second. An image runs for a set number of steps, so per-image denoising time is roughly steps ÷ it/s (e.g. 30 steps at 20 it/s ≈ 1.5 seconds). Higher it/s means faster generation across a batch.
Why does VRAM matter more than it/s?
Because VRAM decides whether you can run a model at all. SDXL, Flux, higher resolutions, larger batches, and extensions all consume VRAM; if they don't fit, you hit slow fallbacks or can't run them, which makes a high it/s irrelevant. A card with enough VRAM at moderate speed beats a fast card that can't fit your model.
How should I compare GPUs for Stable Diffusion?
First confirm the card has enough VRAM for the models and workflows you'll run, then compare it/s among the cards that fit, translating it/s into per-image seconds for your step count. Always compare on the same model (SD 1.5 vs SDXL vs Flux differ).
The One Thing to Remember
Stable Diffusion it/s translates to per-image time (steps ÷ it/s), so higher it/s speeds up batches — but VRAM tier comes first, because it decides whether a model runs at all. A card with enough VRAM at moderate it/s beats a faster card that can't fit SDXL or Flux. Confirm the model fits your VRAM, then compare it/s like-for-like on the model you actually use.
Building for Stable Diffusion? Configure an AI workstation online or talk to our team, and we'll prioritise the VRAM your models need, then the speed.