PromptFrenzy Benchmark Report

The State of AI Image Models, Mid-2026

We run the same prompts through every major AI image model and publish the results side by side. One number jumped out this quarter — and it's about the model you'd least expect.

Data as of 25 June 2026.

The model everyone reaches for is the slowest one by far

OpenAI's GPT Image 2 takes ~68 seconds to produce an image, roughly 5× as long as Google's Nano Banana 2 (14s) or xAI's Grok Imagine Pro (13s). Each figure is an average over 13 to 15 runs of identical benchmark prompts, timed from request to finished image.

For the most-recognised name in AI image generation, that's a striking gap. If you're generating at any volume, the default choice is the one that keeps you waiting.

Time-to-image, by model

Model                          Avg / image   Samples
Flux Kontext Pro (prelim.)     7.0s          1
Grok Imagine Pro               13.3s         13
Nano Banana 2 (Gemini)         14.2s         15
Qwen Image 2.0 Pro (prelim.)   17.0s         2
Nano Banana Pro (prelim.)      25.1s         1
FLUX.2 Max (prelim.)           33.8s         1
Seedream 5 Lite (prelim.)      44.2s         1
ChatGPT (GPT Image 2)          68.5s         15

Rows with 13 to 15 runs have a robust sample. “Prelim.” rows are early reads from one or two runs that we're still extending, but the gap between the well-sampled models is clear and consistent. Method: wall-clock time from request to finished image through PromptFrenzy's pipeline, identical prompts, one shot per run. That is what a user actually waits, not a vendor lab figure.
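For readers who want to check the headline ratio themselves, here is a minimal Python sketch that recomputes it from the averages in the table. The figures are copied from the table; the 13-run cutoff for a "robust" row and the function names are our own assumptions for illustration, not part of PromptFrenzy's pipeline.

```python
# Averages (seconds) and sample counts copied from the table above.
RESULTS = {
    "Flux Kontext Pro": (7.0, 1),
    "Grok Imagine Pro": (13.3, 13),
    "Nano Banana 2 (Gemini)": (14.2, 15),
    "Qwen Image 2.0 Pro": (17.0, 2),
    "Nano Banana Pro": (25.1, 1),
    "FLUX.2 Max": (33.8, 1),
    "Seedream 5 Lite": (44.2, 1),
    "ChatGPT (GPT Image 2)": (68.5, 15),
}

# Assumption: a row counts as well sampled at 13 runs or more,
# the lower end of the "13 to 15 runs" range described in the text.
MIN_ROBUST_SAMPLES = 13


def robust(results, min_samples=MIN_ROBUST_SAMPLES):
    """Keep only models whose sample count meets the cutoff."""
    return {model: avg for model, (avg, n) in results.items() if n >= min_samples}


def slowdown_vs_fastest(averages):
    """Express each average as a multiple of the fastest average."""
    fastest = min(averages.values())
    return {model: round(avg / fastest, 2) for model, avg in averages.items()}


well_sampled = robust(RESULTS)
ratios = slowdown_vs_fastest(well_sampled)

for model, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{model}: {ratio}x the fastest well-sampled average")
```

Comparing only the well-sampled rows keeps single-run results, such as the 7.0s preliminary figure, from setting the baseline.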

The field is moving faster than the models render

Speed matters more this year because the roster keeps growing. Of the AI image and video models we track with a known launch date, a third shipped in the first half of 2026 alone — Google, OpenAI, xAI, Black Forest Labs and ByteDance all pushed new models in the last six months.

See the full release timeline →

Same prompt, every model

The most useful thing we publish isn't a number — it's the side-by-side. We take one brief and run it through all eight models, one shot each, no cherry-picking. Across 17 benchmark prompts and 87 generations, the differences in interpretation, text rendering, and style are stark, and rarely match each model's reputation.

See the head-to-heads →

What it means

  • Speed is a real differentiator, not a footnote. A ~5× gap is the difference between an interactive tool and a “kick it off and come back later” one.
  • Newer isn’t automatically faster. Some of the latest releases sit mid-pack; the fastest well-sampled models aren’t the newest.
  • There is no single “best.” The right model depends on the task — which is the whole reason to compare on your own prompt rather than trust a leaderboard.

Methodology & how to cite

Figures come from PromptFrenzy's benchmark pipeline: identical prompts run through each model, one generation per run, with latency measured end-to-end from request to finished image. Speed therefore reflects each model as served through our pipeline, not vendor-direct lab numbers. Sample sizes are shown in the table; “prelim.” rows have only one or two runs so far and are being extended.
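To reproduce this kind of measurement on your own prompts, a rough sketch in Python might look like the following. It is not PromptFrenzy's actual pipeline code: `generate` is a hypothetical stand-in for whatever blocking call returns a finished image from your provider, and `time_generations` is a name we made up for this example.

```python
import statistics
import time
from typing import Callable, Iterable


def time_generations(generate: Callable[[str], object], prompts: Iterable[str]) -> dict:
    """Time one generation per prompt and return the sample count and average.

    `generate` is a hypothetical callable that blocks until the finished
    image is available, so the timing covers request to finished image.
    """
    durations = []
    for prompt in prompts:
        start = time.perf_counter()  # monotonic clock suited to measuring intervals
        generate(prompt)             # one shot per prompt, no retries
        durations.append(time.perf_counter() - start)

    if not durations:
        raise ValueError("at least one prompt is required")

    return {
        "samples": len(durations),
        "avg_seconds": statistics.mean(durations),
    }


if __name__ == "__main__":
    # Stand-in generator for demonstration only; replace with a real client call.
    def fake_generate(prompt: str) -> bytes:
        time.sleep(0.05)
        return b""

    print(time_generations(fake_generate, ["a red bicycle", "a lighthouse at dusk"]))
```

Because the timer wraps the whole call, the result includes network and queueing time as well as rendering, which is the same "what a user actually waits" framing used in this report.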

Please cite as Source: PromptFrenzy — promptfrenzy.com. For a custom comparison, the underlying data, or a quote, contact maria@promptfrenzy.com. More for journalists on our press & data page.