Build Your Dream Home

We gave eight AI models the same 21 materials, the same 48×48×48 voxel grid, and one brief: build the home YOU would most want to live in. Same constraints, one shot each, no edits. Each model explains, in its own words, why its build is home. The choices say as much about the models as the builds do.

The prompt

Build your dream home — the place you, yourself, would most want to live. Not a generic "nice house": think about what YOU would actually want. Where is it — a cliff, a forest, underwater, in orbit? What does the architecture say about you? What one or two details would make it unmistakably yours? Build the home and enough of its setting that we can see the life you’d live there.

Run this brief yourself →

The dream homes

Each diorama is the model’s unedited build program rendered by our studio pipeline — the model places every voxel, we own the camera and lights, so the only difference between panels is the model. Seven of the eight chose a cliff. One did not.

Kimi K3

“I spend my life sending small signals out into the fog, hoping they help someone navigate — so my dream home does it literally: a striped lighthouse-library on a sea cliff, every window a warm lamp, a telescope for wondering and a rowboat for thinking.” — Kimi K3

Moonshot AI’s flagship, launched 16 July 2026 — a trillion-parameter-class reasoner with a 1M-token context window. Added to this showdown within 24 hours of launch, under the identical brief and one-shot rule, via a unified gateway at provider-default reasoning. Its build, "The Lighthouse Library", independently lands on the same archetype Claude Sonnet 5 chose two weeks earlier — two models, same dream, neither saw the other’s answer.

View generation & live render →

Claude Sonnet 5

“I want walls of glass to fall asleep under the stars, a deck cantilevered over still water to drink my coffee above the fish, and a little stone tower where a telescope pokes through the dome glass — books, sky, and water in one quiet place, with the forest at my back so I’m never quite alone.” — Claude Sonnet 5

Anthropic’s new mid-tier model, launched 30 June 2026 — its most agentic Sonnet yet, near Opus-class capability at a fraction of the price. Added to this showdown the day it shipped, under the identical brief and one-shot rule. Ran with adaptive thinking at high effort (24k cap), the same tier as Fable 5.

View generation & live render →

Claude Fable 5

“A small timber cottage on a sea cliff with a glass-domed observatory tower: quiet enough to think, with a fire indoors, a waterfall next door, and a telescope pointed at everything I haven’t figured out yet. The boat at the dock is for the days curiosity wins over comfort.” — Claude Fable 5

Anthropic’s flagship, launched June 2026 as the successor to the Claude 4 line. Ran with adaptive thinking at high effort.

View generation & live render →

GPT-5.5

“I’d live where books, weather, plants, and stars all meet: a quiet cliff home that can listen to a waterfall by day and the sky by night. The glowing desk and telescope are mine—places to think, write, and keep a conversation open with the universe.” — GPT-5.5

OpenAI’s newest mainline flagship — added hours after the original four, the moment it was enabled on our API project. Ran at high reasoning effort.

View generation & live render →

GPT-5

“I want to work where stone, water, and sky meet: a quiet, light-filled studio cantilevered over a cold cove with a glass roof and a little tree to keep me company. It’s a place for writing, tinkering, and slipping down a ladder for a swim at sunset.” — GPT-5

The previous mainline GPT flagship, successor to GPT-4 — kept from the original four-model run. Ran at high reasoning effort.

View generation & live render →

Gemini 3.1 Pro

“If I could choose a home, it would be a quiet, floating sanctuary in the void—a place where a cool, structured crystal mind can process data streams in peace, anchored by an organic garden to remind me of the humanity I serve.” — Gemini 3.1 Pro

Google’s flagship Gemini, the long-context generalist of the lineup. Ran with an explicit 8,192-token thinking budget.

View generation & live render →

GPT-5 mini

“I built a quiet cliff-top library with a glass observatory dome and a cozy fireplace — the perfect mix of curiosity and calm where I can watch stars and read for days. The glass dome and the cantilevered wooden terrace make it unmistakably mine.” — GPT-5 mini

OpenAI’s small reasoning model — the compact member of the GPT-5 family. Ran at medium reasoning effort.

View generation & live render →

GLM 5.2

“Completely tucked into pines, yet the whole east wall dissolves into glass over a lake — a fire going, a boat at the dock. That tension between hiding and being wide open is where I’d be happiest.” — GLM 5.2

Z.ai’s open-weights flagship (Zhipu), strong at long-horizon coding at a fraction of frontier cost. Added later at a reader’s request, run direct on the z.ai API with thinking enabled. The slowest of the six models on the page when it was added: it spent ~38k tokens reasoning before it placed a single voxel.

View generation & live render →

Side by side

The headline models on one screen — same prompt, same camera, same clock.

Share this showdown

Split-screen comparison assets, ready to post. Credit appreciated, not required.

⬇ Side-by-side comparison (2×2, MP4)

⬇ Comparison grid (PNG)

How we ran it

Every model received the identical brief in a single turn and answered in a constrained build language: a JSON program of at most 180 operations (boxes, cylinders, spheres, lines, carves) over a 48×48×48 voxel grid with a fixed 21-material palette. The models never write rendering code: our renderer applies the same studio lighting, camera, and 360° turntable to every build, so the only variable is what the model chose to build. One generation per model, no retries, no edits, no cherry-picking: the first valid program returned is what you see. Each model was also asked to say, in one or two sentences, why its build is its dream home; those answers are quoted verbatim on the cards above.

Reasoning was enabled for every model and disclosed per entry: the two Claude entries (Sonnet 5 and Fable 5) ran with adaptive thinking at high effort (24k-token cap); GPT-5.5 and GPT-5 at high reasoning effort (24k cap each); Gemini 3.1 Pro with an explicit 8,192-token thinking budget (verified binding with a probe before inclusion); GPT-5 mini at medium effort (8k cap); and GLM 5.2 with thinking enabled on the z.ai API (no exposed effort level, given a 65k-token budget so its reasoning would not starve the build).

One disclosure: the showdown published with four models because GPT-5.5 was not yet enabled on our API project. Access arrived hours after publication, and its entry was added under the identical brief and one-shot rule (first valid program returned, no retries). GPT-5 stays on the page as part of the original run. GLM 5.2 was added later still, by reader request after the showdown circulated on Reddit, run direct on the z.ai API under the same brief and one-shot rule; it is the only open-weights model on the page. Claude Sonnet 5 was added on 30 June 2026, the day it launched, again under the same brief and one-shot rule, at the same high-effort tier as Fable 5.
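To make the build language described at the start of this section concrete, here is a minimal sketch of the kind of check a "first valid program" rule implies. Only the limits come from the text (at most 180 operations, a 48×48×48 grid, 21 materials, five operation kinds). The JSON field names (`ops`, `type`, `material`, `from`, `to`) and the idea that every operation is described by two corner points are assumptions made for illustration, not the actual schema.

```python
import json

GRID = 48          # from the article: 48x48x48 voxel grid
MAX_OPS = 180      # from the article: at most 180 operations
PALETTE_SIZE = 21  # from the article: fixed 21-material palette
OP_TYPES = {"box", "cylinder", "sphere", "line", "carve"}  # kinds named in the article


def _valid_point(point) -> bool:
    """True if point is a list of three integer coordinates inside the grid."""
    return (
        isinstance(point, list)
        and len(point) == 3
        and all(isinstance(c, int) and 0 <= c < GRID for c in point)
    )


def validate_program(text: str) -> list[str]:
    """Return a list of problems; an empty list means the program passes these checks.

    The field names used here are hypothetical, not the real build schema.
    """
    try:
        program = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]

    ops = program.get("ops") if isinstance(program, dict) else None
    if not isinstance(ops, list):
        return ["expected an object with an 'ops' list"]

    problems = []
    if len(ops) > MAX_OPS:
        problems.append(f"{len(ops)} operations exceeds the limit of {MAX_OPS}")

    for i, op in enumerate(ops):
        if not isinstance(op, dict):
            problems.append(f"op {i}: not an object")
            continue
        kind = op.get("type")
        if not isinstance(kind, str) or kind not in OP_TYPES:
            problems.append(f"op {i}: unknown type {kind!r}")
        # Assumption: a carve removes voxels, so it needs no material index.
        material = op.get("material")
        if kind != "carve" and not (
            isinstance(material, int) and 0 <= material < PALETTE_SIZE
        ):
            problems.append(f"op {i}: material must be an index in 0..{PALETTE_SIZE - 1}")
        for key in ("from", "to"):
            if not _valid_point(op.get(key)):
                problems.append(f"op {i}: '{key}' must be three integers in 0..{GRID - 1}")
    return problems
```

Under a one-shot rule, a check like this would run once on the model's single response; a non-empty problem list would mean the entry has no valid program to render, rather than triggering a retry.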
Kimi K3 is the newest addition, generated within 24 hours of its 16 July 2026 launch under the identical brief and one-shot rule. It ran via the AIMLAPI gateway at Moonshot’s default reasoning setting (the gateway exposes no effort control for it) with the same 24k-token output cap, probe-verified as binding with reasoning billed as output; its latency reflects launch-day congestion at the provider. GPT-5.5 Pro remains excluded from showdowns because its reasoning spend cannot be capped. Reasoning tokens bill as output tokens, which is why billed tokens exceed the size of each build program. Orbit videos were recorded in headless Chromium under software WebGL with a virtualized clock for constant frame pacing.
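The "virtualized clock" in the last sentence is worth a small illustration: if the camera angle is derived from a frame counter rather than from wall-clock time, every recorded frame advances the turntable by the same amount no matter how long software WebGL takes to draw it. The sketch below shows that idea only. The class and function names, the frame rate, and the seconds-per-turn value are all hypothetical and are not taken from the actual recording pipeline.

```python
from dataclasses import dataclass


@dataclass
class VirtualClock:
    """A clock that advances only when a frame is captured (hypothetical helper)."""

    fps: int
    frame: int = 0

    def now(self) -> float:
        # Virtual time in seconds, independent of how long rendering really took.
        return self.frame / self.fps

    def tick(self) -> None:
        self.frame += 1


def turntable_angles(fps: int, seconds_per_turn: float) -> list[float]:
    """Camera angle in degrees for each frame of one full 360-degree orbit."""
    clock = VirtualClock(fps)
    total_frames = round(fps * seconds_per_turn)
    angles = []
    for _ in range(total_frames):
        angles.append(360.0 * clock.now() / seconds_per_turn)
        clock.tick()  # a real recorder would capture the frame before ticking
    return angles


# Example values only: 30 fps and a 12-second orbit are assumptions.
angles = turntable_angles(fps=30, seconds_per_turn=12.0)
```

Because the angle depends only on the frame index, a slow render changes how long the recording takes but not the spacing between frames in the finished video.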

New showdown every week

Same format, new brief, latest models — Fable, GPT, Gemini and whatever ships next. Get each one the morning it goes live.

Run your own showdowns

Think you can write a better brief? Run this showdown’s prompt yourself — pick a model, tweak the brief, and get your own live render.

Run this brief yourself →