1 week ago 7 hours ago

Tutorial: Claude Code vs Codex vs DeepSeek Flight Sim

Three AI coding agents receive the same prompt — build a browser-based Three.js flight simulator — and get measured on speed, cost, planning depth, and output quality. Claude Code wins on visual polish, Codex wins on time-to-playable, and DeepSeek V4 Pro wins on cost by a wide margin. Here is exactly what each agent produced and what the official documentation adds to the picture.

by marketingagent.io 1 week ago7 hours ago

0views

Claude Code vs. Codex vs. DeepSeek V4 Pro: Building a Three.js Flight Simulator

Three AI coding agents. One identical prompt. A browser-based Three.js flight simulator built from scratch, iterated to playability, and measured on speed, cost, and output quality. Work through this comparison and you’ll know which agent earns its place in your stack — and exactly where each one breaks down under real conditions.

Pull up the benchmark scorecard across all three models. On SWEBench Verified and SWEBench Pro, Opus 4.7 leads. GPT-5.5 takes Terminal Bench 2.0 at 82.7, ahead of Opus 4.7’s 69.4. DeepSeek V4 Pro trails by five points or fewer on every coding metric — while costing roughly eight times less.

The benchmark scorecard: GPT-5.5, Opus 4.7, and DeepSeek V4 Pro head-to-head across coding, reasoning, tool use, and long-context tasks.

Check the token pricing before a single line of code is written. GPT-5.5 runs $30 per million output tokens and $5 per million input. Opus 4.7 is $25 output, $5 input. DeepSeek V4 Pro sits at $3.48 output and $1.70 input. OpenAI states that GPT-5.5’s efficiency gains mean real-world costs run only about 20% higher than GPT-5.4 despite the 2× rate increase — worth factoring into any cost projection.

The cost gap is stark: DeepSeek V4 Pro at $3.48/1M output tokens vs. GPT-5.5 at $30 — an ~8x price differential before a single line of code is written.

Configure the three harnesses: Codex for GPT-5.5, Claude Code for Opus 4.7, and Open Code for DeepSeek V4 Pro. Each harness is given the same skill set so the only variable is the model.

Warning: this step may differ from current official documentation — see the verified version below.

Submit an identical prompt to all three agents at the same time: a browser-based Three.js flight simulator with realistic weight, strong visuals, and whatever structure the model judges correct. The deliberate openness in the prompt surfaces architectural divergence between models.

Identical prompt, three agents: 'Build a flight simulator in Three.js that runs in the browser' submitted to Codex, Open Code, and Claude Code at the same moment. — Identical prompt, three agents: ‘Build a flight simulator in Three.js that runs in the browser’ submitted to Codex, Open Code, and Claude Code at the same moment.

Answer each model’s clarifying questions. DeepSeek V4 Pro asks five: flight model type, terrain, camera preference, tooling, and visual style. Codex asks three, including what first-playable experience to prioritize. Claude Code locks choices interactively — study-sim leaning, ocean and islands, keyboard and mouse.
Review the planning outputs before any code is written. DeepSeek’s plan is bare-bones: a project structure and brief notes on physics, environment, camera, and HUD. Codex produces a structured document with implementation details, an explicit test plan, and stated assumptions. Claude Code takes roughly five minutes but goes deepest — stall physics, stall buzzer behavior, aircraft spec, full module breakdown.

Planning phase side-by-side: all three agents building their architectural approach to the same Three.js flight sim prompt — each with a distinct planning style.

Evaluate first-pass results in the browser. Codex delivers a functioning simulator first — terrain, clouds, full HUD instrumentation — but the flight model is too unforgiving to get airborne on a first attempt. DeepSeek produces broken visuals. Claude Code spawns the plane mid-air and immediately stalls it out.

First build result: a working Three.js flight simulator in the browser — KIAS 57, full throttle, with HUD instrumentation and chase cam active.

Issue follow-up prompts to all three: easier flight physics, improved graphics. The point is not one-shot evaluation — it is whether a model can close the gap through iteration, which is how production work actually runs.
Tally the final results after two to three iterations. Codex reaches playable in roughly 10–15 minutes at 66,000 tokens, with an actual run cost around $5.40. Claude Code delivers the most polished visual output but takes ~20 minutes and burns through 150,000 tokens. DeepSeek finishes under 44 cents at 63,000 tokens but never achieves a clean visual across all passes.

After one iteration: the flight sim gains a sunset sky, terrain trees, and a properly aligned runway — the visual quality gap between first and second pass is immediate.

Map the trade-offs. GPT-5.5 via Codex wins on speed-to-playable. Opus 4.7 via Claude Code wins on planning depth and visual quality at the highest token cost. DeepSeek V4 Pro wins on cost — by a wide margin — but underdelivers on visual output for this task type.

How does this compare to the official docs?

The video surfaces real performance gaps under practical conditions, but the harness choices, default configurations, and token-counting methodology all carry assumptions worth checking against what Anthropic, OpenAI, and DeepSeek document about each model’s intended agentic use.

Here’s What the Official Docs Show

The video builds a genuinely useful three-way comparison, and the official documentation fills in a few gaps rather than upending the findings. One model name, one missing access requirement, and one documentation page that wouldn’t load are the details worth having before you replicate this setup.

Step 1 — Benchmark scorecard

No official documentation was found for this step — proceed using the video’s approach and verify independently.

No SWEBench Verified, SWEBench Pro, or Terminal Bench 2.0 scores appear in any official screenshot captured. One naming correction applies here and throughout: as of April 24, 2026, the official deepseek.com homepage identifies the third model as DeepSeek-V4 — the video uses “DeepSeek V4 Pro,” which does not appear anywhere on the official site and may reflect a label surfaced within the Open Code harness interface rather than the canonical product name.

📄 DeepSeek.com homepage confirming the product name is DeepSeek-V4 (preview release) — no “Pro” suffix appears on the official site.

Step 2 — Token pricing

No official documentation was found for this step — proceed using the video’s approach and verify independently.

The video’s per-token figures for Opus 4.7 and DeepSeek-V4 could not be confirmed or contradicted from available screenshots. The Claude pricing visible in official screenshots is subscription-based — Pro at $17/month, Max from $100/month — a separate access model that coexists with API pricing; both can be accurate simultaneously for different access methods. One status flag the video omits: DeepSeek’s official homepage describes DeepSeek-V4 as a preview release, not a production model.

Anthropic's subscription pricing page showing Claude Code included in Pro ($17/mo) and Max ($100+/mo) plans — a separate pricing model from the per-token API rates cited in the video. — 📄 Anthropic’s subscription pricing page showing Claude Code included in Pro ($17/mo) and Max ($100+/mo) plans — a separate pricing model from the per-token API rates cited in the video.

📄 DeepSeek.com footer showing the research model lineup and API pricing platform link — pricing figures not visible in this screenshot.

Step 3 — Configure the three harnesses

The video’s approach here matches the current docs exactly — Claude Code is confirmed as a real product at claude.ai/code with Google and email authentication. One access requirement the video does not state: Claude Code requires at minimum a Pro subscription ($17/month). The Free tier does not include it.

Two additions worth noting: Claude Code ships alongside a paired web IDE called Cowork that the video does not reference, and a separate Claude Code for Enterprise tier is listed as a distinct product. Neither affects the terminal-based workflow shown in the tutorial, but both are visible on the official product page.

OpenAI’s Codex documentation page (openai.com/codex) returned load errors on all three screenshot attempts. Claims about Codex as a GPT-5.5 harness, its pricing, and benchmark scores remain unverifiable from official documentation at this time.

📄 Claude.ai/code sign-in page showing Google and email authentication options alongside the companion Cowork product reference.

📄 openai.com/codex returned a browser-level page-load error at the time of screenshot capture.

Step 4 — Submit the identical prompt

The video’s approach here matches the current docs exactly. Three.js is an active JavaScript 3D library well-suited to browser-based rendering at this complexity level — the version current at time of writing is r184. The official site also ships devtools and an in-browser editor the video doesn’t mention, which are useful for debugging output independently of whichever AI agent you’re running.

📄 Three.js homepage (threejs.org) at version r184, showing the community gallery of browser-based 3D projects.

📄 Three.js r184 site navigation sidebar listing official documentation, devtools, an online editor, and community resources.

Steps 5–10 — Clarifying questions, planning review, build evaluation, iteration, final tally, and trade-off mapping

No official documentation was found for these steps — proceed using the video’s approach and verify independently.

Useful Links

Sign in – Claude — Official Claude Code product page with subscription tier requirements, sign-in options, and reference to the companion Cowork web IDE.
Codex | AI Coding Partner from OpenAI — OpenAI’s Codex documentation page; returned load errors at the time of writing — verify directly before relying on the video’s claims about the Codex harness or GPT-5.5 pricing.
Three.js – JavaScript 3D Library — Official Three.js library homepage at version r184, including documentation, community examples, devtools, and an in-browser editor.
DeepSeek | 深度求索 — Official DeepSeek homepage confirming the DeepSeek-V4 preview release and linking to the API open platform and pricing page.
Home \ Anthropic — Anthropic’s company homepage confirming Claude Opus 4.7 as a current model positioned for coding, agents, and complex professional work.