Tutorial: Claude 1M Context Window & Context Rot

Anthropic's 1M token context window is now GA for Claude Opus 4.6 and Sonnet 4.6 — but does bigger mean better? This tutorial breaks down the MRCR v2 benchmark data and the mechanics of context rot, then gives you an updated rule of thumb for when to clear context in Claude Code. The video walkthrough and the gaps in the official documentation are covered side by side.



Claude’s 1M Context Window vs. Context Rot: What the Benchmarks Actually Show

Anthropic’s general availability release of a 1 million token context window for Opus 4.6 and Sonnet 4.6 changes more than just the budget ceiling — it potentially changes how you manage context in Claude Code entirely. This tutorial walks through the benchmark evidence behind that claim, introduces the tools used to interrogate the data, and translates the numbers into a revised rule of thumb for context window management. By the end, you’ll know exactly when to clear your context and why the old 100K threshold no longer holds.


1. Pull up Anthropic’s official announcement at claude.com/blog/1m-context-ga. The page confirms that 1M context is now generally available for Opus 4.6 and Sonnet 4.6, with standard pricing applied across the full context range and a media limit expanded to 600 images or PDF pages per request.
Anthropic’s official announcement: 1M token context is now GA for Opus 4.6 and Sonnet 4.6 with no long-context pricing premium.

2. Locate Anthropic’s published MRCR v2 8-needle long-context retrieval benchmark. Note the score for Opus 4.6 at the 1M token mark: 78.3. Cross-reference it against the field — GPT-5.4 at 36, Gemini 3.1 Pro at 26, Sonnet 4.5 at 18.5, and Opus 4.5 at roughly 26. The gap at 1M tokens is not marginal; it’s the difference between a usable context window and a theoretical one.

Anthropic’s announcement tweet: Opus 4.6 at 78.3% MRCR v2 accuracy at 1M tokens — the headline stat and practical use cases in one frame.

3. Interpret the drop from 256K to 1M tokens. Opus 4.6 falls roughly 14 percentage points over a roughly 750K token span. In prior model generations, degradation at this scale was steep and often cliff-like — performance didn’t drift downward, it collapsed. A 14-point drop distributed across 750K tokens represents a structurally different degradation pattern.

MRCR v2 8-needle benchmark: Opus 4.6 leads all frontier models at the 1M token mark while competitors drop sharply.
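The arithmetic behind that interpretation can be checked directly. This is a minimal sketch: the 256K score is inferred from the stated ~14-point drop, not read from any published table.

```python
# MRCR v2 score for Opus 4.6 at 1M tokens, per the announcement.
# The 256K value is inferred from the ~14-point drop described above,
# not a number confirmed in any screenshot.
score_1m = 78.3
drop_points = 14.0
score_256k = score_1m + drop_points          # ≈ 92.3 (inferred)

span_tokens = 1_000_000 - 256_000            # 744K-token span (~750K)
rate_per_100k = drop_points / (span_tokens / 100_000)

print(f"Inferred 256K score: {score_256k:.1f}")
print(f"Degradation rate: ~{rate_per_100k:.2f} points per 100K tokens")
```

That works out to roughly 1.9 points per 100K tokens — the figure that step 6 rounds up into its 2%-per-100K rule of thumb.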

4. Open the Chroma context rot study to establish the baseline problem. Chroma’s research, published the prior summer, documented severe performance degradation across multiple models as input token length grew — the defining dataset that triggered aggressive context-clearing strategies across the Claude Code community. The Opus 4.6 curve is a departure from every model tested in that study.

Context rot in action: Chroma’s research shows Claude Sonnet 4 and peers degrading measurably as input token length grows — the core problem a 1M window doesn’t automatically solve.

5. Navigate to contextarena.ai and open the MRCR 8-needle leaderboard. Use the interactive tooltip to examine model scores at specific context lengths. The drop-off patterns across models tend to be roughly linear rather than exponential — performance doesn’t plateau and then fall off a cliff; it erodes gradually.

Using Context Arena’s interactive tooltip to pinpoint exact model scores at specific context lengths — a practical research move before choosing a model.

6. Derive a working rule of thumb from the available data: approximately 2% effectiveness loss per 100K tokens added in Claude Code. This is an extrapolation assuming linear degradation between the 256K and 1M data points, which the Context Arena data supports as a reasonable assumption — but not a confirmed one.

Warning: this step may differ from current official documentation — see the verified version below.
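Expressed as code, the rule of thumb is a one-line linear model. This is a sketch under the same unverified assumptions — the function and its defaults are illustrative, not part of any Anthropic API:

```python
def estimated_effectiveness(context_tokens: int,
                            loss_per_100k: float = 2.0) -> float:
    """Estimated relative effectiveness (%) at a given context size.

    Encodes the ~2%-per-100K linear rule of thumb extrapolated from the
    MRCR v2 data — an assumption, not a measured curve.
    """
    loss = loss_per_100k * (context_tokens / 100_000)
    return max(100.0 - loss, 0.0)

# Old clearing threshold vs. the revised one vs. a full window:
print(estimated_effectiveness(100_000))    # 98.0
print(estimated_effectiveness(200_000))    # 96.0
print(estimated_effectiveness(1_000_000))  # 80.0
```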

7. Update your context window management strategy. The previous guidance — clear aggressively at 100K–120K tokens — was a direct response to the Chroma findings. With Opus 4.6’s degradation curve, clearing at 200K or beyond is now defensible depending on your use case. If you can clear at 200K, do it; there’s no reason to absorb unnecessary degradation. But holding a session open past 200K for large codebases or long-running agents no longer requires the workarounds it once did.
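The revised strategy can be framed as a simple threshold check. The thresholds and messages below are this tutorial's rule of thumb only — nothing Claude Code itself enforces or exposes:

```python
def clearing_advice(context_tokens: int, long_running: bool = False) -> str:
    """Suggest whether to clear context, per the revised rule of thumb.

    Thresholds reflect the tutorial's guidance (clear around 200K when
    convenient; tolerate more for long-running agents) — illustrative only.
    """
    if context_tokens < 200_000:
        return "keep going"
    if not long_running:
        return "clear now — no reason to absorb avoidable degradation"
    return "defensible to continue, but expect ~2% loss per additional 100K"

print(clearing_advice(120_000))
print(clearing_advice(350_000))
print(clearing_advice(350_000, long_running=True))
```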

8. Confirm plan eligibility. The 1M context window in Claude Code requires the Max, Teams, or Enterprise plan. It is not available on Pro or lower tiers.

9. Note the pricing change. Prior to this release, the API applied a token-count multiplier above roughly 200K tokens. That multiplier is gone — the per-token rate is flat across the full 1M range, which also applies to media inputs up to the new 600-image ceiling.

Running /context in Claude Code reveals 951K tokens of free space in a live Opus 4.6 session — 95% of the 1M window still available.
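The pricing change in step 9 can be illustrated with a quick cost comparison. Every number here is a hypothetical stand-in — the $15/MTok base rate, the 2x long-context multiplier, and the 200K threshold are assumptions for illustration, not Anthropic's published prices:

```python
def input_cost(tokens: int, base_rate_per_mtok: float,
               long_multiplier: float = 1.0,
               threshold: int = 200_000) -> float:
    """Input-token cost in dollars.

    With long_multiplier > 1, tokens beyond `threshold` bill at a premium
    (the old tiered scheme); with long_multiplier = 1.0 the rate is flat
    across the full range (the new scheme). All rates are hypothetical.
    """
    base = base_rate_per_mtok / 1_000_000
    if tokens <= threshold:
        return tokens * base
    return threshold * base + (tokens - threshold) * base * long_multiplier

RATE = 15.0  # hypothetical $/MTok — not Anthropic's actual price
old = input_cost(800_000, RATE, long_multiplier=2.0)
new = input_cost(800_000, RATE, long_multiplier=1.0)
print(f"800K-token request, old tiered scheme: ${old:.2f}")
print(f"800K-token request, flat scheme:       ${new:.2f}")
```

Under these stand-in numbers, a flat rate cuts the cost of an 800K-token request by a little over 40% relative to a 2x tier above 200K.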

10. Hold the revised guidance loosely. There are no published data points between 256K and 1M tokens. The linearity assumption is reasonable, but unverified — degradation could steepen at 512K or flatten further. The rule of thumb is a starting point, not a guarantee.
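One way to hold the guidance loosely is to bracket the unknown: compare what the linear rule predicts at an unmeasured point like 512K against steeper and flatter alternatives. Both anchor points below come from the tutorial's own figures (the 256K score is itself inferred), so the mid-range values are speculation, not data:

```python
def interpolate(tokens: int, exponent: float = 1.0) -> float:
    """Score between the two known anchors under a chosen curve shape.

    exponent=1.0 is the linear assumption; >1 means degradation steepens
    late (flatter mid-range); <1 means it steepens early. The 256K anchor
    (≈92.3) is inferred from the stated 14-point drop, not published.
    """
    t0, s0 = 256_000, 92.3
    t1, s1 = 1_000_000, 78.3
    frac = (tokens - t0) / (t1 - t0)
    return s0 + (s1 - s0) * frac ** exponent

for exp, label in [(0.5, "steepens early"), (1.0, "linear"), (2.0, "steepens late")]:
    print(f"512K, {label:14s}: {interpolate(512_000, exp):.1f}")
```

The three assumptions spread the predicted 512K score across roughly an 84-to-91 band — a reminder of how much the rule of thumb leans on the linearity guess.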


How does this compare to the official docs?

Anthropic’s announcement page and benchmark data tell one side of the story — but the official documentation for Claude Code context management and the MRCR v2 methodology fill in critical gaps that the video leaves open.

Here’s What the Official Docs Show

The video builds a coherent case for raising context-clearing thresholds in Claude Code, and the underlying logic is sound — what follows adds documentation context for each step and flags the specific data points the screenshots couldn’t confirm, so you can calibrate your confidence in each claim. Where the docs are silent, we say so directly.

Step 1 — Anthropic’s announcement

Claude Sonnet 4.6’s existence is confirmed by an announcement card on Anthropic.com dated February 17, 2026, with links to a full announcement page and a separate model details page. The specific MRCR benchmark scores cited in the video are not visible on any captured Anthropic.com page — those numbers live in the full announcement, not the homepage.

Anthropic.com homepage featuring the Claude Sonnet 4.6 announcement card (February 17, 2026) — no benchmark data visible at this level.


Steps 2–3 — Benchmark scores and the 14% drop

Two label mismatches worth flagging. First, Context Arena’s leaderboard lists Claude entries as claude-sonnet-4 — no .6 version suffix appears for any Claude model in the captured data. Second, as of 2026-03-14, no model labeled GPT-5.4 appears in any leaderboard screenshot; the visible GPT-5 family is gpt-5, gpt-5.2, gpt-5.2:xhigh, and gpt-5-mini. The specific scores (78.3, 18.5) and the 14-point drop calculation cannot be confirmed from the available screenshots.

Context Arena leaderboard — claude-sonnet-4 entries show 1M max context with N/A at the 1M performance column; no claude-opus-4.6 or claude-sonnet-4.6 entries visible.


Step 4 — Chroma context rot study

No official documentation was found for this step — proceed using the video’s approach and verify independently.

Step 5 — Context Arena leaderboard

Context Arena is real and does exactly what the video describes — interactive cross-model long-context comparisons with selectable context lengths. One gap: as of 2026-03-14, the captured screenshots show the benchmark set to 2 (Easy) needles, while the video references the 8-needle configuration. An 8-needle setting likely exists in the selector, but it is not what appears in the documentation screenshots.

Context Arena showing the OpenAI-MRCR benchmark with 2-needle (Easy) setting — top performers are all Gemini variants across context lengths up to 1M tokens.
Context Arena leaderboard upper section — Gemini 3.x and GPT-4.1 models with 2-needle scores at 128k and 1M lengths; no Claude 4.x models on this page.

Step 6 — 2% per 100K working rule

No official documentation was found for this step — proceed using the video’s approach and verify independently.

Steps 7–8 — Claude Code context strategy and plan requirements

Important URL note: claude.ai/code resolves to a landing page for Cowork, a separate Claude-powered task automation product — not the Claude Code CLI. Claude Code CLI documentation lives at docs.anthropic.com/en/docs/claude-code. The Max plan is confirmed on the pricing page at $100/month; the claim that 1M context is gated to Max or higher is plausible, but the entitlement itself is not visible in the captured feature list, which is cut off in the screenshot.

claude.ai/code landing page showing the Cowork product — this is not the Claude Code CLI documentation page.
claude.ai pricing page — Free, Pro ($17/mo annual), and Max (from $100/mo) plans confirmed; specific 1M context entitlement not visible in the cut-off feature list.


Step 9 — Flat pricing above 200K tokens

No official documentation was found for this step — proceed using the video’s approach and verify independently.

Step 10 — Linearity assumption

The claude-sonnet-4 entries show N/A at 1M tokens, which is consistent with the video’s acknowledgment of missing data. No intermediate data points between 256K and 1M are visible in any screenshot — the linearity assumption remains reasonable, but unverified from the captured documentation.


  1. Home \ Anthropic — Anthropic’s homepage, where the Claude Sonnet 4.6 announcement card (February 17, 2026) links to the full model announcement and a separate model details page.
  2. Context Arena — Interactive long-context retrieval benchmark leaderboard with selectable needle counts and context-length comparisons across frontier models up to 1M tokens.
  3. Claude Code — Resolves to the Cowork product landing page; Claude Code CLI documentation is separately maintained at docs.anthropic.com/en/docs/claude-code.
