惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

A
Arctic Wolf
T
The Blog of Author Tim Ferriss
月光博客
月光博客
Recent Announcements
Recent Announcements
V
V2EX
Microsoft Azure Blog
Microsoft Azure Blog
博客园 - 三生石上(FineUI控件)
P
Proofpoint News Feed
The Register - Security
The Register - Security
博客园 - 叶小钗
博客园 - Franky
The Cloudflare Blog
雷峰网
雷峰网
罗磊的独立博客
M
MIT News - Artificial intelligence
I
InfoQ
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
博客园 - 【当耐特】
Engineering at Meta
Engineering at Meta
N
Netflix TechBlog - Medium
爱范儿
爱范儿
博客园 - 司徒正美
Recorded Future
Recorded Future
酷 壳 – CoolShell
酷 壳 – CoolShell
Google DeepMind News
Google DeepMind News
Martin Fowler
Martin Fowler
Microsoft Security Blog
Microsoft Security Blog
F
Full Disclosure
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
B
Blog
大猫的无限游戏
大猫的无限游戏
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
腾讯CDC
WordPress大学
WordPress大学
小众软件
小众软件
K
Kaspersky official blog
Attack and Defense Labs
Attack and Defense Labs
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
Forbes - Security
Forbes - Security
aimingoo的专栏
aimingoo的专栏
IT之家
IT之家
The Last Watchdog
The Last Watchdog
N
News and Events Feed by Topic
B
Blog RSS Feed
S
Security @ Cisco Blogs
美团技术团队
量子位
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Cloudbric
Cloudbric
Hacker News - Newest:
Hacker News - Newest: "LLM"

Martin Alderson

Expert-aware quantisation: near-Q4 quality at near-Q2 size? A brief history of KV cache compression developments xAI is looking more like a datacentre REIT than a frontier lab Is datacentre sovereignty really that important? I went on the Built for Turbulence podcast What's going on with Gemini? Managed agents are the new Lambda Open weights are quietly closing up - and that's a problem 29th August 2026: a scenario Figma's woes compound with Claude Design A little tool to visualise MoE expert routing Has Mythos just broken the deal that kept the internet safe? What next for the compute crunch? Telnyx, LiteLLM and Axios: the supply chain crisis Using agents and Wine to move off Windows How to use the Qwen 3.5 LLMs to OCR documents No, it doesn't cost Anthropic $5k per Claude Code user Is the AI Compute Crunch Here? Why on-device agentic AI can't keep up Using OpenCode in CI/CD for AI pull request reviews Which web frameworks are most token-efficient for AI agents? Who fixes the zero-days AI finds in abandoned software? Attack of the SaaS clones How to generate good looking reports with Claude Code, Cowork or Codex Self-improving CLAUDE.md files Wall Street just lost $285 billion because of 13 markdown files Two kinds of AI users are emerging. The gap between them is astonishing. Turns out I was wrong about TDD Why sandboxing coding agents is harder than you think The Coming AI Compute Crunch Which programming languages are most token-efficient? I ported Photoshop 1.0 to C# in 30 minutes Why I'm building my own CLIs for agents Travel agents took 10 years to collapse. Developers are 3 years in. Are we dismissing AI spend before the 6x lands? Minification isn't obfuscation - Claude Code proves it AI agents are starting to eat SaaS Has the cost of building software just dropped 90%? Are we in a GPT-4-style leap that evals can't see? I Finally Found a Use for IPv6 How I use Claude Code to manage sysadmin tasks Could Excel agents unlock $1T in economic value? Are we really repeating the telecoms crash with AI datacenters? A non-technical CFO is shipping better code than the agencies he hired Tracking MCP Server Growth Notes from MCP Dev Summit Europe: Where the Protocol Is Headed How I make CI/CD (much) faster and cheaper Google AI Studio API has been unreliable for the past 2 weeks What happens when coding agents stop feeling like dialup? Solving Claude Code's API Blindness with Static Analysis Tools Are OpenAI and Anthropic Really Losing Money on Inference? I gave Claude Code a folder of tax documents and used it as a professional tax agent Beyond the Hype: Real-World MCP Support Across Major AI APIs Welcome to My Blog
Why Claude's new 1M context length is a big deal
Martin Alderson · 2026-03-15 · via Martin Alderson

Last Friday Anthropic released a new (production at least - has been in beta for a while) 1M context window variant of Opus 4.6 and Sonnet 4.6. This is actually a big breakthrough from my initial experiments.

If you struggle to visualise what a token is - a good rule of thumb I use is that a standard A4/letter-sized page tends to contain around 500-1000 tokens of English[1]. So, 1 million tokens is roughly 1,000-2,000 pages - or about 4-5 novels worth of text.

AI is improving on so many dimensions

I think it's important to start by underlining just how many things are improving in the AI space. Across quality, cost, speed (the new Qwen 3.5 models unlock very interesting use cases at a low cost) and now context length, the pace is relentless.

GPT-3.5 had a 4,096 token context window - perhaps a few pages of text - back in late 2022. That steadily increased to around 200K over the intervening couple of years. Now we have a 1M context length on a very capable frontier model (I should add that GPT-5.x also had much longer context windows, but for the most part they were limited to only the APIs).

But wait you might ask - didn't Gemini have this a long time ago? Yes they did, but the results were pretty poor.

Not all context lengths are made equal

There's a concept in LLMs called context rot where as the session with the model grows in length, it tends to drop in quality. It can start 'forgetting' things in its context window you've already said, or worse, confusing concepts and hallucinating more. This has been (one of the many) reasons that most practitioners in the space recommend always starting new sessions as often as possible.

One way to measure this degradation is the 'needle' benchmark. This asks the LLM to recall a certain fact from its context window, and repeat it back (the name comes from the phrase "finding a needle in the haystack").

Like all benchmarks it does have weaknesses, but I thought this chart Anthropic published was very interesting:

Needle-in-haystack benchmark comparison showing Claude maintaining near-perfect recall at 1M tokens while GPT-5.4 and Gemini 3.1 Pro degrade past 256K

You can see here that while GPT-5.4 and Gemini 3.1 Pro both have 1M context lengths, they quickly degrade past 256K - struggling to get above 50% match ratio at 1M length. This is a real problem for long running agentic tasks.

Now we always need to take these benchmark comparisons from the labs with a pinch of salt - Anthropic has every incentive to pick benchmarks that flatter their own models, they all do. But my rough anecdotal experimentation with Opus 4.6 1M does seem to hold up. I've run a few ~500K token sessions with Claude Code after it was released, and the performance seems very good. It kept on task, and I didn't have to "repeat myself" any more than I'd normally do. It felt extremely natural, just like a normal (shorter context) session with Claude Code.

Claude Code session showing context usage at around 500K tokens

Halfway to a million tokens. No signs of amnesia yet.

I'm interested to see external and third party benchmarks over the coming days to see if there are any gotchas with it, but first impressions are very positive.

Why longer context windows are so helpful

Now you may ask why this really matters - how often do you need to have the model remember thousands of pages of text day to day?

The answer is (of course) agentic workflows. If you've used coding agents for any length of time, you'll quickly reach a point where you hit the dreaded 'compaction' stage. Compaction is the process most agents use when they reach the limit of the context window. It condenses earlier parts of the conversation - preserving recent context and key artifacts but losing a lot of the detail from earlier in the session.

While this actually works to some degree, it has a lot of drawbacks.

Firstly, if you're working on a project with many files - which is very common - it often has to start by reminding itself of all these files as the summary isn't detailed enough. This can quickly end up with a bit of a catch 22, where it compacts, reads a tonne of files, then is running low on context to continue. As agents continue to improve in their abilities on very complicated tasks this becomes more and more of an issue.

Secondly, and the more obvious one, is that some agentic tasks do genuinely need to have far more documents in their context window. A classic one is legal tasks - if you are wanting the agent to cross reference hundreds of contracts, it's best for the agent to have the entire contract in its context window, not just summarised/excerpts which can result in poor interpretations. Same for many financial analyst tasks with investor reports.

Ironically software is far better suited to being "excerpted" than many other professional service tasks - programming languages by their nature are far more "modular" and structured than standard "human" documents, and therefore naturally can be searched through and snapshotted with far better results than many other types of documents.

Finally, and an often overlooked one, the models are very good at inferring a lot more from your instructions than is probably obvious. There has been an increasing amount of research on how LLMs can infer a lot about a user's emotional state from "unrelated" messages.

But beyond emotions, I think we end up encoding a lot more subtlety into our agent sessions than we realise. Using compaction - or any form of 'note taking' often loses a lot of this, much like how reading meeting minutes often doesn't capture the actual energy in a meeting.

Cost - the big surprise

The real reason this is a big deal is that Anthropic is not charging any more for it. Google and OpenAI both charge 2x input prices past 200-272K tokens - Gemini 3.1 Pro goes from $2 to $4/M, GPT-5.4 from $2.50 to $5/M. Anthropic used to do the same, but with the 4.6 release they've dropped the surcharge entirely.

They've also included it in the Max and Business subscriptions, which Codex (as of writing) doesn't. The competition is relentless in this space.

I had very much expected this to stay behind some "extra usage" flag and at Anthropic's API pricing, ruinously expensive for all but the most valuable agentic workflows.

This also enables enormous token consumption. You can have one agent with 1M context manage and orchestrate many subagents - each with their own 1M context window. The issue with the 128-200K token lengths before was even if your subagent tasks could fit in that, your 'main' orchestration agent would run out of context.

We'll see how this plays out as third party benchmarks come in and more people push it to its limits. But if first impressions hold, this quietly might be one of the most impactful releases of the year.


  1. This is a gross simplification. Different data types (for example programming languages or structured data like JSON or CSV) can use a lot more tokens for the same amount of 'characters' on a document. If you're interested in learning more about this, I'd recommend this article ↩︎