The dirty secret of AI web search has always been the plumbing. A model fires off a query, fetches half a dozen pages, dumps entire HTML documents into its context window, and then tries to reason over the mess. Most of that content is navigation bars, cookie banners, sidebar ads, footer links — noise that burns tokens and degrades the answer. Anthropic just shipped a fix that's almost embarrassingly straightforward.

Dynamic filtering lets Claude write and execute Python code to parse, filter, and cross-reference search results before they enter the context window. Not after. Before. The model looks at what came back from the web, writes a quick script to extract only the relevant pieces, runs it, and feeds itself the cleaned output. It's the kind of approach an engineer would reach for instinctively — treat the raw HTML like data, run an ETL step, then reason over the result — but it took until now for the models to do it themselves.
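To make the pattern concrete, here's a rough sketch of the kind of throwaway filter script the model might write for itself. Anthropic hasn't published the actual generated code, so treat this as an illustration of the ETL step, not the real thing: the helper names and keyword logic are mine, and it only uses the standard library.

```python
import re
from html.parser import HTMLParser

# Hypothetical sketch: strip boilerplate tags and keep only the sentences
# that mention the terms the query actually cares about.
SKIP_TAGS = {"script", "style", "nav", "footer", "aside", "header"}

class RelevantText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # how deep we are inside boilerplate tags
        self.chunks = []      # text fragments worth keeping

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def filter_page(html: str, keywords: list[str]) -> str:
    """Return only the sentences that mention at least one keyword."""
    parser = RelevantText()
    parser.feed(html)
    text = " ".join(parser.chunks)
    sentences = re.split(r"(?<=[.!?])\s+", text)
    wanted = [s for s in sentences if any(k.lower() in s.lower() for k in keywords)]
    return "\n".join(wanted)

# fetched_pages (hypothetical) would hold the raw web fetch results; only
# the filtered lines ever make it back into the model's context window.
# cleaned = [filter_page(p, ["BrowseComp", "accuracy"]) for p in fetched_pages]
```

The point isn't the specific parsing logic — the model writes whatever script fits the query — it's that the raw pages get reduced to a few relevant lines before any reasoning happens over them.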

The benchmark numbers are significant. On BrowseComp, which tests finding deliberately hard-to-locate information across multiple websites, Opus 4.6 jumped from 45.3% to 61.6%. Sonnet 4.6 went from 33.3% to 46.6%. On DeepsearchQA — multi-answer research queries where you need to find every correct answer — Opus climbed from 69.8% to 77.3%. Averaged across both benchmarks, that works out to an 11% accuracy gain while using 24% fewer input tokens.

That last part is the one I keep circling back to. Better and cheaper. Those two things almost never move in the same direction in this industry. Usually you buy accuracy with more compute, longer chains of thought, bigger context windows. Here the gains come from subtraction. Throw away the junk before you think about it, and the thinking gets better because there's less noise competing for attention.

The implementation leverages tools Claude already had — code execution, memory, programmatic tool calling — just wired together differently. It's enabled by default with the new web_search_20260209 and web_fetch_20260209 tool versions on the API for Sonnet 4.6 and Opus 4.6. You need the code execution tool included, which makes sense. The model needs somewhere to run those filter scripts.
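There's no separate flag for dynamic filtering: per the announcement, it's on by default once the new tool versions and code execution are present in the request. Here's a rough sketch of what that might look like with the Python SDK — the web search and web fetch version strings come from the announcement, but the code execution tool type, the beta flag, and the model id are my assumptions based on Anthropic's earlier server-tool conventions, and may differ from the current API:

```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-6",  # assumption: the exact model id may differ
    max_tokens=2048,
    tools=[
        # New tool versions named in the announcement:
        {"type": "web_search_20260209", "name": "web_search"},
        {"type": "web_fetch_20260209", "name": "web_fetch"},
        # Dynamic filtering needs somewhere to run its filter scripts;
        # this version string is an assumption from the older API docs.
        {"type": "code_execution_20250522", "name": "code_execution"},
    ],
    betas=["code-execution-2025-05-22"],  # assumption: beta flag name
    messages=[{
        "role": "user",
        "content": "Which of these three benchmarks shows the largest gain?",
    }],
)

print(response.content)
```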

I keep thinking about the context bloat problem I wrote about earlier this month — how connecting multiple MCP servers can balloon tool definitions to hundreds of thousands of tokens before an agent even starts working. Dynamic filtering attacks the same fundamental issue from the search side. The pattern is clear: the next round of capability gains won't come from making models smarter. They'll come from making models more disciplined about what they bother reading in the first place.

Sources: