Plutonic Rainbows

Deep Think Crosses the Human Line

Google upgraded Gemini 3 Deep Think yesterday, and the number that matters is 84.6%. That's its score on ARC-AGI-2, the abstract reasoning benchmark designed to resist brute-force pattern matching. Humans average around 60%. Claude Opus 4.6 — which landed last week to genuine excitement — scores 68.8%. GPT-5.2 manages 52.9%. Deep Think clears the human baseline by nearly 25 points and leads the next-best model by almost 16.

I'm trying to figure out what to do with that.

The Codeforces result is harder to dismiss as benchmark theatre. Deep Think hit 3,455 Elo — Legendary Grandmaster territory, better than all but seven active human programmers on the platform. No external tools. No retrieval. Just inference-time compute and whatever Google means by "parallel hypothesis exploration." The top human competitor, Benq, sits at 3,792. That gap is closing fast enough to make competitive programming feel like it has an expiration date.
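To put the gap to Benq in concrete terms, here is a rough sketch using the textbook Elo expected-score formula. It assumes Codeforces ratings behave like standard Elo, which is only approximately true (the site uses its own variant), and the function name is mine, not anything from Google or Codeforces.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings quoted in the post; treating them as plain Elo is an assumption.
DEEP_THINK = 3455
BENQ = 3792

p_deep_think = elo_expected_score(DEEP_THINK, BENQ)
p_benq = elo_expected_score(BENQ, DEEP_THINK)

print(f"Deep Think expected score vs Benq: {p_deep_think:.3f}")
print(f"Benq expected score vs Deep Think: {p_benq:.3f}")
```

Under that assumption a 337-point gap gives Deep Think an expected score of roughly one in eight in a head-to-head. Close, but not yet a coin flip.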

What changed from the previous version: scope. Earlier iterations of Deep Think were narrowly focused on mathematics. This upgrade pushes into chemistry, physics, and engineering. Gold medals on the written portions of the International Math, Physics, and Chemistry Olympiads. A mathematician at Rutgers used it to peer-review a paper on high-energy physics structures bridging gravity and quantum mechanics. It caught a subtle logical flaw that human reviewers had missed. That's not a benchmark. That's a real research contribution, however narrow.

The architecture Google describes — they call it "Aletheia" — uses a generator, a natural language verifier, and a reviser working in concert. Parallel hypothesis exploration rather than a single reasoning chain. The interesting detail is that the system can acknowledge failure and stop rather than burning compute on dead-end paths. Most reasoning models I've used have no concept of giving up gracefully. They hallucinate forward until they hit a token limit. If Aletheia genuinely knows when it's stuck, that's a meaningful advance in how these systems manage uncertainty.
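Google hasn't published the internals, so what follows is only my own minimal sketch of what a generate, verify, revise loop with a graceful stop could look like. Every name here (`solve`, `Verdict`, the `toy_*` functions) is hypothetical, and the toy problem, finding an integer square root, stands in for real reasoning.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    ok: bool
    feedback: str


def solve(problem, generate, verify, revise, max_rounds=3):
    """Return a verified candidate, or None to signal 'I'm stuck'."""
    # Start from several hypotheses rather than a single reasoning chain.
    candidates = list(generate(problem))
    for _ in range(max_rounds):
        survivors = []
        for cand in candidates:
            verdict = verify(problem, cand)
            if verdict.ok:
                return cand
            # The reviser may return None to abandon a dead-end path.
            revised = revise(problem, cand, verdict.feedback)
            if revised is not None:
                survivors.append(revised)
        if not survivors:
            return None  # every path is dead: stop instead of pushing on
        candidates = survivors
    return None  # round budget exhausted


# Toy stand-ins (hypothetical): find x such that x * x == n.
def toy_generate(n):
    return [5, 10]


def toy_verify(n, x):
    if x * x == n:
        return Verdict(True, "ok")
    return Verdict(False, "low" if x * x < n else "high")


def toy_revise(n, x, feedback):
    # Only search upward; an overshoot is treated as a dead end.
    return x + 1 if feedback == "low" else None


print(solve(49, toy_generate, toy_verify, toy_revise))
print(solve(50, toy_generate, toy_verify, toy_revise, max_rounds=10))
```

The first call converges on 7. The second has no answer, and the loop returns `None` once its last candidate overshoots. That explicit "no result" is the behaviour I'd want from a system that knows when it's stuck.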

Google's approach here is fundamentally different from what Anthropic and OpenAI are doing: it scales inference-time compute, giving the model more time to think rather than building a bigger one. The base is still Gemini 3 Pro, not some trillion-parameter behemoth. Deep Think is a reasoning mode, not a separate model. The distinction matters because it suggests the ceiling on what you can extract from existing architectures is higher than most people assumed. You don't need a fundamentally new model. You need to let the current one actually think.

That feels right to me, intuitively. When I use extended thinking in Claude, the quality jump over instant responses is enormous — not because the model suddenly knows more, but because it has room to work through contradictions and dead ends before committing to an answer. Google is doing the same thing with significantly more compute thrown at the problem. Anthropic shows you the reasoning. Google hides it. Both approaches produce results that make the non-thinking versions look careless by comparison.

The pricing is interesting. Deep Think is included in the Google AI Ultra subscription at $249.99 per month. API access requires applying for an early programme. I keep thinking about how o3 was positioned as the reasoning breakthrough that would change everything, and then Deep Think shows up a year later scoring nearly 30 times higher on the same class of benchmark. The pace of obsolescence in this space is genuinely disorienting.

Demis Hassabis called it "new records on the most rigorous benchmarks in maths, science & reasoning." MarkTechPost ran with "Is This AGI?" Which, no. But I understand the impulse. A system that reasons better than the average human on abstract pattern recognition, outranks all but a handful of competitive programmers on Codeforces, and catches errors in peer-reviewed physics papers occupies territory that didn't exist twelve months ago.

Google DeepMind published a research impact taxonomy alongside the release, rating contributions from Level 0 to Level 4. They classify Deep Think's current output at Levels 0-2 — autonomous solutions and publishable collaborations, not landmark breakthroughs. The fact that they felt the need to temper expectations tells you something about the temperature of the conversation. When the company releasing the model is the one saying "calm down," the benchmarks have moved past what anyone's frameworks were built to accommodate.

When Tokyo Could Buy Paris

Sacha van Dorssen shot the cover. Gail Elliott (dark hair, brown eyes, that ethnically ambiguous beauty that let her slip between markets without friction) stared back from newsstands across Tokyo in November 1988. The Nikkei was climbing toward a peak it would not reach again for more than three decades. Emperor Hirohito had weeks left. And Marie Claire Japan, the very first international edition of the French title, was selling something more complicated than clothes.

Inside, Yasmin Le Bon wandered Paris in an editorial called "I Love Paris," photographed by Naoki. A Japanese photographer shooting a half-Iranian, half-English model on the streets of Saint-Germain for a Japanese audience. The Bubble Economy distilled into a single editorial concept — the possession of Paris itself. Peter Lindbergh contributed pages. Steve Hiett brought his oversaturated flash. Kirsten Owen, androgynous and sharp, offered the anti-glamour counterweight. Juliette Binoche got an interview off the back of The Unbearable Lightness of Being. Romeo Gigli got a special feature, his soft Renaissance shoulders already dismantling the power suit from the inside.

This was the magazine for a specific woman. Not the Hanako girl buying Louis Vuitton at Isetan — her older sister, the one who wanted to know why she was buying it. Marie Claire monetised cultural capital in an era when financial capital was everywhere. Leos Carax and Terence Trent D'Arby in the same issue as Alaïa runway coverage. The magazine functioned as a passport, not a catalogue.

Fourteen months later, the stock market crashed and budgets like these evaporated. The location shoots dried up. I keep returning to this cover because it captures the apex so precisely — the last autumn when taste and money occupied the same room without anyone noticing the ceiling was about to fall.

The Cover That Outlived the Bubble

Swimwear Thinking Made City-Appropriate

Sun Studios sat on the sixth floor of 628 Broadway, between Bleecker and Houston. On the afternoon of November 5th, 1992, Liza Bruce showed her Spring/Summer 1993 ready-to-wear collection there — a small presentation in a SoHo loft space with draped white backdrops and scattered petals along the runway floor. The petals were the only decorative gesture. Everything else was restraint.

Bruce had built her reputation on swimwear. Lycra bodysuits, minimal seaming, the kind of stretch engineering where the fabric does the structural work instead of the pattern cutting. By 1993 she was translating that logic directly into daywear, and the results looked like nothing else on the New York calendar that season. Second-skin turtlenecks in white ribbed jersey. Ankle-length wrap skirts in warm stone that opened at the front to reveal a lighter underlayer beneath. Column slip dresses with spaghetti straps that owed more to lingerie than to anything you'd normally see at 4 p.m. on a Thursday in Manhattan.

The silhouette logic across both key looks was what the research calls "column plus interruption" — long, close lines broken by a single slit, overlap, or strap. The body organised everything. There was no print, no hardware, no contrast piping. The garment's entire argument was that fit, line, and fabric behaviour were sufficient. The styling reinforced this completely: hair worn long and straight, negligible jewellery, neutral shoes. Nothing competed with the silhouette.

Bruce was stocked at Harvey Nichols in London and Barneys in New York. She wasn't obscure. Yet the collection that people remember — the sheer slip Kate Moss wore, now sitting in the V&A's underwear exhibition — came a season later, for S/S 1994. What the Spring 1993 show demonstrates is that the grammar was already there. The columnar slips, the engineered wraps, the "underwear-as-outerwear" proposition in a quiet, minimalist register. The Moss moment didn't emerge from nowhere. It intensified something Bruce had been building toward on a petal-strewn runway in SoHo, six months earlier.

That second-skin turtleneck, worn without apology