Plutonic Rainbows

Fast Lanes and Locked Gates

Within five days of each other, Anthropic launched Opus fast mode and OpenAI shipped Codex-Spark. Same thesis, different silicon. Anthropic squeezes 2.5x more tokens per second out of Opus 4.6 through inference optimisation. OpenAI distils GPT-5.3-Codex into a smaller model and runs it on Cerebras wafer-scale hardware at over a thousand tokens per second. Both are research previews. Both are gated to developers. Both are positioned as premium tiers rather than replacements for their standard counterparts.

The timing isn't coincidence. Coding agents are the first workload where latency translates directly into revenue. A developer staring at a terminal while an agent loops through forty tool calls doesn't care about cost per token — they care about wall-clock minutes. Anthropic charges six times the standard rate for fast mode. OpenAI hasn't published Spark pricing yet, but the Cerebras partnership wasn't cheap. These aren't loss leaders. They're premium tiers aimed at the one audience willing to pay for speed right now.

What interests me is the constraint both companies are accepting. Fast mode is Opus with the same weights, just served differently. Codex-Spark is a distilled, smaller model — OpenAI admits the full Codex produces better creative output. Neither approach is free. You either pay for dedicated inference capacity or you trade quality for velocity. There's no trick that makes frontier intelligence and sub-second latency coexist cheaply.

The question everyone keeps asking is whether these will become generally available, and it misframes the situation. The technology already works. The bottleneck is economics. Anthropic can't offer a mode it prices at six times the standard rate to every Claude consumer without either raising subscription prices or eating the margin. OpenAI can't run every ChatGPT conversation through Cerebras wafer-scale engines. The hardware doesn't exist in sufficient quantity. OpenAI's own announcement says they're ramping datacenter capacity before broader rollout.

So the honest answer is: speed tiers will generalise, but slowly, and probably not in the form people expect. I'd bet on tiered pricing spreading across the consumer products — a fast toggle in Claude.ai, a "turbo" option in ChatGPT — before the end of the year. But it'll cost extra. The idea that baseline inference gets dramatically faster for free requires either a hardware miracle or margins that neither company can sustain.

The deeper pattern is what I wrote about last month. Speed is becoming the axis of competition because capability gains have slowed enough that users notice latency before they notice intelligence improvements. When both labs ship speed products in the same week, that tells you where the demand signal is loudest. Not smarter. Faster.

The Loop That Writes Itself

GPT-5.3-Codex helped debug its own training. OpenAI said it plainly: "the first model that was instrumental in creating itself." That was ten days ago. This week, ICLR announced their first workshop dedicated entirely to recursive self-improvement, scheduled for Rio in April. Google's AlphaEvolve already discovered algorithmic improvements that beat Strassen's fifty-six-year-old matrix multiplication record. The pieces are landing on the board faster than anyone expected.

Recursive self-improvement — systems that modify their own code, weights, prompts, or architecture to become more capable, then use that increased capability to improve themselves further — has been a thought experiment for decades. Eliezer Yudkowsky warned about it. Nick Bostrom built philosophical scaffolding around it. And for most of that time it remained comfortably theoretical because the systems weren't good enough at the one thing the loop requires: writing better software than the software that already exists.

That constraint is dissolving. Not because we've achieved some sudden breakthrough in machine consciousness or general reasoning, but because the narrow version of self-improvement turns out to be enough to matter. A model doesn't need to understand itself philosophically to optimise its own training pipeline. It just needs to be good at code. And the current generation is good at code.

The METR data makes the trajectory explicit. AI task-completion horizons have been doubling every four to seven months — depending on which estimate you trust — for the past six years. If that holds for another two years, we're looking at agents that can autonomously execute week-long research projects. Another four years and it's month-long campaigns. The trend line itself isn't the alarming part. The alarming part is that the trend doesn't need to hold perfectly. Even if progress halves, the capability gap closes on a timeline measured in quarters, not decades.
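To make the compounding concrete, here is a rough Python sketch. It only computes how much a quantity multiplies under a fixed doubling time. The four- and seven-month figures come from the paragraph above; the fourteen-month case is my own stand-in for "progress halves" (the slower estimate with its doubling time doubled). No starting horizon is assumed, so the output is a growth factor, not a forecast.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months` given a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


# Doubling times in months: the two estimates quoted in the post, plus an
# assumed slowdown (14 = the slower estimate with progress halved).
scenarios = {"fast estimate": 4, "slow estimate": 7, "slow estimate, halved": 14}

for label, doubling in scenarios.items():
    for years in (2, 4):
        factor = growth_factor(12 * years, doubling)
        print(f"{label:>22} | {years} years | x{factor:,.1f}")
```

The fast estimate gives 2^6 = 64x over two years. Even the halved case still compounds, just over a longer stretch, which is the point the paragraph is making.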

Dean Ball put it starkly in his recent analysis: America's frontier labs have begun automating large fractions of their research operations, and the pace will accelerate through 2026. OpenAI envisions hundreds of thousands of automated research interns within nine months. Dario Amodei cites 400% annual efficiency gains from algorithmic advances alone. These aren't wild extrapolations from startup pitch decks. These are the people running the labs describing what they see happening inside their own buildings.

However. There's a constraint that rarely gets enough attention in the acceleration discourse. Self-improvement only generates reliable gains where outcomes are verifiable. Code that passes tests. Algorithms with measurable performance. Training runs with clear loss curves. The loop works brilliantly in these domains because you can tell whether the modification actually helped. The system generates a change, measures the result, keeps or discards. Simple evolutionary pressure.
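The generate, measure, keep-or-discard loop is simple enough to sketch. This is a toy hill climber in Python, not anything a lab has described: the `score` function is a made-up objective standing in for "tests pass" or "loss went down", and every name here is hypothetical. What it shows is that the loop only works because `score` gives an unambiguous answer.

```python
import random


def score(candidate: list[float]) -> float:
    """Hypothetical verifiable objective: higher is better, best is 0.0 at all ones."""
    return -sum((x - 1.0) ** 2 for x in candidate)


def propose(candidate: list[float], rng: random.Random, step: float = 0.1) -> list[float]:
    """Generate a small random modification of the current candidate."""
    changed = list(candidate)
    index = rng.randrange(len(changed))
    changed[index] += rng.uniform(-step, step)
    return changed


def improve(candidate: list[float], rounds: int = 2000, seed: int = 0):
    """Keep a change only when the measured score actually improves."""
    rng = random.Random(seed)
    best, best_score = list(candidate), score(candidate)
    for _ in range(rounds):
        trial = propose(best, rng)
        trial_score = score(trial)
        if trial_score > best_score:  # measured gain: keep it
            best, best_score = trial, trial_score
        # otherwise discard the trial and try again
    return best, best_score


start = [0.0, 0.0, 0.0]
final, final_score = improve(start)
print(f"start score {score(start):.3f}, final score {final_score:.6f}")
```

Swap `score` for something that can't be measured cleanly and the keep-or-discard step has nothing to stand on.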

The loop breaks — or at least stumbles badly — when it encounters domains where verification is ambiguous. Alignment research. Safety evaluation. Novel hypothesis generation. The things that arguably matter most for whether recursive self-improvement goes well or catastrophically. A system can optimise its own matrix operations all day. Whether it can meaningfully improve its own ability to recognise its blind spots is a much harder question, and I suspect the honest answer is no.

So when will genuine recursive self-improvement arrive? It depends on what you mean. The narrow version — models improving their own infrastructure, training pipelines, and deployment tooling — is already here. GPT-5.3-Codex is doing it in production. The medium version — agents that systematically discover architectural improvements and better training recipes — is probably twelve to eighteen months out, conditional on the METR trendline holding. The strong version — a system that improves its own reasoning capabilities in open-ended domains, including the ability to improve its ability to improve — remains genuinely unclear. I'm not confident it's five years away. I'm not confident it's twenty.

What I am confident about is that we'll get the narrow and medium versions before we have any serious framework for governing them. The ICLR workshop is a start — researchers trying to make self-improvement "measurable, reliable, and deployable." But the gap between academic workshops and deployed production systems has never been wider. OpenAI shipped a self-improving model before anyone published a standard for evaluating self-improving models. That ordering tells you everything about the incentive structure.

The Gödel Agent modifies its own task-solving policy and learning algorithm. SICA, a self-improving coding agent, climbed from 17% to 53% on a subset of SWE-Bench Verified by editing its own code. These are research prototypes, not products, but the delta between prototype and product in this field is about eighteen months and shrinking. Probably less now that the prototypes can help close the gap themselves.

I keep coming back to something Ball wrote: the public might not notice dramatic improvements, dismissing them as "more of the same empty promises." That feels backwards to me. The risk isn't that progress will be invisible. The risk is that it'll be visible to the people building it, acting on it, profiting from it — and invisible to everyone else until the loop is already running too fast to audit.

Forty-Seven Percent Would Rather Not

Nearly half of British sixteen-to-twenty-one-year-olds told the BSI they'd prefer to have grown up in a world without the internet. Forty-seven percent. Not a fringe opinion from technophobes or Luddites — a near-majority of the generation that never knew anything else.

The rest of the numbers are worse. Sixty-eight percent said they felt worse about themselves after spending time on social media. Forty-two percent admitted to lying to their parents about what they do online. Forty percent maintain a decoy or burner account. Eighty-five percent of young women compare their appearance and lifestyle to what they see on their feeds, with roughly half doing so often or very often. These aren't edge cases. This is the baseline experience.

What strikes me isn't the individual statistics — we've had versions of these figures for years. Back in 2018, Apple's own investors were pressuring the company over youth phone addiction, citing surveys where half of American teenagers said they felt addicted to their devices. Seven years later, nothing structural changed. The platforms got stickier. The algorithms got sharper. The age of first exposure dropped. And now the generation that grew up inside the experiment is telling us, plainly, that they wish the experiment hadn't happened.

Fifty percent of respondents said a social media curfew would improve their lives. Twenty-seven percent wanted phones banned from schools. Seventy-nine percent believed tech companies should be legally required to build privacy safeguards. That last number is the one I keep returning to — four out of five young people asking for regulation that adults have spent a decade failing to deliver.

The BSI's chief executive, Susan Taylor Martin, put it in corporate language: "The younger generation was promised technology that would create opportunities, improve access to information and bring people closer to their friends." The research, she said, shows it is "exposing young people to risk and, in many cases, negatively affecting their quality of life." This is what institutional understatement sounds like when the data is screaming.

There's an uncomfortable parallel with how the AI industry is repeating social media's mistakes — the same pattern of externalised harm and internalised profit, the same rehearsed contrition at hearings, the same gap between stated commitments and actual behaviour. The platforms knew what they were doing to adolescents. Internal documents confirmed it. Nothing changed because engagement metrics drove revenue, and revenue was the only number that mattered in the boardroom.

Forty-three percent of the respondents started using social media before the age of thirteen — the legal minimum. Not because their parents approved, but because the platforms made it trivially easy to lie about your age. Then those same platforms sold advertising against the attention of children who shouldn't have been there in the first place.

The generation that was supposed to be "digital natives" — fluent, empowered, connected — is telling us they'd trade it all for something quieter. We should probably listen.

Virgin Records Press Call, March 1990

Propaganda lined up for Virgin in March 1990 with four faces calibrated for the dark — the ruffled blouse at center doing more work than any press stylist should have to admit.

Deep Think Crosses the Human Line

Google upgraded Gemini 3 Deep Think yesterday, and the number that matters is 84.6%. That's its score on ARC-AGI-2, the abstract reasoning benchmark designed to resist brute-force pattern matching. Humans average around 60%. Claude Opus 4.6 — which landed last week to genuine excitement — scores 68.8%. GPT-5.2 manages 52.9%. Deep Think clears the human baseline by nearly 25 points and leads the next-best model by almost 16.

I'm trying to figure out what to do with that.

The Codeforces result is harder to dismiss as benchmark theatre. Deep Think hit 3,455 Elo — Legendary Grandmaster territory, better than all but seven active human programmers on the platform. No external tools. No retrieval. Just inference-time compute and whatever Google means by "parallel hypothesis exploration." The top human competitor, Benq, sits at 3,792. That gap is closing fast enough to make competitive programming feel like it has an expiration date.

What changed from the previous version: scope. Earlier iterations of Deep Think were narrowly focused on mathematics. This upgrade pushes into chemistry, physics, and engineering. Gold medals on the written portions of the International Math, Physics, and Chemistry Olympiads. A mathematician at Rutgers used it to peer-review a paper on high-energy physics structures bridging gravity and quantum mechanics. It caught a subtle logical flaw that human reviewers had missed. That's not a benchmark. That's a real research contribution, however narrow.

The architecture Google describes — they call it "Aletheia" — uses a generator, a natural language verifier, and a reviser working in concert. Parallel hypothesis exploration rather than a single reasoning chain. The interesting detail is that the system can acknowledge failure and stop rather than burning compute on dead-end paths. Most reasoning models I've used have no concept of giving up gracefully. They hallucinate forward until they hit a token limit. If Aletheia genuinely knows when it's stuck, that's a meaningful advance in how these systems manage uncertainty.
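To be clear about what "knowing when it's stuck" could look like, here is a deliberately tiny Python sketch of a generate, verify, revise loop that returns nothing instead of guessing. It is not Google's architecture, which isn't public in this detail. The problem (finding an exact integer square root), the function names, and the step budget are all assumptions of mine, and it follows one reasoning chain rather than parallel hypotheses.

```python
from typing import Optional


def verify(candidate: int, target: int) -> str:
    """Verifier: critiques a hypothesis in words rather than with a score."""
    square = candidate * candidate
    if square == target:
        return "correct"
    return "too low" if square < target else "too high"


def solve(target: int, max_steps: int = 64) -> Optional[int]:
    """Return an exact integer square root, or None when the search is exhausted."""
    low, high = 0, target  # the range of hypotheses still considered plausible
    for _ in range(max_steps):
        if low > high:
            return None  # nothing plausible left: acknowledge failure and stop
        candidate = (low + high) // 2  # generator: propose a hypothesis
        verdict = verify(candidate, target)
        if verdict == "correct":
            return candidate
        # reviser: use the critique to narrow the next proposal
        if verdict == "too low":
            low = candidate + 1
        else:
            high = candidate - 1
    return None  # compute budget spent without an answer


print(solve(144))  # 12
print(solve(150))  # None, because 150 has no integer square root
```

The interesting branch is the `return None`. A system that keeps proposing candidates after the plausible range is empty is the toy version of hallucinating forward to the token limit.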

Google's approach here is fundamentally different from what Anthropic and OpenAI are doing. Google is scaling inference-time compute, giving the model more time to think rather than making a bigger model. The base is still Gemini 3 Pro, not some trillion-parameter behemoth. Deep Think is a reasoning mode, not a separate model. The distinction matters because it suggests the ceiling on what you can extract from existing architectures is higher than most people assumed. You don't need a fundamentally new model. You need to let the current one actually think.

That feels right to me, intuitively. When I use extended thinking in Claude, the quality jump over instant responses is enormous — not because the model suddenly knows more, but because it has room to work through contradictions and dead ends before committing to an answer. Google is doing the same thing with significantly more compute thrown at the problem. Anthropic shows you the reasoning. Google hides it. Both approaches produce results that make the non-thinking versions look careless by comparison.

The pricing is interesting. Deep Think is included in the Google AI Ultra subscription at $249.99 per month. API access requires applying for an early programme. I keep thinking about how o3 was positioned as the reasoning breakthrough that would change everything, and then Deep Think shows up a year later scoring nearly 30 times higher on the same class of benchmark. The pace of obsolescence in this space is genuinely disorienting.

Demis Hassabis called it "new records on the most rigorous benchmarks in maths, science & reasoning." MarkTechPost ran with "Is This AGI?" — which, no. But I understand the impulse. A system that reasons better than the average human on abstract pattern recognition, codes better than 99.99% of programmers, and catches errors in peer-reviewed physics papers occupies territory that didn't exist twelve months ago.

Google DeepMind published a research impact taxonomy alongside the release, rating contributions from Level 0 to Level 4. They classify Deep Think's current output at Levels 0-2 — autonomous solutions and publishable collaborations, not landmark breakthroughs. The fact that they felt the need to temper expectations tells you something about the temperature of the conversation. When the company releasing the model is the one saying "calm down," the benchmarks have moved past what anyone's frameworks were built to accommodate.

When Tokyo Could Buy Paris

Sacha van Dorssen shot the cover. Gail Elliott, with the dark hair, brown eyes, and ethnically ambiguous beauty that let her slip between markets without friction, stared back from newsstands across Tokyo in October 1988. The Nikkei was climbing toward a peak it would not reach again for more than three decades. Emperor Hirohito had weeks left. And Marie Claire Japan, the very first international edition of the French title, was selling something more complicated than clothes.

I didn't know any of this at the time. I was twenty, living in England, and Japan was a word attached to a band I loved and a country I'd never visited. But I knew Gail's face from London agency boards and magazine tearsheets, and something about this particular cover — the warmth of the palette, the directness of her gaze, the way the typography sat against skin tone — felt like it belonged to a world operating at a frequency I could hear but not quite tune into. That frequency was money, obviously. But it was also confidence. The confidence of a culture that believed it could purchase not just luxury goods but the entire idea of European sophistication and make it its own.

Inside, Yasmin Le Bon wandered Paris in an editorial called "I Love Paris," photographed by Naoki. A Japanese photographer shooting a half-Iranian, half-English model on the streets of Saint-Germain for a Japanese audience. The Bubble Economy distilled into a single editorial concept — the possession of Paris itself. Peter Lindbergh contributed pages. Steve Hiett brought his oversaturated flash. Kirsten Owen, androgynous and sharp, offered the anti-glamour counterweight. Juliette Binoche got an interview off the back of The Unbearable Lightness of Being. Romeo Gigli got a special feature, his soft Renaissance shoulders already dismantling the power suit from the inside.

This was the magazine for a specific woman. Not the Hanako girl buying Louis Vuitton at Isetan — her older sister, the one who wanted to know why she was buying it. Marie Claire monetised cultural capital in an era when financial capital was everywhere. Leos Carax and Terence Trent D'Arby in the same issue as Alaïa runway coverage. The magazine functioned as a passport, not a catalogue.

What gets me now — what I can't shake — is how completely that world has sealed itself off. Not just the Bubble Economy or the specific editorial budgets or the particular alignment of photographers and models and stylists who made this issue possible. All of that is gone, obviously. But the thing underneath it is gone too. The assumption that a magazine could be simultaneously mass-market and intellectually serious, that a fashion editorial could carry philosophical weight without anyone feeling the need to announce it, that a cover photograph could function as both commerce and art and nobody had to choose. That entire mode of cultural production evaporated, and it didn't leave forwarding instructions.

I catch myself doing the maths sometimes. Thirty-seven years. The woman who bought this at Kinokuniya in Shibuya on a Thursday evening in October 1988 would be in her sixties now, if she's still alive. The evening light on Meiji-dori would have been the same amber it always is in autumn, the ginkgo trees just starting to turn. She would have carried the magazine in a bag from somewhere expensive — not ostentatiously so, just well-made in the way things were before fast fashion trained everyone to accept disposability. I can see her clearly. I can feel the weight of the magazine in my own hands. And none of it is real. None of it happened to me. I'm grieving a moment I wasn't present for, in a city I wouldn't visit for another decade, and the grief is real even if the memory isn't.

That's the specific cruelty of this kind of nostalgia. It doesn't require your own experience. It feeds on atmosphere — on the light in a photograph, the typeface on a masthead, the particular grain of a printing process that no longer exists. The past doesn't need you to have been there. It just needs you to understand what was possible, and then to notice that it isn't anymore.

Fourteen months after this issue, the stock market crashed and budgets like these evaporated. The location shoots dried up. The photographers scattered into advertising or retreated into personal projects. The models moved to different markets. Marie Claire Japan continued, of course — magazines don't die the way people do, they just become thinner versions of themselves until someone finally switches off the light — but the specific alchemy of this issue, this moment, this convergence of talent and money and cultural ambition, was finished.

I keep returning to this cover because it captures the apex so precisely — the last autumn when taste and money occupied the same room without anyone noticing the ceiling was about to fall. And because of the gaze. Gail stares straight out of that cover with an expression that hasn't changed in thirty-seven years. Everyone around her — the editors, the advertisers, the readers, the economy that paid for all of it — moved on or collapsed or died. She's still there, looking directly at whoever picks it up, as if the photograph doesn't know what year it is. That's the unnerving thing about a great cover shot. It stares across time without ageing, without context, without any awareness that the world it was made for no longer exists. Her eyes don't know the Bubble burst. They don't know the magazine got thinner. They don't know that the woman who bought this copy at Kinokuniya is sixty-three now and probably hasn't thought about it in decades. They don't know that the model staring out across the decades will soon turn sixty.

And looking back at her reminds me that time doesn't negotiate. It doesn't care what you built or how beautiful it was. It moves forward, and everything it leaves behind becomes unreachable — not gradually, not mercifully, but completely, like a door closing in a room you didn't know you'd never enter again.

The Cover That Outlived the Bubble

Swimwear Thinking Made City-Appropriate

Sun Studios sat on the sixth floor of 628 Broadway, between Bleecker and Houston. On the afternoon of November 5th, 1992, Liza Bruce showed her Spring/Summer 1993 ready-to-wear collection there — a small presentation in a SoHo loft space with draped white backdrops and scattered petals along the runway floor. The petals were the only decorative gesture. Everything else was restraint.

Bruce had built her reputation on swimwear. Lycra bodysuits, minimal seaming, the kind of stretch engineering where the fabric does the structural work instead of the pattern cutting. By 1993 she was translating that logic directly into daywear, and the results looked like nothing else on the New York calendar that season. Second-skin turtlenecks in white ribbed jersey. Ankle-length wrap skirts in warm stone that opened at the front to reveal a lighter underlayer beneath. Column slip dresses with spaghetti straps that owed more to lingerie than to anything you'd normally see at 4 p.m. on a Thursday in Manhattan.

The silhouette logic across both key looks was what the research calls "column plus interruption" — long, close lines broken by a single slit, overlap, or strap. The body organised everything. There was no print, no hardware, no contrast piping. The garment's entire argument was that fit, line, and fabric behaviour were sufficient. The styling reinforced this completely: hair worn long and straight, negligible jewellery, neutral shoes. Nothing competed with the silhouette.

Bruce was stocked at Harvey Nichols in London and Barneys in New York. She wasn't obscure. Yet the collection that people remember — the sheer slip Kate Moss wore, now sitting in the V&A's underwear exhibition — came a season later, for S/S 1994. What the Spring 1993 show demonstrates is that the grammar was already there. The columnar slips, the engineered wraps, the "underwear-as-outerwear" proposition in a quiet, minimalist register. The Moss moment didn't emerge from nowhere. It intensified something Bruce had been building toward on a petal-strewn runway in SoHo, six months earlier.

That second-skin turtleneck, worn without apology

The Thousand-Token Gambit

OpenAI shipped Codex-Spark yesterday — a smaller GPT-5.3-Codex distilled for raw speed, running on Cerebras Wafer Scale Engine 3 hardware at over a thousand tokens per second. Four weeks from a $10 billion partnership announcement to a shipping product. 128k context, text-only, ChatGPT Pro research preview.

The pitch is flow state — edits so fast the latency disappears and you stay in the loop instead of watching a spinner. Anthropic is chasing the same thing with Opus fast mode. Everybody is.

I wrote about speed becoming the only moat last month. Codex-Spark is that thesis made silicon.

Why the Seventies and Eighties Feel Like a Threat

Sodium streetlights. That's where it starts for me. Not the event or the era but the colour — that flat, amber wash that turned every pavement into something theatrical and slightly wrong. Modern LEDs render the world in full spectrum. Sodium vapour didn't. It collapsed everything into two tones and left your brain to fill in the rest. The brain, filling in, often chose unease.

Something about the 1970s and 1980s registers as faintly sinister when viewed from here, and the reaction is common enough to suggest it isn't just personal. The textures of that period — film grain, tape hiss, CRT scanlines, the particular softness of analogue video — carry an ambiguity that digital media has largely eliminated. Modern footage is sharp, bright, and hyper-legible. Older material contains noise and shadow. The brain interprets visual ambiguity as uncertainty, and uncertainty triggers a low-grade vigilance that can settle in the body as discomfort. You're not scared, exactly. You're watchful.

Pacing compounds this. Television idents, public information films, educational broadcasts — they moved slowly and left gaps. Long pauses. Static framing. Sparse dialogue. Minimal scoring or none at all. Contemporary media fills nearly every second with motion or sound because dead air is considered a failure. Confronted with silence, the viewer projects meaning into the space, and what gets projected is rarely cheerful. The Protect and Survive films weren't trying to be frightening. Their flat, institutional delivery made them more disturbing than any horror film could manage.

That institutional tone is part of it too. Public messaging in Britain during this period was formal, impersonal, authoritative. It lacked the conversational warmth that modern branding considers mandatory. Government broadcasts addressed you like a patient being told to stay calm — which, if you weren't already anxious, was a reliable way to make you start. The emotional distance reads as cold now. Cold enough to feel ominous.

The broader atmosphere didn't help. Cold War nuclear anxiety sat underneath everything like a bass frequency you couldn't quite identify but could feel in your teeth. Industrial decline, unemployment, urban decay, terrorism coverage, moral panics — even if none of this was consciously processed at the time, it shaped the cultural mood. And cinema absorbed it. Halloween, The Exorcist, Videodrome, Threads — these films used suburban quiet, analogue distortion, and institutional spaces to generate dread. Their visual language has become fused with how we perceive the entire era. I can't look at a 1970s kitchen without half-expecting something terrible to happen in it.

There's also the matter of memory without metadata. Pre-internet life left fewer searchable records. Fewer photographs, no social media archives, no instant documentation. Memories from that period feel less indexed and more dreamlike as a result. Dreamlike states carry an uncanny quality almost by default — the sense that something is present but not quite accountable. Mark Fisher called this hauntology: the persistence of lost futures, visions imagined in the past that never materialised. Media from the period can feel like a signal from an abandoned timeline. That dislocation sits somewhere between nostalgia and dread, and I'm not convinced the two are as different as we'd like them to be.

I keep a folder of screenshots from 1970s Open University broadcasts. I'm not sure why. The lecturers stand in front of beige walls explaining thermodynamics in voices that sound like they're narrating the end of the world. Nothing about the content is threatening. Everything about the atmosphere is.

Three Folders and a Paywall

I've been using Dub.co for short link management on this blog — it handles the uneasy.in links that appear on every post. It's a well-made product. The dashboard is clean, the API is solid, and the analytics are genuinely useful. So when they announced folders last year as a way to organise links, I thought: great, I'll set up a proper structure. Blog links in one folder, project links in another, maybe a third for experiments.

Then I hit the limit. Three folders. That's it on the Pro plan at $25 a month.

Three folders is not an organisational system — it's a tease. It's the SaaS equivalent of giving someone a filing cabinet with three drawers welded shut and a price tag dangling from the fourth. If you want twenty folders, you'll need the Business plan at $75 a month. Fifty folders? That's $250 a month on Advanced. The jump from three to twenty costs you an extra $600 a year, and the primary thing you're paying for is the right to put links into named groups.
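The arithmetic is worth laying out. This Python snippet uses only the prices and folder limits quoted above. The per-folder figure is my own framing, not something Dub publishes.

```python
# Plan figures as quoted in this post: (USD per month, folder limit).
PLANS = {
    "Pro": (25, 3),
    "Business": (75, 20),
    "Advanced": (250, 50),
}


def annual_upgrade_cost(from_plan: str, to_plan: str) -> int:
    """Extra dollars per year for moving between two plans."""
    return 12 * (PLANS[to_plan][0] - PLANS[from_plan][0])


def monthly_cost_per_folder(plan: str) -> float:
    """Plan price divided by its folder allowance (an illustrative ratio only)."""
    price, folders = PLANS[plan]
    return price / folders


print(annual_upgrade_cost("Pro", "Business"))  # 600
for name in PLANS:
    print(f"{name}: ${monthly_cost_per_folder(name):.2f} per folder per month")
```

The ratio flatters the higher tiers, since those plans bundle other things. The jump that matters is still the $600 a year between three folders and twenty.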

What makes this especially irritating is how clearly it's designed as a conversion lever rather than a technical constraint. Folders are metadata. They're a label on a row in a database. There's no compute cost, no storage overhead, no bandwidth implication. The limit exists purely to create friction — to let you taste the feature just enough that you'll upgrade when you inevitably need a fourth folder. It's the same pattern I've seen with other services: let you in at a reasonable price, then gate the basics behind a tier that costs three times as much.

The broader context makes it worse. Pro also caps you at 25 tags and 1,000 new links per month. Business unlocks unlimited tags, which tells you exactly how much it costs Dub to lift those tag and folder limits: nothing. They're artificial scarcity on a digital product, and the pricing tiers are structured so that basic organisational hygiene requires a plan designed for teams of ten.

I ended up cancelling the whole folder exercise. Not because I couldn't afford the upgrade, but because I don't want to reward a pricing model that treats elementary features as premium upsells. I'll keep using Dub for what it does well — the API, the custom domain, the click tracking — but the folders will stay empty. Three is too few to be useful and too many to ignore.

This is the quiet frustration of modern SaaS. The product is good. The engineering is thoughtful. And then someone in product or finance decides that the difference between $25 and $75 should be the ability to put things in named groups. It's a small thing, but small things accumulate. Every service you use has its own version of this — the feature that's obviously trivial to provide but sits just above your current tier, daring you to pay more for what should have been included from the start.

I remain a paying customer. But I'm filing this complaint in one of my three allotted folders.