Plutonic Rainbows

The $1-an-Hour Frontier Model

MiniMax released M2.5 this week, and the number I keep returning to isn't the benchmark score. It's the price: one dollar per hour of continuous generation at 100 tokens per second. That's the Lightning variant; the standard version comes in at roughly half that.

The benchmarks are strong enough to make the pricing genuinely strange. SWE-Bench Verified: 80.2%, which puts it within 0.6 points of Claude Opus 4.6 at 80.8%. On Multi-SWE-Bench — the multilingual coding benchmark — M2.5 actually leads at 51.3% versus Opus's 50.3%. Tool calling scores 76.8%, which is 13 points clear of Opus. These aren't cherry-picked metrics from a press release. OpenHands ran independent evaluations and confirmed the numbers hold up.

The architecture is Mixture of Experts — 230 billion parameters total, 10 billion active per token. That's how you get frontier performance at commodity pricing. MiniMax trained it using what they call Forge, a reinforcement learning framework running across 200,000 simulated real-world environments. Their custom RL algorithm — CISPO — claims a 40x speedup over standard approaches. Whether that number survives independent scrutiny, I don't know. However, the outputs speak for themselves.
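
If you haven't seen the trick, a minimal sketch of top-k expert routing, the general mechanism behind a "230 billion total, 10 billion active" model, looks like this. The dimensions and expert counts below are illustrative, not MiniMax's actual configuration:

    # Minimal sketch of top-k Mixture-of-Experts routing. Sizes are
    # illustrative; MiniMax's real layers are far larger and more involved.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoELayer(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):  # x: (tokens, d_model)
            weights, idx = self.router(x).topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)  # normalise over the chosen experts only
            out = torch.zeros_like(x)
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * expert(x[mask])
            return out

    x = torch.randn(8, 512)      # a batch of 8 token embeddings
    print(MoELayer()(x).shape)   # torch.Size([8, 512])

The router is the whole game: each token pays the compute cost of only the experts it is routed to, so inference cost tracks active parameters while model capacity tracks the total.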

The weights are fully open on HuggingFace. You can download M2.5 right now and run it locally. This is the part that matters more than any single benchmark. When DeepSeek dropped R1 as open source thirteen months ago, it triggered genuine panic in Silicon Valley. MiniMax is doing the same thing but with a model that competes at the very top of the leaderboard, not just near it.
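
Caveats on "run it locally": the repo id below is my guess at the naming, and a 230-billion-parameter checkpoint wants hundreds of gigabytes of GPU memory even at reduced precision, so local means a multi-GPU box rather than a laptop. The loading itself is the standard transformers flow:

    # Hypothetical loading sketch. The repo id is an assumption; check
    # MiniMax's actual HuggingFace organisation page before running this.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "MiniMaxAI/MiniMax-M2.5"  # assumed name, verify before use
    tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo,
        torch_dtype="auto",      # keep the checkpoint's native precision
        device_map="auto",       # shard across whatever GPUs are available
        trust_remote_code=True,  # MoE models often ship custom modelling code
    )

    prompt = "Write a function that deduplicates a list while preserving order."
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    print(tok.decode(out[0], skip_special_tokens=True))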

MiniMax itself is an interesting company to watch. They IPO'd in Hong Kong in January, raising HK$4.8 billion. The stock has more than tripled since. Over 70% of their revenue comes from overseas markets — primarily through Talkie, their companion app, and Hailuo, their video generation tool. CEO Yan Junjie recently met with Premier Li Qiang. This isn't a scrappy lab operating out of a garage. It's a well-funded operation with state-level attention.

What I find myself thinking about: the cost differential between M2.5 and the American frontier models. Opus 4.6 charges $5/$25 per million input/output tokens. M2.5 charges $0.15/$1.20, which works out to roughly 33x cheaper on input and 21x on output. At those margins, the question isn't whether open-weight models are good enough. It's whether the closed models can justify the premium when the gap is this thin.
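
The arithmetic behind those figures is worth checking; treating an hour of continuous generation as output tokens only is my simplification, but it lines up with the headline numbers:

    # Back-of-envelope check on the pricing, output tokens only.
    tokens_per_hour = 100 * 3600            # 100 tok/s for an hour = 360,000 tokens

    m25_in, m25_out = 0.15, 1.20            # $ per million tokens
    opus_in, opus_out = 5.00, 25.00

    print(f"M2.5, one hour flat out: ${tokens_per_hour / 1e6 * m25_out:.2f}")   # ~$0.43
    print(f"Opus, the same workload: ${tokens_per_hour / 1e6 * opus_out:.2f}")  # ~$9.00
    print(f"input ratio:  {opus_in / m25_in:.0f}x")                             # 33x
    print(f"output ratio: {opus_out / m25_out:.0f}x")                           # 21x

That $0.43 is also where the opening's half-a-dollar figure for the standard version comes from.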

Two Billion in Efficiency Savings and What Gets Lost

Barclays posted £9.1 billion in pre-tax profit for 2025 — up 13% — and CEO C.S. Venkatakrishnan used the results announcement to outline an AI-driven efficiency programme targeting £2 billion in gross savings by 2028. Fraud detection, client analytics, internal process automation. Fifty thousand staff getting Microsoft Copilot, doubling in early 2026. Dozens of London roles relocated to India. A £1 billion share buyback and £800 million dividend to round things off. The shareholders are happy. The spreadsheet is immaculate.

I don't doubt the savings are real. Every bank running these numbers is finding the same thing: operations roles that involve documents, repeatable steps, and defined rules are precisely where large language models excel. Wells Fargo is already budgeting for higher severance in anticipation of a smaller 2026 workforce. JPMorgan reports 6% productivity gains in AI-adopting divisions, with gains in operations roles projected to reach 40-50%. Goldman Sachs has folded AI workflow redesign directly into its headcount planning. This isn't speculative anymore. The back offices are getting thinner.

What bothers me is the framing. "Efficiency" is doing a lot of heavy lifting in these announcements. When Barclays says it will "harness new technology to improve efficiency and build segment-leading businesses," what that means in practice is fewer people answering phones, fewer people reviewing transactions, fewer people in the building. The GenAI colleague assistant that "instantly provides colleagues with the information needed to support customers" is, by design, an argument for needing fewer colleagues. The call handling times go down. Then the headcount follows.

The banking industry's own estimates are stark. Citigroup found that 54% of financial jobs have high automation potential — more than any other sector. McKinsey projects up to 20% net cost reductions across the industry. Yet 76% of banks say they'll increase tech headcount because of agentic AI. The jobs don't disappear. They migrate — from the person who knew the process to the person who maintains the model that replaced the process. Whether that's a net positive depends entirely on which side of the migration you're standing on.

Barclays will likely hit its targets. The efficiency savings will materialise. The return on tangible equity will climb toward that 14% goal. The question nobody at the investor presentation is asking — because it isn't their question to ask — is what a bank actually is when you've automated the parts where humans used to make judgement calls about other humans. A fraud model is faster than a fraud analyst. It's also completely indifferent to context, to the phone call where someone explains they've just been scammed and needs a person, not a pipeline, to understand what happened.

Two billion pounds is a lot of understanding to optimise away.

175,000 Open Doors

SentinelOne and Censys mapped 175,000 exposed AI hosts across 130 countries. Alibaba's Qwen2 sits on 52% of multi-model systems, paired with Meta's Llama on over 40,000 of them. Nearly half advertise tool-calling — meaning they can execute code, not just generate it. No authentication required.
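
To make "no authentication" concrete, here is roughly what discovery looks like. The host below is a placeholder address, and the routes are the common Ollama and OpenAI-compatible ones, which is an assumption about what these exposed stacks are running:

    # Sketch of why "exposed" means exposed: an unauthenticated serving
    # endpoint will enumerate its models for anyone who asks. The host
    # is a TEST-NET placeholder, not a real target.
    import requests

    host = "http://203.0.113.7:11434"  # placeholder; 11434 is Ollama's default port

    r = requests.get(f"{host}/api/tags", timeout=5)  # Ollama model listing, no token needed
    if r.ok:
        for m in r.json().get("models", []):
            print("serving:", m["name"])

    r = requests.get(f"{host}/v1/models", timeout=5)  # OpenAI-compatible route on many stacks
    if r.ok:
        print("models:", [m["id"] for m in r.json().get("data", [])])

Combine that with advertised tool-calling and the same open door fronts a code-execution path, which is what turns an inventory problem into an incident.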

While Western labs retreat behind API gates and safety reviews, Chinese open-weight models fill the vacuum on commodity hardware everywhere. The guardrails debate assumed someone controlled the deployment surface. Nobody does.
