MiniMax released M2.5 this week, and the number I keep returning to isn't the benchmark score — it's the price. One dollar per hour of continuous generation at 100 tokens per second. That's the Lightning variant; the standard version costs half of even that.
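That hourly figure converts to a per-token number with simple arithmetic. A quick back-of-envelope, using only the figures quoted above (note that hourly throughput pricing and per-token API rates are separate billing schemes, so this is an effective rate, not a published one):

```python
# Convert "$1/hour at 100 tokens/sec" into an effective per-million-token cost.
# Inputs are the article's quoted figures for the Lightning variant.

TOKENS_PER_SEC = 100
PRICE_PER_HOUR_LIGHTNING = 1.00  # USD/hour, Lightning variant

tokens_per_hour = TOKENS_PER_SEC * 3600  # 360,000 tokens generated per hour
cost_per_million = PRICE_PER_HOUR_LIGHTNING / tokens_per_hour * 1_000_000

print(f"{tokens_per_hour:,} tokens/hour")
print(f"${cost_per_million:.2f} per million tokens (Lightning)")
print(f"${cost_per_million / 2:.2f} per million tokens (standard, at half price)")
```

That works out to roughly $2.78 per million tokens on the hourly plan, in the same low-single-dollar band as the per-token API pricing discussed below.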

The benchmarks are strong enough to make the pricing genuinely strange. SWE-Bench Verified: 80.2%, which puts it within 0.6 points of Claude Opus 4.6 at 80.8%. On Multi-SWE-Bench — the multilingual coding benchmark — M2.5 actually leads at 51.3% versus Opus's 50.3%. Tool calling scores 76.8%, which is 13 points clear of Opus. These aren't cherry-picked metrics from a press release. OpenHands ran independent evaluations and confirmed the numbers hold up.

The architecture is Mixture of Experts — 230 billion parameters total, 10 billion active per token. That's how you get frontier performance at commodity pricing. MiniMax trained it using what they call Forge, a reinforcement learning framework running across 200,000 simulated real-world environments, and claims a 40x speedup for its custom RL algorithm, CISPO, over standard approaches. Whether that number survives independent scrutiny, I don't know. The outputs, however, speak for themselves.
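The 230B-total / 10B-active split is what sparse routing buys you: a small learned router scores every expert for each token, and only the top few experts actually run. Here is a minimal sketch of top-k gating — the expert count, k, and dimensions are illustrative assumptions, not M2.5's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 32   # assumption for illustration; not M2.5's real expert count
TOP_K = 2          # number of experts actually executed per token (assumed)
D_MODEL = 64

# Router: a learned linear layer producing one score per expert.
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))
# Each expert: its own feed-forward weights (reduced here to one matrix).
experts = rng.standard_normal((NUM_EXPERTS, D_MODEL, D_MODEL)) * 0.02

def moe_forward(x):
    """Route one token through only its top-k experts."""
    logits = x @ router_w                 # score each expert: shape (NUM_EXPERTS,)
    top = np.argsort(logits)[-TOP_K:]     # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the chosen k only
    # Only k of the NUM_EXPERTS weight matrices are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_forward(token)
print(f"active experts per token: {TOP_K}/{NUM_EXPERTS} "
      f"(~{TOP_K / NUM_EXPERTS:.0%} of expert parameters)")
```

The design point is that compute cost scales with *active* parameters, not total: for M2.5 that ratio is 10B/230B, about 4%, which is where the commodity pricing comes from.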

The weights are fully open on HuggingFace. You can download M2.5 right now and run it locally. This is the part that matters more than any single benchmark. When DeepSeek dropped R1 as open source thirteen months ago, it triggered genuine panic in Silicon Valley. MiniMax is doing the same thing but with a model that competes at the very top of the leaderboard, not just near it.

MiniMax itself is an interesting company to watch. They IPO'd in Hong Kong in January, raising HK$4.8 billion. The stock has more than tripled since. Over 70% of their revenue comes from overseas markets — primarily through Talkie, their companion app, and Hailuo, their video generation tool. CEO Yan Junjie recently met with Premier Li Qiang. This isn't a scrappy lab operating out of a garage. It's a well-funded operation with state-level attention.

What I find myself thinking about: the price gap between M2.5 and the American frontier models. Opus 4.6 charges $5/$25 per million tokens (input/output); M2.5 charges $0.15/$1.20, which is about 33x cheaper on input and 21x on output. At those margins, the question isn't whether open-weight models are good enough. It's whether the closed models can justify the premium when the gap is this thin.
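The quoted per-million-token prices make the ratio straightforward to check:

```python
# Ratio of the quoted per-million-token API prices (input, output).
opus = {"input": 5.00, "output": 25.00}   # Opus 4.6, USD per 1M tokens
m25  = {"input": 0.15, "output": 1.20}    # M2.5, USD per 1M tokens

for kind in ("input", "output"):
    ratio = opus[kind] / m25[kind]
    print(f"{kind}: {ratio:.1f}x cheaper")
```

That comes out to roughly 33x on input and 21x on output.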

Sources: