Super App, Same Engine
April 25, 2026 · uneasy.in/b9f40b2
OpenAI shipped GPT-5.5 on Thursday. The model is available now to paying ChatGPT and Codex users in standard, thinking, and pro flavours, with API access to follow. The pitch from Greg Brockman in the press briefing was that this one is built for work: coding, computer use, and research, and that it can take an unclear problem and decide what needs to happen next without much hand-holding.
Stripped of the briefing-room varnish, the message is that ChatGPT, Codex, and the browser tooling are converging into a single product, and 5.5 is the engine that makes the convergence plausible. Brockman called it the foundation for "how we're going to do computer work going forward." That is super-app language, and it has been the open secret of OpenAI's product strategy for about a year. The model release is the part that gets the headlines; the strategy is the part that decides whether the next twelve months go well.
I am sympathetic to the ambition. The frustration of using six different AI surfaces to get one task done is real, and the seam-stitching tax adds up. A single thing that opens your browser, edits your repo, runs your tests, and writes the PR description is genuinely useful, more useful than another point on a benchmark. The hard part has never been the demo. The hard part has always been getting the model to know when to stop, when to ask, when to fail loudly rather than quietly produce something broken.
5.5 is priced at $5 per million input tokens and $30 per million output, which is GPT-5 territory and roughly an order of magnitude above V4 Pro. That is fine if the agentic capability is genuinely a step up, and a problem if it is not. Computer-use agents burn output tokens prodigiously. A single half-decent coding session can produce tens of thousands of tokens of tool calls, reasoning traces, and revisions. Multiply that by an enterprise rollout and the unit economics get scary fast, particularly when a Chinese open-weight model can run the same loop, less well, for pennies.
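To make the unit economics concrete, here is a back-of-envelope sketch at the list prices above. The per-session and rollout numbers are illustrative assumptions, not measurements; only the $5/$30 per-million rates come from the announcement.

```python
# Back-of-envelope cost for an agentic coding session at GPT-5.5 list prices.
# Per-session token counts and rollout sizes are illustrative assumptions.

INPUT_PRICE = 5.00 / 1_000_000    # dollars per input token  ($5 / M)
OUTPUT_PRICE = 30.00 / 1_000_000  # dollars per output token ($30 / M)

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session at list prices."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Assume a half-decent coding session: ~200k input tokens of context
# re-reads, plus ~50k output tokens of tool calls, reasoning, revisions.
per_session = session_cost(200_000, 50_000)
print(f"per session: ${per_session:.2f}")  # $2.50

# Assumed enterprise rollout: 500 engineers, 10 sessions/day, 21 working days.
monthly = per_session * 500 * 10 * 21
print(f"per month:   ${monthly:,.0f}")     # $262,500
```

Even with these modest assumptions, output tokens dominate the bill, and an open-weight model running the same loop at a tenth of the price changes the monthly figure by a quarter of a million dollars.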
The other thing worth noticing is the cadence. Anthropic shipped Opus 4.7 the week before. DeepSeek previewed V4 the day after. CNET wrote it up as an "arms race," which is the laziest possible framing but, this week at least, accurate. Three frontier releases inside eight days, all pitching some flavour of agentic coding as the headline capability, all aiming at the same enterprise budget. The dispersion of "best at coding" across labs keeps narrowing. So does the differentiation.
Tom's Guide ran a 5.5 versus Opus 4.7 head-to-head and reported seven wins for Claude on seven impossible tasks. Single-evaluator shootouts are noise more than signal, but the noise is itself informative: nobody outside the labs is sure which model wins on which kind of work right now, and the customer-side answer is increasingly "whichever one we already have a contract with." That is not where OpenAI wants to be when it is asking for super-app trust.
The model probably is good. The strategy probably needs more than a model.
Sources:
- OpenAI releases GPT-5.5, bringing company one step closer to an AI 'super app' — TechCrunch
- AI Arms Race Accelerates With New Models from OpenAI, DeepSeek and Anthropic — CNET
- 7-0 wipeout: I put ChatGPT-5.5 vs Claude 4.7 through 7 impossible tests — Tom's Guide
- DeepSeek's Newest Models Take on Silicon Valley at a Fraction of the Cost — Gizmodo