Gemini 3.1 Pro and the 0.1 That Matters

Google's first ".1" increment landed today. Previous Gemini updates jumped by 0.5 — this one is smaller on paper and larger in practice. Gemini 3.1 Pro scores 77.1% on ARC-AGI-2, up from 3 Pro's 38%, and leads 13 of 16 benchmarks tested against Claude Opus 4.6 and GPT-5.2. GPQA Diamond: 94.3%. SWE-Bench Verified: 80.6%. The reasoning gains from Deep Think have clearly trickled down into the base model.

What stands out isn't any single number — it's the velocity. Gemini 3 Pro shipped in November. Four months later, the replacement doubles its reasoning score. Claude Sonnet 4.6 still edges it on a couple of agentic tasks, and GPT-5.3 Codex holds ground on Terminal-Bench, but the overall picture is Google pulling ahead on the metrics that get cited most often.

Available now in the Gemini app, AI Studio, and Vertex AI. Preview only, for the moment.

Sources:

Gemini 3.1 Pro Announcement — Google Blog
Google Announces Gemini 3.1 Pro — 9to5Google
Gemini 3.1 Pro Benchmarks — OfficeChai

Plutonic Rainbows

Gemini 3.1 Pro and the 0.1 That Matters

Related Entries