GPT-5.5 beats Claude at the one task that actually matters

For months, Claude and ChatGPT have been locked in a stalemate over which LLM is the quickest, the most powerful, the best. Up until now, many users felt that Anthropic’s Claude Opus 4.7 had a slight edge thanks to its larger context window, flexible writing styles, and superior data visualization, as well as some other key benefits. Well, that may be about to change.

This week, OpenAI officially released GPT-5.5, internally code-named “Spud”. But this new model isn’t as potato-like as its nickname would imply. According to OpenAI’s official launch data, this model is a fundamental redesign aimed at “agentic” performance — the ability for an AI to use a computer and plan ahead, rather than simply respond to manual prompts. Will this be enough to bring users back to ChatGPT after a mass exodus?

This is the model that will shift people back to OpenAI

And it has a strong starting pitch

The ultimate goal of AI differs depending on who you ask, but most people agree that the dream is an AI that can handle multi-step tasks without you holding its hand. This is where GPT-5.5 beats Claude, hands down.

On Terminal-Bench 2.0, which tests complex command-line workflows requiring planning, iteration, and tool coordination, GPT-5.5 achieved a staggering 82.7% accuracy. For comparison, Claude Opus 4.7 scored 69.4%, while Gemini 3.1 Pro scored 68.5%.

That’s a 13.3-percentage-point lead over Claude (roughly 19% higher in relative terms), but what does it actually mean? It’s the difference between an AI that suggests code and an “agentic” AI that can actually go into your system, run the terminal, and verify the fix. This leap in performance is exactly why AI benchmarks still matter, even if they’re often wrapped up in overcomplicated “tech-bro” jargon.
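For anyone who wants to check the math, here is a minimal sketch of how the gap works out, using only the Terminal-Bench 2.0 scores quoted above. The function name `gap` is just an illustrative choice, not something from OpenAI or the benchmark itself.

```python
# Terminal-Bench 2.0 scores as quoted in this article (percent accuracy).
scores = {"GPT-5.5": 82.7, "Claude Opus 4.7": 69.4, "Gemini 3.1 Pro": 68.5}


def gap(leader: float, rival: float) -> tuple[float, float]:
    """Return (percentage-point gap, relative improvement in percent)."""
    points = leader - rival            # absolute difference in score
    relative = points / rival * 100    # how much higher, relative to the rival
    return round(points, 1), round(relative, 1)


for name in ("Claude Opus 4.7", "Gemini 3.1 Pro"):
    points, relative = gap(scores["GPT-5.5"], scores[name])
    print(f"GPT-5.5 vs {name}: +{points} points, {relative}% relative")
```

The distinction matters because a gap in points and a gap in percent are easy to mix up: the first is a simple subtraction, the second depends on how high the rival scored to begin with.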

But does this higher intelligence mean slower response times? Apparently not: OpenAI claims it can match GPT-4’s per-token latency while being significantly smarter. It is also more “token efficient,” meaning it often uses fewer tokens to complete the same task because it understands intent more quickly.

On the OSWorld-Verified benchmark, which measures how well AI can operate a standard desktop OS, GPT-5.5 scored 78.7%, narrowly edging out Claude’s 78.0%.

What’s the catch?

It won’t be cheap.

In the OpenAI API, GPT-5.5 costs $5.00 per 1M input tokens, which is double the price of the previous generation. If you want the even more precise GPT-5.5 Pro, designed for high-stakes environments like legal research or data science, that price jumps to $30.00 per 1M input tokens.
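To get a feel for what those rates mean in practice, here is a rough sketch that estimates input-side cost only. It uses the two per-1M-token prices quoted above; output-token pricing isn't covered here, so it is left out, and the `input_cost` helper and the 200,000-token example are purely illustrative.

```python
# Input prices in USD per 1M tokens, as quoted in this article.
PRICE_PER_MILLION = {"GPT-5.5": 5.00, "GPT-5.5 Pro": 30.00}


def input_cost(tokens: int, price_per_million: float) -> float:
    """Estimate the input-token cost in USD (output tokens not included)."""
    return tokens * price_per_million / 1_000_000


# Hypothetical workload: 200,000 input tokens.
for model, price in PRICE_PER_MILLION.items():
    print(f"{model}: ${input_cost(200_000, price):.2f}")
```

Real bills will be higher once output tokens are added, so treat this as a floor rather than a forecast.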

API access for GPT-5.5 is rolling out in stages to paying subscribers first. GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, and GPT-5.5 Pro is rolling out to Pro, Business, and Enterprise users in ChatGPT from April 23, 2026. Will you be trying it out?
