ComparisonMay 28, 20256 min read

Claude 4 vs GPT-5: Our First Look at the Next Generation

Both Anthropic and OpenAI shipped major model upgrades in the same week. We ran both through our full prompt suite to find out which one came out ahead.

In the span of a single week in late May 2025, both Anthropic and OpenAI shipped what they described as generational leaps. Claude 4 arrived first, on a Tuesday morning, with a quietly confident blog post and a set of internal benchmarks that drew immediate skepticism. GPT-5 followed three days later, louder and with a live stream. We spent the better part of two weeks running both models through the same prompt suite we use for the Rankly leaderboard. Here is what we found.

Writing Quality

This is the category where Claude 4 pulls ahead most clearly. Anthropic has always prioritized long-form coherence, and the improvement from Claude 3.5 Sonnet to Claude 4 is immediately noticeable. Articles, reports, and nuanced explanations feel like they were written by a careful editor rather than assembled from likely-next-token predictions. Tone stays consistent across thousands of words. Arguments develop instead of repeating.

GPT-5 is not bad at writing. It is, in fact, better than GPT-4o. But it still exhibits a tendency toward a certain kind of polished genericness, particularly in business and professional writing tasks. When asked to write a 1,500-word feature on a technical topic, GPT-5 often produces something that reads competently but never quite surprises. Claude 4 surprises more often, and in a good way.

Coding Abilities

GPT-5 edges ahead in coding tasks, though the gap is narrower than OpenAI's benchmarks would suggest. On our internal test suite, which covers Python, TypeScript, SQL, and systems code in C, GPT-5 solved a higher percentage of hard problems correctly on first pass. The difference was most pronounced in multi-file refactoring and in tasks that require understanding implicit constraints not stated in the prompt.

Claude 4 is an excellent coding model. Its explanations of what it is doing are clearer, and it is more likely to flag potential issues rather than silently work around them. For pair-programming use cases, many developers will prefer it. For raw completion accuracy on competitive-style problems, GPT-5 currently has the edge.

Speed and Response Time

GPT-5 is meaningfully faster. Median time-to-first-token in our tests was around 800ms for GPT-5 versus 1.2 seconds for Claude 4, and GPT-5 sustained higher token throughput on long completions. This matters for applications where latency is part of the user experience, and it matters for agentic workflows where a model makes dozens of calls in sequence.

Anthropic has typically been slower to optimize inference after a model launch, then improved substantially over the following months. Claude 3.5 Sonnet, for example, got noticeably faster between its June and October 2024 versions. Expect a similar trajectory here.

Pricing

Both models launched at nearly identical prices: $15 per million input tokens and $75 per million output tokens for their flagship tiers. Both also offer smaller, faster variants at lower price points (Claude 4 Haiku, GPT-5 Mini) that are priced more aggressively. For most production applications, the cost difference between the two flagships is negligible at moderate volume and meaningful at very high volume, where small optimizations compound.

Verdict

There is no clean winner here, which is the most honest thing we can say. Claude 4 is the better writing model. GPT-5 is the faster, slightly stronger coding model. If your use case is content generation, document analysis, or anything that benefits from careful prose, Claude 4 is worth the try. If you are building coding tools, autonomous agents, or anything where raw speed is a constraint, GPT-5 currently has an edge.

We will update this comparison in sixty days when both models have had time to mature on the deployment side. The current Rankly scores are: Claude 4 at 94/100, GPT-5 at 92/100. Those numbers will shift.

Rankly AI editorial team

Writing Quality

Coding Abilities

Speed and Response Time

Pricing

Verdict

More from Rankly AI