Claude Opus 4.8 Is Here | It Just Topped Every Benchmark
Anthropic released Claude Opus 4.8 on May 28, 2026, flat pricing at $5/$25 per million tokens, 88.6% SWE-bench, and 4x fewer code flaws than its predecessor.
On May 28, 2026, Anthropic released Claude Opus 4.8, their fastest update cadence to date, arriving just 41 days after Opus 4.7. The release is notable not for a single headline number, but for a combination of improvements that collectively shift the frontier: stronger agentic coding, better financial analysis, improved computer-use accuracy, and a new honesty posture that makes the model meaningfully more trustworthy in production.
Across key benchmarks, Opus 4.8 tops both OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro — the two other frontier models it competes with most directly. On SWE-bench Verified, Opus 4.8 scores 88.6%, a meaningful jump over its predecessor and ahead of the current published scores for both competitors.
What Is New in Opus 4.8
The most discussed improvement is in agentic workflows, long, multi-step tasks where a model must plan, execute, verify, and recover from errors across many tool calls. Anthropic's internal evals show Opus 4.8 completing significantly more complex agent tasks end-to-end compared to 4.7, with fewer abandoned steps and better error recovery.
Financial analysis is another area that Anthropic called out specifically: Opus 4.8 handles multi-document financial reasoning (earnings analysis, comparative modeling, regulatory filings) with notably higher accuracy than previous versions. For teams using Claude in finance or legal research contexts, this is a practical improvement.
Computer use, Anthropic's capability for models to interact with desktop interfaces, also improved. The model is faster to identify clickable elements and less likely to take unnecessary intermediate steps. These gains are incremental, not transformative, but they compound across long tasks.
Pricing: Flat Rates and a New Fast Mode
The headline commercial decision in this release is that Anthropic kept standard pricing flat at $5 per million input tokens and $25 per million output tokens, unchanged from Opus 4.7. Given the capability improvements, this is effectively a price reduction on a per-capability basis.
More significantly, Anthropic introduced a Fast mode for Opus 4.8, priced at approximately one third the cost of the standard 4.7 equivalent. Fast mode trades some output quality for substantially lower latency and cost, a useful option for high-volume agentic workflows where speed matters more than marginal quality gains.
Honesty and Code Reliability
The most striking claim in the release is around code honesty: Anthropic reports that Opus 4.8 is roughly four times less likely than Opus 4.7 to let flaws in its own code pass without remarking on them. In practice, this means the model is more likely to proactively flag bugs, edge cases, or suboptimal choices in code it writes, rather than presenting a clean solution that contains hidden issues.
This matters a lot for coding agents. A model that silently produces broken code is worse than useless in agentic settings, it wastes compute and produces hard-to-debug failures. A model that flags its own limitations is a safer collaborator, even if it occasionally produces more verbose responses.
Impact on the Rankly Leaderboard
On the Rankly RQ leaderboard, Claude's score is expected to update within 24 hours of release to reflect the Opus 4.8 improvements. The primary blocs that will move are B1 (Intelligence Generalis, driven by benchmark results) and B3 (Quality / Fiabilitas, driven by the hallucination rate reduction and code reliability improvements).
The f(A) Accessibilitas score, which tracks pricing, latency, and availability, should remain stable or improve slightly: pricing held flat while Fast mode added a lower-cost option, and latency in Fast mode is meaningfully better than 4.7 standard. The net effect is a higher RQ score across all dimensions.
Verdict
Claude Opus 4.8 is the strongest general-purpose AI model available as of its release date. The combination of flat pricing, meaningfully improved agentic coding, and a substantially more honest posture on code quality makes it the default recommendation for any team building on top of frontier AI.
GPT-5.5 remains competitive on speed and has a larger ecosystem of integrations. Gemini 3.1 Pro keeps its edge in Google Workspace contexts and multimodal tasks that involve Google-native data sources. But on pure reasoning, coding reliability, and production safety, Opus 4.8 takes the lead, and the pricing decision makes that lead unusually accessible.
Rankly AI editorial team
More news