Anthropic shipped Claude Opus 4.7 on Thursday, calling it its most powerful generally available model, with gains in software engineering, instruction following, and completing real-world work. The release lands two months after Opus 4.6, keeping Anthropic on a steady cadence.

The catch: it still sits below Claude Mythos Preview, the restricted frontier model Anthropic is handing only to select cybersecurity partners under Project Glasswing.

What's New in Opus 4.7

Opus 4.7 is a meaningful upgrade to Anthropic's flagship AI model with better coding, sharper vision, and a new ability to double-check its own work. Anthropic also highlights improvements to instruction following, multimodal support, real-world work, and memory, noting the model is better at using file system-based memory and remembers notes across long, multi-session work.

For Claude Code users, there's a new /ultrareview command designed to simulate a senior human reviewer, flagging subtle design flaws and logic gaps rather than surface-level syntax issues.

Benchmarks Against GPT-5.4 and Gemini 3.1 Pro

Opus 4.7 beats OpenAI's GPT-5.4 and Google's Gemini 3.1 Pro on key benchmarks including agentic coding, scaled tool use, agentic computer use, and financial analysis. It's not a clean sweep, though. Competitors still lead in specific domains such as agentic search, where GPT-5.4 scores 89.3% to Opus 4.7's 79.3%, a 10-point gap, as well as in multilingual Q&A and raw terminal-based coding.

On Anthropic's own evals, Opus 4.7 lifted resolution by 13% over Opus 4.6 on a 93-task coding benchmark, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve.

The New xhigh Effort Level

Anthropic added a new reasoning tier just below the top of the scale. "Opus 4.7 introduces a new xhigh ('extra high') effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems," Anthropic said.

To stop runaway token bills, the Claude API is introducing "task budgets" in public beta, allowing developers to set a hard ceiling on token spend for autonomous agents so a long-running debugging session doesn't trigger an unexpected bill.
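
To make the idea of a hard ceiling concrete, here is a minimal client-side sketch in Python. It is not the Claude API's task budgets feature, whose parameters aren't described here. The names `TokenBudget`, `BudgetExceeded`, and `run_agent` are hypothetical, and the per-step token costs are made-up inputs standing in for whatever usage a real agent loop would report.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a step would push total spend past the ceiling."""


@dataclass
class TokenBudget:
    """Hypothetical client-side tracker; not part of any Anthropic SDK."""
    ceiling: int      # hard cap on total tokens for the whole task
    spent: int = 0    # tokens charged so far

    def charge(self, tokens: int) -> int:
        """Record usage and return the tokens remaining; never overspend."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.spent + tokens > self.ceiling:
            raise BudgetExceeded(
                f"step needs {tokens} tokens, only {self.ceiling - self.spent} left"
            )
        self.spent += tokens
        return self.ceiling - self.spent


def run_agent(step_costs: list[int], budget: TokenBudget) -> int:
    """Run steps in order and stop cleanly at the first one that won't fit.

    `step_costs` is an assumed list of per-step token counts. Returns the
    number of steps that completed within the budget.
    """
    completed = 0
    for cost in step_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded:
            break  # halt the session instead of running up the bill
        completed += 1
    return completed
```

With a ceiling of 1,000 tokens and three steps of 400 tokens each, the loop finishes two steps and stops before the third. The point of a hard ceiling is that the limit is enforced before the spend happens, not noticed afterward.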

Pricing and Availability

Pricing holds steady. Opus 4.7 starts at $5 per million input tokens and $25 per million output tokens, with savings of up to 90% from prompt caching and 50% from batch processing. It's available across Claude products, the Claude API, Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry.

One caveat for migrators: Opus 4.7 uses an updated tokenizer, and the same input can map to more tokens, roughly 1.0–1.35× depending on content type.
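
To see what the tokenizer change could mean for a bill, here is a rough back-of-the-envelope estimate in Python. It uses only the list prices and the 1.0–1.35× range quoted above. The function name and the example workload are hypothetical, and it assumes the multiplier applies to input tokens only. Caching and batch discounts are left out.

```python
INPUT_USD_PER_MTOK = 5.00    # list price per million input tokens
OUTPUT_USD_PER_MTOK = 25.00  # list price per million output tokens


def estimate_cost(input_tokens: int, output_tokens: int,
                  tokenizer_multiplier: float = 1.0) -> float:
    """Estimate spend in USD at list prices.

    `tokenizer_multiplier` scales the input token count to model the same
    text mapping to more tokens (assumed range: 1.0 to 1.35).
    """
    scaled_input = input_tokens * tokenizer_multiplier
    return (scaled_input * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 2M input tokens, 200K output tokens.
    best = estimate_cost(2_000_000, 200_000, tokenizer_multiplier=1.0)
    worst = estimate_cost(2_000_000, 200_000, tokenizer_multiplier=1.35)
    print(f"best case:  ${best:.2f}")   # $15.00
    print(f"worst case: ${worst:.2f}")  # $18.50
```

For that example workload, the same text costs between $15.00 and $18.50 at identical per-token prices. "Unchanged pricing" and "unchanged bill" are not quite the same thing, so it's worth measuring token counts on your own prompts.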

Final Thoughts

The most interesting piece here isn't the benchmark chart. It's that Anthropic is releasing Opus 4.7 with safeguards that automatically detect and block requests indicating prohibited or high-risk cybersecurity uses, treating this model as the testbed for guardrails it eventually wants on Mythos.

The /ultrareview command and task budgets feel like the right product instincts for where agentic coding is heading. If you're already on Opus 4.6, the 13% jump on hard coding tasks is worth a weekend of retesting prompts, especially given the tokenizer change.

What do you think of the Mythos strategy? Drop your thoughts in the comments.

FAQ

Is Claude Opus 4.7 more expensive than Opus 4.6?

No. Pricing is unchanged at $5 per million input tokens and $25 per million output tokens, though the new tokenizer may count the same text as more tokens, roughly 1.0–1.35× depending on content type.

How does Opus 4.7 compare to GPT-5.4 and Gemini 3.1 Pro?

It leads on agentic coding, tool use, and financial analysis, but trails GPT-5.4 on agentic search (79.3% vs 89.3%) and on multilingual Q&A.

What is the new xhigh effort level?

A reasoning tier between high and max, giving finer control over the tradeoff between depth of reasoning and response latency.

Where can I use Claude Opus 4.7?

Across Claude apps (Pro, Max, Team, Enterprise), the Claude API, Amazon Bedrock, Google Vertex AI, Microsoft Foundry, and GitHub Copilot.

Why is Claude Mythos not publicly available?

Anthropic keeps Mythos restricted due to advanced cyber capabilities, using Opus 4.7 to field-test safeguards before any broader Mythos-class release.