Anthropic dropped Claude Opus 4.7 this week, and it's the kind of release that reads as both a real step up and a strategic placeholder. The company calls it a meaningful upgrade to its flagship AI model with better coding, sharper vision and a new ability to double-check its own work. But the more interesting subtext is what isn't shipping: Mythos Preview, the more capable internal model Anthropic is holding back for cybersecurity reasons.

I've been running Opus 4.6 daily for agent work, so 4.7 is the upgrade I actually care about this quarter. The short version: it's faster at equivalent quality, and it finally stops hallucinating fallbacks when data is missing.

What's Actually New in Opus 4.7

Anthropic says Claude Opus 4.7 is better at software engineering, following instructions, and completing real-world work, and that it is the company's most powerful generally available model. The release lands on a predictable rhythm: Opus 4.7 arrives two months after Opus 4.6, which arrived two months after Opus 4.5.

The headline capability gains sit in agentic workflows. Anthropic said the new model outperforms Claude Opus 4.6 across many use cases, including industry benchmarks for agentic coding, multidisciplinary reasoning, scaled tool use and agentic computer use. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.

Memory also gets a practical upgrade. "Opus 4.7 is better at using file system-based memory," the company says. "It remembers important notes across long, multi-session work, and uses them to move on to new tasks that, as a result, need less up-front context."

The xhigh Effort Tier and Task Budgets

One of the more useful knobs in this release is a new reasoning level. "Opus 4.7 introduces a new xhigh ('extra high') effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems," Anthropic said. The company's internal data shows that max effort yields the highest scores (approaching 75% on coding tasks), while xhigh offers a sweet spot between performance and token spend.
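One way to picture the tier ordering is an escalation ladder: start cheap and step up only when a check fails. The sketch below is a minimal illustration in Python, not Anthropic's API. The tier names come from this article; `EFFORT_LADDER`, `next_effort`, `run_with_escalation`, and the `attempt` callback are all hypothetical names, and how an effort level is actually passed in a request is left out on purpose.

```python
from typing import Callable, Optional

# Tier names as mentioned in the article, ordered cheapest to most thorough.
EFFORT_LADDER = ["low", "medium", "high", "xhigh", "max"]


def next_effort(current: str) -> Optional[str]:
    """Return the next tier up, or None if `current` is already the top tier."""
    idx = EFFORT_LADDER.index(current)
    return EFFORT_LADDER[idx + 1] if idx + 1 < len(EFFORT_LADDER) else None


def run_with_escalation(
    attempt: Callable[[str], bool],
    start: str = "low",
    ceiling: str = "xhigh",
) -> Optional[str]:
    """Try tiers from `start` up to `ceiling`; return the first tier that succeeds.

    `attempt` is a hypothetical callback: it runs the task at the given effort
    and returns True if the result passed whatever check you apply.
    """
    stop = EFFORT_LADDER.index(ceiling)
    effort: Optional[str] = start
    while effort is not None and EFFORT_LADDER.index(effort) <= stop:
        if attempt(effort):
            return effort
        effort = next_effort(effort)
    return None  # nothing at or below the ceiling succeeded
```

Capping the ladder at xhigh by default reflects the article's point that max is often overkill; raising `ceiling` to "max" is a one-argument change.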

Alongside that, Anthropic is addressing the real operational pain of agentic runs. The Claude API is introducing "task budgets" in public beta. This allows developers to set a hard ceiling on token spend for autonomous agents, ensuring that a long-running debugging session doesn't result in an unexpected bill.
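The article doesn't show the request shape for task budgets, so the sketch below is not that feature. It is a client-side analogue of the same idea, assuming your agent loop can see per-turn input and output token counts. `TokenBudget` and `run_agent` are hypothetical names, and the `turns` list stands in for real model calls.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class TokenBudget:
    """Tracks cumulative token usage against a ceiling (client-side only)."""

    ceiling: int
    spent: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one turn's usage to the running total."""
        self.spent += input_tokens + output_tokens

    @property
    def exhausted(self) -> bool:
        """True once cumulative usage has reached or passed the ceiling."""
        return self.spent >= self.ceiling


def run_agent(turns: Iterable[Tuple[int, int]], budget: TokenBudget) -> int:
    """Run turns until they are done or the budget is used up.

    Each item in `turns` is a simulated (input_tokens, output_tokens) pair.
    Returns the number of turns that actually ran.
    """
    completed = 0
    for input_tokens, output_tokens in turns:
        if budget.exhausted:
            break  # stop before starting another turn
        budget.record(input_tokens, output_tokens)
        completed += 1
    return completed


budget = TokenBudget(ceiling=1000)
ran = run_agent([(300, 200), (300, 200), (300, 200)], budget)
print(ran, budget.spent)
```

Because usage is only known after a turn finishes, a guard like this can overshoot by up to one turn. That makes it a soft ceiling, which is the gap a server-side hard limit would close.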

Two things to plan for on migration. First, Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens, roughly 1.0 to 1.35x depending on the content type. Second, Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings. This improves its reliability on hard problems, but it does mean it produces more output tokens.
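For migration planning, the tokenizer change can be turned into a quick range estimate. This is a back-of-the-envelope sketch: `token_range` is a hypothetical helper, and the 1.0 and 1.35 multipliers are the rough bounds quoted above, not measured values for any particular content type.

```python
from typing import Tuple


def token_range(
    old_tokens: int, low: float = 1.0, high: float = 1.35
) -> Tuple[int, int]:
    """Estimate the new token count range for input that used `old_tokens` before.

    `low` and `high` default to the approximate multipliers cited in the
    article; real ratios depend on the content.
    """
    if old_tokens < 0:
        raise ValueError("old_tokens must be non-negative")
    return (round(old_tokens * low), round(old_tokens * high))


# A prompt that used 100,000 tokens before the tokenizer update.
print(token_range(100_000))  # (100000, 135000)
```

Running this over your largest prompts shows which ones could drift toward context or budget limits after the switch.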

Benchmarks and the Mythos Shadow

This is where it gets interesting, because Anthropic is being unusually frank about where 4.7 sits in the pecking order. In a chart accompanying its announcement, Anthropic showed Opus 4.7 beating Opus 4.6, GPT-5.4 and Google's Gemini 3.1 Pro on a number of key benchmarks. But Opus 4.7 still falls short of the company's Mythos Preview model, which has only been released to a handpicked group of tech and cybersecurity companies.

The lead isn't dominant either. On directly comparable benchmarks, Opus 4.7 leads GPT-5.4 by only 7 to 4. Competitors like GPT-5.4 and Gemini 3.1 Pro still hold the lead in specific domains such as agentic search, where GPT-5.4 scores 89.3% to Opus 4.7's 79.3% (a 10-point gap), as well as in multilingual Q&A and raw terminal-based coding.

Where 4.7 actually shines is the stuff enterprises care about. On Anthropic's 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve. Analytics vendor Hex had a sharper take, saying the model correctly reports when data is missing instead of providing plausible-but-incorrect fallbacks, and that it resists dissonant-data traps that even Opus 4.6 falls for. It's a more intelligent, more efficient Opus 4.6: low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6.

The Cybersecurity Angle and Project Glasswing

The release is also a test vehicle for Anthropic's safety apparatus. "We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses," Anthropic said. "What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models."

There's a measurable cost to those guardrails. Opus 4.7 slips slightly behind Opus 4.6 on cybersecurity vulnerability reproduction, scoring 73.1% in benchmarking tests against its predecessor's 73.8%, a drop of 0.7 percentage points. Security professionals can access the model for legitimate cybersecurity work through Anthropic's new Cyber Verification Program.

Pricing and Availability

No surprises here. Pricing for Opus 4.7 starts at $5 per million input tokens and $25 per million output tokens, with up to 90% cost savings with prompt caching and 50% savings with batch processing. Claude Opus 4.7 is available across all of Anthropic's Claude products, its application programming interface and through cloud providers Microsoft, Google and Amazon.
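Those list prices are easy to turn into a rough estimator. The sketch below uses the $5 and $25 per-million rates from this article. Two things in it are assumptions: that cached input bills at the maximum 90% discount, and that the 50% batch discount applies to the whole bill. `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
INPUT_PER_MTOK = 5.00    # USD per million input tokens (from the article)
OUTPUT_PER_MTOK = 25.00  # USD per million output tokens (from the article)
CACHE_DISCOUNT = 0.90    # assumption: best case of "up to 90%" savings
BATCH_DISCOUNT = 0.50    # assumption: applied to the whole bill


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cached_fraction: float = 0.0,
    batch: bool = False,
) -> float:
    """Return an estimated cost in USD for one workload.

    `cached_fraction` is the share of input tokens served from the prompt
    cache (0.0 to 1.0). This ignores any cache-write charges.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    billable_input = input_tokens * (1 - CACHE_DISCOUNT * cached_fraction)
    cost = (
        billable_input / 1_000_000 * INPUT_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_PER_MTOK
    )
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return round(cost, 4)


# 1M input tokens and 200k output tokens at list price.
print(estimate_cost(1_000_000, 200_000))  # 10.0
```

Note that output tokens dominate quickly at a 5:1 price ratio, which is why the extra thinking at higher effort levels matters for the bill.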

Developer tooling gets upgraded in parallel. Within the Claude Code environment, the update brings a new /ultrareview command. Unlike standard code reviews that look for syntax errors, /ultrareview is designed to simulate a senior human reviewer, flagging subtle design flaws and logic gaps. GitHub is moving fast too. Over the coming weeks, Opus 4.7 will replace Opus 4.5 and Opus 4.6 in the model picker for Copilot Pro+.

Final Thoughts

The most useful thing in 4.7 isn't the benchmark delta against GPT-5.4. It's that the low-effort tier now matches what you used to pay medium effort for, and the xhigh setting gives you a real middle ground when max is overkill. For anyone running Claude inside agent loops, that changes the economics more than any leaderboard ranking.

What I'll watch next is how honest Anthropic stays about the Mythos comparison. Releasing a flagship model and openly telling your customers there's a better one you won't ship is an unusual move. It either becomes a credibility flywheel or starts feeling like a tease, depending on how long Project Glasswing keeps the good stuff gated.

What's your take on the xhigh tier and task budgets? Drop your thoughts in the comments.

FAQ

Is Claude Opus 4.7 more expensive than 4.6?

Not on list price. Pricing stayed at $5 per million input tokens and $25 per million output tokens, but the updated tokenizer can map the same input to roughly 1.0 to 1.35x as many tokens, so the effective cost of an identical prompt can rise.

How does Opus 4.7 compare to GPT-5.4 and Gemini 3.1 Pro?

It leads on agentic coding, scaled tool use, and financial analysis, but trails GPT-5.4 on agentic search (79.3% vs 89.3%) and on multilingual Q&A.

What is the new xhigh effort level?

A reasoning tier between high and max, designed to give developers a middle ground between deeper reasoning and lower latency or token cost.

Why is Opus 4.7 weaker at cybersecurity tasks than 4.6?

Anthropic deliberately reduced cyber capabilities during training and added automated safeguards, dropping vulnerability reproduction from 73.8% to 73.1%.

Where can I use Opus 4.7?

It's live across Claude apps, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot Pro+, Business, and Enterprise.