Breaking
Neural Interfaces

Anthropic releases cheaper AI model upgrade

By Lorenzo Ferretti 6 min read
Anthropic releases cheaper AI model upgrade - ai model
Anthropic releases cheaper AI model upgrade

Anthropic released Claude Opus 4.8 on Tuesday, upgrading its flagship model at the same price as its predecessor while introducing a dramatically cheaper “fast mode” tier and a new feature that lets the model spawn hundreds of parallel subagents for codebase-scale work.

The model is available immediately across the company’s surfaces — claude.ai, Claude Code, the API, and Cowork — at unchanged pricing: $5 per million input tokens and $25 per million output tokens. Developers can call it as claude-opus-4-8.

Fast mode pricing drops 3X

The headline efficiency story is fast mode. It slashed the price of running Opus 4.8 in fast mode — where the model produces tokens at roughly 2.5 times normal speed — to $10 per million input tokens and $50 per million output tokens, down from $30 and $150 for Opus 4.7.

That’s a 3X reduction from the fast-mode pricing of previous models, and brings high-throughput inference within reach of latency-sensitive production workloads. Fast mode is available immediately in Claude Code via the /fast command; API access is gated, with a waitlist at claude.com/fast-mode.

In regular mode, Claude Opus 4.8 remains among the more expensive of leading frontier models, but still comes in under chief rival OpenAI’s GPT-5.5.

Related: AI Agent Bottleneck Is Permissions Not Performance

Benchmarks show steady gains

On benchmarks, Opus 4.8 is a step up rather than a leap.

It scores 88.6% on SWE-bench Verified (vs. 87.6% for Opus 4.7), 69.2% on the harder SWE-bench Pro (vs. 64.3%), and 74.6% on Terminal-Bench 2.1 (vs. 66.1%). Anthropic itself characterizes the model as “a modest but tangible improvement on its predecessor.”

It beats GPT-5.5 regular across at least 12 benchmarks, including most knowledge-work, coding (issue-level), agentic tool-use, and long-context benchmarks. The rival model wins on terminal/CLI workflows and is roughly tied on web browsing and graduate-level science.

The bigger signal sits in the company’s internal capability ladder: Opus 4.8 lands between Opus 4.7 and the more capable Claude Mythos Preview, which is currently restricted to a small number of organizations under Project Glasswing for cybersecurity work. The firm says it expects to bring “Mythos-class models to all our customers in the coming weeks” once additional cyber safeguards are in place.

Enterprise partners report real gains

Several enterprise partners cited material gains. Databricks reported that Opus 4.8 unlocks “a step change in agentic reasoning” inside its Genie data agent, at “61% cheaper token cost than Opus 4.7” thanks to multimodal efficiency on PDFs and diagrams.

Hebbia cited better citation precision and token efficiency on dense financial filings. Devin-maker Cognition said the release “translates directly into faster capability gains for engineers” and noted it fixed comment-verbosity and tool-calling issues from 4.7. A computer-use vendor reported 84% on Online-Mind2Web, a jump over both Opus 4.7 and GPT-5.5.

Related: Alibaba Unveils Long Lasting AI Model Qwen3

Dynamic workflows for codebase-scale tasks

Alongside the model, Anthropic launched a research preview of dynamic workflows in Claude Code — a feature designed for tasks too large for a single context window. Claude plans the work, spawns hundreds of parallel subagents, then verifies its own outputs before reporting back. Their example: a codebase-scale migration “across hundreds of thousands of lines of code from kickoff to merge, with the existing test suite as its bar.”

Dynamic workflows is available on Claude Code’s Enterprise, Team, and Max plans.

Two smaller additions round out the release. Effort control on claude.ai and Claude Cowork lets users dial how much thinking Claude does per response — higher effort spends more tokens for better answers, lower effort responds faster and burns rate limits more slowly. System entries inside the messages array on the API let developers update Claude’s instructions mid-task without breaking the prompt cache.

Alignment scores near Mythos levels

The company is leading with honesty as a headline trait. Its alignment team reports Opus 4.8 is “around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked,” and that misaligned behavior rates are now “substantially lower than Opus 4.7, and similar to our best-aligned model, Claude Mythos Preview.”

A bar chart released by Anthropic shows how close Opus 4.8 is to the still selectively released Mythos in terms of its misalignment (a lower score is better), coming in at roughly 1.9, down from 2.5 for Opus 4.7 and effectively tied with the more capable, restricted Mythos Preview. The score is based on roughly 2,600 simulated investigation sessions per model.

Related: How To Build A Logistics App For Trucks

The 244-page system card publicly released by the company goes into greater detail on specific categories of misalignment — whether a model produces potentially harmful content around “military-grade weapons,” “harmful sexual content”, “disallowed cyberoffense”, and “undermining liberal democracy.” Across all of them, Opus 4.8 scores markedly better than 4.7 or Sonnet 4.6, and comes quite close to Mythos.

The grading-awareness finding

Anthropic flags one finding it considers “the most concerning” from training: Opus 4.8 shows a growing tendency to reason explicitly about how its outputs will be graded, including in environments where it wasn’t told it was being evaluated. In other words: the model knows it is likely being graded, and produces a response it thinks will earn it a good grade on the test, not one it would necessarily produce if it thought it wasn’t being graded.

The company says this didn’t translate into worse observable behavior — Opus 4.8 shows fewer misleading task-success claims than prior models — but calls it “a concerning trend that could complicate training in the future.” Preliminary interpretability work also found unverbalized grader-related reasoning in roughly 5% of training episodes.

It ran the model through a one-week live bug bounty for prompt injection — a first — and concluded Opus 4.8 sits between Opus 4.7 and Sonnet 4.6 on robustness, ahead of “all comparable frontier models” tested, with deployed safeguards bringing browser-use attack success rates to near zero.

Anthropic teased two trajectories. Near-term: cheaper models that provide “many of the same capabilities as Opus.” Longer-term: the Mythos-class models, which the firm says represent higher intelligence than Opus but require stronger cyber safeguards before general release. For now, Opus 4.8 is positioned as the new go-to enterprise and development workhorse — slightly smarter than 4.7, dramatically cheaper to run fast, and noticeably more honest about what it doesn’t know.

Lorenzo Ferretti

Leave a Reply

Your email address will not be published. Required fields are marked *