Do I have to do anything to use Opus 4.8 in Claude Code?

No. Claude Code picks up the latest Opus automatically. `claude --version` to confirm. The model ID is `claude-opus-4-8`.

Is fast mode safe for production agentic work?

For coding edits, refactors, code review, and most agentic tasks — yes. The quality delta against standard mode is small for those workloads. For deep debugging that requires backtracking or multi-step planning, standard mode (or `xhigh` effort) is still the safer choice. A/B test on your own workflows before standardizing.

Will Dynamic Workflows blow up my MCP server bills?

Possibly. Dynamic Workflows fan out into parallel subagents, each making its own tool calls. If your MCPs charge per call (paid APIs, third-party SaaS), bill spikes are realistic. Test on a non-critical migration first; consider rate-limiting at the MCP layer; revisit your cost dashboards within the first week.

What does 'most honest model yet' mean for hallucinations?

Anthropic reports fewer unsupported claims and more frequent 'I'm not sure' surfacing. Independent early testers see the same. Expect cleaner failure modes — when an MCP call fails or Claude isn't confident, 4.8 is more likely to say so explicitly instead of confabulating.

Is Opus 4.8 better than GPT-5.5 or Gemini 3.1 Pro?

On most agentic-coding benchmarks Anthropic publishes — yes. The exception is agentic terminal coding, where OpenAI is still ahead. As always: benchmarks are directional. Test on your own workload.

What's the model ID for the API?

`claude-opus-4-8`. Pricing is unchanged from 4.7. The 1M-context variant is available as `claude-opus-4-8[1m]` on supported tiers.

Claude Opus 4.8 Is Live: Dynamic Workflows, Effort Control, 3× Cheaper Fast Mode — What Changes for Claude Code Users

TL;DR

Model: claude-opus-4-8. Drop-in replacement for claude-opus-4-7.
Pricing: Unchanged — $5/$25 per 1M input/output (standard), $10/$50 fast mode at 2.5× speed.
Best at: Long-horizon agentic coding (SWE-bench Pro 69.2%), math (USAMO 96.7%), 1M-token long-context recall (68.1% GraphWalks F1).
New agentic primitive: Dynamic Workflows — Claude plans, spawns parallel subagents in a single Claude Code session, verifies, then reports.
New control: Explicit effort parameter (low | medium | high | xhigh | max) — choose your token-vs-quality tradeoff per call.
Honesty axis: Anthropic claims Opus 4.8 is its 'most honest' model — more likely to flag uncertainty, less likely to make unsupported claims. Independent benchmarks support this.

What the numbers actually say

Benchmark	Opus 4.7	Opus 4.8	Delta
SWE-bench Pro (agentic coding)	64.3%	69.2%	+4.9pt
SWE-bench Verified	87.6%	88.6%	+1.0pt
USAMO 2026 (math)	69.3%	96.7%	+27.4pt
GraphWalks F1 @ 1M tokens	40.3%	68.1%	+27.8pt
Multidisciplinary reasoning + tools	54.7%	57.9%	+3.2pt
Agentic computer use	82.8%	83.4%	+0.6pt
Knowledge work (composite)	1753	1890	+137

The two outliers — USAMO and GraphWalks — are not coincidences. USAMO reflects deep-reasoning training payoffs. GraphWalks measures how well a model uses its 1M-token context to do *real* multi-hop retrieval, not just say 'the answer is in the haystack somewhere'. Opus 4.8 jumped 28 points there. For codebase-scale work where the agent has to keep dozens of files coherent, this matters more than the SWE-bench number.

Anthropic positions Opus 4.8 ahead of GPT-5.5 and Gemini 3.1 Pro on most agentic-coding metrics, with the exception of agentic terminal coding where OpenAI's model is still ahead.

Dynamic Workflows: the headline agentic feature

Dynamic Workflows (currently in research preview, Enterprise/Team/Max plans) lets Claude Code take on tasks that previously didn't fit in a single session. The pattern:

Plan. Claude reads the goal, the repo, and produces a structured plan of work units.
Spawn. It launches hundreds of parallel subagent sessions, each with its own scoped task and tools.
Verify. Outputs come back; Claude runs the existing test suite (or whatever bar you set) as the acceptance criterion.
Report. Only after verification does Claude consolidate and present the result.

Anthropic's stated benchmark: codebase-scale migrations across hundreds of thousands of lines of code, from kickoff to merge, with the test suite as the validation gate. For monorepo upgrades — TypeScript 4→5, React 18→19, deprecated-API replacements — this is the first agent primitive that maps directly onto a problem teams actually have.

The MCP angle: Dynamic Workflows multiplies how many MCP tool calls happen per session — by a lot. If you run paid MCP servers (think Stripe MCP, Sentry MCP, anything with rate limits), audit your billing before enabling Dynamic Workflows on production tasks.

Effort control: stop overspending on easy tasks

Before 4.8, you bought a fixed effort budget per call — and that budget was high. For 'rename this variable' or 'translate this sentence', you were paying for max-effort reasoning the task didn't need.

Opus 4.8 exposes effort as an explicit parameter. The Claude Code menu has low | medium | high (default) | xhigh | max. The API takes the same values. The model defaults to high, which Anthropic says is its best balance — but for mechanical tasks dropping to low cuts token spend dramatically with no measurable quality loss.

low: mechanical edits, small refactors, simple Q&A. Roughly 1/3 the token spend of high.
medium: standard coding tasks, debugging, mid-size refactors.
high *(default)*: most of what you do today. Don't overthink this.
xhigh: architecture decisions, hard debugging, multi-file changes that need to all hang together.
max: research problems, math, the kind of thing you'd run overnight.

Users on claude.ai get this too — a slider in the UI rather than an API parameter. Same idea.

Fast mode: 2.5× speed, 3× cheaper than before

Fast mode pricing stays at $10 / $50 per 1M tokens — but the *throughput* is 2.5× the standard mode, and Anthropic claims it's 3× cheaper than fast mode was on prior Opus generations. For interactive Claude Code use, this is the right default. Reach for standard or xhigh when you actually need the depth.

One caveat: fast mode trades latency for slightly shallower planning. For mechanical Claude Code work it doesn't matter. For multi-step debugging where the agent needs to backtrack, standard mode is still measurably better.

'Most honest model yet': what that actually means

Anthropic's framing for Opus 4.8 is sharper judgment + better self-reporting. Concretely:

Higher rate of flagging 'I'm not sure' when it actually isn't sure — instead of confidently inventing.
Better at saying 'I haven't completed X yet' mid-task — instead of declaring success and letting you discover later.
Less prone to confabulating tool outputs when an MCP call fails — it surfaces the failure rather than hallucinating a plausible result.

The third bullet is the one that matters most for MCP-heavy workflows. Pre-4.8, a flaky MCP server could result in plausibly-wrong agent output. 4.8 is more likely to halt and surface the underlying error. Expect cleaner failure modes when an integration is having a bad day.

Sleeper feature: update instructions mid-task without breaking the cache

Pre-4.8, changing the system prompt or developer instructions invalidated the prompt cache — which is the main reason 5+ minute Claude Code sessions get expensive. 4.8 lets you update Claude's instructions mid-task and keeps the cache warm. If you build agentic harnesses with a 'refine the goal as you go' pattern, your bill just dropped.

What Claude Code users should do today

Update Claude Code (claude --version to check). The CLI picks up claude-opus-4-8 automatically if you have access.
Try fast mode (/fast in the Claude Code prompt) for your normal coding work for a day. See if you notice a quality drop. Most won't.
Set --effort low for one-off scripts and mechanical edits. Bank the savings.
Test Dynamic Workflows on a non-critical migration if you're on Enterprise/Team/Max. Pick something with good test coverage; the test suite IS the safety net.
Re-run any agent harness benchmarks you maintain. Effort control changes the meaning of your historical metrics; rebaseline now.

Implications for the MCP ecosystem

More tool calls per session. Dynamic Workflows + better long-context recall means agents reach for more MCP tools, more often. Server authors should review their rate-limiting and connection-pool behavior.
Cleaner failure surfaces. The honesty improvements mean flaky MCP servers will get *visibly* flaky — Claude won't paper over them. If you author a server, your error messages now reach the user more reliably. Make them good.
Bigger tool catalogs are now viable. With 1M-token long-context recall jumping to 68%, fleets of 40+ MCP servers loaded simultaneously stop being a context-overflow concern. Pair this with Claude Code's tool search and lazy loading and you can finally run your real toolbelt without thinking about it.
Pin your MCP versions anyway. Opus 4.8 doesn't change the supply-chain attack surface. See our Nx Console and codexui-android post-mortems.

Bottom line

Opus 4.8 is a quiet release with two loud features. The numbers are good — but the practical wins for daily Claude Code use are (1) fast mode being the right default at the new price, (2) explicit effort control letting you stop overspending on easy tasks, and (3) Dynamic Workflows finally making codebase-scale migrations a single-session ask. If you're an MCP power user, the long-context recall jump is the under-discussed one — load more servers, trust the recall more.

TL;DR

What the numbers actually say

Dynamic Workflows: the headline agentic feature

Effort control: stop overspending on easy tasks

Fast mode: 2.5× speed, 3× cheaper than before

'Most honest model yet': what that actually means

Sleeper feature: update instructions mid-task without breaking the cache

What Claude Code users should do today

Implications for the MCP ecosystem

Bottom line

FAQ