ClaudeAnthropicReleaseClaude Code

Claude Opus 4.8 Is Live: Dynamic Workflows, Effort Control, 3× Cheaper Fast Mode — What Changes for Claude Code Users

Anthropic shipped Claude Opus 4.8 on 2026-05-28. Headline numbers: SWE-bench Pro 69.2% (up from 64.3%), USAMO math 96.7% (up from 69.3%), GraphWalks 1M-token recall 68.1% (up from 40.3%). The bigger story is the new agentic primitives — Dynamic Workflows, explicit effort control, and a fast mode that's 2.5× faster and 3× cheaper. Here's the practical breakdown for anyone running Claude Code, Cursor, or an MCP-heavy stack.

Published 2026-05-28

TL;DR

What the numbers actually say

BenchmarkOpus 4.7Opus 4.8Delta
SWE-bench Pro (agentic coding)64.3%69.2%+4.9pt
SWE-bench Verified87.6%88.6%+1.0pt
USAMO 2026 (math)69.3%96.7%+27.4pt
GraphWalks F1 @ 1M tokens40.3%68.1%+27.8pt
Multidisciplinary reasoning + tools54.7%57.9%+3.2pt
Agentic computer use82.8%83.4%+0.6pt
Knowledge work (composite)17531890+137

The two outliers — USAMO and GraphWalks — are not coincidences. USAMO reflects deep-reasoning training payoffs. GraphWalks measures how well a model uses its 1M-token context to do *real* multi-hop retrieval, not just say 'the answer is in the haystack somewhere'. Opus 4.8 jumped 28 points there. For codebase-scale work where the agent has to keep dozens of files coherent, this matters more than the SWE-bench number.

Anthropic positions Opus 4.8 ahead of GPT-5.5 and Gemini 3.1 Pro on most agentic-coding metrics, with the exception of agentic terminal coding where OpenAI's model is still ahead.

Dynamic Workflows: the headline agentic feature

Dynamic Workflows (currently in research preview, Enterprise/Team/Max plans) lets Claude Code take on tasks that previously didn't fit in a single session. The pattern:

  1. Plan. Claude reads the goal, the repo, and produces a structured plan of work units.
  2. Spawn. It launches hundreds of parallel subagent sessions, each with its own scoped task and tools.
  3. Verify. Outputs come back; Claude runs the existing test suite (or whatever bar you set) as the acceptance criterion.
  4. Report. Only after verification does Claude consolidate and present the result.

Anthropic's stated benchmark: codebase-scale migrations across hundreds of thousands of lines of code, from kickoff to merge, with the test suite as the validation gate. For monorepo upgrades — TypeScript 4→5, React 18→19, deprecated-API replacements — this is the first agent primitive that maps directly onto a problem teams actually have.

Effort control: stop overspending on easy tasks

Before 4.8, you bought a fixed effort budget per call — and that budget was high. For 'rename this variable' or 'translate this sentence', you were paying for max-effort reasoning the task didn't need.

Opus 4.8 exposes effort as an explicit parameter. The Claude Code menu has low | medium | high (default) | xhigh | max. The API takes the same values. The model defaults to high, which Anthropic says is its best balance — but for mechanical tasks dropping to low cuts token spend dramatically with no measurable quality loss.

Users on claude.ai get this too — a slider in the UI rather than an API parameter. Same idea.

Fast mode: 2.5× speed, 3× cheaper than before

Fast mode pricing stays at $10 / $50 per 1M tokens — but the *throughput* is 2.5× the standard mode, and Anthropic claims it's 3× cheaper than fast mode was on prior Opus generations. For interactive Claude Code use, this is the right default. Reach for standard or xhigh when you actually need the depth.

One caveat: fast mode trades latency for slightly shallower planning. For mechanical Claude Code work it doesn't matter. For multi-step debugging where the agent needs to backtrack, standard mode is still measurably better.

'Most honest model yet': what that actually means

Anthropic's framing for Opus 4.8 is sharper judgment + better self-reporting. Concretely:

The third bullet is the one that matters most for MCP-heavy workflows. Pre-4.8, a flaky MCP server could result in plausibly-wrong agent output. 4.8 is more likely to halt and surface the underlying error. Expect cleaner failure modes when an integration is having a bad day.

Sleeper feature: update instructions mid-task without breaking the cache

Pre-4.8, changing the system prompt or developer instructions invalidated the prompt cache — which is the main reason 5+ minute Claude Code sessions get expensive. 4.8 lets you update Claude's instructions mid-task and keeps the cache warm. If you build agentic harnesses with a 'refine the goal as you go' pattern, your bill just dropped.

What Claude Code users should do today

  1. Update Claude Code (claude --version to check). The CLI picks up claude-opus-4-8 automatically if you have access.
  2. Try fast mode (/fast in the Claude Code prompt) for your normal coding work for a day. See if you notice a quality drop. Most won't.
  3. Set --effort low for one-off scripts and mechanical edits. Bank the savings.
  4. Test Dynamic Workflows on a non-critical migration if you're on Enterprise/Team/Max. Pick something with good test coverage; the test suite IS the safety net.
  5. Re-run any agent harness benchmarks you maintain. Effort control changes the meaning of your historical metrics; rebaseline now.

Implications for the MCP ecosystem

Bottom line

Opus 4.8 is a quiet release with two loud features. The numbers are good — but the practical wins for daily Claude Code use are (1) fast mode being the right default at the new price, (2) explicit effort control letting you stop overspending on easy tasks, and (3) Dynamic Workflows finally making codebase-scale migrations a single-session ask. If you're an MCP power user, the long-context recall jump is the under-discussed one — load more servers, trust the recall more.

FAQ

Do I have to do anything to use Opus 4.8 in Claude Code?
No. Claude Code picks up the latest Opus automatically. claude --version to confirm. The model ID is claude-opus-4-8.
Is fast mode safe for production agentic work?
For coding edits, refactors, code review, and most agentic tasks — yes. The quality delta against standard mode is small for those workloads. For deep debugging that requires backtracking or multi-step planning, standard mode (or xhigh effort) is still the safer choice. A/B test on your own workflows before standardizing.
Will Dynamic Workflows blow up my MCP server bills?
Possibly. Dynamic Workflows fan out into parallel subagents, each making its own tool calls. If your MCPs charge per call (paid APIs, third-party SaaS), bill spikes are realistic. Test on a non-critical migration first; consider rate-limiting at the MCP layer; revisit your cost dashboards within the first week.
What does 'most honest model yet' mean for hallucinations?
Anthropic reports fewer unsupported claims and more frequent 'I'm not sure' surfacing. Independent early testers see the same. Expect cleaner failure modes — when an MCP call fails or Claude isn't confident, 4.8 is more likely to say so explicitly instead of confabulating.
Is Opus 4.8 better than GPT-5.5 or Gemini 3.1 Pro?
On most agentic-coding benchmarks Anthropic publishes — yes. The exception is agentic terminal coding, where OpenAI is still ahead. As always: benchmarks are directional. Test on your own workload.
What's the model ID for the API?
claude-opus-4-8. Pricing is unchanged from 4.7. The 1M-context variant is available as claude-opus-4-8[1m] on supported tiers.