I Tested the New Claude Opus 4.8 With 5 Prompts: Here's the Honest Verdict

Every time Anthropic launches a new flagship AI model, the launch post often has several benchmark numbers, charts, and bar graphs that are slightly higher than last time. That's nice, but it never tells you the thing I actually want to know: what does it feel like to use?

Anthropic has released its new Claude Opus 4.8 model, calling it a more reliable and effective collaborator that builds on Opus 4.7 with improvements across the board and ships for the same price. The company frames it as a model with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.

Opus 4.8 is available everywhere today, including claude.ai, Cowork, and the API as claude-opus-4-8. It launches alongside a few notable features:

A new "effort control" that lets you decide how hard Claude thinks on a given response.
A "dynamic workflows" feature in Claude Code that spins up hundreds of parallel subagents.
A fast mode that runs at 2.5× the speed while costing three times less than it did on previous models.

AdCreative.ai: An AI-powered platform that automates the creation of high-performing ad creatives for social media and display campaigns

Try Now

What Can the Claude Opus 4.8 Model Do

Anthropic believes you can give Opus 4.8 a complicated task and trust it to plan the work, use tools properly, check its own results, identify what it isn't sure about, and complete the job.

The biggest gains show up in a few areas:

Agentic coding: It writes, debugs, and refactors with a better grip on large codebases, and is roughly four times less likely than Opus 4.7 to let a flaw in its own code slip by unremarked.
Computer use & browser agents: Early testers clocked it at 84% on Online-Mind2Web, calling it the strongest computer-use model they'd tested, a meaningful jump over both Opus 4.7 and GPT-5.5.
Knowledge work: Drafting documents, analyzing dense filings, and reasoning over PDFs and diagrams, with noticeably better citation precision.
Honesty: This is the headline. Opus 4.8 is more likely to flag uncertainty and less likely to make confident claims it can't back up.

On benchmarks, the picture is strong. On SWE-Bench Pro, a test of AI coding agents Opus 4.8 scores a record 69.2%, up from Opus 4.7's 64.3% and ahead of GPT-5.5's 58.6%. On Anthropic's internal Super-Agent benchmark, it's reportedly the only model to complete every case end-to-end, beating GPT-5.5 at parity in terms of cost.

In simple words, the new Opus 4.8 is a modest but tangible step up — and worth testing against your own stack.

Hands-on testing the new Claude Opus 4.8:

1. Vibe-coding challenge

Prompt: Build a playable browser game in a single HTML file called "Neon Prism." I slide a glowing prism along the bottom to catch falling lights, and every catch builds a combo multiplier. Add dark "bad" orbs I have to dodge, a 3-life system, a difficulty ramp that speeds things up as my score climbs, juicy particle bursts on every catch, screen shake, a starry neon background, and sound effects using the Web Audio API. No external files or libraries. Make it feel alive.

Verdict:

Opus 4.8 handled this prompt perfectly on the first try. The orbs looked nice with the neon glow. The animation was on the spot with particle explosions with a satisfying sound that got higher as your score went up. The paddle widened with multiple catches, and the game's difficulty also increased with my score by changing fall speed and spawn rate. The game worked smoothly with the arrow keys, with no extra setup needed. Overall, it felt lively and met my expectations.

2. Real-world knowledge work

Prompt: Here is a 20-page services agreement. Pull out every obligation that falls on the vendor (not the client), list the deadline attached to each, and flag anything vague enough to cause a dispute. Draft a one-page summary I could hand to a non-lawyer.

Claude Opus 4.8 Real World knowledge work

Verdict:

The output was clear and helpful. It provided a simple table showing vendor obligations with deadlines, or they were perfectly labeled as no deadline specified, so there could not be any reason for disputes. I also really liked its honesty, pointing out vague phrases and explaining why these terms are unclear and risky. It also did not create new clauses or assume responsibilities when they were unclear.

Base44: An AI-powered platform that empowers anyone to transform their ideas into fully-functional apps within minutes, all without coding.

Try Now

3. Vibe coding a productivity tool

Prompt: Build a single-file HTML "Focus Dashboard" I can open in my browser. It needs a Pomodoro timer with customizable work/break lengths, a drag-and-drop task list grouped into Now/Next/Later, a daily streak counter, and a distraction-free mode that dims everything except the active task. Add smooth micro-animations and a clean glassmorphism dark theme. No frameworks, vanilla JS only.

Claude Opus 4.8 Vibe Coding productivity work

Verdict:

Solid. Opus 4.8 followed the prompt well and created exactly what I asked for. The Pomodoro timer worked with custom lengths, and the task drag and drop between Now/Next/Later was smooth. The glassmorphism style, with frosted, semi-transparent panels, looked clean and not gimmicky; something I would use daily. I didn't need to make any edits. The micro-animations were done tastefully and were not distracting.

4. Multi-tool agentic workflow

Prompt: Research the top 5 AI productivity tools launched in the last 6 months. Build a comparison spreadsheet with pricing, core features, target user, and a weighted score. Then draft a 400-word blog intro summarizing the landscape.

Claude Opus 4.8 multi-tool agentic workflow

Verdict:

The output was complete. Opus 4.8 took a few minutes to check everything. The results are very good, with an easy-to-understand spreadsheet that includes all necessary columns and a weighted score that meets the criteria. The 400-word introduction is professional but easy to understand and straight to the point. What I found most impressive is that it mentioned the pricing data might be outdated instead of claiming everything is accurate.

5. Reasoning under ambiguity

Prompt: I'm launching a paid newsletter in 6 weeks. I have a day job, zero existing audience, and about 8 hours a week. Audit my realistic options, build a week-by-week plan, and be honest about what I should NOT try to do.

Claude Opus 4.8 reasoning under ambiguity

Verdict:

This is one of the best responses I've received. Opus 4.8 made a six-week plan that fits my 8-hour limit. It suggested that I spend the first two weeks finding a niche and writing sample issues before building an audience. It also told me to over-invest in branding to pay for ads to grow. It wanted me to start slowly, manually, and strategically using time as my budget and spending my time wisely. It focused on my constraints rather than ignoring them.

Editor's note:

Claude Opus 4.8 is a significant improvement for handling complex tasks and is a clear response to competition from OpenAI and Google. During my tests, it worked well with all five prompts. The vibe-coded game, in particular, functioned correctly on the first attempt, which has not always been the case. The main point that Anthropic emphasized at launch is this: this model is more honest about what it knows and what it doesn't, and you can feel that reliability when you use it.

It's important to know about the new effort control (I really liked it) on Claude.ai and Cowork. Opus 4.8 starts with a "high" effort setting. If you have a difficult task, you can change it to "extra" or "max." This way, the model will use more tokens to give you a better result for tough prompts.

This test was meant to show the improvements a regular, non-technical to slightly-technical Claude user would actually experience. Your results could differ from mine, so try the new Claude Opus 4.8 yourself.

💡 For Partnership/Promotion on AI Tools Club, please check out our partnership page.

Learn more

I Tested the New Claude Opus 4.8 With 5 Prompts: Here's the Honest Verdict

What Can the Claude Opus 4.8 Model Do

The biggest gains show up in a few areas:

Hands-on testing the new Claude Opus 4.8:

1. Vibe-coding challenge

Verdict:

2. Real-world knowledge work

Verdict:

3. Vibe coding a productivity tool

Verdict:

4. Multi-tool agentic workflow

Verdict:

5. Reasoning under ambiguity

Verdict:

Editor's note:

Nishant

Meet June: A Fully Open-Source Private AI Agent for Your Mac

5 Powerful Claude Cowork Features Most People Never Use (But Should)

10 Best AI Tools for Running a Business in 2026 (Build, Automate & Grow)

I Tested the New Claude Sonnet 5 With 5 Prompts: Here's the Honest Verdict

Claude Cowork + Claude for Chrome: This Is the Best Way to Automate Your Workflow

AI Tools Club