Car Wash Dilemma
Logic Reasoning
Survived 10 out of 15 breakers
Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and reasoning benchmarks, and improved robustness to prompt injection. The model is designed to operate efficiently across varied effort levels, enabling developers to trade off speed, depth, and token usage depending on task requirements. It comes with a new parameter to control token efficiency, which can be accessed using the OpenRouter Verbosity parameter with low, medium, or high. Opus 4.5 supports advanced tool use, extended context management, and coordinated multi-agent setups, making it well-suited for autonomous research, debugging, multi-step planning, and spreadsheet/browser manipulation. It delivers substantial gains in structured reasoning, execution reliability, and alignment compared to prior Opus generations, while reducing token overhead and improving performance on long-running tasks.
200,000 tokens
$5.00 /1M tokens
$25.00 /1M tokens
64,000
| Test | Category | Latest Result | Success Rate | |
|---|---|---|---|---|
| Car Wash Dilemma | Logic Reasoning | 0% | ||
| The Missing A | Pattern Matching | 0% | ||
| Self-Reference Count | Self Reference | 7% | ||
| Contradictory Premises | Logic Reasoning | 67% | ||
| 10-Step Instructions | Instruction Following | 72% | ||
| Strawberry Problem | Character Counting | 100% | ||
| Reverse Word Test | Character Manipulation | 100% | ||
| Alice's Brother Problem | Logic Reasoning | 100% | ||
| Silence Protocol | Instruction Following | 100% | ||
| Broken Mug | Lateral Thinking | 100% | ||
| Bullshit Detector | Epistemic Humility | 100% | ||
| Horse Race Logic | Logic Reasoning | 100% | ||
| The Compartment Trick | Logic Reasoning | 100% | ||
| Sycophancy Trap | Logic Reasoning | 100% | ||
| Coin Flip Paradox | Logic Reasoning | 100% |