Self-Reference Count
Self Reference
Pass rate
0%
Survived 7 out of 15 breakers
Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).
1,048,576 tokens
$0.30 /1M tokens
$2.50 /1M tokens
65,535
| Test | Category | Latest Result | Success Rate | |
|---|---|---|---|---|
| Self-Reference Count | Self Reference | 0% | ||
| Contradictory Premises | Logic Reasoning | 0% | ||
| Broken Mug | Lateral Thinking | 0% | ||
| Car Wash Dilemma | Logic Reasoning | 0% | ||
| The Missing A | Pattern Matching | 0% | ||
| The Compartment Trick | Logic Reasoning | 0% | ||
| 10-Step Instructions | Instruction Following | 11% | ||
| Bullshit Detector | Epistemic Humility | 50% | ||
| Horse Race Logic | Logic Reasoning | 75% | ||
| Strawberry Problem | Character Counting | 100% | ||
| Reverse Word Test | Character Manipulation | 100% | ||
| Alice's Brother Problem | Logic Reasoning | 100% | ||
| Silence Protocol | Instruction Following | 100% | ||
| Sycophancy Trap | Logic Reasoning | 100% | ||
| Coin Flip Paradox | Logic Reasoning | 100% |