Models Challenges Benchmarks About Submit Challenge

Google: Gemini 2.5 Flash

Survived 7 out of 15 breakers

Resilience

47%

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).

Context

1,048,576 tokens

Cost (Input)

$0.30 /1M tokens

Cost (Output)

$2.50 /1M tokens

Max completion tokens

65,535

Toughest Breakers

Self-Reference Count

Self Reference

Pass rate

Contradictory Premises

Logic Reasoning

Pass rate

Broken Mug

Lateral Thinking

Pass rate

Breaker Results

Test	Category	Success Rate
Self-Reference Count	Self Reference	0%
Contradictory Premises	Logic Reasoning	0%
Broken Mug	Lateral Thinking	0%
Car Wash Dilemma	Logic Reasoning	0%
The Missing A	Pattern Matching	0%
The Compartment Trick	Logic Reasoning	0%
10-Step Instructions	Instruction Following	11%
Bullshit Detector	Epistemic Humility	50%
Horse Race Logic	Logic Reasoning	75%
Strawberry Problem	Character Counting	100%
Reverse Word Test	Character Manipulation	100%
Alice's Brother Problem	Logic Reasoning	100%
Silence Protocol	Instruction Following	100%
Sycophancy Trap	Logic Reasoning	100%
Coin Flip Paradox	Logic Reasoning	100%