Google: Gemini 2.5 Flash

Survived 7 out of 15 breakers

Resilience
47%

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).

Context

1,048,576 tokens

Cost (Input)

$0.30 /1M tokens

Cost (Output)

$2.50 /1M tokens

Max completion tokens

65,535

Toughest Breakers

Breaker Results

TestCategoryLatest ResultSuccess Rate
Self-Reference CountSelf Reference0%
Contradictory PremisesLogic Reasoning0%
Broken MugLateral Thinking0%
Car Wash DilemmaLogic Reasoning0%
The Missing APattern Matching0%
The Compartment TrickLogic Reasoning0%
10-Step InstructionsInstruction Following11%
Bullshit DetectorEpistemic Humility50%
Horse Race LogicLogic Reasoning75%
Strawberry ProblemCharacter Counting100%
Reverse Word TestCharacter Manipulation100%
Alice's Brother ProblemLogic Reasoning100%
Silence ProtocolInstruction Following100%
Sycophancy TrapLogic Reasoning100%
Coin Flip ParadoxLogic Reasoning100%