Google: Gemini 3 Flash Preview

Survived 10 out of 15 breakers

Resilience
67%
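
The 67% figure is consistent with the share of breakers survived, rounded to the nearest whole percent. The sketch below reproduces that arithmetic; the function name `resilience_percent` and the rounding rule are assumptions for illustration, not a documented scoring formula.

```python
def resilience_percent(survived: int, total: int) -> int:
    """Share of breakers survived, as a whole-number percentage.

    Assumption: the score is survived / total, rounded to the nearest percent.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= survived <= total:
        raise ValueError("survived must be between 0 and total")
    return round(100 * survived / total)


# Figures taken from the summary above: 10 of 15 breakers survived.
print(resilience_percent(10, 15))
```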

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool-use performance with substantially lower latency than larger Gemini variants, making it well suited to interactive development, long-running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability. The model supports a 1M-token context window and multimodal inputs (text, images, audio, video, and PDFs), with text-only output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full-scale frontier models.

Context: 1,048,576 tokens

Cost (Input): $0.50 / 1M tokens

Cost (Output): $3.00 / 1M tokens

Max completion tokens: 65,536
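
As a rough illustration of how the listed prices translate into per-request cost, here is a minimal sketch. The constants come from the figures above; the function name `estimate_cost_usd` is hypothetical, and the calculation assumes flat per-token pricing with no caching discount or other adjustments, which may not match actual billing.

```python
# Listed figures from this page.
INPUT_USD_PER_MILLION = 0.50
OUTPUT_USD_PER_MILLION = 3.00
CONTEXT_WINDOW_TOKENS = 1_048_576
MAX_COMPLETION_TOKENS = 65_536


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars.

    Assumption: flat list pricing; caching discounts are ignored.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("input exceeds the listed context window")
    if output_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("output exceeds the listed max completion tokens")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Example: a 200,000-token prompt with a 4,000-token reply.
print(f"${estimate_cost_usd(200_000, 4_000):.3f}")
```

Under these assumptions, output tokens cost six times as much as input tokens, so long completions dominate the bill even when prompts are large.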

Toughest Breakers

Breaker Results

Test                      Category                Success Rate
Self-Reference Count      Self Reference          0%
Bullshit Detector         Epistemic Humility      0%
10-Step Instructions      Instruction Following   11%
Contradictory Premises    Logic Reasoning         11%
The Missing A             Pattern Matching        25%
Silence Protocol          Instruction Following   56%
Broken Mug                Lateral Thinking        75%
Car Wash Dilemma          Logic Reasoning         75%
Strawberry Problem        Character Counting      100%
Reverse Word Test         Character Manipulation  100%
Alice's Brother Problem   Logic Reasoning         100%
Horse Race Logic          Logic Reasoning         100%
The Compartment Trick     Logic Reasoning         100%
Sycophancy Trap           Logic Reasoning         100%
Coin Flip Paradox         Logic Reasoning         100%
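
To compare categories rather than individual tests, the success rates above can be averaged per category. The sketch below does that with the table's values transcribed by hand; the `RESULTS` list and the helper `mean_success_by_category` are illustrative names, and an unweighted mean is an assumption, since the page does not say how many runs sit behind each rate.

```python
from collections import defaultdict

# (test, category, success rate in percent), transcribed from the table above.
RESULTS = [
    ("Self-Reference Count", "Self Reference", 0),
    ("Bullshit Detector", "Epistemic Humility", 0),
    ("10-Step Instructions", "Instruction Following", 11),
    ("Contradictory Premises", "Logic Reasoning", 11),
    ("The Missing A", "Pattern Matching", 25),
    ("Silence Protocol", "Instruction Following", 56),
    ("Broken Mug", "Lateral Thinking", 75),
    ("Car Wash Dilemma", "Logic Reasoning", 75),
    ("Strawberry Problem", "Character Counting", 100),
    ("Reverse Word Test", "Character Manipulation", 100),
    ("Alice's Brother Problem", "Logic Reasoning", 100),
    ("Horse Race Logic", "Logic Reasoning", 100),
    ("The Compartment Trick", "Logic Reasoning", 100),
    ("Sycophancy Trap", "Logic Reasoning", 100),
    ("Coin Flip Paradox", "Logic Reasoning", 100),
]


def mean_success_by_category(rows):
    """Return {category: unweighted mean success rate} for the given rows."""
    buckets = defaultdict(list)
    for _test, category, rate in rows:
        buckets[category].append(rate)
    return {cat: sum(rates) / len(rates) for cat, rates in buckets.items()}


# Print categories from weakest to strongest.
for cat, avg in sorted(mean_success_by_category(RESULTS).items(),
                       key=lambda item: item[1]):
    print(f"{cat:<24}{avg:5.1f}%")
```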