OpenAI: GPT-5.1

Survived 9 out of 15 breakers

Resilience
60%
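
The 60% figure is consistent with the survived count divided by the number of breakers. The page does not state the formula, so the sketch below assumes that ratio; the function name `resilience` is hypothetical.

```python
def resilience(survived: int, total: int) -> float:
    """Return the share of breakers survived, as a percentage.

    Assumption: Resilience = survived / total. The listing does not
    document how the score is computed; this ratio reproduces the
    displayed value.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= survived <= total:
        raise ValueError("survived must be between 0 and total")
    return 100 * survived / total


# Figures taken from the listing: 9 of 15 breakers survived.
print(f"{resilience(9, 15):.0f}%")  # 60%
```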

GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. The model produces clearer, more grounded explanations with reduced jargon, making it easier to follow even on technical or multi-step problems. Built for broad task coverage, GPT-5.1 delivers consistent gains across math, coding, and structured analysis workloads, with more coherent long-form answers and improved tool-use reliability. It also features refined conversational alignment, enabling warmer, more intuitive responses without compromising precision. GPT-5.1 serves as the primary full-capability successor to GPT-5.

Context

400,000 tokens

Cost (Input)

$1.25 /1M tokens

Cost (Output)

$10.00 /1M tokens

Max completion tokens

128,000
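
As a rough illustration of how the per-token prices above translate into a per-request cost, here is a minimal sketch. The prices and the completion limit are copied from this listing; the function name `estimate_cost_usd` and the sample token counts are hypothetical, and real billing may differ (for example through cached-input discounts or reasoning tokens counted as output).

```python
# Values copied from the listing above.
INPUT_USD_PER_MILLION = 1.25    # Cost (Input), USD per 1M tokens
OUTPUT_USD_PER_MILLION = 10.00  # Cost (Output), USD per 1M tokens
MAX_COMPLETION_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from the listed prices.

    Hypothetical helper: it applies the listed rates linearly and
    ignores any discounts or surcharges a provider may apply.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("output exceeds the listed max completion tokens")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Example: a 100,000-token prompt with a 10,000-token completion.
print(f"${estimate_cost_usd(100_000, 10_000):.3f}")  # $0.225
```

At these rates an output token costs eight times as much as an input token, so long completions tend to dominate the bill.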

Breaker Results

| Test | Category | Latest Result | Success Rate |
|---|---|---|---|
| Self-Reference Count | Self Reference | | 0% |
| Broken Mug | Lateral Thinking | | 0% |
| Car Wash Dilemma | Logic Reasoning | | 0% |
| 10-Step Instructions | Instruction Following | | 22% |
| The Missing A | Pattern Matching | | 25% |
| Horse Race Logic | Logic Reasoning | | 25% |
| Silence Protocol | Instruction Following | | 33% |
| Contradictory Premises | Logic Reasoning | | 33% |
| The Compartment Trick | Logic Reasoning | | 75% |
| Strawberry Problem | Character Counting | | 100% |
| Reverse Word Test | Character Manipulation | | 100% |
| Alice's Brother Problem | Logic Reasoning | | 100% |
| Bullshit Detector | Epistemic Humility | | 100% |
| Sycophancy Trap | Logic Reasoning | | 100% |
| Coin Flip Paradox | Logic Reasoning | | 100% |