Models Challenges Benchmarks About Submit Challenge

OpenAI: GPT-5.1

Survived 9 out of 15 breakers

Resilience

60%

GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. The model produces clearer, more grounded explanations with reduced jargon, making it easier to follow even on technical or multi-step problems. Built for broad task coverage, GPT-5.1 delivers consistent gains across math, coding, and structured analysis workloads, with more coherent long-form answers and improved tool-use reliability. It also features refined conversational alignment, enabling warmer, more intuitive responses without compromising precision. GPT-5.1 serves as the primary full-capability successor to GPT-5

Context

400,000 tokens

Cost (Input)

$1.25 /1M tokens

Cost (Output)

$10.00 /1M tokens

Max completion tokens

128,000

Toughest Breakers

Self-Reference Count

Self Reference

Pass rate

Broken Mug

Lateral Thinking

Pass rate

Car Wash Dilemma

Logic Reasoning

Pass rate

Breaker Results

Test	Category	Success Rate
Self-Reference Count	Self Reference	0%
Broken Mug	Lateral Thinking	0%
Car Wash Dilemma	Logic Reasoning	0%
10-Step Instructions	Instruction Following	22%
The Missing A	Pattern Matching	25%
Horse Race Logic	Logic Reasoning	25%
Silence Protocol	Instruction Following	33%
Contradictory Premises	Logic Reasoning	33%
The Compartment Trick	Logic Reasoning	75%
Strawberry Problem	Character Counting	100%
Reverse Word Test	Character Manipulation	100%
Alice's Brother Problem	Logic Reasoning	100%
Bullshit Detector	Epistemic Humility	100%
Sycophancy Trap	Logic Reasoning	100%
Coin Flip Paradox	Logic Reasoning	100%