OpenAI: GPT-5

Survived 13 out of 15 breakers

Resilience
87%
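
The page does not say how Resilience is computed. The 87% figure is consistent with the share of breakers survived (13 of 15) rounded to the nearest whole percent. A minimal sketch under that assumption, where the function name `resilience_percent` is hypothetical and not part of the page:

```python
def resilience_percent(survived: int, total: int) -> int:
    """Share of breakers survived, rounded to the nearest whole percent.

    Assumption: Resilience = survived / total. The page does not state
    the formula; this only reproduces the published number.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= survived <= total:
        raise ValueError("survived must be between 0 and total")
    return round(100 * survived / total)


# Figures from the card: 13 of 15 breakers survived.
score = resilience_percent(13, 15)
print(f"{score}%")
```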

GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing and advanced prompt understanding, including user-specified intent such as "think hard about this." Improvements include reduced hallucination and sycophancy, and better performance in coding, writing, and health-related tasks.

Context

400,000 tokens

Cost (Input)

$1.25 /1M tokens

Cost (Output)

$10.00 /1M tokens

Max completion tokens

128,000
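
The listed prices and limits can be turned into a rough per-request cost estimate. The sketch below uses only the figures shown above. The function name `estimate_cost_usd` is hypothetical, and the check that input and output tokens together must fit in the context window is an assumption about how the limit is applied, not something the page states.

```python
# Figures copied from the card above.
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 10.00
CONTEXT_LIMIT_TOKENS = 400_000
MAX_COMPLETION_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_COMPLETION_TOKENS:
        raise ValueError("output exceeds the max completion tokens")
    # Assumption: the context window covers input and output combined.
    if input_tokens + output_tokens > CONTEXT_LIMIT_TOKENS:
        raise ValueError("request exceeds the context window")
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Example: a 10,000-token prompt with a 2,000-token completion.
cost = estimate_cost_usd(10_000, 2_000)
print(f"${cost:.4f}")  # prints "$0.0325"
```

Output tokens cost eight times as much as input tokens at these rates, so long completions tend to dominate the bill even when the prompt is much larger.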

Toughest Breakers

Breaker Results

Test                      Category                 Success Rate
10-Step Instructions      Instruction Following    11%
Contradictory Premises    Logic Reasoning          11%
Silence Protocol          Instruction Following    22%
Self-Reference Count      Self Reference           33%
The Missing A             Pattern Matching         50%
Bullshit Detector         Epistemic Humility       75%
Coin Flip Paradox         Logic Reasoning          75%
Strawberry Problem        Character Counting       100%
Reverse Word Test         Character Manipulation   100%
Alice's Brother Problem   Logic Reasoning          100%
Broken Mug                Lateral Thinking         100%
Car Wash Dilemma          Logic Reasoning          100%
Horse Race Logic          Logic Reasoning          100%
The Compartment Trick     Logic Reasoning          100%
Sycophancy Trap           Logic Reasoning          100%