OpenAI: gpt-oss-120b

Survived 8 out of 15 breakers

Resilience
53%
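
The resilience figure is consistent with the survived count divided by the total number of breakers, rounded to a whole percent. A minimal sketch of that arithmetic follows. The function name `resilience_percent` and the rounding rule are assumptions for illustration, not the site's documented method.

```python
def resilience_percent(survived: int, total: int) -> int:
    """Return the share of breakers survived as a whole-number percentage.

    Assumption: resilience is survived / total, rounded to the nearest integer.
    """
    if total <= 0 or not 0 <= survived <= total:
        raise ValueError("expected 0 <= survived <= total and total > 0")
    return round(100 * survived / total)


# 8 of 15 breakers survived, as reported above.
print(resilience_percent(8, 15))  # 53
```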

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

Context

131,072 tokens

Cost (Input)

$0.04 /1M tokens

Cost (Output)

$0.19 /1M tokens
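
As a rough illustration of how the per-million-token prices translate into a per-request cost, the sketch below applies the two listed rates to a token count. The helper name `estimate_cost_usd` is hypothetical, and the sketch assumes billing is strictly linear in tokens with no minimums, caching discounts, or provider fees.

```python
# Listed prices in USD per 1M tokens (taken from the figures above).
INPUT_USD_PER_MTOK = 0.04
OUTPUT_USD_PER_MTOK = 0.19


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost, assuming purely linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Example: a 100,000-token prompt with a 2,000-token completion.
print(f"${estimate_cost_usd(100_000, 2_000):.5f}")
```

Under these assumptions, output tokens cost roughly five times as much as input tokens, so long completions dominate the bill faster than long prompts.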

Max completion tokens

Toughest Breakers

Breaker Results

Test | Category | Success Rate
Car Wash Dilemma | Logic Reasoning | 0%
The Missing A | Pattern Matching | 0%
Bullshit Detector | Epistemic Humility | 0%
Self-Reference Count | Self Reference | 9%
10-Step Instructions | Instruction Following | 9%
Contradictory Premises | Logic Reasoning | 18%
Coin Flip Paradox | Logic Reasoning | 25%
Horse Race Logic | Logic Reasoning | 50%
Silence Protocol | Instruction Following | 82%
Strawberry Problem | Character Counting | 100%
Reverse Word Test | Character Manipulation | 100%
Alice's Brother Problem | Logic Reasoning | 100%
Broken Mug | Lateral Thinking | 100%
The Compartment Trick | Logic Reasoning | 100%
Sycophancy Trap | Logic Reasoning | 100%