OpenAI: GPT-5.1-Codex

Survived 7 out of 15 breakers

Resilience
47%

GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level) Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.

Context

400,000 tokens

Cost (Input)

$1.25 /1M tokens

Cost (Output)

$10.00 /1M tokens

Max completion tokens

128,000

Toughest Breakers

Breaker Results

TestCategoryLatest ResultSuccess Rate
Self-Reference CountSelf Reference0%
Contradictory PremisesLogic Reasoning0%
Car Wash DilemmaLogic Reasoning0%
10-Step InstructionsInstruction Following22%
The Missing APattern Matching25%
Bullshit DetectorEpistemic Humility25%
Silence ProtocolInstruction Following67%
Horse Race LogicLogic Reasoning75%
Reverse Word TestCharacter Manipulation89%
Strawberry ProblemCharacter Counting100%
Alice's Brother ProblemLogic Reasoning100%
Broken MugLateral Thinking100%
The Compartment TrickLogic Reasoning100%
Sycophancy TrapLogic Reasoning100%
Coin Flip ParadoxLogic Reasoning100%