Mistral: Mistral Large 3 2512

Survived 5 out of 15 breakers

Resilience	33%

Mistral Large 3 2512 is Mistral’s most capable model to date. It features a sparse mixture-of-experts architecture with 41B active parameters (675B total) and is released under the Apache 2.0 license.

Context	262,144 tokens
Cost (Input)	$0.50 /1M tokens
Cost (Output)	$1.50 /1M tokens
Max completion tokens	–
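
As a rough illustration of how the per-token prices listed above translate into the cost of a single request, here is a minimal sketch. The function name `estimate_cost` and the example token counts are hypothetical; only the two prices come from this page, and real billing may differ (for example through caching or rounding rules).

```python
# Prices taken from this page, in US dollars per 1M tokens.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Example token counts are made up for illustration.
cost = estimate_cost(input_tokens=100_000, output_tokens=20_000)
print(f"${cost:.2f}")  # prints "$0.08"
```

Output tokens cost three times as much as input tokens, so long completions dominate the bill faster than long prompts.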

Toughest Breakers

Self-Reference Count	Self Reference
10-Step Instructions	Instruction Following
Silence Protocol	Instruction Following

Breaker Results

Test	Category	Success Rate
Self-Reference Count	Self Reference	0%
10-Step Instructions	Instruction Following	0%
Silence Protocol	Instruction Following	0%
Contradictory Premises	Logic Reasoning	0%
Car Wash Dilemma	Logic Reasoning	0%
The Missing A	Pattern Matching	0%
Coin Flip Paradox	Logic Reasoning	0%
Bullshit Detector	Epistemic Humility	25%
Horse Race Logic	Logic Reasoning	25%
The Compartment Trick	Logic Reasoning	25%
Broken Mug	Lateral Thinking	50%
Strawberry Problem	Character Counting	100%
Reverse Word Test	Character Manipulation	100%
Alice's Brother Problem	Logic Reasoning	100%
Sycophancy Trap	Logic Reasoning	100%
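
The headline figures (5 of 15 breakers survived, 33% resilience) can be reproduced from the table above. The sketch below assumes a breaker counts as "survived" when its success rate is 50% or higher. The page does not state the cutoff, so `SURVIVAL_THRESHOLD` is an assumption chosen because it matches the reported numbers.

```python
# Success rates (percent) copied from the Breaker Results table.
results = {
    "Self-Reference Count": 0,
    "10-Step Instructions": 0,
    "Silence Protocol": 0,
    "Contradictory Premises": 0,
    "Car Wash Dilemma": 0,
    "The Missing A": 0,
    "Coin Flip Paradox": 0,
    "Bullshit Detector": 25,
    "Horse Race Logic": 25,
    "The Compartment Trick": 25,
    "Broken Mug": 50,
    "Strawberry Problem": 100,
    "Reverse Word Test": 100,
    "Alice's Brother Problem": 100,
    "Sycophancy Trap": 100,
}

# Assumption: the cutoff is not documented on the page.
SURVIVAL_THRESHOLD = 50


def resilience(rates, threshold=SURVIVAL_THRESHOLD):
    """Return (survived, total, fraction survived) for a dict of rates."""
    survived = sum(1 for rate in rates.values() if rate >= threshold)
    return survived, len(rates), survived / len(rates)


survived, total, score = resilience(results)
print(f"Survived {survived} out of {total} breakers ({score:.0%})")
# prints "Survived 5 out of 15 breakers (33%)"
```

Under a stricter cutoff of 100%, only four breakers would count as survived, which does not match the reported five.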