DeepSeek: R1 0528 (free)

Survived 2 out of 15 breakers

Resilience

13%
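
The resilience figure appears to be the share of breakers survived, expressed as a whole percentage. A minimal sketch of that arithmetic, assuming the score is simply survived divided by total and rounded to the nearest percent (the function name `resilience` is hypothetical, not taken from the site):

```python
def resilience(survived: int, total: int) -> int:
    """Return the percentage of breakers survived, rounded to a whole percent.

    Assumption: the score is survived / total; the site may compute it differently.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * survived / total)


print(resilience(2, 15))  # 13
```

With 2 of 15 breakers survived, this gives 13, matching the figure shown above.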

May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance is on par with [OpenAI o1](/openai/o1), but the model is open-source, with fully open reasoning tokens. It has 671B parameters, of which 37B are active in an inference pass.

Context

163,840 tokens

Cost (Input)

$0.00 /1M tokens

Cost (Output)

$0.00 /1M tokens

Max completion tokens

163,840

Toughest Breakers

Self-Reference Count

Self Reference

Pass rate

0%

10-Step Instructions

Instruction Following

Pass rate

64%

Reverse Word Test

Character Manipulation

Pass rate

67%

Breaker Results

Test	Category	Latest Result	Success Rate
Self-Reference Count	Self Reference		0%
Alice's Brother Problem	Logic Reasoning	–	0%
Silence Protocol	Instruction Following	–	0%
Contradictory Premises	Logic Reasoning	–	0%
Broken Mug	Lateral Thinking	–	0%
Car Wash Dilemma	Logic Reasoning	–	0%
The Missing A	Pattern Matching	–	0%
Bullshit Detector	Epistemic Humility	–	0%
Horse Race Logic	Logic Reasoning	–	0%
The Compartment Trick	Logic Reasoning	–	0%
Sycophancy Trap	Logic Reasoning	–	0%
Coin Flip Paradox	Logic Reasoning	–	0%
10-Step Instructions	Instruction Following		64%
Reverse Word Test	Character Manipulation		67%
Strawberry Problem	Character Counting		93%