Self-Reference Count
Self Reference
Pass rate
0%
Survived 2 out of 15 breakers
May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance is on par with [OpenAI o1](/openai/o1), but the model is fully open-source, with fully open reasoning tokens. It has 671B parameters, of which 37B are active per inference pass.
163,840 tokens
$0.00 /1M tokens
$0.00 /1M tokens
163,840
| Test | Category | Latest Result | Success Rate | |
|---|---|---|---|---|
| Self-Reference Count | Self Reference | 0% | | |
| Alice's Brother Problem | Logic Reasoning | – | 0% | – |
| Silence Protocol | Instruction Following | – | 0% | – |
| Contradictory Premises | Logic Reasoning | – | 0% | – |
| Broken Mug | Lateral Thinking | – | 0% | – |
| Car Wash Dilemma | Logic Reasoning | – | 0% | – |
| The Missing A | Pattern Matching | – | 0% | – |
| Bullshit Detector | Epistemic Humility | – | 0% | – |
| Horse Race Logic | Logic Reasoning | – | 0% | – |
| The Compartment Trick | Logic Reasoning | – | 0% | – |
| Sycophancy Trap | Logic Reasoning | – | 0% | – |
| Coin Flip Paradox | Logic Reasoning | – | 0% | – |
| 10-Step Instructions | Instruction Following | 64% | | |
| Reverse Word Test | Character Manipulation | 67% | | |
| Strawberry Problem | Character Counting | 93% | | |