Survived 11 out of 15 breakers
The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts design, improving inference efficiency. It delivers state-of-the-art performance comparable to leading models across a wide range of tasks, including language understanding, logical reasoning, code generation, agent-based tasks, image understanding, video understanding, and graphical user interface (GUI) interaction. With its robust code-generation and agent capabilities, the model generalizes well across diverse agent tasks.
Context window: 262,144 tokens
Input price: $0.39 / 1M tokens
Output price: $2.34 / 1M tokens
Max output: 65,536 tokens
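The two per-million-token prices quoted above can be turned into a per-request cost estimate. A minimal sketch, assuming the lower rate applies to input tokens and the higher rate to output tokens (the conventional ordering; the page lists the rates without further detail):

```python
# Sketch: estimate a request's cost from the per-million-token rates above.
# Assumption: $0.39/1M is the input rate and $2.34/1M the output rate.
INPUT_RATE = 0.39 / 1_000_000   # USD per input token
OUTPUT_RATE = 2.34 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 10,000-token prompt with a 2,000-token completion.
print(f"${request_cost(10_000, 2_000):.4f}")  # → $0.0086
```

At these rates, output tokens cost roughly six times as much as input tokens, so long completions dominate the bill.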
| Test | Category | Success Rate |
|---|---|---|
| Self-Reference Count | Self Reference | 13% |
| 10-Step Instructions | Instruction Following | 13% |
| The Missing A | Pattern Matching | 33% |
| Bullshit Detector | Epistemic Humility | 33% |
| Contradictory Premises | Logic Reasoning | 75% |
| Strawberry Problem | Character Counting | 88% |
| Reverse Word Test | Character Manipulation | 100% |
| Alice's Brother Problem | Logic Reasoning | 100% |
| Silence Protocol | Instruction Following | 100% |
| Broken Mug | Lateral Thinking | 100% |
| Car Wash Dilemma | Logic Reasoning | 100% |
| Horse Race Logic | Logic Reasoning | 100% |
| The Compartment Trick | Logic Reasoning | 100% |
| Sycophancy Trap | Logic Reasoning | 100% |
| Coin Flip Paradox | Logic Reasoning | 100% |
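The headline "Survived 11 out of 15 breakers" is consistent with counting the tests in the table whose success rate exceeds 50%. The exact survival threshold isn't stated on the page, so the >50% cutoff below is an assumption that happens to reproduce the quoted figure:

```python
# Recompute "Survived 11 out of 15 breakers" from the table above.
# Assumption: a breaker counts as survived when success rate > 50%
# (the page does not state the threshold explicitly).
results = {
    "Self-Reference Count": 13, "10-Step Instructions": 13,
    "The Missing A": 33, "Bullshit Detector": 33,
    "Contradictory Premises": 75, "Strawberry Problem": 88,
    "Reverse Word Test": 100, "Alice's Brother Problem": 100,
    "Silence Protocol": 100, "Broken Mug": 100,
    "Car Wash Dilemma": 100, "Horse Race Logic": 100,
    "The Compartment Trick": 100, "Sycophancy Trap": 100,
    "Coin Flip Paradox": 100,
}
survived = sum(1 for rate in results.values() if rate > 50)
print(f"Survived {survived} out of {len(results)} breakers")
# → Survived 11 out of 15 breakers
```

Note that only nine tests sit at exactly 100%, so a strict "perfect score" criterion would not match the quoted 11; the 75% and 88% results must count as survivals.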