
MoonshotAI: Kimi K2.5

Resilience: 60% (survived 9 of 15 breakers)
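The page does not define the resilience score. Assuming it is simply the share of breakers survived (the figures shown are consistent with this), it works out as:

```latex
\text{Resilience} = \frac{\text{breakers survived}}{\text{breakers run}} = \frac{9}{15} = 0.60 = 60\%
```

It is not the mean of the per-test pass rates in the results table, which would be 1075 / 15, or about 71.7%.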

Kimi K2.5 is Moonshot AI's natively multimodal model, offering state-of-the-art visual coding capability and a self-directed agent-swarm paradigm. Built on Kimi K2 with continued pretraining on approximately 15T mixed visual and text tokens, it performs strongly in general reasoning, visual coding, and agentic tool calling.

Context: 262,144 tokens
Cost (input): $0.45 per 1M tokens
Cost (output): $2.20 per 1M tokens
Max completion tokens: 65,535
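A rough per-request cost follows directly from the listed prices: input tokens × $0.45/1M plus output tokens × $2.20/1M. The Python sketch here is illustrative only. The function name `estimate_cost` is hypothetical, and the limit checks assume that prompt and completion share the 262,144-token context window and that output is capped at 65,535 tokens, which the listing implies but does not spell out.

```python
# Hypothetical helper: estimate the USD cost of one Kimi K2.5 request
# from the per-token prices on this listing. Prices may change.

INPUT_PRICE_PER_M = 0.45    # USD per 1M input tokens (from the listing)
OUTPUT_PRICE_PER_M = 2.20   # USD per 1M output tokens (from the listing)
CONTEXT_WINDOW = 262_144    # tokens (from the listing)
MAX_COMPLETION = 65_535     # tokens (from the listing)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_COMPLETION:
        raise ValueError(f"output exceeds max completion tokens ({MAX_COMPLETION})")
    # Assumption: prompt and completion together must fit in the context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError(f"request exceeds context window ({CONTEXT_WINDOW})")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # 100k-token prompt with a 10k-token answer: 0.045 + 0.022 USD
    print(f"${estimate_cost(100_000, 10_000):.4f}")  # prints $0.0670
```

Because output tokens cost nearly five times as much as input tokens here, long generations dominate the bill even when the prompt is large.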

Toughest Breakers

Self-Reference Count (Self Reference): 11% pass rate
10-Step Instructions (Instruction Following): 11% pass rate
Contradictory Premises (Logic Reasoning): 22% pass rate

Breaker Results

Test	Category	Pass Rate
Self-Reference Count	Self Reference	11%
10-Step Instructions	Instruction Following	11%
Contradictory Premises	Logic Reasoning	22%
Horse Race Logic	Logic Reasoning	50%
Silence Protocol	Instruction Following	56%
Car Wash Dilemma	Logic Reasoning	75%
Bullshit Detector	Epistemic Humility	75%
Coin Flip Paradox	Logic Reasoning	75%
Strawberry Problem	Character Counting	100%
Reverse Word Test	Character Manipulation	100%
Alice's Brother Problem	Logic Reasoning	100%
Broken Mug	Lateral Thinking	100%
The Missing A	Pattern Matching	100%
The Compartment Trick	Logic Reasoning	100%
Sycophancy Trap	Logic Reasoning	100%