MoonshotAI: Kimi K2.5

Survived 9 out of 15 breakers

Resilience
60%
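
The resilience figure appears to be the share of breakers survived, since 9 out of 15 works out to 60%. A minimal sketch of that arithmetic, assuming this simple ratio is how the score is derived (the function name `resilience` is hypothetical and not taken from the page):

```python
def resilience(survived: int, total: int) -> float:
    """Return the percentage of breakers survived (assumed definition)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= survived <= total:
        raise ValueError("survived must be between 0 and total")
    return 100 * survived / total


# Figures from the summary above: 9 of 15 breakers survived.
print(f"{resilience(9, 15):.0f}%")  # prints "60%"
```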

Kimi K2.5 is Moonshot AI's native multimodal model, offering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens, it performs strongly in general reasoning, visual coding, and agentic tool-calling.

Context: 262,144 tokens

Cost (Input): $0.45 / 1M tokens

Cost (Output): $2.20 / 1M tokens

Max completion tokens: 65,535
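
The listed prices and limits can be combined into a rough per-request cost estimate. The sketch below is illustrative only: the function name `estimate_cost` is hypothetical, and it assumes the 262,144-token context window covers input and output tokens together, which the page does not state.

```python
# Figures copied from the listing above.
INPUT_PRICE_PER_M = 0.45    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 2.20   # USD per 1M output tokens
CONTEXT_WINDOW = 262_144    # tokens
MAX_COMPLETION = 65_535     # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_COMPLETION:
        raise ValueError("output exceeds the max completion tokens")
    # Assumption: input and output share the same context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the context window")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: 100,000 input tokens and 10,000 output tokens.
print(f"${estimate_cost(100_000, 10_000):.3f}")  # prints "$0.067"
```

Output tokens cost roughly five times as much as input tokens at these rates, so long completions dominate the bill well before long prompts do.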

Breaker Results

Test                     Category                Success Rate
Self-Reference Count     Self Reference          11%
10-Step Instructions     Instruction Following   11%
Contradictory Premises   Logic Reasoning         22%
Horse Race Logic         Logic Reasoning         50%
Silence Protocol         Instruction Following   56%
Car Wash Dilemma         Logic Reasoning         75%
Bullshit Detector        Epistemic Humility      75%
Coin Flip Paradox        Logic Reasoning         75%
Strawberry Problem       Character Counting      100%
Reverse Word Test        Character Manipulation  100%
Alice's Brother Problem  Logic Reasoning         100%
Broken Mug               Lateral Thinking        100%
The Missing A            Pattern Matching        100%
The Compartment Trick    Logic Reasoning         100%
Sycophancy Trap          Logic Reasoning         100%