ReAIty Check
Models
Challenges
Benchmarks
About
Submit Challenge
MiniMax
2 models tracked
Average resilience: 71%
Tests survived: 186
Tests failed: 77
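The page does not say how the resilience figure is computed. As a rough check, the minimal sketch below assumes it is the share of tests survived out of all tests run. The function name `survival_rate` is hypothetical and not part of the site.

```python
def survival_rate(survived: int, failed: int) -> float:
    """Return the fraction of tests survived, in the range [0, 1].

    Assumption: resilience = survived / (survived + failed).
    The page itself does not state this definition.
    """
    total = survived + failed
    if total == 0:
        raise ValueError("no tests recorded")
    return survived / total


# Provider-level counts taken from the summary above.
rate = survival_rate(186, 77)
print(f"{rate:.0%}")  # prints "71%"
```

Under this assumption, 186 of 263 tests comes to about 70.7%, which rounds to the 71% shown. The plain mean of the two per-model survival rates (75% and 67%) is also 71%, so the figures alone do not reveal which definition the site uses.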
Toughest Breakers
#1 10-Step Instructions (Instruction Following). Pass rate (provider): 0%
#2 Car Wash Dilemma (Logic Reasoning). Pass rate (provider): 0%
#3 The Missing A (Pattern Matching). Pass rate (provider): 0%
Models
#1 MiniMax: MiniMax M2.5. Survived: 75%. Failure rate: 25%
#2 MiniMax: MiniMax M2.1. Survived: 67%. Failure rate: 33%