ReAIty Check
OpenAI
9 models tracked
Average resilience: 71%
Tests Survived: 886
Tests Failed: 355
Toughest Breakers
Rank  Challenge               Category               Pass rate (provider)
#1    10-Step Instructions    Instruction Following  0%
#2    Contradictory Premises  Logic Reasoning        11%
#3    Car Wash Dilemma        Logic Reasoning        22%
Models
Rank  Model                    Survived  Failure rate
#1    OpenAI: GPT-5.2          76%       24%
#2    OpenAI: o4 Mini          76%       24%
#3    OpenAI: GPT-5            75%       25%
#4    OpenAI: GPT-5 Codex      73%       27%
#5    OpenAI: GPT-5.1-Codex    70%       30%
#6    OpenAI: gpt-oss-120b     70%       30%
#7    OpenAI: GPT-5 Chat       70%       30%
#8    OpenAI: GPT-5.1          69%       31%
#9    OpenAI: GPT-5.1 Chat     63%       37%