ReAIty Check

Google

9 models tracked

Average resilience: 64%
Tests Survived: 950
Tests Failed: 577

Toughest Breakers

#1 Self-Reference Count (Self Reference), Pass rate (provider): 0%
#2 10-Step Instructions (Instruction Following), Pass rate (provider): 11%
#3 Contradictory Premises (Logic Reasoning), Pass rate (provider): 11%

Models

#1 Google: Gemini 3 Pro Preview, Survived: 78%, Failure Rate: 22%
#2 Google: Gemini 3.1 Pro Preview, Survived: 74%, Failure Rate: 26%
#3 Google: Gemini 3 Flash Preview, Survived: 72%, Failure Rate: 28%
#4 Google: Gemini 2.5 Pro, Survived: 66%, Failure Rate: 34%
#5 Google: Gemini 2.5 Flash, Survived: 63%, Failure Rate: 37%
#6 Google: Gemini 2.0 Flash, Survived: 57%, Failure Rate: 43%
#7 Google: Gemma 3 27B (free), Survived: 57%, Failure Rate: 43%
#8 Google: Gemini 2.5 Flash Lite, Survived: 57%, Failure Rate: 43%
#9 Google: Gemma 3 27B, Survived: 51%, Failure Rate: 49%
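
The page does not say how "Average resilience" is derived. As a rough cross-check, the sketch below compares two plausible readings using only the figures listed on this page: the unweighted mean of the nine per-model "Survived" percentages, and the pooled pass rate from the "Tests Survived" and "Tests Failed" counts. Both readings are assumptions, the variable names are illustrative, and the per-model percentages are already rounded, so the mean is approximate.

```python
# Per-model "Survived" percentages as listed on this page (already rounded).
survived_pct = [78, 74, 72, 66, 63, 57, 57, 57, 51]

# Provider-wide test counts as listed on this page.
tests_survived = 950
tests_failed = 577

# Assumption 1: "Average resilience" is the unweighted mean of per-model rates.
mean_of_models = sum(survived_pct) / len(survived_pct)

# Assumption 2: it is the pooled pass rate over all recorded tests.
pooled_rate = 100 * tests_survived / (tests_survived + tests_failed)

print(f"mean of per-model rates: {mean_of_models:.1f}%")
print(f"pooled pass rate: {pooled_rate:.1f}%")
```

Under these assumptions, the unweighted mean rounds to the 64% shown on the page, while the pooled rate comes out slightly lower. That would fit a per-model average, though the site may compute the figure differently.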

© 2026 ReAIty Check v0.5.27-beta by Eugene Tusmenko