ReAIty Check
Models
Challenges
Benchmarks
About
Submit Challenge
Baidu
1 model tracked
Average resilience: 62%
Tests Survived: 85
Tests Failed: 53
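The page does not say how the resilience figure is computed. A minimal sketch, assuming it is simply tests survived divided by total tests run and rounded to the nearest whole percent, reproduces the 62% shown above from the 85 survived and 53 failed tests. The function name `resilience_percent` is hypothetical and used only for illustration.

```python
def resilience_percent(survived: int, failed: int) -> int:
    """Share of tests survived, as a whole-number percentage.

    Assumption: resilience = survived / (survived + failed).
    The site does not document its formula; this is only a guess
    that happens to match the published numbers.
    """
    total = survived + failed
    if total == 0:
        raise ValueError("no tests recorded")
    return round(100 * survived / total)


# Figures taken from the provider summary above.
survived, failed = 85, 53
resilience = resilience_percent(survived, failed)
failure_rate = 100 - resilience

print(f"{resilience}% survived, {failure_rate}% failed")
```

Under this assumption, 85 of 138 tests is about 61.6%, which rounds to the 62% survival rate and the complementary 38% failure rate listed for the provider's single tracked model.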
Toughest Breakers
#1 Self-Reference Count (Self Reference): pass rate (provider) 0%
#2 10-Step Instructions (Instruction Following): pass rate (provider) 0%
#3 Alice's Brother Problem (Logic Reasoning): pass rate (provider) 0%
Models
#1 Baidu: ERNIE 4.5 300B A47B (baidu)
Survived: 62%
Failure Rate: 38%