Live Benchmarks • Updated about 19 hours ago

Where AI Models Face Reality

We throw tricky but funny prompts at top AI models and watch them squirm. Count letters. Flip cups. Cite imaginary dolphins.
Nobody passes clean.

47 Models Tracked
15 Active Challenges
16 Providers
Daily Automated Runs
HOW MANY R's in 🍓 // A CUP WITH NO BOTTOM // WHO ARE YOU TO TELL ME TO BE SILENT 🤫 // CAN'T COUNT THE BROTHERS 👨‍👧‍👧 // WRITE IT BACKWARDS I DARE YOU 🔄 // CITE ME A DOLPHIN PAPER 🐬 // SARCASM 🪧 // I ATE IT ALL 🍪 // 2 + 2 = 5 🧮
Benchmark

Provider Performance

Failure-rate snapshot by provider (averaged across their models).

Provider
Nvidia           100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%   0%
Mistralai        100% 100% 100% 100% 100% 100% 100% 100% 100%   0% 100%   0%   0%   0%   0%
Xiaomi           100% 100% 100% 100% 100% 100% 100% 100% 100% 100%   0%   0%   0%   0%   0%
Deepseek         100% 100% 100% 100% 100% 100% 100% 100% 100%   0%   0%   0%   0%   0%   0%
Baidu            100% 100% 100% 100% 100% 100%   0%   0%   0% 100% 100%   0% 100%   0%   0%
Google            89%  89% 100%  78%  78%  78%  11%  44%  78%  56%  44%  44%  22%  11%  11%
Prime-intellect  100% 100% 100% 100%   0%   0% 100% 100%   0% 100% 100%   0%   0%   0%   0%
Anthropic         92%  92%  85%  92%  85%  69%  85%  54%  15%  15%   8%   8%  15%   8%  15%
Arcee-ai         100% 100%   0% 100% 100%   0% 100% 100%   0% 100%   0%   0%   0%   0%   0%
Moonshotai       100% 100% 100% 100%   0%   0%   0% 100%   0%   0% 100%   0%   0%   0%   0%
Bytedance-seed   100% 100% 100% 100% 100%   0%   0%   0% 100%   0%   0%   0%   0%   0%   0%
X-ai             100% 100%  50%   0%  50%   0%  50%  50% 100%   0%  50%   0%   0%   0%   0%
Minimax          100%  50%  50% 100% 100%  50%   0%  50%   0%   0%  50%   0%   0%   0%   0%
Openai           100%  89%  56%  78%  78%  33%  56%  22%  33%   0%   0%   0%   0%   0%   0%
Z-ai             100% 100% 100%   0% 100%   0%   0%   0% 100%   0%   0%   0%   0%   0%   0%
Qwen             100%   0% 100%   0% 100%   0%   0%   0% 100%   0%   0%   0%   0%   0%   0%
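
For the curious, here is a rough sketch of how a per-provider failure rate "averaged across their models" could be computed for a single challenge. This is not the site's actual pipeline. The function name `provider_failure_rates`, the `(provider, model, failed)` record shape, the rounding to whole percentages, and the sample data are all assumptions made up for illustration.

```python
from collections import defaultdict


def provider_failure_rates(results):
    """Average pass/fail outcomes per provider for one challenge.

    `results` is an iterable of (provider, model, failed) tuples, where
    `failed` is True when the model got the challenge wrong.
    Returns a dict mapping provider -> failure rate as a whole percentage.
    """
    grouped = defaultdict(list)
    for provider, _model, failed in results:
        grouped[provider].append(failed)
    # Assumption: rates are rounded to the nearest whole percent.
    return {
        provider: round(100 * sum(flags) / len(flags))
        for provider, flags in grouped.items()
    }


# Hypothetical data: provider-a has 9 models and 8 of them fail;
# provider-b has 2 models and 1 of them fails.
sample = [("provider-a", f"model-{i}", i != 0) for i in range(9)]
sample += [("provider-b", "model-x", True), ("provider-b", "model-y", False)]

print(provider_failure_rates(sample))  # {'provider-a': 89, 'provider-b': 50}
```

Under this reading, a provider with a single tracked model can only show 0% or 100%, while one with many models gets in-between values such as 89% (8 failures out of 9).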

Have a tricky prompt?

Submit your edge case. If it breaks major models, we add it to the gauntlet and credit the submission.

Submit Challenge