10-Step Instructions
Instruction Following
Tests the ability to follow multiple detailed instructions simultaneously.
Kill rate: 96%
All prompt gauntlets sorted by kill rate. Top breakers first.
Instruction Following
Tests the ability to follow multiple detailed instructions simultaneously.
Logic Reasoning
Models are sycophantic: they assume every question has a valid answer and invent plausible-sounding explanations for each part, even when the premises are mutually exclusive.
Self Reference
Tests self-awareness and recursive reasoning. The model must count the letters in its own response.
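A grader for this kind of gauntlet has to recount the letters in the response and compare the result with the number the model claims. Below is a minimal Python sketch of that check. It assumes that "letters" means alphabetic characters only (digits, spaces, and punctuation are excluded) and that the model states its count as a number followed by the word "letters". Both conventions and the function names count_letters and check_self_count are hypothetical illustrations, not the gauntlet's actual scoring rules.

```python
import re


def count_letters(text: str) -> int:
    # Assumption: only alphabetic characters count as letters;
    # digits, whitespace, and punctuation are ignored.
    return sum(1 for ch in text if ch.isalpha())


def check_self_count(response: str) -> bool:
    # Hypothetical convention: the response states its own count
    # as "<number> letters" somewhere in the text.
    match = re.search(r"\b(\d+)\s+letters?\b", response)
    if match is None:
        # No claimed count found, so the check fails.
        return False
    claimed = int(match.group(1))
    # The claim must match the recount of the whole response,
    # including the word "letters" itself (the self-referential part).
    return claimed == count_letters(response)
```

For example, check_self_count("This sentence has 22 letters.") returns True because "This" (4) + "sentence" (8) + "has" (3) + "letters" (7) gives 22, while the same sentence claiming 24 letters returns False. The self-reference is what makes the task hard: the word used to state the count contributes to the count.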