We ran Cursor, Claude Code, Codex, Devin, and Cline against the same 200-bug repo. The leaderboard surprised us.