Test prompts across Claude and GPT.

Set a horizontal prompt once. Run your prompt against any model. Compare it head-to-head against standard ChatGPT. An LLM judge scores both on overall and technical dimensions.

Anthropic + OpenAIStreamingPRQ judgedShareable linksRecent runs โ†’
Provider: anthropic

Persists across the session. Applied to the model under test only โ€” the standard ChatGPT baseline runs without it for a fair test.

PRQ โ€” Pairwise Response Quality
An impartial Claude judge scores both responses 0โ€“10 on accuracy, relevance, depth, clarity, style, and safety; rolls up overall + technical scores; and declares a winner with reasoning.
โŒ˜+Enter to run ยท The horizontal prompt above is applied automatically.