Test prompts across Claude and GPT.
Set a horizontal prompt once. Run your prompt against any model. Compare it head-to-head against standard ChatGPT. An LLM judge scores both on overall and technical dimensions.
Provider: anthropic
Persists across the session. Applied to the model under test only โ the standard ChatGPT baseline runs without it for a fair test.
PRQ โ Pairwise Response Quality
An impartial Claude judge scores both responses 0โ10 on accuracy, relevance, depth, clarity, style, and safety; rolls up overall + technical scores; and declares a winner with reasoning.
โ+Enter to run ยท The horizontal prompt above is applied automatically.