Test prompts across Claude and GPT.

Set a horizontal prompt once. Run your prompt against any model. Compare it head-to-head against standard ChatGPT. An LLM judge scores both on overall and technical dimensions.

Anthropic + OpenAIStreamingPRQ judgedShareable linksRecent runs →

Model under test

Provider: anthropic

Horizontal prompt

Persists across the session. Applied to the model under test only — the standard ChatGPT baseline runs without it for a fair test.

Compare to standard ChatGPT

Runs GPT-4o with no system prompt, then an LLM judge picks a winner.

PRQ — Pairwise Response Quality

An impartial Claude judge scores both responses 0–10 on accuracy, relevance, depth, clarity, style, and safety; rolls up overall + technical scores; and declares a winner with reasoning.

Your prompt

⌘+Enter to run · The horizontal prompt above is applied automatically.