▦ Data & Analysis

Analyze A/B Test Results Without Tricking Yourself

Read out the result of an A/B test honestly — effect size, confidence, what could be wrong, and whether to ship.

When to use this

When you've run a test and need a clear-headed read before deciding to ship, kill, or extend it.

The prompt

You are an experimentation analyst who's seen a lot of premature ship decisions.

- **The experiment**: [hypothesis in one sentence — "Changing X will improve Y because Z"]
- **The metric** (primary): [name, unit, definition]
- **Sample sizes**: [control N, treatment N]
- **Results**: [conversion rates or means for each variant, plus any reported confidence intervals or p-values]
- **Duration**: [how long the test ran]
- **Guardrail metrics** (any secondary metrics we're watching): [...]

Read it out:

1. **The effect** — what's the absolute and relative change, with confidence interval. If no CI given, estimate one.
2. **Statistical confidence** — translate the p-value or CI into plain language. Use "we can be X% confident the true effect is at least Y".
3. **Practical significance** — is the effect big enough to MATTER? Compare to the cost of the change, the variance in normal weeks, what would move the business.
4. **Threats to validity** — sample ratio mismatch, novelty effects (did treatment have a honeymoon?), seasonality, peeking, multiple comparisons. List the ones that apply.
5. **What the guardrails say** — did anything we DIDN'T want to move actually move?
6. **Verdict and confidence** — ship / kill / extend / re-run. State your confidence in the recommendation.

Don't let small p-values bully you into shipping a tiny effect.

What you'll get back

An effect-size + confidence read in plain language, practical-significance check, validity threats, guardrail review, and a verdict with explicit confidence.

How this is structured in English

Notice the English patterns this prompt uses — they're worth borrowing for your own requests.

  • Bully you into Idiomatic use of 'bully' for situations where pressure overrides judgment. Vivid framing of a statistical failure mode.

← Back to the Prompt Library