How to choose the right statistical test for your A/B test

Picking the wrong test can invalidate your results. This guide walks you through the decision tree based on your metric type.

Step 1: What type of metric are you testing?

The very first question to ask: what kind of data does your metric produce?

Binary / conversion rates: each user either converts or doesn't (clicked, purchased, signed up). Use the Conversions Calculator.
Continuous per-user metrics: each user has one numeric value (revenue per user, session duration, pages viewed). Use the Continuous Metrics Calculator.
Ratio metrics: sum(X)/sum(Y), where the denominator varies per user (average order value, AOV = revenue/orders; revenue per click). Use the Ratio Metrics Calculator.
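To make the three metric types concrete, here is a minimal sketch that computes one of each from the same per-user records. The records and variable names are made up for illustration and do not come from the calculators.

```python
# Hypothetical per-user records: (converted, revenue, orders).
users = [
    (1, 40.0, 2),
    (0, 0.0, 0),
    (1, 15.0, 1),
    (0, 0.0, 0),
    (1, 90.0, 3),
]

n = len(users)

# Binary metric: the share of users who converted.
conversion_rate = sum(c for c, _, _ in users) / n

# Continuous per-user metric: one number per user, averaged over users.
revenue_per_user = sum(r for _, r, _ in users) / n

# Ratio metric: sum(X) / sum(Y). The denominator (orders) differs per user,
# so this is not the same as an average of per-user values.
aov = sum(r for _, r, _ in users) / sum(o for _, _, o in users)

print(conversion_rate, revenue_per_user, aov)
```

Note that revenue per user and AOV use the same numerator but different denominators (users versus orders), which is why they fall into different branches of the guide.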

For binary / conversion metrics, choose among three tests:
Z-test (two proportions): the default choice. It relies on a normal approximation, so it works well when samples are moderate to large (as a rule of thumb, n > 30 per group) and every expected cell count is at least 5.
Chi-square test: best when comparing more than 2 groups at once or analyzing contingency tables with multiple categories.
Fisher's exact test: use when sample sizes are small or any expected cell count is below 5. It computes an exact p-value rather than relying on an approximation.
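The choice between the Z-test and Fisher's exact test can be sketched with the standard library alone. This is an illustrative implementation under stated assumptions, not the calculator's code: the function names are hypothetical, the Z-test uses the pooled standard error, and the Fisher p-value uses the common "sum of tables no more likely than the observed one" definition of two-sidedness.

```python
from math import comb, sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled SE)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def min_expected_count(conv_a, n_a, conv_b, n_b):
    """Smallest expected cell count of the 2x2 table under the null."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    return min(n * q for n in (n_a, n_b) for q in (pooled, 1 - pooled))


def fisher_exact_two_sided(conv_a, n_a, conv_b, n_b):
    """Two-sided Fisher's exact test on a 2x2 table with fixed margins."""
    total_conv = conv_a + conv_b
    n = n_a + n_b

    def prob(k):
        # Hypergeometric probability that group A has exactly k conversions.
        return comb(n_a, k) * comb(n_b, total_conv - k) / comb(n, total_conv)

    k_min, k_max = max(0, total_conv - n_b), min(n_a, total_conv)
    p_obs = prob(conv_a)
    # Sum every table that is no more likely than the observed one.
    total = sum(prob(k) for k in range(k_min, k_max + 1)
                if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Large samples: the expected counts are far above 5, so the z-test is fine.
print(min_expected_count(200, 2000, 250, 2000), two_proportion_ztest(200, 2000, 250, 2000))

# Tiny samples: an expected count below 5 points to Fisher's exact test.
print(min_expected_count(3, 4, 1, 4), fisher_exact_two_sided(3, 4, 1, 4))
```

If SciPy is available, `scipy.stats.fisher_exact` and `scipy.stats.chi2_contingency` cover the exact and multi-group cases without hand-rolled code.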

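For continuous per-user metrics:
Welch's t-test: the go-to for comparing means. Works for roughly normal data or for large samples, where the central limit theorem makes the sample means approximately normal (n > 30 per group is a common rule of thumb, but heavily skewed metrics such as revenue can need far more). Does not assume equal variances.
Mann-Whitney U test: use when data is heavily skewed, has outliers, or violates normality assumptions. It is rank-based and tests whether values in one group tend to be larger than in the other, so it is not a test of the difference in means.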
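The two tests above look at different things: Welch's t-test works on means and variances, Mann-Whitney U on ranks. The sketch below shows both statistics using only the standard library. The function names are hypothetical, and two simplifications are assumptions of this sketch: the Welch function returns the statistic and degrees of freedom but no p-value (the standard library has no t distribution), and the Mann-Whitney p-value uses the normal approximation without a tie correction.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def mann_whitney_u(a, b):
    """U statistic for group b and a two-sided normal-approximation p-value."""
    values = sorted(list(a) + list(b))
    # Assign each distinct value the average of the 1-based ranks it occupies.
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n_a, n_b = len(a), len(b)
    u_b = sum(rank_of[x] for x in b) - n_b * (n_b + 1) / 2
    mu = n_a * n_b / 2
    sigma = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_b - mu) / sigma
    return u_b, 2 * (1 - NormalDist().cdf(abs(z)))


control = [1, 2, 3, 4, 5]
variant = [6, 7, 8, 9, 10]
print(welch_t(control, variant))
print(mann_whitney_u(control, variant))
```

In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.mannwhitneyu(a, b)` return proper p-values, including exact small-sample handling for the latter.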

For ratio metrics:
Simple t-test on ratios: quick, but can be biased when users contribute different numbers of events.
Delta method: the recommended approach. It properly handles the variance of a ratio whose denominator varies per user.
Bootstrap: the most flexible. It makes no parametric distributional assumptions, which suits complex or non-standard metrics.
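For a ratio R = mean(X)/mean(Y) over n users, the first-order delta method gives Var(R) ≈ (Var(X) − 2·R·Cov(X, Y) + R²·Var(Y)) / (n·mean(Y)²). The sketch below applies that formula and, for comparison, a percentile bootstrap that resamples whole users. It is an illustration under assumptions, not the calculator's implementation: the function names, the default of 2000 resamples, and the demo data are all hypothetical, and users are treated as independent.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def ratio_and_delta_var(x, y):
    """Ratio sum(x)/sum(y) and its first-order delta-method variance."""
    n = len(x)
    mx, my = mean(x), mean(y)
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - my) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    ratio = mx / my
    var_ratio = (var_x - 2 * ratio * cov_xy + ratio ** 2 * var_y) / (n * my ** 2)
    return ratio, var_ratio


def delta_ztest(xa, ya, xb, yb):
    """Two-sided z-test for the difference between two groups' ratios."""
    ra, va = ratio_and_delta_var(xa, ya)
    rb, vb = ratio_and_delta_var(xb, yb)
    z = (rb - ra) / sqrt(va + vb)
    return rb - ra, z, 2 * (1 - NormalDist().cdf(abs(z)))


def bootstrap_ratio_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for sum(x)/sum(y), resampling whole users."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        denom = sum(y[i] for i in idx)
        if denom:  # skip degenerate resamples with a zero denominator
            stats.append(sum(x[i] for i in idx) / denom)
    stats.sort()
    lo = stats[int(len(stats) * alpha / 2)]
    hi = stats[int(len(stats) * (1 - alpha / 2)) - 1]
    return lo, hi


# Hypothetical per-user revenue and order counts for one group.
revenue = [10, 20, 30, 40, 50, 60]
orders = [1, 2, 2, 3, 3, 4]
print(ratio_and_delta_var(revenue, orders))
print(bootstrap_ratio_ci(revenue, orders))
```

Both approaches keep each user's numerator and denominator together, which is what a naive event-level t-test fails to do.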

Special cases:
Want to peek at results early? Use Sequential Testing with alpha-spending functions to control the false positive rate while allowing early stopping.
Testing more than one variant? Use the Multi-variant Calculator with Bonferroni or Holm-Bonferroni corrections.
Prefer probabilities over p-values? Use the Bayesian Calculator to get the posterior probability that one variant beats another.
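Each of the three special cases above can be sketched in a few lines. None of this is taken from the calculators; it assumes Lan-DeMets spending functions (O'Brien-Fleming-type and Pocock-type) for sequential testing, Holm's step-down procedure for multiple variants, and independent uniform Beta(1, 1) priors with Monte Carlo sampling for the Bayesian comparison. All function names and defaults are hypothetical. The spending functions return only the cumulative alpha allowed at each look; turning that into stopping boundaries needs additional numerical work not shown here.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist

_norm = NormalDist()


def obf_alpha_spent(t, alpha=0.05):
    """O'Brien-Fleming-type spending: cumulative alpha at information fraction t."""
    return 2 * (1 - _norm.cdf(_norm.inv_cdf(1 - alpha / 2) / sqrt(t)))


def pocock_alpha_spent(t, alpha=0.05):
    """Pocock-type spending: cumulative alpha at information fraction t."""
    return alpha * log(1 + (exp(1) - 1) * t)


def holm_bonferroni(p_values, alpha=0.05):
    """Reject/keep flag per hypothesis using Holm's step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # first failure: this and all larger p-values are kept
        reject[i] = True
    return reject


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += b > a
    return wins / draws


# Cumulative alpha available at four equally spaced looks.
for t in (0.25, 0.5, 0.75, 1.0):
    print(t, obf_alpha_spent(t), pocock_alpha_spent(t))

# Three variants compared against control.
print(holm_bonferroni([0.01, 0.02, 0.03]))

print(prob_b_beats_a(200, 2000, 250, 2000))
```

With the p-values 0.01, 0.02, 0.03, Holm's procedure rejects all three hypotheses, whereas plain Bonferroni (threshold 0.05/3) would reject only the first. Both spending functions spend the full alpha by the final look, but the O'Brien-Fleming type spends very little of it early.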

Decision tree at a glance

Is your metric a conversion rate (yes/no)?
YES → Sample > 30 per group? → Z-test
YES → Small samples or sparse data? → Fisher's exact
YES → Multiple groups or categories? → Chi-square

Is your metric one value per user?
YES → Roughly normal or n > 30? → Welch's t-test
YES → Skewed or small samples? → Mann-Whitney U

Is your metric a ratio (sum/sum)?
YES → Users have different denominators? → Delta method
YES → Complex metric or no assumptions? → Bootstrap
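The tree above can also be written as a small function. This is only a sketch: the function name, parameter names, and the threshold of 30 mirror the text, while the order in which overlapping conditions are checked (for example, a heavily skewed metric with a large sample goes to Mann-Whitney U) is an assumption of this sketch rather than something the guide specifies.

```python
def choose_test(metric_type, n_per_group, *, n_groups=2, sparse=False,
                roughly_normal=False, heavily_skewed=False,
                complex_metric=False):
    """Return the name of the test suggested by the decision tree."""
    if metric_type == "binary":
        if n_groups > 2:
            return "Chi-square"
        if sparse or n_per_group <= 30:
            return "Fisher's exact"
        return "Z-test"
    if metric_type == "continuous":
        if heavily_skewed:
            return "Mann-Whitney U"
        if roughly_normal or n_per_group > 30:
            return "Welch's t-test"
        return "Mann-Whitney U"
    if metric_type == "ratio":
        return "Bootstrap" if complex_metric else "Delta method"
    raise ValueError(f"unknown metric type: {metric_type!r}")


print(choose_test("binary", 5000))
print(choose_test("continuous", 20))
print(choose_test("ratio", 5000, complex_metric=True))
```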