How to choose the right statistical test for your A/B test
Picking the wrong test can invalidate your results. This guide walks you through the decision tree based on your metric type.
Step 1: What type of metric are you testing?
The very first question to ask: what kind of data does your metric produce?
- Binary / Conversion rates: each user either converts or doesn't (clicked, purchased, signed up). Use the Conversions Calculator.
- Continuous per-user metrics: each user has one numeric value (revenue per user, session duration, pages viewed). Use the Continuous Metrics Calculator.
- Ratio metrics: sum(X)/sum(Y) where the denominator varies per user (AOV = revenue/orders, revenue per click). Use the Ratio Metrics Calculator.
Step 2: Choose a method within your metric type
For conversion rates:
- Z-test (two proportions): the default choice. Works well when sample sizes are moderate to large (n > 30 per group) and expected cell counts are ≥ 5.
- Chi-square test: best when comparing more than 2 groups simultaneously or analyzing contingency tables with multiple categories.
- Fisher's exact test: use when sample sizes are small or expected cell counts are below 5. Exact rather than approximate.
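As a rough illustration of the default choice above, here is a minimal standard-library sketch of a pooled two-proportion z-test, plus a helper that computes the smallest expected cell count so you can check the "≥ 5" condition. The function names and the example counts are made up for illustration and are not taken from the calculators.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def min_expected_count(conv_a, n_a, conv_b, n_b):
    """Smallest expected cell count of the 2x2 table under the null."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    return min(n * q for n in (n_a, n_b) for q in (p_pool, 1 - p_pool))


# Illustrative numbers only: 200/5000 conversions vs 250/5000.
z, p_value = two_proportion_ztest(200, 5000, 250, 5000)
smallest_cell = min_expected_count(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}, min expected count = {smallest_cell:.0f}")
```

If the smallest expected count falls below 5, an exact test is the safer option; in SciPy, `scipy.stats.fisher_exact` and `scipy.stats.chi2_contingency` cover the other two methods in the list.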
For continuous metrics:
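- Welch's t-test: the go-to for comparing means. Works for normally distributed data or for large samples, where the central limit theorem makes the sample mean approximately normal (n > 30 is a common rule of thumb; heavily skewed metrics such as revenue can need many more users). Does not assume equal variances.
- Mann-Whitney U test: use when data is heavily skewed, has outliers, or violates normality assumptions. It is rank-based, so it tests whether values in one group tend to be larger than in the other rather than whether the means differ.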
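To make "does not assume equal variances" concrete, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name and the sample values are hypothetical; turning the statistic into a p-value needs a t-distribution, which this sketch deliberately leaves out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up per-user values for illustration.
control = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0]
variant = [14.0, 15.0, 11.0, 16.0, 13.0, 15.0]
t_stat, dof = welch_t(control, variant)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and returns a p-value, and `scipy.stats.mannwhitneyu` covers the rank-based alternative.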
For ratio metrics:
- Simple t-test on ratios: quick but can be biased when users contribute different numbers of events.
- Delta method: the recommended approach. Properly handles the variance of a ratio where the denominator varies per user.
- Bootstrap: the most flexible. Makes no distributional assumptions. Best for complex or non-standard metrics.
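The delta method can be sketched in a few lines. The version below assumes one (numerator, denominator) pair per user and uses the usual first-order approximation Var(R) ≈ (var_x - 2·R·cov_xy + R²·var_y) / (n·mean_y²). The function name and the example data are illustrative assumptions, not the Ratio Metrics Calculator's actual implementation.

```python
from math import sqrt
from statistics import mean


def delta_method_ratio(x, y):
    """Ratio sum(x)/sum(y) and its delta-method standard error.

    x[i] and y[i] are the numerator and denominator totals for user i.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    ratio = mx / my  # same value as sum(x) / sum(y)
    # Sample variances and covariance across users (n - 1 denominator).
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - my) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    var_ratio = (var_x - 2 * ratio * cov_xy + ratio ** 2 * var_y) / (n * my ** 2)
    # Guard against a tiny negative value caused by floating-point rounding.
    return ratio, sqrt(max(var_ratio, 0.0))


# Made-up data: revenue and order count per user (AOV = revenue / orders).
revenue = [20.0, 0.0, 45.0, 30.0, 90.0]
orders = [1, 0, 2, 1, 3]
aov, aov_se = delta_method_ratio(revenue, orders)
print(f"AOV = {aov:.2f}, SE = {aov_se:.2f}")
```

To compare two variants, one option is to compute the ratio and standard error for each group separately and form z = (R_b - R_a) / sqrt(se_a² + se_b²).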
Step 3: Consider these special cases
- Want to peek at results early? Use Sequential Testing with spending functions to control false positive rates while allowing early stopping.
- Testing more than one variant? Use the Multi-variant Calculator with Bonferroni or Holm-Bonferroni corrections.
- Prefer probabilities over p-values? Use the Bayesian Calculator for posterior probability of one variant beating another.
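For the multi-variant case, the two corrections named above differ only in the thresholds they apply. The following sketch (function names and p-values are hypothetical) shows Bonferroni next to the step-down Holm-Bonferroni procedure, which never rejects fewer hypotheses than Bonferroni at the same alpha.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm procedure; returns True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject


# Illustrative p-values for three variants, each compared with control.
p_vals = [0.01, 0.04, 0.02]
print(bonferroni(p_vals))       # [True, False, False]
print(holm_bonferroni(p_vals))  # [True, True, True]
```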
Quick decision flowchart
Is your metric a conversion rate (yes/no)?
  YES → Sample > 30 per group? → Z-test
  YES → Small samples or sparse data? → Fisher's exact
  YES → Multiple groups or categories? → Chi-square
Is your metric one value per user?
  YES → Roughly normal or n > 30? → Welch's t-test
  YES → Skewed or small samples? → Mann-Whitney U
Is your metric a ratio (sum/sum)?
  YES → Users have different denominators? → Delta method
  YES → Complex metric or no assumptions? → Bootstrap
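The flowchart can also be written as a small function. This is one possible reading rather than a definitive rule set: the parameter names are invented for the example, and where branches overlap (for instance a heavily skewed metric with n > 30) the sketch assumes the more conservative test wins.

```python
def choose_test(metric_type, n_per_group, *, min_expected_count=None,
                n_groups=2, heavily_skewed=False, complex_metric=False):
    """Map the flowchart's questions to a test name (illustrative only)."""
    if metric_type == "conversion":
        if n_groups > 2:
            return "Chi-square"
        sparse = min_expected_count is not None and min_expected_count < 5
        if n_per_group <= 30 or sparse:
            return "Fisher's exact"
        return "Z-test"
    if metric_type == "continuous":
        # Assumption: heavy skew overrides the n > 30 rule of thumb.
        if heavily_skewed or n_per_group <= 30:
            return "Mann-Whitney U"
        return "Welch's t-test"
    if metric_type == "ratio":
        return "Bootstrap" if complex_metric else "Delta method"
    raise ValueError(f"unknown metric type: {metric_type!r}")


print(choose_test("conversion", 5000, min_expected_count=225))
print(choose_test("continuous", 1000, heavily_skewed=True))
print(choose_test("ratio", 1000))
```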