How to choose the right statistical test for your A/B test
Picking the wrong test can invalidate your results. This guide walks you through the decision tree based on your metric type.
Step 1: What type of metric are you testing?
The very first question to ask: what kind of data does your metric produce?
- Binary / Conversion rates: each user either converts or doesn't (clicked, purchased, signed up). Use the Conversions Calculator.
- Continuous per-user metrics: each user has one numeric value (revenue per user, session duration, pages viewed). Use the Continuous Metrics Calculator.
- Ratio metrics: sum(X)/sum(Y) where the denominator varies per user (AOV = revenue/orders, revenue per click). Use the Ratio Metrics Calculator.
Step 2: Choose a method within your metric type
For conversion rates:
- Z-test (two proportions): the default choice. Works well when sample sizes are moderate to large (n > 30 per group) and expected cell counts are ≥ 5. With low conversion rates the cell-count condition is the stricter one: at a 1% rate, 30 users yield an expected 0.3 conversions.
- Chi-square test: best when comparing more than 2 groups simultaneously or analyzing contingency tables with multiple categories.
- Fisher's exact test: use when sample sizes are small or expected cell counts are below 5. It computes an exact p-value rather than relying on a large-sample approximation.
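As a rough illustration of the default choice above, the two-proportion z-test can be sketched with only the Python standard library. The function names and the conversion counts are hypothetical, chosen for this example. The sketch uses the pooled standard error and a two-sided p-value, and it also reports the smallest expected cell count so that the "at least 5" condition can be checked before trusting the result.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def min_expected_count(conv_a, n_a, conv_b, n_b):
    """Smallest expected cell count of the 2x2 table under the null."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    return min(n * q for n in (n_a, n_b) for q in (p_pool, 1 - p_pool))


# Hypothetical data: 200 of 5000 users convert in A, 250 of 5000 in B.
z, p_value = two_proportion_ztest(200, 5000, 250, 5000)
smallest_cell = min_expected_count(200, 5000, 250, 5000)
print(f"z = {z:.3f}, p = {p_value:.4f}, min expected count = {smallest_cell:.0f}")
```

If the smallest expected count comes back below 5, the list above points to Fisher's exact test instead.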
For continuous metrics:
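- Welch's t-test: the go-to for comparing means. Works for normally distributed data or large samples, where the central limit theorem applies (n > 30 is a common rule of thumb; heavily skewed metrics may need more). Does not assume equal variances.
- Mann-Whitney U test: use when data is heavily skewed, has outliers, or violates normality assumptions. It is rank-based and compares entire distributions rather than just means, so a significant result does not by itself indicate a difference in means.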
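To make the Welch's t-test entry concrete, here is a minimal sketch of the test statistic and its Welch-Satterthwaite degrees of freedom, using only the Python standard library. The function name and the sample values are made up for illustration. An exact p-value needs the Student t distribution, which the standard library does not provide, so the sketch stops at t and df; a statistics package would normally supply the p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite df."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_b) - mean(sample_a)) / sqrt(se2_a + se2_b)
    # Degrees of freedom shrink when the two variances are unequal.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical per-user values for two tiny groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

Note that df comes out well below the pooled value of 8 here, which is how the unequal variances of the two groups are accounted for.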
For ratio metrics:
- Simple t-test on ratios: quick, but can be biased when users contribute different numbers of events, because the average of per-user ratios is not the same as sum(X)/sum(Y).
- Delta method: the recommended approach. Properly handles the variance of a ratio where the denominator varies per user.
- Bootstrap: the most flexible. Makes minimal distributional assumptions. Best for complex or non-standard metrics.
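The delta method entry can be illustrated with a short sketch, assuming each user contributes one numerator value (for example revenue) and one denominator value (for example orders). The function names and data are hypothetical. The variance uses the standard first-order approximation Var(R) ≈ (Var(X) - 2R·Cov(X, Y) + R²·Var(Y)) / (n·mean(Y)²), with R = mean(X)/mean(Y), and the comparison between groups relies on a normal approximation.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def sample_covariance(xs, ys):
    """Sample covariance (n - 1 denominator) of paired per-user values."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def ratio_with_delta_variance(xs, ys):
    """Ratio sum(xs)/sum(ys) and its delta-method variance."""
    n = len(xs)
    my = mean(ys)
    r = mean(xs) / my  # identical to sum(xs) / sum(ys)
    var_r = (
        variance(xs)
        - 2 * r * sample_covariance(xs, ys)
        + r ** 2 * variance(ys)
    ) / (n * my ** 2)
    return r, var_r


def ratio_z_test(xs_a, ys_a, xs_b, ys_b):
    """Two-sided z-test on the difference of two ratio metrics."""
    r_a, var_a = ratio_with_delta_variance(xs_a, ys_a)
    r_b, var_b = ratio_with_delta_variance(xs_b, ys_b)
    z = (r_b - r_a) / sqrt(var_a + var_b)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


# Tiny made-up data: revenue and order count per user in each group.
rev_a, orders_a = [10, 0, 30, 20], [1, 1, 2, 2]
rev_b, orders_b = [20, 10, 40, 30], [1, 1, 2, 2]
r_a, var_a = ratio_with_delta_variance(rev_a, orders_a)
z, p_value = ratio_z_test(rev_a, orders_a, rev_b, orders_b)
print(f"AOV A = {r_a:.2f}, var = {var_a:.3f}, z = {z:.3f}")
```

Four users per group is far too few for the normal approximation to hold; the data is kept small only so the arithmetic can be followed by hand.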
Step 3: Consider these special cases
- Want to peek at results early? Use Sequential Testing with alpha-spending functions to control the false positive rate while allowing early stopping.
- Testing more than one variant? Use the Multi-variant Calculator with Bonferroni or Holm-Bonferroni corrections.
- Prefer probabilities over p-values? Use the Bayesian Calculator for posterior probability of one variant beating another.
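For the multi-variant case, the two corrections named above can be sketched as follows. This is a minimal standard-library illustration, not the calculator's implementation; the function names and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm procedure; returns reject flags in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value faces alpha / m, the next alpha / (m - 1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values for three variants, each compared with control.
p_values = [0.01, 0.02, 0.04]
bonf_flags = bonferroni(p_values)
holm_flags = holm_bonferroni(p_values)
print(bonf_flags, holm_flags)
```

On these inputs Bonferroni rejects only the first hypothesis while Holm rejects all three, which shows why Holm-Bonferroni is never less powerful than plain Bonferroni at the same overall error rate.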
Quick decision flowchart
Is your metric a conversion rate (yes/no)?
YES → Sample > 30 per group? → Z-test
YES → Small samples or sparse data? → Fisher's exact
YES → Multiple groups or categories? → Chi-square
Is your metric one value per user?
YES → Roughly normal or n > 30? → Welch's t-test
YES → Skewed or small samples? → Mann-Whitney U
Is your metric a ratio (sum/sum)?
YES → Users have different denominators? → Delta method
YES → Complex metric or no assumptions? → Bootstrap
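One way to read the flowchart is as a small function. This is only a sketch: the function and parameter names are invented here, and where the flowchart's branches overlap (for example a skewed metric with more than 30 users per group) the code makes an arbitrary choice that is stated in the comments.

```python
def recommend_test(metric_type, n_per_group, n_groups=2,
                   sparse=False, skewed=False, complex_metric=False):
    """Map the flowchart's questions to a test name (one possible reading)."""
    small = n_per_group <= 30
    if metric_type == "conversion":
        if n_groups > 2:
            return "Chi-square"
        if small or sparse:
            return "Fisher's exact"
        return "Z-test"
    if metric_type == "continuous":
        # Assumption: skew takes priority over sample size.
        if skewed or small:
            return "Mann-Whitney U"
        return "Welch's t-test"
    if metric_type == "ratio":
        return "Bootstrap" if complex_metric else "Delta method"
    raise ValueError(f"unknown metric type: {metric_type!r}")


print(recommend_test("conversion", n_per_group=5000))
print(recommend_test("continuous", n_per_group=5000, skewed=True))
print(recommend_test("ratio", n_per_group=5000))
```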