ABtesting.tools

How to choose the right statistical test for your A/B test

Picking the wrong test can invalidate your results. This guide walks you through the decision tree based on your metric type.

Step 1: What type of metric are you testing?

The very first question to ask: what kind of data does your metric produce?

  • Binary / Conversion rates: each user either converts or doesn't (clicked, purchased, signed up). Use the Conversions Calculator.
  • Continuous per-user metrics: each user has one numeric value (revenue per user, session duration, pages viewed). Use the Continuous Metrics Calculator.
  • Ratio metrics: sum(X)/sum(Y) where the denominator varies per user (AOV = revenue/orders, revenue per click). Use the Ratio Metrics Calculator.
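
To make the three metric types concrete, here is a minimal sketch that computes one of each from the same per-user records. The `users` data and its field layout are made up for illustration.

```python
# Hypothetical per-user records: (converted, revenue, orders)
users = [
    (1, 40.0, 2),
    (0, 0.0, 0),
    (1, 10.0, 1),
    (0, 0.0, 0),
]

n_users = len(users)
total_revenue = sum(revenue for _, revenue, _ in users)
total_orders = sum(orders for _, _, orders in users)

# Binary metric: share of users who converted.
conversion_rate = sum(converted for converted, _, _ in users) / n_users

# Continuous per-user metric: one value per user, averaged over users.
revenue_per_user = total_revenue / n_users

# Ratio metric: sum(X) / sum(Y); the denominator (orders) varies per user.
aov = total_revenue / total_orders

print(conversion_rate, revenue_per_user, aov)
```

Note that revenue per user divides by the number of users, while AOV divides by a quantity that itself differs from user to user. That difference is what puts AOV in the ratio category.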

Step 2: Choose a method within your metric type

For conversion rates:

  • Z-test (two proportions): the default choice. Works well when sample sizes are moderate to large (as a rule of thumb, n > 30 per group) and expected cell counts are ≥ 5.
  • Chi-square test: best when comparing more than 2 groups simultaneously or analyzing contingency tables with multiple categories.
  • Fisher's exact test: use when sample sizes are small or expected cell counts are below 5. It computes an exact p-value rather than relying on a large-sample approximation.
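
As a rough illustration of the first and last options, the sketch below implements a pooled two-proportion z-test and a two-sided Fisher's exact test with only the Python standard library. The function names and argument order are assumptions made for this example, not the calculators' actual implementation.

```python
from math import comb, sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled SE)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def fisher_exact_two_sided(conv_a, n_a, conv_b, n_b):
    """Two-sided Fisher's exact test on the 2x2 converted / not-converted table."""
    total_conv = conv_a + conv_b
    n = n_a + n_b

    def prob(k):
        # Hypergeometric probability that group A holds k of the conversions.
        return comb(n_a, k) * comb(n_b, total_conv - k) / comb(n, total_conv)

    observed = prob(conv_a)
    lo = max(0, total_conv - n_b)
    hi = min(n_a, total_conv)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))


# Large samples: 100/1000 vs 130/1000 conversions.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p:.4f}")

# Small samples: 3/4 vs 1/4 conversions.
print(f"Fisher p = {fisher_exact_two_sided(3, 4, 1, 4):.4f}")
```

For a 2x2 table the chi-square test without continuity correction gives the same p-value as the two-sided z-test, so a separate sketch is only needed when there are more than two groups.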

For continuous metrics:

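  • Welch's t-test: the go-to for comparing means. Works for normally distributed data or large samples (by the central limit theorem the sample mean is approximately normal at roughly n > 30 per group, though heavily skewed data can need more). Does not assume equal variances.
  • Mann-Whitney U test: use when data is heavily skewed, has outliers, or violates normality assumptions. It is rank-based, so it tests whether values in one group tend to be larger than in the other rather than comparing means.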
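
The two tests can be sketched with the standard library as follows. This is a simplified illustration: `welch_t` stops at the statistic and the Welch-Satterthwaite degrees of freedom (the p-value needs a t-distribution CDF, which the standard library lacks), and `mann_whitney_u` uses a normal approximation without tie or continuity corrections. Both function names are made up for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def mann_whitney_u(a, b):
    """U statistic for sample a and a two-sided normal-approximation p-value.

    Assumption: no tie correction in the variance and no continuity correction.
    """
    pooled = sorted(list(a) + list(b))
    # Average rank for each distinct value (ties share the mean of their ranks).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n_a, n_b = len(a), len(b)
    u_a = sum(rank_of[v] for v in a) - n_a * (n_a + 1) / 2
    mu = n_a * n_b / 2
    sigma = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mu) / sigma
    return u_a, 2 * (1 - NormalDist().cdf(abs(z)))


control = [1, 2, 3, 4, 5]
variant = [2, 4, 6, 8, 10]
print(welch_t(control, variant))
print(mann_whitney_u(control, variant))
```

In practice a statistics library does this in one call; for example, SciPy provides `scipy.stats.ttest_ind(a, b, equal_var=False)` for Welch's test and `scipy.stats.mannwhitneyu(a, b, alternative="two-sided")`.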

For ratio metrics:

  • Simple t-test on ratios: quick but can be biased when users contribute different numbers of events.
  • Delta method: the recommended approach. Properly handles the variance of a ratio where the denominator varies per user.
  • Bootstrap: the most flexible. Makes no distributional assumptions. Best for complex or non-standard metrics.
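
The sketch below shows one way the delta method and the bootstrap could look for a ratio metric such as AOV. It assumes `x[i]` and `y[i]` are the numerator and denominator totals for user i, that users are independent, and that the resampled denominator never sums to zero. The function names and the percentile-interval choice are assumptions made for this example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def ratio_and_delta_var(x, y):
    """Ratio sum(x)/sum(y) and its delta-method variance."""
    n = len(x)
    mx, my = mean(x), mean(y)
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - my) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    ratio = mx / my
    # First-order Taylor expansion of mean(x)/mean(y) around the true means.
    var_ratio = (var_x - 2 * ratio * cov_xy + ratio ** 2 * var_y) / (n * my ** 2)
    return ratio, var_ratio


def delta_method_z_test(xa, ya, xb, yb):
    """Two-sided z-test for ratio_b - ratio_a using delta-method variances."""
    ra, va = ratio_and_delta_var(xa, ya)
    rb, vb = ratio_and_delta_var(xb, yb)
    z = (rb - ra) / sqrt(va + vb)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def bootstrap_ratio_diff_ci(xa, ya, xb, yb, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for ratio_b - ratio_a, resampling whole users."""
    rng = random.Random(seed)

    def resampled_ratio(x, y):
        idx = [rng.randrange(len(x)) for _ in x]
        return sum(x[i] for i in idx) / sum(y[i] for i in idx)

    diffs = sorted(
        resampled_ratio(xb, yb) - resampled_ratio(xa, ya) for _ in range(n_boot)
    )
    lo = diffs[int(n_boot * alpha / 2)]
    hi = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


# Hypothetical data: revenue and order counts per user in each group.
rev_a, ord_a = [10, 25, 30, 18, 40], [1, 2, 3, 2, 3]
rev_b, ord_b = [14, 30, 33, 25, 50], [1, 2, 3, 2, 4]
print(delta_method_z_test(rev_a, ord_a, rev_b, ord_b))
print(bootstrap_ratio_diff_ci(rev_a, ord_a, rev_b, ord_b))
```

Both functions treat the user, not the order, as the unit of analysis. That is the reason a plain t-test on per-order values can mislead.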

Step 3: Consider these special cases

  • Want to peek at results early? Use Sequential Testing with spending functions to control false positive rates while allowing early stopping.
  • Testing more than one variant? Use the Multi-variant Calculator with Bonferroni or Holm-Bonferroni corrections.
  • Prefer probabilities over p-values? Use the Bayesian Calculator for posterior probability of one variant beating another.
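
Two of these special cases are short enough to sketch directly. The example below shows a Holm-Bonferroni step-down correction and a Monte Carlo estimate of the probability that variant B beats variant A. The uniform Beta(1, 1) prior, the number of draws, and the function names are assumptions chosen for illustration and are not necessarily what the calculators use.

```python
import random


def holm_bonferroni(p_values, alpha=0.05):
    """Return True for each hypothesis rejected under Holm's step-down rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with
        # alpha / (m - k + 1); testing stops at the first failure.
        if p_values[i] > alpha / (m - rank):
            break
        reject[i] = True
    return reject


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for a conversion rate: Beta(1 + successes, 1 + failures).
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += b > a
    return wins / draws


# Three variants vs control: plain Bonferroni (alpha / 3) would reject only the first.
print(holm_bonferroni([0.01, 0.02, 0.04]))

# 100/1000 vs 130/1000 conversions.
print(prob_b_beats_a(100, 1000, 130, 1000))
```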

Quick decision flowchart

Is your metric a conversion rate (yes/no)?

YES → Sample > 30 per group? → Z-test

YES → Small samples or sparse data? → Fisher's exact

YES → Multiple groups or categories? → Chi-square

Is your metric one value per user?

YES → Roughly normal or n > 30? → Welch's t-test

YES → Skewed or small samples? → Mann-Whitney U

Is your metric a ratio (sum/sum)?

YES → Users have different denominators? → Delta method

YES → Complex metric or no assumptions? → Bootstrap
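
The flowchart can also be written as a small helper function. This is only one possible reading of it: the parameter names, the order in which the branches are checked, and the use of 30 and 5 as hard cutoffs are assumptions made for this sketch.

```python
def choose_test(metric_type, n_per_group, min_expected_count=5, n_groups=2,
                heavily_skewed=False, roughly_normal=False, complex_metric=False):
    """Suggest a test; metric_type is 'conversion', 'continuous', or 'ratio'."""
    if metric_type == "conversion":
        if n_groups > 2:
            return "Chi-square test"
        if n_per_group <= 30 or min_expected_count < 5:
            return "Fisher's exact test"
        return "Z-test (two proportions)"
    if metric_type == "continuous":
        if heavily_skewed:
            return "Mann-Whitney U test"
        if roughly_normal or n_per_group > 30:
            return "Welch's t-test"
        return "Mann-Whitney U test"
    if metric_type == "ratio":
        return "Bootstrap" if complex_metric else "Delta method"
    raise ValueError(f"unknown metric type: {metric_type!r}")


print(choose_test("conversion", n_per_group=1000))
print(choose_test("continuous", n_per_group=500, heavily_skewed=True))
print(choose_test("ratio", n_per_group=500))
```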