ABtesting.tools

How to choose the right statistical test for your A/B test

Picking the wrong test can invalidate your results. This guide walks you through the decision tree based on your metric type.

Step 1: What type of metric are you testing?

The very first question to ask: what kind of data does your metric produce?

  • Binary / Conversion rates: each user either converts or doesn't (clicked, purchased, signed up). Use the Conversions Calculator.
  • Continuous per-user metrics: each user has one numeric value (revenue per user, session duration, pages viewed). Use the Continuous Metrics Calculator.
  • Ratio metrics: sum(X)/sum(Y) where the denominator varies per user (AOV = revenue/orders, revenue per click). Use the Ratio Metrics Calculator.
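
To make the distinction concrete, here is a minimal Python sketch that computes one metric of each type from the same per-user data. The record layout (`orders`, `revenue`) and the function name `summarize_metrics` are assumptions made for illustration; they are not part of any calculator mentioned here.

```python
def summarize_metrics(users):
    """Compute one example metric of each type from per-user records.

    `users` is a hypothetical list of dicts with `orders` and `revenue` keys.
    """
    n_users = len(users)
    total_orders = sum(u["orders"] for u in users)
    total_revenue = sum(u["revenue"] for u in users)
    return {
        # Binary: did the user place at least one order?
        "conversion_rate": sum(1 for u in users if u["orders"] > 0) / n_users,
        # Continuous: one numeric value per user, averaged over users.
        "revenue_per_user": total_revenue / n_users,
        # Ratio: sum(X) / sum(Y); the denominator (orders) varies per user.
        "aov": total_revenue / total_orders,
    }


users = [
    {"orders": 0, "revenue": 0.0},
    {"orders": 1, "revenue": 20.0},
    {"orders": 3, "revenue": 40.0},
]
metrics = summarize_metrics(users)
```

In this toy data AOV is 60 / 4 = 15, while the average of each buyer's own revenue/orders ratio (20 and about 13.33) is about 16.67. That gap is why ratio metrics are treated separately from per-user metrics.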

Step 2: Choose a method within your metric type

For conversion rates:

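  • Z-test (two proportions): the default choice. Works well when sample sizes are moderate to large (n > 30 per group as a bare minimum) and expected cell counts are ≥ 5. With low conversion rates, the cell-count condition is the one that matters and usually requires far more than 30 users per group.
  • Chi-square test: best when comparing more than 2 groups simultaneously or analyzing contingency tables with multiple categories. For exactly two groups and no continuity correction, it is equivalent to the two-sided z-test.
  • Fisher's exact test: use when sample sizes are small or expected cell counts are below 5. Exact rather than approximate.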
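
The z-test and Fisher's exact test above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions (a pooled standard error for the z-test, and the "tables as or less likely than observed" definition of the two-sided Fisher p-value); the function names are hypothetical and the calculators may differ in detail.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for two conversion rates, using a pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def min_expected_count(conv_a, n_a, conv_b, n_b):
    """Smallest expected cell count of the 2x2 table under the null hypothesis."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    return min(n * q for n in (n_a, n_b) for q in (pooled, 1 - pooled))


def fisher_exact_two_sided(conv_a, n_a, conv_b, n_b):
    """Two-sided Fisher's exact test for a 2x2 table with fixed margins."""
    total_conv = conv_a + conv_b
    denom = math.comb(n_a + n_b, total_conv)

    def prob(k):
        # Hypergeometric probability that group A has k conversions.
        return math.comb(n_a, k) * math.comb(n_b, total_conv - k) / denom

    p_obs = prob(conv_a)
    lo, hi = max(0, total_conv - n_b), min(n_a, total_conv)
    # Sum every table that is as likely as, or less likely than, the observed one.
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


z, p_z = two_proportion_z_test(100, 1000, 130, 1000)
p_fisher = fisher_exact_two_sided(3, 4, 1, 4)
```

A reasonable usage pattern is to call `min_expected_count` first and fall back to `fisher_exact_two_sided` when it returns less than 5.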

For continuous metrics:

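  • Welch's t-test: the go-to for comparing means. Works for normally distributed data or large samples (the usual CLT rule of thumb is n > 30 per group, though heavily skewed metrics such as revenue can need considerably more). Does not assume equal variances.
  • Mann-Whitney U test: use when data is heavily skewed, has outliers, or violates normality assumptions. It is rank-based, so it tests whether values in one group tend to be larger than in the other rather than comparing means.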
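
As a rough illustration of what these two tests compute, the sketch below derives the Welch t statistic with its Welch-Satterthwaite degrees of freedom, and the Mann-Whitney U statistic with a normal-approximation p-value. The function names are hypothetical, and the U-test p-value here deliberately omits tie and continuity corrections, so treat it as an assumption-laden approximation for reasonably large samples.

```python
import math
from statistics import NormalDist, fmean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (b minus a)."""
    se2_a = variance(a) / len(a)  # squared standard error of each group mean
    se2_b = variance(b) / len(b)
    t = (fmean(b) - fmean(a)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def mann_whitney_u(a, b):
    """U statistic for b versus a, with a two-sided normal-approximation p-value.

    Assumption: no tie or continuity correction is applied.
    """
    # Count pairs where the b value beats the a value; ties count as half.
    u_b = sum((y > x) + 0.5 * (y == x) for x in a for y in b)
    n_a, n_b = len(a), len(b)
    mu = n_a * n_b / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_b - mu) / sigma
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_b, p_value


t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
u_stat, p_u = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

Turning `t_stat` and `df` into a p-value requires the t distribution's CDF, which the Python standard library does not provide. In practice, library routines such as SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.mannwhitneyu(a, b)` cover both tests.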

For ratio metrics:

  • Simple t-test on ratios: quick but can be biased when users contribute different numbers of events.
  • Delta method: the recommended approach. Properly handles the variance of a ratio where the denominator varies per user.
  • Bootstrap: the most flexible. Makes no distributional assumptions. Best for complex or non-standard metrics.
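
The delta method and the bootstrap can both be sketched briefly. The code below assumes each user contributes one (numerator, denominator) pair, uses the standard first-order delta-method variance for a ratio of means, and builds a percentile bootstrap interval by resampling users. All function names and defaults (`n_boot`, `seed`) are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean


def ratio_and_delta_var(x, y):
    """Ratio sum(x)/sum(y) and its delta-method variance; x[i], y[i] belong to user i."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    vx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    vy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    cxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    r = mx / my
    # First-order Taylor expansion of mean(x)/mean(y) around the true means.
    var_r = (vx - 2 * r * cxy + r * r * vy) / (n * my * my)
    return r, var_r


def delta_method_z_test(xa, ya, xb, yb):
    """Two-sided z-test on the difference of two ratio metrics."""
    r_a, var_a = ratio_and_delta_var(xa, ya)
    r_b, var_b = ratio_and_delta_var(xb, yb)
    z = (r_b - r_a) / math.sqrt(var_a + var_b)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def bootstrap_ratio_diff_ci(xa, ya, xb, yb, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ratio_b - ratio_a, resampling whole users.

    Assumption: every resample has a non-zero denominator sum.
    """
    rng = random.Random(seed)

    def resampled_ratio(x, y):
        idx = [rng.randrange(len(x)) for _ in x]  # users drawn with replacement
        return sum(x[i] for i in idx) / sum(y[i] for i in idx)

    diffs = sorted(
        resampled_ratio(xb, yb) - resampled_ratio(xa, ya) for _ in range(n_boot)
    )
    return diffs[int(n_boot * alpha / 2)], diffs[int(n_boot * (1 - alpha / 2)) - 1]


r, var_r = ratio_and_delta_var([10, 20, 30], [1, 2, 3])
z, p = delta_method_z_test([10, 22, 29, 41], [1, 2, 3, 4], [13, 25, 40, 47], [1, 2, 3, 4])
ci = bootstrap_ratio_diff_ci([10, 20, 30], [1, 2, 3], [12, 24, 36], [1, 2, 3])
```

Resampling users rather than individual events keeps each user's numerator and denominator together, which is what preserves the correlation that the delta method accounts for through the covariance term.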

Step 3: Consider these special cases

  • Want to peek at results early? Use Sequential Testing with spending functions to control false positive rates while allowing early stopping.
  • Testing more than one variant? Use the Multi-variant Calculator with Bonferroni or Holm-Bonferroni corrections.
  • Prefer probabilities over p-values? Use the Bayesian Calculator for posterior probability of one variant beating another.
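
Each of these special cases has a compact core idea, sketched below in Python. The specifics are assumptions chosen for illustration: the spending function shown is the Lan-DeMets O'Brien-Fleming-type form (one common choice among several), and the Bayesian estimate uses a uniform Beta(1, 1) prior with Monte Carlo sampling. None of this is confirmed to match how the calculators named above work internally.

```python
import math
import random
from statistics import NormalDist


def obrien_fleming_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative alpha spent once a fraction (0, 1] of the planned sample is in."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(info_fraction)))


def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    return [p <= alpha / len(p_values) for p in p_values]


def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm procedure; returns reject flags in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # this p-value and all larger ones are not rejected
        reject[i] = True
    return reject


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_b > rate_a) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


spent_halfway = obrien_fleming_alpha_spent(0.5)
holm_flags = holm_bonferroni([0.01, 0.02, 0.04])
bonf_flags = bonferroni([0.01, 0.02, 0.04])
p_win = prob_b_beats_a(100, 1000, 200, 1000)
```

With p-values of 0.01, 0.02 and 0.04, plain Bonferroni rejects only the first hypothesis while Holm rejects all three, which illustrates why Holm is never less powerful at the same family-wise error rate. The spending function returns cumulative alpha, not a stopping boundary; converting it to per-look thresholds takes an additional numerical step that is not shown here.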

Quick decision flowchart

Is your metric a conversion rate (yes/no)?

YES → Sample > 30 per group? → Z-test

YES → Small samples or sparse data? → Fisher's exact

YES → Multiple groups or categories? → Chi-square

Is your metric one value per user?

YES → Roughly normal or n > 30? → Welch's t-test

YES → Skewed or small samples? → Mann-Whitney U

Is your metric a ratio (sum/sum)?

YES → Users have different denominators? → Delta method

YES → Complex metric or no assumptions? → Bootstrap
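
The flowchart can also be written as a small lookup function. The sketch below is one possible encoding; the function name, parameter names, and the order in which overlapping conditions are checked (for example, skew taking priority over sample size) are assumptions, since the flowchart itself does not say how to break ties.

```python
def choose_test(metric, n_per_group=0, min_expected_count=0.0, groups=2,
                roughly_normal=False, skewed=False, complex_metric=False):
    """Return the suggested test name for a metric type: conversion, continuous, or ratio."""
    if metric == "conversion":
        if groups > 2:
            return "Chi-square"
        if n_per_group > 30 and min_expected_count >= 5:
            return "Z-test"
        return "Fisher's exact"  # small samples or sparse data
    if metric == "continuous":
        if skewed:
            return "Mann-Whitney U"  # assumption: skew is checked before sample size
        if roughly_normal or n_per_group > 30:
            return "Welch's t-test"
        return "Mann-Whitney U"  # small sample with unknown shape
    if metric == "ratio":
        return "Bootstrap" if complex_metric else "Delta method"
    raise ValueError(f"unknown metric type: {metric!r}")


suggestion = choose_test("conversion", n_per_group=5000, min_expected_count=240.0)
```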