ABtesting.tools

Understanding A/B test sample size

Running a test with too few visitors wastes time and produces unreliable results. Too many, and you delay decisions. Here is how to find the sweet spot.

Why sample size matters

A test with insufficient sample size is underpowered: it will often fail to detect real effects. This leads to inconclusive results and wasted experiment slots. On the other hand, massively oversized tests delay shipping winning variants.

The four inputs that determine sample size

  1. Baseline rate: your current conversion rate or metric value. For a given relative MDE, lower baselines require more samples because the absolute difference to detect shrinks faster than the noise.
  2. Minimum Detectable Effect (MDE): the smallest improvement worth detecting. Smaller effects need larger samples. Be realistic: a 1% relative lift on a 2% baseline is extremely hard to detect.
  3. Significance level (α): the false positive rate, typically 5% (95% confidence). Lowering α to 1% requires substantially more data (roughly 50% more for a two-sided test at 80% power).
  4. Statistical power (1-β): the probability of detecting a real effect at least as large as the MDE. 80% is the standard minimum. Increasing to 90% adds roughly 30% more required samples.
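
To make the relationship between the four inputs concrete, here is a minimal sketch of the usual normal-approximation formula for a two-sided test comparing two conversion rates. The function name and its parameters are illustrative assumptions, and the Sample Size Calculator on this site may use a different method, so treat the output as a ballpark figure.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors per variant for a two-sided two-proportion z-test.

    Hypothetical helper for illustration; uses the normal approximation
    with unpooled variances.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)             # rate the variant must reach
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # variance of the rate difference
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)


# Example: 3% baseline, 10% relative MDE, 5% significance, 80% power.
n = sample_size_per_variant(baseline=0.03, relative_mde=0.10)
print(f"visitors needed per variant: {n:,}")
```

Each input maps to one term: the baseline and MDE set the two rates, α and power set the two z-values, and the squared difference in the denominator is why small effects are so expensive.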

Common mistakes

  • Stopping early: peeking at results and stopping when p < 0.05 inflates false positive rates dramatically. Use sequential testing if you need to peek.
  • Using post-hoc power: computing power from the observed effect after the test is done tells you nothing useful. Always calculate sample size before launching.
  • Ignoring traffic splits: if only 50% of traffic enters the test, you need twice as many total visitors to reach the same sample size. Use the Duration Calculator to account for this.
  • Unrealistic MDE: hoping to detect a 1% relative change on a 3% baseline would require millions of visitors per variant. Align MDE with your business impact threshold.
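
The cost of stopping early can be illustrated with a small simulation. The sketch below runs A/A tests (both arms share the same true rate, so every "significant" result is a false positive) and compares stopping at the first look with p < 0.05 against evaluating only once at the end. All names and parameter values (number of runs, looks, batch size, rate) are assumptions chosen for illustration, and the exact percentages depend on them.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_statistic(conv_a, conv_b, n):
    """Pooled two-proportion z-statistic for two arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def simulate_aa_tests(runs=1000, looks=5, batch=200, rate=0.10, alpha=0.05, seed=1):
    """Return (false positive rate with peeking, false positive rate at final look only)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        significant_at_some_look = False
        significant = False
        for look in range(1, looks + 1):
            # Add one batch of visitors per arm, then test the cumulative data.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            significant = abs(z_statistic(conv_a, conv_b, look * batch)) > z_crit
            significant_at_some_look = significant_at_some_look or significant
        peeking_hits += significant_at_some_look  # would have stopped at some look
        final_hits += significant                 # result of the last look only
    return peeking_hits / runs, final_hits / runs


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"stop at first p < 0.05: {peeking_rate:.1%}")
print(f"single test at the end: {fixed_rate:.1%}")
```

The peeking rate can never be lower than the fixed-horizon rate, because the final look is one of the looks. Sequential testing methods correct for this by using stricter thresholds at each look.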

Rules of thumb

  • Halving your MDE roughly quadruples the required sample size.
  • Lower baseline rates need more visitors: a 1% baseline needs roughly 100x more than a 50% baseline for the same relative MDE.
  • Going from 80% to 90% power increases sample by about 30%.
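  • Adding more variants increases total required traffic at least linearly: each extra variant needs its own full sample, and correcting for multiple comparisons raises the per-variant sample further.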
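
These rules can be checked numerically. The sketch below uses the same normal-approximation formula for a two-sided two-proportion test; the helper name, the 5% baseline, and the 10% relative MDE are assumptions picked for illustration, and the total-traffic line applies no multiple-comparison correction.

```python
from statistics import NormalDist


def n_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Normal-approximation sample size per variant (hypothetical helper)."""
    p1, p2 = baseline, baseline * (1 + relative_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2


base = n_per_variant(0.05, 0.10)

# Halving the MDE (10% -> 5% relative) at the same baseline.
mde_ratio = n_per_variant(0.05, 0.05) / base

# Raising power from 80% to 90%.
power_ratio = n_per_variant(0.05, 0.10, power=0.90) / base

# 1% baseline versus 50% baseline at the same relative MDE.
baseline_ratio = n_per_variant(0.01, 0.10) / n_per_variant(0.50, 0.10)

print(f"halving MDE multiplies sample size by {mde_ratio:.2f}")
print(f"80% -> 90% power multiplies sample size by {power_ratio:.2f}")
print(f"1% vs 50% baseline multiplies sample size by {baseline_ratio:.0f}")

# Total traffic grows with the number of arms (control plus variants).
for arms in (2, 3, 4):
    print(f"{arms} arms: about {arms * base:,.0f} visitors in total")
```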

Calculate your sample size

Use the Sample Size Calculator to get an exact number for your specific baseline, MDE, and power requirements.