What is the multiple comparisons problem?

Testing many variants increases the risk of a false positive. With 5 independent comparisons, each at 95% confidence, the chance of at least one false positive is about 23% (1 - 0.95^5 ≈ 0.226).

What is the difference between Bonferroni and Holm corrections?

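Bonferroni divides the significance level by the number of comparisons and applies that threshold to every test. Holm applies it only to the smallest p-value and relaxes it step by step for the rest, so Holm is never less powerful than Bonferroni while still controlling the family-wise error rate.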

How many variants should I test at once?

Testing 3-5 variants is common. Each added variant increases the traffic you need and lengthens the test.

A/B/n multi-variant test calculator

Which of my multiple variants performs best while controlling for multiple comparisons?

Compare three or more variants simultaneously with proper correction for multiple comparisons. Add up to 5 variants and get pairwise significance tests.

How to use this calculator

Enter visitors and conversions for the control and each variant. Click Add Variant to add up to 5 variants. Select a correction method: Bonferroni (simpler and more conservative) or Holm (less conservative). Both methods control the family-wise error rate. The calculator runs all pairwise comparisons and shows which differences are significant after correction.
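The exact test statistic the calculator uses is not stated here. As a minimal sketch of the pairwise step, assuming a pooled two-proportion z-test with two-sided p-values (the function name and the visitor and conversion counts below are hypothetical):

```python
from itertools import combinations
from math import erfc, sqrt


def two_proportion_p_value(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-sided p-value from a pooled two-proportion z-test (assumed method)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (rate_b - rate_a) / std_err
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


# Hypothetical inputs: arm name -> (visitors, conversions)
arms = {
    "control": (5000, 500),
    "variant_1": (5000, 560),
    "variant_2": (5000, 505),
}

# One uncorrected p-value per pair of arms.
pairwise_p = {
    (a, b): two_proportion_p_value(*arms[a], *arms[b])
    for a, b in combinations(arms, 2)
}

for pair, p in pairwise_p.items():
    print(pair, round(p, 4))
```

These p-values are uncorrected. The selected correction method is applied to them afterwards.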

Why multiple comparison corrections matter

When you test multiple variants against a control, each comparison has its own chance of producing a false positive. With 5 independent comparisons at 95% confidence, the probability of at least one false positive rises to about 23% (1 - 0.95^5 ≈ 0.226). Multiple comparison corrections tighten the significance threshold so that the overall (family-wise) false positive rate stays at or below 5%. Bonferroni compares every p-value against alpha divided by the number of comparisons, which is simple but conservative. Holm's step-down method sorts the p-values from smallest to largest and compares them against alpha/m, alpha/(m-1), and so on, stopping at the first one that fails. It is less conservative while still controlling the family-wise error rate.
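A short sketch of the arithmetic above. The function names and the list of p-values are illustrative assumptions, not the calculator's internals, and the 23% figure assumes the comparisons are independent:

```python
def family_wise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: test sorted p-values against alpha/m, alpha/(m-1), ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained too
    return reject


print(round(family_wise_error_rate(0.05, 5), 4))  # 0.2262

# Hypothetical uncorrected p-values from 5 comparisons.
p_values = [0.004, 0.012, 0.020, 0.030, 0.300]
print(bonferroni_reject(p_values))  # [True, False, False, False, False]
print(holm_reject(p_values))        # [True, True, False, False, False]
```

In this example Bonferroni holds every test to 0.01 and flags one difference. Holm flags a second one because 0.012 is below its second-step threshold of 0.05/4 = 0.0125.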

When to use multi-variant testing

Use A/B/n testing when you have multiple ideas to test simultaneously and want to find the best variant efficiently. This is common in design experiments (testing 3-4 layouts), headline testing, or pricing experiments. However, be aware that adding more variants increases the required sample size. If you only have enough traffic for two variants, run an A/B test instead.
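To see how the required sample size grows with the number of variants, here is a rough sketch. It assumes the standard normal-approximation formula for comparing two proportions, a Bonferroni-adjusted alpha, one comparison per variant against the control, and hypothetical rates (10% baseline, 12% expected). The calculator itself may use a different method:

```python
from math import ceil
from statistics import NormalDist


def visitors_per_arm(baseline_rate, expected_rate, comparisons,
                     alpha=0.05, power=0.80):
    """Approximate visitors needed in each arm for one two-sided comparison."""
    adjusted_alpha = alpha / comparisons  # Bonferroni adjustment (assumed)
    z_alpha = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


for variants in (1, 2, 3, 4, 5):
    per_arm = visitors_per_arm(0.10, 0.12, comparisons=variants)
    total = per_arm * (variants + 1)  # every variant plus the control
    print(f"{variants} variant(s): {per_arm} per arm, {total} in total")
```

Total traffic grows for two reasons: there are more arms to fill, and each arm needs more visitors because the corrected threshold is stricter.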

Common mistakes in multi-variant testing

The biggest mistake is running multiple comparisons without any correction, which sharply inflates the false positive rate. Another mistake is adding too many variants and splitting traffic too thin, which leaves every comparison underpowered. Also avoid changing variants mid-test or removing underperforming variants early, because unplanned changes like these invalidate the statistical analysis. Plan your variants and sample size before starting.