What is statistical power in A/B testing?

Statistical power is the probability that your test will detect a real effect if one exists.

What power level should I aim for?

80% is the widely accepted minimum. Use 90% or higher for high-stakes tests.

Should I do power analysis before or after my test?

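Before. Post-hoc power analysis using the observed effect size is uninformative, because it only restates the p-value. If you check power after a test, use the minimum effect size you chose in advance.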

Statistical power calculator for A/B tests

Does my test have enough statistical power to detect the expected effect?

Determine the statistical power of your A/B test. Find out what effect sizes you can reliably detect with your current sample size and traffic.

How to use this calculator

Enter your sample size per variant, baseline conversion rate, and the minimum detectable effect you want to evaluate. The calculator shows the statistical power — the probability that your test will correctly detect a real effect of that size. Adjust the MDE to see the power curve, which shows how power changes across different effect sizes.
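As a rough illustration of what a power curve contains, the sketch below computes power over a range of effect sizes using the normal approximation for a two-sided two-proportion z-test. The function name `ab_test_power`, the choice to express the MDE as a relative lift, and the example inputs are assumptions made for illustration. The calculator's exact method is not stated here.

```python
from math import sqrt
from statistics import NormalDist


def ab_test_power(n_per_variant, baseline_rate, relative_mde, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Assumes equal sample sizes per variant and a relative MDE,
    e.g. 0.10 means a 10% lift over the baseline rate.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)

    # Standard error under the null (pooled rate) and under the alternative.
    p_pool = (p1 + p2) / 2
    se_null = sqrt(2 * p_pool * (1 - p_pool) / n_per_variant)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)

    diff = abs(p2 - p1)
    # Probability of landing in either rejection region when the effect is real.
    upper_tail = 1 - z.cdf((z_crit * se_null - diff) / se_alt)
    lower_tail = z.cdf((-z_crit * se_null - diff) / se_alt)
    return upper_tail + lower_tail


# Hypothetical inputs: 10,000 users per variant, 5% baseline conversion rate.
n = 10_000
baseline = 0.05
for mde in (0.05, 0.10, 0.15, 0.20, 0.25):
    print(f"{mde:.0%} relative lift -> power {ab_test_power(n, baseline, mde):.2f}")
```

Power rises with the effect size, so the printed values trace the shape of the curve: small lifts are hard to detect at a fixed sample size, large lifts are easy.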

Understanding statistical power

Statistical power is the probability that a test correctly rejects the null hypothesis when a real effect exists. It is calculated as 1 minus the Type II error rate (beta). Power depends on four factors: sample size, effect size, significance level (alpha), and the baseline conversion rate. Higher power means fewer false negatives, so you are less likely to miss a real improvement. The standard minimum is 80%, meaning you have an 80% chance of detecting a true effect of the size you specified.
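One common way to write this down is the normal approximation for a two-sided two-proportion z-test. The notation is an assumption introduced here for illustration: p_1 is the baseline conversion rate, p_2 the rate under the effect you want to detect, n the sample size per variant, Phi the standard normal CDF, and z_{1-alpha/2} the critical value for the chosen significance level. The calculator may use a different method.

```latex
\begin{aligned}
\delta &= |p_2 - p_1|, \qquad \bar{p} = \tfrac{1}{2}(p_1 + p_2), \\
\mathrm{SE}_0 &= \sqrt{\frac{2\,\bar{p}\,(1-\bar{p})}{n}}, \qquad
\mathrm{SE}_1 = \sqrt{\frac{p_1(1-p_1) + p_2(1-p_2)}{n}}, \\
1 - \beta &\approx
\Phi\!\left(\frac{\delta - z_{1-\alpha/2}\,\mathrm{SE}_0}{\mathrm{SE}_1}\right)
+ \Phi\!\left(\frac{-\delta - z_{1-\alpha/2}\,\mathrm{SE}_0}{\mathrm{SE}_1}\right)
\end{aligned}
```

All four factors appear explicitly: n shrinks both standard errors, delta is the effect size, alpha sets the critical value, and the baseline rate p_1 determines the variance terms. The second term is the chance of rejecting in the wrong direction and is usually negligible.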

When to use this calculator

Use power analysis before running a test to verify your sample size is adequate. You can also use it after a test shows no significant result, to check whether you had enough power to detect the minimum effect you cared about (not the effect you happened to observe). A non-significant result from an underpowered test does not mean there is no effect. It means the test could not reliably detect an effect of that size. This distinction is critical for correct interpretation.
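To make the pre-test check concrete, the sketch below estimates the sample size per variant needed to reach a target power, using the standard normal-approximation formula for a two-sided two-proportion z-test. The function name `sample_size_per_variant`, the relative-lift convention for the MDE, and the example inputs are assumptions for illustration only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, power=0.80, alpha=0.05):
    """Approximate users needed per variant for a two-sided two-proportion z-test.

    Assumes equal allocation and a relative MDE (0.10 = 10% lift over baseline).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile matching the target power

    p_pool = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_pool * (1 - p_pool))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical scenario: 10% baseline conversion rate, 10% relative lift.
needed_80 = sample_size_per_variant(0.10, 0.10, power=0.80)
needed_90 = sample_size_per_variant(0.10, 0.10, power=0.90)
print(f"80% power: {needed_80} per variant")  # roughly 14,750
print(f"90% power: {needed_90} per variant")
```

If the traffic you can realistically collect falls well short of the 80% figure, the test is underpowered for that effect size, and a non-significant result would say little about whether the effect exists.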

Common mistakes in power analysis

The most common mistake is running underpowered tests (below 80% power) and then concluding there is no effect when the result is not significant. Another mistake is post-hoc power analysis: computing power after the test from the observed effect size. This is circular reasoning, because power computed this way is determined entirely by the p-value and adds no new information. Power should be calculated before the test using the minimum effect size you care about, not the effect you actually observed.
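The circularity of post-hoc power can be seen in a short sketch. For a two-sided z-test, power computed from the observed effect depends only on the p-value, so it cannot tell you anything the p-value did not. The function name `observed_power` is hypothetical, and the normal approximation is an assumption.

```python
from statistics import NormalDist


def observed_power(p_value, alpha=0.05):
    """Post-hoc power of a two-sided z-test, computed from the p-value alone.

    Treats the observed z statistic as if it were the true effect,
    which is exactly what post-hoc power analysis does.
    """
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p_value / 2)   # |z| implied by the two-sided p-value
    z_crit = z.inv_cdf(1 - alpha / 2)
    return (1 - z.cdf(z_crit - z_obs)) + z.cdf(-z_crit - z_obs)


for p in (0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.2f} -> observed power {observed_power(p):.2f}")
```

A result that sits exactly at p = 0.05 always has observed power of about 50%, and any non-significant result has observed power below that. Low post-hoc power after a non-significant test is therefore guaranteed, not a finding.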