Continuous metrics A/B test calculator
Question: Is there a significant difference in my per-user continuous metrics?
Analyze A/B test results for continuous per-user metrics like revenue per visitor, session duration, pages per session, and more. Choose between Welch's t-test (enter sample size, mean, and standard deviation) or Mann-Whitney U test (paste raw data) depending on your data and distribution assumptions.
How to use this calculator
For the t-test method, enter the sample size, mean, and standard deviation for each group; most analytics platforms report these values. For revenue data, the mean is your average order value or revenue per visitor, and the standard deviation measures how much individual values vary around that mean. Welch's t-test does not assume equal variances between groups, which makes it robust for real-world data. For Mann-Whitney U, paste the raw per-user values for each group.
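If your analytics platform only exports raw per-user values, the three t-test inputs can be derived from them. The snippet below is a minimal sketch using Python's standard library; the function name `summarize` and the sample values are made up for illustration, and it assumes the sample standard deviation (n - 1 denominator) is the one wanted.

```python
import statistics

def summarize(values):
    """Return (sample size, mean, sample standard deviation) for one group."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator (assumed)
    return n, mean, sd

# Hypothetical revenue-per-visitor values for one group; zeros are non-buyers.
control = [0.0, 0.0, 12.5, 0.0, 40.0, 0.0, 7.5]
n, mean, sd = summarize(control)
print(n, round(mean, 2), round(sd, 2))
```

Keep the non-purchasing visitors (the zeros) in the list when the metric is revenue per visitor; dropping them turns the mean into average order value instead.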
Welch's t-test and Mann-Whitney U
Unlike conversion rate tests (which use the z-test for proportions), continuous metrics like revenue per visitor, AOV, and session duration call for a comparison of means. Welch's t-test computes the t-statistic as the difference in means divided by the standard error of the difference, and uses the Welch-Satterthwaite equation for the degrees of freedom. The Mann-Whitney U test is a non-parametric alternative that works on ranks and does not assume normality, which makes it more robust for heavily skewed distributions. Note that it tests whether values in one group tend to be larger than in the other, not whether the means differ.
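The two statistics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not necessarily how this calculator computes its results: the function names and example numbers are hypothetical, the t-statistic is taken as variant minus control, and the U statistic is computed by direct pair counting (ties count one half), which is simple but slow for large samples. Converting either statistic to a p-value needs a t or normal distribution function and is left out here.

```python
import math

def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1          # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2          # squared standard error of group 2 mean
    se = math.sqrt(v1 + v2)     # standard error of the difference in means
    t = (mean2 - mean1) / se    # assumed direction: group 2 minus group 1
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def mann_whitney_u(a, b):
    """U statistics for both groups by pair counting; ties count 0.5."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# Hypothetical summary statistics: control vs. variant revenue per visitor.
t, df = welch_t(1000, 10.0, 20.0, 1000, 12.0, 20.0)
u_a, u_b = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

With equal sample sizes and equal standard deviations, as in the example call, the Welch-Satterthwaite degrees of freedom reduce to n1 + n2 - 2; they shrink as the variances or sample sizes become more unequal.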
When to use this calculator
Use this calculator when your metric of interest is a continuous value rather than a binary conversion. Common examples include revenue per visitor, average order value, session duration, pages per session, time on page, and customer satisfaction scores. These tests fit any metric where each user contributes a numeric value rather than just a yes/no outcome.
Common mistakes with continuous metric testing
Revenue and other continuous data is typically right-skewed with a few very high-value observations. This makes standard deviation critical: a large standard deviation requires much larger samples for significance. Common mistakes include using a z-test for proportions on continuous data (wrong test), ignoring the impact of outliers, not accounting for the high variance inherent in revenue metrics, and comparing totals instead of per-visitor metrics.
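To see how strongly the standard deviation drives the required sample, the rough sketch below uses the common normal-approximation formula for a two-sided, two-sample comparison of means. It assumes equal group sizes and the same standard deviation in both groups; the function name `n_per_group`, the default alpha and power, and the example numbers are illustrative choices, not values taken from this calculator.

```python
import math
from statistics import NormalDist

def n_per_group(sd, min_detectable_diff, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect a given difference in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / min_detectable_diff ** 2
    return math.ceil(n)

# Hypothetical case: detect a 1.00 lift in revenue per visitor.
n_sd20 = n_per_group(sd=20.0, min_detectable_diff=1.0)
n_sd40 = n_per_group(sd=40.0, min_detectable_diff=1.0)
print(n_sd20, n_sd40)
```

Because the standard deviation enters squared, doubling it roughly quadruples the required sample per group, which is why a handful of very large orders can make a revenue test take far longer than a conversion test.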