When should I use Welch's t-test vs Mann-Whitney?

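Use Welch's t-test when the data are roughly normally distributed or the samples are large enough for the sample means to be approximately normal. Use the Mann-Whitney U test for heavily skewed data or data with outliers, keeping in mind that it compares ranks rather than means.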

Why Welch's t-test instead of Student's?

Welch's t-test does not assume equal variances between groups, which makes it more robust for A/B tests, where the variances of the two groups often differ.

How do I prepare continuous metric data?

Compute one value per user (e.g., revenue per visitor, session duration). For the t-test, you need the mean, standard deviation, and sample size for each group.
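As a rough illustration of that preparation step, the sketch below collapses event-level rows into one value per user and then derives the three t-test inputs per group. The `events` rows, the user IDs, and the group labels are made up for the example; your own export will look different.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical event-level rows: (user_id, group, revenue of one order).
# Users who bought nothing still need a row with 0.0, otherwise
# revenue per visitor is overstated.
events = [
    ("u1", "A", 20.0), ("u1", "A", 10.0),
    ("u2", "A", 0.0),
    ("u3", "A", 45.0),
    ("u4", "B", 25.0),
    ("u5", "B", 5.0), ("u5", "B", 15.0),
    ("u6", "B", 60.0),
]

# Step 1: collapse events to one value per user (total revenue per user).
per_user = defaultdict(float)
group_of = {}
for user_id, group, revenue in events:
    per_user[user_id] += revenue
    group_of[user_id] = group

# Step 2: collect the per-user values by group.
values = defaultdict(list)
for user_id, total in per_user.items():
    values[group_of[user_id]].append(total)

# Step 3: sample size, mean, and sample standard deviation per group.
summary = {
    g: {"n": len(v), "mean": mean(v), "sd": stdev(v)}
    for g, v in values.items()
}

for g in sorted(summary):
    print(g, summary[g])
```

Note that `stdev` is the sample standard deviation (n - 1 in the denominator), which is the form the t-test expects.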

Continuous metrics A/B test calculator

Is there a significant difference in my per-user continuous metrics?

Analyze A/B test results for continuous per-user metrics like revenue per visitor, session duration, pages per session, and more. Choose between Welch's t-test (enter sample size, mean, and standard deviation) or Mann-Whitney U test (paste raw data) depending on your data and distribution assumptions.

When to use: For metrics where each user provides exactly one value — e.g., total revenue per user, session duration, pages viewed. If your metric is a ratio like Revenue/Orders where the denominator varies per user, use the Ratio Metrics calculator instead.

How to use this calculator

For the t-test method, enter the sample size, mean, and standard deviation for each group. For Mann-Whitney U, paste your raw data values. For revenue data, the mean is your average order value or revenue per visitor, and the standard deviation measures how much individual values vary. Most analytics platforms report these values. Welch's t-test does not assume equal variances between groups, making it robust for real-world data.
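To make the t-test inputs concrete, here is a minimal sketch of how a Welch's t-statistic and its degrees of freedom can be computed from the sample size, mean, and standard deviation of each group. The function name `welch_t` and the input numbers are illustrative assumptions, not this calculator's actual code.

```python
from math import sqrt

def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t, df) for Welch's t-test from per-group summary statistics."""
    # Squared standard error contributed by each group: s^2 / n.
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    # t = difference in means / standard error of the difference.
    t = (mean2 - mean1) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Made-up inputs: control (group 1) vs. variant (group 2) revenue per visitor.
t, df = welch_t(n1=1000, mean1=10.0, sd1=20.0, n2=1000, mean2=12.0, sd2=30.0)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The two-sided p-value is the tail probability of a Student t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` runs the same test directly from summary statistics.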

Welch's t-test and Mann-Whitney U

Unlike conversion rates (which are tested with the z-test for proportions), continuous metrics like revenue per visitor, AOV, and session duration are usually compared by their means. Welch's t-test computes the t-statistic as the difference in means divided by the standard error of the difference, and uses the Welch-Satterthwaite equation for the degrees of freedom. The Mann-Whitney U test is a non-parametric alternative that does not assume normality. It works on the ranks of the pooled observations rather than their raw values, which makes it more robust for heavily skewed distributions.
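To show what working on ranks means in practice, the sketch below computes the Mann-Whitney U statistic from raw values, giving tied values the average of their ranks. The p-value uses the normal approximation with a tie correction and no continuity correction, which is an assumption suited to larger samples. The function name and the sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value via normal approximation)."""
    n1, n2 = len(a), len(b)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of equal values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # The run occupies ranks i+1 .. j; ties share the average rank.
        avg_rank = (i + 1 + j) / 2
        count_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += avg_rank * count_a
        run = j - i
        tie_term += run ** 3 - run
        i = j
    # U counts the (a, b) pairs in which a exceeds b (ties count half).
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    n = n1 + n2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # every value identical: no evidence of a difference
        return u_a, 1.0
    z = (u_a - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p

# Made-up per-user revenue with one large outlier in the control group.
control = [0, 0, 5, 12, 20, 250]
variant = [0, 8, 15, 22, 30, 40]
u, p = mann_whitney_u(control, variant)
print(f"U = {u}, p = {p:.3f}")
```

Because only the ordering of the values matters, changing the 250 outlier to 2,500 leaves U unchanged, while the same change would move the mean and standard deviation used by the t-test considerably.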

When to use this calculator

Use this calculator when your metric of interest is a continuous value rather than a binary conversion. Common examples include revenue per visitor, average order value, session duration, pages per session, time on page, and customer satisfaction scores. This is the correct test for any metric where each user contributes a numeric value rather than just a yes/no outcome.

Common mistakes with continuous metric testing

Revenue and other continuous data are typically right-skewed, with a few very high-value observations. This makes the standard deviation critical: for a given difference in means, the sample size needed to reach significance grows with the square of the standard deviation. Common mistakes include using a z-test for proportions on continuous data (wrong test), ignoring the impact of outliers, not accounting for the high variance inherent in revenue metrics, and comparing totals instead of per-visitor metrics.
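To illustrate how strongly the standard deviation drives sample size, here is a back-of-the-envelope sketch using the standard normal-approximation formula for comparing two means. It assumes equal group sizes, the same standard deviation in both groups, and a two-sided test. The function name and the numbers are illustrative only.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd, min_diff, alpha=0.05, power=0.80):
    """Approximate users needed per group to detect a difference of min_diff."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    return ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / min_diff ** 2)

# Same target difference (1.0 currency unit), increasing standard deviation.
for sd in (10.0, 20.0, 40.0):
    print(f"sd = {sd:>4}: about {n_per_group(sd, min_diff=1.0)} users per group")
```

Each doubling of the standard deviation roughly quadruples the required sample, which is why high-variance revenue metrics usually need far more traffic than conversion rate tests.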