Sequential testing calculator
Can I stop my A/B test early while controlling false positives?
Check if your A/B test has reached significance using sequential testing boundaries. This method lets you check results at multiple points without inflating false positive rates.
How to use this calculator
Enter your planned number of looks (interim analyses) and which look you are currently at. Enter visitors and conversions for each variant. Select a spending function: O'Brien-Fleming (conservative early, liberal late) or Pocock (equal boundaries at each look). The calculator shows whether your current z-score crosses the efficacy or futility boundary at this interim analysis.
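The comparison described above can be sketched in a few lines. This is a minimal illustration, assuming a pooled two-proportion z-test; the calculator's exact formula is not stated here, and the function names, the sample counts, and the boundary value of 2.63 are all hypothetical.

```python
import math

def two_proportion_z(visitors_a, conversions_a, visitors_b, conversions_b):
    """Pooled two-proportion z-score for variant B versus variant A (assumed test statistic)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled conversion rate under the null hypothesis of no difference
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    return (rate_b - rate_a) / std_err

def crosses_efficacy(z, boundary):
    """True when the absolute z-score reaches the efficacy boundary for this look."""
    return abs(z) >= boundary

# Hypothetical interim data: 5,000 visitors per variant
z = two_proportion_z(5000, 500, 5000, 560)
boundary_at_this_look = 2.63  # illustrative value only; it depends on the look and spending function
print(f"z = {z:.3f}, stop for efficacy: {crosses_efficacy(z, boundary_at_this_look)}")
```

A z-score that would count as significant in a fixed-sample test (above 1.96) can still fall short of an interim boundary, which is the point of the method.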
How sequential testing works
Traditional A/B testing requires waiting until a fixed sample size is reached. Sequential testing allows you to check results at pre-planned intervals (looks) while controlling the overall false positive rate. It uses alpha spending functions that allocate the total significance level across looks. O'Brien-Fleming spends very little alpha early (hard to stop early) and saves nearly all of it for the final look, so the final boundary stays close to that of a fixed-sample test. Pocock uses roughly the same boundary at every look, so it spends alpha much more evenly (easier to stop early but harder at the final look). Both methods keep the overall Type I error rate at the desired level.
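To make the difference between the two spending patterns concrete, here is a sketch using the Lan-DeMets forms of the O'Brien-Fleming-type and Pocock-type spending functions. Whether the calculator uses exactly these forms is an assumption; the function names and the choice of five equally spaced looks at a two-sided alpha of 0.05 are illustrative.

```python
import math
from statistics import NormalDist

def obf_spent(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t (O'Brien-Fleming-type, Lan-DeMets form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))

def pocock_spent(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t (Pocock-type, Lan-DeMets form)."""
    return alpha * math.log(1 + (math.e - 1) * t)

looks = 5  # assumed number of equally spaced looks
for k in range(1, looks + 1):
    t = k / looks  # fraction of the planned sample collected so far
    print(f"look {k}: OBF spent {obf_spent(t):.5f}, Pocock spent {pocock_spent(t):.5f}")
```

Both functions reach the full alpha at t = 1, but at the first of five looks the O'Brien-Fleming-type function has spent well under a thousandth of it, while the Pocock-type function has already spent more than a quarter.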
When to use sequential testing
Use sequential testing when you need to monitor an experiment over time and want the option to stop early for clear winners or losers. This is especially valuable for tests with business urgency (product launches, seasonal campaigns) or when the cost of continuing a losing variant is high (negative revenue impact). Sequential testing is the proper solution to the peeking problem: the temptation to check results before the planned sample size.
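The peeking problem is easy to see in a simulation. The sketch below is an illustration under simplifying assumptions (no true effect, equal-sized batches of data between looks, normally distributed batch statistics); the function name and parameters are hypothetical. It counts how often a test would be declared significant if you applied the usual 1.96 threshold at every look.

```python
import math
import random

def false_positive_rate(looks, boundary=1.96, trials=20000, seed=0):
    """Share of null experiments that cross the boundary at any of the looks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = 0.0
        for k in range(1, looks + 1):
            total += rng.gauss(0.0, 1.0)  # one more batch of data with no true effect
            z = total / math.sqrt(k)      # z-score on all data collected so far
            if abs(z) >= boundary:
                hits += 1                 # stopped early and declared a winner
                break
    return hits / trials

print(f"1 look:  {false_positive_rate(1):.3f}")
print(f"5 looks: {false_positive_rate(5):.3f}")
```

With a single look the rate stays near the nominal 5%, while five unadjusted looks push it well above that. Sequential boundaries raise the threshold at each look to bring the overall rate back to the target.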
Common mistakes in sequential testing
The most common mistake is using sequential boundaries without planning them in advance. You must decide the number of looks and the spending function before the test starts. Another mistake is checking results more often than planned: even with sequential testing, you can only check at pre-specified intervals. Using Pocock boundaries when you plan many looks makes the final analysis very conservative, because the final boundary rises with every added look, so O'Brien-Fleming is often preferred for tests with many planned looks.
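The trade-off at the final look can be illustrated with the classic group-sequential boundaries. The constants below are the commonly tabulated values for five equally spaced looks at a two-sided alpha of 0.05; treat them as an assumption, since boundaries derived from spending functions (as in this calculator) can differ slightly. The function names are hypothetical.

```python
import math

LOOKS = 5
POCOCK_C = 2.413  # tabulated Pocock constant for 5 looks, two-sided alpha 0.05
OBF_C = 2.040     # tabulated O'Brien-Fleming constant for 5 looks, two-sided alpha 0.05

def pocock_boundary(k):
    """Pocock uses the same critical z-value at every look."""
    return POCOCK_C

def obf_boundary(k):
    """O'Brien-Fleming starts very high and falls toward the fixed-sample value."""
    return OBF_C * math.sqrt(LOOKS / k)

for k in range(1, LOOKS + 1):
    print(f"look {k}: Pocock {pocock_boundary(k):.3f}, O'Brien-Fleming {obf_boundary(k):.3f}")
```

At the final look, the Pocock threshold is noticeably higher than the 1.96 of a fixed-sample test, while the O'Brien-Fleming threshold is only slightly higher. The price of O'Brien-Fleming is a first-look threshold that almost no realistic effect will cross.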