Stopping rules are the pre-defined criteria that determine when an A/B test should be concluded and its results analyzed. They exist to prevent the peeking problem and ensure that the statistical assumptions underlying significance tests are not violated by early stopping. A properly defined stopping rule specifies, before the test starts, the exact conditions under which the experiment will end — typically based on reaching a pre-calculated sample size, a fixed calendar duration, or a sequential testing boundary.
Types of Stopping Rules
1. Fixed-horizon stopping rule (most common):
- Run the test until the pre-planned sample size is reached.
- Do not analyze results before that point.
- Simple, transparent, and widely understood.
2. Duration-based stopping rule:
- Run for a fixed number of days (e.g., 28 days), regardless of sample size.
- Useful for capturing weekly traffic cycles; often combined with sample size requirements.
3. Sequential stopping rules (advanced):
- Use statistical methods such as the sequential probability ratio test (SPRT), the mixture SPRT (mSPRT), or other always-valid inference procedures that allow continuous monitoring without inflating the false positive rate.
- The test can be stopped at any point once the sequential boundary is crossed.
- More complex to implement but appropriate when speed is critical.
4. Futility stopping:
- Stop the test early if it becomes clear the variant is very unlikely to reach significance within the planned run time.
- Frees up traffic and resources for more promising experiments.
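A fixed-horizon or duration-based rule needs its numbers worked out before launch. The sketch below is one common way to do that, assuming a two-sided two-proportion z-test, a 50/50 traffic split, and a planned duration rounded up to whole weeks. The function names, the 14-day minimum, and the default alpha and power values are illustrative assumptions, not settings from any particular testing platform.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm for a two-sided two-proportion z-test.

    baseline_rate: control conversion rate, e.g. 0.03 for 3%.
    relative_lift: smallest relative improvement worth detecting, e.g. 0.10 for +10%.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def planned_days(n_per_arm, daily_visitors, min_days=14):
    """Days to run: enough for both arms, at least min_days, in whole weeks.

    Rounding up to full weeks is an assumption made here so that every
    weekday appears equally often in the data.
    """
    days_for_sample = ceil(2 * n_per_arm / daily_visitors)
    days = max(days_for_sample, min_days)
    return ceil(days / 7) * 7


if __name__ == "__main__":
    n = sample_size_per_arm(baseline_rate=0.03, relative_lift=0.10)
    print("visitors per arm:", n)
    print("planned duration (days):", planned_days(n, daily_visitors=5000))
```

Both outputs would go into the test brief. Small baseline rates and small lifts push the required sample size up quickly, which is why the number should be fixed before anyone looks at results.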
Why Stopping Rules Matter for Ecommerce
Without stopping rules, "stop when significant" becomes the de facto approach. Because the stopping point then depends on the data, the nominal false positive rate (for example, 5%) no longer holds, which leaves the test no better protected than one with no stopping rule at all. This is the root cause of the peeking problem. For D2C brands under pressure to show results quickly, the absence of stopping rules turns the experimentation program into a false-positive factory. Establishing and following stopping rules is what makes the difference between a testing program that generates real insights and one that generates a convincing but misleading library of "wins."
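The effect of "stop when significant" can be seen in a small simulation. The sketch below runs A/A tests, where both arms share the same true conversion rate, so every significant result is a false positive. It compares a single analysis at the planned sample size with checking at ten evenly spaced looks and stopping at the first significant one. The traffic numbers, number of looks, and seed are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% significance threshold


def is_significant(conv_a, conv_b, n):
    """Pooled two-proportion z-test with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled <= 0 or pooled >= 1:
        return False  # no variation yet, nothing to test
    se = sqrt(2 * pooled * (1 - pooled) / n)
    return abs(conv_b - conv_a) / n / se > Z_CRIT


def simulate_aa_tests(n_tests=400, n_per_arm=2000, looks=10, rate=0.10, seed=7):
    """Return (fixed-horizon, peeking) false positive rates for A/A tests."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        crossed_any = False
        significant = False
        for look in range(1, looks + 1):
            # Both arms draw from the same rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            significant = is_significant(conv_a, conv_b, look * step)
            crossed_any = crossed_any or significant
        fixed_hits += significant    # analysis at the final look only
        peeking_hits += crossed_any  # would have stopped at the first "win"
    return fixed_hits / n_tests, peeking_hits / n_tests


if __name__ == "__main__":
    fixed_rate, peeking_rate = simulate_aa_tests()
    print(f"fixed-horizon false positive rate: {fixed_rate:.3f}")
    print(f"peeking false positive rate:       {peeking_rate:.3f}")
```

Because the final look is one of the ten peeks, the peeking rate can never be lower than the fixed-horizon rate. In runs of this kind it is typically several times higher than the nominal 5%.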
Real-World Example
Boat Lifestyle's experimentation team implemented a formal stopping rule policy: all experiments must (1) reach the pre-calculated sample size AND (2) run for at least 14 calendar days. One experiment reached its sample size target on day 8, but the stopping rule required waiting until day 14. Between day 8 and day 14, the variant's lead narrowed from 11% to 6% — still a significant and valuable result, but very different from the premature read. The calendar requirement protected them from capturing a non-representative early traffic mix.
How to Implement Stopping Rules
- Document the stopping rule in the test brief before launch — specify the sample size threshold and/or calendar duration.
- Configure your testing platform to alert you (not auto-stop) when the planned criteria are met.
- Never stop a test simply because it reached significance; stop only once it has also met the planned sample size and duration requirements.
- Apply a futility threshold for long-running inconclusive tests: if the platform-reported confidence is still at or below 40% after the test has reached 150% of the planned sample size, declare it inconclusive and end it.
- Log every test's stopping rule and how it was applied to build institutional knowledge and accountability.
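The checklist above can be encoded as a small decision function that a dashboard or alerting job calls each day. This is a minimal sketch of one reading of those bullets, not a feature of any specific platform: the class, field, and status names are hypothetical, and `confidence` stands for whatever confidence figure the testing tool reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoppingRule:
    """Stopping rule written down in the test brief before launch."""
    planned_sample_per_arm: int
    min_days: int = 14
    futility_confidence: float = 0.40       # at or below this counts as inconclusive
    futility_sample_multiple: float = 1.5   # 150% of the planned sample size


def evaluate(rule, visitors_per_arm, days_running, confidence):
    """Return 'continue', 'stop_and_analyze', or 'stop_futile'.

    Confidence is used only for the futility check, so reaching
    significance early can never end a test by itself.
    """
    overrun = visitors_per_arm >= rule.futility_sample_multiple * rule.planned_sample_per_arm
    if overrun and confidence <= rule.futility_confidence:
        return "stop_futile"
    sample_met = visitors_per_arm >= rule.planned_sample_per_arm
    duration_met = days_running >= rule.min_days
    if sample_met and duration_met:
        return "stop_and_analyze"
    return "continue"


if __name__ == "__main__":
    rule = StoppingRule(planned_sample_per_arm=20_000)
    # Sample size reached on day 8, but the calendar requirement is not met yet.
    print(evaluate(rule, visitors_per_arm=20_500, days_running=8, confidence=0.97))
    # Both criteria met on day 14.
    print(evaluate(rule, visitors_per_arm=34_000, days_running=14, confidence=0.96))
```

Returning a status instead of ending the test automatically matches the advice to alert rather than auto-stop. Logging each call alongside the rule would also provide the audit trail described in the last bullet.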
Stopping Rules in A/B Testing
Stopping rules are the procedural counterpart to statistical methodology. The math of significance testing assumes that you decided when to stop before seeing the data — stopping rules enforce that assumption in practice. Teams that operate without them are running tests that don't actually have the statistical properties they appear to have.