The peeking problem is the phenomenon where checking A/B test results repeatedly during the experiment — and stopping early when significance appears to be reached — dramatically inflates the false positive rate. Each time you check the results and compute a p-value, you are conducting an additional hypothesis test. If you stop as soon as any of those checks crosses the significance threshold, you are not running at α = 0.05; your actual false positive rate is much higher, depending on how many times you checked.
How Much Does Peeking Inflate False Positive Rates?
Research by Johari et al. (Optimizely) quantified the effect:
- Checking once daily and stopping whenever p < 0.05: actual false positive rate ≈ 26%
- Checking continuously with no fixed endpoint: false positive rate can exceed 50%
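At α = 0.05, you intended a 5% false positive rate. Peeking without correction can push that rate several times higher, and with continuous monitoring a test of a change that does nothing becomes about as likely as a coin flip to be declared a winner.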
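The inflation can be reproduced with a small A/A simulation, where both arms share the same true conversion rate and every "significant" result is therefore a false positive. This is a minimal sketch, not the method used in the research cited above: the traffic numbers, the 14-day horizon, and the pooled two-proportion z-test are all illustrative assumptions, and the exact rates it prints will vary with them.

```python
import math
import random


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to test yet
    z = (conv_b / n_b - conv_a / n_a) / se
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(n_experiments=500, days=14, daily_visitors=200,
                      rate=0.10, alpha=0.05, seed=0):
    """Return (false positive rate with daily peeking, rate with one final look).

    All parameter defaults are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = n = 0
        crossed_at_any_look = False
        p = 1.0
        for _day in range(days):
            # Both arms convert at the same true rate, so there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            n += daily_visitors
            p = p_value(conv_a, n, conv_b, n)
            if p < alpha:
                crossed_at_any_look = True  # a peeker would have stopped here
        peeking_hits += crossed_at_any_look
        fixed_hits += p < alpha  # only the final, pre-planned look counts
    return peeking_hits / n_experiments, fixed_hits / n_experiments


if __name__ == "__main__":
    peeking_rate, fixed_rate = simulate_aa_tests()
    print(f"stop at first p < 0.05: {peeking_rate:.1%} false positives")
    print(f"single look at the end: {fixed_rate:.1%} false positives")
```

Because the final look is one of the daily looks, the peeking rate can never be lower than the fixed-horizon rate; the simulation shows how much higher it gets as the number of looks grows.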
Why the Peeking Problem Matters for Ecommerce
The peeking problem is one of the most common mistakes in ecommerce A/B testing. Growth teams are under pressure to ship results quickly, and marketing teams want to know whether the new landing page is working before the campaign ends, so tests get checked daily and stopped as soon as they look positive. The result is a testing program that feels productive, with lots of "winners" shipped, but many of those changes don't improve conversion in the long run because a significant portion of the "winners" were false positives.
Real-World Example
A D2C supplements brand launched a test on its subscription landing page. The growth lead checked results every morning. On day 6, the variant showed 96% confidence. The team stopped the test and shipped. Over the next 30 days, the variant's conversion rate drifted back toward the control's, until the page was performing at essentially the same level as before. In hindsight, the day-6 result was a peeking false positive: the test needed 18 days to reach its required sample size, and the early "significance" was random noise that happened to cross the threshold.
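A figure like the 18 days above comes from a sample size calculation done before launch. The sketch below assumes a two-sided two-proportion z-test and uses the standard normal-approximation formula. The baseline rate, target lift, and daily traffic are hypothetical inputs and are not taken from the brand in the example.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per arm to detect the given relative lift.

    Normal-approximation formula for comparing two proportions with a
    two-sided test at significance level alpha and the requested power.
    """
    if relative_lift == 0:
        raise ValueError("relative_lift must be non-zero")
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def planned_duration_days(n_per_arm, daily_visitors_per_arm):
    """Whole days needed to collect n_per_arm visitors in each arm."""
    return math.ceil(n_per_arm / daily_visitors_per_arm)


if __name__ == "__main__":
    # Hypothetical inputs: 4% baseline conversion, 10% relative lift,
    # 2,200 visitors per arm per day.
    n = required_sample_size(baseline_rate=0.04, relative_lift=0.10)
    days = planned_duration_days(n, daily_visitors_per_arm=2200)
    print(f"{n} visitors per arm, about {days} days")
```

Running this before launch gives the team a concrete stopping point to commit to, which is what makes an early "significant" reading easy to recognize as premature.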
How to Avoid the Peeking Problem
- Calculate the required sample size before the test starts and commit to running until that sample size is reached.
- Set a fixed end date for the test — do not check results until that date arrives.
- Use sequential testing methods (such as always-valid p-values or the sequential probability ratio test, SPRT) if you genuinely need the ability to peek. These are designed for continuous monitoring without inflating false positives.
- Configure your testing platform to lock results until the planned sample size is met.
- Create a team norm that experiment results are only reviewed at the scheduled end date — not shared in Slack partway through.
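For teams that do need interim looks, the always-valid p-value mentioned in the list can be sketched with a mixture sequential probability ratio test. This is an illustration under stated assumptions, not any vendor's implementation: it uses a normal approximation to the difference in conversion rates, equal-sized arms, and a normal prior on the true difference whose standard deviation `tau` is a hypothetical tuning parameter.

```python
import math
import random


def log_mixture_likelihood_ratio(conv_a, conv_b, n_per_arm, tau=0.02):
    """Log mixture likelihood ratio against 'no difference between arms'.

    The true difference is given a N(0, tau^2) prior; tau is an assumed
    tuning value on the absolute conversion-rate scale.
    """
    pooled = (conv_a + conv_b) / (2 * n_per_arm)
    # Estimated variance of the observed difference in conversion rates.
    variance = 2 * pooled * (1 - pooled) / n_per_arm
    if variance == 0:
        return 0.0  # no information yet, ratio of 1
    diff = (conv_b - conv_a) / n_per_arm
    exponent = diff ** 2 * tau ** 2 / (2 * variance * (variance + tau ** 2))
    return 0.5 * math.log(variance / (variance + tau ** 2)) + exponent


class AlwaysValidPValue:
    """Running p-value that stays valid no matter how often it is checked."""

    def __init__(self):
        self.p_value = 1.0

    def update(self, conv_a, conv_b, n_per_arm, tau=0.02):
        log_ratio = log_mixture_likelihood_ratio(conv_a, conv_b, n_per_arm, tau)
        # The p-value is the smallest 1 / ratio seen so far, so it never rises.
        self.p_value = min(self.p_value, math.exp(-log_ratio))
        return self.p_value


def aa_false_positive_rate(n_experiments=300, days=14, daily_visitors=200,
                           rate=0.10, alpha=0.05, seed=0):
    """Share of no-effect tests stopped as 'winners' under daily monitoring."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        tracker = AlwaysValidPValue()
        conv_a = conv_b = n = 0
        for _day in range(days):
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            n += daily_visitors
            tracker.update(conv_a, conv_b, n)
        hits += tracker.p_value < alpha
    return hits / n_experiments


if __name__ == "__main__":
    print(f"false positives with daily looks: {aa_false_positive_rate():.1%}")
```

The trade-off is power: the evidence needed at any single look is stricter than a fixed-horizon test would require, so a real effect of a given size typically takes longer to confirm. The choice of `tau` shifts where the test is most sensitive and should reflect the effect sizes the team realistically expects.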
Peeking Problem in A/B Testing
The peeking problem is a form of the multiple testing problem applied over time rather than across simultaneous tests. Testing platforms that display live confidence scores make peeking tempting, so the fix starts with process discipline rather than more dashboard features. Teams that commit to pre-planned sample sizes and fixed stopping rules, or that adopt sequential methods when interim looks are unavoidable, remove most of the peeking problem from their experimentation culture.