Effect size is a measure, often standardized, of the magnitude of the difference between the control and variant in an A/B test. Statistical significance tells you how unlikely the observed difference would be if there were no true difference; effect size tells you how large that difference is. With a large enough sample, even a tiny, commercially meaningless difference can become statistically significant, so effect size keeps you anchored to whether the result actually matters for your business.
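For conversion rate tests, effect size can be expressed as Cohen's h (the difference between the arcsine-transformed rates, h = 2·arcsin(√Variant Rate) − 2·arcsin(√Control Rate)) or, more simply, as the relative lift: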
Relative Effect Size = (Variant Rate − Control Rate) / Control Rate
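As a minimal sketch, both conversion-rate measures can be computed with the Python standard library. The function names and the 2.0% and 2.2% rates are illustrative assumptions, not values from a real test.

```python
import math


def relative_lift(control_rate: float, variant_rate: float) -> float:
    """Relative lift of the variant over the control, as a fraction (0.10 = 10%)."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (variant_rate - control_rate) / control_rate


def cohens_h(control_rate: float, variant_rate: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    phi_variant = 2 * math.asin(math.sqrt(variant_rate))
    phi_control = 2 * math.asin(math.sqrt(control_rate))
    return phi_variant - phi_control


# Illustrative rates (assumed): 2.0% control, 2.2% variant.
control, variant = 0.020, 0.022
print(f"Relative lift: {relative_lift(control, variant):.1%}")
print(f"Cohen's h:     {cohens_h(control, variant):.4f}")
```

Note that a 10% relative lift on a low baseline rate corresponds to a very small Cohen's h, which is one reason ecommerce teams tend to talk in relative lift instead.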
For continuous metrics (like revenue per visitor), Cohen's d is used:
Cohen's d = (Mean Variant − Mean Control) / Pooled Standard Deviation
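The formula above can be sketched in Python as follows. The function name and the sample revenue-per-visitor values are hypothetical, and the pooled standard deviation uses the usual sample-variance weighting by degrees of freedom.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(control: Sequence[float], variant: Sequence[float]) -> float:
    """Cohen's d using the pooled standard deviation of the two groups."""
    n_c, n_v = len(control), len(variant)
    if n_c < 2 or n_v < 2:
        raise ValueError("each group needs at least two observations")
    # Pool the sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_c - 1) * variance(control) + (n_v - 1) * variance(variant)
    ) / (n_c + n_v - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(variant) - mean(control)) / math.sqrt(pooled_var)


# Hypothetical revenue-per-visitor samples, for illustration only.
control_revenue = [0.0, 0.0, 12.5, 0.0, 30.0, 0.0, 18.0, 0.0]
variant_revenue = [0.0, 15.0, 0.0, 22.0, 0.0, 35.0, 0.0, 19.5]
print(f"Cohen's d: {cohens_d(control_revenue, variant_revenue):.3f}")
```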
In practice, most ecommerce teams work with relative lift (e.g., "10% improvement in conversion rate") as their working definition of effect size.
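Common rule-of-thumb benchmarks for relative lift (these are not Cohen's conventional thresholds of 0.2, 0.5, and 0.8 for d and h):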
- Small effect: < 2% relative lift
- Medium effect: 2–10% relative lift
- Large effect: > 10% relative lift
Why Effect Size Matters for Ecommerce
Effect size is what connects your experiment to a revenue number. Statistical significance only tells you the difference is unlikely to be noise; effect size tells you whether it's worth acting on. A D2C brand with 500,000 monthly visitors and a 2% conversion rate gets about 10,000 orders a month, so a 1% relative lift adds roughly 100 orders while a 10% lift adds roughly 1,000, even if both results are statistically significant. Planning tests around a realistic minimum detectable effect (MDE) prevents you from running underpowered tests that can only catch large effects, or from over-investing traffic in tests chasing tiny improvements that won't move your P&L.
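To make the revenue connection concrete, here is a small sketch of the arithmetic for the 500,000-visitor example. The average order value is a hypothetical placeholder, since the text does not give one, and the function name is illustrative.

```python
def extra_orders_per_month(
    monthly_visitors: int, baseline_rate: float, relative_lift: float
) -> float:
    """Additional monthly orders implied by a relative lift in conversion rate."""
    baseline_orders = monthly_visitors * baseline_rate
    return baseline_orders * relative_lift


AVERAGE_ORDER_VALUE = 40.0  # hypothetical value, not from the example above

for lift in (0.01, 0.10):
    orders = extra_orders_per_month(500_000, 0.02, lift)
    revenue = orders * AVERAGE_ORDER_VALUE
    print(f"{lift:.0%} lift: {orders:,.0f} extra orders, {revenue:,.0f} extra revenue")
```

Swapping in your own traffic, baseline rate, and order value turns a lift percentage into a number that can be compared against the cost of building and maintaining the change.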
Real-World Example
Pilgrim, a D2C skincare brand, was testing a new product page layout. They estimated their baseline conversion rate at 3.2% and wanted to detect a 10% relative lift (to 3.52%). At 80% power and a two-sided 5% significance level, a standard two-proportion power calculation calls for roughly 50,000 visitors per variant to detect a lift of that size, so they sized the test accordingly. The result came in at a 9.8% lift, close to the effect the test was designed to detect, and the team confidently shipped the new layout. Had they looked only for significance without planning around effect size, they might have stopped the test at 5,000 visitors, far too few to detect a lift of this size, and called it inconclusive.
How to Use Effect Size in Practice
- Define your minimum meaningful effect before the test starts — what lift would justify shipping the change?
- Use effect size to size your test: smaller effects require larger sample sizes to detect reliably, and halving the effect you want to detect roughly quadruples the required sample.
- Don't ship statistically significant results with negligible effect size — a 0.1% lift at 99% confidence isn't worth engineering time.
- Report effect size alongside confidence intervals so stakeholders understand both the result and the uncertainty range.
- Track effect sizes across experiments to calibrate your hypothesis quality over time.
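The sizing step in the list above can be sketched with the common normal-approximation formula for comparing two proportions. This is an assumption about method: commercial calculators may use slightly different formulas, one-sided tests, or sequential designs, and will return different numbers. The function name and the 2% baseline are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(
    baseline_rate: float,
    relative_mde: float,
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """Approximate visitors needed per variant for a two-sided two-proportion z-test."""
    if not 0 < baseline_rate < 1 or relative_mde <= 0:
        raise ValueError("need 0 < baseline_rate < 1 and relative_mde > 0")
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    if p2 >= 1:
        raise ValueError("target rate must stay below 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)


# Illustrative 2% baseline: smaller minimum effects need far more traffic.
for mde in (0.02, 0.05, 0.10):
    n = sample_size_per_variant(0.02, mde)
    print(f"MDE {mde:.0%}: {n:,} visitors per variant")
```

Because the required sample scales with the inverse square of the absolute difference, the minimum meaningful effect you choose is usually the single biggest driver of how long a test has to run.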
Effect Size in A/B Testing
Effect size is the input to sample size calculators and the output of experiment analysis. Before a test, you specify the minimum effect size you want to detect; after the test, you measure the actual effect size and report it alongside significance. Together, the measured effect size and its statistical significance, ideally with a confidence interval, tell the full story of an experiment.
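As a rough sketch of the post-test half of that workflow, the function below reports the observed relative lift together with a two-sided p-value and a confidence interval for the absolute difference. It assumes a fixed-sample two-proportion z-test; the function name and the visitor and conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def analyze_ab_test(
    control_visitors: int,
    control_conversions: int,
    variant_visitors: int,
    variant_conversions: int,
    confidence: float = 0.95,
) -> dict:
    """Observed effect size plus z-test p-value and CI for the rate difference."""
    p_c = control_conversions / control_visitors
    p_v = variant_conversions / variant_visitors
    if p_c <= 0:
        raise ValueError("control needs at least one conversion")
    diff = p_v - p_c

    # Pooled standard error for the test of "no difference".
    p_pool = (control_conversions + variant_conversions) / (
        control_visitors + variant_visitors
    )
    se_pooled = math.sqrt(
        p_pool * (1 - p_pool) * (1 / control_visitors + 1 / variant_visitors)
    )
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se = math.sqrt(
        p_c * (1 - p_c) / control_visitors + p_v * (1 - p_v) / variant_visitors
    )
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    return {
        "relative_lift": diff / p_c,
        "absolute_diff": diff,
        "ci_low": diff - z_crit * se,
        "ci_high": diff + z_crit * se,
        "p_value": p_value,
    }


# Hypothetical counts: 80,000 visitors per arm, 1,600 vs 1,760 conversions.
result = analyze_ab_test(80_000, 1_600, 80_000, 1_760)
print(f"Relative lift: {result['relative_lift']:.1%}")
print(f"95% CI for absolute difference: "
      f"[{result['ci_low']:.4%}, {result['ci_high']:.4%}]")
print(f"p-value: {result['p_value']:.4f}")
```

Reporting the interval next to the point estimate shows stakeholders how small or large the true effect could plausibly be, not just whether it cleared a significance threshold.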