Hypothesis testing is the formal statistical process used to determine whether the difference in performance between a control and a variant in an A/B test is real (attributable to the change) or merely a product of random variation. It involves formulating a null hypothesis (no effect) and an alternative hypothesis (a real effect exists), collecting data, calculating a test statistic and p-value, and deciding whether to reject the null hypothesis based on a pre-set significance threshold.
The process follows these steps:
- State the null hypothesis (H₀): There is no difference in conversion rate between control and variant.
- State the alternative hypothesis (H₁): The variant conversion rate differs from (or is higher than) the control.
- Set the significance level (α): Typically 0.05 (5%), which corresponds to a 95% confidence level and caps the false positive rate at 5% when the null hypothesis is true.
- Collect data until the pre-planned sample size is reached.
- Calculate the test statistic (z-score or chi-square) and derive the p-value.
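- Decision rule: If p < α, reject H₀ in favor of H₁; the result is statistically significant. If p ≥ α, fail to reject H₀; this means the data are inconclusive, not that the null hypothesis has been proven.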
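The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a two-sided, pooled two-proportion z-test; the function name `two_proportion_z_test` and the visitor and conversion counts are made up for the example and do not come from any real experiment.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a = conv_a / n_a                      # control conversion rate
    p_b = conv_b / n_b                      # variant conversion rate
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: P(|Z| >= |z|)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

ALPHA = 0.05  # fixed before the test starts, never after seeing results

# Hypothetical counts, for illustration only
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=570, n_b=10_000)
decision = "reject H0" if p < ALPHA else "fail to reject H0"
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")
```

A one-sided alternative (the variant is higher than the control) would use only one tail of the distribution; whichever form is chosen should be fixed at test design.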
Why Hypothesis Testing Matters for Ecommerce
Every A/B test is a hypothesis test, whether you think of it that way or not. The difference between a rigorous testing program and a p-hacking exercise is whether teams follow the formal structure: setting α before the test, committing to a sample size, and not adjusting the hypothesis based on early results. D2C brands that skip this rigor end up with a portfolio of "winning" tests that don't hold up when shipped, because many of those results were noise that happened to cross an arbitrary threshold.
Real-World Example
Nykaa's analytics team ran an experiment on their beauty editorial pages, testing whether adding a "Shop This Look" CTA below editorial content would increase product page visits. Their null hypothesis: adding the CTA does not change the CTR to product pages. Their alternative hypothesis: the CTA increases product page CTR. After setting α = 0.05 and collecting 28 days of data, the p-value came in at 0.019 — below the 0.05 threshold. They rejected the null hypothesis and shipped the CTA, which went on to drive a measurable increase in product discovery sessions.
How to Apply Hypothesis Testing Correctly
- Write the hypothesis before looking at any data — the null and alternative must be fixed at test design, not reverse-engineered from results.
- Use two-tailed tests when you don't have a strong prior about direction (i.e., the variant could be better or worse).
- Do not adjust α after seeing results — this is p-hacking, and it inflates your false positive rate.
- Apply a multiple-testing correction if you are evaluating several metrics simultaneously; the Bonferroni correction, which compares each p-value against α divided by the number of metrics, is a simple option.
- Interpret p-values correctly: p = 0.03 does not mean there is a 97% chance the variant is better; it means that, if the null hypothesis were true, there would be a 3% chance of seeing a difference at least this large.
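As a rough sketch of the Bonferroni correction mentioned above: each metric's p-value is compared against α divided by the number of metrics tested. The helper name `bonferroni`, the metric names, and the p-values below are hypothetical, chosen only to show the mechanics.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-metric threshold and which metrics pass it."""
    threshold = alpha / len(p_values)  # stricter cutoff shared by all metrics
    passed = {name: p < threshold for name, p in p_values.items()}
    return threshold, passed

# Hypothetical p-values for three metrics evaluated in the same test
p_values = {
    "conversion_rate": 0.012,
    "add_to_cart_rate": 0.030,
    "avg_order_value": 0.200,
}

threshold, significant = bonferroni(p_values)
for name, p in p_values.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: p = {p:.3f} vs threshold {threshold:.4f} -> {verdict}")
```

In this made-up case, a p-value of 0.030 would pass an uncorrected 0.05 threshold but not the corrected one. Bonferroni is conservative; it is used here because it is the simplest correction to state, not because it is the only option.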
Hypothesis Testing in A/B Testing
A/B testing is applied hypothesis testing. Every time you set a significance threshold, run a test to sample size, and check a p-value, you are executing a formal hypothesis test. Most A/B testing platforms automate the calculation but not the discipline — the rigor of defining hypotheses upfront, respecting run time, and interpreting results honestly is the responsibility of the practitioner.
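To see why respecting the planned run time matters, the simulation below runs A/A tests, where both arms share the same true conversion rate, so every "significant" result is a false positive. It compares checking the p-value at ten interim looks against checking once at the planned sample size. This is an illustrative sketch: the function names, conversion rate, sample sizes, number of looks, and seed are all assumptions, not values from any real program.

```python
import math
import random

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation observed, so no evidence of a difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate(n_tests=400, n_per_arm=2000, looks=10, rate=0.05, alpha=0.05, seed=7):
    """Run A/A tests (no true effect) and count false positives two ways."""
    rng = random.Random(seed)
    step = n_per_arm // looks  # visitors per arm between interim looks
    peeking_hits = fixed_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        flagged = False
        for n in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # A peeker declares a winner at any look where p < alpha
            if n % step == 0 and p_value(conv_a, n, conv_b, n) < alpha:
                flagged = True
        peeking_hits += flagged
        # A disciplined tester checks only once, at the planned sample size
        fixed_hits += p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha
    return peeking_hits / n_tests, fixed_hits / n_tests

peeking_rate, fixed_rate = simulate()
print(f"False positive rate with peeking: {peeking_rate:.1%}")
print(f"False positive rate at planned sample size: {fixed_rate:.1%}")
```

Because the final look is one of the interim looks, the peeking rate can never be lower than the fixed-sample rate. In simulations of this kind it typically lands well above α, while the fixed-sample rate stays near it.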