

A/B testing is one of the most valuable tools in CRO, but it's not always the right one. Brands that A/B test everything waste time on inconclusive tests, delay obviously correct improvements, and sometimes make decisions based on statistically meaningless results. Here's a decision framework for knowing when to test, when to just ship, and when to use a different method entirely.
"We should A/B test that" has become a default response to any proposed change. But running a bad A/B test is worse than not testing at all: it gives you false confidence in results that are statistically meaningless or misinterpreted.
Testing everything also slows down your program. If you're running 20 simultaneous tests, many will be underpowered (not enough traffic per variant). If you spend 8 weeks testing a minor copy change, you've delayed 8 weeks of other potential improvements.
A mature CRO program is selective: it tests changes where the outcome is genuinely uncertain, the stakes are significant, and the traffic supports a valid test.
Ask these questions before committing to an A/B test:

Calculate first: Use a sample size calculator (Evan Miller's is free and reliable). Input your current conversion rate, the minimum detectable effect you care about (typically a 10-15% relative lift), and target statistical significance (95%).
Rule of thumb check: You need approximately 1,000 conversions per variant to detect a 10% improvement at 95% confidence for a metric converting at 1-5%. If your page generates 30 conversions per month per variant, a test would take about 33 months. Don't bother testing; implement based on qualitative evidence.
When traffic is insufficient: Use qualitative methods (surveys, usability testing, session recordings). Make high-confidence improvements. Scale traffic first, then test.
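The traffic check can also be run in a few lines of code. The following is a minimal sketch, assuming the standard normal-approximation formula for a two-sided two-proportion test; the function names and the 80% power default are illustrative assumptions, not something a specific calculator prescribes. The exact requirement depends on the power you assume: at 80% power and a 3% baseline, this formula gives a somewhat larger figure than the 1,000-conversion rule of thumb, so treat the rule of thumb as a floor.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided two-proportion z-test).

    baseline_rate: current conversion rate, e.g. 0.03 for 3%
    relative_mde:  smallest relative lift worth detecting, e.g. 0.10 for 10%
    power:         assumed default of 80%; not specified in the article
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 95% significance -> ~1.96
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def months_to_finish(conversions_needed, conversions_per_month):
    """How long one variant takes to collect the required conversions."""
    return conversions_needed / conversions_per_month


n = visitors_per_variant(baseline_rate=0.03, relative_mde=0.10)
print(f"{n:,} visitors per variant")
print(f"{round(n * 0.03):,} baseline conversions per variant")
print(f"{months_to_finish(1000, 30):.1f} months at 30 conversions/month")
```

The last line reproduces the arithmetic from the rule of thumb: 1,000 conversions at 30 per month per variant is roughly 33 months.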

Some changes are obviously correct with virtually no risk of harm. You don't need an A/B test to know you should:
When it's obviously correct: Just ship it. Document it as a "direct implementation" in your CRO log. Save your testing capacity for genuinely uncertain decisions.
A/B testing requires the ability to show different experiences to different users simultaneously. This becomes problematic for:
When it's site-wide: Implement, measure before/after, and run qualitative research post-launch. Or implement in phases (roll out to 10%, measure, then full rollout).
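A phased rollout needs a stable way to decide which visitors see the change. One common approach, sketched here under assumptions, is to hash a visitor identifier into a bucket so the same person always gets the same experience. The function name, the feature key, and the user ID format are all hypothetical; most testing platforms provide their own equivalent.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user falls inside the first `percent` of buckets.

    The hash is deterministic, so a user stays in (or out of) the rollout
    across visits, and raising `percent` only ever adds users.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 0..9999, i.e. 0.01% granularity
    return bucket < percent * 100


# Illustrative check with made-up user IDs: about 10% should be exposed.
users = [f"user-{i}" for i in range(20_000)]
exposed = sum(in_rollout(u, "new-navigation", 10) for u in users)
print(f"{exposed / len(users):.1%} of users see the change")
```

Including the feature name in the hash keeps rollouts independent of each other, so the same 10% of users are not always the first to receive every change.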
For a page getting 500 conversions/month, a valid A/B test takes 2-4 weeks minimum, and often 6-8 weeks for smaller effects. Sometimes business context requires faster decisions:
When speed is critical: Make your best judgment call based on qualitative research, expert heuristic review, and past test learnings. Document the reasoning. Review post-launch data and course-correct if needed.
Some changes can't easily be A/B tested because they're difficult or impossible to reverse if the test loses:
When it's hard to reverse: Do extensive qualitative research before committing. Use staged rollouts with careful measurement. Consider a limited pilot with a small customer segment.
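The five questions above can be sketched as a small decision helper. This is one possible encoding, not a definitive rule set: the field names are hypothetical, the checks run in the order the questions appear, and each flag still has to be answered by a person.

```python
from dataclasses import dataclass


@dataclass
class ProposedChange:
    """Answers to the pre-test questions (field names are illustrative)."""
    has_enough_traffic: bool
    obviously_correct: bool
    site_wide: bool
    needs_fast_decision: bool
    hard_to_reverse: bool


def recommend(change: ProposedChange) -> str:
    """Map the answers to the approach suggested for each case."""
    if not change.has_enough_traffic:
        return "use qualitative methods, then implement"
    if change.obviously_correct:
        return "ship it and log as a direct implementation"
    if change.site_wide:
        return "phased rollout with before/after measurement"
    if change.needs_fast_decision:
        return "documented judgment call, reviewed post-launch"
    if change.hard_to_reverse:
        return "qualitative research plus a limited pilot"
    return "run an A/B test"


copy_tweak = ProposedChange(
    has_enough_traffic=False,
    obviously_correct=False,
    site_wide=False,
    needs_fast_decision=False,
    hard_to_reverse=False,
)
print(recommend(copy_tweak))
```

A change that clears every question falls through to the final branch, which is the case where testing is worth the time.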
When A/B testing isn't appropriate, these alternatives deliver CRO insights:
Usability testing: Watch 5 users complete key tasks. Reveals friction that quantitative data misses. No traffic minimum.
Customer surveys: Ask exit-intent or post-purchase questions. Works at any traffic level.
Session recordings and heatmaps: Understand how users actually navigate. Works at any traffic level.
5-second tests: Test first-impression clarity of landing pages. No traffic minimum.
Pre-post analysis: Implement a change, compare conversion rates before and after (accounting for seasonal and traffic differences). Less reliable than A/B testing but useful when testing isn't feasible.
Staged rollout: Release to a subset of users (by geography, acquisition date, or random sample). Compare segment performance. Imperfect but better than a single launch.
Holdout groups: For email and push campaigns, hold back a control group and compare to those who received the campaign. Standard for measuring email and retargeting impact.
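For a holdout or staged-rollout comparison, the basic arithmetic is a difference between two conversion rates. The sketch below uses a pooled two-proportion z-test, which is one standard choice rather than the only one; the function name and the numbers are made up for illustration. For pre-post data the groups are not randomized, so read the p-value as a rough guide only.

```python
from math import sqrt
from statistics import NormalDist


def compare_rates(conv_a, n_a, conv_b, n_b):
    """Compare group B against group A.

    Returns (relative_lift, p_value) from a two-sided pooled z-test.
    Assumes both groups have visitors and at least one conversion overall.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (rate_b - rate_a) / rate_a, p_value


# Made-up example: 2,000 held-back recipients with 60 orders,
# 18,000 campaign recipients with 630 orders.
lift, p = compare_rates(60, 2_000, 630, 18_000)
print(f"relative lift: {lift:.1%}, p-value: {p:.2f}")
```

In this made-up example the campaign group converts noticeably better, yet the p-value is well above 0.05, which shows how a small holdout can leave a real-looking lift statistically inconclusive.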
When you can't test and the evidence from qualitative research is strong, a structured heuristic review (against established CRO frameworks like LIFT Model or Baymard's ecommerce UX research) can guide high-confidence implementation decisions.
Before running any A/B test, confirm:
If you can't check all of these, reconsider whether testing is the right approach.
To balance this framework: there are times when testing is non-negotiable even when the change seems "obviously correct."
High-traffic, high-stakes changes: Any change to your highest-revenue product page or checkout flow should be tested even if the direction seems clear. The cost of being wrong is too high.
Counter-intuitive changes: If data suggests a change that conflicts with best practice or strong internal conviction, testing settles it definitively.
Personalization elements: When you're showing different content to different audiences, testing validates whether your segmentation and content assumptions are correct.
After a rebrand or redesign: Post-launch, run targeted tests on new elements to optimize within the new design, even if testing the redesign itself wasn't feasible.