Good tracking is what separates a reliable A/B test from a guess with extra steps. If your conversion events are firing inconsistently, if your goal definition doesn't match your actual business objective, or if variant assignment is leaking, your test results are wrong—and you'll make decisions based on bad data. Here's how to set up tracking for A/B testing so that your results actually mean something.
Most teams jump straight to designing experiments. They pick a hypothesis, create variants, launch the test, and then discover three weeks later that 60% of their "conversions" were test events from QA sessions, the conversion pixel fired twice on some orders (doubling reported conversions), or the control and variant were not splitting traffic evenly.
At that point the test is unusable. You've wasted three weeks and still don't know whether your change worked.
Spend one day on tracking setup. It protects every test you'll run afterward.
Your primary conversion event depends on what you're testing:
| What You're Testing | Primary Conversion Event |
|---|---|
| Homepage / Landing page | Add to cart or product page view |
| Product page | Add to cart |
| Cart page | Checkout initiated |
| Checkout flow | Purchase |
| Email signup | Form submit |
Don't use "purchase" as your primary event for a homepage test. Too few purchases happen per day (especially for smaller stores) and your test will take months to reach significance. Use a higher-funnel event like add-to-cart that happens 5–10x more frequently.
Even if add-to-cart is your primary metric, track the downstream funnel events (checkout initiated and purchase) as secondary signals.
If your variant lifts add-to-cart but tanks the checkout completion rate, you haven't actually won. Secondary events catch this.
For Shopify stores, these events are available out of the box or with minor configuration:
- page_viewed: fires on every page
- product_viewed: fires on product detail pages
- collection_viewed: fires on category (collection) pages
- cart_viewed: fires when the cart is opened
- checkout_started: fires when checkout begins
- payment_info_submitted: fires when payment details are submitted
- checkout_completed: fires when the order is completed (order confirmation page)

Make sure your A/B testing tool can read these events or has its own equivalent tracking.
For custom events (e.g., "add_to_cart" with specific product conditions), create them via Google Tag Manager:
Before running any test, verify your events:
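Common issues to catch: test events from QA sessions counted as conversions, a conversion pixel that fires twice on the same order, and a control and variant that aren't splitting traffic evenly.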
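Two of those failure modes (double-firing pixels and uneven splits) can be checked on an exported event log. A minimal sketch, assuming a hypothetical record shape (the Event fields below are illustrative, not any tool's schema) and assuming QA sessions are already flagged at collection time: drop QA sessions, count each order_id only once, and run a sample ratio mismatch (SRM) chi-square test on a 50/50 split.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical event record; field names are illustrative only.
    session_id: str
    variant: str                    # "control" or "variant"
    name: str                       # e.g. "checkout_completed"
    order_id: Optional[str] = None  # assumption: only purchase events carry one
    is_qa: bool = False             # assumption: QA sessions are flagged upstream


def clean_events(events: list[Event]) -> list[Event]:
    """Drop QA traffic and keep only the first event seen per order_id."""
    seen_orders: set[str] = set()
    cleaned = []
    for e in events:
        if e.is_qa:
            continue  # QA sessions never count as conversions
        if e.order_id is not None:
            if e.order_id in seen_orders:
                continue  # duplicate pixel fire for an order already counted
            seen_orders.add(e.order_id)
        cleaned.append(e)
    return cleaned


def srm_p_value(control_n: int, variant_n: int) -> float:
    """Chi-square test (1 degree of freedom) that a 50/50 split came out 50/50."""
    expected = (control_n + variant_n) / 2
    chi2 = ((control_n - expected) ** 2 + (variant_n - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

A common convention is to treat an SRM p-value below 0.001 as a sign that assignment or tracking is broken. For example, srm_p_value(10_000, 10_600) falls well below that threshold, so you would investigate before reading any conversion numbers.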
A well-configured A/B test tracks:
CustomFit.ai handles this automatically for Shopify stores. When you create a test in CustomFit.ai, it:
No custom event setup required for standard Shopify goals. For custom goals (e.g., specific button clicks), you can define them in the CustomFit.ai interface without writing code.
Mistake 1: Using the same GA4 property for test tracking and business reporting
Your A/B test traffic will mix into your regular GA4 reports and can distort trends if variants behave very differently. Either use a separate GA4 data stream for test data, or use your A/B testing tool's native analytics (which is usually cleaner for test-specific data).
Mistake 2: Including internal traffic in tests
If you and your team are browsing the site while a test is live, your internal sessions will skew results—especially for smaller stores. Use IP exclusion in GA4 and in your A/B testing tool to filter out internal traffic.
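GA4 and most A/B testing tools let you define IP exclusions in their settings. If you also analyze exported session data yourself, the same filter takes a few lines with Python's standard ipaddress module. A minimal sketch: the ranges below are placeholders from the reserved documentation address blocks (substitute your office and VPN ranges), and the session dict with an "ip" key is a hypothetical shape.

```python
import ipaddress

# Placeholder internal ranges (reserved documentation blocks); replace with your own.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # e.g. office network
    ipaddress.ip_network("198.51.100.17/32"),  # e.g. a single VPN exit IP
]


def is_internal(ip: str) -> bool:
    """True if the address falls inside any internal range (IPv4/IPv6 mismatches are simply False)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETWORKS)


def external_sessions(sessions: list[dict]) -> list[dict]:
    """Keep only sessions whose client IP is outside the internal ranges."""
    return [s for s in sessions if not is_internal(s["ip"])]
```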
Mistake 3: Starting a test without establishing baseline conversion rate
Before launching a test, know your current conversion rate for the specific event you're measuring. If your add-to-cart rate is 4.2%, you need to know how much traffic it takes to detect a lift to 4.7% (0.5 percentage points) with 95% confidence. Use a sample size calculator before you start.
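For a look inside a sample size calculator, here is a minimal sketch of the standard normal-approximation formula for a two-sided comparison of two proportions. It reads the 0.5% lift as 0.5 percentage points (4.2% to 4.7%) and assumes 80% power, a common default that the sizing depends on but that isn't fixed by the confidence level alone.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_baseline: float, p_target: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_baseline + p_target) / 2            # average rate under the null
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_baseline * (1 - p_baseline) + p_target * (1 - p_target))
    ) ** 2
    return math.ceil(numerator / (p_target - p_baseline) ** 2)


n = sample_size_per_variant(0.042, 0.047)
```

For these inputs the result is roughly 27,000 visitors per variant. Comparing sample_size_per_variant(0.01, 0.011) with sample_size_per_variant(0.08, 0.088) shows the same 10% relative lift needing far more traffic at a low baseline rate, which is why a rare event like purchase makes a slow primary metric.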
Mistake 4: Running tests during atypical periods
Don't start a test the week before Diwali or during a flash sale. Festive traffic behaves differently—higher intent, more mobile, more COD. Your results will reflect the festive period, not normal behavior. Your test conclusions won't transfer to regular days.
Mistake 5: Stopping tests early
If your variant is "winning" after 3 days, resist the urge to stop. Early results are noisy. You need to let the test run for at least 2 full business cycles (at least 2 weeks) and reach statistical significance before drawing conclusions.
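A minimal sketch of the significance check itself: a pooled two-proportion z-test, plus a helper that only returns True once both conditions in the paragraph above hold. The 14-day minimum and 0.05 threshold are assumptions to adjust. Apply the check once, at the sample size you planned before launch; re-running it daily and stopping at the first p < 0.05 ("peeking") inflates the false-positive rate.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate assuming no real difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def ready_to_call(days_run: int, p_value: float,
                  min_days: int = 14, alpha: float = 0.05) -> bool:
    """Require two full weekly cycles AND significance, not just one of them."""
    return days_run >= min_days and p_value < alpha
```

For example, 420 versus 470 conversions on 10,000 visitors per arm (4.2% vs 4.7%) gives p ≈ 0.09, which is not yet significant at 0.05.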
If you're running paid traffic into A/B test pages, add UTMs to your ad URLs:
https://yourstore.com/products/protein-powder?utm_source=meta&utm_medium=paid&utm_campaign=jan-sale&utm_content=variant-hero-image
This lets you filter your A/B test data by traffic source. A variant might win overall but only because it performs better for email traffic—while actually performing worse for paid traffic. Source-level segmentation catches this.
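A minimal sketch of that segmentation, assuming you can export each session as a (landing URL, variant, converted) tuple, which is a hypothetical shape. It uses urllib.parse from the standard library to read utm_source from the landing URL and computes the conversion rate per source and variant.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlparse


def utm_source(landing_url: str) -> str:
    """Return the utm_source parameter, or "(none)" if the URL has none."""
    params = parse_qs(urlparse(landing_url).query)
    return params.get("utm_source", ["(none)"])[0]


def conversion_rate_by_source(sessions):
    """sessions: iterable of (landing_url, variant, converted) tuples (hypothetical shape)."""
    totals = defaultdict(lambda: [0, 0])  # (source, variant) -> [conversions, sessions]
    for url, variant, converted in sessions:
        bucket = totals[(utm_source(url), variant)]
        bucket[0] += int(converted)
        bucket[1] += 1
    return {key: conv / n for key, (conv, n) in totals.items()}
```

If the variant wins overall but loses inside the ("meta", ...) segments, you are looking at exactly the source-mix effect described above.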
Multivariate tests (testing multiple elements simultaneously) need more careful tracking because you're tracking combinations, not just A vs. B.
For a multivariate test with 2 elements × 2 variants each, you have 4 combinations: A+A, A+B, B+A, and B+B (version of element 1 + version of element 2).
Each combination needs to be tracked as a separate "variant" in your tool, with enough traffic for all four buckets to reach significance. For most D2C stores with under 50,000 monthly sessions, multivariate tests take too long. Stick to A/B until you have the traffic volume to support it.
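A short sketch that enumerates the buckets and estimates how long they take to fill. The element names and the 25,000-visitors-per-bucket figure are hypothetical; substitute the per-variant sample size from your own calculator.

```python
from itertools import product

# Hypothetical elements, each with two versions, for a 2 x 2 multivariate test.
elements = {
    "headline": ["A", "B"],
    "hero_image": ["A", "B"],
}

combinations = list(product(*elements.values()))
print(len(combinations))  # 4: (A, A), (A, B), (B, A), (B, B)


def months_to_fill(per_bucket: int, n_buckets: int, monthly_sessions: int) -> float:
    """Months of traffic needed so every bucket reaches its required sample size."""
    return per_bucket * n_buckets / monthly_sessions
```

At 50,000 monthly sessions, months_to_fill(25_000, 4, 50_000) is 2.0 months versus 1.0 for a two-arm A/B test, and every additional two-version element doubles the bucket count.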
Your A/B testing tool reports on the conversion event you defined. But you need to connect test results to actual business outcomes.
After a test concludes, pull Shopify revenue data for the test period and segment it by variant.
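Compare average order value and revenue per visitor, not just conversion rate. A variant might increase conversion rate but attract lower-AOV orders. If variant CVR is +8% but AOV is -12%, the variant actually hurt revenue: revenue per visitor equals CVR × AOV, so it changes by a factor of 1.08 × 0.88 ≈ 0.95, roughly 5% less revenue per visitor.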
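A worked version of that comparison from raw totals, using hypothetical numbers chosen to reproduce the +8% CVR / -12% AOV scenario: 10,000 visitors per arm, 300 control orders at an AOV of 2,000, and 324 variant orders at an AOV of 1,760. Revenue per visitor is computed directly as revenue / visitors, which is the same as CVR × AOV.

```python
def summarize(visitors: int, orders: int, revenue: float) -> dict:
    """Conversion rate, average order value and revenue per visitor for one arm."""
    return {
        "cvr": orders / visitors,
        "aov": revenue / orders,
        "rpv": revenue / visitors,  # equals cvr * aov
    }


# Hypothetical totals for each arm of the test.
control = summarize(visitors=10_000, orders=300, revenue=300 * 2_000)
variant = summarize(visitors=10_000, orders=324, revenue=324 * 1_760)

for metric in ("cvr", "aov", "rpv"):
    change = variant[metric] / control[metric] - 1
    print(f"{metric}: {change:+.1%}")
# prints:
# cvr: +8.0%
# aov: -12.0%
# rpv: -5.0%
```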
CustomFit.ai's dashboard shows revenue per visitor alongside conversion rate so you can make the full business case, not just an isolated metric win.
If the same conversion is called purchase in GA4 and order_completed in your A/B tool, reconciling data is painful. Standardize event names across tools.
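When names already differ, one way to reconcile after the fact is an alias table that maps every tool's name for an event to a single canonical name before you join datasets. A minimal sketch: order_completed stands in for whatever your A/B tool actually calls the event, and the table itself is a hypothetical starting point. Raising on unknown names keeps a newly renamed event from silently dropping out of the reconciliation.

```python
# Hypothetical alias table: each tool's name for a business event -> one canonical name.
CANONICAL_EVENT = {
    "purchase": "purchase",                  # GA4 recommended event
    "checkout_completed": "purchase",        # Shopify customer event
    "order_completed": "purchase",           # placeholder A/B tool name
    "add_to_cart": "add_to_cart",            # GA4 recommended event
    "product_added_to_cart": "add_to_cart",  # Shopify customer event
}


def normalize(event_name: str) -> str:
    """Map a tool-specific event name to its canonical name, failing loudly if unmapped."""
    try:
        return CANONICAL_EVENT[event_name]
    except KeyError:
        raise ValueError(f"Unmapped event name: {event_name!r}") from None
```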