A/B Testing Sample Size Calculator: How Many Visitors Do You Need?

Calculate how many visitors you need for a valid A/B test. Inputs: baseline CVR, minimum detectable effect, confidence level. Includes formula and worked examples.

Sapna Johar, Head of Growth & CRO, CustomFit.ai | March 26, 2026 | 13 min read
On this page
  1. The Quick Sample Size Reference Table
  2. What Inputs You Need (and How to Find Them)
       1. Baseline Conversion Rate
       2. Minimum Detectable Effect (MDE)
       3. Statistical Confidence Level
       4. Traffic Split
  3. The Sample Size Formula Explained Simply
  4. Worked Example: A Shopify Product Page Test
  5. What to Do When Your Traffic Is Too Low
       • Focus on Higher-Traffic Pages
       • Increase Your MDE: Test Bigger Changes
       • Accept Longer Test Durations
       • Use Bayesian Testing for Low-Traffic Situations
  6. Sample Size Mistakes That Invalidate Tests
       • Calculating Sample Size After You Start (or Not at All)
       • Not Accounting for Test Duration (Minimum 14 Days)
       • Ignoring Multiple Variants (A/B/C Splits Need 3x Traffic)
       • Starting the Test During Unusual Traffic Periods
  7. How CustomFit.ai Handles Sample Size Automatically
You need roughly 30,000 to 80,000 visitors per variant to reliably detect a 10% relative lift on a typical ecommerce product page (2–5% baseline CVR). The exact number depends on three things: your current conversion rate, how large an improvement you want to be able to detect, and the confidence threshold you're working to.

Here's the quick version: the lower your baseline conversion rate and the smaller the improvement you're testing for, the more traffic you need. A 1% CVR page needs roughly twice the traffic of a 2% CVR page to detect the same relative lift.

If you want to understand why, and how to calculate your specific number, read on.

The Quick Sample Size Reference Table

Use this table as your starting point. These figures assume 95% statistical confidence, 80% statistical power, a 10% minimum detectable effect (relative), and a 50/50 traffic split between control and variant. They come from the shortcut n ≈ 16 × p(1-p) / d² explained later in this article.

| Baseline CVR | MDE | Absolute lift you're detecting | Visitors per variant | Total visitors | Days at 500/day |
|---|---|---|---|---|---|
| 1% | +10% relative | 1.0% → 1.1% | ~158,000 | ~317,000 | ~634 days |
| 2% | +10% relative | 2.0% → 2.2% | ~78,000 | ~157,000 | ~314 days |
| 3% | +10% relative | 3.0% → 3.3% | ~52,000 | ~103,000 | ~207 days |
| 5% | +10% relative | 5.0% → 5.5% | ~30,000 | ~61,000 | ~122 days |
| 8% | +10% relative | 8.0% → 8.8% | ~18,000 | ~37,000 | ~74 days |
| 10% | +10% relative | 10.0% → 11.0% | ~14,000 | ~29,000 | ~58 days |

Key takeaway: If your product page converts at 2% and you want to detect a 10% relative improvement (i.e., taking CVR from 2.0% to 2.2%), you need about 78,400 visitors per variant, or about 156,800 total. At 500 daily page visitors, that's roughly 314 days.

If you can only tolerate a 30-day test, and you get 500 visitors per day to the page (15,000 visitors total, 7,500 per variant), a 10% relative lift is out of reach at every baseline in this table. At that volume, the smallest lift you can reliably detect is roughly 32% relative at a 2% baseline and roughly 20% relative at a 5% baseline.

What Inputs You Need (and How to Find Them)

Before you use any sample size calculator, you need four numbers. Here's where to get each of them.

1. Baseline Conversion Rate

This is your current CVR on the page you're testing. Pull it from your analytics platform (Google Analytics 4, Shopify Analytics, or your attribution tool) over the last 30–90 days.

Important: Use the conversion rate for the specific goal you're testing, on the specific page you're testing. If you're testing a product page, use the "add to cart" or "purchase" rate from that page, not sitewide CVR.

Practical note for Indian D2C brands: Conversion rates vary significantly by traffic source. Organic search visitors typically convert at 2–4%. Paid traffic (Meta, Google) tends to be 1–2.5%. Brand traffic can be 5–8%. If your test page receives a mix of sources, use the blended CVR but be aware that seasonal spikes (Big Billion Days, Diwali, Republic Day sales) will distort your baseline if they fall during the test period.
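
A blended baseline is total conversions divided by total visits across sources, which weights each source by its traffic rather than averaging the rates. A minimal sketch; the function name and the per-source counts are hypothetical and for illustration only:

```python
def blended_cvr(sources):
    """sources maps a traffic-source name to (visits, conversions)."""
    visits = sum(v for v, _ in sources.values())
    conversions = sum(c for _, c in sources.values())
    return conversions / visits

# Hypothetical counts for one test page, not real store data.
example = {
    "organic": (5000, 150),  # 3.0% CVR
    "paid": (6000, 120),     # 2.0% CVR
    "brand": (1000, 60),     # 6.0% CVR
}
print(f"Blended CVR: {blended_cvr(example):.2%}")
```

Because the result is weighted by visits, a shift in traffic mix during the test moves the blended rate even if no per-source rate changes.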

2. Minimum Detectable Effect (MDE)

MDE is the smallest improvement you want the test to be able to detect reliably. This is a business decision, not a statistics decision.

Ask yourself: what's the minimum CVR improvement that would justify implementing this change?

  • If a developer needs two days to implement the winning variant, a 3% relative lift probably isn't worth it. You'd need a 10%+ lift to clear the ROI bar.
  • If it's a CSS change that takes 10 minutes, a 5% lift might be worth detecting.

For most D2C tests, 10% relative lift is a practical MDE. It's the threshold at which most implementations are clearly justified, and it's achievable on most reasonable page elements.

Avoid setting your MDE below 5% relative unless you have very high traffic. Chasing 2–3% relative lifts requires enormous sample sizes and typically provides marginal business value.

3. Statistical Confidence Level

Use 95% (p < 0.05) as your default. This means that when there is no real difference between control and variant, there's a 5% chance of a false positive: declaring a winner when none exists.

If you're testing something expensive to reverse (a full checkout flow redesign, a new pricing model), use 99% confidence. At 80% power, this increases your required sample size by roughly 50%.

Never drop below 95%. The 90% threshold sounds reasonable but doubles your false positive rate compared to 95%, which corrupts your testing programme over time.
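
To see how the confidence level scales the required sample, you can compare the z-score terms directly. This is a sketch under stated assumptions: a two-sided test, 80% power, and a hypothetical helper name `size_multiplier` that is not part of any testing tool.

```python
from statistics import NormalDist

def size_multiplier(confidence, reference=0.95, power=0.80):
    """Required sample size at `confidence`, relative to `reference`."""
    def z_term(conf):
        z_alpha = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided
        z_beta = NormalDist().inv_cdf(power)
        return (z_alpha + z_beta) ** 2
    return z_term(confidence) / z_term(reference)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: {size_multiplier(level):.2f}x the 95% sample")
```

Sample size is proportional to the squared sum of the two z-scores, so the multiplier does not depend on baseline CVR or MDE.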

4. Traffic Split

Use 50/50 (equal traffic to control and variant) unless you have a specific reason not to. Equal splits are the most statistically efficient: any deviation increases the total sample size you need.

Some teams use 90/10 or 80/20 splits when testing risky changes (protecting most traffic from a potentially worse experience). That's a valid risk management approach, but for the same statistical precision you'll need roughly 1.6x (80/20) to 2.8x (90/10) more total traffic than a 50/50 split. Only use unequal splits for genuinely high-risk changes.
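
The cost of an unequal split can be estimated from the variance of the difference between two rates, which scales with 1 / (f × (1 - f)), where f is the share of traffic sent to one arm. A rough sketch, assuming similar per-visitor variance in both arms; `split_inflation` is a hypothetical name:

```python
def split_inflation(control_share):
    """Total-traffic multiplier versus a 50/50 split for equal precision."""
    if not 0 < control_share < 1:
        raise ValueError("control_share must be strictly between 0 and 1")
    return 0.25 / (control_share * (1 - control_share))

for share in (0.5, 0.8, 0.9):
    print(f"{share:.0%} control: {split_inflation(share):.2f}x total traffic")
```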

The Sample Size Formula Explained Simply

The full statistical formula involves standard deviations and z-scores. Here's the simplified version for conversion rate tests:

n = (Z² × 2p(1-p)) / d²

Where:

  • n = visitors needed per variant
  • Z = z-score for your confidence level (1.96 for 95%, 2.58 for 99%)
  • p = baseline conversion rate (as a decimal)
  • d = absolute difference you want to detect (baseline CVR × relative MDE)

Worked example:

  • Baseline CVR: 3% (p = 0.03)
  • MDE: 10% relative = 0.3% absolute (d = 0.003)
  • Confidence: 95% (Z = 1.96)

n = (1.96² × 2 × 0.03 × 0.97) / 0.003²
n = (3.84 × 0.0582) / 0.000009
n = 0.2235 / 0.000009
n ≈ 24,833

That is already a large number, and it still understates the requirement. This simplified version only accounts for the confidence level (the false positive rate). It ignores statistical power, the probability of detecting a real effect when one exists, so a test sized this way would miss a true 10% lift about half the time. The standard formula that accounts for both 95% confidence and 80% power is:

n = 16 × p(1-p) / d²

(The constant 16 is 2 × (1.96 + 0.84)² ≈ 15.7, rounded up. It encodes the z-scores for both confidence and power.)

n = 16 × 0.03 × 0.97 / (0.003)²
n = 16 × 0.0291 / 0.000009
n = 0.4656 / 0.000009
n ≈ 51,733 per variant

The key shortcut: n ≈ 16 × p(1-p) / d², where d is the absolute CVR difference you want to detect.

You don't need to do this manually for every test. Use a pre-built calculator, or let your testing tool calculate it automatically. But understanding the formula helps you understand why traffic, MDE, and baseline CVR are so tightly connected.
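
If you do want to check a calculator's output, the calculation fits in a few lines. The sketch below is one possible implementation, not the method of any particular tool. It assumes a two-sided test and uses the baseline variance p(1-p) for both arms, as the formulas above do. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline_cvr, relative_mde,
                            confidence=0.95, power=0.80):
    """Visitors per variant using exact z-scores for confidence and power."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    d = baseline_cvr * relative_mde            # absolute difference
    variance = baseline_cvr * (1 - baseline_cvr)
    return ceil(2 * (z_alpha + z_beta) ** 2 * variance / d ** 2)

def shortcut_per_variant(baseline_cvr, relative_mde):
    """The n = 16 * p * (1 - p) / d^2 shortcut (95% confidence, 80% power)."""
    d = baseline_cvr * relative_mde
    return ceil(16 * baseline_cvr * (1 - baseline_cvr) / d ** 2)

print(sample_size_per_variant(0.03, 0.10))
print(shortcut_per_variant(0.03, 0.10))
```

The shortcut runs slightly higher than the exact version because 16 is a rounded-up constant, so it errs on the conservative side.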

Worked Example: A Shopify Product Page Test

Let's walk through a complete sample size calculation for a realistic D2C scenario.

Brand: A Bangalore-based D2C supplements brand
Product: Whey protein, ₹2,199 per 1kg
Test: Changing the product headline from "Whey Protein - 24g per serve" to "Recover Faster. Build Stronger. 24g Whey Protein."
Goal metric: Add-to-cart rate on the product page

Step 1: Pull baseline data
From Shopify Analytics over the last 60 days:

  • Page visits: 12,400
  • Add-to-cart events: 372
  • Baseline add-to-cart rate: 372 / 12,400 = 3.0%
  • Daily page visits: approximately 200

Step 2: Set MDE
The team decides: if the new headline generates less than a 10% relative lift (i.e., from 3.0% to 3.3%), it's not worth implementing given the complexity of updating it across multiple product pages.
MDE = 10% relative = 0.3% absolute

Step 3: Calculate sample size
n = 16 × 0.03 × 0.97 / (0.003)²
n ≈ 51,733 visitors per variant
Total: ~103,500 visitors

Step 4: Calculate test duration
At 200 daily page visitors with a 50/50 split: 103,500 total visitors / 200 per day ≈ 517 days

Step 5: Decision
517 days is far too long for a single test, so the team weighs what the lift would be worth. At ₹2,199 AOV and the current 200 visits/day (treating every add-to-cart as an order, which gives an upper bound):

  • Current revenue contribution: 200 × 3.0% × ₹2,199 = ₹13,194/day
  • At 3.3% CVR: 200 × 3.3% × ₹2,199 = ₹14,513/day
  • Incremental revenue: ₹1,319/day = ₹4.8 lakh per year

The upside is real, but this page cannot detect a 10% lift on its own traffic in a workable timeframe.

Alternative: Increase MDE to 20% by testing a bolder variant. This reduces the required sample size to ~12,900 per variant (about 129 days at 200 visits/day). But now you're only likely to declare a winner if the lift is 20%+, which may not happen with a minor copy tweak. The options in the next section apply here.
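
The steps of this worked example can be sketched as a single function that starts from the raw analytics counts. It uses the n ≈ 16 × p(1-p) / d² shortcut and the observed daily average (12,400 / 60, about 207 visits) rather than the rounded 200, so the duration comes out slightly shorter. `plan_test` and its parameters are hypothetical names.

```python
from math import ceil

def plan_test(page_visits, conversions, days_observed,
              relative_mde, variants=2):
    """Baseline, per-variant sample size, and duration from raw counts."""
    baseline = conversions / page_visits
    d = baseline * relative_mde                      # absolute lift
    per_variant = ceil(16 * baseline * (1 - baseline) / d ** 2)
    total = per_variant * variants
    daily_visits = page_visits / days_observed
    return {
        "baseline": baseline,
        "per_variant": per_variant,
        "total": total,
        "days": ceil(total / daily_visits),
    }

plan = plan_test(page_visits=12_400, conversions=372,
                 days_observed=60, relative_mde=0.10)
print(plan)
```

Re-running it with `relative_mde=0.20` shows how quickly the duration falls when you test for a larger effect.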

What to Do When Your Traffic Is Too Low

Low traffic is the reality for most early-stage Indian D2C brands. Here's how to run meaningful experiments anyway.

Focus on Higher-Traffic Pages

Your product page might get 150 visitors per day. Your homepage might get 1,200. Your collection page might get 800.

Test on higher-traffic pages first. Homepage hero headline tests, announcement bar tests, and collection page layout tests all run faster and build your testing muscle. Move to product page tests once you have the traffic, or once you've grown the brand enough.

Increase Your MDE: Test Bigger Changes


If you only have traffic for a 20-day test and your calculator tells you that requires a 20% minimum detectable effect, test a change that has a real shot at delivering 20%.

Bold changes: completely different value propositions, social proof vs. no social proof, lifestyle photography vs. product shots, short-form vs. long-form page.

Incremental changes (button colour tweaks, minor copy edits) are for brands with scale. Early-stage brands should be testing hypotheses, not optimising decimals.
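
To find the MDE your traffic can actually support, the sample size shortcut can be inverted: solve n ≈ 16 × p(1-p) / d² for d given the visitors you will collect. A minimal sketch assuming 95% confidence, 80% power, and an even split across variants; the function name is hypothetical.

```python
from math import sqrt

def detectable_relative_mde(baseline_cvr, daily_visitors, days, variants=2):
    """Smallest relative lift detectable with the traffic available."""
    per_variant = daily_visitors * days / variants
    d = sqrt(16 * baseline_cvr * (1 - baseline_cvr) / per_variant)
    return d / baseline_cvr

mde = detectable_relative_mde(baseline_cvr=0.03, daily_visitors=200, days=20)
print(f"Detectable relative lift: {mde:.0%}")
```

Because d shrinks with the square root of traffic, quadrupling the visitors only halves the detectable lift.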

Accept Longer Test Durations

Some tests are worth 60-90 day runtimes. Calculate the annual revenue impact of the lift you're trying to detect. If a 10% CVR lift on a ₹5,000 AOV product generating 50 orders per month is worth ₹3 lakh/year in incremental revenue, a 90-day test is a reasonable investment.

The rule: longer test duration is fine if you commit to it upfront and don't peek. The problem isn't long tests. It's stopping early because you saw a promising result on day 12.
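
The revenue arithmetic above is simple enough to script when you are comparing several candidate tests. A small sketch with a hypothetical function name, assuming the lift applies uniformly to monthly orders and AOV stays constant:

```python
def annual_incremental_revenue(orders_per_month, aov, relative_lift):
    """Extra yearly revenue if conversion improves by `relative_lift`."""
    extra_orders_per_month = orders_per_month * relative_lift
    return extra_orders_per_month * aov * 12

# 50 orders/month at Rs 5,000 AOV with a 10% lift, as in the text.
print(annual_incremental_revenue(50, 5000, 0.10))
```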

Use Bayesian Testing for Low-Traffic Situations

Frequentist testing at 95% confidence requires substantial sample sizes. Bayesian A/B testing provides probabilistic guidance even with smaller samples, for example "72% probability Variant B is better than Control."

This doesn't mean you're making decisions with 72% confidence in the same sense as frequentist 95%. But for low-traffic brands making reversible decisions, it provides a structured framework better than pure guesswork.
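
One common way to produce a "probability B beats A" figure is a Beta-Binomial model with Monte Carlo sampling. The sketch below is an illustration of that general technique, not a description of how any specific platform computes it. It assumes uniform Beta(1, 1) priors on both conversion rates, and the function name is hypothetical.

```python
import random

def prob_variant_beats_control(conv_a, n_a, conv_b, n_b,
                               draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) from Beta posteriors with flat priors."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical counts: control 30/1000, variant 36/1000.
print(prob_variant_beats_control(30, 1000, 36, 1000))
```

The output is a posterior probability under the chosen prior. With small samples the prior matters more, which is one reason to reserve this approach for reversible decisions.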

Sample Size Mistakes That Invalidate Tests


These mistakes are common, costly, and largely avoidable.

Calculating Sample Size After You Start (or Not at All)

If you don't calculate required sample size before launching a test, you have no principled basis for deciding when to stop. You'll stop when it "feels right", which usually means stopping when you see a result you like.

This is the most common testing mistake and the source of most false positives in D2C testing programmes. Always calculate before you start. Always.

Not Accounting for Test Duration (Minimum 14 Days)

Even if your calculator says you need only 7 days of data to hit the required sample size, always run for a minimum of 14 days.

Why? Consumer behaviour is not uniform across the week. Your conversion rate on a Saturday evening is different from a Tuesday morning. Buyers coming from paid ads on weekends behave differently from organic search visitors during the week. If you stop at 7 days, your sample may systematically over- or under-represent certain behavioural patterns.

The 14-day minimum captures two full weekly cycles, ensuring your results represent the true range of your audience's behaviour.
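
The duration rule can be written down so it is applied the same way every time. This sketch enforces the 14-day floor described above. Rounding up to whole weeks is an added assumption, made so every weekday is represented equally; the function name is hypothetical.

```python
from math import ceil

def planned_duration_days(total_visitors_needed, daily_visitors, min_days=14):
    """Days to run: enough traffic, at least min_days, in whole weeks."""
    traffic_days = ceil(total_visitors_needed / daily_visitors)
    days = max(traffic_days, min_days)
    return ceil(days / 7) * 7  # assumption: round up to full weekly cycles

print(planned_duration_days(3500, 500))    # traffic alone would allow 7 days
print(planned_duration_days(10000, 500))   # traffic alone would allow 20 days
```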

Ignoring Multiple Variants (A/B/C Splits Need 3x Traffic)

If you're testing three variants (Control + Variant A + Variant B), you need the required sample size for each variant, not split between them.

That means:

  • A/B test at 5,000/variant → 10,000 total
  • A/B/C test at 5,000/variant → 15,000 total (3x the per-variant requirement, 1.5x the A/B total)

This is why running too many variants simultaneously is problematic for low-traffic stores. Stick to A/B (two variants) unless you have the traffic to support more.

Also: when you run three variants, you're doing three comparisons (Control vs A, Control vs B, A vs B). This increases the risk of a false positive. Apply a Bonferroni correction: use p < 0.017 (0.05 / 3) instead of p < 0.05 for each comparison to keep the overall false positive rate at or below 5%. The stricter threshold also raises the sample size each variant needs.
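
The Bonferroni threshold is the overall alpha divided by the number of comparisons. A short sketch with hypothetical names; it also covers the case where each variant is only compared against control, which needs fewer comparisons:

```python
from itertools import combinations

def bonferroni_threshold(arms, alpha=0.05, compare_all_pairs=True):
    """Return (per-comparison p-value threshold, number of comparisons)."""
    if compare_all_pairs:
        comparisons = len(list(combinations(arms, 2)))
    else:
        comparisons = len(arms) - 1  # each variant versus control only
    return alpha / comparisons, comparisons

threshold, count = bonferroni_threshold(["control", "A", "B"])
print(f"{count} comparisons, use p < {threshold:.4f}")
```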

Starting the Test During Unusual Traffic Periods

Don't start a test during Diwali, Republic Day sale, a major paid campaign launch, or right after an influencer post goes viral. Your traffic composition during those periods is atypical and won't reflect your steady-state audience.

Run tests during "normal" traffic weeks. If you must run during a sale, keep the test running for at least 7 days after the sale period ends so steady-state behaviour re-enters the data.

How CustomFit.ai Handles Sample Size Automatically

Calculating sample sizes manually, monitoring significance without peeking, and knowing when to stop a test are all solvable problems, but they require discipline and statistical literacy that many D2C growth teams simply don't have bandwidth for.

CustomFit.ai handles all of this in the platform:

  • Pre-test sample size guidance: Input your baseline CVR and desired MDE, and CustomFit.ai tells you how many visitors you need and how many days to run
  • Automatic significance monitoring: The platform tracks p-values and flags when tests reach significance, without encouraging premature peeking
  • Proper stopping rules: Tests don't get flagged as "significant" just because they briefly crossed 95% on day 4; the platform uses sequential testing methods to control false positive rates
  • Multi-variant support: Running A/B/C tests? CustomFit.ai adjusts significance thresholds automatically to account for multiple comparisons
  • Segment-aware results: See significance broken down by device type, traffic source, and new vs returning visitors

The result: your team focuses on test ideas and business decisions, not on spreadsheets and significance calculators.

For the full context on significance and what these numbers mean, read our guide on statistical significance in A/B testing. To understand the full testing framework from hypothesis to decision, start with what is A/B testing or the A/B testing pillar guide.

1,000+ D2C brands use CustomFit.ai to run statistically valid A/B tests without needing a data science team. 14-day free trial · No credit card required · Setup in under 30 minutes.
