Experiment analytics

A/B test analytics, built for decisions.

Every CustomFit test ships with Bayesian + frequentist verdicts, revenue per visitor, segment-aware significance, and holdout measurement — so you know which variant won, for whom, by how much, and whether to ship it. No spreadsheets. No p-value squinting.

By Ashwin Kumar, Co-Founder & CEO · Updated
TL;DR
  • Read revenue per visitor (RPV) before conversion rate; CVR wins that tank AOV are common.
  • Use Bayesian probability for the everyday call (peeking-safe); keep frequentist p-values for audits.
  • Always check segment-level significance — a pooled loser is often a mobile-new-visitor winner.
  • Carve a 5–10% holdout and never touch it. It's the only honest answer to "is the program working?"
  • Auto-promote winners and auto-pause losers on configurable thresholds — stop babysitting tests.
What you see

One experiment, every angle that matters.

The verdict view: what CustomFit shows you the moment a test crosses your significance and lift gates. No exports. No SQL. No second tool.

PDP Hero CTA · "Ships tonight" vs "Add to cart"
Running 14 days · 25,643 visitors · 2 variants · Winner declared

| Variant | Visitors | CVR | RPV | RPV lift |
|---|---|---|---|---|
| A: Control ("Add to cart") | 12,842 | 2.84% | $1.84 | baseline |
| B: "Ships tonight" | 12,801 | 3.62% | $2.34 | +27.4% |

| Bayesian P(B>A) | p-value | 95% CI on lift | Revenue impact |
|---|---|---|---|
| 99.2% | 0.012 | +18.4% to +36.5% | +$28,412 / wk |
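The card reports an RPV lift with a confidence interval, but this page doesn't say how that interval is computed. One common, assumption-light way to get an interval on RPV lift is a percentile bootstrap over per-visitor revenue. The sketch below assumes that technique; the function name and the simulated revenue data are hypothetical, not CustomFit's method or numbers.

```python
import random
from statistics import fmean


def bootstrap_rpv_lift_ci(rev_a, rev_b, resamples=1000, level=0.95, seed=7):
    """Percentile-bootstrap CI for the relative RPV lift of B over A.

    rev_a / rev_b hold one revenue value per visitor (0.0 for visitors who
    did not buy), so each mean is that arm's revenue per visitor.
    """
    rng = random.Random(seed)
    lifts = []
    for _ in range(resamples):
        # Resample visitors with replacement within each arm, then recompute RPV.
        rpv_a = fmean(rng.choices(rev_a, k=len(rev_a)))
        rpv_b = fmean(rng.choices(rev_b, k=len(rev_b)))
        lifts.append(rpv_b / rpv_a - 1)  # assumes the control resample has some revenue
    lifts.sort()
    lo = lifts[int((1 - level) / 2 * resamples)]
    hi = lifts[int((1 + level) / 2 * resamples) - 1]
    point = fmean(rev_b) / fmean(rev_a) - 1
    return point, (lo, hi)


# Hypothetical per-visitor revenue: ~3% vs ~3.6% buyers, exponential order values around $60.
data_rng = random.Random(42)


def simulate_arm(visitors, cvr, aov):
    return [data_rng.expovariate(1 / aov) if data_rng.random() < cvr else 0.0
            for _ in range(visitors)]


rev_a = simulate_arm(3000, 0.030, 60.0)
rev_b = simulate_arm(3000, 0.036, 60.0)
point, (lo, hi) = bootstrap_rpv_lift_ci(rev_a, rev_b)
print(f"RPV lift {point:+.1%}, 95% CI {lo:+.1%} to {hi:+.1%}")
```

Because most visitors contribute $0, RPV is noisy; with only a few thousand visitors per arm the interval comes out wide, which is exactly the "wide CI straddling zero" case to watch for.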
The metrics

Eight numbers we report on every experiment.

Each metric answers a different question. Read them in this order; never declare a winner on a single one.

| Metric | What it is | Why it matters | Gotcha |
|---|---|---|---|
| Revenue per visitor (RPV) | Revenue ÷ unique visitors, per variant. | The only metric that maps directly to P&L. Catches variants that lift CVR but tank AOV. | Needs a stable AOV baseline; trim refunds before you call a winner. |
| Conversion rate (CVR) | Orders ÷ unique visitors, per variant. | Fast to read, easy to explain, comparable across tests. | Blind to AOV. A win here can be a loss in revenue. Always pair with RPV. |
| Average order value (AOV) | Revenue ÷ orders, per variant. | Surfaces bundling, upsell, and offer-framing effects. | High variance on low order counts; needs a longer runtime than CVR. |
| Bayesian probability to win | P(B > A) given the data so far. | Peeking-safe and intuitive: "86% likely B wins" beats "p = 0.04." | Requires a sensible prior. Default to neutral; tighten only with strong reason. |
| Frequentist p-value | Probability of observing a lift at least this large if there were no real difference (the null hypothesis). | Familiar to legal, finance, and most analytics teams. | Peeking inflates false positives. Set the runtime upfront; don't stop on the first p < 0.05. |
| Confidence interval (CI) | Plausible range for the true lift. | Communicates uncertainty better than a point estimate. Width shrinks as sample size grows (roughly with 1/√n). | A wide CI straddling zero is not a winner, no matter how nice the midpoint looks. |
| Lift | % change vs control on the primary metric. | Headline number for stakeholders. | Always quote it with the CI. Lift without bounds is decoration. |
| Holdout uplift | Treated-cohort revenue vs holdout-cohort revenue. | Proves the entire program is paying off, not just individual tests. | The holdout must stay untouched. It's tempting to "borrow" the traffic; don't. |
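The first three metrics are simple ratios, and RPV is just CVR × AOV. A minimal sketch of computing them per variant, assuming hypothetical field names and illustrative totals (not from a real test), shows how a CVR win can hide an RPV loss:

```python
from dataclasses import dataclass


@dataclass
class VariantTotals:
    """Raw totals for one variant (field names are hypothetical)."""
    visitors: int   # unique visitors exposed to the variant
    orders: int     # orders placed by those visitors
    revenue: float  # net revenue, refunds already trimmed

    @property
    def cvr(self) -> float:
        # Conversion rate: orders / unique visitors
        return self.orders / self.visitors

    @property
    def aov(self) -> float:
        # Average order value: revenue / orders
        return self.revenue / self.orders

    @property
    def rpv(self) -> float:
        # Revenue per visitor: revenue / unique visitors (equals CVR x AOV)
        return self.revenue / self.visitors


def lift(treatment: float, control: float) -> float:
    """Relative change vs control, e.g. 0.25 means +25%."""
    return treatment / control - 1.0


# Illustrative totals: CVR rises while AOV falls.
control = VariantTotals(visitors=10_000, orders=300, revenue=30_000.0)
variant = VariantTotals(visitors=10_000, orders=318, revenue=29_400.0)

print(f"CVR lift {lift(variant.cvr, control.cvr):+.1%}")  # CVR lift +6.0%
print(f"AOV lift {lift(variant.aov, control.aov):+.1%}")  # AOV lift -7.5%
print(f"RPV lift {lift(variant.rpv, control.rpv):+.1%}")  # RPV lift -2.0%
```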
Stats engine

Bayesian for speed. Frequentist for audit.

CustomFit runs both engines on every experiment. Bayesian probability answers "how likely is B better than A right now?" — peeking-safe, easy to act on. Frequentist p-values answer "assuming no real effect, how surprising is this data?" — familiar to finance, defensible in audits.

You don't have to pick a religion. Both are shown side by side. Decide which one your team trusts and gate auto-promote on it.
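For a yes/no outcome like conversion, "how likely is B better than A" has a standard Bayesian answer: give each arm a Beta posterior and estimate P(B > A) by sampling. This is a minimal sketch of that Beta-Binomial technique on CVR, not CustomFit's engine; the card's 99.2% is computed on RPV, which needs a different likelihood. The order counts are back-calculated from the card's CVRs and rounded, so treat them as approximate.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, prior=(1.0, 1.0), draws=50_000, seed=0):
    """Monte Carlo estimate of P(CVR_B > CVR_A) under a Beta-Binomial model.

    prior=(1.0, 1.0) is a flat Beta(1, 1) prior on each arm's conversion rate,
    i.e. the "neutral" default.
    """
    rng = random.Random(seed)
    alpha0, beta0 = prior
    wins = 0
    for _ in range(draws):
        # Posterior per arm: Beta(alpha0 + conversions, beta0 + non-conversions)
        cvr_a = rng.betavariate(alpha0 + conv_a, beta0 + n_a - conv_a)
        cvr_b = rng.betavariate(alpha0 + conv_b, beta0 + n_b - conv_b)
        if cvr_b > cvr_a:
            wins += 1
    return wins / draws


# ~2.84% of 12,842 and ~3.62% of 12,801 visitors, rounded to whole orders.
print(f"P(B > A) on CVR: {prob_b_beats_a(365, 12_842, 463, 12_801):.1%}")
```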

Same experiment, both engines

| Engine / setting | Result |
|---|---|
| Bayesian: P(B beats A) | 99.2% |
| Bayesian: expected RPV lift | +$0.50 |
| Frequentist: p-value | 0.012 |
| Frequentist: 95% CI on lift | +18.4% to +36.5% |
| Min. detectable effect (set) | ±10% on RPV |
| Runtime to 80% power | 11 days (actual: 14) |

Both engines agree on this experiment — common for well-powered tests. When they disagree, the data is too noisy to ship either way; let it run.
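On the frequentist side, the textbook tool for conversion rates is a two-proportion z-test. The sketch below assumes that test (pooled standard error) for the p-value, and a delta-method interval on log(p_B / p_A) for the relative lift; it runs on CVR with approximate order counts back-calculated from the verdict card, so its output will not match the card's RPV-based figures.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided pooled z-test on conversion rates, plus a CI on relative lift.

    Assumes at least one conversion per arm (the log ratio is undefined otherwise).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_null = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Delta method: Var(log p_hat) ~ (1 - p) / conversions
    log_ratio = math.log(p_b / p_a)
    se_log = math.sqrt((1 - p_a) / conv_a + (1 - p_b) / conv_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lo = math.exp(log_ratio - z_crit * se_log) - 1
    hi = math.exp(log_ratio + z_crit * se_log) - 1
    return p_value, (lo, hi)


p_value, (lo, hi) = two_proportion_test(365, 12_842, 463, 12_801)
print(f"p = {p_value:.4f}, 95% CI on CVR lift {lo:+.1%} to {hi:+.1%}")
```

If this p-value and a Bayesian P(B > A) on the same metric point in different directions, that is the noisy-data case described above.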

Segment truth

The pooled answer hides the real one. Segment lift surfaces it.

CustomFit slices significance by every audience attribute you have — geo, device, new-vs-returning, intent, traffic source. Tests that look neutral overall often hide double-digit wins inside specific segments.

PDP Hero CTA: segment-level lift

| Segment | Visitors | RPV lift | P(B>A) |
|---|---|---|---|
| All visitors (pooled) | 25,643 | +27.4% | 99.2% |
| Mobile · India · new | 9,142 | +38.1% | 99.7% |
| Mobile · US · new | 5,206 | +22.4% | 97.1% |
| Desktop · returning | 4,887 | +11.8% | 84.6% |
| Paid social referral | 3,108 | +31.6% | 96.2% |
| Organic · branded | 3,300 | +5.2% | 62.4% |

The pooled win is real, but the mobile · India · new-visitor cohort drives most of it. Ship variant B to that segment first, and let the rest of the test mature before rolling out broadly.
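Segment-aware significance amounts to running the same comparison once per segment instead of once on the pooled totals. A minimal sketch, assuming a normal approximation to P(B > A) on CVR and hypothetical conversion counts (not the data behind the table above):

```python
import math
from statistics import NormalDist


def prob_b_beats_a(conv_a, n_a, conv_b, n_b):
    """Normal approximation to P(CVR_B > CVR_A) under flat priors (fine for large counts)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return NormalDist().cdf((p_b - p_a) / se)


def segment_report(rows):
    """rows: (segment, conv_a, n_a, conv_b, n_b) tuples -> {segment: (cvr_lift, P(B>A))}."""
    report = {}
    for segment, conv_a, n_a, conv_b, n_b in rows:
        lift = (conv_b / n_b) / (conv_a / n_a) - 1
        report[segment] = (lift, prob_b_beats_a(conv_a, n_a, conv_b, n_b))
    return report


# Hypothetical counts: the pooled row looks inconclusive, the two segments do not.
rows = [
    ("all visitors", 320, 8_000, 330, 8_000),
    ("mobile / new", 120, 4_000, 170, 4_000),
    ("desktop / returning", 200, 4_000, 160, 4_000),
]
for segment, (lift, prob) in segment_report(rows).items():
    print(f"{segment:<20} CVR lift {lift:+6.1%}  P(B>A) {prob:6.1%}")
```

Here the pooled view sits near a coin flip while B clearly wins one segment and loses the other. Checking many segments also raises the odds that one looks strong by chance, so treat a lone segment win as a hypothesis to confirm.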

Q2 program holdout: 7% of traffic

| Measure | Value |
|---|---|
| Treated traffic RPV | $3.84 |
| Holdout traffic RPV | $3.21 |
| Incremental lift | +19.6% |
| Incremental revenue (90d) | +$184,266 |
| Tests shipped in window | 11 |

Holdout-vs-treated is the only board-ready proof that the entire experimentation program is paying off — not just any single test.

Program-level proof

The 5–10% holdout is your only honest answer.

Individual A/B test wins compound — but stacking lifts on paper doesn't prove the whole program is moving revenue. A small slice of traffic that never sees any personalization is the cleanest counterfactual.

CustomFit reserves the holdout automatically, locks it from edits, and reports the incremental revenue contribution at any cadence you set — weekly, monthly, quarterly. It's the number your CFO wants. It's also the only one that survives an audit.
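A holdout only works if the same visitors stay in it on every visit. A common way to guarantee that is deterministic hash bucketing; the sketch below assumes that technique (CustomFit's own assignment method isn't described here) and estimates incremental revenue as the RPV gap times treated visitors. The salt, function names, and visitor count are hypothetical; the RPVs come from the Q2 holdout panel.

```python
import hashlib

HOLDOUT_SALT = "program-holdout"  # hypothetical; never change it while the holdout runs


def in_holdout(visitor_id: str, share: float = 0.07, salt: str = HOLDOUT_SALT) -> bool:
    """Deterministically assign a visitor to the program holdout.

    Hashing a fixed salt plus the visitor ID returns the same bucket on every
    visit, so holdout visitors never drift into tests.
    """
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode()).digest()
    # Top 53 bits of the hash -> exact float spread uniformly over [0, 1)
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < share


def holdout_uplift(treated_rpv: float, holdout_rpv: float, treated_visitors: int):
    """Relative lift of treated over holdout, and revenue attributed to the program."""
    lift = treated_rpv / holdout_rpv - 1
    incremental_revenue = (treated_rpv - holdout_rpv) * treated_visitors
    return lift, incremental_revenue


# The visitor count is hypothetical, not the figure behind the +$184,266 above.
lift, extra = holdout_uplift(3.84, 3.21, treated_visitors=100_000)
print(f"Incremental lift {lift:+.1%}, incremental revenue ${extra:,.0f}")
# Incremental lift +19.6%, incremental revenue $63,000
```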

Don't do these

Five ways teams fool themselves reading test results.

Each of these looked like a winning experiment. Each lost money. Watch for them.

| # | Pitfall | Symptom | Fix |
|---|---|---|---|
| 01 | Peeking at frequentist tests | You stop the test the moment p dips below 0.05; with more data, it would have climbed back above. | Lock the runtime upfront, or use Bayesian probability (peeking-safe by design). |
| 02 | Calling a CVR win while missing the AOV loss | CVR +6%, RPV -2%. You celebrated the wrong number. | Always read RPV alongside CVR. If they disagree, RPV wins the tie. |
| 03 | Pooled significance hiding segment truth | The test "loses" overall, but mobile-IN-new-visitor is a +28% blowout. | Read segment-level significance before killing a variant. Ship targeted, not pooled. |
| 04 | Underpowered tests on tiny traffic | 200 visitors per variant, CI wider than the moon. You shipped on noise. | Pre-compute the minimum detectable effect (MDE). If you can't power it in 21 days, don't run it; pick a bigger lever. |
| 05 | No holdout, no program-level proof | Eight "winning" tests shipped, but revenue is flat vs the same quarter last year. | Carve a 5–10% holdout. Measure incremental revenue against it. Defend the program. |
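Pre-computing the MDE (pitfall 04) is a short calculation. The sketch below uses the standard normal-approximation sample-size formula for a two-sided, two-proportion test on CVR; RPV tests need their own variance estimate. The store's baseline CVR and daily traffic are hypothetical, and the 21-day cutoff comes from the table.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_cvr, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per arm to detect a relative CVR lift with a two-sided z-test."""
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_to_power(baseline_cvr, relative_mde, daily_visitors, variants=2):
    """Calendar days until every arm reaches the required sample size."""
    n = visitors_per_variant(baseline_cvr, relative_mde)
    return math.ceil(n * variants / daily_visitors)


# Hypothetical store: 2.84% baseline CVR, 3,000 test visitors a day split 50/50.
for mde in (0.10, 0.20):
    days = days_to_power(0.0284, mde, daily_visitors=3_000)
    verdict = "run it" if days <= 21 else "pick a bigger lever"
    print(f"MDE {mde:.0%}: {days} days -> {verdict}")
```

At this traffic level a 10% MDE needs well over 21 days, while a 20% MDE fits comfortably: halving the detectable effect roughly quadruples the sample you need.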
Decisions on autopilot

Auto-promote winners. Auto-pause losers.

Set the thresholds once. Every test that clears them gets shipped or killed without waiting on a Monday review. You stay in the loop on every change.

Significance gate

Pick Bayesian P(B>A) ≥ 95% or frequentist p < 0.05 — set per account, override per test.

Minimum lift gate

Don't promote a 0.2% lift even if it's significant. Set a floor (default +3% on RPV).

Runtime guard

Refuse to call winners before minimum runtime. Kills the peeking trap by design.

Loser pause

When P(B>A) drops below 5% after powered runtime, auto-pause and notify the owner.

Slack + email digest

Daily digest of new winners, losers, and tests crossing thresholds — to the channels you choose.

Audit log

Every promotion, pause, and threshold change is recorded with actor and timestamp.

Sample-ratio alarm

If the A/B traffic split drifts from 50/50 by more than 2pp, the test is flagged — your data is lying before you read it.

Staged rollout

Promote at 25% → 50% → 100% with auto-monitor at each step. Catch regressions before they hit full traffic.
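The gates above compose into a single decision rule. A minimal sketch, assuming hypothetical names and outcome labels: it mirrors the described defaults (95% P(B>A), +3% RPV floor, 5% loser cutoff, 2pp split drift), uses an assumed 14-day minimum runtime because the page doesn't state one, and treats "powered runtime" as that minimum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gates:
    """Thresholds mirroring the defaults described above (field names are hypothetical)."""
    min_prob_to_win: float = 0.95    # significance gate: Bayesian P(B>A)
    min_rpv_lift: float = 0.03       # minimum lift gate: +3% on RPV
    max_prob_to_pause: float = 0.05  # loser pause: P(B>A) below 5%
    min_runtime_days: int = 14       # runtime guard; 14 is an assumed value
    max_split_drift: float = 0.02    # sample-ratio alarm: 2pp away from 50/50


def decide(prob_b_beats_a, rpv_lift, days_running, visitors_a, visitors_b, gates=Gates()):
    """Return 'flag_srm', 'keep_running', 'promote', or 'pause' for a two-arm test."""
    share_b = visitors_b / (visitors_a + visitors_b)
    if abs(share_b - 0.5) > gates.max_split_drift:
        return "flag_srm"      # broken split: don't trust any other number
    if days_running < gates.min_runtime_days:
        return "keep_running"  # runtime guard blocks early calls in either direction
    if prob_b_beats_a >= gates.min_prob_to_win and rpv_lift >= gates.min_rpv_lift:
        return "promote"       # e.g. hand off to a staged 25% -> 50% -> 100% rollout
    if prob_b_beats_a < gates.max_prob_to_pause:
        return "pause"         # statistically dead loser: stop spending traffic on it
    return "keep_running"


# Verdict-card numbers: 99.2% P(B>A), +27.4% RPV lift, 14 days, 12,842 vs 12,801 visitors.
print(decide(0.992, 0.274, 14, 12_842, 12_801))  # promote
```

Checking the split first is deliberate: if the sample ratio is off, every downstream number is suspect, so no promotion or pause should fire.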

A/B test analytics — common questions.

What is A/B test analytics?

A/B test analytics is the layer that turns raw experiment traffic into business decisions — measuring conversion rate, revenue per visitor, statistical significance, and segment-level lift. Strong analytics tells you which variant won, by how much, in which segments, and how confident you can be — without you running the math yourself.

Bayesian or frequentist — which should we use?

Both, depending on the question. Bayesian gives you the probability variant B beats A (intuitive, peeking-safe, lets you call winners earlier). Frequentist gives you the p-value (familiar to most stakeholders, defensible in audits). CustomFit shows both for every experiment — you decide which to act on.

Why measure revenue per visitor instead of conversion rate?

Conversion rate alone is misleading — a variant can lift CVR while dropping AOV, leaving you flat or negative on revenue. Revenue per visitor (RPV) bakes both into one number that maps to your P&L. Always check RPV before declaring a winner; it catches lifts that look real but lose money.

What is segment-aware significance?

It's calculating significance for each audience segment (mobile vs desktop, new vs returning, geo, intent) instead of only the pooled population. A variant can lose overall but win sharply for first-time mobile visitors in India — segment-aware analytics surfaces that so you can ship targeted personalizations instead of one-size-fits-all winners.

Do you support holdout measurement?

Yes. A holdout is a small slice of traffic (typically 5–10%) that never sees any active personalization or test variant. Comparing the holdout against treated traffic tells you the true incremental revenue contribution of the entire program — not just per-test. Without a holdout, you can't prove the program is working in aggregate.

Can experiments auto-promote winners?

Yes — once a variant clears your configured significance + minimum-lift thresholds, CustomFit can auto-promote it to 100% traffic and notify the team, or surface it as a one-click approval. Same logic auto-pauses statistically dead losers so they stop costing you traffic. Thresholds are per-account and overridable per-test.


Stop guessing which test won.

14-day free trial. Bayesian + frequentist, segment-aware lift, holdout measurement — included on every plan.

Built for every D2C category

🧴 Skincare · 💄 Beauty · 🌿 Wellness · F&B · 👟 Apparel · 💍 Jewelry · 🛋️ Home · 🍼 Baby
Live · Right now
Mamaearth · free-shipping band · +12.4% AOV
GIVA · festive collection page · +34% revenue
Bellavita · PDP CTA test · +27.4% CVR
Kapiva · quiz-driven recs · +9.48% CTR
The Sleep Co · landing personalized · 2× captures
Plum · returning-shopper swap · +18.2% CVR