Thompson Sampling (also called posterior sampling or probability matching) is a Bayesian multi-armed bandit algorithm that allocates traffic by drawing a random value from each variant's posterior distribution and routing the visitor to the variant with the highest draw. Unlike Epsilon-Greedy, which uses a fixed exploration rate, Thompson Sampling automatically calibrates exploration to uncertainty: variants with wide, uncertain posteriors get explored more, while variants with narrow, confident posteriors are exploited when their estimated rate is high and largely ignored when it is low. It was first described by William R. Thompson in 1933 and is now widely used for ecommerce personalisation.
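Mechanism: For each visitor, sample θᵢ ~ Beta(αᵢ, βᵢ) for each variant i, where αᵢ = conversions + 1 and βᵢ = non-conversions + 1 (the posterior under a uniform Beta(1, 1) prior). Route the visitor to the variant with the highest sampled θᵢ, then update that variant's counts with the observed outcome.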
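The mechanism above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the class name `BetaBernoulliBandit`, its method names, and the conversion rates in the usage section are hypothetical, and it relies only on the standard library's `random.betavariate`.

```python
import random


class BetaBernoulliBandit:
    """Thompson Sampling for binary outcomes, starting from Beta(1, 1) priors."""

    def __init__(self, variants, seed=None):
        self.rng = random.Random(seed)
        # Per variant: [conversions, non_conversions]
        self.counts = {v: [0, 0] for v in variants}

    def choose(self):
        # Draw theta_i ~ Beta(conversions + 1, non_conversions + 1) for each variant
        samples = {
            v: self.rng.betavariate(conv + 1, non_conv + 1)
            for v, (conv, non_conv) in self.counts.items()
        }
        # Route the visitor to the variant with the highest sampled theta
        return max(samples, key=samples.get)

    def update(self, variant, converted):
        # Index 0 counts conversions, index 1 counts non-conversions
        self.counts[variant][0 if converted else 1] += 1


if __name__ == "__main__":
    # Hypothetical true conversion rates, used only to simulate visitors
    true_rates = {"A": 0.05, "B": 0.08}
    bandit = BetaBernoulliBandit(true_rates, seed=42)
    visitors = random.Random(7)
    for _ in range(5000):
        variant = bandit.choose()
        bandit.update(variant, visitors.random() < true_rates[variant])
    print(bandit.counts)  # the better variant typically ends up with most of the traffic
```

Because each draw comes from the current posterior, a variant with little data produces widely scattered draws and still wins some visitors, which is where the exploration comes from.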
Why Thompson Sampling Matters for Ecommerce
Thompson Sampling is considered the gold standard for bandit-style ecommerce optimisation because it adapts intelligently — early in an experiment when uncertainty is high, it explores widely; as data accumulates and uncertainty decreases, it converges on the winner automatically without any manual epsilon tuning.
For D2C brands running dozens of concurrent personalisation campaigns (seasonal banners, recommendation widgets, promotional offers), Thompson Sampling manages the exploration-exploitation tradeoff for each campaign independently, at scale. This is particularly valuable for category pages or PDP recommendation sections where the "right" variant might differ by product type, customer segment, or traffic source.
Indian ecommerce brands running high-SKU catalogues benefit from Thompson Sampling's ability to handle 3, 5, or 10+ variants simultaneously without the traffic dilution that affects fixed-split A/B/n tests, where every variant keeps an equal share of traffic until the test ends. The algorithm naturally routes more traffic to promising variants and less to clear losers.
Real-World Example
Nykaa runs Thompson Sampling across their "Complete the Look" recommendation widget on product pages, testing five different recommendation algorithms simultaneously: collaborative filtering, content-based, trending items, recently viewed, and curated editorial picks. The algorithm starts with uniform uncertainty across all five. Within 4 days, collaborative filtering emerges as the top performer for haircare SKUs (12.3% click rate vs. 7–9% for others), while editorial picks perform best on premium skincare (14.1% click rate). Thompson Sampling routes 75% of haircare traffic to collaborative filtering and 70% of premium skincare traffic to editorial picks. Category-level routing of this kind implies that posteriors are tracked separately per product category, which is what lets the widget personalise by category without any manual rule-writing. Estimated AOV lift: ₹220 per recommendation widget click.
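One way to get category-specific routing like this is to keep a separate set of Beta posteriors per product category. The sketch below is purely illustrative and does not describe any brand's actual system: the segment keys, the variant identifiers, and the `choose` and `record` helpers are all hypothetical names.

```python
import random
from collections import defaultdict

rng = random.Random(0)

# Hypothetical identifiers for the five recommendation algorithms
VARIANTS = ["collaborative", "content_based", "trending", "recently_viewed", "editorial"]

# posteriors[segment][variant] = [alpha, beta]; every new segment starts at Beta(1, 1)
posteriors = defaultdict(lambda: {v: [1, 1] for v in VARIANTS})


def choose(segment):
    """Sample each variant's posterior for this segment and return the best draw."""
    arms = posteriors[segment]
    draws = {v: rng.betavariate(a, b) for v, (a, b) in arms.items()}
    return max(draws, key=draws.get)


def record(segment, variant, clicked):
    """Update only the posterior belonging to this segment and variant."""
    posteriors[segment][variant][0 if clicked else 1] += 1


if __name__ == "__main__":
    shown = choose("haircare")
    record("haircare", shown, clicked=True)
    print(shown, posteriors["haircare"][shown])
```

Each segment learns independently, so a segment with little traffic takes longer to converge than it would under a single global model.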
How to Improve / Optimize Thompson Sampling
- Use a Beta distribution for binary conversion metrics. The Beta-Bernoulli conjugate model is the standard choice for click-through rate, add-to-cart rate, or purchase conversion rate. For continuous metrics (AOV, revenue per visit), use a model suited to continuous outcomes instead, such as Normal-Normal.
- Initialise with informed priors when available. If you have historical data on similar experiments, use it to set the initial alpha and beta parameters rather than starting from uniform (1,1). This reduces the cold-start exploration period significantly.
- Don't terminate Thompson Sampling tests abruptly. Unlike A/B tests, multi-armed bandit (MAB) algorithms are designed for long-running optimisation and may never formally "end." Define a shipping criterion before starting: for example, ship when one variant consistently receives 85%+ of traffic for 5 consecutive days.
- Stratify by segment if possible. Thompson Sampling can be run separately for different user segments (new vs. returning, mobile vs. desktop, Delhi vs. Mumbai). Segment-specific models outperform global models when segment behaviour differs significantly.
- Compare cumulative regret against an A/B test baseline. Use simulation to confirm that Thompson Sampling outperforms a fixed split at your expected traffic and effect sizes. For very small traffic volumes (< 50 conversions/day), a simple A/B test may produce equally reliable results with less complexity.
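The regret comparison in the last tip can be approximated with a short simulation. The sketch below makes several assumptions: the `simulate` function and its parameters are hypothetical, the conversion rates are made up, the fixed split is modelled as simple round-robin, and regret is measured as the expected conversions lost by not always showing the best variant. The optional `prior` argument shows where informed (alpha, beta) values could be plugged in instead of the uniform (1, 1).

```python
import random


def simulate(true_rates, visitors, policy, prior=(1, 1), seed=0):
    """Return cumulative expected regret for a 'thompson' or 'fixed' policy."""
    rng = random.Random(seed)
    k = len(true_rates)
    best = max(true_rates)
    alpha = [prior[0]] * k  # prior pseudo-conversions plus observed conversions
    beta = [prior[1]] * k   # prior pseudo-failures plus observed non-conversions
    regret = 0.0
    for t in range(visitors):
        if policy == "thompson":
            draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
            arm = draws.index(max(draws))
        else:
            arm = t % k  # fixed even split, approximated as round-robin
        if rng.random() < true_rates[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        # Expected conversions lost by not showing the best variant
        regret += best - true_rates[arm]
    return regret


if __name__ == "__main__":
    rates = [0.05, 0.08, 0.12]  # hypothetical true conversion rates
    for policy in ("fixed", "thompson"):
        print(policy, round(simulate(rates, 20000, policy), 1))
```

Rerunning this with your own traffic volume and plausible effect sizes gives a rough sense of whether the bandit's advantage is large enough to justify the added complexity.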
Thompson Sampling in A/B Testing
Thompson Sampling represents the Bayesian approach to adaptive experimentation. It is often the preferred bandit algorithm for ecommerce personalisation because it has no exploration-rate hyperparameter to tune (unlike Epsilon-Greedy's epsilon), handles multiple variants naturally, and typically accumulates less regret than simpler alternatives such as Epsilon-Greedy. CustomFit.ai's personalisation engine uses Thompson Sampling to continuously optimise variant allocation across active experiments.