AI & Quantitative · 3 min read · Updated Mar 2026

Thompson Sampling

A Bayesian probability-matching algorithm that balances exploration of uncertain options with exploitation of known winners, commonly used in multi-armed bandit problems and adaptive trading strategy selection.


Explained Simply

Thompson Sampling is a principled way to solve the explore-exploit dilemma: when should you stick with what's working vs. try something new?

The classic setup: You have N strategies (or slot machines). Each has an unknown true win rate. You want to maximize total wins over time. If you only exploit (always play the currently best-performing strategy), you might miss a better option that hasn't been tried enough. If you always explore (try everything equally), you waste time on clearly inferior options.

Thompson Sampling's elegant solution:

For each strategy, maintain a probability distribution over its true win rate: a Beta distribution initialized at Beta(1,1), the uniform prior that treats every win rate as equally likely
Each decision round: sample one value from each strategy's distribution
Pick the strategy with the highest sampled value
After observing the outcome, update that strategy's distribution (win: add 1 to alpha; loss: add 1 to beta)
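The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production selector: the `BetaArm` class, the `choose` and `simulate` helpers, and the simulated win rates are all hypothetical names and values chosen for the example.

```python
import random


class BetaArm:
    """Beta posterior over one strategy's unknown win rate (hypothetical helper)."""

    def __init__(self):
        # Beta(1, 1) is the uniform prior: every win rate is equally likely.
        self.alpha = 1.0  # 1 + number of wins observed
        self.beta = 1.0   # 1 + number of losses observed

    def sample(self, rng):
        # Draw one plausible win rate from the current posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, won):
        # Win: add 1 to alpha. Loss: add 1 to beta.
        if won:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def choose(arms, rng):
    """Sample once from every arm and return the index of the highest draw."""
    samples = [arm.sample(rng) for arm in arms]
    return max(range(len(arms)), key=samples.__getitem__)


def simulate(true_win_rates, rounds, seed=0):
    """Run Thompson Sampling against simulated strategies with fixed win rates."""
    rng = random.Random(seed)
    arms = [BetaArm() for _ in true_win_rates]
    pulls = [0] * len(arms)
    for _ in range(rounds):
        i = choose(arms, rng)
        won = rng.random() < true_win_rates[i]  # simulated trade outcome
        arms[i].update(won)
        pulls[i] += 1
    return arms, pulls


# Illustrative win rates only; real strategies do not have known, fixed rates.
arms, pulls = simulate([0.30, 0.50, 0.70], rounds=2000)
print(pulls)  # number of times each simulated strategy was selected
```

In this simulation the selection counts should concentrate on the strategy with the highest simulated win rate, while the weaker ones still receive a few early trials.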

Why it works: Strategies with few trials have wide, high-variance distributions, so their samples occasionally come out high and they get explored. Strategies with consistent wins narrow around high values, and strategies with consistent losses narrow around low values. Over time, allocation concentrates on high performers without ever completely abandoning the less-tested options.
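To put rough numbers on that narrowing, the sketch below computes the mean and standard deviation of a Beta posterior from its standard closed-form expressions. The helper name `beta_mean_std` and the two win/loss records are made up for illustration.

```python
import math


def beta_mean_std(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)


# Both records have a 60% observed win rate, starting from a Beta(1, 1) prior.
few = beta_mean_std(1 + 3, 1 + 2)       # 3 wins, 2 losses
many = beta_mean_std(1 + 300, 1 + 200)  # 300 wins, 200 losses

print(f"few trades:  mean={few[0]:.3f}, std={few[1]:.3f}")   # std is about 0.17
print(f"many trades: mean={many[0]:.3f}, std={many[1]:.3f}")  # std is about 0.02
```

The two posteriors have similar means, but the one built from only five trades is roughly eight times wider, which is why its samples land above a proven strategy's samples more often.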

Key advantages over simpler approaches:

No fixed exploration rate to tune, unlike epsilon-greedy
Provably near-optimal regret bounds for stationary bandit problems
Naturally reduces exploration as more data is collected
Can track non-stationary environments when older outcomes are discounted

For trading strategy selection, Thompson Sampling adapts to changing market regimes: if momentum strategies start underperforming (losses → beta increases), they get sampled lower more often, reducing allocation automatically.

Thompson Sampling vs. Other Bandit Algorithms

Epsilon-Greedy: With probability ε, explore randomly; otherwise, exploit the current best. Simple but requires tuning ε — too high wastes time exploring; too low fails to adapt to change.
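For comparison, a minimal epsilon-greedy selection rule might look like the sketch below. The function name and the win/trial counts are hypothetical, and untried strategies are scored as 0.0 here purely to keep the example short.

```python
import random


def epsilon_greedy_choice(wins, trials, epsilon, rng):
    """Pick a strategy index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: choose uniformly at random, ignoring performance.
        return rng.randrange(len(trials))
    # Exploit: choose the highest observed win rate so far.
    rates = [w / t if t > 0 else 0.0 for w, t in zip(wins, trials)]
    return max(range(len(rates)), key=rates.__getitem__)


rng = random.Random(0)
wins, trials = [3, 8, 5], [10, 10, 10]
choice = epsilon_greedy_choice(wins, trials, epsilon=0.1, rng=rng)
```

Note that the exploration step treats a clearly bad strategy and a barely tested one identically, which is the main thing Thompson Sampling improves on.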

UCB (Upper Confidence Bound): Choose the option with the highest upper confidence bound (mean + exploration bonus). Deterministic given the same data. The standard form assumes stationary rewards, so it needs a modification such as discounting or a sliding window to track regime changes.
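A sketch of the UCB idea, using the common UCB1 bonus term as an assumed concrete choice (the description above does not specify which variant is meant). The function name and the counts are hypothetical.

```python
import math


def ucb1_choice(wins, trials):
    """Pick the strategy with the highest mean plus exploration bonus (UCB1)."""
    # Try every strategy once before the bonus formula is defined.
    for i, t in enumerate(trials):
        if t == 0:
            return i
    total = sum(trials)
    scores = [
        w / t + math.sqrt(2.0 * math.log(total) / t)  # mean + exploration bonus
        for w, t in zip(wins, trials)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# The lightly tested strategy (3 trials) gets a much larger bonus
# than the heavily tested one (100 trials), so it is chosen here.
choice = ucb1_choice(wins=[60, 2], trials=[100, 3])
```

Because there is no random draw, the same counts always produce the same choice, which is the determinism noted above.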

Thompson Sampling: Probabilistic sampling from posterior distributions. No exploration rate to tune, although the prior and any discount factor are still design choices. Naturally adaptive. Tracks non-stationarity when older outcomes are discounted. Computationally cheap for Beta distributions.

For trading strategy selection, Thompson Sampling's ability to adapt to changing market regimes makes it a natural choice.

How to Use Thompson Sampling

1. Understand the Explore-Exploit Tradeoff
Thompson sampling balances exploring (trying different strategies to learn which works best) with exploiting (using the strategy that currently appears best). It samples from probability distributions of each strategy's success rate to decide which to use next.
2. Apply to Strategy Selection
Maintain a Beta distribution for each trading strategy based on its win/loss record. Each day, sample from each distribution and use the strategy that draws the highest value. Strategies with high win rates are exploited more; underexplored strategies occasionally get tested.
3. Update After Each Trade
After each trade, update the Beta distribution of the strategy that was used: add 1 to alpha (success parameter) for a win, or 1 to beta (failure parameter) for a loss. Over time, the distributions narrow and Thompson sampling converges on the best-performing strategy while still occasionally exploring alternatives.

Frequently Asked Questions

How quickly does Thompson Sampling adapt to a regime change?

Adaptation speed depends on the discount factor applied to historical outcomes. With no discounting, every past outcome carries the same weight as the newest one, so adaptation slows as history accumulates. With a 0.9 discount per trade, an outcome from 20 trades ago retains only about 12% of its original weight (0.9^20 ≈ 0.12). Tradewink uses time-based discounting: outcomes older than 2 weeks are down-weighted, allowing adaptation to weekly regime shifts while still using sufficient historical data.
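The per-trade discounting arithmetic can be sketched as follows. This is an illustrative per-trade scheme under assumed names (`DiscountedBetaArm`, `discount`), not Tradewink's time-based implementation, whose details are not given here.

```python
import random


class DiscountedBetaArm:
    """Beta posterior whose evidence fades by a fixed factor per trade (hypothetical)."""

    def __init__(self, discount=0.9):
        self.discount = discount
        self.wins = 0.0    # discounted win count
        self.losses = 0.0  # discounted loss count

    def update(self, won):
        # Shrink all earlier evidence, then add the new outcome at full weight.
        self.wins *= self.discount
        self.losses *= self.discount
        if won:
            self.wins += 1.0
        else:
            self.losses += 1.0

    def sample(self, rng):
        # The Beta(1, 1) prior is added at sampling time so it never decays away.
        return rng.betavariate(1.0 + self.wins, 1.0 + self.losses)

    def mean(self):
        return (1.0 + self.wins) / (2.0 + self.wins + self.losses)


weight_20_trades_ago = 0.9 ** 20  # about 0.12, the 12% figure above

# A strategy that wins 50 trades in a row, then loses 20 in a row.
arm = DiscountedBetaArm(discount=0.9)
for _ in range(50):
    arm.update(True)
mean_before = arm.mean()
for _ in range(20):
    arm.update(False)
mean_after = arm.mean()
print(round(weight_20_trades_ago, 3), round(mean_before, 2), round(mean_after, 2))
```

With a discount of 0.9 the effective memory is capped at roughly ten trades, so the posterior mean swings from high to low within the 20 losing trades instead of being anchored by the 50 earlier wins.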

Can Thompson Sampling be used for parameter optimization too?

Yes, in a closely related form. Bayesian optimization, which is widely used for hyperparameter tuning, applies the same idea to continuous parameter spaces (like a stop-loss multiplier): the Beta distributions are replaced with a Gaussian Process model of the objective, and Thompson Sampling can serve as the rule for choosing which parameter value to try next.

Does Thompson Sampling require a lot of data to work?

It works from the very first trial — it starts with a uniform prior (equal uncertainty about all strategies) and updates with each observation. It becomes more reliable with more data, but it produces valid allocations even with just a handful of trades per strategy.
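As a rough illustration of how the allocation behaves with very little data, the sketch below estimates by simulation how often Thompson Sampling would pick strategy A over strategy B for a given pair of win/loss records. The function name, the records, and the number of draws are all assumptions made for the example.

```python
import random


def selection_probability(record_a, record_b, draws=20000, seed=0):
    """Estimate how often A's posterior sample beats B's, starting from Beta(1, 1)."""
    rng = random.Random(seed)
    (wins_a, losses_a), (wins_b, losses_b) = record_a, record_b
    a_chosen = 0
    for _ in range(draws):
        sample_a = rng.betavariate(1 + wins_a, 1 + losses_a)
        sample_b = rng.betavariate(1 + wins_b, 1 + losses_b)
        if sample_a > sample_b:
            a_chosen += 1
    return a_chosen / draws


no_data = selection_probability((0, 0), (0, 0))     # close to 0.5: no preference yet
four_each = selection_probability((3, 1), (1, 3))   # A favored, B still gets picked
print(no_data, four_each)
```

With no trades the two strategies are chosen about equally often, and after only four trades each the better record is already favored without the other being shut out.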

How Tradewink Uses Thompson Sampling

Tradewink's `RLStrategySelector` uses Thompson Sampling to adaptively weight intraday strategies (momentum breakout, VWAP mean-reversion, ORB, etc.). Each strategy maintains Beta distribution parameters updated after every trade outcome. The selector samples from each distribution to decide which strategy to prioritize in the current scan cycle. This means strategies that have been performing well recently get higher allocation, while underperforming strategies are naturally de-weighted — without a static schedule or manual tuning. The system also applies time decay to historical outcomes, so a strategy's performance last month matters less than its performance this week.


Learn More

How AI Day Trading Bots Actually Work: The 8-Stage Pipeline from Data to Execution

A builder's breakdown of a production AI day trading system. Covers the full pipeline: market data ingestion, regime detection, screening, AI conviction scoring, position sizing, execution, dynamic exits, and self-improvement.

Market Regime Detection: How AI Identifies Bull, Bear, and Choppy Markets

Market regime detection uses statistical models to classify whether the market is trending, mean-reverting, or in transition. Learn how Hidden Markov Models and efficiency ratios power regime-aware trading systems.

How to Backtest Trading Strategies: A Practical Guide for 2026

Learn how to backtest trading strategies properly -- avoid common pitfalls like overfitting, survivorship bias, and look-ahead bias. Includes frameworks, metrics, and validation techniques.
