When a DX Engine Component Rule is configured for A/B Testing, multiple configurations (aka Variants) are specified. When that Component Rule is triggered, one of those variants will be used. The DX Engine supports two approaches for selecting which variant to use:
- Weighted Probability
- Multi-armed Bandit
With the Weighted Probability approach, each variant within a given Component Rule is assigned a numeric value (a relative weight) indicating how often it will be chosen relative to the other variants. A variant is selected a number of times equal to its weight out of a number of executions equal to the sum of all the relative weights.
For example, consider a Component Rule, Featured Product Image, with the following variants:

| Configuration | Relative Weight Value |
| --- | --- |
| Headphones | 2 |
| Gaming Console | 1 |
| Laptop | 3 |
Headphones will be selected approximately 2 out of every 6 (2 + 1 + 3) executions (~33%).
Gaming Console will be selected approximately 1 out of every 6 (2 + 1 + 3) executions (~17%).
Laptop will be selected approximately 3 out of every 6 (2 + 1 + 3) executions (~50%).
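The weighted selection described above can be sketched with Python's `random.choices`, which performs weighted random sampling. The variant names and weights are taken from the example table; this is an illustration of the selection behavior, not the DX Engine's actual implementation:

```python
import random

# Variants and relative weights from the Featured Product Image example.
variants = ["Headphones", "Gaming Console", "Laptop"]
weights = [2, 1, 3]

# random.choices performs weighted selection; over many executions the
# observed frequencies approach 2/6 (~33%), 1/6 (~17%), and 3/6 (~50%).
selected = random.choices(variants, weights=weights, k=1)[0]
```

Each call is independent, so the configured ratios emerge only in aggregate across many executions, not in any fixed rotation.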
A multi-armed bandit is a type of problem that involves balancing exploration and exploitation. In a multi-armed bandit problem, an agent (the DX Engine) must choose from a number of different "arms" (Variants), each of which has a different probability of yielding a reward. The goal is to maximize the reward over time by balancing the exploration of new actions, in order to learn more about their rewards, with the exploitation of known high-reward actions. There is always a trade-off between exploration and exploitation, and the challenge is finding the optimal balance.
When a Component Rule is configured to use Multi-armed Bandit for variant selection, it must specify the following:
- Algorithm - The selection algorithm: Epsilon-greedy, Upper Confidence Bound, or Boltzmann Exploration (Softmax)
- Variant Reward Events - The events that, when seen within a session, count as a successful outcome for which a variant should be rewarded.
- Variant Event Reward Value - For each Variant Reward Event, the value that will be rewarded to a variant.
Over time, the selected algorithm continues to ingest real-world activity (Variant Reward Events) in order to fine-tune its estimate of the optimal variant.
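As a sketch of the first listed algorithm, epsilon-greedy can be implemented roughly as follows. The class and method names here are hypothetical, not DX Engine APIs; the sketch assumes each Variant Reward Event carries a numeric reward value, as configured above:

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy sketch: explore a random variant with
    probability epsilon, otherwise exploit the variant with the highest
    average reward observed so far."""

    def __init__(self, variants, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {v: 0 for v in variants}     # times each variant was served
        self.rewards = {v: 0.0 for v in variants}  # accumulated reward value

    def _avg_reward(self, variant):
        # Variants that have never been served get +inf so each is
        # tried at least once before averages are compared.
        if self.counts[variant] == 0:
            return float("inf")
        return self.rewards[variant] / self.counts[variant]

    def select(self):
        if random.random() < self.epsilon:
            choice = random.choice(list(self.counts))  # explore
        else:
            choice = max(self.counts, key=self._avg_reward)  # exploit
        self.counts[choice] += 1
        return choice

    def reward(self, variant, value):
        # Called when a Variant Reward Event is seen within a session,
        # crediting the variant with the configured reward value.
        self.rewards[variant] += value
```

As more reward events arrive, exploitation concentrates traffic on the best-performing variant, while the epsilon fraction of exploratory selections keeps the other variants' estimates fresh.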