A/B Testing
A/B testing is a powerful feature of Conscia that allows you to optimize your multi-channel experiences. This document will guide you through Conscia's unique approach to A/B testing, focusing on our real-time context-based experience rules.
At the heart of Conscia's multi-channel experience orchestration are Components, the building blocks that define the behavior of each experience. The behavior of these Components is controlled by Experience Rules, which respond to a customer's real-time context.
An experiment is run within the context of a specific Experience Rule. An Experience Rule defines the Trigger conditions of the experience (which include the audience criteria) and the Target Experience. When you designate an Experience Rule as an A/B test, it automatically becomes an Experiment.
Configuring an A/B Test
When setting up an A/B test in Conscia, you can control the distribution of traffic between the different variants of your experience rules.
1. Select the Rule: Identify the Experience Rule that you would like to test and check the "Enable A/B Test" box.
2. Define Variants: Within the Target Experience portion of the rule, create the variations that you would like to compare. Designate one of the variants to be the Control Group.
3. Set Traffic Distribution: Specify the proportion of traffic that will be directed to each variant.
With this approach, each variant within a given Experience Rule is given a numeric value (a relative weight) that indicates how often it will be chosen relative to the other variants. A variant's weight is the expected number of times it will be selected out of a total number of executions equal to the sum of all relative weights.
For example, consider an Experience Rule, Featured Product Image, with the following Variants:

| Configuration | Relative Weight |
|---|---|
| Headphones | 2 |
| Gaming Console | 1 |
| Laptop | 3 |
- Headphones will be selected approximately 2 out of every 6 (2 + 1 + 3) executions (~33%).
- Gaming Console will be selected approximately 1 out of every 6 executions (~17%).
- Laptop will be selected approximately 3 out of every 6 executions (~50%).
Stickiness
When a session begins, an identify request is sent to Conscia. This call requires you to provide a userID, which is later used to associate the user's actions with one another, calculate metrics for experiments, and create a customer profile.
The user can be assigned to variants in one of the following ways (a minimal assignment sketch follows this list):

- Session-Sticky: The user stays with the assigned variant for the entirety of the session.
- User-Sticky: The user stays with the assigned variant for all sessions until the experiment ends.
- Non-Sticky: The user may be reassigned to a different variant in a new session, or even during the same session.
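As a rough illustration of stickiness, the sketch below hashes a stable key together with the experiment identifier so that the same key always maps to the same variant: pass the userID for user-sticky behaviour, a session ID for session-sticky behaviour, or pick randomly on each request for non-sticky behaviour. The `assignVariant` helper is hypothetical, not Conscia's internal implementation, and it ignores relative weights for brevity.

```typescript
import { createHash } from "crypto";

// Deterministically map a sticky key to one of the variants.
// For user-sticky assignment the key is the userID; for session-sticky
// assignment it is the session ID. Non-sticky assignment would instead
// choose at random on every request. Relative weights are ignored here.
function assignVariant(stickyKey: string, experimentId: string, variants: string[]): string {
  const digest = createHash("sha256").update(`${experimentId}:${stickyKey}`).digest();
  const bucket = digest.readUInt32BE(0) % variants.length;
  return variants[bucket];
}

// The same userID always lands in the same variant for this experiment.
console.log(
  assignVariant("user-123", "featured-product-image", ["Headphones", "Gaming Console", "Laptop"])
);
```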
Calculating Metrics for Each Experiment
Measuring the impact of your experiment correctly is vital to gaining reliable, actionable insights. In Conscia, the calculation of metrics depends on the type of metric, your experiment setup, and the nature of your customer interactions. Here's a broad overview:
Success Metrics
When defining a variant within an experiment, you need to specify metrics that determine the success or failure of that experiment. The most common types of metrics are conversion rate, bounce rate, average order value, and cart abandonment rate. Note that 'conversion' can mean different things depending on the desired business outcome. For ecommerce websites, it could be completion of the checkout process, account creation, engagement on social media, etc.
Metrics are calculated from events that are registered with Conscia via the Track API as the customer/end user interacts with your digital properties. Conscia allows you to register a variety of events. These are documented here: Events
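As a hypothetical example, a purchase event might be reported to the Track API along the following lines. The endpoint URL, header names, and payload fields shown here are placeholders, not Conscia's documented request contract; refer to the Events documentation for the actual format.

```typescript
// Hypothetical sketch only: the endpoint, headers, and payload shape below
// are placeholders, not the documented Track API contract.
const TRACK_ENDPOINT = "https://example.invalid/track"; // replace with the real Track API URL

async function trackEvent(
  userId: string,
  eventType: string,
  properties: Record<string, unknown>
): Promise<void> {
  await fetch(TRACK_ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer <API_TOKEN>", // placeholder credential
    },
    body: JSON.stringify({
      userId,
      eventType,
      properties,
      timestamp: new Date().toISOString(),
    }),
  });
}

// Register a purchase so it can later be counted toward experiment metrics.
trackEvent("user-123", "Product Purchase", { purchaseAmount: 129.99, currency: "USD" });
```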
Gathering Data
Once you kick off your experiment in Conscia, visitors to your website or application are randomly bucketed into different variants, which include a control group and one or more test groups. Conscia tracks and records each user's interactions, ranging from clicks, page views, and form submissions to purchases and more.
Counting Conversions
In Conscia, a conversion is counted each time a user completes the target action defined for your A/B test. This could be clicking a specific link, making a purchase, filling out a form, or any other action that aligns with your experiment's goal.
The platform records the number of conversions for each variant; exactly how they are tallied depends on the specifics of your experiment and the nature of your metric.
Selecting Metrics for Your Experiment
Choosing the right metrics is crucial for the success of your A/B testing experiment. They act as indicators of the experiment’s effectiveness and impact on user behavior. Here are some guidelines to help you select the most appropriate metrics:
Understand Your Goal
Before selecting metrics, clearly define the goal of your experiment. This goal should align with your broader business objectives. For instance, if your business goal is to increase customer retention, a potential experiment goal could be to improve user engagement on your platform.
Identify Relevant Metrics
Once the goal is set, identify metrics that will accurately measure the performance of your experiment. Using our previous example, appropriate metrics for user engagement could be time spent on site, pages per session, or event-specific actions (like clicking a particular button).
Primary and Secondary Metrics
It's helpful to distinguish between primary and secondary metrics:
- Primary Metrics: These are directly related to the experiment’s goal and provide a clear measure of its success. In most cases, there should only be one primary metric to avoid diluting focus.
- Secondary Metrics: These are additional metrics that provide more context to your results. They help monitor unintended consequences and provide additional insights. For example, while your primary metric may be conversion rate, secondary metrics could include average session duration, bounce rate, or number of new sign-ups.
Configuring Experiment Metrics
In Conscia, each experiment allows the selection of a primary metric and several secondary metrics for precise and multi-dimensional evaluation of the results.
A metric in Conscia is defined by selecting a combination of an event type, a property associated with the event, an operator, and a value. The operator could be 'equals', 'greater than', 'less than', etc., depending on the nature of the property and the specific question you're trying to answer with your experiment.
Let's consider an example where the event type is "Product Purchase", a common event in e-commerce platforms.
- Event Type: Product Purchase
- Property: Purchase Amount
- Operator: Greater than
- Value: $100
The metric defined by this combination would measure the number of purchases that have a purchase amount greater than $100. This metric can help identify how many high-value transactions are happening, which can be useful for experiments aimed at increasing the average transaction value on the website.
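Expressed as a configuration object, that metric might look like the sketch below. The field names and operator spellings are assumptions for illustration rather than Conscia's exact schema.

```typescript
// Illustrative shape for a success metric definition; the field names and
// operator spellings are assumptions, not Conscia's exact schema.
interface SuccessMetric {
  eventType: string;
  property: string;
  operator: "equals" | "greaterThan" | "lessThan";
  value: number | string;
}

// "Count Product Purchase events whose purchase amount exceeds $100."
const highValuePurchase: SuccessMetric = {
  eventType: "Product Purchase",
  property: "purchaseAmount",
  operator: "greaterThan",
  value: 100,
};
```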
Furthermore, Conscia allows you to define a conversion count scope. For each metric, you can pick the way the conversions are tallied:
- Total Conversions: This is the aggregate number of successful events or actions (as defined by the metric) that have been completed during the experiment.
- Unique Conversions Per Visitor: This is the count of individual users who have successfully completed the action defined by the metric at least once during the experiment. It is a measure of the unique success rate and helps in understanding the reach of the experiment.
- Unique Conversions Per Session: This is the count of individual sessions where a conversion event was successfully completed at least once during the experiment.
Conversion Rate: The conversion rate is calculated by dividing the number of conversions by the total number of participants in the experiment. This ratio provides a standardized measure of success that can be used to compare the performance of different variants, irrespective of the differences in sample size. The Conversion Rate is always available regardless of the way you choose to tally your conversions.
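The following sketch shows how the three tally scopes and the conversion rate relate to one another over a simplified in-memory list of conversion events. It is a conceptual illustration of the definitions above, not the platform's reporting pipeline.

```typescript
interface ConversionEvent {
  userId: string;
  sessionId: string;
}

// Tally conversions under the three scopes described above and derive a
// conversion rate (here using the unique-per-visitor count as the numerator;
// the numerator follows whichever tally scope you select).
function tallyConversions(events: ConversionEvent[], totalParticipants: number) {
  const totalConversions = events.length;
  const uniquePerVisitor = new Set(events.map((e) => e.userId)).size;
  const uniquePerSession = new Set(events.map((e) => e.sessionId)).size;
  const conversionRate = uniquePerVisitor / totalParticipants;
  return { totalConversions, uniquePerVisitor, uniquePerSession, conversionRate };
}

// Three conversion events from two users across three sessions,
// out of 100 experiment participants.
console.log(
  tallyConversions(
    [
      { userId: "u1", sessionId: "s1" },
      { userId: "u1", sessionId: "s2" },
      { userId: "u2", sessionId: "s3" },
    ],
    100
  )
);
```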
Here is a screenshot of the form used to configure A/B Test Success Metrics.
[A/B Test Configuration]
Importance of the Control Group
In A/B testing, a control group serves as the benchmark or standard against which other test groups (also known as variants or treatments) are compared. The main goal of having a control group is to provide an unchanged reference point that allows you to accurately measure the impact of changes or treatments you introduce in your test groups.
Here's how control groups work in A/B testing:
- Random Assignment: When an A/B test starts, every participant (like a website visitor) is randomly assigned to either the control group or one of the variant groups. The random assignment helps ensure that each group is statistically similar and that any observed differences in outcomes can be attributed to the changes introduced, rather than some inherent difference between the groups.
- No Changes Applied: The control group is exposed to the original, unchanged version of whatever is being tested. This could be a webpage, an email, an app interface, or some other user experience. The control group essentially experiences the "business as usual" scenario.
- Measure and Compare: The behavior of the control group and variant group(s) is then monitored and measured based on predetermined metrics (like click-through rate, conversion rate, time spent on page, etc.). The performance of the variant groups is then compared to the control group to determine if the changes made had any statistically significant impact.
- Decision Making: If the variant outperforms the control based on your success metrics and the results are statistically significant, you might decide to implement the changes for all users. If the variant underperforms the control, or there's no significant difference, you might decide to stick with the original version or iterate on your changes and run another test.
You can designate one of your variants as the 'Control' Group.
Statistical Significance
In the context of A/B testing, statistical significance determines whether a variant's impact on a success metric is due to the changes made or is a result of random chance. The higher the statistical significance, the less likely the observed results happened by chance.
In Conscia, statistical significance is automatically calculated for each metric. The platform uses a significance level of 0.05 (5%): if a variant's improvement over the control yields a p-value of 0.05 or lower, you can conclude with reasonable confidence that the improvement was caused by the variant rather than by random chance.
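For intuition, the sketch below applies a standard two-proportion z-test to conversion counts and checks the resulting p-value against the 0.05 threshold. This is a generic frequentist illustration of the idea; it is not necessarily the exact method Conscia applies internally.

```typescript
// Abramowitz & Stegun approximation of the standard normal CDF (for x >= 0).
function standardNormalCdf(x: number): number {
  const t = 1 / (1 + 0.2316419 * x);
  const poly =
    t *
    (0.31938153 +
      t * (-0.356563782 + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
  const pdf = Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
  return 1 - pdf * poly;
}

// Two-proportion z-test: is the variant's conversion rate significantly
// different from the control's at the chosen significance level?
function isSignificant(
  controlConversions: number,
  controlVisitors: number,
  variantConversions: number,
  variantVisitors: number,
  alpha = 0.05
): boolean {
  const p1 = controlConversions / controlVisitors;
  const p2 = variantConversions / variantVisitors;
  const pooled =
    (controlConversions + variantConversions) / (controlVisitors + variantVisitors);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / controlVisitors + 1 / variantVisitors));
  const z = (p2 - p1) / se;
  const pValue = 2 * (1 - standardNormalCdf(Math.abs(z))); // two-sided p-value
  return pValue <= alpha;
}

// 120/1000 conversions for the control vs 150/1000 for the variant.
console.log(isSignificant(120, 1000, 150, 1000)); // true: the p-value is just under 0.05
```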
Conversion Attribution in A/B Testing
In today's fragmented consumer journey, the path to purchase is often a complex series of touchpoints, rather than a direct, linear path. Consequently, navigating the complex landscape of A/B tests and personalization campaigns, and pinpointing increased conversions and revenue to distinct variations, presents a formidable challenge for modern marketers.
Attribution, in the context of A/B testing and personalization, is the act of ascribing credit to unique touchpoints or engagement actions along the consumer's conversion route. This essential process allows us to determine if specific experiences or variations should be credited for revenue or conversion events. This insight gives us a deeper understanding of our actions' influence on buyer behavior, enabling us to present content that is most likely to convert.
Consider a typical A/B test. A user encounters a variation and makes a purchase several days later. Should we attribute this conversion to the exposed variation? The answer isn't straightforward; it largely depends on the specifics of the experiment. This scenario highlights the need to move beyond rigid conversion attribution models to a more flexible approach, one that can truly uncover the key influencers of conversions and guide optimization initiatives.
To accurately tie conversions to a specific experiment or personalized experience, it's vital to comprehend the different conversion attribution configurations:
- Session-Level Attribution Scoping: If a user is exposed to an A/B test variation or personalized experience, any subsequent revenue and conversion events within the same session are attributed to that variation.
- User-Level Attribution Scoping: If a user views an A/B test variation or personalized experience, all subsequent revenue and conversion events from that user are attributed to that variation, for the duration of the active experience.
Choosing between session-level and user-level attribution largely depends on the scenario. For example, suppose you're introducing a loyalty program pop-up on your e-commerce site. You choose to show the pop-up only once per week to avoid disrupting your users.
A user visits your site on Monday, sees the pop-up, but doesn't sign up and leaves the session. The same user returns on Thursday, signs up for the loyalty program, and makes a purchase. Should the pop-up be attributed to this conversion? Likely not. Limiting the attribution window to the session helps you sift out unrelated conversions, leading to a quicker determination of what genuinely influences conversion.
Alternatively, consider a customer's journey to purchasing a luxury item such as a designer handbag, which typically takes place across multiple sessions. The user might browse different products, compare prices, consult with friends, and eventually return to the website to make the purchase. Limiting your attribution to a single session could undervalue the impact of specific personalized experiences. In such a case, user-level attribution is a more suitable approach.
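The core rule behind both scopes can be sketched as follows: a conversion is credited to a variation only if the user was exposed to it before converting, and, under session-level scoping, only within the same session. The data shapes and the `attributeConversion` helper are illustrative assumptions, not platform code.

```typescript
interface Exposure {
  userId: string;
  sessionId: string;
  variant: string;
  timestamp: number;
}

interface Conversion {
  userId: string;
  sessionId: string;
  timestamp: number;
}

type AttributionScope = "session" | "user";

// Return the variant this conversion should be credited to, or null if no
// qualifying exposure exists under the chosen attribution scope.
function attributeConversion(
  conversion: Conversion,
  exposures: Exposure[],
  scope: AttributionScope
): string | null {
  const candidates = exposures.filter(
    (e) =>
      e.userId === conversion.userId &&
      e.timestamp <= conversion.timestamp &&
      (scope === "user" || e.sessionId === conversion.sessionId)
  );
  if (candidates.length === 0) return null;
  // Credit the most recent qualifying exposure.
  return candidates.sort((a, b) => b.timestamp - a.timestamp)[0].variant;
}

// The loyalty pop-up example: exposure on Monday, conversion on Thursday in a new session.
const credited = attributeConversion(
  { userId: "u1", sessionId: "thu-session", timestamp: 4 },
  [{ userId: "u1", sessionId: "mon-session", variant: "loyalty-popup", timestamp: 1 }],
  "session" // returns null; with "user" scope it would return "loyalty-popup"
);
console.log(credited);
```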
In conclusion, the choice of conversion attribution windows should be adapted to the context, allowing marketers to correctly assign conversions and efficiently optimize their marketing strategies.
Running Multiple Experiments Simultaneously
In Conscia, you have the flexibility to conduct several experiments concurrently. This allows you to test multiple hypotheses and learn about your customers at a faster pace. Here's how you can run multiple experiments at the same time:
Experiment Independence
Each experiment operates independently of the others, meaning a user can be part of multiple experiments at the same time. Variant assignments are made independently per experiment, allowing you to simultaneously test various elements of your user experience.
For instance, if you have two experiments running — one for a homepage redesign (A/B test) and another for a promotional offer (multivariate test) — a user could be placed in Variant B for the homepage experiment and in Variant 2 for the promotional offer.
Bucketing Users for Multiple Experiments
Users are allocated to "buckets" on a per-experiment basis: if a user is part of multiple experiments, they are placed in a separate bucket for each experiment.
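One common way to keep assignments independent across experiments is to include the experiment identifier in the hash key, so a user's bucket in one experiment reveals nothing about their bucket in another. The sketch below illustrates that idea; it is not Conscia's internal bucketing code.

```typescript
import { createHash } from "crypto";

// Hashing the userID together with the experiment ID gives bucket
// assignments that are independent across experiments for the same user.
function bucketFor(userId: string, experimentId: string, bucketCount: number): number {
  const digest = createHash("sha256").update(`${experimentId}:${userId}`).digest();
  return digest.readUInt32BE(0) % bucketCount;
}

// The same user can land in different buckets for different experiments.
console.log(bucketFor("user-123", "homepage-redesign", 2)); // e.g. bucket 1 -> Variant B
console.log(bucketFor("user-123", "promotional-offer", 3)); // e.g. bucket 2 -> Variant 2
```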
Considerations for Multiple Experiments
While running multiple experiments simultaneously can speed up your learning, there are a few things you need to consider:
- Interaction Effects: Be cautious of interaction effects, where one experiment may impact the outcome of another. This can complicate the analysis and potentially skew results. To prevent this, you can carefully plan your experiments and avoid overlap where interactions could occur, or use advanced statistical techniques to account for interaction.
- Statistical Significance: Ensure each experiment still maintains a sufficient sample size to detect statistically significant results. Spreading users thinly across multiple experiments could reduce the power of your tests.
- User Experience: Consider the overall user experience when running multiple experiments. Changes from multiple experiments could lead to a disjointed or inconsistent experience for the user.
As always, while running experiments, monitor your metrics closely and be prepared to adjust if the results indicate a negative impact on the user experience or if the data shows significant interaction effects between experiments.
Analytics and Reporting
Here are the four sections of the Experiment Report:
Summary about the Experiment: This section offers an overview of the entire A/B testing experiment and includes the following:
- Number of days running: The total duration the experiment has been live.
- Audiences: A list of different audience segments that have interacted with the test.
- Experiment visitors: The total count of visitors who have participated in the experiment.
- Winning Variant: The variant that outperforms the others based on the predetermined success metrics.
Report Filters: These allow you to refine the displayed data based on specific parameters:
- Date Range: Enables you to focus on a specific time period for analysis.
- Audience Segment: Allows you to filter results based on distinct audience groups.
- Baseline: The standard or control variant to which other variants' results are compared.
Stats compared across Variants: This part presents crucial data points for each variant, such as:
- Visitors: The total number of unique visitors that each variant has attracted.
- Conversion Event Metrics: Key indicators of specific user actions that were tracked during the experiment, listed as Metric 1, Metric 2, and Metric 3.
Detailed View for Each Metric: An in-depth analysis of each variant's performance, broken down by metric, with one variant per row:
- Unique Conversions: The number of unique visitors who completed a target action.
- Conversion Rate: The percentage of total visitors that completed the target action.
- Improvement: The change in performance relative to the control group. The control group is typically marked as '—' as it is the point of comparison.
- Sessions: The total number of individual user interactions with the variant within a given timeframe.
- Probability to be best, Confidence Interval, and Statistical Significance: These are statistical measures offering deeper insights into the data:
  - Probability to be best: A predictive measure of a variant's chances of outperforming all others.
  - Confidence Interval: Provides an estimated range within which the true value is likely to lie. It offers a measure of certainty about the results.
  - Statistical Significance: Validates whether the results of a variant are due to the modifications made and not a product of random chance. High significance implies that the observed differences are likely due to the changes made in the variant.
Here is a sample report:
Understanding the Analytics Report for Your Experiment
Interpreting the outcomes of an A/B test is a crucial but often overlooked step in the experimentation process. Here's a guide to navigate this stage effectively.
Most platforms for experimentation offer in-built analytics to monitor relevant metrics and KPIs. Prior to dissecting an A/B test report, it's critical to grasp two key metrics:
- Uplift: This metric represents the difference in performance between a test variation and the baseline variation (often the control group). If, for instance, a variation yields a per-user revenue of $5, whereas the control generates a per-user revenue of $4, the uplift is 25%.
- Probability to Be Best: This metric indicates the likelihood of a variation performing optimally in the long run. It is the most practical metric in the report and is used to declare the winning variation in A/B tests. While uplift can fluctuate due to chance with smaller sample sizes, the Probability to Be Best considers sample size (as per the Bayesian approach). Calculation of this probability does not commence until there have been 30 conversions or 1,000 samples. In simpler terms, uplift answers "By how much better?" while Probability to Be Best answers "Who is better?".
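For intuition, Probability to Be Best for a conversion-style metric can be estimated with a Bayesian-flavoured Monte Carlo simulation: repeatedly sample a plausible true conversion rate for each variant from its posterior and count how often each variant comes out on top. The sketch below uses a normal approximation to the posterior for simplicity and is a generic illustration, not Conscia's exact calculation.

```typescript
interface VariantStats {
  name: string;
  conversions: number;
  visitors: number;
}

// Box-Muller transform: one standard normal sample.
function randomNormal(): number {
  const u = Math.random() || Number.MIN_VALUE;
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Monte Carlo estimate of each variant's probability of having the highest
// true conversion rate, using a normal approximation to the posterior of
// each variant's rate (mean p, standard deviation sqrt(p(1-p)/n)).
function probabilityToBeBest(variants: VariantStats[], draws = 100_000): Map<string, number> {
  const wins = new Map<string, number>();
  for (const v of variants) wins.set(v.name, 0);
  for (let i = 0; i < draws; i++) {
    let best = "";
    let bestRate = -Infinity;
    for (const v of variants) {
      const p = v.conversions / v.visitors;
      const sd = Math.sqrt((p * (1 - p)) / v.visitors);
      const sample = p + sd * randomNormal();
      if (sample > bestRate) {
        bestRate = sample;
        best = v.name;
      }
    }
    wins.set(best, (wins.get(best) ?? 0) + 1);
  }
  const result = new Map<string, number>();
  for (const v of variants) result.set(v.name, (wins.get(v.name) ?? 0) / draws);
  return result;
}

console.log(
  probabilityToBeBest([
    { name: "Control", conversions: 120, visitors: 1000 },
    { name: "Variant B", conversions: 150, visitors: 1000 },
  ])
);
```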
Interpreting A/B Test Reports
Begin by scrutinizing the A/B test outcomes to determine if a winner has been announced. Typically, a winner is declared if the following criteria are fulfilled:
- A variation scores above 95% in Probability to Be Best (though this threshold may be adjustable in certain platforms via the winner significance level setting).
- The minimum duration for the test has been met. This ensures results are not distorted by seasonal effects.
Analysis of Secondary Metrics
While the primary metric is the basis for declaring test winners, carefully consider secondary metrics. We recommend examining these secondary metrics before finalizing the experiment and applying the winning variation universally for a couple of reasons:
- To prevent errors (for example, if your primary metric is CTR, the winning variation could decrease purchases, revenue, or AOV).
- To uncover interesting insights (for instance, a decrease in purchases per user but an increase in AOV, indicating that the variation caused users to buy less but more expensive products, ultimately yielding more revenue).
For each secondary metric, consider the uplift and Probability to Be Best scores to evaluate the performance of each variation.
Audience Breakdown Analysis
Another effective method for deeper analysis involves segmenting results by audience. This can answer questions like:
- How did traffic from different sources respond to the test?
- Which variation won for mobile users versus desktop users?
- Which variation was most effective for new users?
Choose audiences that are relevant to your business and likely to exhibit different behaviors and intent signals. Again, for each audience, review the uplift and Probability to Be Best scores to evaluate each variation's performance.
Understanding the Winners in Losing A/B Tests
Conversion Rate Optimization (CRO) and A/B testing have long been relied on for significant uplifts in conversion and revenue, identifying the best variations for site visitors. However, as personalization becomes more fundamental to customer experience, experiments that disregard the unique conditions of individual audiences may yield inconclusive results, and statistical significance becomes more challenging to achieve.
In today's world of personalization, where one-to-one interactions surpass a one-to-many approach, "average users" can't represent "all visitors." This implies that traditional ways of uncovering and delivering the best experiences, even with best practices around sample size creation, hypothesis development, and KPI optimization, may fall short.
Rather than merely accepting test results and deploying variations expected to optimize overall user experience, marketers need to understand that such an approach could compromise the experience for another portion of visitors. A segment of visitors will always find the winning variation suboptimal. Only by acknowledging this flaw can we realize that losing A/B tests may end up as winners, and that overlooked opportunities may bear the most fruit in a personalized way of thinking.
The provided examples underline the importance of analyzing the impact of test actions on different audience groups when running experiments. Thorough analysis of results for different segments can unveil deeper optimization opportunities, even for tests that fail to produce uplifts for the "average user" (who, in reality, doesn't exist).
While a quick analysis will suffice for some of these scenarios, the complexity of analysis grows with the increase in the number of tests, variations, and segments, making it a data-heavy task. With busy testing schedules and ever-shifting priorities, finding the time for thorough analysis can be a challenge.
Given these complexities, it's clear that simply accepting results at face value and unnecessarily discarding treatments can no longer be the norm.