I have a love/hate relationship with A/B testing. On the one hand, I love it because it is the tool that allows me to say that something we changed caused some behaviour. This is pure actionable power. On the other hand, I hate it because the expression “A/B Test” has become a symbol of how badly these tests are treated. I have nothing against A/B Testing itself, but I believe there’s a lot of bad A/B Testing out there.
Google “A/B test” or “A/B testing” and the first results page will be filled with links promising enlightenment about everything you need to know about A/B testing. The explanation is quite minimalist. A/B testing is defined in these pages as a split test. Each group gets 50% of the users and whichever group has the highest value of the variable of interest wins.
Sounds easy enough, right? And it makes perfect sense! If you have 1 million users and you observe a 100% increase in the metric of interest, it’s kind of a no-brainer… or maybe it isn’t? Let’s dive into the wonderful world of Randomised Controlled Trials.
What is a Randomised Controlled Trial?
Have you ever seen a clinical trial of an experimental drug on a TV show like House or Grey’s Anatomy? Patients are randomly assigned to two groups. One group gets a placebo. The other group gets the experimental drug. The results of the two groups are compared to see if the group that received the experimental drug experienced an expected effect. This is a classic case of an RCT. It is a scientific experiment that tests a hypothesis.
My oh my so many bold letters… Let’s check each one.
“Randomly assigned” is very important. If we want to say that the experimental drug, and only the experimental drug, caused an expected effect, then we need to be sure that no other variable can explain it. Randomisation allows this. Let’s say that we assign men to one group and women to another. Then we cannot say that the drug caused (or didn’t cause!) the effect. What if the effect only presented itself in one gender? What if we saw no effect? Is it possible that the drug only acts on one gender?
“One group gets a placebo” describes what is called the control group. This group represents what we already know: the current truth and the absence of change. In statistical lingo, this corresponds to the null hypothesis. “The other group gets the experimental drug” describes the test or treatment group. This group represents an alternative hypothesis, a change that allows us to reject our current truth and embrace another, alternative one.
The last important part is “if the group that received the experimental drug experienced an expected effect”. In practice, we are saying that we will not accept just any effect or change. We will only accept the expected effect or change.
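To make the random assignment idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `assign_groups` helper, the fixed seed and the made-up list of users with a gender attribute are not taken from any real system. It shuffles the users, splits them in half, and then checks that gender ends up spread roughly evenly across both groups, which is exactly what assigning men to one group and women to the other would fail to do.

```python
import random
from collections import Counter


def assign_groups(user_ids, seed=42):
    """Randomly split users into 'control' and 'treatment' groups of (nearly) equal size."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    groups = {uid: "control" for uid in shuffled[:half]}
    groups.update({uid: "treatment" for uid in shuffled[half:]})
    return groups


# Hypothetical users: (id, gender) pairs invented for this example.
users = [(i, "F" if i % 2 == 0 else "M") for i in range(10_000)]
assignment = assign_groups([uid for uid, _ in users])

# Count how many users of each gender landed in each group.
balance = Counter((assignment[uid], gender) for uid, gender in users)
for key in sorted(balance):
    print(key, balance[key])
```

With random assignment, each combination of group and gender should hold roughly a quarter of the users, so gender can no longer explain a difference between the groups.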
Differences between bad A/B Testing and RCT
To be clear, “A/B Test” is jargon for a type of RCT. It became the conventional name with the explosion of web analytics in the late nineties. The big difference between bad A/B Testing and good A/B Testing (which I’m referring to as RCTs) is that bad A/B Testing is not scientific. There are two main differences:
- Design of Experiments is one of my most beloved themes. Designing an experiment gives us a solid starting point for the results we will find and report on. It makes us formally state what our objectives are and allows us to calculate group sizes and state a confidence level and margin of error. Design of experiments in the context of freemium mobile games has many idiosyncrasies. It is so important that no test is run without a meeting between analysts and producers. It can be a quick check-in between a producer and an analyst for a simple test, or a meeting with stakeholders, producers, designers and analysts for a complex question.
- We always report RCTs, and underneath the apparently simple reports and recommendations is Statistical Testing. This gives us a solid ending point for the results and opens doors to follow-up questions and new hypotheses. More importantly, it stops us from making recommendations, and producers and stakeholders from making decisions, on weak evidence. We don’t accept just any change, we only accept an expected change.
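The two points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the process we actually follow: it assumes the metric is a conversion rate, uses the textbook normal approximation for group sizes, adds a statistical power of 80% (a parameter the text above does not mention), and uses a two-sided two-proportion z-test. The function names and all the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect p_control -> p_treatment (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # from the confidence level
    z_beta = NormalDist().inv_cdf(power)           # from the assumed power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Design: we expect the change to move conversion from 10% to 12% (invented numbers).
n_needed = sample_size_per_group(0.10, 0.12)
print("users needed per group:", n_needed)

# Report: made-up results from 4,000 users in each group.
z, p_value = two_proportion_z_test(conv_a=400, n_a=4000, conv_b=480, n_b=4000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The design step runs before the test and fixes the expected change up front; the testing step runs afterwards and tells us whether the observed difference is strong enough evidence to act on. Whether a one-sided or two-sided test is appropriate depends on how the expected change was stated during the design.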
I wanted to be sure that I wouldn’t paint A/B Testing as an easy thing. There are many traps in assuming this is an easy thing to do. On the other hand, I don’t want to stop you from doing it. Quite the opposite! A/B Testing is very powerful. I’m sure A/B Testing will be one of the things I’ll write a lot about. Doing it right is very difficult but the results are awesome!