This post was written 10 months ago… yep, right after Retention 101. Since then it has been in and out of the publishing queue. I kept picking up things to improve, but it doesn’t make sense to hold it back any longer… it has taken too long already! I wanted to improve it further, but it’s better to publish it now and follow up if I ever figure out what that magical improvement is than to leave it lingering in the Drafts section.
This post is about ways of measuring retention, how each of them relates to true churn, and which one should be used.
The Retention 101 post was an overall intro. I gave the formula generally used to calculate retention and mentioned that there are other ways of calculating it. This post is about those additional formulas, namely rolling retention and rolling window retention, and also about churn.
Each retention formula has strengths and weaknesses. Some are better suited to reporting, others to modelling, and each has a different relationship with churn. Let’s start!
Types of Retention Rate
Imagine that both of us start playing a game. Cohort size is 2. I guess it’s not a great game… but I liked it and returned the next day, and you didn’t. D1 retention is therefore 50%. The next day I didn’t return but you did. D2 retention is 50% too. But what is the churn rate on D1? Apparently you churned on D1, but you didn’t, since you returned on D2. If we treat churn as the complement of retention, churn rate is 50% on both D1 and D2, which we know is false, at least on D1.
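To make the two-player example concrete, here is a minimal sketch in Python. The `activity` dictionary and the `classic_retention` function are names I made up for illustration, and day 0 stands for the install day.

```python
# Hypothetical activity log for the two-player cohort:
# player -> set of days (counted from install, day 0) on which they played.
activity = {
    "me": {0, 1},   # I came back on D1 but not on D2
    "you": {0, 2},  # you skipped D1 and came back on D2
}


def classic_retention(activity, day):
    """Share of the cohort that played exactly on `day`."""
    returned = sum(1 for days in activity.values() if day in days)
    return returned / len(activity)


d1 = classic_retention(activity, 1)
d2 = classic_retention(activity, 2)

# Treating churn as the complement of retention.
implied_churn_d1 = 1 - d1

print(d1, d2, implied_churn_d1)
```

The implied D1 churn comes out at 50%, even though nobody in this cohort had actually left for good by D1.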
This is a problem of insufficient information, and this is where rolling retention comes in. While classic retention counts the users who played on day X, rolling retention counts the users who played on day X or on any later day. In the example above, rolling retention D1 is 50% when measured on D1 because only I played. If we measure D1 on D2 it is 100%, since it counts me on D1 and you on D2. Rolling retention solves the insufficient information problem by maturing as more data comes in. If calculated on the same day, it is equal to classic retention, but when measured on later days it gets closer to the true, unknown retention.
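The same two-player cohort can illustrate how rolling retention matures. This is only a sketch under my own assumptions: `rolling_retention` and its `measured_on` parameter (the last day of data available when we compute the metric) are hypothetical names, not something taken from the simulation code.

```python
# Hypothetical activity log: player -> days (from install, day 0) with a session.
activity = {
    "me": {0, 1},
    "you": {0, 2},
}


def rolling_retention(activity, day, measured_on):
    """Share of the cohort that played on `day` or on any later day,
    using only the data available up to `measured_on`."""
    returned = sum(
        1
        for days in activity.values()
        if any(day <= d <= measured_on for d in days)
    )
    return returned / len(activity)


d1_measured_on_d1 = rolling_retention(activity, day=1, measured_on=1)
d1_measured_on_d2 = rolling_retention(activity, day=1, measured_on=2)

print(d1_measured_on_d1, d1_measured_on_d2)
```

Measured on D1 the value matches classic retention (50%). Measured on D2 it rises to 100%, because your D2 session proves you had not churned on D1.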
The graphic above was created from a simulation to compare classic and rolling retention. The simulation is published on RPubs and the code is available on GitHub. Since it is a simulation, we know the true value of retention. Classic retention was forced and rolling retention was calculated.
While the distance between classic retention (the red line) and true retention (the blue line) is obvious, the green dots, which represent successive calculations of rolling retention, move from the values of classic retention to the values of true retention. This is very relevant for churn rate, since we can assume that rolling retention tends to true retention, and therefore to the true churn rate… over time.
This “time to mature” is the biggest weakness of rolling retention. In theory it lets us know the true retention and churn. In practice, although it matures rather quickly, we never know if and when it has.
To minimise this we have rolling window retention. It is rolling retention with a predefined last day. Let’s say our window is 30 days. Then D1 rolling window retention will count the users from D1 to D31, after which it is no longer updated. It has both rolling retention’s capacity to mature and the fixed nature of classic retention.
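Here is a minimal sketch of rolling window retention, following the D1 to D31 example for a 30-day window. The four-player cohort, the function name and the `window` and `measured_on` parameters are all made up for illustration.

```python
# Hypothetical cohort: player -> days (from install, day 0) with a session.
activity = {
    "a": {0, 1},   # back on D1
    "b": {0, 20},  # back on D20, inside the window
    "c": {0, 40},  # back on D40, outside the window
    "d": {0},      # never came back
}


def rolling_window_retention(activity, day, window, measured_on):
    """Rolling retention for `day` that stops updating at `day + window`.

    `measured_on` is the last day of data available at calculation time.
    """
    last_day = min(day + window, measured_on)
    returned = sum(
        1
        for days in activity.values()
        if any(day <= d <= last_day for d in days)
    )
    return returned / len(activity)


early = rolling_window_retention(activity, day=1, window=30, measured_on=10)
closed = rolling_window_retention(activity, day=1, window=30, measured_on=31)
later = rolling_window_retention(activity, day=1, window=30, measured_on=60)

print(early, closed, later)
```

In this toy cohort the metric matures from 25% to 50% while the window is open and then stays at 50%, even though player "c" returns on D40. Unbounded rolling retention would have kept moving, to 75%.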
In a nutshell we have this:
Which formula to use?
Like I said, each formula has its strengths and weaknesses. Classic Retention Rate is the norm. Its biggest flaw is that it overestimates churn. Its biggest strengths are ease of calculation and interpretation, and also that it is immutable and therefore reliable. Classic retention correlates with many metrics, both standard and custom, making it a powerhouse in terms of data modelling and statistical analysis.
Rolling Retention Rate is very useful for games where players return on non-consecutive days. These are the ones where classic retention performs the worst. For contrast, imagine a builder game where the player must return to collect coins or harvest some crop. These games have retention hooks in the game design loop, so players return more often because they have compelling reasons to do so. In other games these retention hooks are not present and, without a compelling reason, players return whenever they please. For these games rolling retention offers a good alternative for estimating churn.