The previous post had two big themes. The thought process of asking questions to define the events in a clear and detailed form was one. Introducing the basic metrics and the events that would allow them to be calculated was the other.
This post expands on those two themes. The objective is to introduce concepts that will be used later to define reporting tables and dashboards, and to add more information to our two events.
Tidy data: observations of dimensions and measures
Tidy data is a simple concept. Think of a table or spreadsheet where each row is one and only one observation. For instance, in a purchases table, each row represents one purchase, using the purchase event defined in the previous post.
Each row has a user ID, a date and a product ID. These are the dimensions. Think of dimensions as non-numeric variables that you can filter or group by.
Each row also has a numeric variable representing the revenue of that purchase. This is a measure.
If you want more detail on types of data, check the Data 101 post.
A dataset like this, where each row represents one observation and each column represents a variable, either a dimension or a measure, is a tidy dataset.
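As a minimal sketch of what this looks like in practice, the purchases table can be written as a list of rows, one per purchase. The field names (`user_id`, `date`, `product_id`, `revenue`), the product IDs and the values are all made up for illustration. The helper `sum_measure` is hypothetical too. It only shows that dimensions are what you group by and measures are what you aggregate.

```python
from collections import defaultdict

# Hypothetical purchase events: one row per purchase (one observation).
# Dimensions: user_id, date, product_id. Measure: revenue.
purchases = [
    {"user_id": "u1", "date": "2015-01-01", "product_id": "gems_small", "revenue": 0.99},
    {"user_id": "u2", "date": "2015-01-01", "product_id": "gems_large", "revenue": 4.99},
    {"user_id": "u1", "date": "2015-01-02", "product_id": "gems_small", "revenue": 0.99},
]


def sum_measure(rows, dimension, measure):
    """Group rows by one dimension and sum one measure."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


# The same tidy table answers different questions depending on the dimension.
revenue_by_date = sum_measure(purchases, "date", "revenue")
revenue_by_product = sum_measure(purchases, "product_id", "revenue")
```

Because every row is exactly one purchase, grouping by any dimension never double counts revenue.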
Dates as dimensions
Dates play a very important role in game analytics. Many reports take the form of daily time series or bar charts binned by month. There are two fundamental dates. The first is the activity date, that is, the date on which the event took place. The second is the acquisition date, or the date on which the player that triggered the event was acquired. The difference in days between these two dates is the retention day. As an example, for a player acquired on January 1st 2015, retention day 1 is January 2nd and retention day 9 is January 10th of the same year.
If you read Retention 101 and other posts on retention, you have seen mentions of these three variables. In practice, these three variables, when used as dimensions, create two different types of datasets.
Datasets dimensioned by activity date deal with measures that vary across absolute dates, computed over all the players active on each date. For instance, you can see the revenue of 2015-01-01, 2015-01-02, etc. These are the datasets that include KPIs like Daily Active Users and Daily Revenue. I call these metrics datasets.
Datasets dimensioned by acquisition date and/or retention day deal with values that vary across dates relative to the acquisition date. The best-known metric of this kind is the retention rate. Other metrics mentioned in other posts, like Cumulative ARPU, also appear in these datasets. I call these cohort datasets.
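The retention day is just a date difference. Here is a small sketch using Python's standard `datetime` module. The function name `retention_day` is mine, not something from an analytics service.

```python
from datetime import date


def retention_day(acquisition_date: date, activity_date: date) -> int:
    """Number of days between the acquisition date and the activity date."""
    return (activity_date - acquisition_date).days


acquired = date(2015, 1, 1)
day_on_jan_2 = retention_day(acquired, date(2015, 1, 2))    # 1
day_on_jan_10 = retention_day(acquired, date(2015, 1, 10))  # 9
```

The acquisition date itself is retention day 0.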
You can soft launch a game with only the two events and capture the seven metrics. That gives you a good starting point to evaluate the performance of your game and to start asking new questions. Going beyond the basic implementation starts with adding more dimensions.
By default we use two at soft launch: country and platform. Both are present in any analytics service and both should be straightforward to implement in your own in-house solution. Country and platform are relevant because player behaviour differs widely across countries and across platforms.
From the two events you can calculate many measures. Instead of adding only the missing ones, I’ll list all the measures that I know of that can be tracked with the two events. Some of them are vanity metrics that make computation easier, especially if you use a visualisation or BI tool. Others are relevant for cohort analysis or any ad-hoc analysis that you may want to perform.
In a metrics dataset, each row represents one activity date for each unique combination of dimensions.
- DAU: count of distinct session user IDs
- Payers: count of distinct purchase user IDs
- Revenue: sum of purchase revenue
- ARPDAU: Revenue/DAU
- ARPPU: Revenue/Payers
- Conversion Rate: Payers/DAU
- Purchases: count of purchase events
- Purchases per Payer: Purchases/Payers
- Sessions: count of session events
- Sessions per Player: Sessions/DAU
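The list above can be sketched as a small aggregation over the two events. This is only an illustration under assumptions: the event field names, the sample values and the function `build_metrics` are hypothetical, and the dimensions are the activity date plus country and platform. Ratios return `None` when the denominator is zero, for example ARPPU on a day with no payers.

```python
from collections import defaultdict

# Hypothetical session and purchase events (field names are assumptions).
sessions = [
    {"user_id": "u1", "date": "2015-01-01", "country": "US", "platform": "ios"},
    {"user_id": "u1", "date": "2015-01-01", "country": "US", "platform": "ios"},
    {"user_id": "u2", "date": "2015-01-01", "country": "US", "platform": "ios"},
    {"user_id": "u3", "date": "2015-01-01", "country": "BR", "platform": "android"},
]
purchases = [
    {"user_id": "u1", "date": "2015-01-01", "country": "US", "platform": "ios", "revenue": 2.0},
    {"user_id": "u1", "date": "2015-01-01", "country": "US", "platform": "ios", "revenue": 4.0},
]


def ratio(numerator, denominator):
    """Divide, returning None instead of failing on a zero denominator."""
    return numerator / denominator if denominator else None


def build_metrics(sessions, purchases):
    """One row per (activity date, country, platform)."""
    acc = defaultdict(lambda: {"users": set(), "payers": set(),
                               "revenue": 0.0, "purchases": 0, "sessions": 0})
    for s in sessions:
        key = (s["date"], s["country"], s["platform"])
        acc[key]["users"].add(s["user_id"])
        acc[key]["sessions"] += 1
    for p in purchases:
        key = (p["date"], p["country"], p["platform"])
        acc[key]["payers"].add(p["user_id"])
        acc[key]["revenue"] += p["revenue"]
        acc[key]["purchases"] += 1

    rows = {}
    for key, a in acc.items():
        dau, payers = len(a["users"]), len(a["payers"])
        rows[key] = {
            "dau": dau,
            "payers": payers,
            "revenue": a["revenue"],
            "arpdau": ratio(a["revenue"], dau),
            "arppu": ratio(a["revenue"], payers),
            "conversion_rate": ratio(payers, dau),
            "purchases": a["purchases"],
            "purchases_per_payer": ratio(a["purchases"], payers),
            "sessions": a["sessions"],
            "sessions_per_player": ratio(a["sessions"], dau),
        }
    return rows


metrics = build_metrics(sessions, purchases)
us_ios = metrics[("2015-01-01", "US", "ios")]
```

Note that the ratios are computed after aggregating. If you later roll up several rows, for instance all countries together, recompute the ratios from the summed revenue and the distinct user counts rather than averaging them.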
In a cohort dataset, each row represents one acquisition date and one retention day for each unique combination of dimensions. We can use all the measures from the metrics datasets, but for the sake of simplicity I only kept the ones that are used in cohort dashboards.
Consider that all variables labeled as Cumulative represent the sums and counts from the acquisition date up to and including that retention day.
- Cohort Size: count of distinct session user IDs on retention day 0
- Cumulative Revenue: sum of purchase revenue
- Cumulative Payers: count of distinct purchase user IDs
- Cumulative ARPU: Cumulative Revenue/Cohort Size
- Cumulative ARPPU: Cumulative Revenue/Cumulative Payers
- Conversion Rate: Cumulative Payers/Cohort Size
- Cumulative Purchases: count of purchase events
- Purchases per Payer: Cumulative Purchases/Cumulative Payers
- Retention Rate: DAU/Cohort Size
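Here is a rough sketch of how the cohort measures above could be computed for a single acquisition date. It rests on one assumption that you should check against your own setup: a player's acquisition date is taken to be the date of their first session. The field names, the sample events and the function `build_cohort` are hypothetical, and country and platform are left out to keep it short.

```python
from datetime import date

# Hypothetical events (field names are assumptions).
sessions = [
    {"user_id": "u1", "date": date(2015, 1, 1)},
    {"user_id": "u2", "date": date(2015, 1, 1)},
    {"user_id": "u1", "date": date(2015, 1, 2)},
    {"user_id": "u2", "date": date(2015, 1, 3)},
]
purchases = [
    {"user_id": "u1", "date": date(2015, 1, 2), "revenue": 5.0},
    {"user_id": "u2", "date": date(2015, 1, 3), "revenue": 3.0},
]


def ratio(numerator, denominator):
    """Divide, returning None instead of failing on a zero denominator."""
    return numerator / denominator if denominator else None


def build_cohort(sessions, purchases, acquisition_date, max_day):
    """One row per retention day (0..max_day) for one acquisition date."""
    # Assumption: a player is acquired on the date of their first session.
    first_seen = {}
    for s in sessions:
        uid = s["user_id"]
        if uid not in first_seen or s["date"] < first_seen[uid]:
            first_seen[uid] = s["date"]
    cohort = {uid for uid, d in first_seen.items() if d == acquisition_date}
    cohort_size = len(cohort)

    rows = []
    for day in range(max_day + 1):
        # Distinct cohort players with a session exactly on this retention day.
        active = {s["user_id"] for s in sessions
                  if s["user_id"] in cohort
                  and (s["date"] - acquisition_date).days == day}
        # Cohort purchases from retention day 0 up to and including this day.
        bought = [p for p in purchases
                  if p["user_id"] in cohort
                  and 0 <= (p["date"] - acquisition_date).days <= day]
        cum_revenue = sum(p["revenue"] for p in bought)
        cum_payers = len({p["user_id"] for p in bought})
        rows.append({
            "retention_day": day,
            "cohort_size": cohort_size,
            "cumulative_revenue": cum_revenue,
            "cumulative_payers": cum_payers,
            "cumulative_arpu": ratio(cum_revenue, cohort_size),
            "cumulative_arppu": ratio(cum_revenue, cum_payers),
            "conversion_rate": ratio(cum_payers, cohort_size),
            "cumulative_purchases": len(bought),
            "purchases_per_payer": ratio(len(bought), cum_payers),
            "retention_rate": ratio(len(active), cohort_size),
        })
    return rows


cohort_rows = build_cohort(sessions, purchases, date(2015, 1, 1), max_day=2)
```

With this sample data the cohort has two players. On retention day 1 only one of them comes back, so the retention rate is 0.5, and by retention day 2 both have paid, so the cumulative conversion rate reaches 1.0.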
You can build reporting tables with what was defined in this post. In a way, this post serves as an introduction to them. If you build tables with these dataset definitions, you can create dashboards that allow you to follow your game’s performance.