I asked game development communities on Facebook and Reddit what they wanted to read about on this blog. I expected a wide range of interests, but the truth is that most requests were along the lines of “how do I start?” The Setting Up Game Analytics category of posts, which I start today, is about exactly that: how to set up game analytics in your studio, from planning and choosing the technology to defining events and integrating with external services. I expect many posts in this category.
There are two things you need to be aware of before embarking on this journey.
The first one is that all posts will be written in the light of The Player Lifecycle. If you don’t know the basics of it yet (or what the hell I’m talking about), I invite you to read the articles under this category.
The second one is that a full game analytics stack is a complex workflow of services, technologies and actions. My objective in showing you the full stack is not to scare you. Quite the opposite. My objective is to show you the whole workflow so that you can determine where you are and where you want to be.
The Game Analytics Stack
Game is the collection of all systems that make up a game. It can simply be the game itself, as in a single-player game. It can also be the game client plus the game servers. Whenever I refer to the game, I mean the game as a service. If I need to refer to the part the player interacts with, be it a mobile or a browser app, I’ll call it the game client, as opposed to the server infrastructure that serves the game, which I’ll call game servers.
The game provides us with information in the form of events. Many people refer to the defined collection of events as the game’s event taxonomy. Each step of the player lifecycle has specific events and custom events. Defining and reviewing the taxonomy is an ongoing project in every game that has analytics.
A quick note: we may sometimes want to access the game servers’ databases, but it is far preferable to leave these databases serving the game and not analytics. We run massive queries, so accessing the production databases directly is a bad idea. If it has to be done, ask for a live replica of the databases so that analytics queries don’t impact the game’s production environment. Whenever possible, default to using events rather than access to the game servers’ databases.
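To make the idea of a taxonomy a bit more concrete, here is a minimal Python sketch of one way it could be written down and checked. Everything named in it (the event names, the lifecycle stages, the `player_id` and `properties` fields, the `validate` helper) is a hypothetical placeholder I made up for illustration, not a recommended standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical taxonomy: event name -> (lifecycle stage, required properties).
# Names and stages are illustrative assumptions, not a standard.
TAXONOMY = {
    "install": ("acquisition", set()),
    "session_start": ("engagement", {"platform"}),
    "purchase": ("monetisation", {"product_id", "price_usd"}),
}


@dataclass
class Event:
    name: str
    player_id: str
    properties: dict = field(default_factory=dict)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def validate(event: Event) -> list:
    """Return a list of problems; an empty list means the event fits the taxonomy."""
    if event.name not in TAXONOMY:
        return [f"unknown event: {event.name}"]
    _stage, required = TAXONOMY[event.name]
    missing = sorted(required - set(event.properties))
    return [f"missing property: {key}" for key in missing]
```

Calling `validate(Event("purchase", "p1", {"product_id": "gems"}))` would flag the missing `price_usd` property. A check like this can run in the game client, on ingestion, or both; where it belongs depends on your setup.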
External data sources are all forms of data acquired from outside the game. These sources can be external providers such as ad networks, user acquisition and tracking services and app stores, as well as public or private datasets, connections to Google spreadsheets, etc.
There is one exception: analytics services and platforms. I exclude them because they provide the rest of the workflow, partially or in whole. Since the rest of the workflow is the central piece of analytics, as far as technology and services are concerned, I will not include those services and platforms under external data sources.
We receive external data from these sources in several ways. Sometimes we query their databases; other times we need to have an API up to receive their server callbacks. The big difference between events and external data is that we usually define what our events are, how we receive them, their structure, etc. External data is often dirtier and more complex, and it is common for each data source to need its own pre-processing routine before it enters our systems.
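As a rough illustration of per-source pre-processing, the sketch below maps two differently shaped sources onto one common structure. The two sources, their column names, date formats and units are all assumptions invented for this example; real providers will differ.

```python
import csv
import io
from datetime import date

# Two hypothetical sources with different shapes. Column names, units and
# date formats are assumptions made up for this sketch.
AD_NETWORK_CSV = "Day,Campaign,Spend (cents)\n2024/01/05,launch_eu,12050\n"
ATTRIBUTION_ROWS = [
    {"date": "05-01-2024", "campaign_name": " Launch_EU ", "cost": "31.20"},
]


def from_ad_network(text):
    """Ad network CSV: dates as YYYY/MM/DD, spend in cents."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        year, month, day = (int(part) for part in row["Day"].split("/"))
        rows.append({
            "source": "ad_network",
            "day": date(year, month, day),
            "campaign": row["Campaign"].strip().lower(),
            "spend_usd": int(row["Spend (cents)"]) / 100,
        })
    return rows


def from_attribution(records):
    """Attribution provider: dates as DD-MM-YYYY, cost as a decimal string."""
    rows = []
    for rec in records:
        day, month, year = (int(part) for part in rec["date"].split("-"))
        rows.append({
            "source": "attribution",
            "day": date(year, month, day),
            "campaign": rec["campaign_name"].strip().lower(),
            "spend_usd": float(rec["cost"]),
        })
    return rows


# Both sources now share the same keys, types and units.
unified = from_ad_network(AD_NETWORK_CSV) + from_attribution(ATTRIBUTION_ROWS)
```

The point of the design is that each source gets its own small routine, and everything downstream only ever sees the common shape.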
From this point forth lies the magic. This is what you’ll have to build, buy or borrow. There are full package services, partial package services and individual parts. Understanding the several parts is the best way to make an informed decision on what to build, buy, borrow or even neglect.
All the data that we receive should be kept as raw data. Everything else is built from raw data. Raw data, as painful as it is to work with, allows everything. Aggregated data allows only what it was aggregated for. If there is one action point you can take from this post, it is this: always keep your raw data, or some form of access to event-level data.
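A tiny example of why this matters. The purchase events below are invented for illustration, as are the function names. From the raw events I can compute revenue per day and revenue per player; if I had only stored the daily totals, the second question could no longer be answered.

```python
from collections import defaultdict

# Hypothetical raw purchase events: (day, player_id, revenue in whole USD).
RAW_EVENTS = [
    ("2024-01-05", "p1", 5),
    ("2024-01-05", "p2", 10),
    ("2024-01-06", "p1", 5),
]


def daily_revenue(events):
    """The aggregate most reports start with: total revenue per day."""
    totals = defaultdict(int)
    for day, _player, revenue in events:
        totals[day] += revenue
    return dict(totals)


def revenue_per_player(events):
    """A question asked later; it needs the player dimension of the raw data."""
    totals = defaultdict(int)
    for _day, player, revenue in events:
        totals[player] += revenue
    return dict(totals)


aggregated = daily_revenue(RAW_EVENTS)
# `aggregated` has no player_id left in it, so revenue_per_player cannot be
# rebuilt from it. It can only be computed from RAW_EVENTS.
per_player = revenue_per_player(RAW_EVENTS)
```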
There are many data products. Some are well-known buzzwords like A/B testing and machine learning; others are conveniences, where it is easier to pull data from a central data lake and feed it into the game. These data products interact with the game based on the knowledge they get from the data. Whatever the data product learns then feeds back into raw data. I’ll discuss data products as a whole and specific applications of data products individually.
Both external data sources and data products can communicate directly with the game, and often they do. I call this services data, for lack of a better expression. It is achieved through APIs and SDKs. I added it to the graphic so that you know it exists, but it is mostly dealt with by the development teams.
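To show the loop of a data product acting on the game and feeding back into raw data, here is a minimal sketch of A/B test assignment. Hash-based bucketing is one common approach, not necessarily what any given platform does, and the names `assign_variant`, `assignment_event` and `experiment_assigned` are hypothetical.

```python
import hashlib


def assign_variant(player_id, experiment, variants=("control", "treatment")):
    """Deterministically map a player to a variant by hashing experiment and player."""
    key = f"{experiment}:{player_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def assignment_event(player_id, experiment):
    """Build the event that records the assignment, so it lands in raw data."""
    return {
        "event": "experiment_assigned",  # hypothetical event name
        "player_id": player_id,
        "experiment": experiment,
        "variant": assign_variant(player_id, experiment),
    }
```

Because the assignment depends only on the player and the experiment name, the same player always gets the same variant, and the recorded event lets later analysis split any metric by variant.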
Extract, Transform and Load, known for years by IT and DBA folks as ETL, is the process that transforms raw data into reporting tables and datasets. This is not all that ETL is, but for the purpose of this post it is a simplification that fits nicely. In my personal experience, ETL in this context also feeds itself, meaning that the input is not only raw data but also processed and/or aggregated data. I won’t go into much detail regarding ETL. In my opinion it is a data engineering task, and I’m spoiled at Miniclip with top-class data and software engineers who make my work very easy, as long as I define clearly what it is that I need.
The things that I need are reporting tables and datasets. The big difference between the two is that reporting tables are aggregated tables, often used in reporting and visualisation tools. These are massively important to the business: from stakeholders and producers to designers and analysts, they are the entry point for most people in the organisation. Datasets, on the other hand, are the result of one-time queries, either on raw data (through ETL) or on structured representations of raw data, e.g. large columnar databases.
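As a rough illustration of the step from raw data to a reporting table, here is a toy ETL run using only the Python standard library. The raw events, the `daily_active_users` table and the in-memory SQLite database are assumptions chosen to keep the sketch self-contained; a real pipeline would read from your raw data store and write to your warehouse.

```python
import json
import sqlite3

# Hypothetical raw events, one JSON document per line.
RAW_LINES = [
    '{"event": "session_start", "player_id": "p1", "ts": "2024-01-05T10:00:00"}',
    '{"event": "session_start", "player_id": "p1", "ts": "2024-01-05T18:30:00"}',
    '{"event": "session_start", "player_id": "p2", "ts": "2024-01-05T11:15:00"}',
    '{"event": "session_start", "player_id": "p1", "ts": "2024-01-06T09:00:00"}',
]


def extract(lines):
    """Extract: parse raw JSON lines into dicts."""
    return [json.loads(line) for line in lines]


def transform(events):
    """Transform: count distinct players with at least one session per day."""
    players_by_day = {}
    for event in events:
        if event["event"] == "session_start":
            day = event["ts"][:10]  # keep the YYYY-MM-DD part of the timestamp
            players_by_day.setdefault(day, set()).add(event["player_id"])
    return sorted((day, len(players)) for day, players in players_by_day.items())


def load(conn, rows):
    """Load: insert or update one row per day in the reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_active_users "
        "(day TEXT PRIMARY KEY, dau INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO daily_active_users VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_LINES)))
report = conn.execute(
    "SELECT day, dau FROM daily_active_users ORDER BY day"
).fetchall()
```

The resulting table is what a dashboard would read. The raw lines stay untouched, so the same job can be rerun or changed later.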
Datasets are rarely directly exposed. They are used mostly for ad-hoc analysis and research conducted by data analysts, scientists and engineers.
I hope this was a fine introduction to the Setting Up Game Analytics category. There will be a lot to write about this. This is a huge field and it would be great to have feedback on what matters to you the most. So leave a comment. I’m looking forward to hearing from you.