Why questions are better than data

Fred enters the room. The walls are covered with whiteboards, scribbled with words he reads in his LinkedIn feed and with hand-drawn scatterplots and line charts in red and black, green and blue. This is the lair of the data analysts and data engineers. He is very proud that his company has a data science team. He has been reading a lot of nice stuff about data science and big data, and he brags about its impact to his friends.

“Hey Gabriel! I want to ask you something.” Gabriel takes his eyes off his monitor and smiles at Fred: “Hey, what’s up? What do you need?” Fred asks: “If you have the time, can you send me an Excel file with… let’s say… the last 6 months of in-app sales?”

Gabriel thinks for a couple of seconds. “I don’t see why not… if you buy me a beer… or ten!” he replies. As Fred gives a thumbs up and starts moving towards the door, Gabriel interrupts him: “…but mate, what do you want it for? You can get that data from the reporting tools.” Fred returns, frowning like a kid who forgot to bring his homework to school. “You’re right,” Fred continues. “The problem is that the dashboard export will give me aggregated data and I want each individual purchase with user id, player level, country and acquisition source.” They look into each other’s eyes for a couple of seconds and Fred continues: “And product id and sale value, of course.” Gabriel nods and gestures for Fred to wait. His fingers flow over the keyboard as a console on one of the screens fills up with characters. “That is quite a big dataset, Fred. Can you handle 2 million rows of data?”

Fred’s shoulders drop slightly, pulling the corners of his lips into a disappointed frown. “Even if Excel can handle it, I’m not sure I can.” Gabriel straightens up in his chair. “What question do you want to answer with this dataset?” he asks. “I want to know if the players that buy premium cars also buy premium currency. I think that these players are not upgrading the premium cars because there’s not enough premium currency, and later on they leave the game because other players with worse base cars are overpowering them with upgrades.”

“Dude…” Gabriel winks. “If I pull that analysis, you owe me eleven beers.”

So you’re a fiction writer now?

Not really, although there are some things inside a certain drawer somewhere in my house. But that’s not the point.

The point is that questions are better than data. There are two perspectives on this: the business user (Fred, in our little story) and the data analyst (Gabriel).

Fred has a question. In fact, Fred has more than that: he has a hypothesis. We will discuss this whole hypothesis thing in a later post, but this is the single most important thing: there’s a business question here. A very clear one: do players with premium cars but fewer upgrades churn because players with worse cars and better upgrades beat them, given that they don’t have enough premium currency to buy upgrades?

The 3 common mistakes of business users

There are three common mistakes business users make when it comes to data. Fred made all of them!

Fred’s first mistake is that he believes he can answer the question as long as he has the data. While this is true for simple descriptive stuff over aggregated data (think bar chart of total revenue per month), anything beyond this involves specialised skills. Sometimes this pops up in the form of asking for a specific data visualisation. “Make me a pie chart with this and that” is the single worst thing a business user can do to himself data-wise, because as soon as that visualisation appears he believes he is looking at the answer. Crafting the visualisation that best answers a given question is often exactly what data analysts and data scientists specialise in.

Fred’s second mistake is that he believes he knows what data he needs to answer the question. If you think about the question, you’ll notice that more information is needed, e.g. when players left the game or what interactions between the two types of player occurred.

However, the third mistake is the big one! Fred didn’t ask a question. And the worst part is that he had one! There are times when it’s difficult to formulate a question. Even then, it is preferable to dig into what is causing the doubts and define a question than to ask for data… or a graphic, for that matter.

In this hypothetical example, Gabriel is likely to pull several datasets to answer this question, find associations between different purchases, classify the premium car players according to their exposure to players with worse cars and better upgrades, and from that classification infer whether the return rate is statistically different between the two classes. And, to wrap it up, choose a visualisation that makes the answer so obvious that it looks easy to do.
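To make that last inference step a bit more concrete, here is a minimal sketch of one way to compare return rates between two classes of players: a two-proportion z-test, using only the Python standard library. This is just one reasonable choice, not necessarily what Gabriel would pick. The function name, the class labels and all the counts are made up for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(returned_a, total_a, returned_b, total_b):
    """Two-sided z-test for a difference between two return rates.

    Assumes both groups are large and the pooled rate is strictly
    between 0 and 1 (otherwise the standard error would be zero).
    """
    rate_a = returned_a / total_a
    rate_b = returned_b / total_b
    # Pooled rate under the null hypothesis that both classes return equally often.
    pooled = (returned_a + returned_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return rate_a, rate_b, z, p_value


# Made-up counts, purely for illustration (not real game data).
rate_exposed, rate_other, z, p = two_proportion_z_test(
    returned_a=180, total_a=600,  # hypothetical: often beaten by better-upgraded rivals
    returned_b=260, total_b=650,  # hypothetical: rarely beaten by them
)
print(f"return rate exposed={rate_exposed:.1%}, other={rate_other:.1%}")
print(f"z={z:.2f}, p={p:.4f}")
```

A small p-value would suggest the two classes really do return at different rates. It would not, on its own, prove that the upgrades are the cause, which is one more reason to start from the question and not from the dataset.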

Maybe that’s the problem… that the end result looks like something anyone could do. But that’s how we get the message across: making it look and sound as simple as possible, but not simpler.

Final Thoughts

In the end, a large part of the responsibility for this lies with the analyst. If you are an analyst, you should always ask “what is your question?” or “what is the business question?” I kid you not: if you are an analyst and you don’t ask this, you are doing a disservice to yourself and to your business user colleagues.

If, on the other hand, you are a business user, I can only hope that this leads you to ask questions rather than ask for data.


