Yep it’s a rant… announcing ahead that a blog post is a rant is almost a tradition since before blogs were cool and web forums were the thing. If someone in some obscure web forum started a post with <rant> or “Rant Mode On” and other derivatives, he was signaling “I’m not a troll but I’m really upset!”
It is a message with mixed signals between “keep reading because I’m going to get nasty and you’ll like it” and “you can just skip if you need apologies if and when a web user get’s crazy”.
Either you got the idea by now or you know what I’m talking about for two whole paragraphs and you’re itching for the juicy stuff. So, here goes…
Rant Mode On!
What am I talking about?
There are many things that are objectively better than others. For instance, one week vacation in Maiorca vs. lifetime in prison. That should be a no brainer and you don’t need to be in prison to know that one week in a Mediterranean island is probably better. These two things are so different that it is a clear “black or white” situation. In Island vs. Prison, Island ALWAYS wins!
However many things have infinite shades of grey. In the context of analytics and data science, there are too many of X vs. Y cases that are shaded. For instance:
- Languages, e.g. R vs. Python
- Fields, e.g. Statistics vs. Machine Learning
- Algorithms, e.g. Random Forests vs. SVM
It is a pure waste of time and and a misuse of intellect. Yeah I’ll say it out loud (as if it wasn’t written in the title): it is DUMB! As I see it, it appears due to ignorance of the “opposite” side and I quote opposite because there is no opposite. All of these things complement each other.
It is impossible for anyone to know everything in the realm of data science. Therefore, each one of us surrounds itself with the tools that solve whatever problems we have. It is only natural that there are other possibilities but the mindset seems to be “if I use it, it’s the right one”. It’s a bit like religion, there are hundreds of them, but mine is the one!
What is borderline frustrating is that the whole field of data science as a practice revolves around a well known even if often misinterpreted theorem called “No Free Lunch Theorem” which states (according to Wikipedia):
(…) any two optimization algorithms are equivalent when their performance is averaged across all possible problems. (…)
Regardless of your thoughts or knowledge on the theorem, I’d like to generalise this to everything in data science. Any two languages, any two fields, any two techs, any two products, or like the case of the theorem, any two algorithms. And my reasoning for this is there is no such thing as one of the things completely dominating the “opposite” other. Isn’t it ironic that one of the fundamental theorems that fuels so many of our decisions isn’t taken into consideration when we are discussing our preferences? Aren’t we, the data dudes, the people that should be evidence based and skeptical about opinions?
Why aren’t we on this?
Each and every one of them has use cases where it out performs the competition. That is undeniable. But being an Island vs. a Prison, I have not met one. So why can’t we just get along and share our experiences instead of aimlessly debate which is better? There isn’t a global better. There’s a local better.