I got into data science for the results. I approached it not because I found it enticing or interesting, but because I had a problem to solve. For me, data science was the chisel and the hammer: the tool I needed to carve the sculpture I wanted to make. I did not particularly like hammers or chisels, but I loved the idea of making a beautiful sculpture.
My case is probably not typical. It is not very common now for people to have a problem and go into data science to solve it. It is far more common for people to go into data science the way they go into calculus in high school: to learn the toolkit without knowing what they’ll be making with it, to learn how to hammer and to chisel with no particular aim but to hammer and to chisel.
This can create an obsession with improving the tools and with the metrics that tell you how good they are. In data science, it shows up as an obsession with scoring metrics on standardized data sets. It ends up confusing the goal of making better tools with the goal of making sculptures. In the real world, however, you’re usually not faced with the problem of improving tools; you’re given a job to do with whatever tools you have available.
The shock of moving from academia to real-world problems in data science can be quite big. In real life, people don’t care much about how fancy your model is or how much you can improve your cross-validation scores. What counts is whether your models can accurately predict the future: future ad revenue, future sales numbers, future fires, future crime rates, future whatever. It’s all about the future. That is the elusive ability to “add real value” that the people who hire data scientists want more than anything else.
And as the proverb goes, it’s very hard to make predictions, especially about the future. The future always carries uncertainty, so predicting it successfully requires you to be very wary of all the sources of bias that you might carry into model building and data analysis. These biases can be mostly immaterial when you’re doing academic research or just learning from curated vanilla problems.
To those of you who want to eventually use data science skills for real-world applications, my invitation, which is what I’m trying to do with this blog, is to leave the iris data set behind and, even if you know nothing about data science, start with a real-life question that you want to answer. Think about the sculpture you want to make, then pick up the hammer and the chisel and go for it. It will take time, but learning to use tools with a clear objective is, at least in my opinion, way more gratifying than just trying to learn how hammers hammer and chisels chisel.