So you think you can ‘Data Science’?
“No, sir. CSV is not a database,” I said as I sipped my third cup of coffee at 9 am. That is how my day began. How about yours?
We all know the good old, overly simplistic(?) description of the Data Scientist role – Data Cleaning, Exploration, Statistical Modelling/Machine Learning and Visualization. I don’t think this needs reiteration. Let’s talk instead about three key qualities.
Pragmatic data curiosity – I feel data cleaning, transformations and feature engineering are all important, but secondary. First you need to understand your dirty data. How was it collected? How many sources does it come from? Have some values been defaulted or imputed? Can I assume this? Wait, doesn’t this look odd? But is it odd enough to spend my entire day investigating? These are the kinds of questions you need to think through. Try to become a practical skeptic; this will become second nature as you build domain experience.
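To make the "have some values been defaulted?" question concrete, here is a minimal sketch of the kind of quick check I mean. Everything in it is made up for illustration: the `profile_column` helper, the list of missing markers and the sample ages are all assumptions, not a standard recipe. It counts missing entries and reports how much of the column is taken up by its single most common value, which is often the first hint of a silent default.

```python
from collections import Counter

def profile_column(values, missing=(None, "", "NA")):
    """Hypothetical helper: summarise missing entries and the dominant value of one column."""
    # Keep only entries that are not one of the assumed missing markers.
    present = [v for v in values if v not in missing]
    n_missing = len(values) - len(present)
    if not present:
        return {"n": len(values), "missing": n_missing,
                "top_value": None, "top_share": 0.0}
    # The most frequent value and the share of present entries it covers.
    top_value, top_count = Counter(present).most_common(1)[0]
    return {
        "n": len(values),
        "missing": n_missing,
        "top_value": top_value,
        "top_share": top_count / len(present),
    }

# Made-up ages: the repeated 0 looks like a default, not a real age.
ages = [34, 0, 27, 0, None, 0, 41, 0, 58, ""]
report = profile_column(ages)
print(report)
```

A value like 0 covering half of an age column is odd enough to ask whoever collected the data about it. Whether it is odd enough to spend your whole day on is still your call.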
ABC (Always Be Coding) – You don’t start out as a proficient coder, but enough practice will get you there. Eventually you should be able to code as you think. Often you’ll need to hack your way out. The formal agile methodology, with its documentation and ‘sprints’, rarely works in DS. Sometimes you might do 4-5 sprints in a single day if required. Solid coding experience will go a long way!
Storytelling – If you’re telling the client that ‘your model had to be penalized to account for the high false positive rate’, good luck keeping them awake through the snooze-fest. Simplify! Don’t fret over the tool or the type of graph you’re using. At the risk of sounding hyperbolic, ask yourself whether a grandma who can barely FaceTime on an iPhone could comprehend your results. If the answer is no, there is a better way to communicate your analysis and recommendations. Find it.
Good luck. Thank you!
By: Swapnil Phulse