Most common mistakes people make when working with data
What are some common mistakes people make when working with data?
In my short experience, the most common mistakes I have seen people make when working with data were (and still are):
- Not realizing that data needs to be cleaned. This happens everywhere, but more frequently in academia: people with little experience try to solve everything with complicated models they find in libraries and on Stack Overflow, without taking the time to clean and arrange the data the way those models require.
- Trying to use a plethora of tools to solve a problem. Instead of gaining in-depth knowledge of a few powerful tools, people often reach for a different tool for each specific problem, without realizing that most well-established tools offer the same functionality.
- Not understanding what they are using. Instead of studying the models in depth to understand how they work and how they are built, many people try as many models as they can (neural networks, SVMs, etc.) to see which one performs better, when they should study the models and select the ones suited to the job at hand.
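To make the first point concrete, here is a minimal sketch of what a basic cleaning pass might look like, using only the Python standard library. The sample data, the `MISSING` marker set, and the `clean_rows` helper are all hypothetical, made up for illustration; a real dataset will need its own rules.

```python
import csv
import io

# Hypothetical raw data with typical problems: stray whitespace,
# inconsistent missing-value markers, a duplicate row, a non-numeric age.
RAW = """name,age,city
 Alice ,34,Lisbon
Bob,N/A,Porto
Alice,34,Lisbon
Carol,abc,
"""

# Assumed set of strings that should count as "missing" (lowercased).
MISSING = {"", "n/a", "na", "null", "none", "?"}


def clean_rows(text):
    """Return de-duplicated rows with trimmed strings and a typed age."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Trim whitespace around every value.
        row = {k: (v.strip() if v is not None else "") for k, v in row.items()}
        # Map every missing-value marker to None.
        row = {k: (None if v.lower() in MISSING else v) for k, v in row.items()}
        # Convert age to int; treat unparseable values as missing.
        try:
            row["age"] = int(row["age"]) if row["age"] is not None else None
        except ValueError:
            row["age"] = None
        # Skip rows that are exact duplicates after normalization.
        key = tuple(row.items())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


rows = clean_rows(RAW)
for r in rows:
    print(r)
```

Even in this toy case, the second "Alice" row only shows up as a duplicate after the whitespace is trimmed, which is the kind of issue a model will not fix for you.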
For me, these are the three biggest problems when working with data and machine learning. People should focus on understanding the problem, and the tools available to solve it, before trying to do anything.
By: Vasco Lopes