Pandas, Pandas, Pandas
Data cleaning and data manipulation are the skills that I wish I had learned early on … and practiced often!
I am learning how to use Python for data science, where the pandas library handles most tasks related to data manipulation. So, to illustrate the importance of data manipulation in data science, I will use pandas as an example. As a bonus, I got to create a catchy title based on lyrics by Desiigner.
My latest group project looked at the results of the 2016 general election between Donald Trump and Hillary Clinton.
We looked at battleground states such as Iowa and wanted to gain insight into the counties that switched from Obama in the 2012 election to Trump in 2016. This involved a lot of county-level demographic data.
We made it work and got cool insights with the help of other libraries: Plotly to show county-level demographic visualizations, and matplotlib to visualize key data.
But none of this would have happened without the pandas skills we used to clean and manipulate the data. The imported CSV files were messy and bloated, but they contained nuggets of vital information that we needed to proceed. Creating useful dataframes out of those initial files, with only the information relevant to our project, laid the foundation for all the ensuing analysis, plots, and maps.
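To give a feel for what that cleanup can look like, here is a minimal sketch. It is not our exact project code: the county names, column names, and vote counts are all made up for illustration, and I am assuming a CSV with untidy headers, stray whitespace, an unused column, and a missing value.

```python
import io

import pandas as pd

# Hypothetical stand-in for a messy county-level CSV.
# Every name and number below is invented for illustration.
raw_csv = io.StringIO(
    "County Name ,State,dem_2012,rep_2012,dem_2016,rep_2016,unused_col\n"
    " County A,IA,5200,4800,4500,5500,x\n"
    "County B ,IA,3000,7000,2800,7200,y\n"
    "County C,IA,6100,3900,6000,4000,z\n"
    "County D,IA,,4100,3900,4300,w\n"
)

df = pd.read_csv(raw_csv)

# Tidy the headers: trim whitespace, lowercase, spaces to underscores.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Keep only the columns the analysis needs.
vote_cols = ["dem_2012", "rep_2012", "dem_2016", "rep_2016"]
df = df[["county_name", "state"] + vote_cols].copy()

# Trim stray whitespace from the county names.
df["county_name"] = df["county_name"].str.strip()

# Drop rows with missing vote counts, then store the counts as integers.
df = df.dropna(subset=vote_cols)
df = df.astype({col: "int64" for col in vote_cols})

# A county "flipped" if the Democrat led in 2012 and the Republican led in 2016.
df["flipped"] = (df["dem_2012"] > df["rep_2012"]) & (df["rep_2016"] > df["dem_2016"])

flipped = df.loc[df["flipped"], "county_name"].tolist()
print(flipped)
```

With a small, tidy dataframe like this, the flipped counties become a single boolean column that can feed straight into plots and maps.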
Using pandas for data manipulation was the most difficult and time-consuming part of the project – luckily, I had a group to help me through.
Now when I look through datasets online for project ideas, I know it is unlikely they will come ready for the analysis that I want to do. But after diving deeper into the pandas library, I look forward to this task. If only I had known the power of data manipulation sooner!
By: Mudit Mathur