Python and The Beautifully Ridiculous Irrationality of Human Choice
Human beings are wonderful, terrible, amazing, fascinating… and WILDLY inconsistent when it comes to taking surveys! Analyzing the data produced by these fabulous creatures sometimes feels like being a porcupine trying to make balloon animals: The end result will be pretty darn awesome, but there will be a lot of loud noises, swearing, and possibly some tears on the way there.
So… one very important technique I wish I’d learned/mastered early on is DATA CLEANING! Analysis is sooooo much easier with a nice, clean dataset. But particularly when it comes to data generated from choice experiments such as Choice-Based Conjoint and MaxDiff (which is primarily what I work with), we need to balance “cleaning out inconsistencies” with “accounting for the beautiful irrationality of the humans that take our surveys” if we want the end results to be realistic. Clean too little and the results might be biased by respondents who choose based on an overly simplistic heuristic; clean too much and we’ll lose the true randomness inherent in choice and human behavior.
Something I love about Python is how data frames simplify the process of creating and combining multiple filters for cleaning data. Because the filters sit alongside the raw data instead of overwriting it, I can test out different approaches to cleaning without any data loss, and it’s really easy to return to a previous version. I’ve learned quite a bit in practice over the years, but I’m always eager to learn more techniques to help streamline the process and improve the quality of our data!
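To make the filter idea concrete, here is a minimal sketch, assuming the pandas library and a respondent-level summary table. The column names (median_task_seconds, same_position_share), the thresholds, and the data are all made up for illustration; they are not from a real study or a recommended standard. Each filter is stored as its own True/False column, so different cleaning rules can be compared while the original data frame stays untouched.

```python
import pandas as pd

# Hypothetical respondent-level summary of a choice experiment.
# Column names, values, and thresholds are illustrative assumptions.
responses = pd.DataFrame({
    "respondent_id": [101, 102, 103, 104, 105],
    "median_task_seconds": [12.4, 1.8, 9.7, 15.2, 2.1],
    "same_position_share": [0.25, 0.40, 1.00, 0.30, 0.95],
})

# Each filter is a named boolean column; the raw data is never modified.
filters = pd.DataFrame({
    "speeder": responses["median_task_seconds"] < 3.0,
    "straightliner": responses["same_position_share"] >= 0.9,
})

# Two candidate cleaning rules built from the same filters.
drop_lenient = filters.all(axis=1)  # flagged by every filter
drop_strict = filters.any(axis=1)   # flagged by at least one filter

# Boolean indexing returns new data frames, so `responses` is still intact.
cleaned_lenient = responses[~drop_lenient]
cleaned_strict = responses[~drop_strict]

print(len(responses), len(cleaned_lenient), len(cleaned_strict))
```

Switching between the lenient and strict rule is just a matter of picking a different mask, which is what keeps it cheap to try a cleaning approach and back out of it again.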
~ Tracey Di Lascio-Martinuk