Data Quality: Does It Have to Be So Difficult?
I am increasingly hearing and reading about data quality and governance. It’s not a super sexy, spicy topic like visualizations, AI, ML, etc. However, it deserves a lot more attention. I can’t even count the number of times I have heard about dirty data and data cleansing. The statistic I hear most often is that we spend, on average, 70% to 80% of our time on these activities. Does it need to be so hard? What can we do to help reduce quality problems?
In practice, quality is measured both qualitatively and quantitatively. The data needs to address an intended business use case to facilitate actionable insights and decisions. Adding more sources of data increases complexity. In addition, each time a data source is touched or modified (think ETL), there is yet another opportunity for quality control problems. Different views of quality can cause a disconnect and a lack of agreement. Businesses that are data driven tend to recognize this issue better than others because their business is run on a strategy, culture, people and technology that support the quality cause. Informatica has a good view of this topic (see https://www.informatica.com/services-and-training/glossary-of-terms/data-quality-definition.html).
We face a lot of barriers to getting to quality data: the explosion of technology resources, competition for employee time, an inability to focus on what really matters, and the question of how to agree on quality. Data governance, profiling, reporting and master data management are core aspects of quality. But as much as these are great methods to drive for quality, I can think of numerous cases where that drive is complicated by roadblocks: technology teams, competing needs (lack of time), technology plug and play or the lack thereof, data transformations, and the multiple data sources / systems needed to meet business requirements.
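As a small illustration of the profiling and reporting mentioned above, here is a minimal Python sketch of what a quantitative quality check might look like. The `profile` function, the `customers` records and the choice of metrics (completeness, distinct values, duplicate keys) are all hypothetical, picked only for illustration; a real program would agree on its own measures with the business.

```python
from collections import Counter


def profile(rows, key):
    """Return simple per-column quality metrics for a list of dict records.

    Hypothetical helper: 'rows' is a list of dicts, 'key' names the column
    that is expected to identify each record uniquely.
    """
    total = len(rows)
    columns = sorted({col for row in rows for col in row})
    column_report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v not in (None, "")]
        column_report[col] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(set(present)),
        }
    # Key values that appear more than once point at duplicate records.
    key_counts = Counter(row.get(key) for row in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1 and k is not None)
    return {"rows": total, "columns": column_report, "duplicate_keys": duplicates}


# Made-up sample data, for illustration only.
customers = [
    {"customer_id": 1, "email": "a@example.com", "country": "US"},
    {"customer_id": 2, "email": "", "country": "US"},
    {"customer_id": 2, "email": "b@example.com", "country": None},
    {"customer_id": 3, "email": "c@example.com", "country": "DE"},
]

report = profile(customers, key="customer_id")
for column, metrics in report["columns"].items():
    print(column, metrics)
print("duplicate keys:", report["duplicate_keys"])
```

Running a check like this after every transformation step, and tracking the numbers over time, is one way to see where quality leaks in.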
An approach that can be helpful
Executive management can consider employing an agile scrum strategy and SME teams tasked with organization-wide data needs. This effort would evaluate and understand the technology infrastructure and have visibility into data needs across the organization. The team’s task would include ensuring the clear development of data literacy, governance, master data management, and data quality reporting. Since this team is broad and diverse, it might best sit in a center of excellence.
By moving to a robust, system-wide and up-to-date quality approach, we can reduce the leak points. We can ask and document questions about why business requirements need the transformation of data, the joining of multiple data sources / systems, and other changes. This empowers the team to evaluate and implement appropriate, documented and controlled updates that can be tracked. This doesn’t mean we move away from disparate data needs across an organization. It does mean, however, that the organization has a total view of the requirements, including where they share common and competing needs and elements. Clearly such a team must be very experienced in the business, which for a large organization is often very broad and complex. In addition, its members must possess the technical and communication skills to partner cross functionally and empower the business units to realize value and actionable insights. Such a group would ideally have a key team member in each of the various business units, operational areas and, for multinational organizations, perhaps geographies. The end idea here is a clear view at the top level of the organization. Agile teams are equipped with the skills and experience to deliver on this recommendation, and if the organization truly practices scrum there will be scrum masters who are empowered to execute the program.
To be successful, this approach will need to be agile; there can’t be micromanagement. Based on my personal experience, an effort like the one I am recommending will need executive leadership buy-in and a data strategy that helps reduce bottlenecks and supports backlog management, prioritization, etc. It will require commitment and constant change.
Data quality doesn’t have to be hard. It might be complex and in constant flux, and it might be a big investment of time and people, but the results of quality will reduce expenses, improve employees’ reliance on and belief in the data, empower the organization to have a total visibility approach, and keep the organization focused on delivering value.
Principal Analyst | Data Engineer