4 Strategies for Improving Data Quality in Retail
The speed of AI development has dominated technology news for the past year. As prospective users of new AI technology work out where it fits, the conversation frequently comes back to one prerequisite: data quality.
Generally speaking, starting with better data produces better results. In dbt’s 2024 State of Analytics Engineering report, the largest issue voiced by data professionals was data quality. “Poor data quality is the most prevalent problem today,” the report states. “As data-related complexity increases, maintaining quality is becoming a greater struggle for teams. Already a concern in the last edition of this report, this has escalated as a significant hurdle for data practitioners; 57% of respondents now highlight it as one of their chief obstacles in preparing data for analysis, up from 41%.”
The challenge, however, is understanding which factors contribute to improving data quality in retail; these can vary wildly depending on an organization's context and the particular use cases it is considering.
What is Data Quality?
High-quality data is easy to state as a goal, but defining it is more complicated: it is highly situational and use-case dependent. Consider this: Does a finance team need the same level of accuracy as a sales analyst? When a TV ad runs for three weeks, how should its cost be allocated? By day? Based on sales conversions? If a customer starts browsing at 11:58 PM and checks out at 12:03 AM, which day claims that revenue?
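To make the midnight scenario concrete, here is a minimal Python sketch (field names are hypothetical) showing that the "correct" revenue day depends entirely on which rule the team agrees to:

```python
from datetime import datetime

# Illustrative only: two equally defensible rules for attributing an
# order that spans midnight. All field names are hypothetical.
order = {
    "session_start": datetime(2024, 3, 1, 23, 58),
    "checkout_time": datetime(2024, 3, 2, 0, 3),
    "revenue": 59.99,
}

def revenue_day_by_session(o):
    """Attribute revenue to the day the customer started browsing."""
    return o["session_start"].date()

def revenue_day_by_checkout(o):
    """Attribute revenue to the day payment was captured."""
    return o["checkout_time"].date()

print(revenue_day_by_session(order))   # 2024-03-01
print(revenue_day_by_checkout(order))  # 2024-03-02
```

Neither rule is wrong; the problem arises when finance, marketing, and engineering each quietly assume a different one.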
Beyond accuracy, data must also be representative. A misspelled email address can pass every automated validation, integrity, and accuracy check, yet it won't do much good when someone actually tries to reach the customer. A customer could move and forget to update their mailing address. A price change could silently warp KPIs in downstream reporting.
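As a simple illustration of the email example, a typical format check accepts a syntactically valid but misspelled address without complaint (the pattern below is a common simplification, not a full RFC-compliant validator):

```python
import re

# A common format check: confirms the address is syntactically
# plausible, but says nothing about whether it is reachable.
EMAIL_PATTERN = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def is_valid_format(email: str) -> bool:
    return bool(EMAIL_PATTERN.match(email))

print(is_valid_format("jane.doe@gmial.com"))  # True, despite the typo
```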
The point is that accuracy is not an absolute; it is something we agree on, and data quality should be viewed in the same light. Teams must establish a shared definition of quality and accuracy for each use case; otherwise, there will always be disconnects and misalignment.
Strategies for Improving Data Quality in Retail
There are many ways retailers can improve their data quality to harness its full potential and drive strategy and operations. We break down some of these strategies below.
Address data quality at the origin
Proactively addressing data quality at the origin can significantly reduce downstream issues and resource drain. It is common for data engineering teams to compile, validate, and clean data after it lands in a downstream location. By tackling data quality at the source instead, either in logic during the load process or within the source system itself, organizations can achieve multiple benefits, such as:
- Reduced workload by correcting errors early, minimizing the need for corrective actions downstream
- Improved data consistency and trust by addressing issues at the source
- Better scalability and cost savings as the volume of data grows
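As a rough sketch of what addressing quality during the load process can look like, the Python snippet below (column names and file paths are hypothetical) validates records and quarantines rejects at the source rather than passing them downstream:

```python
import pandas as pd

def load_sales(path: str) -> pd.DataFrame:
    """Load sales data, keeping only rows that pass basic checks."""
    df = pd.read_csv(path, parse_dates=["transaction_time"])

    valid = (
        df["sku"].notna()
        & (df["quantity"] > 0)
        & (df["unit_price"] >= 0)
    )

    # Quarantine rejects for review at the source instead of letting
    # downstream teams discover and patch them later.
    df.loc[~valid].to_csv("rejected_rows.csv", index=False)
    return df.loc[valid]
```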
Addressing data quality at the origin is even more crucial when implementing a data lake strategy. Data lakes require proper data governance around quality to remain valuable. This may include different ways of organizing and storing data, as well as creating rules to ensure data consistency. Without governance, a data lake can quickly become polluted and unusable.
Increase data literacy
Data literacy is the ability to understand and communicate with data. Building it can enhance both the value of data across an organization and data quality from the source to end users. According to a dbt survey, nearly half of data professionals believe that improving data understanding among their colleagues would greatly benefit their organization.
To boost data literacy, organizations should consider adopting some of these common best practices:
- Host regular data training sessions
- Build and actively maintain a data dictionary (a minimal sketch follows this list)
- Encourage cross-functional collaborations
- Recognize and reward data-driven decision-making
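The data dictionary in particular doesn't require heavyweight tooling to deliver value. Here is a minimal sketch of what a first version might look like, with hypothetical field names:

```python
# A data dictionary can start as something this simple: a shared,
# versioned record of what each field means and who owns it.
data_dictionary = {
    "transaction_time": {
        "definition": "Timestamp when payment was captured (store local time)",
        "type": "datetime",
        "owner": "data-engineering",
    },
    "promotion_type": {
        "definition": "Campaign category applied at checkout; empty if none",
        "type": "string",
        "owner": "marketing-analytics",
    },
}
```

Even a shared file like this gives end users a common vocabulary and a place to go when definitions drift.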
The impact of data literacy on data quality is tangible across all levels of an organization. For example, a store manager with strong data literacy might notice unusual patterns in sales data, flagging potential issues before they impact inventory decisions. This increased understanding fosters a data-driven culture across business functions where decisions are based on reliable information rather than intuition alone.
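That kind of pattern-spotting can also be formalized. Below is a minimal, illustrative sketch of one way to flag unusual daily sales; the 28-day window and three-sigma threshold are arbitrary choices, not recommendations:

```python
import pandas as pd

def flag_unusual_days(daily_sales: pd.Series) -> pd.Series:
    """Flag days whose sales deviate sharply from the recent trend."""
    rolling_mean = daily_sales.rolling(28).mean()
    rolling_std = daily_sales.rolling(28).std()
    z_score = (daily_sales - rolling_mean) / rolling_std
    # True where sales fall more than three standard deviations
    # from the trailing 28-day mean.
    return z_score.abs() > 3
```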
Broaden data engineers’ perspective
Just as end users benefit from increased data literacy, data engineers who understand the business use cases for data are better equipped to identify and prioritize quality in their pipelines. This broader perspective leads to greater accuracy and relevance when meeting the needs of various downstream users across the organization.
For example, imagine a finance analyst requesting a sales table. A data engineer with a limited perspective might simply deliver the requested table without further thought. The resulting table may contain basic sales metrics but will lack the dimensions needed for in-depth analysis, like payment types or transaction times. Compare this with an engineer who takes a use-case view: the engineer meets with the analyst and learns the goal is a trend analysis of recent promotions, then prioritizes the quality of the transaction dates, promotion types, and customer data. This enables the finance analyst to create relevant customer segmentations, evaluate promotion effectiveness within those groups, and trust the validity of the transaction timestamps.
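In code, the difference between the two engineers might look like the hedged sketch below; all table and column names are hypothetical:

```python
import pandas as pd

def build_sales_table_narrow(tx: pd.DataFrame) -> pd.DataFrame:
    """Answers the literal request and nothing more."""
    return tx[["order_id", "revenue"]]

def build_sales_table_for_promo_analysis(tx: pd.DataFrame) -> pd.DataFrame:
    """Carries the dimensions the promotion analysis actually needs."""
    cols = [
        "order_id",
        "revenue",
        "transaction_time",   # validated timestamps for trend analysis
        "promotion_type",     # which campaign, if any, applied
        "payment_type",
        "customer_id",        # enables customer segmentation
    ]
    out = tx[cols].copy()
    # Quality effort goes where the use case needs it most.
    assert out["transaction_time"].notna().all(), "missing timestamps"
    return out
```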
This use-case driven approach encourages data engineers to:
- Understand the wider context of data use across different departments and business functions
- Anticipate data requirements beyond the initial request
- Proactively address data quality issues at the source before they impact use cases
- Facilitate more effective collaboration between cross-functional teams
Implement data traceability
Establishing data traceability is crucial for maintaining data quality. It provides a comprehensive understanding of data’s lifecycle and ensures its integrity at every step. This involves three components, each contributing to the integrity and trustworthiness of the data:
- Data Lineage: Tracks data’s complete lifecycle and identifies potential failure points
- Data Governance: Defines responsibilities and processes for maintaining data quality
- Metadata Management: Provides context about data, enhancing understanding and trust
Together, these components offer a robust framework for data traceability and set the foundation for continuous improvement, in turn enabling organizations to generate accurate and reliable insights from their data.
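As a toy illustration of how lineage and metadata reinforce each other, consider a dataset object that carries its own history. Real implementations typically rely on dedicated catalog and lineage tooling; this sketch, with hypothetical names, only shows the idea:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TracedDataset:
    """A dataset that records where it came from and what was done to it."""
    name: str
    source: str
    loaded_at: datetime
    transformations: list[str] = field(default_factory=list)

    def record_step(self, description: str) -> None:
        """Append a transformation step to the lineage trail."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.transformations.append(f"{stamp} :: {description}")

sales = TracedDataset(
    name="sales_daily",
    source="pos_exports/2024",
    loaded_at=datetime.now(timezone.utc),
)
sales.record_step("deduplicated on order_id")
sales.record_step("joined promotion_type from campaign table")
```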
Wrapping Up
In practice, implementing each of the above strategies may lack the glamor of brand-new software or upgrades. However, they represent critical, long-term investments in data quality. By committing to these approaches, organizations can proactively prevent or mitigate data debt (the inconsistent data quality that accumulates when governance is lacking) before it becomes a significant issue.
Clarkston’s team of experienced data experts is here to help identify problems and recommend solutions for the challenges organizations face when creating and maintaining or improving data quality in retail. Learn more about our Digital + Data Analytics services today.
Contributions by Matt McMichael