As more organizations embrace digitization and process improvement, they often fail to reap the full benefits of this transformation by neglecting to ensure that their underlying data is trustworthy. That should be Step 1 in any changeover.
Enterprises integrate an average of six different types of data and 10 different data management technologies, according to a 2019 IDC enterprise data integration and integrity survey. As a result, data workers waste an average of 15 hours per week on data search, preparation and governance processes.
These challenges create data quality-assurance issues that make their impact felt downstream. Just as physical production lines are at the mercy of the quality of incoming raw materials, investments in digitization and process improvement can’t reach their full potential if the incoming raw data is tainted. Bad data in means bad data out.
As businesses begin to take this into consideration, their perception of data changes. Rather than simply working to access as much data as possible and getting it into the hands of different teams across the business, they’re considering issues such as data integrity and lineage.
In this eWEEK Data Points article, with industry information supplied by data quality software provider Talend, we offer a set of best practices for IT managers.
Data Point No. 1: The 1-10-100 Rule of Data
The “1-10-100” rule highlights the hidden cost of poor data quality: it costs roughly $1 to verify and clean a record at the point of entry, $10 to correct it after it has entered downstream systems, and $100 if it is never fixed and the business acts on it. Every dollar spent on prevention therefore saves $10 in correction costs or $100 in failure costs.
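The rule's arithmetic can be sketched in a few lines; the dollar figures are the rule's rough multipliers, not measured costs:

```python
# Relative cost of handling bad records at each stage, per the 1-10-100 rule.
COST_PER_RECORD = {
    "prevention": 1,    # verify and clean the record at entry
    "correction": 10,   # fix the record after it reaches downstream systems
    "failure": 100,     # cost of acting on the record uncorrected
}

def stage_cost(bad_records: int, stage: str) -> int:
    """Estimated cost of dealing with bad_records at the given stage."""
    return bad_records * COST_PER_RECORD[stage]

# 1,000 bad records caught at entry vs. left to fail downstream:
print(stage_cost(1000, "prevention"))  # 1000
print(stage_cost(1000, "failure"))     # 100000
```

The point of the rule is the two orders of magnitude between the first and last lines, which is why data quality assurance belongs at Step 1 of a transformation rather than at the end.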
Traditionally, oversight of data quality was limited to specialist data experts, but the democratization of data throughout a business means more parties now have a stake in the trustworthiness of that data. Data can offer valuable insight across the business, from finance to marketing, but these teams need a way to gauge its reliability.
Data Point No. 2: The 5 T’s of Data Trust
How can we determine whether we can trust available data sets? Talend does this by applying “the 5 T’s” of trust, verifying that data is thorough, transparent, timely, traceable and tested. Each “T” plays a significant role in data trustworthiness, and metrics such as Talend’s Trust Score, a quality indicator for a data set, put this power into the hands of non-technical people.
The 5 T’s of Talend Data Trust:
- Thorough: Trusted data is clean and complete
- Transparent: Trusted data is accessible and understandable
- Timely: Trusted data is readily available
- Traceable: Trusted data tells you where it came from and how it has been used
- Tested: Trusted data has been rated and certified by other users
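To make the idea concrete, here is a minimal sketch of rolling the five dimensions into a single score. The 0-to-1 per-dimension scores and the simple averaging are illustrative assumptions, not Talend's actual Trust Score formula:

```python
# Hypothetical composite trust score over the five T's listed above.
FIVE_TS = ("thorough", "transparent", "timely", "traceable", "tested")

def trust_score(scores: dict) -> float:
    """Average the five dimension scores (each 0.0-1.0) into one number."""
    missing = [t for t in FIVE_TS if t not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(scores[t] for t in FIVE_TS) / len(FIVE_TS)

example = {
    "thorough": 0.9,     # clean and complete
    "transparent": 0.8,  # accessible and understandable
    "timely": 1.0,       # readily available
    "traceable": 0.7,    # lineage is documented
    "tested": 0.6,       # rated and certified by other users
}
print(trust_score(example))  # 0.8
```

Even a simple roll-up like this lets a non-technical consumer compare data sets at a glance, while the per-dimension scores show where a weak set falls short.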
Data Point No. 3: The importance of data governance
Appreciating data provenance–the origin of data and its journey–offers businesses new levels of data quality assurance with clear business benefits, just as primary producers take advantage of provenance data to track quality assurance from paddock to plate. This is particularly important when businesses must meet growing regulatory obligations, including privacy laws across various jurisdictions.
Data Point No. 4: Working with untrustworthy data can have long-term consequences
Working with untrustworthy data can have major impacts, including long-term reputational damage. Respected medical journal The Lancet was recently forced to retract a heavily criticized study into the effects and side effects of the controversial drug hydroxychloroquine in combating COVID-19.
The study prompted the World Health Organization to pause trials on the drug, but trials resumed after the publication’s retraction, with several of the paper’s authors stating they were unable to verify the reliability of the data at the heart of the study.
Data Point No. 5: The value of data trust scores
Data trust scores are not simply automated metrics; they are also influenced by ratings given by people working with that data set along the data supply chain. While these people are not all data scientists, they often possess the expertise to offer valuable insights–such as local sales managers who know their customers and market better than data scientists working in head office.
The further this extends back through the data supply chain, the more an organization reaps the benefits of feeding high-quality data into business processes. It also helps avoid the significant consequences of working with questionable data, from poor product planning and marketing to unintended regulatory breaches.
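The blend of automated metrics and human ratings described above can be sketched as a weighted average; the 0.4 weight on human input is an illustrative assumption, not a prescribed value:

```python
# Hypothetical blend of an automated quality metric with ratings from
# people along the data supply chain.

def blended_trust(automated: float, user_ratings: list,
                  user_weight: float = 0.4) -> float:
    """Blend an automated 0-1 metric with the mean of 0-1 user ratings."""
    if not user_ratings:
        return automated  # no human input yet: fall back to the metric alone
    human = sum(user_ratings) / len(user_ratings)
    return (1 - user_weight) * automated + user_weight * human

# Automated profiling says 0.9, but two local sales managers rate it lower:
print(blended_trust(0.9, [0.5, 0.7]))  # 0.78
```

The design choice worth noting is the fallback: a data set with no ratings yet keeps its automated score rather than being penalized, so human input refines the score as the set travels along the supply chain.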
Data Point No. 6: Summary
Managing a data pipeline is not just about maintaining the plumbing; it’s about ensuring that data is trustworthy. A trust score helps organizations make better decisions based on higher-quality data. In the modern organization, everyone is both a data consumer and a data producer, which means everyone has a role to play in ensuring that data meets key quality requirements and underpins great business outcomes.
If you have a suggestion for an eWEEK Data Points article, email cpreimesberger@eweek.com.