Despite the ominous name, dark data is not malicious, nor does it contain sinister content. To grasp why dark data matters, we first need to understand what the term means and what it implies for organizations.
What is dark data?
Organizations routinely collect an exceptionally large amount of data that they hope will help them make better decisions and achieve growth. Startlingly, most of this data never gets used. This unused data is called dark data.
Dark data has become a growing problem as data generation and collection grow exponentially across industries. Organizational databases hold many different categories of dark data. For example, employee profiles, raw survey data, customer information, email correspondence, and financial statements can all be classified as dark data if they have not been put to constructive use. In recent years, dark data has become an important consideration in the data strategies of companies worldwide.
How serious is the problem?
On the surface, dark data might seem harmless: it is simply data that has not yet been put to constructive use. Its sheer scale, however, makes it a real challenge for organizations.
A 2018 IBM study showed that 80% of all collected data is dark data. Many organizations are only now recognizing the size of the problem and making it a priority to understand its full scope.
Why should you tackle dark data?
First, dark data represents a significant opportunity cost. The data could be used to produce vital business insights, but instead it goes to waste. Second, under GDPR and other new data protection regulations, organizations must build the governance infrastructure needed to protect themselves legally. Dark data significantly increases the footprint that has to be governed and protected. Because dark data is not actively used, it is often not actively governed either, which creates serious security risks and compliance issues.
Finally, dark data may be costing you money: it consumes storage, infrastructure, and other resources that could be freed up to cut costs or redirected to higher-value projects.
Tackling dark data
Dark data can be a symptom of deeper problems in your data strategy. You might be holding irrelevant data, or you might have problems with data quality, data silos, or data discoverability. Something is preventing employees from finding and using the data properly. If the data is not useful, archive or delete it unless there is a legal reason to retain it. If the data is potentially valuable, estimate the return on investment (ROI) of putting it to use and invest accordingly, turning dark data into a lucrative opportunity.
Things you can do to tackle dark data:
- Run company-wide data cataloging and data mapping to classify, structure, and manage the data. Your data map can show what categories of data are stored and exactly where they are stored.
- Make the data catalogs available to employees and drive awareness across the organization. By surfacing dark data, you enable employees to discover it and put it to work for the organization.
- Implement a policy to regularly audit and prune your databases, data lakes, and unstructured data sources.
- In many cases, dark data consists of data from old, deprioritized initiatives or of duplicated data. If you have no legal reason to retain this stale data, archive it.
- Encrypt dark data wherever it is stored to keep it secure and reduce the risk of exposure.
- De-identify or mask sensitive information within dark data to reduce privacy risks.
- Connect business analytics and business intelligence (BI) tools to your data so that employees and partners can access and use what was previously dark data.
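To make the cataloging, auditing, and de-identification steps above more concrete, here is a minimal sketch that assumes your data lives as files on a local or mounted filesystem. Real catalogs also need to cover databases, data lakes, and SaaS sources. The names `build_catalog`, `pseudonymize`, `CATEGORIES`, and `STALE_AFTER_DAYS` are hypothetical, and the one-year staleness threshold is an arbitrary assumption. For masking, the sketch uses keyed hashing (HMAC-SHA256), which is one common pseudonymization technique and not a method the article prescribes.

```python
import hashlib
import hmac
import os
import time
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Union

# Hypothetical mapping from file extension to a coarse data category.
CATEGORIES = {
    ".csv": "tabular",
    ".xlsx": "tabular",
    ".json": "semi-structured",
    ".eml": "email",
    ".pdf": "document",
}

# Hypothetical threshold: files untouched for over a year are flagged as stale.
STALE_AFTER_DAYS = 365


@dataclass
class CatalogEntry:
    path: str
    category: str
    size_bytes: int
    days_since_modified: float
    stale: bool


def build_catalog(root: Union[str, Path], now: Optional[float] = None) -> List[CatalogEntry]:
    """Walk `root` and record what is stored, where, how big it is, and how old it is."""
    now = time.time() if now is None else now
    entries = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        age_days = (now - stat.st_mtime) / 86400  # seconds -> days
        entries.append(
            CatalogEntry(
                path=str(path),
                category=CATEGORIES.get(path.suffix.lower(), "other"),
                size_bytes=stat.st_size,
                days_since_modified=age_days,
                stale=age_days > STALE_AFTER_DAYS,
            )
        )
    return entries


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a sensitive value with a keyed hash (HMAC-SHA256).

    The same value and key always yield the same token, so records can still be
    joined. The original value is not recoverable from the token alone, but anyone
    holding the key could re-identify values by hashing candidates, so store the
    key separately from the data.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # An old survey export, backdated two years to simulate stale data.
        old = Path(tmp, "survey_2019.csv")
        old.write_text("respondent,score\n")
        two_years_ago = time.time() - 2 * 365 * 86400
        os.utime(old, (two_years_ago, two_years_ago))
        Path(tmp, "notes.txt").write_text("current")

        for entry in build_catalog(tmp):
            print(entry.category, entry.stale, Path(entry.path).name)

    print(pseudonymize("jane@example.com", key=b"store-this-key-separately"))
```

The stale flag is only a starting point for an audit: a person still has to decide whether each flagged file should be archived, deleted, or kept for legal reasons. Note also that under GDPR, pseudonymized data still counts as personal data, so this kind of masking reduces risk rather than removing governance obligations.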
Dark data is gradually making its way onto the agendas of chief data officers around the globe. Used constructively, dark data can deliver significant value to organizations. Sorting out your company’s dark data reduces data risk and generates better insights.