Much of the data that organizations collect is used to improve the way they do business. Whether it is information on how users interact with your product, results gathered from marketing efforts, or internal statistics on development processes, these are assets that have nothing to do with dark data.
However, in this constantly growing ocean of data, not all information is equally profitable. Alongside the valuable information, organizations also store a quantity of data that has no real tactical value and that keeps growing.
Gartner defined this unmanaged information as dark data. This type of information represents an expensive and potentially risky burden for organizations, and it could become a major obstacle to taking advantage of Big Data.
Different Categories Of Data
The information available in the organization can be classified into four categories:
- Known and used data: This information is identified and used for analytical purposes or any other purpose that adds value to the organization.
- Known but unused data: This information was identified and stored during analysis, but it cannot be used, whether for lack of time, budget, or knowledge of how to do so. Sometimes the data goes unused because its size and format exceed the company's capabilities.
- Known but disorganized data: This phenomenon has a lot to do with the rise of Big Data. When we work with Hadoop clusters, we tend to store data without any order, deferring organization and modelling to the future. But in most cases, the day to organize all that information never comes. Collecting the data so that it does not escape is a wise first step, but the effort does not end up being profitable, since extracting value from those Hadoop clusters is costly.
- Unknown data: Because this data has not been identified, companies cannot use it.
There are a couple of aspects to keep in mind: most of this data is unstructured and, while the first group is the smallest, the largest volume of data is concentrated in the last category.
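The four categories above can be read as a simple decision rule. The following is a minimal sketch in Python, assuming each data asset can be described by three yes/no attributes (`identified`, `organized`, `used`). Those attributes, the `DataAsset` type, and the sample asset names are hypothetical and chosen only for illustration; they do not come from Gartner or from any particular cataloguing tool.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class DataCategory(Enum):
    KNOWN_AND_USED = "known and used"
    KNOWN_NOT_USED = "known but not used"
    KNOWN_DISORGANIZED = "known but disorganized"
    UNKNOWN = "unknown"


@dataclass
class DataAsset:
    # Hypothetical inventory attributes, assumed for this sketch.
    name: str
    identified: bool  # has anyone catalogued this data?
    organized: bool   # does it have a schema or model?
    used: bool        # does any analysis or process consume it?


def classify(asset: DataAsset) -> DataCategory:
    """Map an asset to one of the four categories described above."""
    if not asset.identified:
        return DataCategory.UNKNOWN
    if not asset.organized:
        return DataCategory.KNOWN_DISORGANIZED
    if not asset.used:
        return DataCategory.KNOWN_NOT_USED
    return DataCategory.KNOWN_AND_USED


# Made-up inventory, for illustration only.
inventory = [
    DataAsset("sales_dashboard", identified=True, organized=True, used=True),
    DataAsset("archived_surveys", identified=True, organized=True, used=False),
    DataAsset("raw_clickstream", identified=True, organized=False, used=False),
    DataAsset("old_file_share", identified=False, organized=False, used=False),
]
counts = Counter(classify(asset) for asset in inventory)
```

Everything outside `KNOWN_AND_USED` is a candidate for dark data. In practice the hard part is the last category: an asset that nobody has identified never makes it into the inventory in the first place.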
A study by EMC Digital shows how the volume of digital data is growing, multiplying by 10 in seven years, and especially how small the proportion of that data available in analytical systems, in embedded systems, and from mobile devices remains. In other words, this is a large volume of information that not only grows but also becomes increasingly challenging to take advantage of, partly because of its formats: more than 90% is unstructured and therefore difficult to use and consume.
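To put the tenfold growth over seven years in perspective, it can be converted to a yearly rate. The sketch below assumes the growth is compounded at a constant rate every year, which the study may not claim; the function names are illustrative.

```python
import math


def annual_growth_rate(total_factor: float, years: float) -> float:
    """Constant yearly rate that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years) - 1.0


def doubling_time(rate: float) -> float:
    """Years needed for the volume to double at a constant yearly rate."""
    return math.log(2) / math.log(1.0 + rate)


rate = annual_growth_rate(10, 7)    # tenfold growth in seven years
years_to_double = doubling_time(rate)
print(f"yearly growth: {rate:.0%}, doubling every {years_to_double:.1f} years")
```

Under that assumption, a tenfold increase in seven years corresponds to roughly 39% growth per year, with the total volume doubling about every 2.1 years.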
What Is Dark Data?
Data has long been a prisoner of technology, applications, and people. Fortunately, the maturity of the systems allows us to release it and make it available to the entire organization.
Dark data appears in the context of this liberation. The term was created by Gartner and defined as the information assets that companies process and store during their business activities but cannot use for other purposes, such as analytics or monetization. In other words, this data is not convertible or reusable and, therefore, does not add value.
However, the concept of dark data evolves over time. Today, the value of much of that data can be accessed and exploited, although large areas remain inaccessible and continue to fit the definition: they are still dark data.
What Problems Does Dark Data Cause?
Dark data acts as a burden on organizations since this type of data carries drawbacks such as:
- Increased cost of data management.
- Increased risk.
Many problems associated with dark data can become more frequent as time passes. The first is the most obvious: space. As unorganized data continues to grow, it takes up storage space that could otherwise be used for more valuable assets. More storage means more overhead, which, especially in the Big Data era, is a major concern in most organizations.
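One rough way to gauge how much storage such data occupies is to measure the files that nobody has touched for a long time. The following sketch assumes the data lives on a file system and uses the last modification time as a proxy for "unused"; both are simplifying assumptions, and the function name `stale_bytes` and the age threshold are illustrative.

```python
import os
import time
from typing import Optional, Tuple


def stale_bytes(root: str, max_age_days: float,
                now: Optional[float] = None) -> Tuple[int, int]:
    """Return (stale_bytes, total_bytes) for the files found under root.

    A file counts as stale when it was last modified more than
    max_age_days ago. This is only a proxy: an old file may still be read.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    stale = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            total += info.st_size
            if info.st_mtime < cutoff:
                stale += info.st_size
    return stale, total
```

Calling `stale_bytes("/some/data/dir", 365)` on a directory of your choice returns the bytes held by files unchanged for over a year alongside the total, which gives a first estimate of how much of the storage bill goes to data nobody is updating.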
Aside from increased storage costs, large amounts of unstructured or unorganized data can carry serious security risks. Along with outdated and seemingly useless documents, dark data is also likely to contain confidential information that hackers might want to steal.
At the other end of the spectrum, the organization may also be missing out on great opportunities by allowing dark data to accumulate in the database. There is likely to be great untapped potential within that mass of information, and the more it grows without order or control, the harder it becomes for the organization to extract value from it.