How to Accelerate and Streamline Data Classification Projects

Protecting sensitive information is a challenge for many organizations. To protect certain files, businesses must first find them, and that's no easy task with terabytes of data and hundreds of thousands of files on SharePoint sites, network-attached storage devices and file systems. Here, Knowledge Center contributor Raphael Reich explains how to speed up data classification projects: how to identify data owners, which metadata accelerates searches, and how to streamline reporting and remediation.


Organizations can quickly become overwhelmed by the task of managing and protecting all the unstructured data they hold. Unstructured data includes the documents, spreadsheets, presentations and other files stored on shared file servers, network-attached storage (NAS) devices, SharePoint sites and similar repositories. It accounts for roughly 80 percent of business data and grows by more than 50 percent per year, which makes it hard for organizations to keep pace with this key business resource.

To deal with unstructured data, many organizations start data classification projects in the hope of identifying their most sensitive data, fixing any problems and putting proper controls in place. Regrettably, both business and technical challenges prevent data classification deployments from reaching their full potential.

From a business perspective, the primary challenge is a lack of actionable results. Data classification solutions produce a list of files with sensitive content, but that list does not say what the files mean to the business or what to do with them. On the technical side, data classification solutions scan every file for relevant content and are consequently slow to deliver results. On subsequent searches, they must examine every file again, which makes it virtually impossible to keep pace with data growth and change.
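To make the technical bottleneck concrete, here is a minimal Python sketch of the brute-force approach described above. It assumes a local directory tree and uses a single illustrative pattern (U.S. Social Security numbers written as 123-45-6789); the function name `scan_tree` and the pattern are hypothetical, not taken from any particular product. The point to notice is that every run opens and reads every file, whether or not it changed since the last scan.

```python
import os
import re

# Hypothetical example pattern: U.S. Social Security numbers such as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_tree(root: str) -> list[str]:
    """Return sorted paths of files under `root` whose text matches SSN_PATTERN.

    This is the naive strategy: every file is opened and read on every run,
    so the cost grows with the total volume of data, not with what changed.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary or oddly encoded files be read as text
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if SSN_PATTERN.search(f.read()):
                        hits.append(path)
            except OSError:
                # Skip files that cannot be opened (permissions, broken links, etc.)
                continue
    return sorted(hits)
```

Calling `scan_tree("/mnt/share")` (a hypothetical mount point) yields exactly the kind of output the article describes: a bare list of paths, with no indication of who owns those files or what should be done about them.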

The following are five measures that organizations can take to accelerate the pace of producing actionable data classification results:

Measure No. 1: Determine who owns the data

Data owners are critical to managing unstructured data. They understand the importance of data assets to the business and are, therefore, integral to the process of classifying this data. They can help determine who should and should not have access and what types of protection the data needs, and they can point out when the data is no longer relevant to the business. When it comes to sensitive data, owners can help determine whether the data is at risk and what remediation steps are required.

Identifying owners is not easy, though. The locations of data and the names of data folders, directories or sites often give little indication of true data ownership, and file system metadata about data ownership goes stale quickly. The most common methods for identifying data owners, phone calls and e-mail messages, are neither efficient nor effective.

The best way to track data owners is with an automated, repeatable process. One of the most effective methods is to track who accesses the data. Over time, the top users of the data become obvious, and these users can tell the organization who owns it.
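The access-tracking idea above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any product: it assumes access events have already been collected (for example, from file-server audit logs) as hypothetical `(user, folder)` records, and it simply ranks each folder's most frequent users as candidate owners to contact. The names `AccessEvent` and `likely_owners` are made up for this example.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class AccessEvent(NamedTuple):
    """One hypothetical audit-log record: which user touched which folder."""
    user: str
    folder: str


def likely_owners(
    events: Iterable[AccessEvent], top_n: int = 3
) -> dict[str, list[tuple[str, int]]]:
    """Rank the most active users of each folder as candidate data owners.

    Returns {folder: [(user, access_count), ...]} with at most `top_n`
    users per folder, most active first. Users with equal counts keep
    the order in which they were first seen.
    """
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for event in events:
        counts[event.folder][event.user] += 1
    return {folder: c.most_common(top_n) for folder, c in counts.items()}
```

Given a week of events, the function returns, for each folder, the handful of users who touched it most; those are the people to ask who actually owns the data. Because the ranking is recomputed from fresh log data on each run, the process stays automated and repeatable rather than depending on a one-off round of phone calls.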