How to Accelerate and Streamline Data Classification Projects

Published January 18, 2010

Organizations can quickly become overwhelmed with managing and protecting all of the unstructured data in their possession. Unstructured data includes all of the documents, spreadsheets, presentations and more that are stored on shared file servers, network-attached storage (NAS) devices, SharePoint sites, etc. It accounts for roughly 80 percent of business data. In addition to being the majority of business data, unstructured data grows in excess of 50 percent per year, making it hard to keep pace with this key business resource.

To deal with unstructured data, many organizations initiate data classification projects in the hopes of identifying their most sensitive data, fixing any problems and implementing proper controls. Regrettably, there are both business and technical challenges that prevent data classification deployments from reaching their full potential.

From a business perspective, a lack of actionable results is the primary challenge. Data classification solutions produce a list of files with sensitive content, but the question of what the files mean to the business and what to do with them is not inherently obvious. On the technical side, the issue is that data classification solutions scan every file looking for relevant content and are, consequently, slow to deliver results. And on subsequent searches, these solutions must look at all files again, making it virtually impossible to keep pace with data growth and change.

The following are five measures that organizations can take to accelerate the pace of producing actionable data classification results:

Measure No. 1: Determine who owns the data

Data owners are critical to managing unstructured data. They understand the importance of data assets to the business and are, therefore, integral to the process of classifying this data. They can help determine who should and should not have access and what type of protections the data should have, and they can point out when the data is no longer relevant to the business. When it comes to sensitive data, owners can help determine whether data is at risk and what remediation steps are required.

Identifying owners is not easy, though. The locations of data and the names of data folders, directories or sites often provide little indication of true data ownership, and file system metadata about data ownership goes stale quickly. The most common methods for identifying data owners, phone calls and e-mail messages, are neither efficient nor effective.

The best way to track data owners is to have an automated, repeatable process in place. One of the most effective ways to determine data owners is to track who is accessing the data. Over time, the top users of data will become obvious, and these users will be able to tell organizations who owns the data.
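As a rough illustration of tracking who is accessing the data, the sketch below counts accesses per folder and lists the most active users as candidate owners. The event format (user, folder pairs), the function name likely_owners and the sample data are assumptions made for this example; a real deployment would read events from whatever audit log the file server or NAS device provides.

```python
from collections import Counter, defaultdict


def likely_owners(access_events, top_n=3):
    """Return the most active users for each folder.

    access_events is an iterable of (user, folder) pairs, one per access.
    The result maps each folder to its top_n users, most active first.
    """
    counts = defaultdict(Counter)
    for user, folder in access_events:
        counts[folder][user] += 1
    return {
        folder: [user for user, _ in counter.most_common(top_n)]
        for folder, counter in counts.items()
    }


# Hypothetical access log, invented for illustration only.
events = [
    ("alice", "/finance/budgets"),
    ("alice", "/finance/budgets"),
    ("bob", "/finance/budgets"),
    ("carol", "/hr/reviews"),
    ("alice", "/finance/budgets"),
    ("carol", "/hr/reviews"),
    ("bob", "/hr/reviews"),
]

candidates = likely_owners(events, top_n=2)
for folder, users in candidates.items():
    print(folder, "->", users)
```

The top users are only candidates. As noted above, they are the people to ask about who actually owns the data.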

Measure No. 2: Document what data is of interest

Documenting the key words, phrases and patterns that are of interest to a business requires both investigative work and an understanding of what’s driving the need to find data. The natural starting point is to work with data owners and security and risk managers to identify and document what data is of interest to an organization. In many organizations, regulatory compliance is a driver. Regulations often specify which data is sensitive and what measures are required to protect it. Intellectual property (IP), customer data and employee information are other common types of information requiring special attention.

Establishing different levels of sensitivity that are based on the type of content your organization needs to manage and protect will help provide additional structure to this task. Industry best practices show that a good rule of thumb is to constrain an organization’s hierarchy to four levels. More than that and it becomes difficult and impractical to manage. Examples of four levels to begin with can include Secret data, Confidential data, Private data and Public data.
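One way to write down both the sensitivity levels and the patterns of interest is as a small rule table. The sketch below is only an example: the ordering of the four levels (Public lowest, Secret highest), the three patterns and the project name "Aurora" are all assumptions, and real rules would come from data owners, security and risk managers and the applicable regulations.

```python
import re
from enum import IntEnum


class Sensitivity(IntEnum):
    """Four levels, ordered from least to most sensitive (assumed order)."""
    PUBLIC = 0
    PRIVATE = 1
    CONFIDENTIAL = 2
    SECRET = 3


# Hypothetical rules: each pattern of interest maps to a sensitivity level.
RULES = [
    # Text shaped like a U.S. Social Security number (employee information).
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), Sensitivity.CONFIDENTIAL),
    # An invented internal code name standing in for intellectual property.
    (re.compile(r"\bproject\s+aurora\b", re.IGNORECASE), Sensitivity.SECRET),
    # A common document marking.
    (re.compile(r"\binternal use only\b", re.IGNORECASE), Sensitivity.PRIVATE),
]


def classify(text):
    """Return the highest sensitivity level whose pattern appears in text."""
    level = Sensitivity.PUBLIC
    for pattern, rule_level in RULES:
        if pattern.search(text):
            level = max(level, rule_level)
    return level


print(classify("Employee SSN: 123-45-6789").name)
print(classify("Internal use only: Project Aurora roadmap").name)
print(classify("Cafeteria lunch menu").name)
```

Keeping the rules in one table makes them easy to review with data owners and to update as requirements change.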

Measure No. 3: Focus and accelerate with metadata

Metadata (data about your data, such as file sizes, types and locations) should be used to focus and accelerate your data classification projects. Metadata adds another dimension to the search process, effectively providing a shortlist of where to look and what to expect.

For example, if you want to identify credit card data that is at risk, you can use permissions metadata to find files that are accessible by too many people. You can then look inside those files for credit card data. In fact, any sensitive data found in overly accessible files has a clear remediation path: fix the access permissions to the data so that they are based on least privilege (that is, business need-to-know). The following are examples of metadata and how it can be used to focus and accelerate data classification:

1. Data access permissions

A careful analysis of file, folder and site permissions will tell organizations who can access their sensitive data and which data is overly accessible.

2. Data access activity

Data access activity provides important information such as which folders are the most frequently used and which folders are not being used at all. It also indicates which data was recently added or modified. That intelligence is tremendously useful, for example, in reducing the time spent searching. After the initial classification scan has occurred, subsequent searches can be restricted to just the data that still needs to be classified (that is, data that has been added or modified since the last scan and so has not yet been searched). For specific users or groups, organizations can determine what data they have been accessing to see who has actually been using the sensitive data.

3. Data ownership

Data ownership information helps limit searches to data owned by specific people. So, if organizations are working with individuals to help them get control over their sensitive data, this piece of metadata will narrow sensitive data searches to just the relevant data.
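The credit card example can be sketched in a few lines: use permissions metadata to build a shortlist, then look inside only the shortlisted files. Everything specific here is an assumption made for illustration, including the record layout (path, readers, text), the threshold of 10 readers and the function names. The Luhn checksum is a standard check that weeds out most digit strings that merely look like card numbers; it is not a substitute for a full classification engine.

```python
import re

# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_card_number(text):
    """Return True if text contains a digit run that passes the Luhn check."""
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            return True
    return False


def at_risk_files(files, max_readers=10):
    """Shortlist by permissions metadata first, then scan content.

    files is an iterable of dicts with the hypothetical keys 'path',
    'readers' (how many people can open the file) and 'text'.
    """
    shortlist = [f for f in files if f["readers"] > max_readers]
    return [f["path"] for f in shortlist if contains_card_number(f["text"])]


# Invented sample records; 4111 1111 1111 1111 is a well-known test number.
files = [
    {"path": "/shared/orders.txt", "readers": 250,
     "text": "card 4111 1111 1111 1111 on file"},
    {"path": "/finance/payments.txt", "readers": 4,
     "text": "card 4111-1111-1111-1111"},
    {"path": "/shared/menu.txt", "readers": 250,
     "text": "lunch is served at 12"},
]

at_risk = at_risk_files(files)
print(at_risk)
```

Only the two widely readable files are opened at all, which is where the time savings come from. The tightly restricted file still holds card data, but it is not flagged as at risk under this rule.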

Measure No. 4: Communicate and remediate

Finding sensitive data is obviously an important part of classification projects, but it’s not the final stage. After obtaining results, organizations need to get them into the hands of decision makers, who are typically data owners and Governance, Risk Management, and Compliance (GRC) teams, so that these people can understand the situation and begin formulating remediation strategies and plans.

Data owners are typically in the best position to identify exactly what the content is, whether the data is stored in the right place, and who should and should not have access to it. They can also help build a remediation strategy and process, especially once they are armed with specific examples involving their own data. GRC staff can provide the oversight needed to ensure that data is being protected in accordance with the organization’s objectives. These teams can also use result reports as the basis of documentation for audit requirements.

Measure No. 5: Regularly recheck data

Data is constantly growing and changing, so businesses should establish a process of periodically rechecking it to maintain an accurate view of sensitive data. Ideally, organizations should limit searches to newly added data, to determine whether it contains sensitive information, and to existing data that has been modified, to determine whether it has gained or lost relevance to classification projects. Organizations should then provide data owners and GRC staff with updated intelligence based on the rescan.
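A minimal sketch of limiting a rescan to new and changed data, assuming file modification times are a good enough signal, might look like the following. The function name files_to_rescan and the idea of storing the time of the last scan as a Unix timestamp are assumptions for this example; products that track access activity directly would not need to walk the file system at all.

```python
import os


def files_to_rescan(root, last_scan_time):
    """Yield paths under root modified after last_scan_time.

    last_scan_time is a Unix timestamp in seconds. Newly added files are
    included because their modification time also postdates the last scan.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > last_scan_time:
                    yield path
            except OSError:
                # The file vanished or is unreadable; skip it in this pass.
                continue
```

A scheduled job could call files_to_rescan with the timestamp saved from the previous run, pass only those paths to the classification step, and then save the new timestamp.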

Final thoughts

To find the important data among all of an organization’s unstructured data, a data classification solution is needed because there is simply too much data to process and keep pace with manually. While there are many solutions to choose from, a solution that leverages the power of metadata is critical for achieving actionable results. Without metadata, data classification projects can take far too long, and the results they produce typically don’t have the context required to remediate problems. Metadata can dramatically cut the time it takes to produce results and can help provide the context required for problem remediation.

Raphael Reich is Senior Director of Marketing at Varonis Systems. Raphael brings over 16 years of product marketing and management experience to Varonis. Prior to joining Varonis, he held product marketing and management roles at Cisco, Check Point, Echelon and Network General. Raphael was also a software engineer at Digital Equipment Corporation. He holds a Bachelor’s degree in Computer Science from UC Santa Cruz and an MBA from UCLA. He can be reached at [email protected].
