eWEEK Data Points: Six Reasons to Classify Unstructured Data

eWEEK RESOURCE PAGE: Why unstructured content such as emails, documents, spreadsheets, presentations, PDFs and audio and image files, which lacks fixed fields, should be classified in advance in storage so it can be searched easily.


Many U.S. companies that do business in Europe have undertaken data classification projects to comply with the EU’s General Data Protection Regulation (GDPR), but those initiatives have largely been limited to databases and other core business systems that comprise just 20% of an organization’s information assets.  

The remaining 80% is unstructured information, such as emails, documents, spreadsheets, presentations, PDFs and audio and image files, that lacks fixed fields that could be easily searched for sensitive information such as Social Security and patient identification numbers. This large pool of non-indexed data must all be stored safely, yet it is rarely classified, despite the availability of automated data classification technology and its benefits for information retrieval, compliance and risk management.
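To make automated classification more concrete, here is a minimal Python sketch, assuming simple regular-expression detectors. The detector names, the "MRN" patient-ID format and the label names are hypothetical; commercial classification tools use much richer detection, such as validation checks, context keywords and machine learning.

```python
import re

# Hypothetical detectors. Real classification products combine patterns with
# validation, context keywords and machine learning.
PATTERNS = {
    # U.S. SSN in ddd-dd-dddd form, skipping number blocks that are never issued
    "ssn": re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"),
    # Assumed patient-ID format: "MRN" followed by 6 to 10 digits
    "patient_id": re.compile(r"\bMRN[-: ]?\d{6,10}\b", re.IGNORECASE),
}


def find_sensitive(text: str) -> dict[str, list[str]]:
    """Return the matches for each detector, omitting detectors with no hits."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


def classify(text: str) -> str:
    """Assign a hypothetical sensitivity label based on what was found."""
    return "Confidential" if find_sensitive(text) else "Internal"


if __name__ == "__main__":
    sample = "Employee SSN 123-45-6789, chart MRN 0045821."
    print(find_sensitive(sample))
    print(classify(sample))
```

A pattern-only approach like this is cheap to run across file shares, but it will miss sensitive content that has no fixed format, which is one reason vendors layer other techniques on top.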

In this eWEEK Data Points article, Mike Sprunger, Senior Manager of Cloud and Network Security at the Cloud + Data Center Division of Fortune 500 technology provider Insight Enterprises, outlines six reasons to extend classification efforts to unstructured data residing on servers, laptops, tablets, smartphones and other edge devices. 

Data Point No. 1: Regulatory Readiness

Privacy and data protection requirements are continually evolving, from SOX, HIPAA, GLBA and PCI to GDPR, the GDPR-inspired California Consumer Privacy Act of 2018, other possible state legislation and new federal regulations now being considered in Congress. Given that 80% of business information sits in unstructured files, personally identifiable information and other data that requires protection, now or in the future, is likely scattered throughout these unclassified repositories. Implementing automated classification tools will make it easier to comply with these rules, including GDPR’s “right to be forgotten,” which lets individuals request deletion of their personal data, and will reduce deadline pressure when new mandates take effect.

Data Point No. 2: Faster Data Searches

Metadata tags applied through classification technology sit outside the file, so business systems can search the tags instead of opening and reading petabytes’ worth of file content, which reduces search time. This not only expedites legal and regulatory compliance efforts but also speeds information flow and collaboration across the organization.
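As a rough illustration of why external metadata speeds up search, the Python sketch below keeps each file’s label and tags in an inverted index and answers queries from that index alone. The `FileRecord` structure, paths and tag names are hypothetical; real products store labels in their own formats.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Classification metadata kept alongside a file, not inside it."""
    path: str
    label: str
    tags: set[str] = field(default_factory=set)


def build_index(records: list[FileRecord]) -> dict[str, set[str]]:
    """Map every tag and label to the set of file paths that carry it."""
    index: dict[str, set[str]] = {}
    for rec in records:
        for term in rec.tags | {rec.label}:
            index.setdefault(term, set()).add(rec.path)
    return index


def search(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Return paths carrying all terms; no file content is opened."""
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    records = [
        FileRecord("/hr/offer_letter.docx", "Confidential", {"pii", "hr"}),
        FileRecord("/hr/handbook.pdf", "Public", {"hr"}),
        FileRecord("/sales/roadmap.pptx", "Confidential", {"roadmap"}),
    ]
    index = build_index(records)
    print(sorted(search(index, "hr", "Confidential")))
```

The query cost depends on the size of the index entries, not on how many bytes the underlying files contain.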

Data Point No. 3: Improved Security Controls

If your CEO is creating a presentation deck for a board of directors meeting, you don’t want proprietary information such as company revenues or product roadmaps falling into the wrong hands. Categorizing data by sensitivity level makes it possible to apply security controls such as encryption, identity and access management (IAM) and data loss prevention (DLP) effectively to prevent data leakage. These downstream security applications can combine the metadata tags and sensitivity labels created by the classification system with company governance rules to trigger the appropriate security actions automatically, such as blocking highly sensitive information from leaving the company by email, flash drive or other means.
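How a downstream tool might combine a sensitivity label with governance rules can be sketched in a few lines of Python. The four label levels, the channel names and the rules themselves are hypothetical examples, not the behavior of any particular DLP product. Rules are checked in order, so the most restrictive ones are listed first.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical label levels; higher values mean more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical governance rules: (channel, minimum label that triggers, action).
# Order matters: the first matching rule wins.
RULES = [
    ("email_external", Sensitivity.RESTRICTED, "block"),
    ("usb", Sensitivity.CONFIDENTIAL, "block"),
    ("email_external", Sensitivity.CONFIDENTIAL, "encrypt"),
]


def decide(label: Sensitivity, channel: str) -> str:
    """Return the action for moving data with `label` over `channel`."""
    for rule_channel, threshold, action in RULES:
        if channel == rule_channel and label >= threshold:
            return action
    return "allow"


if __name__ == "__main__":
    print(decide(Sensitivity.RESTRICTED, "email_external"))
    print(decide(Sensitivity.CONFIDENTIAL, "email_external"))
```

Because the labels are ordered, a single threshold per rule covers every level above it, which keeps the rule table short.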

Data Point No. 4: Email Protection

Email has become a de facto data repository for virtually every organization, yet the sensitive information in email communications often lacks the safeguards needed to prevent access by unauthorized users. Today’s data classification technology can apply company-specific handling rules to assign a sensitivity level to each email as the user types. (The same is true of documents, spreadsheets, presentations and other unstructured files.) In addition, incoming email attachments can be protected with rules that detach the file from the message, move it to a protected repository and replace it with a link that can be opened only with the appropriate permissions.
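The attachment-handling rule described above can be approximated with Python’s standard `email` package. In this sketch the “protected repository” is just an in-memory dictionary keyed by a content hash, and the link prefix `https://vault.example.com/files` is a placeholder. Enforcing permissions when a link is opened would be the repository’s job and is not shown.

```python
import hashlib
from email.message import EmailMessage


def detach_attachments(msg: EmailMessage, vault: dict, base_url: str) -> EmailMessage:
    """Move attachments into `vault` and return a copy of `msg` with links instead."""
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content().rstrip("\n") if body_part is not None else ""

    links = []
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True)
        # Content hash used as a hypothetical repository key
        key = hashlib.sha256(data).hexdigest()
        vault[key] = (part.get_filename(), data)
        links.append(f"{part.get_filename()}: {base_url}/{key}")

    stripped = EmailMessage()
    for header in ("From", "To", "Subject"):
        if msg[header] is not None:
            stripped[header] = msg[header]
    if links:
        body += "\n\nProtected attachments (access requires permission):\n" + "\n".join(links)
    stripped.set_content(body)
    return stripped


if __name__ == "__main__":
    original = EmailMessage()
    original["From"] = "cfo@example.com"
    original["To"] = "board@example.com"
    original["Subject"] = "Q3 numbers"
    original.set_content("Draft attached.")
    original.add_attachment(b"%PDF-1.4 ...", maintype="application",
                            subtype="pdf", filename="q3.pdf")

    vault = {}
    cleaned = detach_attachments(original, vault, "https://vault.example.com/files")
    print(cleaned.get_content())
    print(len(vault), "file(s) moved to the protected repository")
```

Only the plain-text body and a few headers are carried over here. A real gateway would preserve the full header set and any HTML body.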

Data Point No. 5: Classification-based File Storage

With the right tools and rules, classified unstructured files can be moved automatically to a repository whose protection level suits their content. While a file is in transit, the metadata created during classification is scanned, the appropriate sensitivity label is applied, and higher-sensitivity files such as financial documents are placed in more secure repositories. This adds another layer of defense against data leakage. Some tools also support digital rights management, allowing metadata tags to carry additional protections such as expiration dates on shared files.
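A label-to-repository routing step might look like the following Python sketch. The repository names, label names and share lifetimes are all hypothetical. One design choice worth noting is that a file with an unrecognized label is treated as the most sensitive category (“fail closed”) rather than landing in the least protected store.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical label-to-repository mapping, from least to most protected.
REPOSITORIES = {
    "Public": "general-share",
    "Internal": "team-share",
    "Confidential": "encrypted-vault",
    "Restricted": "restricted-vault",
}
# Hypothetical DRM-style expiration for shared copies of sensitive files.
SHARE_LIFETIME = {
    "Confidential": timedelta(days=30),
    "Restricted": timedelta(days=7),
}


@dataclass
class Placement:
    repository: str
    share_expires: Optional[date]


def route(label: str, today: date) -> Placement:
    """Pick a repository and share expiry date for a classified file."""
    if label not in REPOSITORIES:
        label = "Restricted"  # fail closed: unknown labels get the strictest handling
    lifetime = SHARE_LIFETIME.get(label)
    expires = today + lifetime if lifetime is not None else None
    return Placement(REPOSITORIES[label], expires)


if __name__ == "__main__":
    print(route("Confidential", date(2024, 1, 1)))
    print(route("unlabeled", date(2024, 1, 1)))
```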

Data Point No. 6: Retention Policy Enforcement

Not all data has the same shelf life. For example, while most internal emails may need to remain available for historical purposes for up to three years, contract negotiation details sent through internal email may need to be retained much longer. Data labels make it possible to identify these specific cases and move the associated data into the appropriate repository for long-term retention. Metadata also enables automatic purging of data that is no longer needed, which controls storage growth, in some cases reclaims substantial storage space, and reduces operational and management costs.
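A label-driven retention schedule can be expressed as a small lookup table plus a date check. This Python sketch is illustrative only: the label names are hypothetical, the three-year figure echoes the example above, and the seven-year period for contract negotiations is an assumed value, not a recommendation.

```python
from datetime import date

# Retention periods in years. Three years for internal email follows the
# example in the text; seven years for contract negotiations is an assumption.
RETENTION_YEARS = {
    "internal-email": 3,
    "contract-negotiation": 7,
}


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def purge_candidates(files: list[tuple[str, str, date]], today: date) -> list[str]:
    """Return paths whose retention period has ended.

    Files with an unknown label are never returned, so unclassified data is
    not purged automatically.
    """
    expired = []
    for path, label, created in files:
        years = RETENTION_YEARS.get(label)
        if years is not None and today >= add_years(created, years):
            expired.append(path)
    return expired


if __name__ == "__main__":
    inventory = [
        ("mail/2020/status.eml", "internal-email", date(2020, 1, 15)),
        ("mail/2020/acme_terms.eml", "contract-negotiation", date(2020, 1, 15)),
        ("share/old_notes.docx", "unlabeled", date(2010, 1, 1)),
    ]
    print(purge_candidates(inventory, date(2024, 6, 1)))
```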
