Symantec is adding new machine learning technology to its data loss prevention (DLP) product to ease efforts to classify sensitive data and define policies.
The feature, called Vector Machine Learning, will be included in Symantec Data Loss Prevention 11 when it becomes available during the first half of 2011. The technology, the company explained, aims to go beyond the traditional fingerprinting and data-describing approaches used to find sensitive information.
"Vector Machine Learning (VML) is used to develop policies that define sensitive data, or data that the DLP system should look for or detect," explained Robert Hamilton, senior product marketing manager for Symantec. "Vector Machine Learning is trained using positive and negative sample documents to create a profile that is then used within a DLP policy."
"[For] example: A software developer needs to protect its proprietary code from leaving the organization via e-mail or USB drives," he said. "While it needs to protect proprietary code, it doesn't want the DLP system to flag open source which can move around freely. So it uses samples of proprietary source code as the positive examples, and samples of the open source as the negative samples. The profile developed by VML is then configured into a policy that they name 'Proprietary Source Code.'"
The feature can help automate policy creation, Jon Oltsik, an analyst with Enterprise Strategy Group, told eWEEK.
"When you get beyond canned policies, many DLP technologies are hard to program and somewhat inflexible," he said. "Machine learning can help create a map of users and data that can help pinpoint where sensitive content is, who accesses it and whether actual use supports business processes and security policies."
Another capability slated for Version 11 is a new application file access control feature to ensure applications such as iTunes and Skype do not transmit sensitive data. Symantec also added a FlexResponse feature to allow users to apply encryption or Enterprise Rights Management (ERM) to files found on the endpoint as part of the discovery scanning process.
Other work is being done to streamline the remediation process by identifying the locations where data is at greatest risk and automatically notifying the associated data owners. This is done through a risk scoring feature that prioritizes folders based on the amount and severity of sensitive data they contain, as well as how many people can read or write files in those folders, the company said.
"Organizations have a lot of unstructured data, often terabytes, and the sensitive data is hidden within that vast sea," Hamilton said. "DLP can be intimidating: customers are concerned that DLP is going to tell them they have thousands of unprotected files, and they'll have concerns about where to start their cleanup efforts. Risk Scoring helps them quickly find hot spots of risk out on their network file shares in order to understand where to start their cleanup efforts."