Symantec is adding new machine learning technology to its data loss prevention
(DLP) product to make it easier to
classify sensitive data and define policies.
The feature, called Vector Machine Learning, will be included in
Symantec Data Loss Prevention 11 when it becomes available in the first
half of 2011. The technology, the company explained, aims to go beyond the
traditional fingerprinting and data-description approaches used to find
sensitive information.
"Vector Machine Learning (VML) is used to develop policies that define
sensitive data, or data that the
DLP system should look for or detect," explained Robert Hamilton,
senior product marketing manager for Symantec. "Vector Machine Learning is
trained using positive and negative sample documents to create a profile that
is then used within a DLP policy."
"[For] example: A software developer needs to protect
its proprietary code from leaving the organization via e-mail or USB
drives," he said. "While it needs to protect proprietary code, it
doesn’t want the DLP system to flag open
source, which can move around freely. So it uses samples of proprietary
source code as the positive examples, and samples of the open source as the
negative samples. The profile developed by VML is then configured into a policy
that they name 'Proprietary Source Code.'"
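Symantec has not published how VML works internally, so the following is only a sketch of the general idea Hamilton describes: learning a profile from positive and negative sample documents and then using it to decide whether a new document should be flagged. It assumes a bag-of-words vector representation and a simple perceptron, a swapped-in technique that Symantec has not confirmed; the function names and the "Acme" sample text are hypothetical.

```python
import re
from collections import Counter

def vectorize(text):
    """Turn a document into a bag-of-words vector (term -> count)."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def train_profile(positives, negatives, epochs=10):
    """Learn per-term weights with a perceptron (hypothetical stand-in for VML).

    Positive samples (content to protect) get label +1;
    negative samples (content to ignore) get label -1.
    """
    weights = Counter()  # terms never seen read as weight 0
    bias = 0
    samples = [(vectorize(d), 1) for d in positives]
    samples += [(vectorize(d), -1) for d in negatives]
    for _ in range(epochs):
        for vec, label in samples:
            score = bias + sum(weights[t] * c for t, c in vec.items())
            if label * score <= 0:  # misclassified or on the boundary: nudge toward label
                for t, c in vec.items():
                    weights[t] += label * c
                bias += label
    return weights, bias

def is_sensitive(profile, text):
    """Flag a document whose score against the learned profile is positive."""
    weights, bias = profile
    return bias + sum(weights[t] * c for t, c in vectorize(text).items()) > 0

# Hypothetical training samples: proprietary code vs. open source license text.
positives = [
    "// Copyright Acme Corp. Confidential. Proprietary billing engine",
    "Acme Corp internal use only: proprietary fraud scoring module",
]
negatives = [
    "Licensed under the Apache License, Version 2.0",
    "This program is free software under the GNU General Public License",
]
profile = train_profile(positives, negatives)
print(is_sensitive(profile, "proprietary acme code"))              # True
print(is_sensitive(profile, "released under the apache license"))  # False
```

In this sketch, terms common in the positive samples push a file toward detection while terms typical of open source licenses push it away, which loosely mirrors the "Proprietary Source Code" policy in Hamilton's example.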
The feature can help automate policy creation, Jon Oltsik, an analyst with
Enterprise Strategy Group, told eWEEK.
"When you get beyond canned policies, many DLP
technologies are hard to program and somewhat inflexible," he said. "Machine
learning can help create a map of users and data that can help pinpoint where
sensitive content is, who accesses it and whether actual use supports business
processes and security policies."
Another capability slated for Version 11 is a new application file access
control feature designed to keep applications such as iTunes and Skype from
transmitting sensitive data. Symantec also added a FlexResponse feature that lets
users apply encryption or Enterprise Rights Management (ERM) to files found on the
endpoint as part of the discovery scanning process.
Symantec is also working to streamline remediation by identifying the
locations where data is at greatest risk and automatically notifying the
associated data owners. This is done through a risk scoring feature that
prioritizes folders based on the amount and severity of the sensitive data they
contain, as well as on how many people can read or write files in those
folders, the company said.
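The company did not disclose its scoring formula. As a hypothetical illustration of the two factors it names (the amount and severity of sensitive data, and how many people can access a folder), the sketch below multiplies a severity-weighted count of findings by the number of users with access and sorts folders by the result. The severity weights, the `Folder` class, the function names and the share paths are all assumptions.

```python
from dataclasses import dataclass

# Assumed severity weights; Symantec has not published its formula.
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 10}

@dataclass
class Folder:
    path: str
    findings: list[str]     # severity label of each sensitive file a scan found
    users_with_access: int  # people who can read or write files in the folder

def risk_score(folder):
    """Severity-weighted amount of sensitive data times exposure."""
    severity_total = sum(SEVERITY_WEIGHT[s] for s in folder.findings)
    return severity_total * folder.users_with_access

def prioritize(folders):
    """Return folders ordered from highest to lowest risk."""
    return sorted(folders, key=risk_score, reverse=True)

folders = [
    Folder("/shares/finance", ["high", "high", "medium"], 40),  # 23 * 40 = 920
    Folder("/shares/hr", ["high"], 5),                          # 10 * 5 = 50
    Folder("/shares/public", ["low", "low"], 500),              # 2 * 500 = 1000
]
for f in prioritize(folders):
    print(f.path, risk_score(f))
```

Under these assumed weights, a widely readable public share holding only a couple of low-severity files still ends up at the top of the list, the kind of hot spot a data owner would be notified about first.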
"Organizations have a lot of unstructured data, often terabytes, and
the sensitive data is hidden within that vast sea," Hamilton
said. "DLP can be intimidating—customers
are concerned that DLP is going to tell them
they have thousands of unprotected files, and they'll have concerns about where
to start their cleanup efforts. Risk Scoring helps them quickly find hot spots
of risk out on their network file shares in order to understand where to start
their cleanup efforts."