Google Adds Redaction, Masking Features to Data Loss Prevention API

New data loss prevention features in Google Cloud are designed to help organizations better manage sensitive data and personally identifiable information.

New Google Cloud DLP Features

Google is giving organizations more ways to protect sensitive data such as credit card and Social Security numbers in workloads on its cloud platform.

On Oct. 19, the company said it had added new data de-identification features to the existing capabilities of its Cloud Data Loss Prevention (DLP) API.

Google's DLP API, first released in March of this year, is designed to help companies find, classify and protect some 50 different types of sensitive data in cloud data stores, email applications and other locations.

The company has previously described the technology as something that businesses can use to minimize the potential exposure of sensitive data they collect or copy internally and externally.

Administrators can use the API to warn users when they are about to store sensitive data in an application or storage system, the company has noted. Organizations can also use the API to scan datasets in Google Cloud Storage, the BigQuery analytics service and the Google Cloud Datastore NoSQL document database.
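To make the idea of inspection concrete, here is a minimal Python sketch of the kind of check an administrator might run before data is stored. It does not call Google's DLP API; the regular expressions, the Luhn checksum test and the function names find_sensitive and warn_before_store are illustrative assumptions, and they are far simpler than the detectors a production service would use.

```python
import re

# Illustrative detectors only (assumptions, not Google's): a US SSN in
# NNN-NN-NNNN form, and 13 to 16 digits optionally separated by spaces or hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by card numbers."""
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for likely SSNs and card numbers."""
    findings = [("US_SSN", m.group()) for m in SSN_PATTERN.finditer(text)]
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):  # discard digit runs that fail the checksum
            findings.append(("CREDIT_CARD", m.group()))
    return findings


def warn_before_store(text: str) -> bool:
    """Print one warning per finding (label only) and return True if the text looks safe to store."""
    findings = find_sensitive(text)
    for label, _ in findings:
        print(f"Warning: text appears to contain a {label} value")
    return not findings


warn_before_store("Card on file: 4111 1111 1111 1111")
```

Running the last line prints a single CREDIT_CARD warning. The checksum step cuts down on false positives from arbitrary 13- to 16-digit strings such as order numbers, and the warning deliberately names only the data type, not the value.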

This week's announcement adds data redaction, data masking and tokenization features to Google's DLP API. The goal is to give companies a way to remove or obscure personally identifiable information in a dataset, making it more difficult for someone to associate the remaining data with any particular individual.

"If, like many enterprises, you follow the principle of least privilege or need-to-know access to data, the DLP API can help you enforce these principles in production applications and data workflows," said Scott Ellis, a Google product manager, in a blog post.

For example, with the new redaction and data suppression features in the DLP API, an organization could ensure that its technical support staff does not see a customer's identifying information when troubleshooting a problem, Ellis said.
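Redaction of this kind can be sketched in a few lines of Python. The example below is an illustration under stated assumptions rather than Google's implementation: it uses two hypothetical detectors (a US SSN format and a simple email pattern) and replaces each match with a bracketed label, so a support agent can still see what kind of data was present without seeing the value itself.

```python
import re

# Hypothetical detectors for illustration: a US SSN format and a simple email pattern.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(text: str) -> str:
    """Replace every detected value with a bracketed label naming its data type."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


ticket = "Customer jane.doe@example.com (SSN 123-45-6789) cannot log in."
print(redact(ticket))  # Customer [EMAIL] (SSN [US_SSN]) cannot log in.
```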

Similarly, an organization that wants to analyze large population trends could use the API to suppress records containing personally identifying data so researchers work only with a properly anonymized data set.
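Record-level suppression, as described here, can be sketched as a filter that drops any record in which a detector fires. This minimal Python example is a hypothetical illustration, not the DLP API: it assumes records are dictionaries and uses only a US SSN pattern as its detector, whereas a real de-identification pipeline would check many more data types.

```python
import re

# Hypothetical detector: US SSNs written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_identifier(record: dict) -> bool:
    """True if any string field in the record matches the detector."""
    return any(isinstance(v, str) and SSN_PATTERN.search(v) for v in record.values())


def suppress(records: list[dict]) -> list[dict]:
    """Keep only the records in which no identifier was detected."""
    return [r for r in records if not contains_identifier(r)]


survey = [
    {"region": "West", "age_band": "30-39", "note": "follow-up requested"},
    {"region": "East", "age_band": "40-49", "note": "SSN 123-45-6789 on file"},
]
print(suppress(survey))  # only the "West" record remains
```

Dropping whole records shrinks the dataset, so in practice this approach may be paired with field-level techniques such as masking.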

The two other new features that Google has added to the DLP API address similar goals. The data-masking feature, for instance, gives users a way to partially obscure a data element, such as the first five digits of a Social Security number or the last seven digits of a telephone number. Such masking can ensure that individual pieces of identifying data cannot easily be tied back to their owner.
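Partial masking is straightforward to illustrate. The Python function below is a hypothetical sketch, not the DLP API's own masking configuration: it overwrites a given number of characters with a mask character, working from either end of the value, and skips separators such as hyphens so they stay visible and are not counted.

```python
def mask(value: str, count: int, *, from_end: bool = False,
         mask_char: str = "*", ignore: str = "-() ") -> str:
    """Replace `count` characters of `value` with `mask_char`, skipping any in `ignore`."""
    chars = list(value)
    # Scan left to right, or right to left when masking a suffix.
    order = range(len(chars) - 1, -1, -1) if from_end else range(len(chars))
    masked = 0
    for i in order:
        if masked == count:
            break
        if chars[i] in ignore:
            continue  # separators stay visible and do not count toward `count`
        chars[i] = mask_char
        masked += 1
    return "".join(chars)


print(mask("123-45-6789", 5))                  # ***-**-6789
print(mask("415-555-0123", 7, from_end=True))  # 415-***-****
```

Leaving the separators in place preserves the familiar shape of the value, which lets a reader recognize what kind of data it is without revealing the masked digits.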

The new tokenization feature, meanwhile, is designed to protect sensitive data by replacing a direct identifier with a token or pseudonym. "This can be very useful in cases where you need to retain a record identifier or join data, but don’t want to reveal the sensitive underlying elements," Ellis said.
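One common way to produce such pseudonyms is keyed hashing. The Python sketch below uses HMAC-SHA256 from the standard library; it is a stand-in technique chosen for illustration, not a description of how Google's tokenization works, and the key, the TOKEN_ prefix and the table layouts are hypothetical. Because the same input always yields the same token, two tokenized tables can still be joined on the identifier column.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real deployment would keep it in a key manager.
SECRET_KEY = b"example-key-do-not-use"


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a direct identifier to a deterministic, non-reversible pseudonym."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep 64 bits of the digest; collisions are unlikely at modest dataset sizes.
    return "TOKEN_" + digest[:16]


orders = [{"customer": "123-45-6789", "total": 42}]
tickets = [{"customer": "123-45-6789", "issue": "login failure"}]

# Tokenize the identifier column in both tables with the same key.
tok_orders = [{**o, "customer": tokenize(o["customer"])} for o in orders]
tok_tickets = [{**t, "customer": tokenize(t["customer"])} for t in tickets]

# The join still works because equal identifiers produce equal tokens.
joined = [(o, t) for o in tok_orders for t in tok_tickets if o["customer"] == t["customer"]]
print(len(joined))  # 1
```

HMAC tokens cannot be turned back into the original value; where reversal is needed, systems typically use a token vault or format-preserving encryption instead, which this sketch does not show.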

Industry standards such as the Payment Card Industry Data Security Standard (PCI DSS) accept tokenization as an alternative to encryption for protecting credit card data.

Jaikumar Vijayan

Vijayan is an award-winning independent journalist and tech content creation specialist covering data security and privacy, business intelligence, big data and data analytics.