Securing Hadoop Data: 10 Best Practices

by Darryl K. Taft

Start Your Hadoop Planning Early

Determine the data privacy protection strategy during the planning phase of a deployment, preferably before moving any data into Hadoop. Doing so reduces the risk of damaging compliance exposure for the company and avoids unpredictable changes to the rollout schedule.

Consider Privacy Concerns

Identify which data elements are defined as sensitive within your organization. Consider company privacy policies as well as pertinent industry and government regulations.

Check for Exposure

Determine the compliance exposure risk based on the information collected.

Be Aware of Sensitive Data

Discover whether sensitive data is already embedded in the environment, and whether it has been or will be assembled in Hadoop.
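
As a rough illustration of this discovery step, the sketch below scans text records for values that look sensitive. The pattern names, the regular expressions and the sample records are all assumptions made for the example; a real deployment would derive its patterns from the organization's own privacy policies and applicable regulations, and would read from the cluster rather than from a list in memory.

```python
import re

# Hypothetical patterns, for illustration only. Real rules would come from
# company privacy policies and the relevant regulations.
SENSITIVE_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_sensitive(lines):
    """Return (line_number, pattern_name) for every pattern found on a line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


# Made-up sample records standing in for lines of a data file.
sample = [
    "order=1001,total=25.00",
    "name=Joe,ssn=123-45-6789",
    "contact=joe@example.com",
]

print(find_sensitive(sample))  # prints [(2, 'us_ssn'), (3, 'email')]
```

Pattern matching of this kind only finds values with a recognizable shape, so it is best treated as a first pass rather than a complete inventory.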

Real or Desensitized?

Determine whether business analytics requires access to real data or whether desensitized data will suffice. Then choose the appropriate remediation technique: masking or encryption. If in doubt, remember that masking provides the most secure remediation, since masked values are typically not meant to be recovered, while encryption provides the most flexibility should future needs evolve, since encrypted values can be decrypted later by authorized users.

Support the Relevant Techniques

Ensure that the data protection solutions under consideration support both masking and encryption remediation techniques, especially if the goal is to keep both masked and unmasked versions of sensitive data in separate Hadoop directories.

Be Consistent Across the Board

Ensure that the data protection technology applies masking consistently across all data files (if Joe becomes Dave in one file, Joe becomes Dave in every file) to preserve the accuracy of data analysis across all data aggregation dimensions.
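
One way to picture consistent masking is a keyed, deterministic mapping: the same input always yields the same masked value, no matter which file it appears in. The sketch below uses an HMAC for this purpose. It is an assumption chosen for illustration, not the method of any particular product, and the key, the token format and the sample records are hypothetical. Instead of a replacement name such as Dave, it produces an opaque token, which keeps different names from colliding in practice.

```python
import hashlib
import hmac

# Hypothetical key, for illustration only. In practice the key would be
# stored and managed outside the code.
SECRET_KEY = b"replace-with-a-managed-secret"


def mask_name(name, key=SECRET_KEY):
    """Return a token that depends only on the name and the key."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "person_" + digest[:12]


# Two made-up "files" that both mention Joe.
orders = [("Joe", "order-1"), ("Ann", "order-2")]
visits = [("Joe", "2013-05-01")]

masked_orders = [(mask_name(name), order) for name, order in orders]
masked_visits = [(mask_name(name), date) for name, date in visits]

# Joe receives the same token in both data sets, so joins and counts that
# rely on the name column still line up after masking.
print(masked_orders[0][0] == masked_visits[0][0])  # prints True
```

Because the mapping is keyed, anyone without the key cannot simply hash a list of candidate names to reverse it, while every job that does hold the key masks Joe the same way.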

Tailored or Off the Rack?

Determine whether tailored protection is required for specific data sets, and consider dividing Hadoop directories into smaller groups, each of which can be managed as a single security unit.
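
The idea of managing directory groups as units can be sketched as a lookup from directory to protection policy, where a file inherits the policy of its closest enclosing group. The directory names, the policy labels and the default below are all hypothetical, and the sketch works on path strings only; it does not talk to HDFS.

```python
from pathlib import PurePosixPath

# Hypothetical directory groups and policy labels, for illustration only.
DIRECTORY_POLICIES = {
    "/data/finance": "encrypt",
    "/data/finance/public_reports": "none",
    "/data/marketing": "mask",
}
DEFAULT_POLICY = "mask"  # assumed fallback for paths outside every group


def policy_for(path):
    """Return the policy of the closest enclosing directory group."""
    current = PurePosixPath(path)
    # Walk from the path itself up toward the root and stop at the first
    # directory that has an explicit policy.
    for candidate in [current, *current.parents]:
        policy = DIRECTORY_POLICIES.get(str(candidate))
        if policy is not None:
            return policy
    return DEFAULT_POLICY


print(policy_for("/data/finance/q1/ledger.csv"))           # prints encrypt
print(policy_for("/data/finance/public_reports/2013.pdf")) # prints none
print(policy_for("/data/logs/app.log"))                    # prints mask
```

Resolving to the nearest group lets a broad rule cover most of a directory tree while a smaller subdirectory carries its own tailored rule.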

Make Sure Everything Fits

Ensure the selected encryption solution interoperates with the company's access-control technology and that both allow users with different credentials to have the appropriate, selective access to data in the Hadoop cluster.
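
Selective access of this kind can be pictured as a per-role rule for each field: shown in the clear, shown masked, or withheld. The sketch below is a simplified assumption for illustration; the role names, field names and rules are hypothetical, and a real cluster would enforce this through its access-control and encryption products rather than in application code.

```python
# Hypothetical roles and per-field rules, for illustration only.
FIELD_ACCESS = {
    "analyst": {"name": "masked", "ssn": "denied", "total": "clear"},
    "auditor": {"name": "clear", "ssn": "clear", "total": "clear"},
}


def view_record(record, role):
    """Return only the fields the role may see, masking where required."""
    rules = FIELD_ACCESS.get(role, {})
    visible = {}
    for field, value in record.items():
        rule = rules.get(field, "denied")  # anything not listed is withheld
        if rule == "clear":
            visible[field] = value
        elif rule == "masked":
            visible[field] = "***"
    return visible


record = {"name": "Joe", "ssn": "123-45-6789", "total": 25.0}

print(view_record(record, "analyst"))  # prints {'name': '***', 'total': 25.0}
print(view_record(record, "auditor") == record)  # prints True
print(view_record(record, "guest"))    # prints {}
```

Defaulting to "denied" for unknown roles and unlisted fields means a gap in the rules hides data instead of exposing it.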

Make Decryption Available

When encryption is required, ensure that the proper technology (Java, Pig, etc.) is deployed to allow seamless decryption and expedited access to data.
