Securing Hadoop Data: 10 Best Practices | eWeek

Written by Darryl K. Taft
May 14, 2013


Start Your Hadoop Planning Early

Determine the data privacy protection strategy during the planning phase of a deployment, preferably before moving any data into Hadoop. Doing so reduces the risk of damaging compliance exposure for the company and avoids unpredictable delays in the rollout schedule.


Consider Privacy Concerns

Identify which data elements are defined as sensitive within your organization. Consider company privacy policies as well as relevant industry and government regulations.


Be Aware of Sensitive Data

Discover whether sensitive data is already embedded in the environment, or is being assembled or will be assembled in Hadoop.


Check for Exposure

Determine the compliance exposure risk based on the information collected.
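
The discovery and exposure steps can be illustrated with a small scan that counts how many records contain sensitive-looking values. This is only a minimal sketch: the two patterns, the `scan_lines` function and the sample records are assumptions made for illustration, and a real deployment would use the data elements its own policies define as sensitive.

```python
import re
from collections import Counter

# Hypothetical patterns, for illustration only; real policies define their own.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_lines(lines):
    """Count how many lines contain each kind of sensitive-looking value."""
    hits = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[label] += 1
    return hits


# Made-up sample records standing in for a file headed to Hadoop.
sample = [
    "1001,Joe,joe@example.com,123-45-6789",
    "1002,Ann,ann@example.com,",
    "1003,Raj,,",
]
report = scan_lines(sample)
```

The per-category counts in `report` give a rough first input to the exposure assessment; pattern matching alone will miss sensitive data that has no recognizable format.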


Real or Desensitized?

Determine whether business analytics requires access to real data or whether desensitized data will suffice. Then choose the appropriate remediation technique: masking or encryption. If in doubt, remember that masking provides the most secure remediation, while encryption provides the most flexibility should future needs evolve.
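
As a rough illustration of why masking is the more secure of the two, the sketch below overwrites most digits of a card number. The original digits cannot be recovered from the result, whereas encrypted data can be restored by anyone holding the key. The function name `mask_card_number` and the sample value are assumptions for illustration, not part of any particular product.

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with 'X', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    to_hide = max(total_digits - visible, 0)
    masked = []
    for ch in card_number:
        if ch.isdigit() and to_hide > 0:
            # Overwrite the digit; nothing about its value is retained.
            masked.append("X")
            to_hide -= 1
        else:
            masked.append(ch)
    return "".join(masked)


# A well-known test card number, used here as made-up sample data.
masked = mask_card_number("4111-1111-1111-1234")
```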


Support the Relevant Techniques

Ensure that the data protection solutions under consideration support both masking and encryption remediation techniques, especially if the goal is to keep both masked and unmasked versions of sensitive data in separate Hadoop directories.


Be Consistent Across the Board

Ensure the data protection technology used implements consistent masking across all data files—Joe becomes Dave in all files—to preserve the accuracy of data analysis across all data aggregation dimensions.
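
One way to get this kind of consistency is to derive the replacement value from a keyed hash of the original, so the same input always maps to the same output no matter which file it appears in. The sketch below assumes this approach; the key, the replacement pool and the `mask_name` function are hypothetical, and commercial tools may work differently.

```python
import hashlib
import hmac

# Hypothetical secret and replacement pool, for illustration only.
SECRET_KEY = b"replace-with-a-managed-secret"
REPLACEMENT_NAMES = ["Dave", "Maria", "Chen", "Priya", "Omar", "Lena"]


def mask_name(name: str) -> str:
    """Map a name to a replacement that is stable for a given key."""
    digest = hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(REPLACEMENT_NAMES)
    return REPLACEMENT_NAMES[index]


# Two made-up "files" that both mention the same customer.
orders = [("Joe", "order-17"), ("Ann", "order-18")]
tickets = [("Joe", "ticket-3")]

masked_orders = [(mask_name(name), ref) for name, ref in orders]
masked_tickets = [(mask_name(name), ref) for name, ref in tickets]
```

Joe receives the same replacement in both lists, so joins and aggregations across the masked files still line up. With a pool this small, different names can collide on the same replacement; a real tool would need a far larger output space to keep distinct values distinct.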


Tailored or Off the Rack?

Determine whether tailored protection is required for specific data sets, and consider dividing Hadoop directories into smaller groups in which security can be managed as a unit.
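
Managing directories as units can be pictured as a lookup from a file path to the policy of the most specific directory group that contains it. The following is a minimal sketch under that assumption; the directory names, policy labels and the `policy_for` function are all hypothetical.

```python
# Hypothetical directory groups and policy labels, for illustration only.
POLICIES = {
    "/data/finance": "encrypt",
    "/data/finance/reports/public": "none",
    "/data/marketing": "mask",
}


def policy_for(path: str, default: str = "unclassified") -> str:
    """Return the policy of the longest directory prefix that contains path."""
    best_prefix = ""
    best_policy = default
    for prefix, policy in POLICIES.items():
        # Match whole path components only, so /data/financeX is not /data/finance.
        inside = path == prefix or path.startswith(prefix + "/")
        if inside and len(prefix) > len(best_prefix):
            best_prefix, best_policy = prefix, policy
    return best_policy


ledger_policy = policy_for("/data/finance/ledger/2013.csv")
public_policy = policy_for("/data/finance/reports/public/q1.csv")
```

Paths that fall outside every group come back as `unclassified`, which flags them for review instead of silently treating them as safe.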


Make Sure Everything Fits

Ensure the selected encryption solution interoperates with the company’s access-control technology and that both allow users with different credentials to have the appropriate, selective access to data in the Hadoop cluster.
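
The idea of selective access can be sketched as a per-role view of a record, in which each field is either shown in the clear or withheld. This is a simplified stand-in: the roles, field names and `view_for` function are hypothetical, and in a real cluster the decision would come from the access-control system and the clear values from the decryption layer.

```python
# Hypothetical roles and the fields each may see in the clear.
CLEAR_FIELDS_BY_ROLE = {
    "fraud_analyst": {"name", "card_number", "amount"},
    "marketing_analyst": {"amount"},
}


def view_for(role: str, record: dict) -> dict:
    """Return the record with fields the role may not see replaced by '***'."""
    allowed = CLEAR_FIELDS_BY_ROLE.get(role, set())
    return {
        field: (value if field in allowed else "***")
        for field, value in record.items()
    }


# Made-up sample record.
record = {"name": "Joe", "card_number": "4111-1111-1111-1234", "amount": 42.5}
marketing_view = view_for("marketing_analyst", record)
unknown_view = view_for("intern", record)
```

Unknown roles see no clear fields at all, which keeps the default on the restrictive side.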


Make Decryption Available

When encryption is required, ensure that decryption support is deployed in the technologies that read the data (Java, Pig, etc.), so that authorized users can decrypt seamlessly and access data quickly.
