Strengthening the Security of Hadoop Projects: 12 Best Practices


by Chris Preimesberger

Plan for Information Security From the Start

Your Apache Hadoop environment will eventually store some form of sensitive data, if it doesn't already. Have a plan to secure your data within Apache Hadoop from the start in order to avoid time-consuming and costly security maintenance and incidents down the road.

Get In Early on Projects, Ask Questions About the Data

Apache Hadoop projects are probably already popping up in your organization; don't wait until after the fact to ask questions about the data. As a leader charged with protecting your organization's sensitive data, you need to know where that data is, who will have access to it, what the access rules are in the source system and whether they carry over into Hadoop. You will also need to know whether any of the data is subject to HIPAA (Health Insurance Portability and Accountability Act), PCI DSS (Payment Card Industry Data Security Standard), SOX (Sarbanes-Oxley Act) or any other regulatory requirements.

Tie Into Your Corporate Email and Identity System

Chances are you already have a corporate identity system in place, whether LDAP, Active Directory or a simple Gmail.com log-in; tie your Apache Hadoop users and groups into it. Establishing centralized user access control and management early on will help with many administrative tasks, as well as security audits, down the line.
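As a sketch of what this can look like in practice, Hadoop's group mapping can be pointed at an LDAP or Active Directory server in core-site.xml. The host name, bind user and search base below are placeholder values, not recommendations:

```xml
<!-- core-site.xml: resolve Hadoop users' groups from a corporate
     directory instead of local /etc/group files.
     Host, bind DN and base DN are placeholders. -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.url</name>
  <value>ldaps://ad.example.com:636</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.bind.user</name>
  <value>cn=hadoop-svc,ou=services,dc=example,dc=com</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.base</name>
  <value>dc=example,dc=com</value>
</property>
```

With group lookups centralized like this, revoking a user in the directory revokes their Hadoop group memberships in one place.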

Encrypt Your Data

The argument that encryption could slow down systems is no longer valid. Apache Hadoop distributions support over-the-wire encryption and are now starting to enable data-at-rest encryption that has little to no impact on speeds. With faster hardware and built-in cryptographic acceleration available, there is never any reason to skip this critical step.
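As a sketch of what data-at-rest encryption can look like, HDFS transparent encryption lets you mark a directory as an encryption zone. This assumes a Hadoop KMS is already configured; the key name and path below are placeholders:

```shell
# Assumes a running cluster with a Hadoop KMS configured.
# Key name and path are placeholders.

# Create an encryption key in the KMS.
hadoop key create projectKey

# Mark an (empty) HDFS directory as an encryption zone using that key;
# files written under it are then encrypted and decrypted transparently.
hdfs dfs -mkdir /secure
hdfs crypto -createZone -keyName projectKey -path /secure

# Confirm the zone exists.
hdfs crypto -listZones
```

For over-the-wire protection, the standard properties are `hadoop.rpc.protection=privacy` for RPC and `dfs.encrypt.data.transfer=true` for the HDFS data path.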

Log Everything and Keep Backups

IT and/or security managers need to enable all the logging and monitoring capabilities of the platform and maintain a centralized way of viewing, auditing and archiving this data. They need to continually monitor logs and transactions proactively for any suspicious activity and reactively for forensics, root cause analysis and sometimes evidence retention.
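One concrete starting point is routing the HDFS NameNode audit trail to its own rolling file in log4j.properties, so it can be shipped to a central log store and archived; the file sizes and retention counts below are illustrative, not recommendations:

```properties
# log4j.properties: send the HDFS NameNode audit log to a dedicated
# rolling file for central collection and archival.
# Sizes and backup counts are placeholder values.
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO,RFAAUDIT
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.MaxFileSize=256MB
log4j.appender.RFAAUDIT.MaxBackupIndex=20
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
```

A dedicated audit file keeps forensics-relevant records separate from routine daemon logs, which simplifies both monitoring and evidence retention.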

Set Up a Security Steering Committee

Security has many layers, including everything from physical security and risk mitigation when using your laptop, mobile phone or public WiFi to having security steps during the HR on-boarding and termination processes. Set up a security steering committee comprising members from IT, HR and even line-of-business employees (marketing, sales, etc.). If you don't already have an information security officer, at a minimum assign this role to someone in IT and send him or her to a security class to learn where to start.

Identify and Tag Your Sensitive Data

Data access should never be open by default; it should always be set on a "need-to-know" basis. Make sure you have processes in place that allow you to identify and tag sensitive data and request access to that data. Data security tagging capabilities are in very early stages within Apache Hadoop, but you can start now by segregating data in directories using naming conventions or separate metadata to tag and identify your sensitive data.
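Until native tagging matures, even a naming convention can be checked mechanically. Here is a minimal sketch: the zone names (`pii`, `phi`, `pci`) are illustrative conventions, not a Hadoop standard.

```shell
# Classify a data set path as sensitive based on a directory naming
# convention. Zone names here are illustrative placeholders.
is_sensitive() {
  case "$1" in
    */pii/*|*/phi/*|*/pci/*) echo "sensitive" ;;
    *) echo "open" ;;
  esac
}

is_sensitive "/data/pii/customers.csv"
is_sensitive "/data/public/weather.csv"
```

A check like this can gate ingestion scripts or access-request tooling until proper metadata-based tagging is available.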

Voice Your Security Requirements

Apache Hadoop distributors, developers and the security community are all looking for real customer use cases, so voice your security requirements. Reach out or, even better, contribute code back into Hadoop under the Apache license, even if your contribution is only opening a ticket and writing up a requirement. There are many security features on the open-source Apache Hadoop roadmap, and the ones that garner the most interest will go to the top of the list.

Expect More From Your Commercial Hadoop Distribution

Add security to the list of things you should expect from your Hadoop support subscription. Setting up a secure Hadoop cluster is not trivial and touches many areas, including Kerberos and keytab configuration, SSH (Secure Shell), SSL (Secure Sockets Layer) certificates, RSA key management, SSO (single sign-on) integration, secure logging, cryptographic ciphers, role-based access control and secure cluster provisioning, just to name a few.
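To give a flavor of the Kerberos piece alone, provisioning a single service principal and keytab looks roughly like the following MIT Kerberos transcript; the principal name, host name, realm and keytab path are all placeholders, and every Hadoop daemon needs its own:

```shell
# On the KDC (MIT Kerberos): create a service principal with a random
# key and export it to a keytab. Names and paths are placeholders.
kadmin.local -q "addprinc -randkey nn/namenode.example.com@EXAMPLE.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab \
  nn/namenode.example.com@EXAMPLE.COM"

# On the cluster node: verify the keytab can obtain a ticket.
kinit -kt /etc/security/keytabs/nn.service.keytab \
  nn/namenode.example.com@EXAMPLE.COM
klist
```

Multiply this by every daemon and host in the cluster and the case for leaning on your distribution's tooling becomes clear.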

Empower and Layer Security, One Coat at a Time

Be a friend to business and productivity by empowering and enabling your business to securely tap into data sets in Hadoop in order to extract knowledge in ways that were not possible before. Add security in layers that reduce risk without completely blocking business; if you put up complete barriers, users will go around security altogether with skunkworks projects, which is a far more dangerous proposition.

Understand Data's Lineage

Hadoop can ingest data from many different sources, and it is good security practice to keep track of data lineage (where the data came from). Understanding the sources of all data sets, including derived data sets, is important for supporting compliance and audit requirements. Hadoop provides tools that automatically track the upstream sources of new data sets and provide full lineage and auditing; enable them.

Protect All the Data

Not all important or interesting data is stored directly in the Hadoop Distributed File System (HDFS). Many important data repositories exist outside HDFS in the form of metadata stores and files; protecting all sensitive data, inside HDFS and out, requires careful consideration.
