Strengthening the Security of Hadoop Projects: 12 Best Practices

By Chris Preimesberger  |  Posted 2014-09-17

IT research firms Gartner, IDC and Wikibon are among those that have estimated the current big data analytics software market at about $18 billion, with the potential to exceed $50 billion by 2017. The market is ramping up fast because enterprises are now seeing results from early projects and are willing to invest in additional IT that supports big data initiatives. Still, some barriers to adoption remain, with security and compliance concerns chief among them. Before security-conscious organizations commit to building more big data programs, they must be convinced that data stored and accessed within Apache Hadoop environments can be protected against attacks that could lead to data breaches, and this must be accomplished with regulatory compliance in mind at all times. In this slide show, put together with eWEEK reporting and industry insight from Cloudera, we offer 12 best practices for continually monitoring and strengthening data security and reducing the risk of becoming the next major data breach headline.

1 - Plan for Information Security From the Start

Your Apache Hadoop environment will eventually store some form of sensitive data, if it doesn't already. Plan to secure the data within Apache Hadoop from the start to avoid time-consuming and costly security retrofits and incidents down the road.

2 - Get In Early on Projects, Ask Questions About the Data

Apache Hadoop projects are probably already popping up in your organization; don't wait until after the fact to ask questions about the data. As a leader charged with protecting your organization's sensitive data, you need to know where the sensitive data is, who will have access to it, what the access rules in the source system are and whether they carry over into Hadoop. You will also need to know if any of the data is subject to HIPAA (Health Insurance Portability and Accountability Act), PCI DSS (Payment Card Industry Data Security Standard), SOX (Sarbanes-Oxley Act) or any other regulatory requirements.

3 - Tie Into Your Corporate Email and Identity System

Chances are you already have a corporate identity system in place, whether LDAP, Active Directory or even a simple Gmail.com log-in; tie your Apache Hadoop users and groups to it. Establishing centralized user access control and management early on will help you in many administrative tasks, as well as in security audits down the line.
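
For example, Hadoop ships with an LDAP-backed group mapping that can resolve users and groups against a corporate directory. Below is a minimal sketch of the relevant configuration keys, set here through the Java Configuration API for illustration; in practice they live in core-site.xml, and the host name, bind DN and search filters are hypothetical placeholders you would replace with your directory's values.

```java
import org.apache.hadoop.conf.Configuration;

// A minimal sketch: point Hadoop's user/group resolution at a corporate
// directory (Active Directory-style filters shown). All host names and
// DNs are hypothetical placeholders.
public class LdapGroupMapping {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // Resolve group membership against LDAP/AD instead of the local OS.
        conf.set("hadoop.security.group.mapping",
                "org.apache.hadoop.security.LdapGroupsMapping");
        conf.set("hadoop.security.group.mapping.ldap.url",
                "ldaps://ad.example.com:636");
        conf.set("hadoop.security.group.mapping.ldap.bind.user",
                "cn=hadoop-svc,ou=services,dc=example,dc=com");
        conf.set("hadoop.security.group.mapping.ldap.base",
                "dc=example,dc=com");
        conf.set("hadoop.security.group.mapping.ldap.search.filter.user",
                "(&(objectClass=user)(sAMAccountName={0}))");
        conf.set("hadoop.security.group.mapping.ldap.search.filter.group",
                "(objectClass=group)");
        return conf;
    }
}
```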

4 - Encrypt Your Data

The argument that encryption slows systems down is no longer valid. Apache Hadoop distributions support over-the-wire encryption and are now starting to enable data-at-rest encryption with little to no impact on performance. With faster hardware and built-in cryptographic acceleration available, there is no longer any reason to skip this critical step.
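
The switches behind both kinds of encryption are ordinary Hadoop configuration. The sketch below shows, via the Java Configuration API for illustration, the flags that turn on RPC and block-transfer encryption and point HDFS at a Key Management Server for encryption-zone keys; the KMS address is a hypothetical placeholder, and these settings normally live in core-site.xml and hdfs-site.xml.

```java
import org.apache.hadoop.conf.Configuration;

// A minimal sketch of Hadoop's encryption flags. The KMS address is a
// hypothetical placeholder.
public class EncryptionSettings {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // Encrypt RPC traffic between clients and Hadoop daemons.
        conf.set("hadoop.rpc.protection", "privacy");
        // Encrypt block data moving between clients and DataNodes.
        conf.setBoolean("dfs.encrypt.data.transfer", true);
        // Keep encryption-zone keys off the cluster, in a Key
        // Management Server.
        conf.set("dfs.encryption.key.provider.uri",
                "kms://http@kms.example.com:16000/kms");
        return conf;
    }
}
```

With the key provider in place, an administrator can create a key and mark a directory as an encryption zone (via the hadoop key create and hdfs crypto -createZone commands), after which files written there are encrypted and decrypted transparently.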

5 - Log Everything and Keep Backups

IT and/or security managers need to enable all the logging and monitoring capabilities of the platform and maintain a centralized way of viewing, auditing and archiving this data. They need to monitor logs and transactions continually: proactively, for any suspicious activity, and reactively, for forensics, root cause analysis and, at times, evidence retention.
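
As one concrete example, the HDFS NameNode writes an audit line for every file-system access in a key=value format (allowed=, ugi=, ip=, cmd=, src= and so on). The sketch below scans such a log for denied requests and for any access under a sensitive path; the log location and the /data/pii prefix are hypothetical placeholders, and a production setup would stream these events into a central monitoring system rather than read a local file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// A minimal sketch of proactive audit-log review. The log path and the
// patterns being matched are hypothetical placeholders.
public class AuditLogScan {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get("/var/log/hadoop-hdfs/hdfs-audit.log"),
                StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                boolean denied = line.contains("allowed=false");
                boolean sensitive = line.contains("src=/data/pii");
                if (denied || sensitive) {
                    // Flag denials and any touch of the sensitive tree.
                    System.out.println((denied ? "DENIED " : "WATCH  ") + line);
                }
            }
        }
    }
}
```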

6 - Set Up a Security Steering Committee

Security has many layers, including everything from physical security and risk mitigation when using your laptop, mobile phone or public WiFi to security steps in the HR on-boarding and termination processes. Set up a security steering committee comprising members from IT, HR and even line-of-business teams (marketing, sales, etc.). If you don't already have an information security officer, at a minimum assign this role to someone in IT and send him or her to a security class to learn where to start.

7 - Identify and Tag Your Sensitive Data

Data access should never be open by default; it should always be set on a "need-to-know" basis. Make sure you have processes in place that allow you to identify and tag sensitive data and to request access to that data. Data security tagging capabilities are in very early stages within Apache Hadoop, but you can start now by segregating data in directories using naming conventions, or by using separate metadata to tag and identify your sensitive data.
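
Even without dedicated tagging tools, HDFS extended attributes and access control lists (available in recent Hadoop 2.x releases) can carry the tags and enforce need-to-know access. In the sketch below, the /data/pii path, the attribute name and the "fraud-analysts" group are hypothetical placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;

// A minimal sketch: tag a sensitive directory with an extended
// attribute, then grant one group need-to-know read access via an ACL.
public class TagSensitiveData {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path pii = new Path("/data/pii");
        // Tag the directory so audits and tooling can find it.
        fs.setXAttr(pii, "user.classification",
                "pii".getBytes(StandardCharsets.UTF_8));
        // Grant read access to one group; the base permissions keep
        // everyone else out.
        AclEntry analysts = new AclEntry.Builder()
                .setScope(AclEntryScope.ACCESS)
                .setType(AclEntryType.GROUP)
                .setName("fraud-analysts")
                .setPermission(FsAction.READ_EXECUTE)
                .build();
        fs.modifyAclEntries(pii, Collections.singletonList(analysts));
    }
}
```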

8 - Voice Your Security Requirements

Apache Hadoop distributors, developers, users and the security community are all looking for real customer use cases, so voice your security requirements. Reach out or, even better, contribute back to Hadoop under the Apache license, even if that contribution is only opening a ticket and writing up a requirement. There are many security features on the open-source Apache Hadoop roadmap, and the ones that garner the most interest will go to the top of the list.

9 - Expect More From Your Commercial Hadoop Distribution

Add security to the list of things you should expect from your Hadoop support subscription. Setting up a secure Hadoop cluster is not trivial and touches many areas, including Kerberos and keytab configuration, SSH (Secure Shell), SSL (Secure Sockets Layer) certificates, RSA key management, SSO (single sign-on) integration, secure logging, cryptographic ciphers, role-based access control and secure cluster provisioning, just to name a few.
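
To give a feel for the Kerberos piece alone, here is a minimal sketch of what a client application does against a Kerberized cluster: enable Kerberos authentication and log in from a keytab. The service principal and keytab path are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// A minimal sketch of keytab-based login on a Kerberized cluster. The
// principal and keytab path are hypothetical placeholders.
public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("hadoop.security.authorization", "true");
        UserGroupInformation.setConfiguration(conf);
        // Authenticate as a service principal, with no interactive
        // password prompt.
        UserGroupInformation.loginUserFromKeytab(
                "etl-svc@EXAMPLE.COM",
                "/etc/security/keytabs/etl-svc.keytab");
        System.out.println("Logged in as "
                + UserGroupInformation.getLoginUser().getUserName());
    }
}
```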

10 - Empower and Layer Security, One Coat at a Time

Be a friend to business and productivity by empowering and enabling your business to securely tap into data sets in Hadoop to extract knowledge in ways that were not possible before. Add security in layers that reduce risk without completely blocking the business; if you put up complete barriers, users will go around security altogether with skunkworks projects, which is a far more dangerous proposition.

11 - Understand Data's Lineage

Hadoop provides many ways to ingest data from various sources, and it is a good security practice to keep track of data lineage (where the data came from). It is important to understand the sources of all data sets, including derived data sets, to support compliance and audit requirements. Hadoop tools exist that will automatically track the upstream sources of new data sets and provide full lineage and auditing; enable them.
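
Where such tooling is not yet in place, a stopgap is to record provenance yourself at ingest time, for instance as HDFS extended attributes on each new data set, as in the sketch below. The paths, attribute names and source strings are hypothetical placeholders; dedicated lineage tools capture this automatically and far more completely.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// A minimal do-it-yourself sketch: stamp a newly ingested data set
// with where it came from, so lineage questions can be answered later.
public class RecordLineage {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dataset = new Path("/data/warehouse/orders/2014-09-17");
        fs.setXAttr(dataset, "user.source",
                "crm-db.example.com:CRM.ORDERS".getBytes(StandardCharsets.UTF_8));
        fs.setXAttr(dataset, "user.ingest.job",
                "nightly-sqoop-orders".getBytes(StandardCharsets.UTF_8));
    }
}
```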

12 - Protect All the Data

Not all of the important and/or interesting data is stored directly in the Hadoop Distributed File System (HDFS). Many important data repositories exist outside HDFS in the form of metadata stores and files, and protecting all sensitive data, both inside and outside of HDFS, requires careful consideration.
 
