Hadoop Distributions: 10 Requirements for a Full-Service Deployment
A growing number of enterprises using Apache Hadoop have found it to be an indispensable analytical tool, capable of unlocking business insights previously hidden deep in data—insights that can improve decision making and help gain a competitive edge. Its many advantages have helped develop an entirely new ecosystem to work with cloud-hosted services from Google, Amazon, Rackspace, GoGrid, Joyent and others. However, organizations have also discovered that Hadoop has some serious limitations. Integrating, using and managing Hadoop within data management systems can require considerable expertise, and keeping the data protected and the server cluster operational often takes considerable effort. The biggest problem is that Hadoop can produce an overflow of new data into storage containers that goes unmanaged or is weakly managed. What are the critical dimensions that ensure Hadoop is deployable in a wide variety of enterprise environments? eWEEK's resource for this slide show, MapR executive Jack Norris, believes Hadoop must be easy to integrate into the enterprise in addition to being more robust in its operation, performance, scalability and reliability. Specifically, full-scale Hadoop platforms should provide the capabilities detailed in this slide show.
Data Snapshots to Protect Data
Without snapshots (point-in-time copies of all the data in a storage device), if a user accidentally deletes a file, the deletion propagates to all three replicas and the data is lost. The same applies to application-level data corruption. Many Hadoop users have lost valuable data to such incidents. With snapshots, users can easily recover data to a point in time, the same way they would in any enterprise-class file system.
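To illustrate the recovery workflow described above, here is a minimal, hypothetical sketch of snapshot-based recovery using an ordinary local file system as a stand-in for a Hadoop cluster (the directory names and helper functions are illustrative, not part of any Hadoop API):

```python
import shutil
import tempfile
from pathlib import Path

def take_snapshot(data_dir: Path, snap_dir: Path, name: str) -> Path:
    """Record a point-in-time copy of data_dir under snap_dir/name."""
    target = snap_dir / name
    shutil.copytree(data_dir, target)
    return target

def restore_file(snapshot: Path, data_dir: Path, relpath: str) -> None:
    """Bring back a file deleted or corrupted after the snapshot was taken."""
    shutil.copy2(snapshot / relpath, data_dir / relpath)

# Demo: snapshot, accidental delete, point-in-time recovery.
root = Path(tempfile.mkdtemp())
data, snaps = root / "data", root / ".snapshots"
data.mkdir()
snaps.mkdir()
(data / "events.log").write_text("day-1 records\n")

snap = take_snapshot(data, snaps, "daily-0001")
(data / "events.log").unlink()            # user accidentally deletes the file
restore_file(snap, data, "events.log")    # recover it from the snapshot
print((data / "events.log").read_text())
```

On a real cluster the same workflow is exposed natively by the platform; HDFS, for example, lets administrators enable snapshots on a directory with `hdfs dfsadmin -allowSnapshot` and create one with `hdfs dfs -createSnapshot`, after which deleted files can be copied back out of the read-only `.snapshot` directory.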