11 Things IT Managers Considering Hadoop Deployment Should Know
Why Consider Using Hadoop in the First Place?
Apache Hadoop enables big data applications for both operations and analytics, and it is one of the fastest-growing technologies providing competitive advantage for businesses across industries. Hadoop is a key component of the next-generation data architecture, providing a massively scalable distributed storage and processing platform. It enables organizations to build new data-driven applications while freeing up resources in existing systems.
Is Your Organization Ready for Hadoop?
Three considerations are critical to the success of a big data/Hadoop project: whether upper management is committed to a vision of using data to generate new sources of revenue; whether data governance programs and other enterprise programs are in place to guide Hadoop (and other data-driven) projects; and whether business-driven use cases for Hadoop have been identified and agreed upon.
Are Unstructured Data Sets Increasing in Your Organization?
The largest driver of big data solutions is the growth of unstructured data, which doubles about every two years. Most organizations have so much unstructured data that they are unlikely ever to analyze all of it. This proliferating unstructured data is difficult for traditional systems to capture, store and process, whereas Hadoop can handle it far more easily and cost-effectively.
Is There an Increase in Data Sources?
We're talking about data from Websites, sensors, non-traditional compute devices, social networks and so on. IDC estimates that, driven by the growth of the Internet of Things (IoT), there will be 32 billion connected devices by 2020. By capturing vast amounts of information from new and different data sources and applying analytics to it, enterprises can obtain critical insights into the strengths and weaknesses of their business, identify growth opportunities for new product lines and act on the data.
Do You Have an Identified Use Case for Hadoop?
For organizations that need to gain insights and create opportunities from big data quickly, the first step is to choose the best business solutions, along with infrastructure technologies that will support fast data and big data at scale and, in turn, enhance operational applications. It is important to start with a small project: for example, develop and run an initial test before adding more data. Taking on too much at once can lead to higher-than-expected costs.
Would Combining Operational/Analytic Data Sets for New Apps Be Beneficial?
It takes a certain critical mass of big data before exploring and profiling that data yields an accurate assessment of its unique value to an organization. An organization that is already committed to capturing, governing and analyzing big data, and that already has solid competencies in data management and analytics, is well positioned to proceed with a Hadoop project.
Is Management Interested in Saving Money Using All Its Data?
The power of big data solutions continues to grow significantly every year, and so does the cost of collecting, managing and storing data. As a result, organizations are rethinking their enterprise architecture to find ways to reduce cost and increase the flexibility of their data management, storage and processing solutions. As Hadoop deployments grow within an organization, the architectural differences between Hadoop distributions begin to produce dramatic differences in capital and operational expenses. Choosing the right distribution can reduce total cost of ownership by 20 to 50 percent.
Do You Have a Culture Willing to Adopt New Technologies?
Before taking on a big data project, most organizations will need to assess and upgrade their analytical skills. That means the organization must view analytics as central to solving problems and must be able to identify opportunities. Projects tend to be more successful when adult learners learn by doing, participating in real-world, analytics-based decisions.
Do You Have Terabyte-Plus Data Sets?
Of all the "Vs" of big data (volume, variety, velocity and veracity), volume is the most relative. One organization's big data might be a few hundred gigabytes, while another might not consider Hadoop or big data until it reaches a petabyte. Some organizations might want to bring all their data (structured, unstructured and poly-structured) into a Hadoop environment, while others may want to use only the unstructured data sets. Practical advice suggests setting a limit on the amount of data for any first-time technology deployment, and Hadoop is no different in this regard.
Are Your Existing Tools Delivering a Full View?
One of the main benefits of Hadoop is the ability to combine legacy operational data stores with new analytic data sets in a data lake. These data sets can be queried for insight within the data lake and/or prepared for further analysis in visualization tools. If existing tools can't dip into the data lake and deliver a full view of all the combined data sets and sources, it might be time to look to the Hadoop ecosystem or to adopt a converged data platform.
Is Price per TB of Conventional DB/Data Warehouse Tools Too High?
Data that was previously too expensive to store is now available for analysis to improve business insights, at one-tenth to one-fiftieth of the cost on a per-terabyte basis. With corporate data growing at around 40 percent per year, traditional systems cannot scale affordably. Hadoop enables the economical capture and storage of data from every touch point in an organization, while eliminating separate silos for processing that data.
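The per-terabyte savings compound quickly when data grows around 40 percent a year. The Python sketch below is a back-of-the-envelope model, not a pricing tool: the starting volume (100 TB) and the conventional price per terabyte are hypothetical placeholders to replace with your own figures. Only the 40 percent growth rate and the one-tenth to one-fiftieth cost ratios come from the text above.

```python
# Hypothetical back-of-the-envelope model: project data volume under ~40%
# annual growth and compare storage cost at an assumed conventional per-TB
# price versus 1/10th and 1/50th of that price. Dollar figures are
# illustrative assumptions, not vendor quotes.


def project_volume(start_tb: float, annual_growth: float, years: int) -> list:
    """Return data volume in TB for years 0..years under compound growth."""
    return [start_tb * (1 + annual_growth) ** y for y in range(years + 1)]


def storage_cost(volume_tb: float, price_per_tb: float) -> float:
    """Total cost of holding volume_tb terabytes at price_per_tb each."""
    return volume_tb * price_per_tb


if __name__ == "__main__":
    START_TB = 100.0                 # hypothetical starting volume
    GROWTH = 0.40                    # ~40 percent per year, per the text
    CONVENTIONAL_PER_TB = 20_000.0   # hypothetical conventional price per TB

    for year, tb in enumerate(project_volume(START_TB, GROWTH, 5)):
        conv = storage_cost(tb, CONVENTIONAL_PER_TB)
        print(f"Year {year}: {tb:,.0f} TB | conventional ${conv:,.0f} | "
              f"1/10th ${conv / 10:,.0f} | 1/50th ${conv / 50:,.0f}")
```

At 40 percent annual growth, volume roughly doubles every two years (1.4 squared is about 1.96), which is consistent with the doubling rate cited earlier for unstructured data. Whatever price you plug in, the gap between the conventional and Hadoop columns widens in absolute terms each year because it scales with the growing volume.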