
Banishing the Confusion of Eight Big Data Myths


Dec 9, 2014





by Chris Preimesberger

Myth 1: We Must Hire a Hadoop Expert


Hadoop is built on intricate concepts such as MapReduce, YARN, Spark and the Hadoop Distributed File System (HDFS), and the constant stream of changes and announcements at the subsystem level further complicates the picture. However, plenty of products and tools reduce this complexity or shield users from it entirely. Open-source application frameworks and commercial products significantly improve productivity and accessibility when working with Hadoop, to the point where companies can use internal resources to execute their big data strategy: enterprise Java developers, data warehouse developers and data analysts can quickly and easily leverage Hadoop.

Myth 2: Buying a Big Data Solution Means I’m Using Big Data


You’ve just convinced your organization to adopt a big data strategy, and you’ve purchased a solution. What’s next? Enterprises often get stuck at a point where they have the hardware and Hadoop software in place but don’t have the skill set to take advantage of them. Using big data means that you are using your data, executing a data strategy and helping your business with cost savings, revenue opportunities or additional insights. The key is lowering the bar for your organization to execute and deliver data products as quickly as possible. Delivering and running these production applications reliably and on time are the next challenges. When you achieve this level, you will know because your users will want more.

Myth 3: Big Data Is a Fad That Will Go Away in a Few Years


Ninety percent of the world’s data was created in the last three years. Sticking your head in the sand and hoping that it will go away is a career-ending move. We may drop the “big” in big data in a few years, but whether you like it or not, your company will be in the business of data.

Myth 4: Businesses Need One Data Scientist for All Big Data Needs


For too long, businesses have been upholding the myth of the data science hero—the virtuoso who slays dragons and emerges with a treasure of an amazing app based on insights from big data. The truth is they can’t afford to rely on a single data scientist or developer because employees can leave an organization at any time. By building a “big data app factory” of processes and teams, companies can ensure that great work can be done over and over again—regardless of personnel changes.

Myth 5: Traditional Enterprise Data Warehouses Will Go Away


It’s unlikely that the technology of the past will completely go away. Enterprises will continue to rely on traditional enterprise data warehouses (EDWs). However, with the rapid evolution of Hadoop and accompanying products and technologies, the role of the EDW in the enterprise will significantly diminish. The flow of data will change, and it’s likely that Hadoop will be its first stop.

Myth 6: Apache Spark Is the Future of Hadoop


The new, sexy young object is always the most alluring. Apache Spark is currently one of those: It is a fast and general engine for large-scale, clustered data processing. However, rest assured, another will come along and take its place as the hottest thing on the market. What people often forget is that old reliable is old and reliable for a reason: It usually has the breadth and depth needed to move your big data project forward. Resist the urge to move to the latest; if it ain’t broke, don’t fix it. Stick with what you know.

Myth 7: Big Data Is Only for the Largest of Enterprises


The “big” in big data is misleading. Everyone—including organizations large and small—is in the business of data. Sure, large enterprises collect massive amounts of data, but the abundance of data that small enterprises can collect and leverage for competitive advantage also can be immense. Just because your data may be small in volume does not mean you shouldn’t have a data strategy in place.

Myth 8: Big Data Is for Hadoop Experts


Enterprises today are rapidly adopting Hadoop to process, manage and make sense of growing volumes of data, and they are now leveraging existing internal resources to drive their data strategies forward. Mature, reliable tools are now readily available for all software engineers to unlock the full potential of big data and Hadoop. As a result, no Hadoop expertise is required.