New Big Data and Hadoop Projects: 10 Tips for Keeping on Track



by Chris Preimesberger

Figure Out What You're Trying to Solve

You can't put your data to use if you don't know what you want to do with it. Once you know, you can steer your company in the right direction. Figure this out early and stick to the plan.

Define Your Business Questions

These include questions about who the target audience is, how best to market to it, how to expand market reach, how to control costs, and how to engage and interact with customers in the most positive way possible. Each category involves a different amount of data. All of them are crucial to discovering which problems exist so that they can be understood and solved for the betterment of the company.

Stay Focused on the Most Important Questions First

This is not easy because all questions are important in their own right. Prioritize and stay focused. Questions will evolve, and new ones will be added.

Get Help From People Who Know What They're Doing

You'll need a technical expert who knows the ins and outs of the project and how the solution is to be built. If your technical expert isn't well versed in the business side, bring in someone who is: someone who knows every aspect of the business model, the finances, the products or services, and how everything ties together.

Know Where Your Data Emanates

If you're using the data for suggestive selling, you're probably drawing on user events, products viewed, click-throughs and site referrals. If you're looking to streamline your supply chain, you almost certainly have data pertaining to raw materials, supplier key performance indicators, bills of lading, warehousing and even driver performance. Knowing your sources will help you estimate how much data you'll have.

Invest in Understanding the Data

Where does the data live, and which data comes from which source? The best way to answer this is data profiling. Also, expect schema changes, and plan for your system to handle them. Problem areas identified at the beginning are easier and quicker to deal with than those discovered once the system is built.
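
As a rough illustration of data profiling, the Python sketch below counts, for each field, how often it is filled in and which value types appear. The function name, the field names and the sample records are all hypothetical; real profiling tools report far more than this.

```python
from collections import Counter

def profile(records):
    """Summarize each field: filled-in count, missing count, and value types seen."""
    fields = {}
    for record in records:
        for name, value in record.items():
            stats = fields.setdefault(name, {"present": 0, "types": Counter()})
            # Treat None and empty strings as "not filled in".
            if value is not None and value != "":
                stats["present"] += 1
                stats["types"][type(value).__name__] += 1
    total = len(records)
    for stats in fields.values():
        # A field that is empty or absent from a record counts as missing.
        stats["missing"] = total - stats["present"]
    return fields

# Hypothetical sample records, for illustration only.
sample = [
    {"order_id": 1, "amount": 19.99, "country": "US"},
    {"order_id": 2, "amount": "24.50", "country": ""},
    {"order_id": 3, "amount": None, "coupon": "SPRING"},
]

report = profile(sample)
for name, stats in report.items():
    print(name, stats["present"], stats["missing"], dict(stats["types"]))
```

A field that shows more than one type, or that appears in only some records, is a hint that the schema is drifting and is worth investigating before the system is built.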

Storing the Data

Once you know where the data is coming from and how much you'll potentially have, you'll have a good idea of how it should be stored. If the data isn't expected to grow much, you may not need something scalable. If you collect massive amounts of data daily, something cloud-based that offers maximum scalability may be the better choice.
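
A back-of-the-envelope estimate is often enough to tell which of these situations you are in. The Python sketch below multiplies daily volume, average record size and retention period, then applies a replication factor. Every number in the example is made up for illustration; the default of three copies mirrors the HDFS default replication setting, and other storage systems will differ.

```python
def estimate_storage_gb(events_per_day, avg_event_bytes, retention_days,
                        replication_factor=3):
    """Rough storage estimate in gigabytes (1 GB = 1e9 bytes)."""
    raw_bytes = events_per_day * avg_event_bytes * retention_days
    return raw_bytes * replication_factor / 1e9

# Hypothetical workload: 5 million events a day, about 1 KB each, kept for a year.
needed_gb = estimate_storage_gb(
    events_per_day=5_000_000,
    avg_event_bytes=1_000,
    retention_days=365,
)
print(f"{needed_gb:,.0f} GB")
```

The estimate ignores compression, indexes and growth in daily volume, so treat it as a starting point for the conversation rather than a capacity plan.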

Processing the Data

What's being analyzed: structured data such as log files, semi-structured data such as emails or tweets, unstructured data such as satellite feeds, or all of the above? If it's only the first, good old SQL Server might be what the doctor ordered, but if you need to process at least one other variety of data, Hadoop might be the most effective solution.

Expect Data Corruption and Bad Data in General

Whether it's due to human error or bugs, you will have bad data. Plan for this up-front; it will save headaches in the long run. Look closely at de-duplication, data-combers and other quality-assurance software.
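
To make the idea concrete, here is a minimal Python sketch of planning for bad data: it sets aside records with missing required fields and records whose key has already been seen, and keeps the rest. The function, the field names and the sample rows are hypothetical, and dedicated data-quality software does considerably more than this.

```python
def clean(records, key, required):
    """Split records into usable rows and rejected rows with a reason.

    Assumes `key` is one of the `required` fields.
    """
    seen = set()
    good, rejected = [], []
    for record in records:
        # A required field that is absent, None or empty counts as missing.
        missing = [f for f in required if record.get(f) in (None, "")]
        if missing:
            rejected.append((record, "missing: " + ", ".join(missing)))
        elif record[key] in seen:
            rejected.append((record, "duplicate " + key))
        else:
            seen.add(record[key])
            good.append(record)
    return good, rejected

# Hypothetical input rows, for illustration only.
rows = [
    {"id": "a1", "email": "ann@example.com"},
    {"id": "a1", "email": "ann@example.com"},
    {"id": "b2", "email": ""},
    {"id": "c3", "email": "cy@example.com"},
]

good, rejected = clean(rows, key="id", required=["id", "email"])
print([r["id"] for r in good])
print([reason for _, reason in rejected])
```

Rejected rows are kept with a reason instead of being silently dropped, so someone can review them and trace the problem back to its source.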

Design and Implementation

This is often a major stumbling block, because personnel or financial decisions will have to be made. With Hadoop, for example, building the system with trained in-house staff you can spare will cost less than contracting with someone to build it. If no one on staff has the required skill set, someone will need to learn it. But if pulling programmers away from their current tasks and spending a lot on training or a contractor is not an option, a software-as-a-service (SaaS) subscription platform may be the best alternative.
