New Big Data and Hadoop Projects: 10 Tips for Keeping on Track

By Chris Preimesberger  |  Posted 2013-12-09

With businesses around the world using more cloud services and building new ecosystems driven by big data analytics, it's important for IT managers and C-level execs to stay informed about these advancements. Those who aren't up to speed risk being left behind, and that means potentially losing customers to companies that have already adapted. It is the most basic law of the business food chain: adapt or get eaten.

It's advantageous to implement an IT system that enables an enterprise to make use of the data it collects as soon as it comes into storage. This is easier said than done, because plenty of things need to be taken into consideration before building a new system or remodeling an older one. Management also will require that this system run at peak performance and that the company get a positive return on its investment.

The following eWEEK slide show, with input from Hadoop-as-a-cloud service provider Xplenty, offers advice for businesses starting to build an optimal solution that includes big data analytics elements.

  • Figure Out What You're Trying to Solve

    You can't use your data if you don't know what you want to do with it. Once you understand the problem you are trying to solve, you will be able to steer your company in the right direction. Figure this out early and stick to the plan.
  • Define Your Business Questions

    These include questions about the target audience, how best to market to it, how to expand market reach, how to control costs, and how to engage and interact with customers in the most positive way possible. Each of these categories involves a different amount of data. All are crucial to discovering which problems exist so that they can be understood and solved for the betterment of the company.
  • Stay Focused on the Most Important Questions First

    This is not easy because all questions are important in their own right. Prioritize and stay focused. Questions will evolve, and new ones will be added.
  • Get Help From People Who Know What They're Doing

    You'll need a technical expert who knows the ins and outs of the project and how the solution is to be built. If your technical expert isn't well versed in the business side, bring in someone who is: a person who knows every aspect of the business model, the finances, the products or services, and how everything is tied together.
  • Know Where Your Data Emanates From

    If you're using the data for suggestive selling, you're probably drawing on user events, products viewed, click-throughs and site referrals. If you're looking to streamline your supply chain, you almost certainly have data pertaining to raw materials, supplier key performance indicators, bills of lading, warehousing and even driver performance. Knowing this will help you figure out how much data you'll have.
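
A back-of-the-envelope calculation is often enough for a first volume estimate. The sketch below is illustrative only: the function names and the sample figures (5 million click events a day at roughly 500 bytes each, kept for a year) are assumptions, not numbers from this slide show. The replication factor of 3 reflects the HDFS default of three copies per block; other storage systems will differ.

```python
def estimate_storage_bytes(events_per_day, avg_event_bytes, retention_days,
                           replication_factor=1):
    """Rough storage need: daily volume x retention x number of stored copies."""
    raw_bytes = events_per_day * avg_event_bytes * retention_days
    return raw_bytes * replication_factor


def human_readable(num_bytes):
    """Format a byte count using decimal units (1 KB = 1,000 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


# Hypothetical workload: 5 million events/day, ~500 bytes each, kept 365 days.
raw = estimate_storage_bytes(5_000_000, 500, 365)
replicated = estimate_storage_bytes(5_000_000, 500, 365, replication_factor=3)
print(human_readable(raw), "raw;", human_readable(replicated), "with 3 copies")
```

Even a crude estimate like this shows whether you are planning for gigabytes or terabytes, which feeds directly into the storage decision.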
  • Invest in Understanding the Data

    Where does the data live, and which data comes from which source? The best way to answer this is through data profiling. Also, expect schema changes and plan for your system to be able to handle them. Problem areas are easier and faster to deal with if you identify them at the beginning rather than after the system is built.
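
As a rough illustration of data profiling, the sketch below counts how often each field is present or missing, which value types appear, and how many distinct values each field holds. It also compares the fields actually seen against the fields you expected, which is one simple way to notice a schema change. The function names and sample records are hypothetical, and real profiling tools do far more than this.

```python
from collections import Counter


def profile_records(records):
    """Return per-field presence, missing, type and distinct-value counts."""
    all_fields = set()
    for rec in records:
        all_fields.update(rec)

    profile = {}
    for field in sorted(all_fields):
        present, missing = 0, 0
        types, distinct = Counter(), set()
        for rec in records:
            value = rec.get(field)
            if value is None or value == "":
                missing += 1          # absent, null or empty counts as missing
            else:
                present += 1
                types[type(value).__name__] += 1
                distinct.add(repr(value))  # repr() keeps unhashable values safe
        profile[field] = {"present": present, "missing": missing,
                          "types": dict(types), "distinct": len(distinct)}
    return profile


def schema_drift(expected_fields, records):
    """List fields that appeared unexpectedly or never showed up."""
    seen = set()
    for rec in records:
        seen.update(rec)
    expected = set(expected_fields)
    return {"added": sorted(seen - expected), "removed": sorted(expected - seen)}


# Hypothetical click-stream records with mixed types and a new field.
sample = [
    {"user_id": 1, "product": "A", "referrer": "search"},
    {"user_id": 2, "product": "", "referrer": None, "campaign": "fall"},
    {"user_id": "3", "product": "A"},
]
for name, stats in profile_records(sample).items():
    print(name, stats)
print(schema_drift(["user_id", "product", "referrer", "price"], sample))
```

In this sample, the mixed `int` and `str` values for `user_id` and the unexpected `campaign` field are exactly the kind of problem areas that are cheaper to find before the system is built.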
  • Storing the Data

    Once you know where data is coming from and how much you'll potentially have, you'll have a good idea of how it should be stored. Maybe the data isn't expected to grow all that much, so you don't need something scalable. Perhaps you collect massive amounts of data on a daily basis, so going with something cloud-based for maximum scalability is the way to go.
  • Processing the Data

    What's being analyzed? Structured data such as log files, semi-structured data such as emails or tweets, unstructured data such as satellite feeds, or all of the above? If you're dealing only with structured data, good old SQL Server might be what the doctor ordered, but if you need to process at least one other variety of data, Hadoop might be the most effective solution.
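
To give a feel for the style of processing Hadoop is built around, here is a plain-Python imitation of a map step and a reduce step that counts hashtags in tweet-like JSON messages. It is a single-machine sketch with hypothetical function names and sample data; it does not use any Hadoop API, and a real job would distribute these steps across a cluster.

```python
import json
from collections import defaultdict


def map_phase(line):
    """Emit (hashtag, 1) pairs for one JSON message; skip lines that don't parse."""
    try:
        message = json.loads(line)
    except json.JSONDecodeError:
        return []
    if not isinstance(message, dict):
        return []
    text = message.get("text")
    if not isinstance(text, str):
        return []  # semi-structured data: the field may be absent or malformed
    return [(word.lower(), 1) for word in text.split() if word.startswith("#")]


def reduce_phase(pairs):
    """Sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_hashtags(lines):
    """Run the map step over every line, then reduce the combined output."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(pairs)


# Hypothetical input: two valid messages, one broken line, one without text.
lines = [
    '{"text": "Loving #Hadoop and #bigdata"}',
    'not json at all',
    '{"text": "#hadoop rocks"}',
    '{"user": "someone"}',
]
print(count_hashtags(lines))
```

The point of the sketch is that the map step tolerates records with missing or malformed fields, which is where a fixed relational schema tends to struggle.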
  • Expect Data Corruption and Bad Data in General

    Whether it's due to human error or bugs, you will have bad data. Plan for this up front; it will save headaches in the long run. Look closely at de-duplication, data-cleansing and other quality-assurance software.
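
One simple way to plan for bad data is to separate clean records from rejected ones instead of silently dropping anything. The sketch below rejects records with missing required fields and removes duplicates based on a normalized key. The function name, field names and sample records are hypothetical, and dedicated data-quality software handles many more cases.

```python
def clean_records(records, key_fields, required_fields):
    """Split records into (clean, rejected); rejected entries carry a reason."""
    seen_keys = set()
    clean, rejected = [], []
    for rec in records:
        missing = [f for f in required_fields if rec.get(f) in (None, "")]
        if missing:
            rejected.append((rec, "missing: " + ", ".join(missing)))
            continue
        # Normalize case and whitespace so near-identical keys collide.
        key = tuple(str(rec.get(f, "")).strip().lower() for f in key_fields)
        if key in seen_keys:
            rejected.append((rec, "duplicate"))
            continue
        seen_keys.add(key)
        clean.append(rec)
    return clean, rejected


# Hypothetical customer records with a near-duplicate and an empty email.
customers = [
    {"email": "Ann@Example.com ", "order_id": 1},
    {"email": "ann@example.com", "order_id": 2},
    {"email": "", "order_id": 3},
    {"email": "bob@example.com", "order_id": 4},
]
good, bad = clean_records(customers, key_fields=["email"],
                          required_fields=["email", "order_id"])
print(len(good), "clean;", [reason for _, reason in bad])
```

Keeping the rejected records along with a reason makes it possible to trace bad data back to its source, whether that turns out to be human error or a bug.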
  • Design and Implementation

    This is often a major stumbling block. Personnel or financial decisions will have to be made. With Hadoop, for example, if you have the trained manpower to spare, it will cost less than if you have to contract with someone to build it. If no one on staff possesses the required skill set, someone will need to learn it. But if pulling programmers away from their current tasks and spending a lot on training or a contractor is not an option, a software-as-a-service (SaaS) subscription platform may be the best alternative.
