New Big Data and Hadoop Projects: 10 Tips for Keeping on Track

 
 
By Chris Preimesberger | Posted 2013-12-09

    Figure Out What You're Trying to Solve

    You can't use your data if you don't know what you want to do with it. With this understanding, you will be able to steer your company in the right direction. Figure this out early and stick to the plan.

    Define Your Business Questions

    These include questions about the target audience, how best to market to it, how to expand market reach, how to control costs, and how to engage and interact with customers in the most positive way possible. These categories involve varying amounts of data, and all of them are crucial to discovering which problems exist so that they can be understood and solved.

    Stay Focused on the Most Important Questions First

    This is not easy because all questions are important in their own right. Prioritize and stay focused. Questions will evolve, and new ones will be added.

    Get Help From People Who Know What They're Doing

    You'll need a technical expert who knows the ins and outs of the project and how the solution is to be built. If your technical expert isn't well-versed on the business side, get someone who is, someone who knows every aspect of the business model, finances, the products or services, and how everything is tied together.

    Know Where Your Data Comes From

    If you're using the data for suggestive selling, you're probably drawing on user events, products viewed, click-throughs and site referrals. If you're looking to streamline your supply chain, you almost certainly have data pertaining to raw materials, supplier key performance indicators, bills of lading, warehousing and even driver performance. Knowing this will help you figure out how much data you'll have.
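    As a rough illustration of turning known sources into a volume estimate, the sketch below multiplies an assumed daily record count by an assumed average record size for each source. The source names echo the suggestive-selling example above; every figure is a made-up placeholder, and the `DataSource` class and helper functions are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    events_per_day: int        # assumed daily record count (placeholder)
    avg_bytes_per_event: int   # assumed average record size (placeholder)


def estimate_daily_bytes(sources):
    """Sum events_per_day * avg_bytes_per_event across all sources."""
    return sum(s.events_per_day * s.avg_bytes_per_event for s in sources)


def estimate_yearly_gib(sources, days=365):
    """Project the daily estimate over a year, in GiB (2**30 bytes)."""
    return estimate_daily_bytes(sources) * days / 2**30


# Hypothetical figures for a suggestive-selling use case.
sources = [
    DataSource("user_events", 2_000_000, 500),
    DataSource("products_viewed", 500_000, 200),
    DataSource("click_throughs", 300_000, 150),
    DataSource("site_referrals", 100_000, 100),
]

daily = estimate_daily_bytes(sources)
print(f"{daily / 2**20:.1f} MiB per day")
print(f"{estimate_yearly_gib(sources):.1f} GiB per year")
```

    Replacing the placeholder counts with measurements from your own systems gives a first-order figure to bring into the storage discussion; it ignores growth, compression and replication.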

    Invest in Understanding the Data

    Where does the data live, and which data comes from which source? The best way to answer this is data profiling. Also, expect schema changes and plan for your system to handle them. Problem areas identified at the beginning are easier and quicker to handle than those discovered once the system is built.
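    One way to picture data profiling and a schema-change check is the minimal sketch below, which uses only Python's standard library. The sample CSV, the column names and the helper functions (`infer_type`, `profile_rows`, `schema_drift`) are all assumptions made for illustration; a real project would more likely rely on a dedicated profiling tool.

```python
import csv
import io
from collections import Counter


def infer_type(value):
    """Classify a non-empty string as 'int', 'float' or 'str'."""
    for caster, label in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return label
        except ValueError:
            pass
    return "str"


def profile_rows(rows):
    """Collect per-column row count, missing count, distinct values and types."""
    profile = {}
    for row in rows:
        for column, value in row.items():
            if column is None:
                continue  # extra unnamed fields are ignored in this sketch
            stats = profile.setdefault(
                column,
                {"count": 0, "missing": 0, "distinct": set(), "types": Counter()},
            )
            stats["count"] += 1
            if value is None or value.strip() == "":
                stats["missing"] += 1
                continue
            stats["distinct"].add(value)
            stats["types"][infer_type(value)] += 1
    return profile


def schema_drift(expected_columns, actual_columns):
    """Report columns that appeared or disappeared relative to expectations."""
    expected, actual = set(expected_columns), set(actual_columns)
    return {"added": sorted(actual - expected), "removed": sorted(expected - actual)}


# Hypothetical sample feed with a missing value and a mistyped amount.
SAMPLE = """order_id,customer,amount
1001,alice,19.99
1002,,5
1003,bob,n/a
"""

profile = profile_rows(csv.DictReader(io.StringIO(SAMPLE)))
for column, stats in profile.items():
    print(column, stats["count"], stats["missing"], dict(stats["types"]))

print(schema_drift(["order_id", "customer", "amount"],
                   ["order_id", "amount", "currency"]))
```

    A column that reports mixed types or a high missing count is a candidate problem area to resolve before the system is built.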

    Storing the Data

    Once you know where data is coming from and how much you'll potentially have, you'll have a good idea of how it should be stored. Maybe the data isn't expected to grow all that much, so you don't need something scalable. Perhaps you collect massive amounts of data on a daily basis, so going with something cloud-based for maximum scalability is the way to go.

    Processing the Data

    What's being analyzed: structured data such as log files, semi-structured data such as emails or tweets, unstructured data such as satellite feeds, or all of the above? If you're dealing only with structured data, good old SQL Server might be just what the doctor ordered, but if you need to process at least one other variety of data, Hadoop might be the most effective solution.

    Expect Data Corruption and Bad Data in General

    Whether it's due to human error or bugs, you will have bad data. Plan for this up-front; it will save headaches in the long run. Look closely at de-duplication, data-combers and other quality-assurance software.
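    To make the idea of planning for bad data concrete, here is a minimal sketch that separates invalid records from valid ones and drops duplicates. The record fields (`order_id`, `email`, `amount`), the validation rules and the function names are assumptions chosen for illustration, not a description of any particular quality-assurance product.

```python
def normalize_key(record):
    """Build a comparison key from the order ID and a trimmed, lower-cased email."""
    email = str(record.get("email", "")).strip().lower()
    return (record.get("order_id"), email)


def is_valid(record):
    """Apply deliberately simple checks; real validation rules would be stricter."""
    _, email = normalize_key(record)
    amount = record.get("amount")
    return (
        record.get("order_id") is not None
        and "@" in email
        and isinstance(amount, (int, float))
        and amount >= 0
    )


def clean(records):
    """Return (good, rejected): valid de-duplicated records and invalid ones."""
    seen = set()
    good, rejected = [], []
    for record in records:
        if not is_valid(record):
            rejected.append(record)  # keep bad rows for later inspection
            continue
        key = normalize_key(record)
        if key in seen:
            continue  # duplicate of a record already kept; first one wins
        seen.add(key)
        good.append(record)
    return good, rejected


# Hypothetical input containing one duplicate and two invalid records.
records = [
    {"order_id": 1, "email": "ann@example.com", "amount": 10.0},
    {"order_id": 1, "email": " Ann@Example.com ", "amount": 10.0},
    {"order_id": 2, "email": "no-at-sign", "amount": 5.0},
    {"order_id": 3, "email": "bob@example.com", "amount": -3},
    {"order_id": 4, "email": "cy@example.com", "amount": 0},
]

good, rejected = clean(records)
print(len(good), "kept;", len(rejected), "rejected")
```

    Keeping rejected records in a separate list, rather than discarding them, leaves a trail for tracing whether the bad data came from human error or from a bug.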

    Design and Implementation

    This is often a major stumbling block. Personnel or financial decisions will have to be made. With Hadoop, for example, if you have the trained manpower to spare, it will cost less than if you have to contract with someone to build it. If no one on staff possesses the required skill set, someone will need to learn it. But if pulling programmers away from their current tasks and spending a lot on training or a contractor is not an option, a software-as-a-service (SaaS) subscription platform may be the best alternative.
 

With businesses around the world now using more cloud services and instituting new big data analytics-driven ecosystems, it's important for IT managers and C-level execs to keep informed about all of these advancements. Those who aren't up to speed risk being left behind, and that means potentially losing customers to companies that already have adapted. It is the most basic law of the business food chain: adapt or get eaten. It's advantageous to implement an IT system that will enable an enterprise to make use of the data it collects as soon as it comes into storage. Obviously, this is easier said than done because plenty of things need to be taken into consideration before building a new system or remodeling an older one. Management also will require that this system runs at peak performance and that the company gets a positive return on its business investment. The following eWEEK slide show, with input from Hadoop-as-a-cloud service provider Xplenty, offers advice for businesses starting to build an optimal solution that includes big data analytics elements.

 
 
 
 
 
 
 
 
 
 
 
