2. Figure Out What You’re Trying to Solve
3. Define Your Business Questions
These include questions about who your target audience is, how best to market to it, how to expand your market reach, how to control costs, and how to engage and interact with customers in the most positive way possible. Each category involves a different amount of data. All of them are crucial to discovering which problems actually exist, so that those problems can be understood and solved for the betterment of the company.
4. Stay Focused on the Most Important Questions First
5. Get Help From People Who Know What They’re Doing
You’ll need a technical expert who knows the ins and outs of the project and how the solution is to be built. If your technical expert isn’t well versed in the business side, bring in someone who is: someone who knows every aspect of the business model, the finances, the products or services, and how everything ties together.
6. Know Where Your Data Comes From
If you’re using the data for suggestive selling, you’re probably drawing on user events, products viewed, click-throughs and site referrals. If you’re looking to streamline your supply chain, you almost certainly have data pertaining to raw materials, supplier key performance indicators, bills of lading, warehousing and even driver performance. Knowing your sources will help you estimate how much data you’ll have.
7. Invest in Understanding the Data
Where does the data live, and which data comes from which source? The best way to answer these questions is data profiling: examining each source for its structure, content and quality. Also, expect schema changes and plan for your system to handle them. If you identify the problem areas at the beginning, handling them up front will be easier and faster than fixing them once the system is built.
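As a rough illustration of data profiling, the sketch below scans a CSV extract and reports, for each column, which value types appear and how many distinct values there are. It also counts rows whose field count differs from the header, which can be an early sign of a schema change. The column names and sample rows are made up for the example, and a real profiling tool would check far more than this.

```python
import csv
import io
from collections import Counter


def infer_type(value: str) -> str:
    """Classify a raw string as 'empty', 'int', 'float', or 'text'."""
    if value.strip() == "":
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "text"


def profile_csv(text: str):
    """Return (per-column profile, number of rows that do not match the header)."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows, [])
    columns = {name: {"types": Counter(), "distinct": set()} for name in header}
    ragged = 0
    for row in rows:
        if len(row) != len(header):
            ragged += 1  # wrong field count: possible schema drift
            continue
        for name, value in zip(header, row):
            columns[name]["types"][infer_type(value)] += 1
            columns[name]["distinct"].add(value)
    report = {
        name: {"types": dict(stats["types"]), "distinct": len(stats["distinct"])}
        for name, stats in columns.items()
    }
    return report, ragged


# Hypothetical extract: one missing amount, one non-numeric amount,
# and one row with an unexpected extra field.
SAMPLE = """order_id,amount,region
1001,19.99,EU
1002,,US
1003,abc,EU
1004,5,US,unexpected
"""

if __name__ == "__main__":
    report, ragged = profile_csv(SAMPLE)
    for name, stats in report.items():
        print(name, stats)
    print("rows not matching header:", ragged)
```

A column that reports a mix of types, or a growing count of mismatched rows, is the kind of problem area worth catching before the system is built.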
8. Storing the Data
Once you know where the data is coming from and how much you’ll potentially have, you’ll have a good idea of how it should be stored. If the data isn’t expected to grow much, you may not need something highly scalable. If you collect massive amounts of data on a daily basis, a cloud-based store that scales on demand is probably the way to go.
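A back-of-the-envelope projection can make the "how much" question concrete. The sketch below assumes a steady number of events per day, an average record size, and a constant monthly growth rate. All of these inputs are hypothetical, and the estimate ignores replication, compression and indexes, so treat the result as an order of magnitude rather than a capacity plan.

```python
def projected_storage_gb(events_per_day, bytes_per_event, monthly_growth, months):
    """Estimate cumulative raw storage in GB after `months` of collection.

    monthly_growth is a fraction, e.g. 0.05 for 5% more events each month.
    """
    total_bytes = 0.0
    daily_events = float(events_per_day)
    for _ in range(months):
        total_bytes += daily_events * bytes_per_event * 30  # approximate 30-day month
        daily_events *= 1 + monthly_growth
    return total_bytes / 1e9  # decimal gigabytes


if __name__ == "__main__":
    # Hypothetical inputs: one million events a day at about 1 KB each.
    flat = projected_storage_gb(1_000_000, 1_000, 0.0, 12)
    growing = projected_storage_gb(1_000_000, 1_000, 0.05, 12)
    print(f"no growth: {flat:.0f} GB, 5% monthly growth: {growing:.0f} GB")
```

Running the numbers for both a flat and a growing scenario shows how sensitive the storage decision is to the growth assumption.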
9. Processing the Data
What’s being analyzed? Is it structured data such as log files, semi-structured data such as emails or tweets, unstructured data such as satellite feeds, or all of the above? If it’s only structured data, good old SQL Server might be just what the doctor ordered, but if you need to process at least one other variety of data, Hadoop might be the most effective solution.
10. Expect Data Corruption and Bad Data in General
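One common way to plan for bad data is to validate each record as it arrives and set aside the ones that fail instead of letting a single bad row stop the whole load. The sketch below is a minimal version of that idea. The field names, the date format and the rules themselves are assumptions made for the example, not requirements of any particular system.

```python
from datetime import datetime

# Hypothetical schema for illustration only.
REQUIRED_FIELDS = ("order_id", "amount", "ordered_at")


def validate(record: dict) -> list:
    """Return a list of problems found in one record (empty list means clean)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")
    if "missing amount" not in problems:
        try:
            if float(record["amount"]) < 0:
                problems.append("negative amount")
        except (TypeError, ValueError):
            problems.append("amount is not a number")
    if "missing ordered_at" not in problems:
        try:
            datetime.strptime(record["ordered_at"], "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append("ordered_at is not YYYY-MM-DD")
    return problems


def split_clean_and_quarantined(records):
    """Separate usable records from bad ones, keeping the reasons for review."""
    clean, quarantined = [], []
    for record in records:
        problems = validate(record)
        if problems:
            quarantined.append((record, problems))
        else:
            clean.append(record)
    return clean, quarantined


if __name__ == "__main__":
    sample = [
        {"order_id": "1", "amount": "10.5", "ordered_at": "2024-01-31"},
        {"order_id": "2", "amount": "-3", "ordered_at": "2024-02-30"},
        {"order_id": "", "amount": "abc", "ordered_at": "2024-03-01"},
    ]
    clean, quarantined = split_clean_and_quarantined(sample)
    print(len(clean), "clean record(s)")
    for record, problems in quarantined:
        print("quarantined:", record, problems)
```

Keeping the rejected records together with the reasons they failed makes it easier to tell a one-off glitch from a source that is systematically sending bad data.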
11. Design and Implementation
This is often a major stumbling block, because personnel or financial decisions will have to be made. With Hadoop, for example, building the system in-house with trained staff you can spare will cost less than contracting someone to build it. If no one on your team has the required skill set, someone will need to learn it. But if pulling programmers away from their current tasks and spending a lot on training or a contractor is not an option, a software-as-a-service (SaaS) subscription platform may be the best alternative.