If you are going to implement big data in your enterprise, start with the applications. While that is good advice for any tech or business exec considering where to invest his or her technology dollar, that advice is particularly trenchant in the world of big data.
I attended the GigaOM Structure Data conference in New York in March, and starting with the applications is my first piece of advice after being enveloped in a Hadoop, Cloudera, Amazon Web Services and Splunk universe of acronyms and new buzzwords. I've got nine other items you need to include on your tech agenda.
1. Start with the applications
While big data is a hot term, it doesn't mean much unless you can use it to build your company's bottom line. What new applications could your company create if you could meld information from outside your company's confines with your traditional internal product development process? Here are a few examples.
- Spit. I'll credit Jack Norris, vice president of marketing at MapR Technologies, with highlighting the efforts of Ancestry.com and 23andMe to add DNA results to genealogy research. Spit in a tube, mail it away and get the DNA ancestry results.
- Big data and health evaluation. Aetna's head of innovation, Michael Palmer, is at the forefront of health care reform that attempts to use big data to discover potential health problems before they become expensive, life-threatening events. Aetna and other health organizations have to tread a fine line between developing patient profiles, assuring privacy and getting people to act on the findings at an early stage. Palmer's project involves using patient data to develop new ways of preventing diabetes and heart attacks.
The types of applications that big data enables are potentially company-altering. But you need a company structure that can think in new ways and encourage innovation before you get mired in a technology evaluation. Create the new application environment and then gather the tools to make it happen.
2. Think physical
Despite all the talk about the "Internet of Things" and the Google self-driving car, the physical world remains largely disconnected from the digital world. We still rely on passwords when biometric measures would be much better at assuring security. Did you realize your gait, the way you walk, may be the most secure identifier? I didn't either. But Ira Hunt, the CTO of the CIA, included that tidbit in a far-ranging discussion of big data, analytics and the difference between data and analysis. In any case, whether it is spit, driverless cars or smart cities, the action around big data is unfolding in large part where the physical and digital worlds intermix. Once the connection is made, the reverse is also true. Setting up data networks still involves a lot of manual effort and human intervention, while software-defined networks at least hold the promise of allowing the network to keep up with real-time demand.
3. Go simple, but big
"Simple algorithms and lots of data trump complex models," said MapR's Norris. This may be the biggest story in big data. Running simple queries over huge data sets can provide insights much more quickly and with more accuracy than trying to create sophisticated algorithms tuned to smaller samples. The scene completion process for Google's Street View (which removes offensive or embarrassing images and "fills in" the scene) went from using a complicated formula over about 150,000 photos to using a simple formula over more than 1 million photos, with vastly superior results, said Norris. The same process could apply to financial services, customer sentiment, weather forecasting or anywhere big data sets could be combined with a simple query process.
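To make the "simple but big" idea concrete, here is a minimal Python sketch. It is not Google's scene completion method or anything Norris described; it is a toy illustration under stated assumptions. The signal (a sine curve), the function names and the sample sizes of 150 and 15,000 are all hypothetical. The only "algorithm" is a nearest-neighbor lookup, and the only thing that changes between runs is how much data it has to search.

```python
import bisect
import math


def build_samples(n):
    """Return n evenly spaced (x, y) samples of a hypothetical signal y = sin(x) on [0, 2*pi]."""
    step = 2 * math.pi / (n - 1)
    xs = [i * step for i in range(n)]
    ys = [math.sin(x) for x in xs]
    return xs, ys


def nearest_neighbor_predict(xs, ys, query):
    """Simple algorithm: return the y value of the stored sample whose x is closest to query."""
    i = bisect.bisect_left(xs, query)
    # Only the samples just below and just above the query can be the nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(xs)]
    best = min(candidates, key=lambda j: abs(xs[j] - query))
    return ys[best]


def worst_error(n, queries):
    """Largest absolute prediction error over the queries when n samples are available."""
    xs, ys = build_samples(n)
    return max(abs(nearest_neighbor_predict(xs, ys, q) - math.sin(q)) for q in queries)


# Arbitrary test points inside [0, 2*pi]; illustrative only.
queries = [0.01 + 0.37 * k for k in range(17)]

small_error = worst_error(150, queries)      # same simple lookup, small data set
large_error = worst_error(15_000, queries)   # same simple lookup, 100x more data

print(f"worst error with 150 samples:    {small_error:.6f}")
print(f"worst error with 15,000 samples: {large_error:.6f}")
```

The lookup logic never gets smarter; the answers improve only because a denser data set puts a closer match within reach of every query. That is the trade the article describes, although real workloads add noise, cost and storage concerns this toy ignores.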