If you are going to implement big data in your enterprise, start with the applications. That is good advice for any tech or business exec deciding where to invest technology dollars, but it is particularly trenchant in the world of big data.
I attended the GigaOM Structure Data conference in New York in March, and starting with the applications is my first piece of advice after being enveloped in a Hadoop, Cloudera, Amazon Web Services and Splunk universe of acronyms and new buzzwords. I’ve got nine other items you need to include on your tech agenda.
1. Start with the applications
While big data is a hot term, it doesn’t mean much unless you can use it to build your company’s bottom line. What new applications could your company create if you could meld information outside your company’s confines with your traditional internal product development process? Here are a few examples.
- Spit. I’ll credit Jack Norris, vice president of marketing at MapR Technologies, with highlighting the efforts of Ancestry.com and 23andMe to add DNA results to genealogy research. Spit in a tube, mail it away and get DNA-based ancestry results.
- Big data and health evaluation. Aetna’s head of innovation Michael Palmer is at the forefront of health care reform efforts that attempt to use big data to discover potential health problems before they become expensive, life-threatening events. Aetna and other health organizations have to tread a fine line among developing patient profiles, assuring privacy and getting people to act on the findings at an early stage. Palmer’s project involves using patient data to develop new ways of preventing diabetes and heart attacks.
The types of applications that big data enables are potentially company-altering. But you need a company structure that can think in new ways and encourage innovation before you get mired in a technology evaluation. Create the new application environment and then gather the tools to make it happen.
2. Think physical
Despite all the talk about the “Internet of Things” and the Google self-driving car, the physical world remains largely disconnected from the digital world. We still rely on passwords when biometric measures would be much better at assuring security. Did you realize your gait, the way you walk, may be the most secure identifier? I didn’t either. But Ira Hunt, the CTO of the CIA, included that tidbit in a far-ranging discussion of big data, analytics and the difference between data and analysis. In any case, whether it is spit, driverless cars or smart cities, the action around big data is unfolding in large part where the physical and digital worlds intermix. The reverse also holds: the digital side still depends on physical-world plumbing. Setting up data networks still involves a lot of manual effort and human intervention, while software-defined networks at least hold the promise of allowing the network to keep up with real-time demand.
3. Go simple, but big
“Simple algorithms and lots of data trump complex models,” said MapR’s Norris. This may be the biggest story in big data. Running simple queries over huge data sets can provide insights more quickly and more accurately than crafting sophisticated algorithms tuned to smaller samples. The scene-completion process for Google’s Street View (which removes offensive or embarrassing images and “fills in” the scene) went from using a complicated formula over about 150,000 photos to a simple formula applied to more than 1 million photos, with vastly superior results, said Norris. The same approach could apply to financial services, customer sentiment, weather forecasting or anywhere big data sets can be combined with a simple query process.
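Norris’s point can be sketched with a toy example (synthetic data, not the actual Street View pipeline): to “complete” a missing value, a naive nearest-neighbor lookup over a large pool of examples gets close to the right answer with no sophisticated model at all.

```python
import random

random.seed(0)

# Hidden relationship we want to "complete" -- in a real system this
# would be whatever pattern the data embodies (e.g., plausible scenery).
def true_fn(x):
    return x * x

# Large pool of noisy (input, output) examples: lots of data, low quality.
pool = [(x, true_fn(x) + random.uniform(-1, 1))
        for x in (random.uniform(0, 10) for _ in range(100_000))]

def complete(query, data):
    """Simple algorithm: return the output of the nearest stored input."""
    return min(data, key=lambda pair: abs(pair[0] - query))[1]

# With 100,000 examples, the dumb lookup lands close to the true value
# of 25.0 -- the data volume does the work, not the model.
estimate = complete(5.0, pool)
print(round(estimate, 1))  # close to 25.0
```

The design trade-off is the one Norris describes: the “algorithm” here is one line, and all of the accuracy comes from the size of the example pool.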
10 Big Data Trends From the GigaOM Structure Data Conference
4. The Internet of Lots of Things
This gets to the essence of big data which, aside from being pretentiously given the initial capitals of a proper noun, can really mean anything or nothing. The scale of the technology infrastructures developed by the likes of Google, Facebook and Amazon Web Services is a new model of computing that extends from hardware through software to those new applications. Hardware engineers have to reconsider what their infrastructure looks like based on vast arrays of disposable hardware, software systems are built around interconnecting in-house applications with outside services, and the apps themselves may be unknown until the final development process is complete. The business potential from combining and analyzing customer sentiment, demographic patterns and weather trends won’t become apparent until those combinations take place. Making the transition from a technology environment of scarcity to one of overwhelming capacity may be the most difficult transition for today’s chief information officers.
5. The emerging platform
Hadoop is “damn hard to use,” said Todd Papaioannou, the founder of Continuuity and former big data engineer at Yahoo, who was in charge of developing 45,000 Hadoop servers within Yahoo’s 400,000-node private cloud. As he told the GigaOM attendees, “Hadoop is hard—let’s make no bones about it. It’s damn hard to use. It’s low-level infrastructure software, and most people out there are not used to using low-level infrastructure software.” That is both the promise of and the problem with Hadoop and big data. Hadoop was an outgrowth of efforts by Google to create a framework for data-intensive and distributed applications. Big data is more concept than product. In between the two is a need for programming tools, infrastructure management and enterprise-level security and compliance—in short, all the elements needed to create scalable, flexible and secure computing infrastructures. This Internet-style computing model is revolutionary, and executives need to realize that, as with all emerging platforms, lots of platform needs are still unmet.
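To see why “low-level” matters, consider the kind of map-and-reduce logic a Hadoop job requires the developer to express by hand. Here is a minimal local simulation of that programming model in plain Python (an illustration of the concept, not Hadoop’s actual Java API): the developer supplies a map step and a reduce step, and the framework handles distributing them across a cluster.

```python
from collections import defaultdict

# The classic MapReduce example: counting words across input records.
def map_phase(records):
    # Map: emit a (key, value) pair for every word seen.
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key (done by the framework
    # across machines in a real Hadoop cluster).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data is big", "Hadoop is hard"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"], counts["is"])  # 2 2
```

Even this toy shows Papaioannou’s point: the programmer is thinking in terms of key-value plumbing, not business questions, which is exactly the gap the emerging tool layer needs to fill.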
6. Making the Big Shift
While Hadapt, Alteryx and other startups in the Hadoop orbit were touting their paradigm-busting, chasm-hopping, shark-jumping companies, not all was quiet on Wall Street. Oracle missed on its financial results and sent shivers throughout the established tech industry. Tibco, a company somewhat synonymous with enterprise infrastructure, missed big. And Dell saw its best-laid plans to go private tossed into a competing-bids game that may not include its namesake founder. So, as I sat at Chelsea Piers at the GigaOM conference, it was impossible not to think that a big shift from up-front, long-deployment, high-service-cost enterprise software to rapid, outside-in big data running on disposable, inexpensive hardware will upend the tech industry.
7. Discerning the signal versus noise
The signal versus noise theme was best outlined by CIA CTO Ira Hunt. The current thinking around big data and business intelligence tends to be built on a very simplistic model: you acquire lots of data in lots of formats from lots of sources, apply some business intelligence and, voila, you get your answer. As Hunt pointed out, the volume of information available continues to increase, from social networks to sensors, and manipulating that data has become an art in itself. In my opinion, the number of people who might be described as data analysis artists is very small, and executives who think a big data dive is all they need to reform their business are mistaken.
8. Dealing with a new model of application development
If the term big data is a concept in search of meaning, a new model for business application development holds meaning that can make or break a company. Lengthy business requirements development, lengthy iterations and the ongoing divide between application developers and infrastructure operations teams have sent many grand software projects to a confused and costly death. “We now live in a world of disposable apps,” Aetna’s head of innovation Michael Palmer said to me in an interview. Instead of a lengthy and expensive development process, use several companies to develop a simple app and pick the one you like best. Once you have found the best app, move on to iterating on the next one. The model is much more like trying out apps from an Apple or Android app store than the old enterprise model. This is a big change in enterprise application development, and it includes knowing as much about how to meld apps through API management as about the actual app creation.
9. Applying new rules
Boiling down two days of data discussion is not easy, but I’d start with these. While big data might be getting ahead of itself in enterprise promises, it is real in bringing new capabilities to business. You need to think about the skills you have in your company and about developing the data skills needed to adapt to this new model. Open source, which often has a bit of a fringe reputation in the enterprise, will be part of your technology future. Established vendors are going to promise they can give you all the capabilities of the startups with added stability, but I haven’t seen any evidence of that so far. Think about your applications from the outside in, instead of from the inside out.
10. What’s on the Fringe?
Maybe it isn’t the fringe anymore, but consider how 3D printing will change your design process, how sensor-based data gathering will strain your current networks and how your employees living on mobile smartphones downloading apps from app stores may be just the people you need to build your next technology road map.
Eric Lundquist is a technology analyst at Ziff Brothers Investments, a private investment firm. Lundquist, who was editor-in-chief at eWEEK (previously PC Week) from 1996 to 2008, authors this article for eWEEK to share his thoughts on technology, products and services. No investment advice is offered in this article. All duties are disclaimed. Lundquist works separately for a private investment firm, which may at any time invest in companies whose products are discussed in this article, and no disclosure of securities transactions will be made.