NEWS ANALYSIS: Enterprise data architects are turning their attention to developing hybrid databases that can process structured and unstructured inputs for big data analytics.
Hybrid hardware infrastructures are a big topic in the enterprise technology segment. CIOs and tech managers want to meld their internal capabilities with cloud computing in a mix that preserves the old investments and creates a path to the new. The same thing is happening in the data and business analysis segments.
I spent a couple of days at the Big Data Innovation Summit in Boston. While unstructured data, Hadoop and social data all came up often, the main point of discussion was how to create a hybrid enterprise data structure that meshes traditional structured data, usually stored in a data warehouse, with unstructured data derived from a wide variety of sources. The data mesh-up issue drew so much attention because it is the real priority for enterprise technology.
The traditional stores of structured data, often in the form of business transactions, sit in a data warehouse and are accessed through SQL. This is the realm of Oracle, IBM and Microsoft, where the warehouse is the central repository of the company’s customer transactions, inventory and anything else stored in rows and columns. It is that data that has been the object of warehousing, cleansing and querying through business analytics. There is little reason to dismantle those data warehouses.
There are many reasons to capture the click streams, social interactions, sentiment and multimedia that are generated outside the corporation but are integral to the company’s well-being and future. That data does not lend itself to traditional capture, cleansing and storage.
That data belongs to the world of Hadoop, HBase, NoSQL and all the other flavors of data capture, storage and analysis invented by the Web-facing giants including Google, Yahoo, Facebook and Twitter. The scale of storage is huge, the query techniques are different (you often don’t know what you are looking for until the data is captured), and the technology architectures and terminology are unfamiliar to the traditional database world.
“Traditional databases do not go away,” said Sastry Malladi, the chief architect for StubHub. StubHub was acquired by eBay in 2007 for $310 million. The company began as a place to buy and sell event tickets (and was sued by the New England Patriots). But it is evolving into an organization offering the full scope of lodging, transportation and amenities associated with events.
Buying and selling tickets is a classic transactional event, while offering opinion and atmosphere surrounding an event is unstructured and non-transactional. The task of melding the transactional base with the unstructured future is the type of hybrid project facing many enterprises. Malladi is currently dealing with 25 different data sources.
The more unstructured data coming into the company, the more structured you have to become in dealing with all those sources. StubHub uses a four-layer data approach overseen by a data management umbrella. The data and data management reside in eBay’s private infrastructure cloud.
The base layer represents those 25 data sources feeding into the infrastructure. These sources include structured and unstructured data with a goal of creating a platform that can accept data from a wide range of inputs. The second layer of data imports is designed to cleanse data and recognize data dependencies. The third layer is where analytics takes place. The fourth layer is the user-facing layer where e-commerce, advanced analytics and visualization take place.
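To make the four layers concrete, here is a minimal Python sketch of how such a pipeline might be wired together. It is an illustration only, not StubHub's implementation: the record fields (event_id, quantity, text), the two sample source names ("orders" and "social") and every function name are assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Record:
    """One item from any source, structured or not (hypothetical shape)."""
    source: str
    event_id: Optional[str]
    payload: Dict[str, Any]


def ingest(order_rows: List[dict], social_posts: List[dict]) -> List[Record]:
    """Layer 1: accept structured rows and unstructured posts side by side."""
    records = [Record("orders", row.get("event_id"), dict(row)) for row in order_rows]
    records += [
        Record("social", post.get("event_id"), {"text": post.get("text", "")})
        for post in social_posts
    ]
    return records


def cleanse(records: List[Record]) -> List[Record]:
    """Layer 2: drop records that cannot be tied to an event; normalize text."""
    cleaned = []
    for rec in records:
        if not rec.event_id:
            continue  # no event dependency can be resolved for this record
        payload = dict(rec.payload)
        if "text" in payload:
            payload["text"] = " ".join(payload["text"].split()).lower()
        cleaned.append(Record(rec.source, rec.event_id, payload))
    return cleaned


def analyze(records: List[Record]) -> Dict[str, Dict[str, int]]:
    """Layer 3: combine transactional and social signals per event."""
    summary: Dict[str, Dict[str, int]] = {}
    for rec in records:
        stats = summary.setdefault(rec.event_id, {"tickets_sold": 0, "mentions": 0})
        if rec.source == "orders":
            stats["tickets_sold"] += int(rec.payload.get("quantity", 0))
        elif rec.source == "social":
            stats["mentions"] += 1
    return summary


def present(summary: Dict[str, Dict[str, int]]) -> List[str]:
    """Layer 4: produce user-facing lines, sorted by event id."""
    return [
        f"{event_id}: {stats['tickets_sold']} tickets sold, {stats['mentions']} social mentions"
        for event_id, stats in sorted(summary.items())
    ]


def run_pipeline(order_rows: List[dict], social_posts: List[dict]) -> List[str]:
    """Run all four layers in order."""
    return present(analyze(cleanse(ingest(order_rows, social_posts))))
```

The point of the sketch is the separation of concerns: the first layer accepts anything, the second imposes structure, and only then do analytics and presentation run against a common record shape.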
A more in-depth look at the StubHub infrastructure is available here (current as of 2012). Dealing with bursty data flows (huge demand right before the event, nothing after), the need to institute payments and fraud detection, and the requirement to deliver tickets in a variety of formats adds up to as complex an enterprise database problem as can be found anywhere.
The hybrid database will be the goal of the enterprise data architect for years to come. Creating a platform that can accept a wide variety of structured and unstructured inputs and produce information that is friendly to consumers and accessible to business managers will be the foundation for successful companies.
Eric Lundquist is a technology analyst at Ziff Brothers Investments, a private investment firm. Lundquist, who was editor-in-chief at eWEEK (previously PC WEEK) from 1996 to 2008, authored this article for eWEEK to share his thoughts on technology, products and services. No investment advice is offered in this article. All duties are disclaimed. Lundquist works separately for a private investment firm, which may at any time invest in companies whose products are discussed in this article and no disclosure of securities transactions will be made.