Splunk Brings Google-Like Search to the Data Center

 
 
By Lisa Vaas  |  Posted 2005-08-09
 
 
 
A band of search gurus on Monday announced what they claim is the first search engine designed for rifling through machine code in order to troubleshoot the data center.

The San Francisco startup, named Splunk, brings together talent culled from Infoseek, SourceForge.net and Yahoo Inc.

CEO Michael Baum, former vice president of e-commerce at Infoseek, said that the creation of the Splunk Personal Server was inspired by the hassles of building and managing data center environments.

"We discovered the problem has moved from managing the physical hardware to managing the logical infrastructure: how all these different components of software, whether in the firewall, router or database server, are all talking to each other," he said.

The startup team has spent the past three years studying the problem, applying its collective experience building large-scale Internet search engines to this very different domain.

"At Infoseek and Yahoo, the problem was all about HTML documents, all about, How can I bring sub-second query response times to tens of millions of users at the same time?" Baum said.

Given that Internet searches take place in one language at a time, that challenge was a piece of cake compared with the difficulty of indexing and searching machine-generated data in the form of log files or traffic coming over a messaging bus, for example, Baum said.

In a typical data center, troubleshooting means poring over machine data from a wide array of machine languages, usually with nothing in common except the time and date.

"IDC and Gartner talk about $100 billion being spent in 2005 in managing the world's data centers," Baum said. "Most is spent on manual troubleshooting."

Baum compared the current state of troubleshooting to using the Internet without a search engine. "Fifteen years ago, when the Internet was used by just technical people, they used to look through directories of files on machines to find the document they were interested in," he said. "And they'd do it by hand. Once they found the document they were interested in, you find it, find a couple of interesting concepts, but there were no documents linked. You have to go find related documents. That's literally what people have to do in the data center. It's a very painful process that takes a lot of time and takes a lot of experts from different domains."

An example of the type of scenario in a data center that's a pain to troubleshoot is when a VoIP (Voice Over IP) stack runs phone systems in a call center. That stack will connect to some type of CRM (customer relationship management) application, say Siebel or SAP or PeopleSoft. When the stack hits a call, it records call routing data, hits the CRM, which looks up the number, and then executes a call-routing function.

The single transaction of a customer calling in to get someone on the phone to work on a problem involves leaving a trail of evidence in the data center: within the VoIP stack, the call center and the CRM applications. To understand the complete transaction and why it wasn't completed or why performance was poor, data must be pieced together from all events.

That's a manual process today. Splunk is designed to allow users to start with, say, the customer's phone number. After typing it in, the search technology will find the first event in the VoIP infrastructure. Like a Google search, Splunk returns a list of links to similar or related events. Through advanced data mining techniques, users can click and search to navigate their way through the complex maze of the data center.
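
To make the idea concrete, here is a minimal Python sketch of that workflow: index events from several sources by keyword, look up a phone number, then list events from other systems that occurred close in time. It is not Splunk's implementation, which the article does not describe. The sample events, field names, function names and the five-second window are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sample events as (source, timestamp, raw text); invented for illustration.
EVENTS = [
    ("voip", "2005-08-08 10:15:01", "INVITE from 4155550123 call_id=abc123"),
    ("callcenter", "2005-08-08 10:15:02", "route call_id=abc123 queue=support"),
    ("crm", "2005-08-08 10:15:03", "lookup phone=4155550123 status=timeout"),
    ("crm", "2005-08-08 11:40:00", "lookup phone=4155559999 status=ok"),
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def tokenize(text):
    """Split raw event text into lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(events):
    """Build an inverted index mapping each token to the ids of events containing it."""
    index = defaultdict(set)
    for event_id, (_source, _timestamp, text) in enumerate(events):
        for token in tokenize(text):
            index[token].add(event_id)
    return index


def search(index, keyword):
    """Return the sorted ids of events that contain the keyword."""
    return sorted(index.get(keyword.lower(), set()))


def related(events, event_id, window_seconds=5):
    """Return ids of other events whose timestamp is within the window of the given event."""
    start = datetime.strptime(events[event_id][1], TIME_FORMAT)
    window = timedelta(seconds=window_seconds)
    return [
        other_id
        for other_id, (_source, timestamp, _text) in enumerate(events)
        if other_id != event_id
        and abs(datetime.strptime(timestamp, TIME_FORMAT) - start) <= window
    ]


if __name__ == "__main__":
    index = build_index(EVENTS)
    hits = search(index, "4155550123")
    print("matches:", [EVENTS[i] for i in hits])
    print("nearby:", [EVENTS[i] for i in related(EVENTS, hits[0])])
```

In this toy data set, searching for the phone number finds the VoIP and CRM events, and the time-window step surfaces the call center routing event that never mentions the number. Time is used as the link because, as noted above, it is often the only thing such records share.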


Splunk handles machine-generated data from a list of sources that includes Web servers, application servers, e-mail servers, databases and network devices. Users access the Splunk Server's AJAX interface through a Web browser. Once data is indexed, users can navigate through systems using time and keyword searches or can navigate the links of event relationships by pointing and clicking on Splunk's results.

Twenty-five companies in a range of industries have been beta testing Splunk since June. Splunk is now available for free public beta download.

The founders also plan to launch SplunkForge.net, a community site for open-source development of services around the Splunk server kernel, sometime this week. The site will be seeded at launch with modules that extend the server with services such as speeding up searches or managing large, distributed infrastructures.
