One ant on the tablecloth would not spoil a picnic, but a thousand ants would.
Similarly, a single log file is easy to store and search. But businesses in today's Internet age are flooded with so much log file data that their IT systems can be overwhelmed and the data can lose its usefulness.
That's where startup Addamark Technologies Inc. steps in. Early next year, the San Francisco-based developer will roll out its LMS (Log Management System) software, which promises to help companies make sense of the mountains of log data accumulating on their systems through patented compression and query technologies.
Logs list requests for files in a system. Ubiquitous in enterprises, they are generated by applications, Web servers, firewalls and switches.
IT managers can use log data to track problems or hackers, test that applications are working properly, and generally get a view into how systems are working.
Marketing people ideally can use logs to see how customers are using a Web site and to identify potential customers. Companies in some industries, such as financial services, are required by regulators to keep log data.
But the massive amounts of log data being generated and the speed at which it accumulates pose a challenge to understanding it.
In addition, with so many different sources creating log data, IT departments that want to correlate it can be thwarted because the logs may be stored in formats that can't be queried in a unified way.
AtomShockwave Corp. uses an early version of Addamark's LMS to track activity on its Shockwave.com and AtomFilms.com Web sites, which together can get up to 1 million unique visitors and produce as much as 30GB of log data a day.
“LMS compresses our logs and generates a range of statistical reports that allow us to track user activity on the sites,” said Scott Roesch, vice president and general manager of online services at AtomShockwave, also in San Francisco. “Addamark's major innovation has to do with the way they handle compression of log files, which allows the files to be stored efficiently and processed very quickly.”
The twist that LMS adds to traditional compression is that it time-stamps each log before compressing it to a size that is 15 to 40 times smaller than it would be in a relational database.
When a user wants to query that log, the system doesn't uncompress the entire log file, as conventional storage technology would, but uses the time stamp to find just the data needed for that query and uncompresses that part of the log, according to Addamark CEO Mark Searle.
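Addamark hasn't published the details of its patented format, but the behavior Searle describes, decompressing only the slice of a log that matches a query's time range, can be sketched with ordinary chunked gzip compression plus a time-range index. Everything below (chunk size, record layout, function names) is illustrative, not Addamark's implementation:

```python
import gzip

# Hypothetical sketch, not Addamark's format: split the log into chunks,
# compress each chunk separately, and keep an index mapping each chunk
# to the time range it covers. A time-range query then decompresses
# only the chunks that overlap the range, never the whole log.

CHUNK_SIZE = 3  # records per chunk; tiny for demonstration purposes

def compress_logs(records):
    """records: list of (timestamp, line) pairs, sorted by timestamp.
    Returns (index, chunks), where index[i] is the (first, last)
    timestamp covered by compressed chunk i."""
    index, chunks = [], []
    for i in range(0, len(records), CHUNK_SIZE):
        chunk = records[i:i + CHUNK_SIZE]
        index.append((chunk[0][0], chunk[-1][0]))
        payload = "\n".join(f"{ts}\t{line}" for ts, line in chunk)
        chunks.append(gzip.compress(payload.encode()))
    return index, chunks

def query_range(index, chunks, start, end):
    """Return log lines with start <= timestamp <= end, decompressing
    only the chunks whose time range overlaps the query."""
    hits = []
    for (lo, hi), blob in zip(index, chunks):
        if hi < start or lo > end:
            continue  # chunk can't match; it is never decompressed
        for row in gzip.decompress(blob).decode().splitlines():
            ts_str, line = row.split("\t", 1)
            if start <= int(ts_str) <= end:
                hits.append(line)
    return hits

logs = [(100, "GET /index.html"), (101, "GET /a.gif"), (105, "GET /b.gif"),
        (110, "POST /login"), (111, "GET /home"), (120, "GET /logout")]
idx, blobs = compress_logs(logs)
print(query_range(idx, blobs, 104, 112))  # -> ['GET /b.gif', 'POST /login', 'GET /home']
```

The same idea scales to large archives: the index stays small and resident in memory, so most compressed chunks are skipped without ever being read.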
“As more aspects of business are conducted electronically, log files grow immensely; simultaneously, as log files grow, the data is becoming more important,” Searle said. “People have resigned themselves to doing things that are suboptimal, like taking samples of the [log] data” instead of querying large blocks of it.
The problem with looking at a sample of log data is that a company tracking a security breach can't prove that something didn't happen. Nor can it drill back into the data to do a forensic analysis to trace the steps of a hacker, Searle said.
Typically, the latest log files are kept in live databases before they are compressed and archived. Once they're archived, though, their value diminishes because they can't be easily queried.
Nielsen/NetRatings Inc. keeps about three months' worth of raw log data—or about 600GB—live in an Oracle Corp. database, but Senior Director of Operations Kelley Wood wants to keep more online. Nielsen/NetRatings has numerous panelists around the world who send it data as they surf the Web.
This data is captured and stored on the Milpitas, Calif., company's servers in the form of logs in a proprietary format and design, where it is used to create the reports the company sells.
“The biggest problem in this process is huge data volumes and just being able to handle it,” Wood said. “We do manage to move the log files through our system and get them stored onto tapes for archive purposes, but keeping them online and available for query has always been an issue.”
Wood has begun testing LMS to see if it allows him to keep more log files active in his Oracle database and thus increase the depth of the reports Nielsen/NetRatings analysts produce.
“Initial numbers show that I can have nine to 12 months of data in LMS, and that's on a very inexpensive cluster of machines,” Wood said. “If I throw some hardware at this, I will be able to have a lot of data stored for a small investment.”
To aid the mining of log data, LMS includes a tool that uses parallel extended SQL to query compressed data. It includes pre-formatted reports and can create new ones.
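Addamark's parallel extended SQL is proprietary, but the kind of ad hoc query such a tool supports reads much like standard SQL over a table of log records. The sketch below uses Python's built-in sqlite3 module purely as a stand-in; the table layout, column names, and query are hypothetical examples, not LMS's actual schema:

```python
import sqlite3

# Illustrative stand-in for an ad hoc log query. LMS runs its own
# parallel, extended SQL directly over compressed logs; here we use
# plain SQLite over an in-memory table of Web-server log records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weblog (ts INTEGER, ip TEXT, path TEXT, status INTEGER)"
)
conn.executemany("INSERT INTO weblog VALUES (?, ?, ?, ?)", [
    (100, "10.0.0.1", "/index.html", 200),
    (101, "10.0.0.2", "/admin",      403),
    (102, "10.0.0.2", "/admin",      403),
    (103, "10.0.0.1", "/a.gif",      200),
])

# Ad hoc security question: which clients drew the most
# access-denied (HTTP 403) responses?
rows = conn.execute("""
    SELECT ip, COUNT(*) AS denied
    FROM weblog
    WHERE status = 403
    GROUP BY ip
    ORDER BY denied DESC
""").fetchall()
print(rows)  # -> [('10.0.0.2', 2)]
```

Queries like this are what Wood means by "100 percent ad hoc": the question is not known in advance, so the value lies in being able to run arbitrary SQL over whatever log data is still online.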
“The query tools look very good,” Wood said. “We have a few canned reports that they have written for us to do some basics. Because the query needs for this data will be 100 percent ad hoc, we'll be developing the queries ourselves. The SQL interface should be fine. If we decide to do more and go further, our engineers will be able to expand the interface and/or work with Addamark to do so.”