Database Legend: How Real-Time Data Analysis Will Transform Society

Database pioneer and StreamBase founder Mike Stonebraker foresees vast societal shifts as real-time data analysis changes how we treat the infirm, how we keep our soldiers from harm and even how we locate our wandering bambini at Disneyland.

Mike Stonebraker is a database superstar. Not only is the former UC Berkeley computer science professor the father of the popular relational databases Ingres and Postgres, he was also the founder of Illustra Information Technologies Inc., which was acquired by Informix, which in turn was acquired by IBM.

The next project for this database pioneer takes shape in the form of StreamBase Systems Inc., a company that's churning out software designed to process, analyze and act on real-time data "within milliseconds of its arrival." Stonebraker is StreamBase's founder and chief technology officer.

StreamBase announced its Stream Processing Engine at the DEMOConference on Monday in Scottsdale, Ariz. eWEEK.com Database Editor Lisa Vaas recently got a chance to talk with Stonebraker about real-time data analysis, about how it leaves relational databases in the dust and, most importantly, about how this cutting-edge technology is poised to transform our society. Financial services comes to mind, of course, but what really fires up Stonebraker are prospects like revolutionizing the care of emergency-room patients, the care of soldiers on the front lines or simply the ability to find your child when she's lost at Disney World.

You've said that streaming data on the fly is something that ordinary relational databases can't handle. Why?

Here's a quick, simple little problem. This was a pilot we were asked to do early on. [It was] a large mutual funds company. They subscribe to every feed on the planet, [including feeds such as Reuters]. They have a current application that watches each feed to determine if the data is late, so they can say, "Don't trust Reuters now, the feed is screwed up."

They defined "late" as [when the] inter-arrival time between ticks for the same stock is greater than a certain number. You see an IBM tick, and if you don't see another IBM tick in x seconds, it's an indication of late data.

They wanted to issue an alarm if you saw a late tick. Then they wanted to say, "If you see 100 late ticks that are coming from the feed vendor, then ring the red telephone."
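The two-level rule described above (flag a late tick, then escalate after 100 late ticks from a vendor) is simple enough to sketch in a few lines. The Python below is a minimal illustration under stated assumptions, not StreamBase's engine or the fund company's C++ application. Assumptions: the gap is measured per (vendor, symbol) pair; `max_gap` stands in for the unspecified "x seconds"; late ticks are counted cumulatively per vendor with no time window; and the names `Tick`, `LateFeedMonitor` and `on_tick` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tick:
    vendor: str       # feed vendor, e.g. "Reuters"
    symbol: str       # stock symbol, e.g. "IBM"
    timestamp: float  # arrival time in seconds


@dataclass
class LateFeedMonitor:
    # Hypothetical threshold standing in for the interview's "x seconds".
    max_gap: float
    # Late ticks per vendor before "ringing the red telephone" (100 per the interview).
    escalate_after: int = 100
    last_seen: dict = field(default_factory=dict)    # (vendor, symbol) -> last timestamp
    late_counts: dict = field(default_factory=dict)  # vendor -> cumulative late-tick count
    alarms: list = field(default_factory=list)       # (vendor, symbol, gap) per late tick
    escalations: list = field(default_factory=list)  # vendors that crossed the threshold

    def on_tick(self, tick: Tick) -> None:
        key = (tick.vendor, tick.symbol)
        prev = self.last_seen.get(key)
        self.last_seen[key] = tick.timestamp
        # First tick for this symbol, or it arrived within the allowed gap: nothing to do.
        if prev is None or tick.timestamp - prev <= self.max_gap:
            return
        # Inter-arrival gap exceeded the threshold: raise a per-tick alarm.
        self.alarms.append((tick.vendor, tick.symbol, tick.timestamp - prev))
        count = self.late_counts.get(tick.vendor, 0) + 1
        self.late_counts[tick.vendor] = count
        if count == self.escalate_after:
            # Escalate exactly once per vendor, when the count first reaches the threshold.
            self.escalations.append(tick.vendor)


# Small demo with a lowered escalation threshold so it fires quickly.
monitor = LateFeedMonitor(max_gap=2.0, escalate_after=3)
for ts in (0.0, 1.0, 5.0, 10.0, 11.0, 20.0):
    monitor.on_tick(Tick("Reuters", "IBM", ts))
print(monitor.escalations)  # ['Reuters']
```

Because this sketch only checks the gap when the next tick for a symbol arrives, a feed that goes completely silent would never trigger it; a production monitor would likely also need a timer that fires when no tick has arrived within the allowed gap, and probably a sliding time window for the escalation count.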

The current application is written on top of bare metal in C++. They were unhappy with the performance of the current application, and it was hard to maintain. And expensive.


On this application, they said, "How fast can you go?" We processed about 150,000 messages per second on this, on a $1,500 PC, a commodity piece of hardware. Their current production application does about 3,000 messages per second. The best we could get out of one of the very popular relational databases was 900 messages per second.
