Database Legend: How Real-Time Data Analysis Will Transform Society - Page 2
In round numbers, we're two orders of magnitude faster than the elephants. And the two orders of magnitude are on identical hardware. If you normalize for clock speed of our production application vs. theirs, we're one order of magnitude faster.

What accounts for this speed gain?

There are three big reasons: One, the elephants store the data. There's no need to store the data. One of the characteristics of real-time, streaming data: it's like IT sushi. It has high value right now, and the value decays very quickly. There's no need to keep the data around for the long term in some sort of repository. Doing that just costs time, latency and resources.

Reason No. 2 is when you're looking for the inter-arrival time between ticks, that's a time-series notion. When you're doing real-time stream processing, we have time-oriented primitives in the bottom of the screen. We have extended SQL to something we call StreamSQL, which has extra stuff in it. We've had to add another notion to SQL, the notion of time windows. You can do SQL-like calculations over time windows. Do them in real time as data is flying by.

[Finally,] if you want to count to 100, which is what this [application] had to do in order to decide to ring the red phone, the most efficient way to do that is with four lines of C++. In this application, it makes sense to mix small amounts of code in a general-purpose environment with database-oriented processing steps. We can do that in our architecture: freely intermix C++ with our StreamSQL primitives. The relational guys all run client/server, and C++ code has to run in the client in a separate place from the server. So the client/server architecture slows you down on this style of application.

What types of enterprises need this type of fast analysis?

Financial services, industrial process control, monitoring oil refineries, the government: Military and homeland security is full of this style of application. We've been talking to one of the three-letter agencies. The guys who won't give you their business cards. They're monitoring Arabic chatter. When the czar of homeland security says, "The chatter has changed," there's a real-time system processing incoming feeds, computing statistics on incoming Arabic language streams, to actually determine that. They started yakking with us on piloting that application. Another example: network monitoring, for DoS [denial-of-service] attacks. Fraud detection.
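
To make the time-window notion from the interview concrete, here is a minimal C++ sketch of a sliding window over tick timestamps. It is an illustration only, not StreamSQL and not the vendor's implementation: the class name `TickWindow`, its methods, and the millisecond timestamps are all hypothetical, and ticks are assumed to arrive in non-decreasing time order. Nothing is stored beyond the window itself, which mirrors the point that streaming data does not need a long-term repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical helper (not part of StreamSQL): keeps only the ticks whose
// timestamps fall inside the most recent `window_ms` milliseconds.
class TickWindow {
public:
    explicit TickWindow(std::int64_t window_ms) : window_ms_(window_ms) {
        assert(window_ms > 0);  // a zero or negative window makes no sense
    }

    // Record a tick at time `ts_ms` (assumed non-decreasing) and evict
    // ticks that have aged out of the window.
    void add(std::int64_t ts_ms) {
        ticks_.push_back(ts_ms);
        // The newest tick is never evicted, so the deque stays non-empty.
        while (ts_ms - ticks_.front() >= window_ms_) {
            ticks_.pop_front();
        }
    }

    // Number of ticks currently inside the window.
    std::size_t count() const { return ticks_.size(); }

    // Mean gap between consecutive ticks in the window; 0 if fewer than two.
    double mean_inter_arrival_ms() const {
        if (ticks_.size() < 2) return 0.0;
        return static_cast<double>(ticks_.back() - ticks_.front()) /
               static_cast<double>(ticks_.size() - 1);
    }

    // A "count to 100" style check: has the window seen `threshold` ticks?
    bool reached(std::size_t threshold) const {
        return ticks_.size() >= threshold;
    }

private:
    std::int64_t window_ms_;
    std::deque<std::int64_t> ticks_;
};
```

A caller would construct `TickWindow w(1000);` for a one-second window, call `w.add(ts)` as each tick arrives, and check `w.reached(100)` or `w.mean_inter_arrival_ms()` after each call. The threshold of 100 echoes the interview's example; the window length is an arbitrary choice for illustration.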