Startup Brings Data Analytics Platform to MySpace
Three years after its founding, Aster Data Systems has emerged from stealth mode with an Internet-scale, massively parallel processing database and a well-known customer: MySpace.
Dubbed Aster nCluster, the product aims to overcome the price and performance challenges of large-scale data warehouses. At the core of the nCluster architecture are algorithms and processes that control data placement, partitioning, balancing, replication and querying across clusters of intelligent nodes.
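The article does not say how nCluster decides where replicas live. As a rough illustration only, the sketch below assumes a simple scheme in which each data partition is stored on a primary node plus one copy on the next node in a ring, so that a read can fall back to a surviving copy when a node fails. The function names `place_partitions` and `read_node` are hypothetical and are not part of Aster's product.

```python
def place_partitions(num_partitions, nodes, replicas=2):
    """Hypothetical replica placement: assign each partition to `replicas`
    distinct nodes, walking around the node list like a ring."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for p in range(num_partitions):
        # Primary copy goes to node p mod N; each extra copy goes to the next
        # node in the ring, so no two copies of a partition share a node.
        placement[p] = [nodes[(p + r) % len(nodes)] for r in range(replicas)]
    return placement


def read_node(placement, partition, down):
    """Return the first live node holding a copy of `partition`."""
    for node in placement[partition]:
        if node not in down:
            return node
    raise RuntimeError(f"all copies of partition {partition} are unavailable")


placement = place_partitions(4, ["a", "b", "c"])
print(read_node(placement, 0, down={"a"}))  # falls back to the replica on "b"
```

Placing the second copy on a different node is what lets the cluster keep answering queries when a single machine fails.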
The data partitioning algorithm, called POD (Performance Optimized Dimensional) Partitioning, attempts to maximize query locality by placing data intelligently in advance, so that when a query reaches a node, it finds as much of the data it needs locally as possible, explained Aster Chief Technology Officer Tasso Argyros. Another key innovation is an algorithm called Precision Scaling, which lets the cluster scale incrementally as new hardware nodes are added, ensuring that each additional commodity node proportionally speeds up the workload, he said.
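Aster has not published how POD Partitioning chooses placements. A common way to get the locality Argyros describes is to hash every table on a shared dimension key, so that related rows from different tables land on the same node. The sketch below uses that plain hash-partitioning technique as a stand-in, not as Aster's algorithm; the tables `page_views` and `users`, their fields, and the function names are hypothetical.

```python
import zlib


def node_for(key, num_nodes):
    # Stable hash (CRC-32) so every loader process agrees on placement;
    # Python's built-in hash() is salted per process for strings.
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def partition(rows, key_field, num_nodes):
    """Split rows into one shard per node by hashing `key_field`."""
    shards = [[] for _ in range(num_nodes)]
    for row in rows:
        shards[node_for(row[key_field], num_nodes)].append(row)
    return shards


def local_join(page_views, users, num_nodes):
    """Join page views to users on user_id. Both tables are hashed on
    user_id, so each node joins only its own shards with no cross-node traffic."""
    pv_shards = partition(page_views, "user_id", num_nodes)
    user_shards = partition(users, "user_id", num_nodes)
    results = []
    for n in range(num_nodes):
        users_here = {u["user_id"]: u for u in user_shards[n]}
        for view in pv_shards[n]:
            user = users_here.get(view["user_id"])
            if user is not None:
                results.append((view["url"], user["country"]))
    return results
```

Because both tables are hashed on `user_id`, every page view finds its matching user row on the same node; a table partitioned on a different key would first have to be reshuffled across the network before the join could run.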
"Once a new node is powered on in the cluster, Aster nCluster takes over the entire process of imaging the node, migrating data over to the new node from an overloaded node, constructing the requisite indexes on the newly gained data, and creating the requisite backups," Argyros said.
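The data-migration step Argyros mentions can be pictured as a rebalancing loop. This minimal sketch assumes the data is split into many more partitions than there are nodes (an assumption; the article does not describe nCluster's granularity) and moves partitions from the most heavily loaded node to the newcomer until it holds its fair share. Imaging, index construction and backups are left out, and `add_node` is a hypothetical name.

```python
def add_node(assignment, new_node):
    """Hypothetical rebalancing: `assignment` maps node -> list of partition ids.
    Moves partitions onto `new_node` in place and returns (partition, donor) moves."""
    assignment[new_node] = []
    total = sum(len(parts) for parts in assignment.values())
    target = total // len(assignment)  # fair share per node, rounded down
    moves = []
    while len(assignment[new_node]) < target:
        # Take from whichever node is currently most overloaded.
        donor = max(assignment, key=lambda n: len(assignment[n]))
        part = assignment[donor].pop()
        assignment[new_node].append(part)
        moves.append((part, donor))
    return moves
```

For example, adding a third node to two nodes holding six partitions each moves four partitions and leaves every node with four, which is the even spread that lets each added machine take a proportional share of the work.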
Out of the gate, Aster Data counts MySpace as one of its customers. According to Aster Data, the social network's nCluster deployment spans more than 100 nodes and can load millions of rows per second. With nCluster, MySpace can use powerful analytic extensions to simplify and speed up analysis, Argyros said.
"At MySpace, Aster is used to collect all page view data directly from the Web servers," Argyros said. "As you can imagine, MySpace needed to understand traffic on their site intraday, requiring terabytes of data to be loaded in their data warehouse on a daily basis. More importantly, they needed complex queries to quickly return results."
Aster Data was founded in 2005 by three Ph.D. students from the Computer Science Department at Stanford University. The company is positioning itself as a provider of MPP (massively parallel processing) databases for organizations that need to store and analyze large amounts of data.
"Our software then makes it very easy and reliable to manage these servers by handling issues such as scale-up, replication and failures, and enables queries on the cluster to perform superfast ... by optimizing for the right system bottlenecks," Argyros said.