Startup Aster Data Systems' database can turn commodity hardware into a high-performance analytic engine.
Three years after its founding, Aster Data Systems has emerged from
stealth mode with an Internet-scale, massively parallel processing
database and a well-known customer—MySpace.
Dubbed Aster nCluster, the product is aimed at overcoming the price and
performance challenges of large-scale data warehouses. At the core of the
nCluster architecture are algorithms and processes that control the placement,
partitioning, balancing, replication and querying across clusters of
intelligent nodes.
The data partitioning algorithm, called POD (Performance Optimized
Dimensional) Partitioning, tries to maximize query locality by placing data
intelligently in advance. As a result, when a query hits a node, it finds
as much of the data it needs locally as possible, explained Aster Chief
Technology Officer Tasso Argyros. Another key innovation is an algorithm
called Precision Scaling, which lets the cluster grow incrementally as new
hardware nodes are added, so that each additional commodity node speeds up
the workload proportionally, he said.
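The article does not describe how POD Partitioning works internally. As a rough illustration of the general idea of placing data in advance for locality, the sketch below uses plain hash co-partitioning: two tables are distributed on the same key, so every order lands on the same node as its customer and a join can run on each node without moving rows across the network. This is a generic technique, not Aster's algorithm; the table layouts and the names `node_for`, `partition` and `local_join` are hypothetical.

```python
import zlib
from collections import defaultdict

def node_for(key, num_nodes):
    # Deterministic hash (unlike Python's per-process randomized str hash),
    # so every table maps the same key to the same node.
    return zlib.crc32(str(key).encode()) % num_nodes

def partition(rows, key_field, num_nodes):
    # Hypothetical placement step: assign each row to a node by its key.
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_field], num_nodes)].append(row)
    return nodes

def local_join(customers_by_node, orders_by_node):
    # Each node joins only the rows it already holds; no rows cross nodes.
    results = []
    for node, orders in orders_by_node.items():
        names = {c["customer_id"]: c["name"]
                 for c in customers_by_node.get(node, [])}
        for order in orders:
            if order["customer_id"] in names:
                results.append((names[order["customer_id"]], order["amount"]))
    return results

customers = [{"customer_id": i, "name": f"customer{i}"} for i in range(50)]
orders = [{"customer_id": i % 50, "amount": i} for i in range(200)]
joined = local_join(partition(customers, "customer_id", 4),
                    partition(orders, "customer_id", 4))
print(len(joined))  # every order is matched locally
```

Because both tables are partitioned on `customer_id`, the per-node join finds every match; partitioning them on different keys would force rows to be shipped between nodes before joining.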
"Once a new node is powered on in the cluster, Aster nCluster takes
over the entire process of imaging the node, migrating data over to the new
node from an overloaded node, constructing the requisite indexes on the newly
gained data, and creating the requisite backups," Argyros said.
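Aster has not published how nCluster decides which data to move when a node joins. One common approach, shown here only as an assumption, splits the data into many fixed partitions and moves partitions from the most heavily loaded node to the new node until the newcomer holds roughly its fair share. The function `add_node` and the partition layout are hypothetical, and the imaging, index-building and backup steps Argyros mentions are not modeled.

```python
def add_node(assignment, new_node):
    """Rebalance partitions onto new_node.

    assignment maps node name -> list of partition ids and is modified
    in place. Returns the list of (partition, from_node, to_node) moves.
    """
    assignment[new_node] = []
    total = sum(len(parts) for parts in assignment.values())
    target = total // len(assignment)  # fair share per node, rounded down
    moves = []
    while len(assignment[new_node]) < target:
        # Always take from the currently most loaded node.
        donor = max(assignment, key=lambda n: len(assignment[n]))
        if donor == new_node:
            break
        part = assignment[donor].pop()
        assignment[new_node].append(part)
        moves.append((part, donor, new_node))
    return moves

cluster = {
    "node-a": list(range(0, 12)),
    "node-b": list(range(12, 24)),
    "node-c": list(range(24, 36)),
}
moves = add_node(cluster, "node-d")
print(len(moves), {n: len(p) for n, p in cluster.items()})
```

Moving whole partitions rather than individual rows keeps the bookkeeping simple: only about one quarter of the data moves when a fourth node joins, and afterward each node holds an equal share, which is the precondition for adding nodes to speed up a workload proportionally.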
Out of the gate, Aster Data counts MySpace as one of its customers.
According to Aster Data, the social network's deployment spans more than 100
nodes and can load millions of rows per second. With nCluster, MySpace can
use powerful analytic extensions to simplify and speed up analysis, he said.
"At MySpace, Aster is used to collect all page view data directly from
the Web servers," Argyros said. "As you can imagine, MySpace needed
to understand traffic on their site intraday, requiring terabytes of data to be
loaded in their data warehouse on a daily basis. More importantly, they needed
complex queries to quickly return results."
Aster Data was founded in 2005 by three Ph.D. students from the Computer
Science Department at Stanford University.
The company is positioning itself as a provider of MPP (massively parallel
processing) databases for organizations that need to store and analyze large
amounts of data.
"Our software then makes it very easy and
reliable to manage these servers by handling issues such as scale-up,
replication and failures, and enables queries on the cluster to perform superfast
… by optimizing for the right system bottlenecks," Argyros said.