At the GigaOM Structure "big data" conference, DataStax announced Brisk, a new big data platform for low-latency apps and Hadoop analytics.
NEW YORK - At the GigaOM Structure Big Data conference, DataStax, the commercial
sponsor of Apache Cassandra, unveiled Brisk, a new distribution that enhances
the Hadoop and Hive platform with scalable low-latency data capabilities.
DataStax used the Structure Big Data event here on March 23 as the platform to
launch its new solution for low-latency applications and Hadoop and Hive analytics.
In an interview with eWEEK at the Structure event, Matt Pfeil, CEO and co-founder
of DataStax, said the Brisk platform can act as the low-latency database for
extremely high-volume Web and real-time applications while providing tightly
coupled Hadoop and Hive analytics. The Structure Big Data conference gave the
big data community a forum to discuss the best technologies for managing and
harnessing ever-increasing volumes of data.
Pfeil described "big data" as having two sides. "The analytical side is
well-understood and served by Hadoop and Hive," he said in a statement.
"However, we live in a real-time world and the ability for applications to
interact with big data at low latency is equally important. Apache Cassandra
was bred for big data, real-time scenarios, and using it to power Apache Hive
and Apache Hadoop gives users a single solution that serves both needs."
Brisk is an enhanced open-source Hadoop and Hive distribution that uses
Cassandra for many of its core services, Pfeil said. Brisk provides integrated
Hadoop MapReduce, Hive and job and task tracking capabilities, while providing
a Hadoop Distributed File System (HDFS)-compatible storage layer powered by
Cassandra. It also exposes the full power of Cassandra for real-time
applications. The result, according to DataStax, is a single integrated
solution that provides increased reliability, simpler deployment and lower
total cost of ownership (TCO) than traditional Hadoop solutions.
A key benefit
of DataStax' Brisk is the tight feedback loop it allows between a real-time
application and the analytics that follow. Traditionally, users would be forced
to move data between systems via complex extract, transform and load processes,
or perform both functions on the same system with the risk of one impacting the
other. DataStax' Brisk, a new Hadoop and Hive distribution, will be available
under an Apache open-source license within 45 days of the announcement.
"By bringing the power of Cassandra (including its simplicity, scalability and
speedy reads/writes) to Hadoop, DataStax has created a powerful system that
speeds up the time between data creation and analysis," Tim Estes, CEO of
Digital Reasoning, said in a statement. "We can count on some of Cassandra's
unique capabilities to aid projects that have multiple data center locations,
and large and complex bulk ingest demands. We've been thrilled to work with the
DataStax team to push its capabilities to some of the most demanding customers,
particularly in the Defense and Intelligence Community."
The vice president of marketing at DataStax explained some key uses of Brisk:
Websites: Provide real-time data access and storage for millions of simultaneous users. Directly perform Hive analysis on the latest data, and immediately feed analytic insights back into the application behavior.
Real-time summaries and aggregates: Allow a continuously up-to-date view of important business metrics. Send alerts when anomalies occur.
Event processing: Track and react instantly to millions of sensors or other distributed feeds, while allowing deeper analytic questions to be asked of the historical data at any moment.
Capital markets: Process, store and trigger actions based on a high-volume real-time event stream. Perform analytics on historical data, and feed updated models directly into the application.
The Apache Cassandra Project develops a highly scalable second-generation
distributed database, bringing together the fully distributed design of
Amazon's Dynamo and the ColumnFamily-based data model of Google's Bigtable.
Cassandra was open-sourced by Facebook in 2008 and is now developed by Apache
committers and contributors from many companies. Cassandra is in use at Digg,
Cisco, SimpleGeo, Ooyala, OpenX and other companies that have large, active
data sets. The largest production cluster holds more than 100TB of data on more
than 150 machines.
"Not much else
can compete with Cassandra in terms of performance," Pfeil said.
Pfeil said he and a former colleague from Rackspace decided to leave the
hosting company to create a startup around Cassandra after having worked with
it there. DataStax was formed "to support the open-source project," Pfeil said.
"We employ 80 percent of the people working on it. And we'll continue to build
products that help users use Cassandra more easily and effectively."
Explaining the value of DataStax' Brisk and Cassandra, Weir said, "It would be
as if Watson was not just taking cues from its vast knowledge base, but was also
taking in all the other variables around him, like the other players and how
they're playing, and assessing all of that in real time. You can run the
real-time processing and the analytics at the same time. We're bridging that
gap between real time and analytics."
Pointing to what is driving the emerging importance of technologies like
Cassandra, Pfeil said the challenge at Rackspace was "storing a large amount of
really small files on commodity hardware, and we had to expect failure to
happen, so we had to find ways to scale horizontally. You don't need a
supercomputer to make everything work anymore; you can use cheap commodity
computers."
In Cassandra, data is automatically replicated to multiple nodes for fault
tolerance. Replication across multiple data centers is supported. Failed nodes
can be replaced with no downtime.
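To illustrate the replication idea in general terms, here is a minimal Python sketch of Dynamo-style ring placement, similar in spirit to Cassandra's SimpleStrategy: each key is hashed to a token on a ring, and copies go to the next N nodes clockwise. The TokenRing class, the token_for and replicas_for functions, and the node names and tokens are hypothetical illustrations, not Cassandra's or Brisk's API; real clusters also offer placement strategies that are aware of racks and data centers.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    # Hash the key onto a 128-bit ring with MD5 (a simplified assumption,
    # loosely modeled on a random partitioner).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical, simplified token ring: each node owns exactly one token."""

    def __init__(self, node_tokens: dict[str, int]):
        # Sort (token, node) pairs so the ring can be walked clockwise.
        self._entries = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def replicas_for(self, key: str, replication_factor: int) -> list[str]:
        if replication_factor > len(self._entries):
            raise ValueError("replication factor exceeds node count")
        # The first node whose token is >= the key's token holds the first
        # copy; wrap to the start of the ring if no such node exists.
        start = bisect.bisect_left(self._tokens, token_for(key)) % len(self._entries)
        # The remaining copies go to the following nodes, clockwise.
        return [
            self._entries[(start + i) % len(self._entries)][1]
            for i in range(replication_factor)
        ]


# Four hypothetical nodes with evenly spaced tokens on the 2**128 ring.
ring = TokenRing({
    "node-a": 0,
    "node-b": 2**126,
    "node-c": 2**127,
    "node-d": 3 * 2**126,
})
replicas = ring.replicas_for("user:42", replication_factor=3)
print(replicas)
```

Because each key lives on several consecutive nodes, the surviving replicas can keep serving it while one node is down, which is the basic reason a failed node can be swapped out without taking the data offline.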
"The tide has
turned," Pfeil said. "Big data for enterprises used to be a problem; now,
it's an opportunity."