    DataStax Soups Up Hadoop with Apache Cassandra

    By
    Darryl K. Taft
    -
    March 23, 2011

      NEW YORK – At the Structure Big Data conference, DataStax, the commercial sponsor of Apache Cassandra, unveiled Brisk, a new distribution that enhances the Hadoop and Hive platform with scalable low-latency data capabilities.

      DataStax used the Structure Big Data event here on March 23 as the platform to launch its new solution for low-latency applications and Hadoop and Hive analytics.

      In an interview with eWEEK at the Structure event, Matt Pfeil, CEO and co-founder of DataStax, said the Brisk platform can act as the low-latency database for extremely high-volume Web and real-time applications while providing tightly coupled Hadoop and Hive analytics. The Structure Big Data conference enabled the big data community to discuss the best technologies for managing and harnessing ever-increasing volumes of data.

      “The challenge of ‘big data’ is twofold,” Pfeil said in a statement. “The analytical side is well-understood and served by Hadoop and Hive. However, we live in a real-time world and the ability for applications to interact with big data at low-latency is equally important. Apache Cassandra was bred for big data, real-time scenarios, and using it to power Apache Hive and Apache Hadoop gives users a single solution that serves both needs.”

      DataStax’ Brisk is an enhanced open-source Hadoop and Hive distribution that uses Cassandra for many of its core services, Pfeil said. Brisk provides integrated Hadoop MapReduce, Hive, and job- and task-tracking capabilities, along with a Hadoop Distributed File System (HDFS)-compatible storage layer powered by Cassandra. It also exposes the full power of Cassandra for real-time applications. The result, according to DataStax, is a single integrated solution that provides increased reliability, simpler deployment and lower total cost of ownership (TCO) than traditional Hadoop solutions.

      A key benefit of DataStax’ Brisk is the tight feedback loop it allows between a real-time application and the analytics that follow. Traditionally, users have been forced either to move data between systems via complex extract, transform and load processes, or to perform both functions on the same system at the risk of one impacting the other. DataStax’ Brisk, a new Hadoop and Hive distribution, will be available under the Apache open-source license within 45 days of this announcement.

      “By marrying the power of Cassandra, including its simplicity, scalability and speedy reads/writes, to Hadoop, DataStax has created a powerful system that speeds up the time between data creation and analysis,” Tim Estes, CEO of Digital Reasoning, said in a statement. “We can count on some of Cassandra’s unique capabilities to aid projects that have multiple data center locations, and large and complex bulk ingest demands. We’ve been thrilled to work with the DataStax team to push its capabilities to some of the most demanding customers, particularly in the Defense and Intelligence Community.”

      Michael Weir, vice president of marketing at DataStax, explained some key uses of Brisk:

      • High-volume Websites: Provide real-time data access and storage for millions of simultaneous users. Directly perform Hive analysis on the latest data, and immediately feed analytic insights back into the application behavior.

      • Retail: Maintain real-time summaries and aggregates to allow a continuously up-to-date view of important business metrics. Send alerts when anomalies occur.

      • High-volume event processing: Track and react instantly to millions of sensors or other distributed feeds, while allowing deeper analytic questions to be asked of the historical data at any moment.

      • Finance and capital markets: Process, store and trigger actions based on a high-volume real-time event stream. Perform analytics on historical data, and update models directly into the application.
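
      To make the retail use case above concrete, the following is a minimal, illustrative sketch of keeping a running aggregate up to date and flagging anomalies as events arrive. It is not Brisk or Cassandra code: the class name `RunningMetric`, the three-standard-deviation threshold and the warm-up count are all assumptions chosen for the example, and it uses a standard online mean/variance update (Welford's method) in plain Python.

```python
import math


class RunningMetric:
    """Hypothetical in-memory aggregate; not part of Brisk or Cassandra."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5):
        self.threshold = threshold  # assumed: alert beyond N standard deviations
        self.warmup = warmup        # assumed: samples required before alerting
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, value: float) -> bool:
        """Fold one event into the aggregate; return True if it looks anomalous."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) > self.threshold * std:
                is_anomaly = True

        # Welford's online update: the value is included in the summary
        # whether or not it was flagged.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


if __name__ == "__main__":
    orders_per_minute = RunningMetric()
    for sample in [10, 11, 9, 10, 12, 10, 11, 100]:
        if orders_per_minute.update(sample):
            print(f"alert: unusual value {sample}")
```

      In a deployment of the kind the article describes, the summary would presumably live in the low-latency store and the deeper historical analysis would run as Hive or MapReduce jobs; the sketch only illustrates the per-event update and alert check.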

      The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model. Cassandra was open-sourced by Facebook in 2008, and is now developed by Apache committers and contributors from many companies. Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX and more companies that have large, active data sets. The largest production cluster has over 100TB of data in over 150 machines.

      “Not much else can compete with Cassandra in terms of performance,” Pfeil said.

      Pfeil said he and a former colleague from Rackspace decided to leave the hosting company to create a startup around Cassandra after having worked with Cassandra at Rackspace.

      “We continue to support the open-source project,” Pfeil said. “We employ 80 percent of the people working on it. And we’ll continue to build products that help users use Cassandra more easily and effectively.”

      Explaining the value of DataStax’ Brisk and Cassandra, Weir said, “It would be as if Watson was not just taking cues from its vast knowledge base, but was also taking in all the other variables around him, like the other players and how they’re playing, and assessing all of that in real time. You can run the real-time processing and the analytics at the same time. We’re bridging that gap between real time and analytics.”

      Moreover, keying in on the emerging importance of technologies like Cassandra, Pfeil said Rackspace is “storing a large amount of really small files on commodity hardware, and we had to expect failure to happen, so we had to find ways to scale horizontally. You don’t need a supercomputer to make everything work anymore; you can use cheap commodity computers.”

      With Cassandra, data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
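
      The idea of replicating each piece of data to several nodes can be illustrated with a toy hash ring. This is a simplified sketch in plain Python, not Cassandra's actual implementation or API: the `ToyRing` class, the node names and the replication factor of 3 are assumptions made for the example, and data-center-aware placement is left out entirely.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 used only as a stable hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Hypothetical ring: each key is stored on the next N nodes clockwise."""

    def __init__(self, nodes, replication_factor: int = 3):
        if not nodes:
            raise ValueError("at least one node is required")
        self.replication_factor = replication_factor
        self.ring = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str, down=frozenset()):
        """Return the replicas for `key` that are still reachable."""
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        owners = []
        for i in range(len(self.ring)):
            owners.append(self.ring[(start + i) % len(self.ring)][1])
            if len(owners) == self.replication_factor:
                break
        # Replicas on failed nodes are skipped; the remaining copies still
        # serve the key, which is why one failure need not cause downtime.
        return [name for name in owners if name not in down]


if __name__ == "__main__":
    ring = ToyRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
    owners = ring.replicas("user:42")
    print("replicas:", owners)
    print("after one failure:", ring.replicas("user:42", down={owners[0]}))
```

      With three copies of every key, losing one node still leaves two reachable replicas in this model, which is the property the paragraph above describes.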

      “The tide has turned,” Pfeil said. “Big data for enterprises used to be a problem; now, it’s an opportunity.”

      Darryl K. Taft
      Darryl K. Taft covers the development tools and developer-related issues beat from his office in Baltimore. He has more than 10 years of experience in the business and is always looking for the next scoop. Taft is a member of the Association for Computing Machinery (ACM) and was named 'one of the most active middleware reporters in the world' by The Middleware Co. He also has his own card in the 'Who's Who in Enterprise Java' deck.
