The Apache Software Foundation (ASF) has announced the availability of Apache Cassandra v2.0, the latest version of the highly scalable, NoSQL big data distributed database.
Apache Cassandra powers massive data sets quickly and reliably without compromising performance, whether running in the cloud or partially on-premise in a hybrid data store, ASF officials said.
New features in Apache Cassandra v2.0 include lightweight transactions, triggers and CQL (Cassandra Query Language) enhancements that increase productivity in creating modern, data-driven applications.
“The headlining features in 2.0 are lightweight transactions, CQL enhancements and triggers,” wrote Jonathan Ellis, chair of the Apache Cassandra Project Management Committee and CTO at DataStax, in a blog post on the new release. “But 2.0 also features a lot of internal optimizations and improvements,” he added.
ASF officials said Cassandra’s fully distributed architecture provides critical fault tolerance to ensure applications will not go offline, and its linear scalability allows them to reach massive sizes while successfully handling thousands of requests per second.
“In five years, Apache Cassandra has grown into one of the most widely used NoSQL databases in the world and serves as the backbone for some of today’s most popular applications,” Ellis, who also is vice president of Apache Cassandra, said in a statement. “Cassandra 2.0 makes it easier than ever for developers to migrate from relational databases and become productive quickly.”
When DataStax announced a new version of its database platform, based on Cassandra, in July, Ellis said, “Version 2.0 continues our focus on the developer experience. Features like lightweight transactions and cursors make the Cassandra Query Language even more powerful and easy to use, while we continue to make performance improvements under the hood.”
New features and improvements include:
- Lightweight transactions, which ensure operation ‘linearizability’ similar to the ‘serializable’ isolation level offered by relational databases and prevent conflicts during concurrent requests
- Triggers, which enable pushing performance-critical code close to the data it deals with, and simplify integration with event-driven frameworks like Storm
- CQL enhancements such as cursors and improved index support
- Improved compaction, keeping read performance from deteriorating under heavy write load
- Eager retries to avoid query timeouts by sending redundant requests to other replicas if too much time elapses on the original request
- Custom Thrift server implementation based on LMAX Disruptor that achieves lower message processing latencies and better throughput with flexible buffer allocation strategies
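The eager-retry item in the list above can be illustrated with a minimal Python sketch. It is not Cassandra's implementation: the replicas are modeled as plain callables, and the names `query_with_eager_retry`, `slow_replica`, `fast_replica` and the `retry_after_s` threshold are hypothetical, chosen only for illustration. The idea shown is the one the list describes: if the original request takes too long, a redundant request goes to another replica and whichever answer arrives first is used.

```python
import concurrent.futures as cf
import time


def query_with_eager_retry(replicas, request, retry_after_s):
    """Ask replicas[0]; if it has not answered within retry_after_s seconds,
    also ask replicas[1] and return whichever answer arrives first."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        first = pool.submit(replicas[0], request)
        done, _ = cf.wait([first], timeout=retry_after_s)
        if done:
            # The original request answered in time; no redundant request needed.
            return first.result()
        # Too much time has elapsed: send a redundant request to another replica.
        second = pool.submit(replicas[1], request)
        done, _ = cf.wait([first, second], return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Do not block on the slower request; its result is simply discarded.
        pool.shutdown(wait=False)


def slow_replica(request):
    """Hypothetical replica that is slow to respond."""
    time.sleep(0.3)
    return ("slow", request)


def fast_replica(request):
    """Hypothetical replica that responds immediately."""
    return ("fast", request)


result = query_with_eager_retry([slow_replica, fast_replica], "read row 42", 0.05)
print(result)
```

The trade-off in a scheme like this is extra load on the second replica in exchange for fewer timeouts, which is why the redundant request is sent only after the threshold has passed rather than up front.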
“What I’m struck by is how this release is characterized by a slew of relational database-like features, such as the enhancements to CQL–which is much like SQL, lightweight transactions and triggers,” Andrew Brust, founder and CEO of Blue Badge Insights and a big data guru, told eWEEK. “Seems that the next frontier for so many NoSQL and big data companies is to assume attributes of the technologies they have implicitly been discrediting.”
In his post, Ellis lists several performance-optimizing, spring-cleaning and operational changes that stand out in Cassandra 2.0, including that Java 7 is now required and that streaming has been rewritten to be more transparent and robust. Cassandra 2.0 also removes the emergency memory pressure valve logic. “The intent here was to give operators enough breathing room to fix misconfigurations causing heap pressure, but it was never as reliable as we would have liked. And now that the important storage engine metadata has been moved off-heap, memory shortages will be obvious much earlier,” Ellis wrote.
ASF officials said the Apache Cassandra developer community includes some of the brightest minds in big data. Hundreds of organizations, from startups to large-scale enterprises such as Adobe, Cisco and IBM, rely on Cassandra to power their mission-critical applications online.
“At Ooyala, we’re building some of our most ambitious projects to date on top of Apache Cassandra,” Al Tobey, tech lead for compute and data services at Ooyala, said in a statement. “The maturation of CQL3, vnodes, and new features such as the PAXOS-backed compare-and-set (CAS) added in Cassandra 2.0 will help us build and deploy those projects confidently.”
Apache Cassandra is used by many highly visible organizations such as Accenture, CERN, Cloudkick, Comcast, Constant Contact, Dell, Digg, Ericsson, Eventbrite, GoDaddy, Houghton Mifflin Harcourt, HP, Instagram, Intuit, Mahalo, Microsoft MetricsHub, Morningstar, NASA, Netflix, Nextag, OpenWave, PBS Kids, Pitney Bowes, Plaxo, Polyvore, Real Networks, Reddit, Sony Network Entertainment, SoundCloud, Spotify, Squidoo, Stormpath, Symantec, Twitter, Wildfire, WSO2, and ZoomInfo. A listing of where Apache Cassandra is used and deployment details can be found here.
“Paying down a lot of the technical debt accumulated over five years of intense open-source development, and solidifying the Native Binary Transport for CQL 3, has put the project on a great footing,” said Aaron Morton, an Apache Cassandra committer and co-founder and principal consultant of The Last Pickle, in a statement. “The addition of Lightweight ‘Compare-and-Set’ transactions and cursors brings another set of features that make it easier for developers to harness the performance and scale of Cassandra. And the experimental Trigger support will allow open-source contributors to provide feedback for this often-requested feature.”
“It’ll be really helpful to have conditional updates built into Cassandra,” said Jon Haddad, senior architect at Shift, in a statement. “Right now there’s a few places where we have to use external locking to manage isolation, and having built in support in the database will be amazing.”
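The conditional updates Haddad describes follow a compare-and-set pattern: a write is applied only if the current state matches what the caller expects, so two concurrent writers cannot both succeed. The Python sketch below illustrates that semantics in a single process. It is an assumption-laden stand-in, not Cassandra's API or its Paxos-backed mechanism: the `ConditionalStore` class and its method names are hypothetical, and a local lock takes the place of the coordination a distributed database performs across replicas.

```python
import threading


class ConditionalStore:
    """Hypothetical in-memory table that supports conditional writes."""

    def __init__(self):
        self._rows = {}
        # Single-process stand-in for the coordination a database does server-side.
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, value):
        """Insert only when the key is absent; report whether the write applied."""
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = value
            return True

    def update_if(self, key, expected, new_value):
        """Update only when the current value equals `expected`."""
        with self._lock:
            if key not in self._rows or self._rows[key] != expected:
                return False
            self._rows[key] = new_value
            return True

    def get(self, key):
        with self._lock:
            return self._rows.get(key)


store = ConditionalStore()
first_claim = store.insert_if_not_exists("jdoe", "jdoe@example.com")
second_claim = store.insert_if_not_exists("jdoe", "other@example.com")
stale_update = store.update_if("jdoe", "other@example.com", "new@example.com")
valid_update = store.update_if("jdoe", "jdoe@example.com", "new@example.com")
print(first_claim, second_claim, stale_update, valid_update, store.get("jdoe"))
```

With the check and the write performed as one step inside the store, callers no longer need an external lock around a separate read followed by a write.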