Cloudant Merges BigCouch Into Apache CouchDB

By Darryl K. Taft  |  Posted 2013-07-22

Cloudant, a provider of a distributed database-as-a-service (DBaaS), announced that it has delivered on its promise to integrate core capabilities of its distributed database service into the open-source Apache CouchDB project.

CouchDB serves as the foundation of Cloudant's technology stack in the form of BigCouch, an open-source variant of CouchDB that the company developed to support large-scale, globally distributed applications.

BigCouch is an open-source, highly available, fault-tolerant, clustered and API-compliant version of Apache CouchDB. BigCouch enables users to create clusters of CouchDBs that are distributed over an arbitrary number of servers. While it appears to the end user as one CouchDB instance, it is in fact one or more nodes in an elastic cluster, acting in concert to store and retrieve documents, index and serve views, and serve CouchApps.
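To make the "one instance, many nodes" idea concrete, here is a minimal Python sketch of how a cluster could route a document ID to a shard and then to several nodes holding copies of that shard. It is an illustration only, not BigCouch's actual code: the function names, the node names, the shard count of 8 and the copy count of 3 are all assumptions chosen for the example.

```python
import zlib
from typing import List, Tuple


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index using a stable hash (CRC32)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def replicas_for(shard: int, nodes: List[str], copies: int) -> List[str]:
    """Choose distinct nodes for a shard by walking the node list in order."""
    copies = min(copies, len(nodes))  # cannot place more copies than nodes
    return [nodes[(shard + i) % len(nodes)] for i in range(copies)]


def route(doc_id: str, nodes: List[str],
          num_shards: int = 8, copies: int = 3) -> Tuple[int, List[str]]:
    """Return the shard index and the nodes that would hold the document."""
    shard = shard_for(doc_id, num_shards)
    return shard, replicas_for(shard, nodes, copies)


if __name__ == "__main__":
    # Hypothetical four-node cluster; a client would only ever see one endpoint.
    cluster = ["node1", "node2", "node3", "node4"]
    print(route("invoice-2013-0042", cluster))
```

Because the hash is deterministic, any node that receives a request can work out where a document lives without consulting a central coordinator, which is one way a cluster can present itself as a single database.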

After four years of operating BigCouch in production, Cloudant has merged the BigCouch code into the CouchDB code base, making it possible to manage and replicate data with CouchDB at a much larger scale.

"There are a lot of reasons people love CouchDB, like its elegant programming model, data durability, flexible indexing, and, most of all, its unique way of replicating and synching data across data centers or devices," Adam Kocoloski, co-founder and CTO at Cloudant, said in a statement. "We're merging the horizontal scaling and fault-tolerance framework we built for BigCouch into CouchDB so people can more easily scale all that CouchDB goodness across multiple servers and keep it running nonstop. It's our way of saying thanks and helping to grow the community of CouchDB developers and users."

"The code merger of BigCouch and Apache CouchDB is good for the open-source community and developers that require a scalable Web-aware database," Travell Perkins, CTO at Fidelity Investments, said in a statement. "As a classically trained computer scientist, I'm interested in the inner workings of my database solutions as much as the practical utility they provide dynamic data and use cases.

"I've tried a lot of NoSQL solutions over the years with varying degrees of success. After working with the distributed clustering capabilities being built into CouchDB, I think we are approaching the ideal JSON-centric database for enterprise workloads at scale," Perkins continued.

The open-source BigCouch database project was developed in 2008 by the Cloudant co-founders, who had previously been using CouchDB for managing and distributing the petabytes of data generated every second by CERN's Large Hadron Collider. They developed a horizontal clustering and fault-tolerance framework for BigCouch that was inspired by the Amazon Dynamo research paper.
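The Dynamo paper describes a quorum rule for replicated data: with N copies of each item, a read that consults R copies and a write that reaches W copies are guaranteed to overlap on at least one copy when R + W > N. The short Python sketch below shows only that arithmetic. The function names are hypothetical, and the article does not state which values BigCouch uses.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one copy with every write quorum."""
    return r + w > n


def majority(n: int) -> int:
    """Smallest quorum size that is more than half of n copies."""
    return n // 2 + 1


if __name__ == "__main__":
    n = 3  # assumed number of copies, for illustration only
    q = majority(n)
    print(q, quorums_overlap(n, q, q))
```

With three copies, majority reads and writes (two each) satisfy the rule, while reading and writing a single copy each does not.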

For the code merger, Cloudant engineers imported sections of BigCouch code into the Apache CouchDB repositories, adapting the database to run in a clustered environment and to better replicate databases across clusters and between data centers.


Going forward, Cloudant will cease development of BigCouch so that it can participate more fully in the CouchDB community and keep CouchDB and Cloudant clustering functionality in sync. Cloudant engineers will continue to make cluster-scaling and fault-tolerance enhancements within the CouchDB project and will reuse that code in Cloudant's database service, the company said.

"We're working to integrate BigCouch's clustering technology with CouchDB—we've set the stage and welcome more project committers to get involved," said Jan Lehnardt, vice president of Apache CouchDB, in a statement. "With Cloudant's work to fine-tune BigCouch large-scale database replication, Apache CouchDB now has a complete strategy for replicating data across distributed systems, whether nodes are Erlang clusters in the same data center or on the other side of the world. Developers now have more options for moving data closer to their users and a simpler strategy for synchronizing that data throughout a larger system."

The key accomplishment of the merged code is the BigCouch clustering capability, Cloudant officials said. Among other improvements to Apache CouchDB, Cloudant has contributed a new compactor process that creates smaller and better-organized post-compaction databases.

CouchDB users can now experience significant improvements in compaction and replication speed, as well as boosts in high-concurrency access performance. Additional improvements include better index update speeds, updated aggregate reduce functions, smooth hot-code updates, improved logging and streamlined libraries. Cloudant engineers also refactored internal code, removing complicated sections and boosting overall performance, the company said.

A preview of the merged software is available now. A general release of CouchDB with the merged BigCouch functionality is planned once it has gone through the Apache community release process.

In May, Cloudant announced $12 million in series B funding from Devonshire Investors, a private equity firm affiliated with Fidelity Investments; Rackspace Hosting, an open cloud infrastructure provider; and Toba Capital. The company also announced that current investors Avalon Ventures, In-Q-Tel and Samsung Venture Investment Corporation purchased additional shares. Cloudant said the funding would be used to support its global expansion and grow the company's support, service and go-to-market strategies.

"We hear all the time from customers that dealing with the complexities of large-scale systems infrastructure just slows them down," Pat Matthews, senior vice president of corporate development at Rackspace, said in a statement on the funding. "Developers want control of their infrastructure, but they don't want to have to manage it 24/7. Cloudant is the natural extension of this idea at the database layer. We're partners that share a commitment to delivering the highest level of customer support, which is why investing in Cloudant works so well from a Rackspace perspective."

