Database Options for the Cloud

Cloud computing is stretching the relational database management system (RDBMS) model to the breaking point. Geir Magnusson, a cloud computing expert, discusses various options developers have for how to design, develop, deploy and manage cloud-scale applications.

NEW YORK -- Although a relational database management system is what most developers are used to working with, you're not likely to use one in the cloud, a cloud computing expert said as he listed the strengths and weaknesses of various cloud computing options.

Speaking at the Web 2.0 Expo here, Geir Magnusson, vice president of engineering and co-founder of 10gen, said "an RDBMS is what you need, but not in the cloud." However, object/relational mapping [O/R mapping] is one way of getting around the impedance mismatch between object-oriented languages and data stored in a relational system. "O/R mapping blends the power of an RDBMS with the programming simplicity of an ODBMS [object database management system]," Magnusson said, noting that there is support for O/R mapping in Java, Python, Ruby, .NET and Groovy. "O/R mapping is everywhere."
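To make the idea of O/R mapping concrete, here is a minimal sketch of what a mapper does under the hood: it turns an object into a row on the way in and a row back into an object on the way out. It uses Python's built-in sqlite3 module rather than any particular O/R mapping library, and the User class, the users table and the save and load helpers are hypothetical names chosen for illustration, not anything Magnusson showed.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical domain object; field names are illustrative only.
    id: int
    name: str
    email: str


def create_schema(conn):
    # One table whose columns mirror the fields of User.
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )


def save(conn, user):
    # Object -> row: each field becomes one column value.
    conn.execute(
        "INSERT OR REPLACE INTO users (id, name, email) VALUES (?, ?, ?)",
        (user.id, user.name, user.email),
    )


def load(conn, user_id):
    # Row -> object: rebuild a User from the selected columns.
    row = conn.execute(
        "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return User(*row) if row else None


conn = sqlite3.connect(":memory:")
create_schema(conn)
save(conn, User(1, "Ada", "ada@example.com"))
loaded = load(conn, 1)
```

Real O/R mapping frameworks generate this translation from class definitions or configuration instead of hand-written SQL, but the object-to-row round trip is the same.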

Magnusson delivered a talk titled "The Sequel to SQL: Why You Won't Find Your RDBMS in the Clouds."

For his part, Magnusson defined "the cloud" as "un-localized or anonymous computing services and/or resources." He listed several types of cloud computing options. SAAS (software as a service) is one option. PAAS (platform as a service) is another, signified by 10gen and Google AppEngine. The TAAS (tools as a service) option is represented by Amazon's SimpleDB. And the HAAS (hardware as a service) option is represented by Amazon's Elastic Compute Cloud (EC2), Magnusson said.

The cloud has become popular because of the benefits and savings cloud computing affords, such as capital and operational expenditure savings, as well as the opportunity to better serve customers with improved availability and accessibility. However, there is a catch. "From a data perspective you need to do data duplication and distribution," Magnusson said. And for larger data sets the data has to be partitioned, he said.
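One common way to partition a large data set is to hash each record's key and use the result to pick a partition, often called a shard. The sketch below illustrates that general technique only; it is an assumption about how partitioning might be done, not a description of any system Magnusson discussed, and the function names shard_for and partition are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    # Hash the key so the same key always maps to the same shard.
    # MD5 is used here only as a stable, well-distributed hash,
    # not for security.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(records, num_shards):
    # Split one dictionary of records into num_shards smaller ones.
    shards = [dict() for _ in range(num_shards)]
    for key, value in records.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards


records = {"user:%d" % i: {"visits": i} for i in range(100)}
shards = partition(records, 4)
```

A lookup for a given key then only has to contact the shard that shard_for returns. The trade-off is that changing the number of shards moves most keys, which is one reason production systems often use more elaborate schemes.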

Expanding on the notion of HAAS, Magnusson discussed the option of "plate spinning on EC2. You have multiple VMs [virtual machines] on MySQL with one master and many slaves. And if one master goes away you can generate a new master from the remaining" VMs. However, "this is not my opinion of cloud computing; you're actually doing clustering," he said.

Meanwhile, Magnusson discussed Google's Bigtable, which Google is exposing through its AppEngine offering. Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size -- petabytes of data across thousands of commodity servers. Bigtable supports large-scale storage of data objects called entities. But Bigtable has a few constraints. Indexes are required for all queries and fetch requests are limited to 1,000 entities, among other things, Magnusson said.

Amazon's SimpleDB is a tabular store, Magnusson said. "I think it's meant as a metadata store for S3 [Amazon Simple Storage Service]." SimpleDB supports auto indexing and is eventually consistent, but it offers no cross-domain joins, limits a query to 250 items and treats everything as a string, Magnusson said.
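The "everything is a string" constraint has a practical consequence: strings compare character by character, so numeric values sort incorrectly unless they are encoded with a fixed width. The sketch below shows the effect and a zero-padding workaround in plain Python. It does not call SimpleDB itself, and encode_int and its width of 10 digits are hypothetical choices for illustration.

```python
def encode_int(value, width=10):
    # Assumption: only non-negative integers that fit in `width` digits.
    if value < 0 or len(str(value)) > width:
        raise ValueError("value out of range for this encoding")
    # Left-pad with zeros so string order matches numeric order.
    return str(value).zfill(width)


prices = [9, 10, 250]

# Stored as plain strings, "9" sorts after "10" and "250".
naive = sorted(str(p) for p in prices)

# Zero-padded strings sort in the same order as the numbers.
padded = sorted(encode_int(p) for p in prices)
```

Here naive comes out as "10", "250", "9", while padded preserves the numeric order 9, 10, 250. Range queries over string-only stores depend on the same encoding trick.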

Meanwhile, 10gen's Mongo is a database for the 10gen platform. 10gen is developing a new PAAS technology designed to help developers quickly and easily build dynamic, scalable, mission-critical Web sites and applications. The 10gen software stack is analogous to Google App Engine in that it provides a new stack of tools -- database, grid management, application server -- that are purpose-built to run in a cloud environment.

Mongo is an object/document store, Magnusson said. "Think of it as a binary JSON [JavaScript Object Notation]," he said. It is a dynamic language ODBMS targeted for the cloud. "Language bindings let you work with the language of your choice," Magnusson said.

Another cloud computing option is AppJet, which is somewhat like 10gen, he said. "It's like a JavaScript-based app server in the sky."

The takeaway from this synopsis? "No one is doing relational and data is treated in a clustered way," Magnusson said.

In addition, Magnusson mentioned a few other "technologies to watch," including the Drizzle technology out of Sun Microsystems' MySQL unit. Drizzle is a lightweight SQL database for the cloud and Web. "It's derived from MySQL and it's relational," Magnusson said. "It's a fork of the MySQL database that is optimized for cloud and network applications."

Another technology to watch is CouchDB, Magnusson said. CouchDB is a document-oriented database from the Apache Software Foundation that is written in the Erlang programming language. And then there is Hadoop, which Magnusson described as a combination of a distributed file system and a MapReduce engine.
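For readers unfamiliar with the map/reduce model, the sketch below counts words in a handful of documents using the three classic stages: map, shuffle and reduce. It runs in a single Python process and does not use Hadoop or its APIs; the function names are hypothetical and the point is only to show the shape of the computation that an engine such as Hadoop distributes across many machines.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one document.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the cloud", "the data in the cloud"])
```

Because each map call looks at one document and each reduce call looks at one key, both stages can run in parallel on separate machines, which is what makes the model suit very large data sets.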

With so many new options to choose from and a vast cloud environment to address, "programming is going to change and I think things are going to really be fun," Magnusson said.