Amazon SimpleDB a Solid Choice for Simple Web-based Data Storage

Tech analysis: Amazon in December released the beta version of SimpleDB, which is part of Amazon Web Services. Amazon SimpleDB offers businesses using cloud-based applications a place to store simple data. While not useful for all Web-based data storage, Amazon SimpleDB can work well in environments where users need to quickly look up data.

In December, Amazon released the beta version of its SimpleDB product. SimpleDB is part of a suite of tools making up the Amazon Web Services, or AWS.

It has been in the works for quite some time; indeed, Amazon opened an early-access program that people began signing up for a year earlier. Over that year, Amazon made many tweaks and improvements to SimpleDB, apparently in response to the concerns of the people trying it out.

Now, with this new beta release (which anybody can sign up for), we can see what the final product will most likely look like. I thought I'd take it for a test drive, and since I'm primarily a Visual Studio developer, my tool of choice was the C#/Visual Studio library. (Amazon offers several official libraries for other platforms, such as Java, Perl, and PHP.)

Fitting In

SimpleDB takes an unusual approach to storage, giving your cloud-based applications a place to store simple data. The approach is similar to that of a spreadsheet and definitely not relational. (It reminds me of Google's BigTable, which is available in a smaller form, called DataStore, through the Google App Engine.) Amazon also has several other storage options should SimpleDB not fit your needs. For example, the company offers Amazon Simple Storage Service (S3) for storing files, and within its main cloud platform, EC2 (Elastic Compute Cloud), you can run any of several database servers, including SQL Server. You're not limited to just SimpleDB.


The idea behind SimpleDB (like that of Google's DataStore) is fast reading. Most Web sites, though not all, need to retrieve information much more quickly than they need to save it. Amazon's own site is an example. People want to browse for books and other products and see the pages come up quickly, and little data is stored along the way beyond their browsing history. When people are, for example, posting a message to a forum on Amazon, the post might take a short but noticeable moment to go through, and people seem to be forgiving of that. In general, most people seem to be happy with fast reads and maybe-not-quite-as-fast writes. That's what SimpleDB (and Google DataStore) offer.

While most databases today are relational (and implement the SQL language), SimpleDB is definitely not relational. Instead of creating data in tables with identical rows, you create sets of data (called domains) that contain data items. Each item can have multiple "attributes," where each attribute is given a name.

This is where things get quite different from a traditional, relational database. The SimpleDB documentation gives a pretty good example. Suppose you create a domain that stores information on products. One product might be an article of clothing and would have attributes such as color and size. Another product might be something altogether different, such as a car engine. That product wouldn't have color and size, but rather something like make, model and year.

In a relational database, this would either require two separate tables or a single table with empty columns for those attributes that do not apply. (For example, size would be left empty for a row containing information about an engine.) But in SimpleDB, size wouldn't even exist for the engine item, and make and model wouldn't exist for a sweater. Yet the engine and the sweater can both be stored in a single domain.

Thus, you could have the following data in a single domain:

Item1: Category=Clothes; SubCategory=Sweater; Name=Cathair Sweater; Color=Siamese; Size=Small, Medium, Large.

Item2: Category=Car Parts; SubCategory=Engine; Name=Turbos; Make=Audi; Model=S4; Year=2000, 2001, 2002.

Notice that the attributes Category, SubCategory and Name are used in both data items, while the other attributes appear in only one item or the other. Notice also that you can store more than one value for an attribute. For Item1, the sweater, the Size attribute has three values: Small, Medium and Large. For Item2, the engine, the Year attribute also has three values: 2000, 2001 and 2002.
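
To make that data model concrete, here is a minimal sketch in Python that mimics a domain in memory. It is not the SimpleDB API and it never talks to Amazon; the function names new_domain, put_attributes and find_items are my own inventions for illustration. The sketch assumes a domain can be pictured as a map from item name to attribute name to a set of string values.

```python
from collections import defaultdict


def new_domain():
    """Create an empty, hypothetical in-memory 'domain'.

    Shape: item name -> attribute name -> set of string values.
    This only imitates the data model; it is not Amazon's API.
    """
    return defaultdict(lambda: defaultdict(set))


def put_attributes(domain, item_name, pairs):
    """Add (attribute, value) pairs to an item.

    Repeating an attribute name adds another value instead of
    overwriting, which is how one attribute holds several values.
    """
    for name, value in pairs:
        domain[item_name][name].add(value)


def find_items(domain, name, value):
    """Return the sorted names of items whose attribute includes the value."""
    return sorted(
        item for item, attrs in domain.items()
        if value in attrs.get(name, ())
    )


products = new_domain()

put_attributes(products, "Item1", [
    ("Category", "Clothes"), ("SubCategory", "Sweater"),
    ("Name", "Cathair Sweater"), ("Color", "Siamese"),
    ("Size", "Small"), ("Size", "Medium"), ("Size", "Large"),
])

put_attributes(products, "Item2", [
    ("Category", "Car Parts"), ("SubCategory", "Engine"),
    ("Name", "Turbos"), ("Make", "Audi"), ("Model", "S4"),
    ("Year", "2000"), ("Year", "2001"), ("Year", "2002"),
])

# The engine has no Size attribute at all; it is absent, not empty.
engine_has_size = "Size" in products["Item2"]

# A lookup matches if any one of an attribute's values matches.
medium_items = find_items(products, "Size", "Medium")
```

In this sketch, engine_has_size comes out False and medium_items contains only Item1. Both items sit in the same structure even though they share just three attribute names, which is the point of the sweater-and-engine example.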