DynamoDB Integrates With Amazon Elastic MapReduce

By Darryl K. Taft  |  Posted 2012-01-18

Amazon Web Services Launches DynamoDB, a New NoSQL Database Service

Amazon Web Services has again delivered key technology to keep itself ahead of the cloud computing pack with a new high-performance, highly scalable NoSQL database service known as DynamoDB.

AWS quietly keeps delivering new capabilities that help its customers out of jams and continue to confound its competitors. Amazon DynamoDB is a fully managed NoSQL database service that provides extremely fast and predictable performance with seamless scalability, said Adam Selipsky, vice president of marketing, sales, product management and support at AWS.

With a few clicks in the AWS Management Console, customers can launch a new DynamoDB database table, scale up or down their request capacity for the table without downtime or performance degradation, and gain visibility into resource utilization and performance metrics. Amazon DynamoDB enables customers to offload the administrative burdens of operating and scaling distributed databases so they don't have to worry about hardware provisioning, setup and configuration, replication, software patching, partitioning, or cluster scaling. To get started with Amazon DynamoDB, visit www.aws.amazon.com/DynamoDB.

"Scaling a database is as easy for a developer as turning up a dial to add database capacity seamlessly or to remove it by turning the dial down again," Werner Vogels, CTO of Amazon, told eWEEK. "That's it. You tell the service the number of requests it has to handle per second, and it does the rest automatically. So we spread the data across enough hardware to provide consistent performance, which also protects against downtime. Before DynamoDB, this was something developers actually had to manage themselves."
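For readers who want to see what "turning the dial" might look like in code, here is a minimal sketch. It is not from Amazon or the article. It assumes the boto3 Python SDK (which postdates this launch) and its `update_table` call, a hypothetical table named "Orders", and the simplification that one capacity unit corresponds to roughly one request per second for small items.

```python
def throughput_update_request(table_name, reads_per_second, writes_per_second):
    """Build the parameters for a provisioned-throughput change.

    Assumption: one capacity unit is treated as about one request per
    second, which only holds for small items.
    """
    if reads_per_second < 1 or writes_per_second < 1:
        raise ValueError("capacity values must be at least 1")
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_second,
            "WriteCapacityUnits": writes_per_second,
        },
    }


def apply_throughput(request):
    """Send the change to DynamoDB. Needs boto3 plus AWS credentials and a region."""
    import boto3  # imported lazily so the sketch loads without the SDK installed

    client = boto3.client("dynamodb")
    return client.update_table(**request)


# "Orders" is a hypothetical table name used only for illustration.
request = throughput_update_request("Orders", reads_per_second=500, writes_per_second=200)
print(request)
```

Calling `apply_throughput(request)` would submit the change. The sketch keeps that step separate so the request can be inspected without an AWS account.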

Unlike DynamoDB, traditional databases are not designed to scale to the performance needs of modern applications, which can experience explosive growth and cause a single database to quickly reach its capacity limits. Mitigating this by distributing a workload across multiple database servers is complex and requires significant engineering expertise and time investment by application developers. Amazon DynamoDB addresses the problem of scalability by automatically partitioning and repartitioning data as needed to meet the latency and throughput requirements of highly demanding applications. Additionally, Amazon DynamoDB's pay-as-you-go pricing enables customers to "dial in" and pay for only the resources they need.
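The idea of partitioning data to meet a throughput target can be illustrated with a toy example. The sketch below is not DynamoDB's actual algorithm, which the article does not describe. The per-partition capacity figure, the key names and the simple hash-modulo placement are all assumptions chosen for illustration.

```python
import hashlib
import math


def partitions_needed(required_rps, per_partition_rps):
    """Number of partitions needed if each one sustains per_partition_rps."""
    return max(1, math.ceil(required_rps / per_partition_rps))


def partition_for(key, partition_count):
    """Map a key to a partition by hashing it (deterministic across runs)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribute(keys, partition_count):
    """Group keys by the partition each one hashes to."""
    layout = {p: [] for p in range(partition_count)}
    for key in keys:
        layout[partition_for(key, partition_count)].append(key)
    return layout


# Hypothetical workload: 1,000 keys, each partition assumed to handle 1,000 requests/s.
keys = [f"user-{i}" for i in range(1000)]
small = distribute(keys, partitions_needed(1_000, 1_000))    # one partition
large = distribute(keys, partitions_needed(10_000, 1_000))   # ten partitions

for partition, members in sorted(large.items()):
    print(partition, len(members))
```

Hash-modulo placement moves most keys whenever the partition count changes. Production systems generally use techniques such as consistent hashing or range splitting to limit that movement.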

"Amazon has spent more than 15 years tackling the challenges of database scalability, performance and cost-effectiveness using distributed systems and NoSQL technology," Vogels said in a statement. "Amazon DynamoDB is the result of everything we've learned from building large-scale, non-relational databases for Amazon.com and building highly scalable and reliable cloud computing services at AWS.

"Customers can now remove the operational headaches of managing distributed systems and deploy a non-relational database in a matter of minutes. DynamoDB automatically scales to enterprise needs, and is designed for rapid performance no matter the size of the database. Amazon DynamoDB is already in use by many teams and products within Amazon, including the Amazon.com advertising platform, Amazon Cloud Drive, IMDb and Kindle."

Amazon DynamoDB offers low, predictable latencies at any scale, and customers typically enjoy single-digit millisecond latencies for database read and write operations. Amazon DynamoDB stores data on solid-state drives (SSDs) and replicates it synchronously across multiple AWS Availability Zones in an AWS Region to provide built-in high availability and data durability. Businesses can get started with Amazon DynamoDB using a free tier that provides 100MB of storage, and five writes and 10 reads per second (up to 40 million requests per month) free of charge, Selipsky said.
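The "up to 40 million requests per month" figure can be checked with simple arithmetic. The short sketch below assumes only the free-tier rates quoted above, sustained around the clock for a full month.

```python
SECONDS_PER_DAY = 24 * 60 * 60
FREE_WRITES_PER_SECOND = 5
FREE_READS_PER_SECOND = 10


def monthly_requests(days_in_month):
    """Total requests if the free-tier rate is used every second of the month."""
    per_second = FREE_WRITES_PER_SECOND + FREE_READS_PER_SECOND
    return per_second * SECONDS_PER_DAY * days_in_month


for days in (30, 31):
    print(days, monthly_requests(days))
# prints:
# 30 38880000
# 31 40176000
```

At 15 requests per second, a 31-day month works out to roughly 40 million requests, which matches the quoted ceiling.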

Of the new technology, Vogels added: "It's not only about scalability, it's also about performance; it is fast. In the past if database architects and database administrators needed to guarantee the performance of their applications, they needed to buy extremely expensive hardware to be able to scale up or go scale out and do partitioning and things like that, which introduce tremendous complexity. Now, within DynamoDB we've done a lot of innovation to make sure one can make use of ADB at this massive scale to automatically spread data across enough hardware to deliver this consistently fast performance."

Moreover, "Customers should expect single-digit millisecond response times," Vogels told eWEEK. "We are pretty stoked about this one. This is something that our customers have been asking for, for quite a while.

"There are a few big customer groups looking for this: those that already use NoSQL solutions and want a solution that's completely managed and they no longer have to manage the software and the hardware for it. Then there's the group that's coming out of enterprises with data architects that always wanted to start experimenting with or using a NoSQL solution, but just the task of installing software, managing hardware and things like that was too daunting for them. So we take a barrier away for enterprise adoption of NoSQL as well. Then a third big category of customers that have been asking for a solution like this are the ones in the big data area, where they need a very fast key-value store that is able to provide them with very high throughput for their big data applications."

DynamoDB Integrates With Amazon Elastic MapReduce


Selipsky said Amazon DynamoDB also integrates with Amazon Elastic MapReduce (Amazon EMR). Amazon EMR allows businesses to perform complex analytics of their large datasets using a hosted pay-as-you-go Hadoop framework on AWS. With the launch of Amazon DynamoDB, it is easy for customers to use Amazon EMR to analyze datasets stored in DynamoDB and archive the results in Amazon Simple Storage Service (Amazon S3), while keeping the original dataset in DynamoDB intact. Businesses can also use Amazon EMR to access data in multiple stores (for example, Amazon DynamoDB, Amazon RDS and Amazon S3), do complex analysis over this combined dataset and store the results of this work in Amazon S3.

"A lot of what we've been doing at AWS for years has been trying to help developers spend less time with the complex management of infrastructure that is not necessarily differentiating to their businesses," Selipsky said. "Nowhere is that need more pressing than in the area of databases. Databases traditionally involve a lot of complexity and difficulty in scaling workloads, and incurring a lot of costs or involving downtime for applications. So DynamoDB is aimed squarely at removing all of that muck and providing very predictable performance and high scalability, all without requiring any intervention or management from customers. And the customers we've been working with are excited about that."

"Elsevier is a $3 billion enterprise that provides science and health information to more than 30 million scientists, students and medical professionals worldwide," said Darren Person, chief architect of Elsevier, in a statement. "Each year we publish thousands of books, nearly 2,000 journals and more than 250,000 articles, which means our datasets are constantly and rapidly changing. We are always evaluating new technologies that will enable us to handle our large, varying workloads. Operating a distributed data store on our own is orders of magnitude more complicated and expensive to manage than traditional databases. DynamoDB delivers a high-performance service that can be easily scaled up or down to meet our needs, helping us eliminate complexity and lower costs."

"DynamoDB is a truly revolutionary product which allows SmugMug to finally realize its goal of being 100% cloud-based," added Don MacAskill, CEO of SmugMug, in a statement. "I love how DynamoDB enables us to provision our desired throughput, and achieve low latency and seamless scale, even with our constantly growing workloads. Even though we have years of experience with large, complex architectures, we are happy to be finally out of the business of managing it ourselves, and to be using DynamoDB to get even higher performance and stability than we can achieve on our own. Most importantly, DynamoDB allows SmugMug to spend even more time and energy on what really matters: our product and customer experience."

"DynamoDB solves our problem of distributing and storing high-volume writes in a straightforward and cost-effective way," said Rob Storrs, head of engineering at Formspring, in a statement. "Our rapid growth meant that we were spending significant resources managing our own large-scale database systems. DynamoDB gives us low latency and easy scalability, which allows us to keep our costs low and our engineers focused on building what our customers want. It's another example of AWS listening to their customers and building services that solve real problems."

"Prior to Amazon DynamoDB, many of our customers were forced to spend weeks forecasting, planning, and preparing their database deployments to perform well at peak loads," said Raju Gulabani, vice president of Database Services at AWS, in a statement. "DynamoDB makes those processes obsolete. Now businesses can quickly add capacity with a few clicks in the management console. During our private beta, we saw customers successfully scale up from 100s of writes per second to over 100,000 writes per second without having to change a single line of code. This level of elasticity, coupled with consistent performance, reduces the cost and the risk of building a fast-growing application."

As mentioned earlier, Vogels said DynamoDB is the result of 15 years of learning. More specifically, it is related to an internal technology known as Dynamo that the company began writing about seven or eight years ago, Vogels said. DynamoDB is a follow-on to that research with input from some other areas, he said.


