Many of the Web’s busiest and most popular Websites, including Facebook and Twitter, have one thing in common that enables the sites to scale out to millions of users without a significant impact on performance: Memcached.
First introduced by Danga Interactive almost six years ago to scale out LiveJournal, Memcached is easily among the most popular open-source software projects in use today. Its adoption has been organic and widespread among so-called LAMP, or Linux/Apache/MySQL/Perl/PHP/Python, sites.
In April at the 2009 MySQL Conference, not one but three companies launched commercial Memcached offerings. Only one of them is taking a page out of the commercial open-source playbook. Similar to how Red Hat packaged a Linux distribution, Gear6 has packaged a Memcached distribution. The company recently hired Mark Atwood as its new Director of Community Development.
Before joining Gear6, Atwood was a senior technology advisor at Sun Microsystems, working on Sun’s cloud computing strategy. Atwood also is an active contributor to the libmemcached and Drizzle projects. Atwood talked with eWEEK senior editor Darryl K. Taft about Memcached and why there is so much interest in it now.
Q: What’s changed? Memcached has been around for six years, so why is it suddenly showing up everywhere?
A: The Web is more dynamic than ever. There are more people online doing social networking, communicating, gaming. The LAMP stack is starting to show its age for highly scalable applications, so a new Web-scale architecture is emerging, with a bunch of technologies coming together, including network caching, simple databases, key-value stores, queuing, job servers, and dynamic allocation of processing. Although no one’s sure where it’s going just yet, Memcached is going to be an important part of this.
Q: How is Memcached used?
A: The vast majority of top Websites use Memcached already. Databases can contain huge amounts of data, but for really high-performance Websites, they’re just too slow. Most of the data that you look up in a database, you’re going to be looking up over and over again. When a page gets rendered, the same data will be shown many times, and it will be shown again as you render similar pages. As you browse through a site, most of the information displayed is not going to change from moment to moment, so it’s wasteful for the application server to look all this stuff up in the database again and again.
One solution would be for an application server to keep a local cache, so it doesn’t have to look up data so often. But really large sites run multiple instances of their application servers, so they need a cache to share between them. And Memcached is the one that fits that problem most cleanly. It’s a very simple protocol that was designed to run on very affordable hardware. There’s not a lot of complexity or learning curve in writing to its API, so it really took off in this space.
There are many other caching solutions, but they are all some combination of complicated and expensive to use. Memcached hit a sweet spot by being simple and free.
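To make that cache-aside pattern concrete, here is a minimal sketch in Python using the pymemcache client. The server address, key naming and the fetch_user_from_db stub are illustrative assumptions, not anything Atwood describes.

```python
# Cache-aside sketch: check the shared Memcached pool first, fall back to the
# database on a miss, then populate the cache so later renders can reuse it.
# Assumes a local memcached on port 11211; fetch_user_from_db is a stand-in.
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def fetch_user_from_db(user_id):
    # Placeholder for a real database query; returns a serialized row.
    return b"row-for-user-%d" % user_id

def get_user(user_id):
    key = "user:%d" % user_id
    cached = cache.get(key)
    if cached is not None:
        return cached                     # cache hit: no database round trip
    row = fetch_user_from_db(user_id)     # cache miss: query the database once
    cache.set(key, row, expire=300)       # keep it for five minutes
    return row

print(get_user(42))
```

Because the cache is a separate server rather than in-process memory, every instance of the application server shares the same pool of cached data.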
Q: What attracted you to Gear6 and what will you be doing in your new role?
A: I am going to be the face of Gear6 to the open-source communities that are part of this new Web-scale architecture. These include Memcached, Gearman, Drizzle and libmemcached. Gear6 has a clear vision of how it wants to be positioned in this new Web stack, and I want to be part of it. It’s also exciting to me that commercial companies are beginning to form around Memcached. I am a big believer in open source and think I can help augment Gear6’s contributions to open source. It will be good for the company and for the open-source community.
Q: What is the most interesting thing about Memcached today?
A: Its amazing simplicity. It’s a very simple key-value, or KV, store. KV stores are becoming very big, and Memcached was there before people realized just how important they would become.
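That simplicity is visible on the wire: the entire text protocol can be exercised with a raw socket. The sketch below is illustrative, assuming a memcached instance listening on the default port 11211; the key and value are made up.

```python
# Speaking the Memcached text protocol directly over a socket.
# "set <key> <flags> <exptime> <bytes>" followed by the payload, then "get <key>".
import socket

conn = socket.create_connection(("127.0.0.1", 11211))
conn.sendall(b"set greeting 0 300 5\r\nhello\r\n")
print(conn.recv(1024))   # expected reply: b'STORED\r\n'
conn.sendall(b"get greeting\r\n")
print(conn.recv(1024))   # expected reply: b'VALUE greeting 0 5\r\nhello\r\nEND\r\n'
conn.close()
```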
The other interesting thing about Memcached is really how inefficient it is. It was written so simply and so quickly, and it solved a need so well, that no one realized how wasteful it was of the memory on the machine it was running on. One of the things Gear6 brings to the game, at the cost of a great deal more thought and some careful engineering, is more efficient memory use. The Gear6 Memcached distribution is Memcached with the same speed and the same API but a much smaller memory footprint.
Q: You also contribute to Drizzle and libmemcached. How do they fit with Memcached?
A: Libmemcached is a client library for Memcached written mainly by Brian Aker with input from many other people, including myself. It was written when Brian discovered that the existing C library binding for Memcached clients was slow and buggy, so he felt the need to write a much faster, more efficient one, and now it’s becoming the basis for language bindings for many other languages. Python, Ruby and at least one of the widely distributed Perl bindings use libmemcached.
Drizzle is a fork of MySQL 6.0. It was done with the idea of being a re-architecting and an opportunity to revisit some decisions, and maybe correct some mistakes. But instead of focusing on enterprise use of a database, that is, competing with Oracle, Drizzle is designed to run behind Web and application servers. It’s designed to run on rack-mounted machines with many processor cores, serving the kinds of queries that get asked by application servers building Web pages. Drizzle is going to start showing up on high-performance Websites because that’s what it was designed to do, and Memcached shows up in these same deployments because it enables high-performance Websites.
Q: Does this high performance extend to cloud applications?
A: Cloud providers are either already running a fair amount of Memcached under the hood for their own uses, or they should be. I hope that they will soon be exposing a Memcached service to their users. Multi-tenancy support will be very important for that, allowing multiple people to use the same actual cache servers without interfering with each other. Gear6 has done some work on multi-tenancy support, and I know that several developers in the open-source community have been doing some work on this as well. My hope is that we can get everything aligned so that we have only one implementation to manage.
Some people are resisting moving into the cloud because they can’t get high-performance Memcached in their environment. They get medium-performance Memcached by running the open-source server on EC2 [Amazon Elastic Compute Cloud] nodes. Having a native Memcached in AWS [Amazon Web Services], Rackspace, Network.com or any of the other services is something that I would like to see happen.
Q: What is your best advice about Memcached to LAMP developers?
A: Design your applications to scale out. And design your applications to scale out. Don’t assume that you’re just going to get a bigger and bigger machine. Start with the assumption that you’re going to be running multiple instances of your application server talking to multiple shards or copies of your database and, of course, cache as much as you can in Memcached.
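As a rough illustration of that advice, the sketch below spreads cache keys across several Memcached instances with simple client-side hashing. The host names are hypothetical, and production clients such as libmemcached typically use consistent hashing instead, so that adding a node remaps only a small fraction of the keys.

```python
# Scale-out sketch: many application server processes can share the same pool
# of Memcached instances by hashing each key to one server. Hosts are illustrative.
import hashlib
from pymemcache.client.base import Client

SERVERS = [
    ("cache1.example.com", 11211),
    ("cache2.example.com", 11211),
    ("cache3.example.com", 11211),
]
clients = [Client(addr) for addr in SERVERS]

def client_for(key):
    # Naive modulo hashing keeps the example short; consistent hashing is the
    # usual choice when cache servers may be added or removed.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return clients[int(digest, 16) % len(clients)]

client_for("user:42").set("user:42", b"cached row", expire=300)
print(client_for("user:42").get("user:42"))
```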