Many of the Web's busiest and most popular Websites, including Facebook and Twitter, have one thing in common that enables the sites to scale out to millions of users without a significant impact on performance: Memcached. Gear6 hopes its distribution of Memcached takes off for enterprise developers. Mark Atwood, Gear6's new Director of Community Development, talks to eWEEK about capitalizing on the open source technology.
First introduced by Danga Interactive almost
six years ago to scale out LiveJournal,
Memcached is easily among the most popular open-source software
projects in use today. And its use has been organic and widespread for
so-called LAMP, or Linux/Apache/MySQL/Perl/PHP/Python sites.
In April at the 2009 MySQL Conference, not one but three companies
launched commercial Memcached offerings.
Only one of them is taking a page out of the commercial open-source
playbook. Similar to how Red Hat packaged a Linux distribution,
Gear6 has packaged a Memcached distribution. The company recently hired Mark Atwood as its new Director of Community Development.
Before joining Gear6, Atwood was a senior technology advisor at Sun
Microsystems, working on Sun's cloud computing strategy. Atwood also is
an active contributor to the
libmemcached and
Drizzle projects. Atwood talked with eWEEK senior editor Darryl K. Taft about Memcached and why there is so much interest in it now.
Q: What's changed? Memcached has been around for six years, and suddenly it's showing up everywhere now?
A: The Web is more dynamic than ever. There are more people
online doing social networking, communicating, gaming. The LAMP stack
is starting to show its age for highly scalable applications, so a new
Web-scale architecture is emerging, with a bunch of technologies coming
together, including network caching, simple databases, key-value
stores, queuing, job servers, and dynamic allocation of processing.
Although no one's sure where it's going just yet, Memcached is going to
be an important part of this.
Q: How is Memcached used?
A: The vast majority of top Websites use Memcached already.
Databases can contain huge amounts of data, but for really
high-performance Websites, they're just too slow. Most of the data that
you
look up in a database, you're going to be looking up over and over
again. When a page gets rendered, the same data will be shown many
times as you render similar pages over and over. As you browse through
a site, most of the information displayed is not going to change from
moment to moment, so it's wasteful for the application server to look
all this stuff up in the database repeatedly.
One solution would be for an application server to keep a local
cache, so it doesn't have to look up data so often. But really large
sites run multiple instances of their application servers, so they need
a cache to share between them. And Memcached is the one that fits that
problem most cleanly. It's a very simple protocol that was designed to
run on very affordable hardware. There's not a lot of complexity or
learning curve in writing to its API, so it really took off in this
space.
There are many other caching solutions, but they're all some combination
of complicated and very expensive to use. Memcached hit a sweet spot
by being simple and free.
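The lookup pattern Atwood describes, in which an application server checks a shared cache before going to the database, can be sketched roughly as follows. This is an illustration only: `FakeCache` and `FakeDatabase` are hypothetical in-process stand-ins, not a real Memcached client or database driver, and the `user:` key format and 60-second expiry are assumptions.

```python
import time


class FakeCache:
    """Hypothetical in-process stand-in for a shared Memcached server."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # Treat expired entries as misses, as a real cache would.
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl else None
        self._store[key] = (value, expires_at)


class FakeDatabase:
    """Hypothetical stand-in for a slow database; counts its queries."""

    def __init__(self):
        self.queries = 0

    def load_user(self, user_id):
        self.queries += 1
        return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, db, user_id, ttl=60):
    """Check the cache first; on a miss, query the database and store the result."""
    key = f"user:{user_id}"  # assumed key format
    value = cache.get(key)
    if value is None:
        value = db.load_user(user_id)
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    cache, db = FakeCache(), FakeDatabase()
    get_user(cache, db, 42)
    get_user(cache, db, 42)  # served from the cache
    print(db.queries)        # only the first call reached the database
```

In a real deployment the cache object would be a client talking to one or more Memcached servers over the network, so that every application server instance sees the same cached data.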
Q: What attracted you to Gear6 and what will you be doing in your new role?
A: I am going to be the face of Gear6 to the open-source
communities that are part of this new Web-scale architecture. These
include Memcached,
Gearman,
Drizzle and libmemcached. Gear6 has a clear vision of how it wants to
be positioned in this new Web stack, and I want to be part of it. It's
also exciting to me that commercial companies are beginning to form
around Memcached. I am a big believer in open source and think I can
help augment Gear6's contributions to open source. It will be good for
the company and for the open-source community.
Q: What is the most interesting thing about Memcached today?
A: Its amazing simplicity. It's a very simple key-value, or
KV, store. KV stores are becoming very big and Memcached was there
before people realized just how important they would become.
The other interesting thing about Memcached is really how
inefficient it is. It was written so simply and so quickly and it
solved a need so well that no one realized how wasteful it was of the
memory on the machine it was running on. One of the things Gear6
brings to the game - at the cost of a great deal more thought and some
careful engineering - is more efficient memory use. The Gear6 Memcached
distribution is Memcached at the same speed and same API but a much
smaller memory footprint.
Q: You also contribute to Drizzle and libmemcached. How do they fit with Memcached?
A: Libmemcached is a client library for Memcached mainly written by
Brian Aker
with input from many other people, including myself. It was written when
Brian discovered that the existing C library binding for Memcached
clients was slow and buggy, and so he felt the need to write a much
faster, more efficient one, and now that's becoming the basis for
language bindings for many other languages. Python, Ruby and at least
one of the well-distributed Perl bindings use libmemcached.
Drizzle is a fork of MySQL 6.0. It was done with the idea of
re-architecting and as an opportunity to revisit some decisions, maybe
correct some mistakes. But instead of focusing on enterprise use of a
database, that is, competing with Oracle, Drizzle is designed to run
behind web and application servers. It's designed to run on
rack-mounted machines with many processor cores serving the kinds of
queries that get asked by application servers building web pages.
Drizzle is going to start showing up on high performance websites
because that's what it was designed to do, and Memcached shows up on
these same implementations because it enables high performance websites.
Q: Does this high performance extend to cloud applications?
A: Cloud providers are either running a fair amount of
Memcached or should be running a fair amount of Memcached under the
hood for their own uses. I hope that they will soon be exposing a
Memcached service to their users. Multi-tenancy support will be very
important for that, allowing multiple people to use the same actual
cache servers without interfering with each other. Gear6 has done some
work on multi-tenancy support and I know that several of the developers
in the open source community have been doing some work on this. My hope
is that we can get everything aligned so that we have only one
implementation to manage.
Some people are resisting moving into the cloud because they can't
get high performance Memcached in their environment. They get medium
performance Memcached by running the open source server on EC2 [Amazon
Elastic Compute Cloud] nodes. Having a native Memcached in AWS [Amazon
Web Services], Rackspace, Network.com or any of the other services is
something that I would like to see happen.
Q: What is your best advice about Memcached to LAMP developers?
A: Design your applications to scale out. Don't assume that you're just going to get a
bigger and bigger machine. Start with the assumption that you're going
to be running multiple instances of your application server talking to
multiple shards or copies of your database and, of course, cache as
much as you can in Memcached.
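To make the scale-out advice concrete: Memcached servers do not coordinate with one another, so when several are in use it is the client that decides which server holds which key. The following is a minimal sketch of that idea, assuming simple modulo hashing. The host names are hypothetical, `pick_server` is an illustrative helper rather than part of any client library, and real clients may use other schemes such as consistent hashing.

```python
import zlib


def pick_server(key, servers):
    """Map a key to one server with a simple hash-modulo scheme (illustrative only)."""
    if not servers:
        raise ValueError("at least one cache server is required")
    # crc32 is stable across runs, so every application server
    # computes the same server for the same key.
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    return servers[index]


if __name__ == "__main__":
    # Hypothetical hosts; 11211 is Memcached's default port.
    servers = ["cache1:11211", "cache2:11211", "cache3:11211"]
    for key in ("user:1", "user:2", "page:home"):
        print(key, "->", pick_server(key, servers))
```

One known trade-off of the modulo scheme is that adding or removing a server changes the mapping for most keys, which is why consistent hashing is often preferred when the server list changes.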