Q: Is it safe for a large commercial Web site to run its servers on a Red Hat clone instead of the real thing?
A: I think so. We're currently running some trials to find out. We have several dozen servers running Apache on Red Hat 7.3, which is an older version of Red Hat that we are using without a support contract. We also have a smaller number of database servers running MySQL on Red Hat Enterprise Linux 3 (RHEL3), for which we maintain support contracts with Red Hat. Recently, however, I've made a policy decision to move our servers to more recent versions of Linux, perhaps RHEL4 or RHEL5 or equivalent. When we saw what it would cost us to do that with Red Hat, we decided to look at less expensive options, in particular CentOS, which is a free binary clone of Red Hat compiled from the publicly available Red Hat source code.
Q: How quickly will you move CentOS into production?
A: We're not going to do it all at once. We're going to start by installing CentOS on just one of the 35 servers our site runs on. This machine will take a little extra work to administer, since we won't be able to boot it from the same image server as all the other machines, which are still running Red Hat 7.3 or RHEL3. We'll also upgrade the software stack on this machine to more recent versions of Apache and Perl. Then we'll see how it does in production. If it goes down, at least we know it can't take the whole site down.
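The reasoning behind starting with a single server can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the load balancers spread requests evenly across all 35 servers, and the function names are hypothetical rather than taken from any real tool. If the trial machine fails, the remaining 34 each pick up a small slice of its traffic.

```python
# Rough blast-radius estimate for the one-server CentOS trial.
# Assumption: the load balancers split traffic evenly across all servers.
# Function names are hypothetical, chosen for illustration.

TOTAL_SERVERS = 35


def share_of_traffic(servers: int) -> float:
    """Fraction of requests a single server handles under an even split."""
    return 1 / servers


def extra_load_if_one_fails(servers: int) -> float:
    """Relative increase in per-server load when one server is removed
    and the survivors absorb its traffic evenly."""
    return servers / (servers - 1) - 1


if __name__ == "__main__":
    print(f"trial server share: {share_of_traffic(TOTAL_SERVERS):.1%}")
    print(f"extra load on the other {TOTAL_SERVERS - 1}: "
          f"{extra_load_if_one_fails(TOTAL_SERVERS):.1%}")
```

Under that assumption the trial machine carries under 3% of requests, and losing it raises the load on each remaining server by roughly 3%.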
Q: Does moving to CentOS mean you will drop all your Red Hat support contracts?
A: No. We plan to keep some of our servers under Red Hat maintenance, just to stay up to date with what they're doing.
Q: How often do you expect to patch your CentOS servers?
A: Until now we've been patching Red Hat 7.3 and RHEL3 on an irregular schedule, usually only when high-priority patches came through. But as part of our new policy we plan to patch both the Red Hat and CentOS servers on a monthly basis.
Q: As the manager of a large commercial Web site serving millions of impressions per day, what keeps you awake at night?
A: Well, it isn't my Web servers or my database servers. It's the Network Appliance filers and the Foundry Networks load balancers. If one of them ever failed in a non-recoverable way, our site would go down for a significant period of time. It's never happened so far, and we do have good support contracts with these vendors to cover that case. For example, we have a four-hour onsite replacement policy with Network Appliance. But just the thought that it might happen one day does keep me awake at night sometimes.
Q: Why aren't you more worried about your database servers? Aren't they mission critical?
A: Actually our database servers rarely go down, and when they do we can bring them back up fairly quickly, usually in 20 minutes to an hour. And remember, we are not a transactional Web site. Mostly we are serving articles that are stored in a directory tree in the file system, not in a database. We use a number of MySQL servers for things like indexing the content to enable search or for visitor registration. These are important functions, but if we lose them for a few minutes or even for an hour it isn't as critical as it might be for some other organizations. By contrast, if our whole site went down for several hours or more because of a non-recoverable problem with the file servers or the load balancers, that would be a much more serious issue for our business.
Q: How often does a large commercial Web site need to replace its servers in order to maintain good availability levels?
A: We serve 60 million HTTP requests per day and our site is an important source of advertising revenue, but you may be surprised to know that most of our 35 Web servers are five years old. Although we are gradually adding more modern hardware, most of these older servers continue to work quite well. The new servers are dual-processor machines based on dual-core Intel Xeons and are obviously far more powerful. But after upgrading to a more recent version of Linux we plan to keep most of the older single-processor servers online, except for a few troublesome machines that we will recycle into other areas of our organization.
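To put those figures in perspective, here is a rough back-of-the-envelope calculation. It is a sketch only: it assumes the 60 million daily requests are spread evenly over 24 hours and across all 35 servers, which real traffic is not, so actual peak rates will be noticeably higher. The function names are hypothetical.

```python
# Average load implied by the quoted figures:
# 60 million HTTP requests per day across 35 Web servers.
# Assumption: traffic is uniform over the day and across servers,
# so these are averages, not peaks.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_requests_per_second(requests_per_day: int) -> float:
    """Site-wide average request rate."""
    return requests_per_day / SECONDS_PER_DAY


def average_per_server(requests_per_day: int, servers: int) -> float:
    """Average request rate per server, assuming an even split."""
    return average_requests_per_second(requests_per_day) / servers


if __name__ == "__main__":
    site_rps = average_requests_per_second(60_000_000)
    server_rps = average_per_server(60_000_000, 35)
    print(f"site average:       {site_rps:.0f} requests/s")
    print(f"per-server average: {server_rps:.1f} requests/s")
```

On those assumptions the site averages roughly 694 requests per second and each server about 20, a fairly modest average load that may help explain why five-year-old single-processor machines can still keep up.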