Tzolkin's DNS-based site failover and geographic load balancing services are easy to implement, relatively inexpensive and non-disruptive.
Although many enterprises are reluctant to outsource security,
DNS-based services are a natural for the model: Most companies
outsource their externally facing DNS servers to begin with, and, with
outsourcing, features can be implemented with little disruption to an
existing Web infrastructure and turned on and off on the fly.
I recently tested Tzolkin's TZO-HA high-availability DNS services.
Tzolkin maintains five redundant DNS server sites across North
America and one in London, all connected through five different major
Internet backbones. In addition, the Tzolkin DNS system is proprietary,
and therefore less susceptible to the kinds of attacks that have
disrupted some other DNS servers.
Tzolkin has developed a highly redundant and geographically
dispersed system to monitor Websites at the protocol level--and not
just using a simple ping. Monitoring is performed from multiple
locations around the world, reducing the number of false positives from
simple routing breaks between one provider and another. As a veteran
site administrator, I can tell you that this really cuts down on the
number of alerts you receive. In my three weeks of testing the
Tzolkin system, I was notified only when something actually went wrong
(and subsequently when the error condition was corrected).
During my extended testing of the TZO-HA services, I found the
browser-based GUI very easy to use, and all configurations worked as
expected and in a timely manner. This is an inexpensive and
non-disruptive way to build a high-availability or geographic load
balancing safety net based on Tzolkin's reliable DNS service. In
fact, it's hard to think of a lower-risk way to
experiment with high-availability options--no additional hardware is
required, and the services will set you back only $99.50 per month to
start. (For more details on pricing, go to
http://autofailover.com/Order/Index.htm.)
TZO-HA Foundation
The foundation for TZO-HA is the ability to maintain very low DNS
cache times. This allows for near real-time traffic
redirection. When TZO-HA detects a failure, it automatically
updates the DNS record for your domain so that the server requests are
sent to the IP address of your alternate server or server cluster.
Redirecting server requests takes a maximum of 2.5 minutes and
typically about 1 minute, according to company officials. This includes
failure detection, DNS record changes and DNS propagation time through
other DNS servers. In my testing, I saw failover typically take place
within 30 to 90 seconds. Most competitors' solutions require at
least 5 minutes.
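To see how detection, record changes and cache expiry add up, here is a minimal sketch of a worst-case failover budget. The function name and every number in it are illustrative assumptions, not Tzolkin's published settings; they were chosen only so that the total lands on the 2.5-minute ceiling quoted above.

```python
def worst_case_failover_seconds(check_interval, failures_required, update_delay, ttl):
    """Upper bound on how long clients may keep reaching a failed server."""
    # Time to confirm the outage: the required number of failed checks in a row.
    detection = check_interval * failures_required
    # Then the DNS record is changed, and cached answers expire after one TTL.
    return detection + update_delay + ttl


# Hypothetical values, for illustration only.
budget = worst_case_failover_seconds(
    check_interval=30, failures_required=2, update_delay=30, ttl=60
)
print(budget)  # 150 seconds, i.e. 2.5 minutes
```

The point of the arithmetic is that the DNS cache time is one full term in the sum, which is why very low cache times matter so much here.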
One of TZO-HA's big benefits is geographic load balancing, via TZO-GEO.
Geographic load balancing deployed entirely as a service
is--in my opinion, as an old-school Internet data center architect who
used to have to use dedicated hardware for this--pretty darn cool.
With any number of participating servers or IP addresses, when
a DNS query arrives at the TZO DNS infrastructure, the source
IP address of the query is traced, then matched against a database that
maps IP addresses to geographic latitude and longitude. In a
period of milliseconds, TZO-GEO calculates which of the participating
servers is closest to the source IP address.
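The closest-server calculation can be sketched as follows. This is an assumption about the general technique, not TZO-GEO's actual code: it uses the standard haversine great-circle formula, skips the IP-to-location database lookup by taking the client's coordinates directly, and uses made-up documentation IP addresses with approximate coordinates for New York and London.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Hypothetical participating servers: IP address -> (latitude, longitude).
SERVERS = {
    "192.0.2.10": (40.71, -74.01),    # roughly New York
    "198.51.100.20": (51.51, -0.13),  # roughly London
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def closest_server(client_location, servers=SERVERS):
    """Return the IP address of the server nearest to the client."""
    return min(servers, key=lambda ip: haversine_km(client_location, servers[ip]))


print(closest_server((48.86, 2.35)))  # a client near Paris -> 198.51.100.20
```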
It's also possible to load balance based on server performance by
measuring the monitoring traffic's round-trip time. Degraded
performance--when the round trip for monitoring traffic exceeds a
specific threshold--is defined through the management GUI.
This gets interesting because degraded performance can be combined
with geographic load balancing using what TZO calls a VDV (Variable
Distance Vector). When a server reaches a degraded state, the TZO
service artificially "adds distance" to the known location of the
server, thereby reducing the amount of traffic being sent to that
server. This feature allows for servers that enter a degraded state to
still participate in the load-balancing scheme, but in a reduced
capacity.
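A rough sketch of the "added distance" idea follows. The review does not say how TZO sizes the penalty, so the fixed `penalty_km` value, the function names and the IP addresses are all hypothetical.

```python
def effective_distance_km(distance_km, degraded, penalty_km=3000.0):
    """Real distance, plus an artificial penalty if the server is degraded."""
    return distance_km + penalty_km if degraded else distance_km


def pick_server(distances_km, degraded_ips, penalty_km=3000.0):
    """Choose the server with the smallest effective distance to the client."""
    return min(
        distances_km,
        key=lambda ip: effective_distance_km(distances_km[ip], ip in degraded_ips, penalty_km),
    )


# Hypothetical distances from one client to two servers.
distances = {"192.0.2.10": 500.0, "198.51.100.20": 2500.0}
print(pick_server(distances, degraded_ips=set()))           # 192.0.2.10
print(pick_server(distances, degraded_ips={"192.0.2.10"}))  # 198.51.100.20
```

Under these assumptions a degraded server still wins for clients that are very close to it (say 100 km away, with the alternative 5,000 km away), which matches the reduced-capacity behavior described above.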
To test TZO-HA and TZO-GEO, I installed two slightly different
WordPress blogs on two externally facing and publicly addressable IP
addresses. The blogs had to be slightly different so I could tell
which one served up which request. The TZO services work
by attempting to load a file of your choice (usually a small,
read-only text file) residing on your Web server from different
locations around the world at configurable intervals. If a server
fails to respond to this request or responds slowly, TZO executes a
pre-configured load-balancing response.
Thus, the first step in implementing the TZO services is to create
and assign the file to be used for monitoring. I used the default
choice of autofailover.html and placed this small text file in the Web
server's root directory. The TZO system then attempted to download
the text file from each of my test servers. If the download failed or
response time was slow, TZO assumed that it was representative of
overall Web server performance and began to apply load balancing and
failover rules.
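For readers who want to run a similar check themselves, here is a minimal sketch of a single monitoring probe using only the Python standard library. It is not TZO's monitoring code: the thresholds, the three state names and the example hostname are assumptions made for illustration.

```python
import time
import urllib.request


def classify(ok, elapsed_s, degraded_after_s=2.0):
    """Map one probe result to 'failed', 'degraded' or 'healthy'."""
    if not ok:
        return "failed"
    return "degraded" if elapsed_s > degraded_after_s else "healthy"


def probe(url, timeout_s=5.0, degraded_after_s=2.0):
    """Download the monitoring file once and classify the outcome."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
            ok = resp.status == 200
    except OSError:
        # URLError, HTTPError, timeouts and connection errors are OSError subclasses.
        ok = False
    return classify(ok, time.monotonic() - start, degraded_after_s)


# Example call (hypothetical hostname, not executed here):
# probe("http://www.example.com/autofailover.html")
```

A real monitor would repeat this from several locations and at a configurable interval before acting on the result, as the service described above does.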
I configured TZO-HA for a variety of different failover modes for my
two servers. (You can also configure it for three servers.) My
first test involved Failover-Stay over, where Server 1 handles all
requests until it fails, at which point all requests go to Server 2
until Server 2 fails. I also tried Failover-Switch back, a
function similar to Failover-Stay over except that it starts to send
requests to Server 1 again immediately after Server 1 comes back up.
There's also a Successive Server Failover mode for installations with three servers, which I didn't test.
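The difference between the two modes I tested can be sketched as a small decision function. The mode strings and the function name are hypothetical labels for this illustration, not TZO configuration values, and the handling of the case where both servers are down is my own assumption.

```python
def active_server(mode, current, s1_up, s2_up):
    """Return which server (1 or 2) should receive requests."""
    up = {1: s1_up, 2: s2_up}
    other = 2 if current == 1 else 1
    # Switch back: Server 1 is preferred whenever it is up.
    if mode == "switch_back" and s1_up:
        return 1
    # Otherwise keep using the current server for as long as it is up.
    if up[current]:
        return current
    # Current server is down: move to the other one if it is up.
    # If both are down, there is nowhere better to go, so stay put.
    return other if up[other] else current


# Server 1 fails, then recovers, under the stay-over behavior.
current = 1
for s1_up, s2_up in [(True, True), (False, True), (True, True)]:
    current = active_server("stay_over", current, s1_up, s2_up)
print(current)  # 2: traffic stays on Server 2 after Server 1 recovers
```

Running the same sequence with "switch_back" ends on Server 1 instead.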
All of the load-balancing services worked as expected in my
tests. If any of this seems less than straightforward, there is
adequate context-sensitive help available for each setting.
Down and Out
Under the Monitoring tab are settings for how much time must elapse
for a server to be considered down or degraded in performance. I
set the test interval to 30 seconds. I set the Detect Failing and
Detect Passing intervals to 20 seconds. I found it best to adjust the
settings based on the current load on the servers.
I also configured the service to send notifications of site
conditions to my e-mail address. I chose to be notified of any
site change.
When I unplugged Server 1, I received an e-mail alert in less
than 1 minute. (I received an alert in the same timeframe when I
plugged Server 1 back in.) Other notification options exist for varying
levels of severity, and different notifications can be sent to
different e-mail addresses. With the increase in domain theft. it
is important to note that any changes to DNS records generates a rapid
notification.
Matthew D. Sarrel is executive director of Sarrel Group, an IT test lab, editorial services and consulting firm in New York.
Matthew D. Sarrel, CISSP, is a network security, product development, and technical marketing consultant based in New York City. He is also a game reviewer and technical writer. To read his opinions on games, please browse http://games.mattsarrel.com, and for more general information on Matt, please see http://www.mattsarrel.com.