Although many enterprises are reluctant to outsource security, DNS-based services are a natural for the model: Most companies outsource their externally facing DNS servers to begin with, and, with outsourcing, features can be implemented with little disruption to an existing Web infrastructure and turned on and off on the fly.
I recently tested Tzolkin’s TZO-HA high-availability DNS services.
Tzolkin maintains five redundant DNS server sites across North America and one in London, all connected through five different major Internet backbones. In addition, the Tzolkin DNS system is proprietary, and therefore less susceptible to the kinds of attacks that have disrupted some other DNS servers.
Tzolkin has developed a highly redundant and geographically dispersed system to monitor Websites at the protocol level–and not just using a simple ping. Monitoring is performed from multiple locations around the world, reducing the number of false positives caused by simple routing breaks between one provider and another. As a veteran site administrator, I can tell you that this really cuts down on the number of alerts you receive. In my three weeks of testing the Tzolkin system, I was notified only when something actually went wrong (and subsequently when the error condition was corrected).
During my extended testing of the TZO-HA services, I found the browser-based GUI very easy to use, and all configurations worked as expected and in a timely manner. This is an inexpensive and non-disruptive way to build a high-availability or geographic load-balancing safety net based on Tzolkin’s reliable DNS service. In fact, it’s hard to think of a lower-risk way to experiment with high-availability options–no additional hardware is required, and the services will set you back only $99.50 per month to start. (For more details on pricing, go to http://autofailover.com/Order/Index.htm.)
TZO-HA Foundation
The foundation for TZO-HA is the ability to maintain very low DNS cache times. This allows for near real-time traffic redirection. When TZO-HA detects a failure, it automatically updates the DNS record for your domain so that the server requests are sent to the IP address of your alternate server or server cluster.
According to company officials, redirecting server requests takes at most 2.5 minutes and typically about 1 minute. This includes failure detection, DNS record changes and DNS propagation time through other DNS servers. In my testing, I saw failover typically take place within 30 to 90 seconds. Most competitors’ solutions require at least 5 minutes.
One of TZO-HA’s big benefits is geographic load balancing, via TZO-GEO.
Geographic load balancing deployed entirely as a service is–in my opinion, as an old-school Internet data center architect who used to have to use dedicated hardware for this–pretty darn cool.
With N participating servers or IP addresses, when an inbound DNS query reaches the TZO DNS infrastructure, the source IP address of the query is traced and then matched against a database that maps IP addresses to geographic latitude and longitude. Within milliseconds, TZO-GEO calculates which of the participating servers is closest to the source IP address.
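To make the "closest server" step concrete, here is a minimal sketch. It assumes the querying IP address has already been resolved to a latitude and longitude, and it uses the haversine great-circle formula as the distance measure. That formula, the function names and the server coordinates are my own illustrative assumptions; Tzolkin does not document how TZO-GEO actually computes distance.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def closest_server(client, servers):
    """Return the name of the server nearest to the client's (lat, lon)."""
    return min(servers, key=lambda name: haversine_km(*client, *servers[name]))

# Hypothetical participating servers, keyed by name with (lat, lon) values.
SERVERS = {"new-york": (40.71, -74.01), "london": (51.51, -0.13)}

# A query that geolocates to Paris would be answered with the London server.
print(closest_server((48.86, 2.35), SERVERS))
```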
It’s also possible to load balance based on server performance by measuring the monitoring traffic’s round-trip time. Degraded performance–when the round trip for monitoring traffic exceeds a specific threshold–is defined through the management GUI.
This gets interesting because degraded performance can be combined with geographic load balancing using what TZO calls a VDV (Variable Distance Vector). When a server reaches a degraded state, the TZO service artificially “adds distance” to the known location of the server, thereby reducing the amount of traffic being sent to that server. This feature allows for servers that enter a degraded state to still participate in the load-balancing scheme, but in a reduced capacity.
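The "adds distance" idea can be sketched in a few lines. This is only an illustration of the concept: the fixed 2,000 km penalty, the function names and the input format are assumptions of mine, not details of how TZO implements its Variable Distance Vector.

```python
def effective_distance(distance_km, degraded, penalty_km=2000.0):
    """Real distance, plus an artificial penalty if the server is degraded."""
    return distance_km + penalty_km if degraded else distance_km

def pick_server(distances, degraded, penalty_km=2000.0):
    """Choose the server with the smallest effective distance.

    distances: server name -> real distance from the client, in km
    degraded:  set of server names currently in a degraded state
    """
    return min(
        distances,
        key=lambda name: effective_distance(
            distances[name], name in degraded, penalty_km
        ),
    )

# A degraded server loses clients that have a reasonable alternative...
print(pick_server({"a": 500, "b": 1500}, degraded={"a"}))
# ...but keeps clients for whom it is still by far the closest option.
print(pick_server({"a": 100, "b": 5000}, degraded={"a"}))
```

Because the penalty is added rather than the server being removed, a degraded server keeps serving nearby clients while shedding those that sit closer to the boundary between servers.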
To test TZO-HA and TZO-GEO, I installed two slightly different WordPress blogs on two externally facing and publicly addressable IP addresses. The blogs had to be slightly different so I could tell which one served up which request. The way the TZO services work is that they attempt to load a file of your choice (usually a small, read-only text file) residing on your Web server from different locations around the world at configurable intervals. If a server fails to respond to this request or responds slowly, TZO executes a pre-configured load-balancing response.
Thus, the first step in implementing the TZO services is to create and assign the file to be used for monitoring. I used the default choice of autofailover.html and placed this small text file in the Web server’s root directory. The TZO system then attempted to download the text file from each of my test servers. If the download failed or response time was slow, TZO assumed that it was representative of overall Web server performance and began to apply load balancing and failover rules.
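For readers who want to see what such a probe amounts to, here is a rough sketch of a single check: download the monitoring file, time the request and classify the result. The two-second "degraded" threshold, the ten-second timeout and the function names are assumptions for illustration only; the real thresholds are whatever you set in the TZO management GUI.

```python
import time
import urllib.error
import urllib.request

def classify(ok, elapsed_s, degraded_after_s):
    """Map one probe result to 'down', 'degraded' or 'up'."""
    if not ok:
        return "down"
    return "degraded" if elapsed_s > degraded_after_s else "up"

def probe(url, timeout_s=10.0, degraded_after_s=2.0):
    """Fetch the monitoring file once and classify the server's state."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # download the whole file, not just the headers
            ok = resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections and timeouts.
        ok = False
    return classify(ok, time.monotonic() - start, degraded_after_s)

# Example call (example.com is a placeholder host):
# probe("http://www.example.com/autofailover.html")
```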
I configured TZO-HA for a variety of different failover modes for my two servers. (You can also configure it for three servers.) My first test involved Failover-Stay over, where Server 1 handles all requests until it fails, at which point all requests go to Server 2 until Server 2 fails. I also tried Failover-Switch back, a function similar to Failover-Stay over except that it starts to send requests to Server 1 immediately after Server 1 comes back up.
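The difference between the two modes is easiest to see as a small decision function. The sketch below is my own reading of the behavior described above, for two servers only; the mode strings, server names and the choice to leave traffic where it is when both servers are down are assumptions, not TZO's implementation.

```python
def route(mode, up, current):
    """Pick the server that should receive traffic.

    mode:    "stay_over" or "switch_back"
    up:      {"server1": bool, "server2": bool}, current health
    current: the server receiving traffic right now
    """
    other = "server2" if current == "server1" else "server1"
    # Switch back: Server 1 takes over again as soon as it is healthy.
    if mode == "switch_back" and up["server1"]:
        return "server1"
    # Otherwise stay put for as long as the current server is healthy.
    if up[current]:
        return current
    # Current server failed: move to the other one if it is available.
    return other if up[other] else current

# Server 1 has recovered while Server 2 is carrying the traffic:
both_up = {"server1": True, "server2": True}
print(route("stay_over", both_up, "server2"))    # stays on server2
print(route("switch_back", both_up, "server2"))  # returns to server1
```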
There’s also a Successive Server Failover mode for installations with three servers, which I didn’t test.
All of the load-balancing services worked as expected in my tests. If any of this seems less than straightforward, there is adequate context-sensitive help available for each setting.
Down and Out
Under the Monitoring tab are settings for how much time must elapse for a server to be considered down and/or performance degraded. I set the test interval to 30 seconds. I set the Detect Failing and Detect Passing intervals to 20 seconds. I found it best to adjust the settings based on the current load on the servers.
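One plausible way to think about these intervals is as a debounce: the reported state changes only after probes have contradicted it for the configured length of time. The sketch below models that interpretation. The class name and the exact semantics are assumptions on my part, since the GUI does not spell out how the Detect Failing and Detect Passing timers are evaluated.

```python
class StatusTracker:
    """Flip a server's up/down state only after a sustained change."""

    def __init__(self, detect_failing_s=20.0, detect_passing_s=20.0):
        self.detect_failing_s = detect_failing_s
        self.detect_passing_s = detect_passing_s
        self.is_up = True
        self._since = None  # time of the first probe contradicting is_up

    def observe(self, t, probe_ok):
        """Record a probe result at time t (seconds); return the state."""
        if probe_ok == self.is_up:
            self._since = None  # probe agrees with the state: reset timer
            return self.is_up
        if self._since is None:
            self._since = t
        needed = self.detect_failing_s if self.is_up else self.detect_passing_s
        if t - self._since >= needed:
            self.is_up = probe_ok
            self._since = None
        return self.is_up

# With a 30-second test interval, one failed probe is not enough to mark
# the server down; a second consecutive failure is.
tracker = StatusTracker()
print(tracker.observe(0, True), tracker.observe(30, False), tracker.observe(60, False))
```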
I also configured the service to send notifications of site conditions to my e-mail address. I chose to be notified of any site change.
When I unplugged Server 1, I received an e-mail alert in less than 1 minute. (I received an alert in the same timeframe when I plugged Server 1 back in.) Other notification options exist for varying levels of severity, and different notifications can be sent to different e-mail addresses. With the increase in domain theft, it is important to note that any change to DNS records generates a rapid notification.
Matthew D. Sarrel is executive director of Sarrel Group, an IT test lab, editorial services and consulting firm in New York.