How to Serve Your Users by Going Retro (with Analysis)

When a user complaint comes into the helpdesk at Central DuPage Hospital, the IT staff doesn't start a packet capture; they go back in time. Jack King, IT director at the hospital, explains how switching from conventional, real-time network analysis to retrospective network analysis saves his team significant time and resources.


Imagine the typical VOIP troubleshooting situation. It might go something like this:

1:00 PM - User complains of sporadic VOIP quality issues.

1:30 PM - Tier-one helpdesk runs preliminary diagnostics, but can't pinpoint the issue.

2:00 PM - The trouble ticket is placed into the queue and escalated for further troubleshooting.

3:00 PM - As a tier-three network engineer, you receive the ticket and begin to dissect the problem, only to find that it has stopped for the moment.

How do you proceed?

When Central DuPage Hospital confronted this situation a year ago, we relied on conventional troubleshooting tools and real-time network analysis, which left us with few options and little information for diagnosing the VOIP problem. Our next steps would have been to start a packet capture, talk with users, try to replicate the problem, or wait for it to recur. Whichever course we chose, it was often a waiting game until we had enough information to diagnose the problem, and only then could real troubleshooting begin. That wait might last anywhere from a few minutes to a couple of hours.

Faced with major VOIP and wireless implementations, our IT staff looked for a better way to troubleshoot sporadic issues. To save time and resources, we began using Retrospective Network Analysis (RNA) tools instead of real-time protocol analyzers. RNA tools function as network surveillance cameras, passively capturing every packet, transaction, and connection for later playback and review. Because all network activity is captured, we can go straight to the time period around an event rather than waiting for it to recur.
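To make the surveillance-camera analogy concrete, here is a minimal Python sketch of the idea, assuming a simple in-memory store. The hypothetical `RetroCaptureBuffer` class records every packet, evicts the oldest data once a storage budget is exceeded, and lets you "rewind" to a window around a reported event. Commercial RNA appliances store terabytes on disk and use their own indexing; this sketch does not reflect any vendor's actual API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketRecord:
    """One captured packet: capture time (seconds) plus raw bytes."""
    timestamp: float
    data: bytes


class RetroCaptureBuffer:
    """Hypothetical rolling packet store that keeps the newest packets up to a byte budget."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._packets: deque = deque()

    def record(self, timestamp: float, data: bytes) -> None:
        """Passively store every packet; evict the oldest ones when storage is full."""
        self._packets.append(PacketRecord(timestamp, data))
        self.used_bytes += len(data)
        while self.used_bytes > self.max_bytes and self._packets:
            oldest = self._packets.popleft()
            self.used_bytes -= len(oldest.data)

    def rewind(self, event_time: float, before: float, after: float) -> list:
        """Return packets captured in [event_time - before, event_time + after]."""
        start, end = event_time - before, event_time + after
        return [p for p in self._packets if start <= p.timestamp <= end]


# Usage: ten 200-byte packets into a 1,000-byte store, so only the newest five survive.
buf = RetroCaptureBuffer(max_bytes=1_000)
for t in range(10):
    buf.record(float(t), b"x" * 200)
window = buf.rewind(event_time=7.0, before=1.0, after=1.0)
print([p.timestamp for p in window])  # [6.0, 7.0, 8.0]
```

The byte budget models the finite storage on a real appliance. How far back you can rewind depends on storage capacity divided by traffic volume, which is why packets from t=0 are no longer retrievable in the example.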

The benefits of employing RNA solutions rather than relying on traditional, real-time analysis are significant:

  • Higher network availability
  • Improved ability to conduct business efficiently and effectively
  • Satisfied network users
  • Evidence that validates compliance and supports security investigations, streamlining the enforcement process

There are many RNA devices on the market; Network Instruments and NetScout are two of the larger vendors. In our case, we used Network Instruments' GigaStor to view network problems that occurred days before the complaint reached us. On one occasion, a complaint about a network slowdown reached us through the CIO with little detail: all we knew was that e-mail had slowed down a week earlier. GigaStor allowed us to quickly isolate and resolve routing and retransmission issues with the e-mail server. Without it, we would have spent hours trying to replicate the slowdown.

With terabytes of packet-level data captured and saved, RNA devices can also support compliance and security investigations. Viewing such incidents alongside concurrent network activity gives a fuller picture of what happened, and it limits the investigative work needed to locate and understand the problem.

In contrast with the opening scenario, in which we relied on traditional tools to troubleshoot a VOIP problem, our experience with RNA has followed a more straightforward path to resolution:

1. User reports the problem: Because the RNA tool had already recorded all packet-level data, we only needed the user to tell us when the problem occurred.

2. Select the time period and analyze the problem: We rewound our network to the point in time at which the problem first appeared. Our RNA tool's Expert Analysis helped pinpoint the exact cause, and its VOIP analysis and data stream reconstruction features let us assess call quality and reconstruct the call itself.

3. Resolve the issue: Based on our analysis of the VOIP events, we determined that the call's Quality of Service (QoS) had changed midstream. We then reconfigured the router's QoS settings to correct the problem.
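In our case the RNA tool's own features did the analysis in steps 2 and 3. As an illustration only, the following Python sketch shows two checks a VOIP analyzer might run on a reconstructed call, assuming the packets have already been pulled from the capture into hypothetical `RtpPacket` records. The first check is the interarrival jitter estimator defined in RFC 3550 (a floating-point variant working in seconds). The second is a scan for points where the packets' DSCP marking changes mid-call. This is not GigaStor's implementation, and a DSCP change is only one way QoS treatment can change midstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RtpPacket:
    """Hypothetical per-packet record extracted from a stored capture."""
    arrival: float       # capture time at the probe, in seconds
    rtp_timestamp: int   # RTP header timestamp, in sampling-clock units
    dscp: int            # DiffServ code point from the IP header (46 = EF)


def interarrival_jitter(packets: list, clock_rate: int = 8000) -> float:
    """RFC 3550 (section 6.4.1) interarrival jitter estimate, returned in seconds."""
    jitter = 0.0
    prev_transit = None
    for pkt in packets:
        # Relative transit time: arrival time minus the sender's media timestamp.
        transit = pkt.arrival - pkt.rtp_timestamp / clock_rate
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            # Exponential smoothing with gain 1/16, as RFC 3550 specifies.
            jitter += (d - jitter) / 16
        prev_transit = transit
    return jitter


def dscp_changes(packets: list) -> list:
    """Return (packet index, old DSCP, new DSCP) wherever the marking changes."""
    changes = []
    for i in range(1, len(packets)):
        if packets[i].dscp != packets[i - 1].dscp:
            changes.append((i, packets[i - 1].dscp, packets[i].dscp))
    return changes


# Example call: 20 ms G.711 frames (160 samples at 8 kHz). The third packet arrives
# 10 ms late, and the EF marking (46) drops to best effort (0) partway through.
call = [
    RtpPacket(0.000, 0, 46),
    RtpPacket(0.020, 160, 46),
    RtpPacket(0.050, 320, 46),
    RtpPacket(0.060, 480, 0),
    RtpPacket(0.080, 640, 0),
]
print(f"jitter: {interarrival_jitter(call) * 1000:.3f} ms")
print("DSCP changes:", dscp_changes(call))
```

The 1/16 smoothing gain makes the estimate react gradually, so a single late packet raises jitter only modestly. The DSCP scan points directly at the packet where QoS treatment changed, which is the moment to correlate with router configuration changes.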

As these examples show, RNA lets our staff spend less time recreating problems and more time on proactive planning. In short, reduced downtime plus faster problem resolution equals a rapid return on investment.

Jack King is the IT Director for Central DuPage Hospital in Winfield, Ill. Central DuPage Hospital is an independent healthcare network with over 800 physicians and a staff of 4,000, and operates the second-busiest surgical center in Illinois.