In the aftermath of Hurricane Katrina, eWEEK Labs spoke with members of eWEEK's Corporate Partner Advisory Board to determine if the event had changed the scope of their organizations' disaster recovery planning.
Participating in the roundtable were Kevin Baradet, chief technology officer at the Johnson Graduate School of Management at Cornell University, in Ithaca, N.Y.; Gary Gunnerson, IT architect at Gannett Company Inc., in McLean, Va.; Tom Miller, director of IT for FoxHollow Technologies Inc., in Redwood City, Calif.; Nelson Ramos, enterprise IT strategist, Sutter Health, in Mather, Calif.; Robert Rosen, CIO at the National Institute of Arthritis and Musculoskeletal and Skin Diseases, in Bethesda, Md.; and Francine Siconolfi, senior project manager at Aetna Inc., in Blue Bell, Pa. eWEEK Technology Editor Peter Coffee moderated the discussion.
We wanted to talk with you about some conversations that we suspect may be occurring at your locations—about whether there are any larger lessons to be learned from the Hurricane Katrina situation. Have there been major blind spots about the real nature of continuity of operations following a regional disruption?
Rosen: After 9/11, we raised the issue that it's not just the computer part you have to worry about, it's the people part. But even the consultants we talk to today still primarily ignore that. They're focused on the technology, but if people can't get there, what do you do?
One of the interesting articles I saw was about an organization that got its people to a backup site, but there was no place for them to stay. People were sleeping on the computer room floor because there was nothing in the area available for them. You really have to look at it as if you're migrating the whole operation, and all of the ancillary things that have to do with it, and people just don't do that.
Some of the organizations that are part of your sphere of operations—I'm thinking the Centers for Disease Control and Prevention and the National Institutes of Health, which I guess you talk to quite frequently—probably think more often than most organizations about being in a disrupted environment.
Rosen: We do have some discussion about that. It depends on the amount of backup or readiness they [require].
It was much more obvious when I was at [the U.S. Army Research Laboratory], and we dealt with the [Department of Defense] people. They seemed to do much more—they did a couple of things that people are missing out on. One, they did a very thorough risk assessment, which included looking at all the different possibilities and then figuring out what you had to do if things came to pass. They would do these paper exercises to help them think about [questions such as]: What are all the things we would need? How do we get our people there if there's no transportation? And where do people stay if we do get them there?
Taking New Orleans as an example, even if you got your people to the backup site, are they even going to show up because they're worried about their families? You at least need to think about these things, and I've found that people aren't, even today.
Health care concerns
Nelson, what does disaster recovery for a health care provider mean?
Ramos: I think this hurricane took everyone by surprise because, in terms of disaster recovery, most of the planning is within a couple of days. The other aspect is, you always thought if there was major damage to the facility—in our case, a hospital—you would probably end up transferring the patients and closing down and having the service picked up by another institution. In this case … health care was being provided in situations where the building wasn't functionally fit for extended periods of time.
You might have to deal with the situation of not only resuming business but also providing information to others. In our case, if we're unable to provide medical care, we still have this vast knowledge warehouse of people's medical records. So, while the organization might not be able to provide its primary function, IT may have a responsibility in taking its knowledge bases and sharing them with those who would be able to take advantage of that knowledge. So business resumption may not totally surround your business; it may mean going to another site and giving access to your data to those who can use it in a more effective way.
During the past few years, health care providers have been very leading-edge adopters of wireless technology, of networks, of the move from physical to digital imaging and so on. Do you have any sense at all if this has made people think that they've been too quick to replace somewhat more fault-tolerant and gracefully degrading systems with electronic systems? And that there needs to be more thought given to questions such as: What's your operational scenario when you don't have electricity, when you don't have connectivity and so on?
Ramos: I think the digitization of the data is something that will continue because we're moving more toward electronic health records, the ability to move information among multiple campuses.
You certainly have a lot of pressures forcing you to move in that direction.
Ramos: Correct. We used to keep stacks of paper that were 3 to 4 feet high, in case a system went down. Now, with fault-tolerant systems, those huge paper backup systems that were used for short-term downtime scenarios are no longer there. So I think maybe [Katrina] will prompt us to rethink what we do and how we would deal with a medium-term outage.
An education in disaster
Kevin, there has been a lot of interesting discussion about the response of higher education to the Katrina disaster, in terms of the speed and flexibility that schools have shown in developing arrangements for the students of affected schools. Have you been privy to any discussions about how to respond to such a situation?
Baradet: I just saw eight students from Tulane University meander down the hall past my office. I think we've taken in a couple of faculty and Ph.D. students as well.
Has there been any impact on your network as of this time, in terms of new people showing up and having to be rapidly assimilated?
Baradet: We're used to 300 or 400 people showing up within a day or two at the beginning of the semester and having to run them through the system. So eight is nothing.
Over two weeks ago, there were some guys running a network operations center in New Orleans who stayed through it all. They stayed up and stayed on and had a blog and a Web cam and an IRC [Internet Relay Chat] channel open. I watched that for a good part of the weekend.
It was pretty interesting. They were fairly well-prepared, but there were a whole bunch of things that they really didn't think about. They had a huge generator, but [it] was on the ninth floor of the parking structure, and they had to roll 55-gallon drums of diesel fuel through the parking ramp to fuel the generator.
The other thing they didn't think about was a place to sleep, so they were sleeping on server boxes. After a couple of days, they realized they didn't have any clean clothes, and they were also wishing for some chemical toilets. Food and water wasn't the problem—it was the other things.
No news is bad
Gary, how did Katrina affect your publishing operations?
Gunnerson: We do have a newspaper in Hattiesburg, Miss., that's had to suffer through all this stuff.
Gannett takes an interesting view any time we have an emergency. And we have them all the time—the hurricane is just an example of one. We have enough markets out there that we usually have one or two a year. We have a pretty good response, and that includes people. We basically ship people from all over the country to the location that needs them most.
Hattiesburg has a generator; it was mostly designed to run their press, so it wasn't quite up to everything we asked it to do. As a company, we still consider the employees to be most important and tell them, “Go handle your own affairs because that's what's most important to you right now.”
Is that the strategy—to bring people who aren't personally affected by the disaster in to take care of at least skeleton functions so local folks can worry about homes and families?
Gunnerson: That's part of it. The other part is that we have a printing press one state over and trucks that we can bring in and that kind of thing. … But do we plan for extended outages? Not usually. I don't think anyone plans for their generator to be their primary source of power.
As soon as you start talking about weeks, it's qualitatively a different thing, isn't it?
Gunnerson: [Before] 9/11, no one ever asked us to completely back everything up out of the geographic area. After 9/11 was the first time people asked us to start thinking about that. That's not a one-year or a two-year project; that's an eight-year or 10-year project. We've made strides to go that way. We're not completely where we need to be yet, but we've certainly made a lot of progress.
Do you feel that the decision to do that was a sticky decision—did people continue to be committed to the idea of doing it three and four years after 9/11?
Gunnerson: I think so. What we're seeing, though, is that once you say you need to be outside of your geographic area, you start looking at more centralized approaches to some of your infrastructure, or the ability to move some things fairly quickly using the different technologies that we have out there. We're playing around right now with some of the VMware [Inc.] stuff that lets you move a computer system to another physical location. It's pretty neat.
Because you essentially snapshot the system and then move the snapshot?
Gunnerson: Right. You still have the issue with people, though. … In some cases, we can run the systems from our homes, but in the case of a disaster like we've had with [Katrina], you don't even have homes to go to anymore. You pretty much have to go out of state, and that's a pretty tough thing to plan for.
I don't really know how you plan for that sort of thing. It's just something that's going to happen, and you do your best after the fact to recover from it.
Francine, I would imagine that Aetna has been a little busy lately.
Siconolfi: I think Aetna's been through it enough times—there are a lot of dynamic triggers in the tables that drive the software systems. Some of the things they were doing in our business applications—allowing people to go outside the network for their physician and their medical care, not needing their referrals, things like letting the system refill prescriptions even if they're not due to be refilled for another 30 days, getting treatment without precertifications and that sort of thing. We removed whatever barriers we could to give people access to the health care they needed.
The systems are capable of turning certain things on and off dynamically. It's taken many years to get it to that point—after, unfortunately, other natural disasters and things in the past—but the folks who maintain all those tables in the systems did what they had to do to allow things to flow through.
It's interesting to hear you go through the number of points at which you'd have to have that kind of dynamic capability. It's making me realize how much I've been taking it for granted that various service providers have demonstrated that kind of flexibility. Trying to do that kind of thing could have put you back in the paper age pretty quickly if you hadn't anticipated it and incorporated it into the architecture.
Siconolfi: There were certainly glitches along the way—especially when it came to delivering things on paper, things by mail, that kind of stuff—but we got through it, and it will continue to be an issue for many months.
Has there been any discussion in your halls about how long you expect to be dealing with people who aren't at their regular address, people who have no fixed address and various other speed bumps?
Siconolfi: I started at Aetna just in time to see how the company reacted to 9/11, and I've never seen any time constraints put on something of this magnitude.
Dispersing digital records
Tom, your company develops medical device technology. Now that you've had a chance to see what a substantial regional-scale event can do, has there been any kind of reanalysis at FoxHollow Technologies of what it means to be prepared for an interruption of operations?
Miller: I think we've really reflected on what the impact could be in some places, like Northern California, where we are. We truly understand that any disruption is probably going to be regional in scope and that any planning we have to do has to consider not only the impact on the entire Bay area but probably the state of California.
We've looked at things like business-resumption services through companies like SunGard, with its SunGard Availability Services, but the nearest recovery center, in San Ramon, could be impacted by a regional disaster. So, we're realizing that we need to push outside the Bay area and even potentially outside California to reflect that any impact is going to be really large for us.
Coming back to the people element, we realize the individual's obligations are to their families first and to their business second.
Now, correct me if I'm wrong, but you're in an operation where a substantial amount of your assets are digital records—clinical trial data and so on.
Miller: Yes, that's correct.
Do you have that information in a nice salt mine in Kansas, or is site diversity for backup storage a major priority because of the considerations we've just been discussing?
Miller: We will be looking at diverse sources for storage of information, both electronic and hard copy.
Would it be accurate to say that the need for a very-big-picture view of that has been intensified by this episode?
Miller: Yes, it has. For us, it really comes down to: What are we going to do from a risk management standpoint? And how can we make sure that we involve the business in its entirety? As with any business continuity plan, we want to make sure that it doesn't look just at the IT side of the house and that we're looking at how we can operate entirely as a business.
That's actually one of the threads that's been coming back to me from some of the Y2K-readiness discussions that we had. People were saying that the main byproduct was a much wider perspective on the number of other people's systems that have to be working for there to be any point in having systems up at all.
Have any of you had any epiphanies as part of any discussions in which people have said, “You know, our own boundaries cannot be the point at which we stop talking about readiness”?
Siconolfi: There are service-level agreements, and there are federal regulations for how long you can take to pay a claim, for example. And the way the electronic gateways work, the EDI [electronic data interchange] tools, well, a lot of those things are programmed to meet certain SLAs. Even from systems processing a claim or a membership or whatever it might be. So, yes, we're very dependent on third parties and other partners.
So not only do you have to be prepared to relax certain thresholds and guarantees, but you also have to make sure that others will cooperate with you.
Siconolfi: Yes. You can pay fines for not paying a claim within a certain period of time.
Is there anything that is being looked at for immediate implementation, in response to the hurricane and its aftermath?
Rosen: One of the things we're looking at is more automated tools to help the systems at the backup site run with less attention so we don't have to get the people there. I think that may be a promising approach to take to address some of these problems.
I guess a lot of it is, What is the maximum credible event? I don't think anyone has an operation scenario that says, “All right, how do we function for one week after a nuclear attack on the city?” But was 20 feet of water in New Orleans a scenario that people should have been prepared for? What's the worst we can even plan for?
Gunnerson: We try to plan for that stuff, and, on a systems side, you try to make things as resilient as possible—doing your off-site backups, the whole bit.
But I think, when it comes down to it, you really have to have a plan in place and know that you can execute it. And you have to try it out a couple of times. Once you've done that, it's just a matter of, Are you prepared to respond? Because you'll never be able to run your rule book, right?
I don't think you could ever plan well enough to respond appropriately to every scenario. So what you really have to do is have a group that's prepared to respond and then adjust your response based upon what your issues are. That's the hardest part—to make sure you're ready to respond.