FAA Issues Post-Mortem Report on Flight-Plan System Failure

 
 
By Chris Preimesberger  |  Posted 2009-11-23

The culprit of the FAA 4-hour flight-plan system failure was eventually determined to be a routing error in the software configuration inside a telecom router link at the FAA's Salt Lake City data distribution hub, pushing the router offline. Here is a detailed point-by-point timeline, supplied to eWEEK by the FAA and the Professional Aviation Safety Specialists union.

The Federal Aviation Administration's national flight-plan filing system went down for 4 hours on the morning of Nov. 19, disrupting the takeoffs of hundreds of commercial flights and throwing hundreds of thousands of travelers off schedule.

The culprit was eventually determined to be a routing error in the software configuration inside a telecom router link at the FAA's Salt Lake City data distribution hub, pushing the router offline.

The faulty router, which for reasons not yet established was not able to default to a backup, also shut down a second major system node in Hampton, Ga., effectively bringing to a halt the inputting of flight plans filed by U.S. commercial pilots. Commercial aircraft cannot take off from a U.S. airport without filing a flight plan.

The glitch forced hundreds of pilots flying that day to enter their plans manually via e-mail or by faxing them into the system, causing widespread flight cancellations and delays.

Here is a detailed point-by-point timeline, supplied to eWEEK by the FAA, telecom system maintainer Harris and the Professional Aviation Safety Specialists union, an affiliate of the AFL-CIO that represents about 11,000 FAA technicians.

  • Harris, maintainer of the FAA telecom network, installed a replacement router during a planned maintenance release (MR).
  • Replacement router contained a route error in its software configuration.
  • This router error caused route translation errors in the logical router "bridge" between the IP ATM network and the OP IP backbone at the FAA's Salt Lake City hub.
  • The problem effectively blocked three-quarters of the IP routes over both networks.
  • Problem was detected within minutes (MR for router replacement started at 5 a.m. EST; problem detected at 5:08 a.m. EST).
  • Isolation of the problem was complex, driven by a number of failures:
    • Another MR at Herndon, Va.
    • CPU utilization alarms did not trigger alerts.
    • Problem looked like a drop of routes in all of the backbone routers; suspected software problem in the routers.
    • CPU utilization on sample routers looked normal, so routing problem was not suspected at first.
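The bullets above describe a failure that CPU-based alarms missed: routes disappeared while CPU utilization on sampled routers looked normal. The following is a minimal sketch, not the FAA's or Harris' monitoring, of a check that also compares a router's current route count against a baseline taken before maintenance. The `RouterSample` type, the router name, the route counts and the thresholds are all hypothetical; only the three-quarters route loss mirrors the figure reported above.

```python
from dataclasses import dataclass


@dataclass
class RouterSample:
    name: str             # hypothetical router label
    baseline_routes: int  # route count recorded before the maintenance window
    current_routes: int   # route count observed now
    cpu_percent: float    # CPU utilization observed now


def route_loss_fraction(sample: RouterSample) -> float:
    """Return the fraction of baseline routes currently missing (0.0 to 1.0)."""
    if sample.baseline_routes <= 0:
        return 0.0
    missing = max(sample.baseline_routes - sample.current_routes, 0)
    return missing / sample.baseline_routes


def cpu_only_alert(sample: RouterSample, max_cpu: float = 90.0) -> bool:
    """Alert on CPU utilization alone (thresholds are illustrative)."""
    return sample.cpu_percent > max_cpu


def should_alert(sample: RouterSample,
                 max_route_loss: float = 0.10,
                 max_cpu: float = 90.0) -> bool:
    """Alert on either a large route loss or high CPU utilization."""
    return (route_loss_fraction(sample) > max_route_loss
            or cpu_only_alert(sample, max_cpu))


# Illustrative numbers: three-quarters of routes gone, CPU looks normal.
sample = RouterSample("backbone-1", baseline_routes=400,
                      current_routes=100, cpu_percent=35.0)

print(route_loss_fraction(sample))  # fraction of routes missing
print(cpu_only_alert(sample))       # CPU check alone stays quiet
print(should_alert(sample))         # combined check fires
```

With these made-up numbers the CPU-only check stays silent while the combined check fires, which is the gap the isolation notes point to.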
Below is the timeline of events:
 

  • 10:00 GMT - Router install begun and "autonomous" route changes begun
  • 10:08 GMT - Start of outage, multiple calls to PNOCC from FAA
  • 10:08 GMT to 13:13 GMT - Isolation underway to determine root cause
  • 13:13 GMT - Router CPU utilization and IP engineering reset
  • 13:13 GMT to 13:59 GMT - FTI field tech en route and IP engineering constantly resetting router
  • Services are up and data is flowing between each reset
  • 13:59 GMT - FTI field tech removes router card. Services restored and returned to service
  • 14:17 GMT - Tier 3 sites (13 sites) report services down
  • 14:38 GMT - All URET services restored
  • 15:08 GMT - ZHN CERAP all services restored
  • 15:30 GMT - ZLC Tier 3 sites (13 sites) services restored
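The timeline is given in GMT, while the earlier summary uses EST. As a quick cross-check, the sketch below converts the listed times to EST (GMT minus five hours) and computes the elapsed time between events. The helper names `gmt`, `as_est` and `elapsed` are illustrative; the date comes from the Nov. 19 outage date and the article's 2009 posting date.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")  # fixed offset, no daylight saving
OUTAGE_DATE = (2009, 11, 19)                # year, month, day of the outage


def gmt(hhmm: str) -> datetime:
    """Parse an 'HH:MM' GMT time from the timeline into an aware datetime."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(*OUTAGE_DATE, hour, minute, tzinfo=timezone.utc)


def as_est(hhmm: str) -> str:
    """Convert an 'HH:MM' GMT time to 'HH:MM' EST."""
    return gmt(hhmm).astimezone(EST).strftime("%H:%M")


def elapsed(start: str, end: str) -> str:
    """Elapsed time between two GMT timeline entries, formatted as 'XhYYm'."""
    total_minutes = int((gmt(end) - gmt(start)).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h{minutes:02d}m"


print(as_est("10:00"), as_est("10:08"))  # install start and outage start in EST
print(elapsed("10:08", "13:59"))         # outage start to router card removal
print(elapsed("10:08", "15:30"))         # outage start to last Tier 3 restoration
```

The 10:00 and 10:08 GMT entries match the 5 a.m. and 5:08 a.m. EST times cited earlier. The span from the start of the outage to the removal of the router card is 3 hours 51 minutes, in line with the roughly 4-hour outage reported, and the last Tier 3 sites came back 5 hours 22 minutes after the outage began.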
 
"This problem could have been mitigated many hours sooner had FAA specialists maintained the system," PASS spokesman Church Siragusa told eWEEK.

"At large facilities, FAA specialists are on duty 24/7. No one understands the intricacies and inter-relationship of NAS systems better than FAA specialists. We are trained to understand this, and we have an intimate knowledge due to our maintenance efforts daily."

Harris Issues a Statement

Harris spokesperson Marc Raimondi told eWEEK that people should keep in mind that weather conditions cause most flight delays, and that the FTI system used by the FAA has a very good performance record. "Five nines, maybe even nine nines, of efficiency," Raimondi said.

Raimondi issued the following statement from Harris: "We're working with the FAA to evaluate the interruption in order to prevent similar outages in the future. FTI has proven to be one of the most reliable and secure communications networks operating within the civilian government. Safety and security are the highest priorities."
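For context on the "five nines" remark, the sketch below works out how much downtime per year each number of nines allows. It assumes that "efficiency" here means availability measured over a 365-day year, which is an interpretation and not something Harris stated; how FTI performance is actually measured is not given in the source. The function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines (e.g. 5 -> 99.999%)."""
    return MINUTES_PER_YEAR * 10 ** -nines


def availability_with_outage(outage_minutes: float) -> float:
    """Availability over one year if the only downtime is a single outage."""
    return 1 - outage_minutes / MINUTES_PER_YEAR


for nines in (3, 4, 5, 9):
    print(nines, "nines:", allowed_downtime_minutes(nines), "minutes per year")

# A 4-hour outage, the figure reported for Nov. 19, is 240 minutes.
print(f"{availability_with_outage(240):.4%}")
```

Under these assumptions, five nines allows a little over 5 minutes of downtime per year and nine nines allows well under a second. A single 240-minute outage on a given service would put that service's availability for the year at roughly 99.95 percent, between three and four nines.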


 
 
 
 
 
 
 
 
 
 
 
