A bug in a firmware update for Juniper routers may have been responsible for a widespread outage throughout North America and Europe on Nov. 7. The bug caused a network failure at backbone provider Level 3 Communications, resulting in service disruptions for Time Warner Cable, Research In Motion’s BlackBerry services and various Internet service providers based in the United Kingdom, among others.
“There are confirmed reports that this event within Level 3 has affected most of North America,” hosting provider Phyber Communications wrote in a status report Nov. 7, noting that information was “still limited.”
The culprit appears to be a problem with Juniper routers that corrupted Border Gateway Protocol (BGP) tables, according to various reports on Twitter. The BGP bug was in an update for the Junos 10.2 and 10.3 firmware on Juniper routers, according to posts by several members of the North American Network Operators Group on the group's mailing list.
“This outage has affected other networks running Juniper routers with the majority of them seeing their devices core dump and reload,” wrote Max Clark, managing director of Phyber Communications. A core dump occurs when a device crashes and writes the contents of its memory to disk.
The router crashes would disrupt network operations at any organization that depends on the affected Juniper routers for its Internet connectivity. Level 3 Communications is a Tier 1 network and one of the key Internet pathways in the United States. When routers went down in Level 3 data centers, other organizations downstream were affected.
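The downstream effect described above can be illustrated with a minimal sketch. The topology, node names and `reachable` helper below are hypothetical and are not drawn from Level 3's actual network; the example assumes a deliberately simplified case in which two ISPs reach the wider Internet through a single backbone router. Real networks usually have multiple upstream paths, so the impact of a failure is rarely this absolute.

```python
from collections import deque

# Hypothetical toy topology (illustrative names only): two regional ISPs
# reach the wider Internet through a single backbone router.
LINKS = {
    "isp_a": {"backbone"},
    "isp_b": {"backbone"},
    "backbone": {"isp_a", "isp_b", "internet"},
    "internet": {"backbone"},
}

def reachable(links, start, failed=frozenset()):
    """Return the set of nodes reachable from start, skipping failed nodes."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, ()):
            if neighbor not in failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# With the backbone router up, the ISP can reach the Internet.
print("internet" in reachable(LINKS, "isp_a"))
# With the backbone router crashed, the ISP is cut off.
print("internet" in reachable(LINKS, "isp_a", failed={"backbone"}))
```

In this sketch the first check succeeds and the second fails, because every path from the downstream ISP runs through the failed router.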
“This morning, Juniper learned of a Border Gateway Protocol edge router issue that affected a small percentage of customers,” Mark Bauhaus, Juniper’s executive vice president of services, support and operations, said in a statement. “A software fix is available, and we’ve been working with our customers to immediately deploy the fix.”
Bauhaus did not provide any details, but a document purporting to be a Juniper advisory posted on the text-sharing site Pastebin claimed the issue was in Juniper’s MX router series.
Level 3 posted an update, explaining its outage was due to failing routers, but declined to name a networking vendor or elaborate on the issue.
“Shortly after 9 a.m. ET today, Level 3’s network experienced several outages across North America and Europe relating to some of the routers on our network,” Level 3 said. “Our technicians worked quickly to bring systems back online. At this time, all connection issues have been resolved, and we are working hard with our equipment vendors to determine the exact cause of the outage and ensure all systems are stable.”
Reports of the outage hit Twitter shortly after the routers went down, as ISPs and hosting providers took to the micro-blogging site to reassure users. RIM assured BlackBerry users that this time it wasn’t to blame for the outage, which it described as “a global Internet issue.”
Hosting provider HostVirtual said the failure at its ISP Level 3 seemed to be “related to a rogue route(s) injected into the BGP table or a Juniper bug.”
“We just had a core sump (sic) on all our Juniper running Junos 10.3R2.11,” NeoTelecoms, a French telecommunications company, wrote on Twitter.