A Border Gateway Protocol update caused several Juniper Networks routers to crash, disrupting service for Tier 1 network provider Level 3 Communications.
A bug in a firmware update for Juniper
routers may have been responsible for a widespread outage throughout North
America and Europe on Nov. 7. The bug caused a network failure at backbone
provider Level 3 Communications, resulting in service disruptions for Time
Warner Cable, Research In Motion's BlackBerry services and various Internet
service providers based in the United Kingdom, among others.
"There are confirmed reports that
this event within Level 3 has affected most of North America," hosting
provider Phyber Communications wrote in a Nov. 7 status report, noting that
information was "still limited."
The culprit appears to be a problem
with Juniper routers that corrupted Border Gateway Protocol (BGP) tables,
according to various reports on Twitter. The BGP bug was in the update for the
Junos 10.2 and 10.3 firmware on Juniper routers, several members of the
North American Network Operators Group posted on the group's mailing list.
"This outage has affected other
networks running Juniper routers with the majority of them seeing their devices
core dump and reload," wrote Max Clark, managing director of Phyber
Communications. A core dump occurs when a device crashes and writes the
contents of its memory to disk.
The router crashes would inevitably
disrupt network operations at organizations that depend on the affected Juniper
routers for Internet connectivity. Level 3 Communications is a Tier 1
network and one of the key Internet pathways in the United States. When routers
went down in Level 3 data centers, other organizations downstream were
affected.
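The downstream effect can be illustrated with a small reachability sketch. The topology below is entirely hypothetical (the names tier1, isp_a, isp_b and other_backbone are placeholders, not real peering data), and it models a network only as a graph of links rather than actual BGP route selection. It shows that a network connected solely through the failed backbone loses all connectivity, while one with a second upstream link keeps part of it.

```python
from collections import deque

# Hypothetical topology: each key is a network, each value lists its neighbours.
# The names are illustrative only and do not describe any real provider's peering.
LINKS = {
    "tier1": ["isp_a", "isp_b", "other_backbone"],
    "isp_a": ["tier1"],                    # single-homed: one upstream only
    "isp_b": ["tier1", "other_backbone"],  # multi-homed: two upstreams
    "other_backbone": ["tier1", "isp_b"],
}


def reachable(links, start, failed=frozenset()):
    """Return the set of networks reachable from start, skipping failed ones."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    # Breadth-first search over the links that are still up.
    while queue:
        node = queue.popleft()
        for neighbour in links.get(node, []):
            if neighbour not in failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


if __name__ == "__main__":
    print(sorted(reachable(LINKS, "isp_a")))
    print(sorted(reachable(LINKS, "isp_a", failed={"tier1"})))
    print(sorted(reachable(LINKS, "isp_b", failed={"tier1"})))
```

In this toy model, the single-homed isp_a can reach only itself once tier1 is marked as failed, whereas isp_b can still reach other_backbone through its second link.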
"This morning, Juniper learned of
a Border Gateway Protocol edge router issue that affected a small percentage of
customers," Mark Bauhaus, Juniper's executive vice president of services,
support and operations, said in a statement. "A software fix is available,
and we've been working with our customers to immediately deploy the fix."
Bauhaus did not provide any details, but a document purporting to be a
Juniper advisory, posted on the text-sharing site Pastebin, claimed the
issue was in Juniper's MX router series.
Level 3 posted an update attributing
its outage to failing routers, but declined to name the networking vendor
or elaborate on the issue.
"Shortly after 9 a.m. ET today,
Level 3's network experienced several outages across North America and Europe
relating to some of the routers on our network," Level 3 said. "Our
technicians worked quickly to bring systems back online. At this time, all
connection issues have been resolved, and we are working hard with our
equipment vendors to determine the exact cause of the outage and ensure all
systems are stable."
Reports of the outage hit Twitter
shortly after the routers went down, as ISPs and hosting providers took to the
micro-blogging site to reassure users. RIM assured BlackBerry users that
this time it wasn't to blame for the outage, but that it was "a global
Internet issue."
Hosting provider HostVirtual said the
failure at its ISP Level 3 seemed to be "related to a rogue route(s)
injected into the BGP table or a Juniper bug."
"We just had a core sump (sic) on
all our Juniper running Junos 10.3R2.11," NeoTelecoms, a French
telecommunications company, wrote on Twitter.