Research In Motion on Feb. 12 blamed a recent upgrade to an internal data routing system for the three-hour service outage that hit U.S. BlackBerry users the previous day.
RIM said in a statement, “The upgrade was part of RIM’s routine and ongoing efforts to increase overall capacity for longer-term growth. RIM continuously increases the capacity of its infrastructure in advance of longer-term demand.”
The statement also said similar upgrades had been implemented in the past with no service interruptions for users. RIM stressed that its analysis was only preliminary and that it was continuing the diagnosis. According to the company, a full analysis of the service outage will take several more days.
“RIM continues to focus on providing industry-leading reliability in its products and services and continues to invest in its infrastructure and processes,” the company said in its Feb. 12 statement. “Once again, RIM apologizes to its customers for any inconvenience.”
The service outage began Feb. 11 at approximately 3:30 p.m. ET and lasted until about 6:30 p.m. RIM said only e-mail was affected by the outage and that no SMS (Short Message Service) or voice services were disrupted.
“No messages were lost and message queues began to be cleared after normal service levels were restored,” the statement said. “RIM has made significant investments to improve its system recovery infrastructure and processes over the last year, which enabled service levels to return to normal quickly.”
In April 2007, RIM also suffered a large service outage. The company blamed that outage on the failure of a minor software upgrade to a caching subsystem, a subsequent breakdown of the failover system and the overloading of a second system.
“It shouldn’t have happened, and it won’t happen again,” RIM Co-CEO Jim Balsillie told eWEEK after the April outage. “It wasn’t a corruption of any form of the infrastructure, and that’s very important. It shouldn’t have happened, and it won’t happen again.”
Explaining that the problem behind the April blackout was totally avoidable, Balsillie said the company was broadening, strengthening and “fault-tolerating” the system. “It’s a global and public safety imperative,” he said, adding that there were no constraints on budget or resources for this work.