RIM: Software Upgrade Caused BlackBerry Failure

Updated: The company says the shutdown that stopped e-mail service to BlackBerry users resulted from a software upgrade that went awry.

BlackBerry maker Research In Motion announced late April 19 that it has determined the apparent cause of the shutdown that stopped e-mail service to BlackBerry users throughout North America earlier in the week.

According to a statement from the Waterloo, Ontario-based company, the shutdown on April 17 was related to a software upgrade that went awry, followed by a failover process that also didn't work properly.

The BlackBerry blackout happened when the company introduced a new, noncritical system routine into its database, officials said. The routine, according to RIM, was designed to improve cache optimization but instead caused a series of interaction errors between the databases and the cache.

"After isolating the resulting database problem and unsuccessfully attempting to correct it, RIM began its failover process to a backup system," company officials said in a statement. Officials said that the company had repeatedly tested the failover process successfully, but this time something went wrong.

"The failover process did not fully perform to RIM's expectations in this situation and therefore caused further delay in restoring service and processing the resulting message queue," officials said in the statement.

The company's statement goes on to say that its analysis continues and that it has identified certain aspects of its testing, monitoring and recovery process that need to be fixed to prevent this from happening again. "RIM apologizes to customers for inconvenience resulting from the service interruption," company officials said in the statement.

Jack Gold, principal analyst at J. Gold Associates in Northborough, Mass., thinks users shouldn't be too surprised at the outage.

"I can't fault them too much because this happens to everyone. I'm inclined to cut them a break here because they didn't do anything that they thought would adversely affect the systems," he said. "What it sounded like they were doing was putting some code into the system that would make it more efficient."

Gold said that it's clear that RIM's testing didn't work. He said it was also clear that the company needed to make sure it had redundancy that actually worked, along with a reliable failover method.

"They sort of had redundancy. They actually have another NOC running in Europe. The NOC in the UK is set up for Europe. In theory, if the NOC fails, you should be able to flash over to the other NOC. Apparently that didn't work, either."

Gold also said that RIM made a big mistake in how it communicated with users, or more accurately failed to communicate.

"They really need to get better at communicating with their end users so they know what's going on. We can usually deal with it if we know what's going on. For several hours RIM wasn't really very forthcoming. You really need to tell your users what you know and when they plan to do about it," Gold said.

Craig Mathias, principal of Farpoint Group, said that RIM needs to review its procedures. "Mission-critical systems with single points of failure are a problem," Mathias said. "It's very difficult to architect solutions that don't fail, but it can be done. The military does it."

Mathias noted that RIM's isn't the only e-mail system with such problems. "A lot of people use the BlackBerry as their primary e-mail service, and when it's down, it's down," he said. "But that's the problem with e-mail: it's loaded with single points of failure."

Mathias said that RIM needs to design a more reliable solution and that the technologies to solve these problems are available. Mathias added that a key rule of business is, "never disappoint a customer. There are a lot of angry people out there."

RIM will probably be forgiven for this BlackBerry blackout, Mathias said, but added, "If it happens again, that's going to be a problem."

Mathias suggested that any business in which e-mail is a mission-critical application should think twice before depending on a single source.

He also said that there are plenty of e-mail services available that can provide mail to your BlackBerry, but users should always have other ways to receive e-mail if the BlackBerrys have another blackout.

Editor's Note: This story was updated to include information and comments from analysts.

Wayne Rash

Wayne Rash is a freelance writer and editor with a 35-year history covering technology. He's a frequent speaker on business, technology issues and enterprise computing. He covers Washington and...