A quartet of glitches in IBM's DB2 database software tripped up Denmark's largest bank in mid-March, Danske Bank Group reported in a release [pdf] sent out on Thursday.
According to the release, the outage had “significant” consequences for payments and for the trading and settlement of currencies and securities. The outage began March 10 during a routine operation to replace a defective electrical unit in an IBM RVA (Ramac Virtual Array) disk storage system. The RVA disk and the data stored on it became inaccessible after the breakdown, crippling the systems running at an operating center in Ejby: currency and securities trading, foreign systems and payments. Automatic teller machines running at the bank's operating center in Brabrand were unaffected.
According to the bank, the Ejby systems were restarted once the problematic RVA unit was given an all-clear sign, about six hours after the initial failure. But on Tuesday, March 11, it became clear that batch runs were executing incorrectly.
The problem lay in a series of errors in DB2 that Danske Bank and IBM officials said were previously unknown but had existed in all similar installations since 1997.
Over the course of the recovery, which was completed on Friday, March 14, three more errors revealed themselves. The second error involved an inability to start the recovery process on several DB2 tables. The third DB2 error prevented recovery jobs from running simultaneously. Both the second and third errors delayed recovery.
The fourth error kept recovery jobs from re-establishing data in tables. According to the bank, this last error meant that the inconsistent data had to be re-created in new batch runs using alternative methods, again prolonging and complicating the recovery process.
By Thursday, March 13, the bank had restarted several online systems, including the currency and securities trading system and the foreign systems. Data was fully recovered by the morning of Friday, March 14.
According to the bank, data was never at risk of being lost, since it could be recovered from the Brabrand operating center or from DB2 logs stored on backup systems.
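The recovery model the bank describes is the standard backup-plus-log approach: restore the last good backup image, then replay the changes recorded in the log after that backup was taken. The short Python sketch below is purely illustrative; the table contents, log records and the recover() function are hypothetical and are not IBM's DB2 implementation.

# Illustrative sketch of log-based recovery (not IBM's DB2 code): restore the
# last good backup image, then replay the updates logged after that backup.
from copy import deepcopy

# Hypothetical backup image: table contents as of log sequence number (LSN) 100.
backup = {"lsn": 100, "rows": {"acct-1": 500, "acct-2": 250}}

# Hypothetical log records written after the backup, in LSN order.
log = [
    {"lsn": 101, "key": "acct-1", "value": 450},  # withdrawal posted
    {"lsn": 102, "key": "acct-3", "value": 900},  # new account inserted
    {"lsn": 103, "key": "acct-2", "value": 275},  # deposit posted
]

def recover(backup, log):
    # Start from the backup image and apply every log record newer than it.
    rows = deepcopy(backup["rows"])
    for record in sorted(log, key=lambda r: r["lsn"]):
        if record["lsn"] > backup["lsn"]:
            rows[record["key"]] = record["value"]
    return rows

print(recover(backup, log))  # {'acct-1': 450, 'acct-2': 275, 'acct-3': 900}

In DB2 itself, the RECOVER utility performs the equivalent restore-and-replay work table space by table space, which is why errors that blocked or serialized those recovery jobs stretched the process over several days.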
IBM spokesman Ari Fishkind called the DB2-centric chain of events that followed unique, in that it involved a power outage to an older IBM storage system. Fishkind said that no other customer, to IBM's knowledge, has reported a similar problem. “The chain of events that led up to this problem were really unusual,” said Fishkind, in Somers, N.Y. “They probably couldn't be replicated if people wanted to.”
IBM has created patches for the errors. The patches are being distributed by IBM customer service representatives, who are handling customers on a one-on-one basis, Fishkind said. “We want to give that handholding experience to people who are understandably concerned about it but who should be reassured that it's not going to happen again,” he said.