Meanwhile, new technologies such as multicore processors and increasing parallelism offer promise. "But there's a catch," Barroso said. "Are there enough threads? Can we expect programmers to build efficient, concurrent programs?" Parallelism is easier to exploit when there is more data, though. "At Google we're interested in problems where there's a truckload of data, so it might be a little easier for us," Barroso said.
Fault-tolerant software is powerful, but it is not enough, Barroso said. Large-scale systems also need additional monitoring.
Google employs what it calls its System Health Infrastructure, which frequently communicates with every server in the system and collects health signals and activity information, Barroso said.
Asked whether Google might consider open-sourcing this technology, Barroso said, "We've been looking at open-sourcing some of the code for some time." However, "some of this is infrastructure and we build it so intertwined with other software we have that it's hard to pull things apart."
In addition, Google uses Self-Monitoring, Analysis and Reporting Technology, or SMART, for early detection of problems. The company found that disk drives with scan errors are 10 times more likely to fail than those with no errors, Barroso said. However, more than half of the drives that failed showed no warning signals beforehand: 56 percent had no strong signals at all, he said. "It's fairly easy to predict something if you give a long enough time frame," Barroso said. "I predict we're all going to die," he quipped.
The Google study also found that temperature was not a significant factor in disk failures: slightly warmer temperatures did not cause any more failures than cooler ones, Barroso said. "If the variability of temperature is not that great, then data center designers have a lot more flexibility" in designing more energy-efficient facilities, he said.