By Darryl K. Taft  |  Posted 2007-04-06

Meanwhile, new technologies such as multicore processors and increasing parallelism offer promise. "But there's a catch," Barroso said. "Are there enough threads? Can we expect programmers to build efficient/concurrent programs?" Indeed, parallelism is easier to exploit when there is more data to work on. "At Google we're interested in problems where there's a truckload of data, so it might be a little easier for us," Barroso said.
Fault-tolerant software is powerful, but it is not enough, Barroso said. Large-scale systems also need additional monitoring.
Google employs what it calls its System Health Infrastructure, which talks to every server in the system frequently and collects health signals and activity information, Barroso said.
Asked if Google might consider open-sourcing this technology, Barroso said, "We've been looking at open-sourcing some of the code for some time." However, "some of this is infrastructure and we build it so intertwined with other software we have that it's hard to pull things apart."
In addition, Google uses self-monitoring, analysis and reporting technology, or SMART, to do early detection of problems. The company found that disk drives with scan errors are 10 times more likely to fail than those with no errors, Barroso said. However, it also found that more than half of the drives that failed showed no signals, he said. Indeed, 56 percent had no strong signals at all, he said.
"It's fairly easy to predict something if you give a long enough time frame," Barroso said. "I predict we're all going to die," he quipped.
In addition, Barroso said the Google study found that temperature was not shown to be a significant factor in disk failures: slightly warmer temperatures did not cause any more failures than cooler ones. "If the variability of temperature is not that great then data center designers have a lot more flexibility" in designing more energy-efficient facilities, Barroso said.

Darryl K. Taft covers the development tools and developer-related issues beat from his office in Baltimore. He has more than 10 years of experience in the business and is always looking for the next scoop. Taft is a member of the Association for Computing Machinery (ACM) and was named 'one of the most active middleware reporters in the world' by The Middleware Co. He also has his own card in the 'Who's Who in Enterprise Java' deck.
