Google Gives Behind-the-Scenes Peek
In his EclipseCon keynote, Google VP Urs Hoelzle gives a glimpse into how the search company operates.
BURLINGAME, Calif. - At the EclipseCon 2005 conference here, a leading Google Inc. engineer gave a rare glimpse into the workings of the search powerhouse. In a keynote Wednesday titled "A Look Behind the Scenes at Google," Urs Hoelzle, vice president of engineering, essentially described the company's secret sauce as "the return of batch computing." Large numbers of cheap hardware, plus networking and intelligent software to support fault tolerance and other key functions, have gone a long way with the Mountain View, Calif., company, he said. Hoelzle's talk also had a subplot: "the things behind search: how it works and how it's organized."
Hoelzle described Google's mission as "to organize the world's information and make it universally accessible and useful." This mission, he added, "drives a lot of the engineering we do."
Indeed, fault-tolerant software makes cheap hardware practical, Hoelzle said.
And "sometimes things go very wrong," he said as he displayed a slide showing three fire trucks parked in front of a Google location. "I can't tell you exactly what happened, but it was not very good, and it was not just one machine going down."
Yet Hoelzle described Google's fault-tolerant solutions as "very robust," claiming the system "can tolerate massive failures." The company once lost 1,800 out of 2,000 machines in one environment, he said, but the operation continued to run. It was a bit slower, but it kept working with 90 percent of its machines out of operation.
Google uses an index, similar to a book's index, which takes several days on hundreds of machines to compile, Hoelzle said. It has more than 8 billion Web documents and 1.1 billion images.
Then Google uses its PageRank system for ranking and ordering the Web pages, he said. "Then we split them into pieces called shards, small enough to put on various machines. And we replicate the shards."
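The sharding and replication Hoelzle described can be illustrated with a minimal sketch. This is not Google's implementation: the modulo split, the round-robin placement, and the names build_shards, place_replicas and live_replica are all assumptions made for illustration. It shows only the general idea that a shard with copies on more than one machine stays reachable when one of those machines fails.

```python
from collections import defaultdict


def build_shards(doc_ids, num_shards):
    """Split document IDs into shards by modulo (an assumed, illustrative scheme)."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[doc_id % num_shards].append(doc_id)
    return dict(shards)


def place_replicas(num_shards, machines, replicas):
    """Assign each shard to `replicas` machines, round-robin.

    The machines are distinct as long as replicas <= len(machines).
    """
    placement = {}
    for shard in range(num_shards):
        placement[shard] = [
            machines[(shard + r) % len(machines)] for r in range(replicas)
        ]
    return placement


def live_replica(placement, shard, down):
    """Return the first machine holding `shard` that is not in `down`, else None."""
    for machine in placement[shard]:
        if machine not in down:
            return machine
    return None


if __name__ == "__main__":
    machines = ["m0", "m1", "m2", "m3"]  # hypothetical machine names
    shards = build_shards(range(10), num_shards=4)
    placement = place_replicas(num_shards=4, machines=machines, replicas=2)
    print(shards)
    print(placement)
    # With m1 down, shard 1 is still served by its second copy.
    print(live_replica(placement, shard=1, down={"m1"}))
```

In this toy layout each shard lives on two machines, so any single failure leaves every shard available; a shard becomes unreachable only when every machine holding a copy is down.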
So an incoming query would hit the Google Web server and then the index server and eventually a document server that contains copies of the Web pages Google downloads.
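The query path described here, from Web server to index server to document server, can be sketched in a few lines. The classes WebServer, IndexServer and DocumentServer and the helper build_postings are hypothetical names, not Google's code, and the sketch leaves out ranking entirely: matches come back ordered by document ID as a placeholder.

```python
def build_postings(pages):
    """Build a word -> set-of-doc-IDs index, like the index at the back of a book."""
    postings = {}
    for doc_id, text in pages.items():
        for word in text.lower().split():
            postings.setdefault(word, set()).add(doc_id)
    return postings


class IndexServer:
    """Answers: which documents contain every word in the query?"""

    def __init__(self, postings):
        self.postings = postings

    def search(self, query):
        words = query.lower().split()
        if not words:
            return []
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return sorted(result)  # placeholder order; no ranking here


class DocumentServer:
    """Holds copies of the downloaded pages, keyed by document ID."""

    def __init__(self, pages):
        self.pages = pages

    def fetch(self, doc_id):
        return self.pages[doc_id]


class WebServer:
    """Front end: takes the query, consults the index, then fetches the pages."""

    def __init__(self, index_server, doc_server):
        self.index_server = index_server
        self.doc_server = doc_server

    def handle(self, query):
        doc_ids = self.index_server.search(query)
        return [(doc_id, self.doc_server.fetch(doc_id)) for doc_id in doc_ids]


if __name__ == "__main__":
    pages = {
        1: "cheap hardware fails",
        2: "fault tolerant software",
        3: "cheap fault tolerant hardware",
    }
    web = WebServer(IndexServer(build_postings(pages)), DocumentServer(pages))
    print(web.handle("cheap hardware"))
```

Keeping the index lookup separate from page storage means the front end first asks only for matching document IDs and then fetches page copies for those matches alone.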