      Google Gives Behind-the-Scenes Peek

      Written by

      Darryl K. Taft
      Published March 3, 2005

        BURLINGAME, Calif.—At the EclipseCon 2005 conference here, a leading Google Inc. engineer gave a rare glimpse into the workings of the search powerhouse.

        In a keynote Wednesday titled “A Look Behind the Scenes at Google,” Urs Hoelzle, vice president of engineering, essentially described the company’s secret sauce as “the return of batch computing.” Large amounts of cheap hardware, plus networking and intelligent software to support fault tolerance and other key functions, have gone a long way with the Mountain View, Calif., company, he said.

        Hoelzle’s talk also had a subplot: “the things behind search: how it works and how it’s organized.”

        Hoelzle described Google’s mission as “to organize the world’s information and make it universally accessible and useful.” This mission, he added, “drives a lot of the engineering we do.”

        With the Web consisting of 10 billion pages and the average page weighing in at 10 kilobytes, the total size is “many tens of terabytes,” Hoelzle said. Yet, “it’s very big, but it’s actually tractable. We need a lot of computers and disks and networking and software.”
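The arithmetic behind that estimate can be checked directly. The following is a minimal sketch that uses only the two figures quoted in the talk; the choice of decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) is an assumption, since the article does not say which convention Hoelzle had in mind.

```python
# Back-of-the-envelope size of the Web from the figures quoted in the talk.
# Assumption: decimal units (1 KB = 1,000 bytes, 1 TB = 10**12 bytes).
PAGES = 10_000_000_000         # 10 billion pages
AVG_PAGE_BYTES = 10 * 1_000    # 10 kilobytes per page

total_bytes = PAGES * AVG_PAGE_BYTES
total_terabytes = total_bytes / 10**12

print(f"{total_terabytes:.0f} TB")  # prints "100 TB"
```

Under these assumptions the raw page data comes to roughly 100 terabytes, the same order of magnitude as the quoted “many tens of terabytes.”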

        In fact, Google runs its system on commodity hardware running Linux, he said.

        “The underlying hardware is pretty damn cheap, but you have to build it into a system that’s scalable,” he said.

        The primary components of the Google system are hardware, networking, distributed systems software, search algorithms, machine learning, information retrieval and user interfaces, Hoelzle said. He added that the hardware environment consists of racks and racks of 88 commodity PCs.

        “These things are cheap, and you can buy them anywhere,” he said. “The problem is these things break. Things break every day. If you have a thousand of these machines, expect to lose one a day. So you have to deal with that, and you better deal with that in an automated way. You deal with it in software by replication and redundancy.”
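To see why replication helps, the “one machine in a thousand per day” rule of thumb can be turned into a rough model. This is a sketch under strong assumptions that are not in the talk: failures are treated as independent and the rate as uniform across machines. The function names are hypothetical.

```python
# Rough failure model built on the quoted rule of thumb:
# "If you have a thousand of these machines, expect to lose one a day."
# Assumption (not from the talk): failures are independent and uniform.
PER_MACHINE_DAILY_FAILURE = 1 / 1000


def expected_daily_failures(machines: int) -> float:
    """Expected number of machines lost per day in a cluster."""
    return machines * PER_MACHINE_DAILY_FAILURE


def prob_all_replicas_fail(replicas: int) -> float:
    """Chance that every copy of one piece of data is lost on the same day."""
    return PER_MACHINE_DAILY_FAILURE ** replicas


if __name__ == "__main__":
    print(expected_daily_failures(2000))
    print(prob_all_replicas_fail(3))
```

Under these assumptions, a 2,000-machine cluster loses about two machines a day, while the chance of three independent copies all failing on the same day is on the order of one in a billion. Correlated failures, such as a site-wide incident, break the independence assumption, so the real figure can be much worse.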

        Indeed, fault-tolerant software makes cheap hardware practical, Hoelzle said.

        And “sometimes things go very wrong,” he said as he displayed a slide showing three fire trucks parked in front of a Google location. “I can’t tell you exactly what happened, but it was not very good, and it was not just one machine going down.”

        Yet, Hoelzle described Google’s fault-tolerant solutions as “very robust,” claiming the system “can tolerate massive failures.” The company once lost 1,800 out of 2,000 machines in one environment, he said, but the operation continued to run, albeit a bit slower, with 90 percent of its machines out of operation.

        Google uses an index, similar to a book’s index, which takes several days on hundreds of machines to compile, Hoelzle said. The index covers more than 8 billion Web documents and 1.1 billion images.

        Then Google uses its PageRank system for ranking and ordering the Web pages, he said. “Then we split them into pieces called shards, small enough to put on various machines. And we replicate the shards.”
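The split-and-replicate idea can be illustrated with a small sketch. The talk does not say how pages are assigned to shards or how replicas are placed, so the modulo partitioning and round-robin placement below are illustrative assumptions, and all function and machine names are hypothetical.

```python
from collections import defaultdict


def shard_documents(doc_ids, num_shards):
    """Partition document IDs into shards (assumed scheme: ID modulo shard count)."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[doc_id % num_shards].append(doc_id)
    return dict(shards)


def place_replicas(num_shards, machines, replicas):
    """Assign each shard to `replicas` distinct machines (assumed: round-robin)."""
    if replicas > len(machines):
        raise ValueError("need at least as many machines as replicas")
    return {
        shard: [machines[(shard + i) % len(machines)] for i in range(replicas)]
        for shard in range(num_shards)
    }


if __name__ == "__main__":
    print(shard_documents(range(10), 3))
    print(place_replicas(3, ["m0", "m1", "m2", "m3"], 2))
```

Because every shard lives on more than one machine, a query can still be served for that slice of the index when one of its hosts is down.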

        So an incoming query would hit the Google Web server and then the index server and eventually a document server that contains copies of the Web pages Google downloads.

        Managing the overall system is the Google File System, which features a master that manages metadata. Data transfers occur directly between clients and chunkservers, files are broken into 64MB chunks, and each chunk is replicated on three machines for safety, Hoelzle said.

        “The machines are cheap and not reliable, so we take our files and put them into chunks and spread them across a few machines and randomly distribute copies,” he said. “So you need to have a master that tells you where the chunks are.” The master will look at one chunkserver, and if it gets no response it assumes it is dead and it seeks out the next one.
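A toy model of that arrangement is sketched below. It follows only what the article states (64MB chunks, three randomly placed copies, a master that knows where chunks live, and a fallback to the next copy when a chunkserver does not respond). The class, method names, and the `is_alive` callback are hypothetical and do not reflect the real Google File System interface.

```python
import random

CHUNK_SIZE = 64 * 1024 * 1024  # 64MB chunks, as described in the talk
REPLICAS = 3                   # each chunk is kept on three machines


class Master:
    """Toy metadata master: records which chunkservers hold each chunk."""

    def __init__(self, chunkservers, seed=0):
        self.chunkservers = list(chunkservers)
        self.locations = {}              # (filename, chunk index) -> server names
        self.rng = random.Random(seed)   # seeded so runs are repeatable

    def add_file(self, filename, size_bytes):
        """Split a file into chunks and place each on REPLICAS random servers."""
        num_chunks = -(-size_bytes // CHUNK_SIZE)  # ceiling division
        for index in range(num_chunks):
            self.locations[(filename, index)] = self.rng.sample(
                self.chunkservers, REPLICAS
            )
        return num_chunks

    def locate(self, filename, index, is_alive):
        """Return the first responsive server holding the chunk, or None."""
        for server in self.locations[(filename, index)]:
            if is_alive(server):
                return server
        return None


if __name__ == "__main__":
    master = Master([f"cs{i}" for i in range(5)])
    print(master.add_file("index.dat", 200 * 1024 * 1024))  # number of chunks
    print(master.locate("index.dat", 0, is_alive=lambda s: True))
```

In this sketch only metadata passes through the master; a client would take the returned server name and fetch the chunk contents from that chunkserver directly, matching the division of labor the article describes.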

        Hoelzle said there are more than 30 clusters at Google, with clusters as large as 2,000 machines to address a petabyte-sized file system.

        “You’d like to be able to write an application that can run on 1,000 machines in parallel,” he said.

        Google’s MapReduce framework provides automatic and efficient parallelization, fault tolerance, I/O scheduling and status monitoring.

        “MapReduce basically does a grep over the Web on a thousand machines,” Hoelzle said. Grep is a Unix/Linux utility that searches one or more input files for lines containing a match to a specified pattern.
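The grep analogy can be made concrete with a single-process sketch of the map and reduce steps. This is not Google's MapReduce API, which the article does not describe; the function names are hypothetical, and a real deployment would run the map calls on many machines rather than in one loop.

```python
import re
from collections import defaultdict


def map_grep(doc_name, text, pattern):
    """Map step: emit (doc_name, line) for every line matching the pattern."""
    regex = re.compile(pattern)
    for line in text.splitlines():
        if regex.search(line):
            yield doc_name, line


def reduce_collect(pairs):
    """Reduce step: group the matching lines by document name."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def grep_mapreduce(documents, pattern):
    """Run the map step over every document, then reduce the results."""
    intermediate = []
    for name, text in documents.items():  # each iteration could run on its own machine
        intermediate.extend(map_grep(name, text, pattern))
    return reduce_collect(intermediate)


if __name__ == "__main__":
    docs = {
        "a.html": "solar eclipse today\nnothing here",
        "b.html": "Eclipse IDE\nlunar eclipse",
    }
    print(grep_mapreduce(docs, "eclipse"))
```

Because each map call touches only one document and shares no state with the others, the work divides cleanly across machines, which is the property the framework exploits.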

        As for scheduling, the Google system has one master and many workers, and tasks are assigned to workers dynamically, Hoelzle said. The master assigns each map task to a free worker.
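Dynamic assignment of this kind can be simulated in a few lines. The sketch below assumes each task has a known duration and that the master always hands the next pending task to whichever worker becomes free first; the article gives no detail on the real scheduling policy, so this is an illustration only, with hypothetical names.

```python
import heapq


def assign_tasks(task_durations, workers):
    """Simulate a master giving each task to the earliest-free worker.

    Returns {worker: [task indices]} in the order tasks were assigned.
    """
    # Heap of (time at which the worker becomes free, worker name).
    free_at = [(0, worker) for worker in workers]
    heapq.heapify(free_at)
    assignment = {worker: [] for worker in workers}

    for task_id, duration in enumerate(task_durations):
        time_free, worker = heapq.heappop(free_at)  # earliest-free worker
        assignment[worker].append(task_id)
        heapq.heappush(free_at, (time_free + duration, worker))
    return assignment


if __name__ == "__main__":
    # One long task and three short ones, two workers.
    print(assign_tasks([5, 1, 1, 1], ["w0", "w1"]))
```

In the usage example, the worker stuck on the long task receives nothing further while the other worker picks up all three short tasks, which is the load-balancing benefit of assigning work dynamically rather than dividing it up in advance.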

        MapReduce has broad applicability, Hoelzle said. “It’s parallel to Eclipse. If you have a good tool that is easy to use, your users come out of the woodwork. In the first year we had hundreds of MapReduce jobs being written. Our production index system is written on top of MapReduce.”

        As a demonstration, Hoelzle produced a diagram of the Google activity around searches of the term “eclipse” over the last few years. The diagram showed three spikes, all in line with the occurrence of a solar eclipse.

        Darryl K. Taft
        Darryl K. Taft covers the development tools and developer-related issues beat from his office in Baltimore. He has more than 10 years of experience in the business and is always looking for the next scoop. Taft is a member of the Association for Computing Machinery (ACM) and was named 'one of the most active middleware reporters in the world' by The Middleware Co. He also has his own card in the 'Who's Who in Enterprise Java' deck.
