Time Has Come for Another New System

By Chris Preimesberger  |  Posted 2009-08-17

The original Google storage file systems have served the company very well; the company's overall performance proves this. But now, in 2009, the continued stratospheric growth of Web, business and personal content and ever-increasing demands to keep order on the Internet mean that Quinlan and his team have had to come up with yet another super-file system.

Although Google folks will not officially sanction this information for general consumption, this overhaul of the Google File System apparently has been undergoing internal testing as part of the company's new Caffeine infrastructure announced earlier this month.

Google on Aug. 10 introduced a new "developer sandbox" for a faster, more accurate search engine and invited the public to test the product and provide feedback about the results. The sandbox site is here; as might be expected, there's also a new storage file system behind it.

"By far the biggest challenge is dealing with the reliability of the system. We're building on top of this really flaky hardware; people have high expectations when they store data at Google and with internal applications," Quinlan said.

"We are operating in a mode where failure is commonplace. The system has to be automated in terms of how to deal with that. We do checksumming up the wazoo to detect errors, and we use replication to allow recovery."
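The published GFS design keeps a small checksum for every fixed-size block of a chunk and recomputes it on every read, so corruption is caught before bad data reaches an application. Here is a minimal sketch of that idea in Python; the function names and block size are illustrative, not Google's actual code:

```python
import zlib

BLOCK_SIZE = 64 * 1024  # per-block checksums over 64 KB blocks, as in the GFS paper


def checksum_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Compute one CRC32 per fixed-size block of a chunk."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def verify(data: bytes, checksums: list[int], block_size: int = BLOCK_SIZE) -> bool:
    """Recompute checksums on read; any mismatch flags a corrupt block."""
    return checksum_blocks(data, block_size) == checksums


payload = b"example chunk payload" * 10000
sums = checksum_blocks(payload)
assert verify(payload, sums)            # clean read passes
assert not verify(b"X" + payload[1:], sums)  # a single flipped byte is detected
```

When a checksum mismatch is found, the corrupt replica is discarded and the data is re-fetched from one of the other copies, which is where the replication described below comes in.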

Chunks of data, distributed throughout the vast Google system and subsystems, are replicated on different "chunkserver" racks, with three copies kept by default and additional replication reserved for hot spots in the system.

"Keeping three copies gives us reliability to allow us to survive our failure rates," Quinlan said.
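Spreading those three copies across different racks is what makes the scheme robust: a power or switch failure that takes out a whole rack can cost at most one replica. A toy sketch of such a rack-aware placement policy (hypothetical names; the real placement logic also weighs disk utilization and recent load):

```python
import random


def place_replicas(racks: dict[str, list[str]], n: int = 3) -> list[str]:
    """Pick n chunkservers, each on a different rack, so losing any
    single rack leaves at least n-1 copies of the chunk alive."""
    if n > len(racks):
        raise ValueError("not enough racks for rack-diverse placement")
    chosen_racks = random.sample(list(racks), n)
    return [random.choice(racks[rack]) for rack in chosen_racks]


racks = {
    "rack-a": ["cs1", "cs2"],
    "rack-b": ["cs3"],
    "rack-c": ["cs4", "cs5"],
}
replicas = place_replicas(racks)  # e.g. ['cs2', 'cs3', 'cs4']
```

Cross-rack placement trades a little write bandwidth (copies must cross rack switches) for exactly the failure independence Quinlan describes.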

Replication enables Google to use the full bandwidth of the cluster, reduces the window of vulnerability and spreads out the recovery load so as not to overburden portions of the system. Google uses the University of Connecticut's Reed-Solomon error correction software in its RAID 6 systems.
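The appeal of Reed-Solomon coding over plain replication is arithmetic: both a three-copy scheme and a RAID 6-style stripe of six data units plus two parity units survive any two simultaneous failures, but the raw storage cost differs sharply. A small illustrative calculation (the 6+2 geometry here is an assumption for the example, not a documented Google configuration):

```python
def storage_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_units + parity_units) / data_units


# Triple replication: one data unit plus two full extra copies.
# Tolerates any 2 lost copies, at 3.0x raw storage.
replication = storage_overhead(1, 2)

# Reed-Solomon / RAID 6 stripe: 6 data units + 2 parity units.
# Also tolerates any 2 simultaneous failures, at only ~1.33x raw storage.
erasure_coded = storage_overhead(6, 2)

print(f"3x replication: {replication:.2f}x, RS(6+2): {erasure_coded:.2f}x")
```

The tradeoff is that reconstructing a lost unit under erasure coding requires reading several surviving units and computing, whereas replication recovery is a straight copy, which is one reason replication remains attractive for hot data.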

Chris Preimesberger was named Editor-in-Chief of Features & Analysis at eWEEK in November 2011. Previously he served eWEEK as Senior Writer, covering a range of IT sectors that include data center systems, cloud computing, storage, virtualization, green IT, e-discovery and IT governance. His blog, Storage Station, is considered a go-to information source. Chris won a national Folio Award for magazine writing in November 2011 for a cover story on Salesforce.com and CEO-founder Marc Benioff, and he has served as a judge for the SIIA Codie Awards since 2005. In previous IT journalism, Chris was a founding editor of both IT Manager's Journal and DevX.com and was managing editor of Software Development magazine. His diverse resume also includes: sportswriter for the Los Angeles Daily News, covering NCAA and NBA basketball, television critic for the Palo Alto Times Tribune, and Sports Information Director at Stanford University. He has served as a correspondent for The Associated Press, covering Stanford and NCAA tournament basketball, since 1983. He has covered a number of major events, including the 1984 Democratic National Convention, a Presidential press conference at the White House in 1993, the Emmy Awards (three times), two Rose Bowls, the Fiesta Bowl, several NCAA men's and women's basketball tournaments, a Formula One Grand Prix auto race, a heavyweight boxing championship bout (Ali vs. Spinks, 1978), and the 1985 Super Bowl. A 1975 graduate of Pepperdine University in Malibu, Calif., Chris has won more than a dozen regional and national awards for his work. He and his wife, Rebecca, have four children and reside in Redwood City, Calif. Follow on Twitter: editingwhiz
