Google Dec. 16 launched Ngram Viewer, an experiment to let lay users and researchers search and study the waxing and waning of phrase instances over the last 500 years.
As Google is trying to gain traction selling books, a
free software tool that helps scholars analyze what words and phrases were popular
several centuries ago is wowing researchers and media.
Google Dec. 16 launched Google Books Ngram Viewer, a data
visualization tool that crawls 500 billion words culled from 5.2 million books published
between 1500 and 2008 that Google has indexed in its cloud computing system.
Users may access the tool
online and type up to five words to see a
typical Google graph that counts the words' and phrases' use each year over the
last several hundred years. The words come from books published in Chinese,
English, French, German, Russian and Spanish.
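
The per-year counting described above can be illustrated with a minimal sketch. This is not Google's implementation: the `(year, text)` corpus format, the whitespace tokenizer and the function names are all assumptions made for illustration, and the sample sentences are invented.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a space-joined string."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def yearly_counts(corpus, n):
    """Map each year to a Counter of n-gram occurrences in that year.

    `corpus` is a hypothetical iterable of (year, text) pairs.
    """
    counts = defaultdict(Counter)
    for year, text in corpus:
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        counts[year].update(ngrams(tokens, n))
    return counts


def relative_frequency(counts, phrase, year):
    """Share of that year's n-grams equal to `phrase`; 0.0 if the year is empty."""
    total = sum(counts[year].values())
    return counts[year][phrase] / total if total else 0.0


# Toy data, invented for illustration only.
toy_corpus = [
    (1800, "the drum and the trumpet"),
    (1800, "the trumpet sounded"),
    (1900, "the drum beat and the drum rolled"),
]
unigrams = yearly_counts(toy_corpus, 1)
bigrams = yearly_counts(toy_corpus, 2)
```

Dividing by the yearly total, as `relative_frequency` does, is one plausible way to keep years with many books from dominating a chart; whether the Ngram Viewer normalizes in exactly this way is not stated in the announcement.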
Google in one example
shows how the tool can compare mentions of musical instruments in English
literature from 1750 to 2008. Note how the drum and trumpet, in particular,
seemed to trade places in popularity over the last two hundred years.
While anyone with a computer may access the tool, it
is largely geared to help scholars and researchers studying philosophy, pop
culture, religion, politics, art and language to conduct their research. Google
said it is also making the datasets supporting the Ngram Viewer freely
downloadable so that scholars can replicate the work.
The datasets were used in a research project led by Harvard
University's Jean-Baptiste Michel and Erez Lieberman Aiden, along with several
Googlers, said Jon Orwant, Google Books engineering manager.
"Their work provides several examples of how
quantitative methods can provide insights into topics as diverse as the spread
of innovations, the effects of youth and profession on fame, and trends in
censorship," Orwant said.
Unlike most Google Labs projects, Ngram Viewer has
piqued media curiosity.
The New York Times and
The Wall Street Journal spotlighted it, while top tech blogs
such as ReadWriteWeb dedicated not one but two positive posts to it.
The datasets in Ngram Viewer constitute merely one-third of
the 15 million works Google has scanned since 2004 as part of its Google
Books project.
High-octane math and algorithms aside, the work has been
complicated by
contentious battles over copyright, particularly related to orphan works, where
rights holders are deceased or cannot be found.
The court system is still sussing out this matter, though
Google this month went ahead and
launched its eBookstore to compete with
Amazon, Apple and Barnes & Noble in selling books online.