Google is pairing its automatic speech recognition technology with its YouTube caption system to offer automatic captions. Auto-caps use the same voice recognition algorithms that power automatic voice mail transcription in Google Voice to generate captions for video on the fly. This is a boon for deaf and hearing-impaired users who want to enjoy the millions of videos on the YouTube video sharing service. Google's YouTube team is also launching automatic caption timing, making it much easier for users to create captions manually.
In the latest example of how Google
uses its technology to improve user
experience for its other Web services, the search engine Nov. 19 paired its
automatic speech recognition technology with its YouTube caption system to
offer automatic captions.
Auto-caps use the same voice recognition algorithms that power automatic
voice mail transcription in Google Voice to generate captions for video on the
fly. This is a boon for deaf and hearing-impaired users who want to enjoy the
millions of videos on the YouTube video sharing service.
After all, while Google introduced captions
for Google Video and YouTube in
2006, about 20 hours of video are uploaded to YouTube every minute. Video
creators don't have the time to add captions that would provide more context
for the video content they wish to share.
"Even with all of the captioning support already available on YouTube,
the majority of user-generated video content online is still inaccessible to
people like me," wrote Google Software Engineer Ken Harrenstien,
who is deaf.
Machine-made captions solve that time and resource dilemma, and they're not
only helpful to the hearing impaired. They also enable users to jump to the
exact parts of the videos they're looking for, and will help people who speak
languages other than English access video content in the 51 languages the Google Translate
service supports. Matt Cutts, a Google search quality
engineer, explains how this works here.
The idea is to expand the number of captioned videos, currently in the
hundreds of thousands, to include more of the service's content. More broadly,
this is core to YouTube's mission to organize the world's videos online and
make them accessible to every user, everywhere.
Google's YouTube team also launched automatic caption timing, making it much
easier for users to create captions manually. Users can create a text file with
all the words in the video and upload it to YouTube.
Google will use its speech recognition technology to figure out when the
words are spoken and create captions for the video. Google explains how the
captions and auto-timing work in this video.
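To make the auto-timing idea concrete, here is a minimal Python sketch of the last step only: turning words that already carry start and end times into timed caption cues. It assumes a speech recognizer has already aligned the transcript to the audio. The names TimedWord, group_into_cues and to_srt are hypothetical, the grouping thresholds are arbitrary, and the SubRip (SRT) output format is chosen purely for illustration. This is not Google's implementation.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One transcript word with assumed alignment times, in seconds."""
    text: str
    start: float
    end: float


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def group_into_cues(words, max_words=7, max_gap=1.0):
    """Split words into cues at long pauses or after max_words words.

    Both thresholds are illustrative guesses, not values from any service.
    """
    cues, current = [], []
    for word in words:
        pause = word.start - current[-1].end if current else 0.0
        if current and (len(current) >= max_words or pause > max_gap):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    return cues


def to_srt(words, max_words=7, max_gap=1.0):
    """Format timed words as numbered SRT caption blocks."""
    blocks = []
    cues = group_into_cues(words, max_words, max_gap)
    for index, cue in enumerate(cues, start=1):
        start = format_timestamp(cue[0].start)
        end = format_timestamp(cue[-1].end)
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        TimedWord("Hello", 0.0, 0.4),
        TimedWord("world", 0.5, 0.9),
        TimedWord("again", 3.0, 3.5),
    ]
    print(to_srt(sample))
```

In this sketch a pause longer than max_gap starts a new caption, so the sample words become two cues. The hard part, working out when each word is spoken, is what the speech recognition step described above would supply.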
The technology, rolling out globally for all English-language
videos on YouTube, works best for videos with good sound quality and clear
Auto-caps are currently only visible
on YouTube partner channels
for "UC Berkeley, Stanford, MIT, Yale, UCLA, Duke, UCTV, Columbia, PBS,
National Geographic, Demand Media, UNSW and most Google and YouTube
channels," Harrenstien wrote.
To view automated captions on videos for these sites, users can click on the
menu button at the bottom right of the video player, then click CC and the
arrow to its left, then click the new "Transcribe Audio" button.
Fair warning: The application of the ASR
technology to YouTube videos is extremely rough. In some tests on the YouTube
UC Berkeley site, the technology could not keep up with the speakers and often
provided results that were laughable.
Harrenstien warned: "The captions will not always be perfect ... but
even when they're off, they can still be helpful - and the technology will
continue to improve with time."
Google will gather feedback on auto-captions from
these Websites and roll them out more broadly in the future.