Google Automates Captions in YouTube for Accessibility

By Clint Boulton  |  Posted 2009-11-20

In the latest example of how Google uses its technology to improve the user experience of its other Web services, the search engine company on Nov. 19 paired its automatic speech recognition technology with its YouTube caption system to offer automatic captions.

Auto-caps use the same voice recognition algorithms that power automatic voice mail transcription in Google Voice to generate captions for video on the fly. This is a boon for deaf and hearing-impaired users who want to enjoy the millions of videos on the YouTube video sharing service.

After all, while Google introduced captions for Google Video and YouTube in 2006, about 20 hours of video are uploaded to YouTube every minute. Video creators don't have the time to add captions that would provide more context for the video content they wish to share.

"Even with all of the captioning support already available on YouTube, the majority of user-generated video content online is still inaccessible to people like me," wrote Google Software Engineer Ken Harrenstien, who is deaf.

Machine-made captions solve that time and resource dilemma, and they're not only helpful to the hearing impaired. They also enable users to jump to the exact parts of the videos they're looking for, and will help people who speak languages other than English access video content in the 51 languages the Google Translate service supports. Matt Cutts, a Google search quality engineer, explains how this works here.

The idea is to expand the number of captioned videos, currently in the hundreds of thousands, to include more of the service's content. More broadly, this is core to YouTube's mission to organize the world's videos online and make them accessible to every user, everywhere.

Google's YouTube team also launched automatic caption timing, making it much easier for users to create captions manually. Users can create a text file with all the words in the video and upload it to YouTube.

Google will use its speech recognition technology to figure out when the words are spoken and create captions for the video. Google explains how the captions and auto-timing work in this video here. The technology, rolling out globally for all English-language videos on YouTube, works best for videos with good sound quality and clear spoken English.
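To make the auto-timing idea concrete, here is a minimal sketch of the step that would follow alignment: turning words that already carry start and end times into caption cues. This is not Google's implementation. The TimedWord type, the grouping thresholds, and the choice of SubRip (SRT) as the output format are all assumptions made for illustration, and the speech recognition step that produces the timings is not shown.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """Hypothetical alignment output: one transcript word with its timing."""
    text: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def group_into_cues(words, max_words=7, max_gap=1.0):
    """Group timed words into caption cues.

    A new cue starts when the current one already holds max_words words
    or when the pause before the next word exceeds max_gap seconds.
    Both thresholds are illustrative assumptions.
    Returns a list of (start, end, text) tuples.
    """
    cues = []
    current = []
    for word in words:
        if current and (
            len(current) >= max_words
            or word.start - current[-1].end > max_gap
        ):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    return [
        (cue[0].start, cue[-1].end, " ".join(w.text for w in cue))
        for cue in cues
    ]


def format_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render (start, end, text) cues as SubRip caption text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, 1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}"
        )
    return "\n\n".join(blocks) + "\n"
```

In this sketch, the uploaded text file supplies the words and the recognizer would supply only the timings, which is why the approach depends on clear audio: the words are known in advance, but they still have to be located in the soundtrack.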

Auto-caps are currently only visible on YouTube partner channels for "UC Berkeley, Stanford, MIT, Yale, UCLA, Duke, UCTV, Columbia, PBS, National Geographic, Demand Media, UNSW and most Google and YouTube channels," Harrenstien wrote.

To view automated captions on videos from these channels, users can click on the menu button at the bottom right of the video player, then click CC and the arrow to its left, then click the new "Transcribe Audio" button.

Fair warning: The application of the ASR technology to YouTube videos is extremely rough. In some tests on the YouTube UC Berkeley site, the technology could not keep up with the speakers and often provided results that were laughable.

Harrenstien warned: "The captions will not always be perfect ... but even when they're off, they can still be helpful - and the technology will continue to improve with time."

Google will gather feedback on auto-captions from these partner channels and roll them out more broadly in the future.
