Google has added several new features to its Cloud Speech Application Programming Interface (API) for developers seeking to integrate speech recognition capabilities into their applications.
The updates add support for long-form audio clips and increase the number of languages for which speech recognition is now available. The goal is to give developers more functionality and control for adding speech recognition to their products and services, Google product manager Dan Aharon said in an announcement on Google’s cloud platform blog.
Google’s Cloud Speech API is a machine-learning-powered technology for converting speech to text. Developers can use the API to enable capabilities like voice transcription, audio file transcription, voice-enabled command and control, and call center routing in their applications and services.
Google has described the technology as being powered by machine intelligence. The API uses deep-learning neural network algorithms to improve its speech recognition capabilities with repeated use in the same setting. Developers can customize speech recognition to a particular setting or context by including specific phrases or words that might be spoken by users in that setting.
Google has described the API as being capable of streaming text results, so that the text appears even as someone is speaking. It can also be used to return text from saved audio files. Among its other capabilities is one that lets developers filter out inappropriate content in spoken language or text.
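As a rough illustration of the phrase hints and content filtering described above, the sketch below builds a JSON request body for a recognition call. The field names (`speechContexts`, `phrases`, `profanityFilter`, `languageCode`) reflect the v1 REST interface as commonly documented and should be checked against Google's current reference; the helper name `build_recognize_request` and the storage URI are hypothetical. The code only assembles the request and does not contact the service.

```python
import json


def build_recognize_request(audio_uri, language_code, phrases, filter_profanity=True):
    """Assemble a request body with phrase hints and a profanity filter.

    Field names are assumed from the v1 REST interface; verify them
    against the current API reference before relying on this.
    """
    return {
        "config": {
            "languageCode": language_code,
            # Words or phrases likely to be spoken in this particular setting.
            "speechContexts": [{"phrases": list(phrases)}],
            # Ask the service to mask inappropriate words in the transcript.
            "profanityFilter": filter_profanity,
        },
        # Hypothetical location of a saved audio file.
        "audio": {"uri": audio_uri},
    }


if __name__ == "__main__":
    body = build_recognize_request(
        "gs://example-bucket/support-call.flac",  # hypothetical URI
        "en-US",
        ["refund status", "order number"],
    )
    print(json.dumps(body, indent=2))
```

Keeping the hints in a plain list makes it simple to swap in a different vocabulary for each setting, such as a call center versus a voice-controlled appliance.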
Google currently offers its Cloud Speech API for free for the first 60 minutes of audio processed. After that, the company charges $0.006 for every 15 seconds of processing.
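To make the pricing concrete, here is a minimal sketch of the arithmetic. It assumes that billable time is rounded up to whole 15-second increments and that the free allowance is applied once to the total; neither detail is stated in the announcement, and the function name `estimate_cost_usd` is made up for illustration.

```python
FREE_SECONDS = 60 * 60         # the first 60 minutes of audio are free
INCREMENT_SECONDS = 15         # billing unit described in the article
PRICE_PER_INCREMENT_MILLS = 6  # $0.006, held in thousandths of a dollar


def estimate_cost_usd(total_seconds: int) -> float:
    """Estimate the charge for a given amount of processed audio.

    Assumption: billable time is rounded up to whole 15-second increments.
    """
    billable = max(0, total_seconds - FREE_SECONDS)
    # Ceiling division using integers only, to avoid floating-point drift.
    increments = (billable + INCREMENT_SECONDS - 1) // INCREMENT_SECONDS
    return increments * PRICE_PER_INCREMENT_MILLS / 1000


if __name__ == "__main__":
    # A 180-minute file: 120 billable minutes = 480 increments of 15 seconds.
    print(estimate_cost_usd(180 * 60))  # 2.88
```

Under these assumptions, a single 180-minute recording would cost about $2.88 once the free hour is used up.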
This week’s updates extend the long-form audio support capabilities of the Cloud Speech API. The length of supported audio files has been increased from up to 80 minutes to up to 180 minutes. The API can also support files that are longer than three hours, but only on a case-by-case basis, Aharon said.
Google has also introduced a word-level timestamp feature, which Aharon said was one of the features that developers had wanted the most in the Cloud Speech API.
The timestamps give users the ability to jump to the specific point in an audio clip where a piece of text was spoken. They can also be used to display the relevant text while the audio clip is playing, Aharon said. The feature can help organizations significantly cut down on the time needed to proofread transcripts and improve the accuracy of speech-to-text transcription.
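The two uses of word-level timestamps mentioned above can be sketched in a few lines. The example assumes each word arrives as a record with `word`, `startTime` and `endTime` fields, with times given as strings such as "0.900s"; that shape and the sample data are illustrative assumptions rather than actual API output, and the helper names are hypothetical.

```python
from bisect import bisect_right

# Illustrative, hand-written sample; not real output from the service.
SAMPLE_WORDS = [
    {"word": "welcome", "startTime": "0s", "endTime": "0.500s"},
    {"word": "to", "startTime": "0.500s", "endTime": "0.700s"},
    {"word": "the", "startTime": "0.700s", "endTime": "0.900s"},
    {"word": "meeting", "startTime": "0.900s", "endTime": "1.400s"},
]


def parse_seconds(duration: str) -> float:
    """Convert a duration string such as '0.900s' into seconds."""
    return float(duration.rstrip("s"))


def first_occurrence(words, target):
    """Return the start time of the first match, for jumping to that point."""
    for entry in words:
        if entry["word"].lower() == target.lower():
            return parse_seconds(entry["startTime"])
    return None


def word_at(words, playback_seconds):
    """Return the word being spoken at a playback position, if any."""
    starts = [parse_seconds(entry["startTime"]) for entry in words]
    index = bisect_right(starts, playback_seconds) - 1
    if index < 0:
        return None
    entry = words[index]
    if playback_seconds < parse_seconds(entry["endTime"]):
        return entry["word"]
    return None  # position falls in a pause or after the last word


if __name__ == "__main__":
    print(first_occurrence(SAMPLE_WORDS, "meeting"))  # 0.9
    print(word_at(SAMPLE_WORDS, 0.6))                 # to
```

A media player could call `first_occurrence` when a reader clicks a word in the transcript and `word_at` on each playback tick to highlight the word being spoken.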
With this week’s update, Google has also added support for 30 additional languages. The updated Cloud Speech API now supports 119 languages and their variants. Among the new languages that the API supports are Bengali, Latvian and Swahili. According to Aharon, the new language support covers an additional one billion speakers around the world.