Microsoft today announced the addition of the new Custom Speech Service to Cognitive Services, the company's toolkit of APIs that developers can use to build machine-learning and artificial-intelligence applications.
Custom Speech Service, currently available in preview, addresses conditions that trip up voice-recognition systems, such as noisy environments and varied speaking styles. It enables developers to create custom language models that adapt a user's speech to an application's vocabulary, as well as acoustic models tuned to the environments or the number of users an application is expected to accommodate.
“Beneath the hood, the Custom Speech Service leverages an algorithm that shifts Microsoft’s existing speech recognizer to the developer-supplied data,” explained Microsoft Research writer John Roach, in a Feb. 7 blog post. “By starting from models that have been trained on massive troves of data, the amount of application-specific data required is greatly reduced. In cases where the developer’s data is insufficient, the recognizer falls back on the existing models.”
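The blend Roach describes can be pictured as a weighted interpolation between a broadly trained base model and a smaller developer-supplied one, falling back to the base model alone where custom data is missing. A toy sketch of that idea (purely illustrative; this is not Microsoft's actual algorithm, and the function, weights and scores are invented for the example):

```python
# Toy illustration of blending a base recognizer's word scores with
# developer-supplied scores, falling back to the base model alone when
# no custom data exists. Conceptual only -- not the service's algorithm.

def blended_score(word, base_scores, custom_scores, weight=0.7):
    """Interpolate base and custom model scores for a word.

    If the word never appeared in the developer's data, rely on the
    base model alone (the fallback behavior described above).
    """
    base = base_scores.get(word, 0.0)
    if word not in custom_scores:
        return base
    return (1 - weight) * base + weight * custom_scores[word]

base = {"meeting": 0.30, "heating": 0.25}
custom = {"heating": 0.90}  # domain data favors "heating" (e.g., an HVAC app)

print(blended_score("meeting", base, custom))  # no custom data: base only, 0.30
print(blended_score("heating", base, custom))  # pulled toward the custom score: 0.705
```

Starting from strong base scores is what lets the developer-supplied data stay small, per Roach's explanation.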
Meanwhile, Custom Speech Service’s acoustic modeling capabilities can help make speech-enabled computing in factories and other noisy environments a reality. Developers can tune their models to not only pick out speech amid the clatter of machinery, but also prioritize jargon associated with a specific working environment.
The company also announced today that two other cognitive offerings, Content Moderator and Bing Speech API, will be generally available in March.
Developers can use Content Moderator to detect profanity in more than 100 languages, as well as terms on custom lists. As a security-enhancing perk, it can also spot phishing URLs and personally identifiable information, and check for malware.
Content Moderator can also be used to analyze images for unwanted or offensive content or catch adult content in videos. The included review tool allows customers to add a degree of human oversight and curation.
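A text-screening call to an API of this kind might be assembled as below. The endpoint path, header name and query flags follow typical Azure Cognitive Services conventions but are assumptions for illustration, not details taken from the announcement; the sketch builds the request without sending it:

```python
# Sketch of building (not sending) a text-screen request for a
# Content Moderator-style API. The endpoint path and the "PII" /
# "classify" flags are hypothetical; consult the official docs.

def build_screen_request(text, api_key, region="westus"):
    url = (f"https://{region}.api.cognitive.microsoft.com/"
           "contentmoderator/moderate/v1.0/ProcessText/Screen")  # assumed path
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,  # standard Cognitive Services auth header
        "Content-Type": "text/plain",
    }
    params = {"PII": "true", "classify": "true"}  # also ask for PII detection (assumed flags)
    return url, headers, params, text.encode("utf-8")

url, headers, params, body = build_screen_request("Call me at 555-0100", "YOUR_KEY")
print(url)
```

The actual send would be a single HTTP POST of `body` to `url` with those headers, using any HTTP client.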
Developers can enlist the Bing Speech API to turn spoken audio into text in real time or from within a media file. Conversely, it can convert text to speech, paving the way for apps that talk back. It can also be used to create voice-enabled apps that activate when users speak a command aloud.
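For the text-to-speech direction, speech APIs of this kind typically accept a request body in SSML, an XML format for describing spoken output. A minimal sketch of preparing such a body (the voice name is a hypothetical placeholder, not one confirmed by the article):

```python
# Sketch of preparing a text-to-speech request body in SSML, the XML
# markup speech APIs commonly accept. The voice name below is a
# hypothetical placeholder for illustration.

def build_ssml(text, lang="en-US", voice="ExampleVoice"):
    return (f"<speak version='1.0' xml:lang='{lang}'>"
            f"<voice xml:lang='{lang}' name='{voice}'>{text}</voice>"
            "</speak>")

ssml = build_ssml("Your meeting starts in five minutes.")
print(ssml)
```

The resulting string would be POSTed to the service's synthesis endpoint, which returns an audio stream for playback.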
The technology’s Speech Intent Recognition capability, powered by Microsoft’s Language Understanding Intelligent Service (LUIS), allows for apps that can decipher spoken commands and take the appropriate action.
Of course, Microsoft isn’t the only IT heavyweight making waves in the cognitive computing space.
Last month, IBM revealed it had been granted a record 8,088 patents in 2016, the first time any organization was awarded more than 8,000 U.S. patents in a single year. More than 2,000 of those patents involve artificial intelligence, cognitive computing and the cloud.
In terms of hardware, Big Blue unveiled new all-flash storage systems specifically built for cognitive workloads. Aimed at enterprises, the IBM DS8888F analytics-class storage system features 2TB of DRAM-based cache and up to 1.22 PB of flash storage capacity to make short work of machine learning and predictive analytics.