Speech recognition technology has made advances in the past few years, but has that growth been enough to call it a success?
Early last year I launched the eWEEK Emerging Technology site with a list of 10 emerging technologies that flopped. While the list included some technologies that no one argued with (hi there, Microsoft Bob and the CueCat), there were some choices that elicited more than a few protests.
Speech recognition, in particular, drew especially loud cries of foul. Several readers pointed out the importance of speech recognition to people with disabilities, and I received several invitations from speech recognition software leader Nuance to take a fresh look at the current state of the technology.
That is exactly what I’ve done, and I have to say that I am impressed in many ways with the current state of speech recognition technology. However, I don’t know if I’m impressed enough to change my perception of it as a technology that hasn’t lived up to its promises: a flop, if you will.
On the plus side, speech recognition is seeing a boom in hype and high-profile implementations that hasn’t been seen since its heyday in the late 1990s. Leading the charge are the omnipresent commercials and ads touting the Sync feature that Microsoft developed for some cars, which makes it possible to control music and other car features with simple voice commands (what the ads don’t say is that the underlying speech technology for Sync comes from Nuance).
A Commanding Voice
In many ways, command is the top functionality for speech recognition. In most cases, it doesn’t require training and even people who don’t think they want to dictate memos and letters to a computer see the value of being able to use simple voice commands like “call Bob” or “play Icky Thump.”
While I didn’t get a chance to try out speech commands in a Sync-enabled car, I did test them using Nuance’s Dragon NaturallySpeaking 9.5 and a smartphone with voice command features enabled.
In my tests, voice commands worked well, at least as far as recognizing what I wanted done. On the PC with Dragon installed, I sometimes had to repeat commands, but all in all, it worked.
The voice commands on the phone were both more impressive and more frustrating. Using voice commands to dial numbers is a classic use case, and it worked well in my tests, making it possible to say “call Jane Morris mobile” and have the phone dial her cell number.
The phone I tested also had Nuance-enabled voice command features that made it possible to do many different tasks, including sending e-mails, doing Web searches, and adding calendar appointments.
This feature worked well when it came to recognition, but was frustrating in delivery. That’s because the actual technology is server-based: I would say a command, it would be routed to a server in the cloud, and the result would come back to my phone. In my tests, most commands took 30 to 45 seconds to deliver a result. When you need to be hands-free, that delay is a necessary inconvenience, but in most other situations it was far too long to wait.
But all in all, I was impressed with voice command capabilities, especially in non-PC areas. On the PC it worked well, though to be honest, the voice command features in my old OS/2 Warp system were nearly as good.
So what about dictation? In this area I can say unequivocally that the results and experience were greatly superior to the speech recognition I used several years ago.
One of the biggest improvements was in training. In the old days, training the system to your voice could take hours or even days. In Dragon NaturallySpeaking 9.5, I was done training in about 15 minutes.
After this short training session, the results I had in dictation were pretty good. I did several tests, including some long dictated texts, and the error rate was acceptable; it was really not much worse than if I had typed a couple of paragraphs without going back to fix errors.
So after using the current generation of speech recognition, I can clearly say that it is much improved and works very well. So why doesn’t that change my disappointed view of speech recognition?
Well, in part it’s a matter of current reality versus original promise. In the mid-1990s, many people claimed that speech recognition would take over offices, that you’d walk into a business and find everyone talking to their computers instead of typing. Sorry, but that isn’t likely to happen anytime soon.
However, an even bigger issue is that, while speech recognition has improved over the past 10 or so years, I don’t think it has improved enough.
If you look back to other technologies from 1997, such as the Web, enterprise applications or mobile phones, the changes have been radical. Compared to these technologies, speech recognition has seen only modest gains.
In part I blame this on the lack of competition. In the 1990s, there were several major companies competing in the area of speech recognition. However, these competitors either failed (some spectacularly, as in the case of Lernout & Hauspie) or turned their attention elsewhere (IBM, I’m looking at you).
While this limited competition has been good for dominant vendors like Nuance, it generally isn’t good for innovation. Nothing spurs a technology area into interesting and rewarding innovations quite like tough competition.
But yes, I’ve heard the calls and, yes, I agree that speech recognition is improved over past capabilities.
However, if the question is do I think that speech recognition will deliver on its original grand promises and become ubiquitous in all forms of computing, then I think the answer is still too garbled to know for sure.