Adventures in Speech Recognition
Updated: January 4, 2018
A few weeks ago, I wrote about the traditional metrics for accuracy in transcription. Every now and then, we’ll run across an egregious mistake from an automatic speech recognizer (ASR) that’s pretty funny.
Check out this one from today. In an interview with a famous biologist, what was supposed to read “RNA catalysis, a wonderful discovery” came out as “naked palaces a wonderful discovered”.
Now I won’t argue the relative merits of RNA catalysis vs. naked palaces – I’m sure they’re both great in their own unique ways – but the first one was a bit closer to what we were looking for in that interview.
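For a rough sense of how a traditional accuracy metric would score this mistake, here is a minimal sketch of word error rate (WER): the word-level edit distance between the intended text and the ASR output, divided by the number of words in the intended text. The function names and the normalization (lowercasing, stripping punctuation, splitting on whitespace) are assumptions made for illustration, not the scoring used by any particular engine.

```python
import string


def tokenize(text):
    """Lowercase, strip ASCII punctuation, split on whitespace (assumed normalization)."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def word_edit_distance(ref, hyp):
    """Levenshtein distance over word lists: substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


reference = "RNA catalysis, a wonderful discovery"
hypothesis = "naked palaces a wonderful discovered"
print(word_error_rate(reference, hypothesis))  # 0.6
```

Three of the five words were substituted, so the score is 0.6. The number says nothing about how funny the result is, or how far the meaning drifted.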
I have come to appreciate the strange rhymes that come out of ASR engines, having spent a number of years now sifting through their outputs. Doing so, you get a feel for how a group of sounds isolated from the surrounding words and context can be mistaken for any number of alternatives.
Even more so, I have come to appreciate the ability of the human mind to resolve sounds into words and meanings, and to work past misspoken words, accents and stutters to the intended meaning. We decipher context every day, and not just with 50-cent words that challenge our vocabulary. People say “tuh” instead of “to”. People say “mighta” instead of “might have”. The words “marriage” and “mirage” sound remarkably similar when spoken alone, yet our experience and power of deductive reasoning allow us to work out the intended meaning without any real deep thought.
The technology behind machine recognition, adaptation, and reasoning is fascinating. It is amazing how far we have come in just a few decades of research. It is just as amazing to think how far there is to go, and to dream about what innovations may come to close the remaining gaps on the way to true artificial intelligence.