[Report] So, What’s the Current State of Automatic Speech Recognition?

October 29, 2019 BY ELISA LEWIS
Updated: April 16, 2021

We often get the question, “When will automatic speech recognition technology be good enough to replace humans?” The answer depends on the use case. The current state of the technology may work for Siri and Alexa, but when it comes to captioning and transcription, human editing is still critical to accuracy. Our 2019 State of Automatic Captioning report explains the ins and outs of why that is.

About the Report

Because automatic speech recognition (ASR) is such a critical part of our process at 3Play Media, we are constantly testing to make sure we are using the best ASR engine and to closely follow trends in captioning accuracy. Our results, which examine the current state of ASR technology with specific regard to captioning accuracy, will be published annually in the State of ASR report.

Our research tested the most popular ASR technologies across content from the eCommerce, higher education, fitness, media and entertainment, and enterprise industries. All testing used real content, and lots of it, reflecting the most common types and volumes of content that we receive at 3Play Media.


Key Findings

Some of the best automatic speech recognition systems can achieve accuracy rates in the 80s and low 90s (percent) if all conditions align perfectly. These accuracy levels are sufficient for certain applications, such as personal assistants, where there are a limited number of inputs and outputs. However, when it comes to captioning and transcription, some very fundamental advances in machine learning will be needed before ASR can replicate the work of professional human editors.
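
To make those percentages concrete, here is a minimal sketch of how a word-level accuracy figure could be computed. It assumes accuracy is taken as 1 minus the word error rate (WER), a common ASR metric; this post does not state which metric the report uses, so treat the metric, the function names, and the sample sentences as illustrative assumptions rather than the report's methodology.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: simple lowercase + whitespace tokenization. Real scoring
    pipelines usually normalize punctuation, numbers, and so on as well.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, floored at 0 (WER can exceed 1)."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


# Hypothetical example: a human-verified transcript vs. raw ASR output.
reference = "please turn the captions on for this video"
asr_output = "please turn the caption on for this video"
print(f"{word_accuracy(reference, asr_output):.1%}")  # one wrong word in eight
```

Under this assumed metric, a transcript that is 90% accurate still has roughly one error in every ten words, which helps explain why that level can work for a short voice command but not for a finished caption file.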

3Play Media will continue to monitor the landscape for improvement in these technologies, and share those results in our annual report.

Discover more findings in the full 2019 report.

