3Play Media Study Finds AI Innovation Has Led to Significant Improvements in ASR

May 3, 2023 BY REBECCA KLEIN

BOSTON, Mass., May 3, 2023 – Automatic speech recognition (ASR) technology has never been as accurate as it is today, thanks to advances in artificial intelligence (AI), according to a report released today by 3Play Media, the leading media accessibility provider. The annual State of ASR study analyzes the general state of speech-to-text technology as it applies to captioning and transcription.

According to the study, in which the company evaluated ten ASR engines, the accuracy of the technology has improved measurably since the company’s last evaluation in 2022. As ASR improves, it’s important to understand which engine is best suited to each use case. Nuances to consider include performance on different error types, transcription styles, formatting, and industry-specific content.

“The advances in AI we’ve seen across industries have also had an impact on ASR,” Chris Antunes, co-CEO and co-Founder, 3Play Media, said. “Longtime industry leader Speechmatics and newer entrants AssemblyAI and Whisper performed at the top of the pack, with each excelling in different areas. This proves that not all engines are created equal – the training material and models matter – and that there is room at the top for multiple engines to specialize in different use cases.”

Accuracy is the key component in captioning for several reasons, most importantly because individuals who are deaf or hard of hearing and rely on captions as an accommodation must receive information that fully reflects the original content. To be accessible and legally compliant, captions need to be 99% accurate, the industry requirement for accessibility. While accuracy improved across industry leaders, the study found that even the best engines performed well below 99% accuracy, indicating a continued need for human revision.

The report measures accuracy using two metrics, Word Error Rate (WER) and Formatted Error Rate (FER). While WER is the standard measure of transcription accuracy, FER also takes into account formatting, sound effects, grammar, and punctuation, and better represents the accuracy a viewer experiences when reading captions. Accuracy under FER is harder to achieve: the best engines tested were only 82% accurate by FER, compared with 93% accurate by WER.
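For readers unfamiliar with the metric, the sketch below shows the conventional way WER is computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and an engine's output, divided by the number of reference words. It is an illustration only and not the scoring code used in the study. The function name `word_error_rate` and the `normalize` flag are assumptions made for this example. Turning normalization off merely hints at why a formatting-sensitive score such as FER is harder to achieve; it does not reproduce 3Play Media's FER definition.

```python
import string


def word_error_rate(reference: str, hypothesis: str, normalize: bool = True) -> float:
    """Word-level edit distance divided by the number of reference words.

    normalize=True lowercases and strips ASCII punctuation before comparing,
    which is a common convention for WER. normalize=False keeps case and
    punctuation, a rough stand-in (an assumption, not the study's method)
    for a formatting-sensitive score.
    """
    if normalize:
        strip_punct = str.maketrans("", "", string.punctuation)
        reference = reference.lower().translate(strip_punct)
        hypothesis = hypothesis.lower().translate(strip_punct)

    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the quick brown fox jumps", "the quick brown box jumps")
    print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")
```

Accuracy as quoted in this release corresponds to one minus the error rate, so a 93% accurate engine has a WER of about 7%, or roughly seven word errors per hundred reference words.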

Additionally, the study identified a new type of error: hallucinations, in which an engine generates text that has no basis in the audio. The State of ASR report found evidence of hallucinations in the Whisper transcriptions, often occurring when the topic shifted. Some of the hallucinations were significant and could pose issues for the captioning use case in particular. However, hallucinations appeared to be rare and did not prevent Whisper from performing competitively.

To download the report, please visit: https://go.3playmedia.com/rs-2023-asr

About 3Play Media

3Play Media is an integrated media accessibility platform with patented solutions for closed captioning, transcription, live captioning, audio description, and subtitling. 3Play Media combines machine learning (ML) and automatic speech recognition (ASR) with human review to provide innovative, highly accurate services. Customers span multiple industries, including media & entertainment, corporate, ecommerce, fitness, higher education, government, and elearning.

Media Contact

Phil LeClare

[email protected]

617-209-9406

www.3playmedia.com

@3playmedia
