Press Release: 3Play Media Study Reveals Automatic Speech Recognition (ASR) Engines Are Fine-Tuning After a Year of Massive Improvement

June 20, 2024 BY ELISA LEWIS
Updated: June 24, 2024

June 20, 2024 09:59 AM Eastern Daylight Time

BOSTON–(BUSINESS WIRE)–After a year of profound improvement in accuracy, ASR providers are doubling down on accuracy gains and focusing on differentiation, according to the latest State of ASR report, released today by 3Play Media, the leading media accessibility provider in North America.

“The ASR market continues to evolve and is fiercely competitive. It is clearly reaching a maturation stage in its evolution,” Josh Miller, co-CEO and co-Founder, 3Play Media, said. “After a year of revolutionary changes in the accuracy of the technology, the 2024 report finds vendors working on their differentiation based on specific use cases and fine-tuning their technologies accordingly.

“This year, it has become clear that not all errors are equal, challenging the standalone metric of accuracy rate. Ultimately, ASR alone is still insufficient for the captioning use case, especially regarding formatting and hallucinations. Human-in-the-loop captioning and transcription workflows remain critical for accuracy, quality, and accessibility.”

The annual study analyzes the general state of speech-to-text technology as it applies to the task of captioning and transcription. In addition to a surge in new advancements, 2023 brought several new players, such as Assembly and Whisper, whose ASR engines rivaled top competitors such as Speechmatics.

The new report investigates errors like hallucinations, where the engine generates incorrect words not present in the input. Whisper, a fast gainer in last year’s study, continues to be a competitive engine, but its hallucinations remain a cause for concern. These hallucinations appear more common than initially believed, and the consequences for accessibility – and ultimately a brand – are profound.

This year’s State of ASR report additionally highlights the need for a more nuanced evaluation framework that considers factors like Word Error Rate (WER), Formatted Error Rate (FER), and the Canadian NER Model. The top engines were found to have different strengths and weaknesses, and each prioritizes differing types of content or styles of transcription.
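For readers unfamiliar with these metrics, the sketch below illustrates how a word-level error rate is commonly computed: the number of word substitutions, deletions, and insertions needed to turn the engine's output into the reference transcript, divided by the number of reference words. It is a minimal illustration, not the methodology used in the report. The function name `word_error_rate` and the `normalize` flag are hypothetical, and comparing text with case and punctuation left intact is only a rough stand-in for a formatting-sensitive measure such as FER, whose exact definition is not given here.

```python
import string


def word_error_rate(reference: str, hypothesis: str, normalize: bool = True) -> float:
    """Word-level edit distance divided by the number of reference words.

    normalize=True lowercases and strips surrounding punctuation (a common
    WER convention). normalize=False keeps case and punctuation, so
    formatting differences count as errors (an assumed, simplified
    stand-in for a formatted error rate).
    """
    def tokens(text: str) -> list[str]:
        words = text.split()
        if normalize:
            words = [w.lower().strip(string.punctuation) for w in words]
            words = [w for w in words if w]  # drop tokens that were only punctuation
        return words

    ref, hyp = tokens(reference), tokens(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Two inserted words against a three-word reference.
inserted = word_error_rate("thanks for watching",
                           "thanks for watching please subscribe")

# Same words, different formatting.
plain = word_error_rate("Hello, world.", "hello world")
formatted = word_error_rate("Hello, world.", "hello world", normalize=False)
```

In this sketch, words the engine adds that were never spoken count as insertions, which is why the rate can exceed 100% on short references. The last two calls show why a single accuracy figure can mislead: the same output scores perfectly once case and punctuation are ignored, but poorly when formatting is counted.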

To obtain a free copy of The 2024 State of ASR report, please visit:

About 3Play Media

3Play Media is an integrated media accessibility platform with patented solutions for closed captioning, transcription, live captioning, audio description, and subtitling. 3Play Media combines machine learning (ML), artificial intelligence (AI), and automatic speech recognition (ASR) with human review to provide innovative, highly accurate services. Customers span multiple industries, including media & entertainment, corporate, e-commerce, fitness, higher education, government, and eLearning.


Media Contact
Phil LeClare
[email protected]
