Captioning Accuracy: How to Measure Error Rates

April 11, 2019 BY ELISA LEWIS
Updated: April 22, 2021

We use automated speech recognition (ASR) software to make many of our daily tasks easier. Maybe your morning routine starts with asking Alexa for the day’s weather forecast, having Siri transcribe an email or text as you dictate on your way into work, or turning on your favorite Netflix show at the end of the day by speaking into the TV remote. In cases like these, ASR will suffice – but when it comes to closed captioning, we’ve seen how ASR (when used alone) can decrease overall captioning accuracy and create some serious fails.

Caption quality is incredibly important for accessibility, and for many other reasons as well, including search functionality. When we talk about captioning quality, we often refer to the accuracy rate. This post discusses what captioning accuracy rate really is, including how it’s measured and what might be missing from the equation.

What IS Captioning Accuracy?

Captioning accuracy is made up of two different measures: Word Error Rate (WER) and Formatted Error Rate (FER). WER is the percentage of words transcribed incorrectly when only the words themselves are compared. FER is the percentage of errors when formatting elements such as punctuation, grammar, speaker identification, non-speech elements, capitalization, and other notations are also taken into account. For closed captioning, all of these formatting elements must be correct to achieve at least 99% accuracy.
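
As a rough guide to the arithmetic, word error rate is conventionally computed from the number of substituted, deleted, and inserted words relative to the length of the reference (correct) transcript. The symbols S, D, I, and N below are the usual textbook notation, not something taken from this post, and the exact scoring rules a given vendor applies may differ:

```latex
\[
\mathrm{WER} = \frac{S + D + I}{N},
\qquad
\text{accuracy} = 1 - \mathrm{WER}
\]
```

Under this definition, 99% accuracy allows at most 1 error per 100 reference words, so a hypothetical 1,500-word transcript could contain no more than 15 errors. A formatted error rate can be read the same way, with formatting mistakes counted as errors too.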

Common Causes of Captioning Accuracy Errors:

Formatting Errors

Speaker labels
Punctuation
Grammar
Numbers
Relevant non-speech elements
No [INAUDIBLE] tags

Word Errors

Multiple speakers
Overlapping speech
Background noise
Poor audio quality
False starts
Acoustic errors
“Function” words

Take A Listen Yourself

When relying solely on ASR technology, the accuracy rates are pretty abysmal. Below is an example of a transcript captured by ASR. Listen closely to the audio and compare what you hear with the words shown on the screen. See if you catch any errors. (Hint: There are several!)

Notice any captioning accuracy errors?

There are certainly a number of incorrect words in this clip. Here are a few we found:

New England Aquarium shows up as “new wing of the Koran.”
Forester shows up as “four-story.”
Because hesitation words are not removed from the transcript, they then spill over into other words, causing many additional inaccuracies.

These are acoustic errors: the words may sound OK to the ear, but linguistically they don’t make any sense. A human transcriptionist would not make these types of errors.

Another immediately noticeable issue is the lack of punctuation. This transcript has some glaring punctuation and grammar errors, including:

Very few periods
Incorrect capitalizations
A lack of notation for speaker changes and speaker IDs

These formatting errors contribute greatly to the inaccuracy of the file, making it not only difficult to understand but simply incorrect. Calculating the captioning accuracy rate from the word error rate alone would understate the number of issues in the transcript.
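
To make the gap between the two measurements concrete, here is a minimal Python sketch that scores the same ASR output twice: once on words only, and once with capitalization and punctuation counted. It is an illustration under stated assumptions, not 3Play's scoring method. The function names, the tokenization rules, and the sample sentence (built around the “new wing of the Koran” error above) are all hypothetical.

```python
import re


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference token missing from output (deletion)
                curr[j - 1] + 1,     # extra token in output (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def words_only(text):
    """Lowercase and drop punctuation so only the words are compared."""
    return re.findall(r"[a-z0-9']+", text.lower())


def with_formatting(text):
    """Keep capitalization and treat each punctuation mark as its own token."""
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)


def error_rate(reference, hypothesis, tokenize):
    """Errors divided by the number of reference tokens."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical sentence; only the aquarium error comes from the example above.
reference = "Welcome to the New England Aquarium. Thanks for coming!"
asr_output = "welcome to the new wing of the Koran thanks for coming"

word_rate = error_rate(reference, asr_output, words_only)
formatted_rate = error_rate(reference, asr_output, with_formatting)

print(f"Word-only error rate:        {word_rate:.1%}")
print(f"Formatting-aware error rate: {formatted_rate:.1%}")
```

In this sketch the word-only comparison finds 4 errors against 9 reference words, while the formatting-aware comparison scores noticeably worse because every missing period and lowercase proper noun also counts. Speaker labels and non-speech tags are left out here for brevity; a fuller scorer would need rules for those as well.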

Punctuation Matters!

At 3Play, we measure our captioning accuracy including punctuation because we understand that incorrect punctuation can change the meaning of language tremendously.

We bet this grandma agrees! ➡️

Incorrect punctuation can also make a file difficult to comprehend. In the example clip above, it’s hard to follow along, to know who is speaking, or even to tell what they are referring to because there are so many formatting errors.
