Captioning Accuracy: How to Measure Error Rates
Updated: June 7, 2019
We use automated speech recognition (ASR) software to make many of our daily tasks easier. Maybe your morning routine starts with asking Alexa for the day’s weather forecast, or you have Siri transcribe an email or text message as you dictate on your way into work, or you turn on your favorite Netflix show at the end of the day by speaking into the TV remote. There have been many exciting developments in speech recognition software over the last few years, and in cases like these, ASR will suffice.
When it comes to captioning, however, we’ve seen how ASR – when used alone – can create some serious fails. Caption quality is incredibly important for accessibility, and for other reasons as well, such as search. When we talk about caption quality, we often refer to the accuracy rate. This post discusses what accuracy rate really is, how it’s measured, and what might be missing from the equation.
What IS Accuracy?
Captioning accuracy is made up of two different pieces: Word Error Rate (WER) and Formatted Error Rate (FER). WER is the percentage of words that are transcribed incorrectly (substituted, dropped, or added) compared with what was actually said. FER is the percentage of errors when formatting elements such as punctuation, grammar, speaker identification, non-speech elements, capitalization, and other notations are also taken into account. For closed captioning, all of these formatting elements must be handled correctly for captions to reach at least 99% accuracy.
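As a rough sketch of the arithmetic, here is the standard way WER is usually written. This post does not spell out a formula, so treat the symbols below, and the assumption that accuracy is reported as one minus the error rate, as illustrative rather than as 3Play's exact method.

```latex
\mathrm{WER} = \frac{S + D + I}{N},
\qquad
\text{accuracy} = 1 - \mathrm{WER}
% S = substituted words, D = deleted (dropped) words, I = inserted (extra) words,
% N = number of words in the correct reference transcript.
% Hypothetical example: for N = 1500 words, 99% accuracy allows at most
% 0.01 \times 1500 = 15 errors in total.
```

FER can be read the same way, except that formatting mistakes (punctuation, capitalization, speaker IDs, and so on) also count as errors.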
Common Causes of ASR Errors:
- Missing speaker labels
- Missing relevant non-speech elements
- No [INAUDIBLE] tags
- Multiple speakers
- Overlapping speech
- Background noise
- Poor audio quality
- False starts
- Acoustic errors
- “Function” words
Take A Listen Yourself
When relying solely on ASR technology, the accuracy rates are pretty abysmal. Below is an example of a transcript captured by ASR. Listen closely to the audio and compare what you hear with the words shown on the screen. See if you catch any errors. (Hint: There are several!)
What errors did you notice?
There are certainly a number of incorrect words in this clip. Here are a few that I noticed:
- New England Aquarium shows up as “new wing of the Koran.”
- Forester shows up as “four-story.”
- Because hesitation words are not removed from the transcript, they then spill over into other words, causing many additional inaccuracies.
These are acoustic errors: the words may sound OK to the ear, but linguistically they don’t make any sense. A human transcriptionist would not make these types of errors.
Another issue I notice right away is a lack of punctuation. In this transcript, there are some glaring punctuation and grammar errors including:
- Very few periods
- Incorrect capitalizations
- A lack of notation for speaker changes and speaker IDs
These formatting errors greatly contribute to the inaccuracy of the file and make it not only difficult to understand but simply incorrect. An accuracy rate based solely on the word error rate would understate the number of issues in the transcript.
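To make that gap concrete, here is a minimal Python sketch that scores the same transcript two ways: once on words alone, and once with capitalization and punctuation counted. The function names, the tokenizing rules, and the sample sentences are all assumptions made up for illustration; this is not 3Play's scoring method, and it ignores speaker IDs and non-speech elements entirely.

```python
import re


def edit_distance(ref, hyp):
    """Minimum substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                         # reference token dropped
                curr[j - 1] + 1,                     # extra token in the output
                prev[j - 1] + (ref_tok != hyp_tok),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def word_tokens(text):
    # Hypothetical word-only view: lowercase, punctuation discarded.
    return re.findall(r"[a-z0-9']+", text.lower())


def formatted_tokens(text):
    # Hypothetical formatting-aware view: case is kept and each
    # punctuation mark becomes its own token.
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)


def error_rate(reference, hypothesis, tokenize):
    ref_tokens = tokenize(reference)
    if not ref_tokens:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_tokens, tokenize(hypothesis)) / len(ref_tokens)


# Made-up sample: every word is right, but all formatting is missing.
reference = "We visited the aquarium. It was great!"
hypothesis = "we visited the aquarium it was great"

wer = error_rate(reference, hypothesis, word_tokens)
fer = error_rate(reference, hypothesis, formatted_tokens)
print(f"word-only error rate: {wer:.1%}")         # 0.0%
print(f"formatting-aware error rate: {fer:.1%}")  # 44.4%
```

In this toy case the word-only score looks perfect, while the formatting-aware score flags two capitalization errors and two missing punctuation marks out of nine reference tokens.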
At 3Play, we measure our accuracy including punctuation because we understand that incorrect punctuation can change the meaning of language tremendously.
Incorrect punctuation can also make it difficult to comprehend a file. In the example clip above, it’s hard to follow along, to know who is speaking, or to tell what they are even referring to because there are so many formatting errors.