How Accurate is Your Transcription Service?
Updated: June 3, 2019
Accuracy is often the most important quality to look for when hiring a video transcription service. If you’re going to pay to outsource your transcription, you deserve an accurate transcript.
Whether you’re a media broadcaster who needs to meet certain FCC standards for accuracy, an educator who needs maximum accuracy for accessibility reasons, or someone who simply wants to avoid embarrassing caption errors, accuracy matters.
When choosing a transcription vendor, find a company that guarantees a transcript accuracy rate of 99% or higher. Investigate how the transcription service copes with accents, inarticulate speakers, poor audio quality, background noise, and complex vocabulary. Can they still guarantee that level of accuracy despite those challenges?
Automatic Speech Recognition
Automatic speech recognition (ASR) is software that converts an audio file into text. ASR is often fast and cheap, but highly inaccurate.
YouTube’s automatic captions use automatic speech recognition alone to create captions for YouTube videos: this is an example of a well-intentioned initiative that has produced some hilariously inaccurate captions.

Typically, automatic speech recognition produces transcripts that are about 60-70% accurate, which means that roughly 1 out of every 3 words is wrong. When speech recognition is wrong, it’s usually spectacularly wrong (as in the example above).
ASR output also lacks speaker identification and important sound effects, and it is often riddled with inconsistencies in spelling and grammar.
ASR is a good first draft, but it’s important to have human editors review the transcript before finalizing it.
Accuracy and Comprehension
The chart below shows how word-level accuracy rates from speech recognizers compound across a whole sentence, for a range of accuracies and for 8- and 10-word sentences.
You can see how quickly accuracy rates drop as more words are introduced into a sentence. For example, 67% accuracy means 1 out of every 3 words is incorrect. For an 8-word sentence, the likelihood that the recognizer got all 8 words correct is $0.67^{8} \approx 4\%$. Similarly, for a 10-word sentence, the likelihood of the recognizer getting all 10 words in a row correct is $0.67^{10} \approx 2\%$.
This explains why an accuracy rate of at least 99% is needed to provide an equivalent experience for deaf and hard-of-hearing viewers.
Video Transcription Accuracy Rates

Word-to-Word Accuracy | 1 of x Words Incorrect | 8-Word Sentence Accuracy | 10-Word Sentence Accuracy |
---|---|---|---|
50% | 1 of 2 | 0% | 0% |
67% | 1 of 3 | 4% | 2% |
75% | 1 of 4 | 10% | 6% |
85% | 1 of 7 | 27% | 20% |
90% | 1 of 10 | 43% | 35% |
95% | 1 of 20 | 66% | 60% |
98% | 1 of 50 | 85% | 82% |
99% | 1 of 100 | 92% | 90% |
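
The table's sentence-level figures can be reproduced with a short calculation. The sketch below assumes, as the 67% example does, that each word is right or wrong independently of its neighbors, so sentence accuracy is word accuracy raised to the power of the sentence length. The function name `sentence_accuracy` is illustrative and not part of any transcription tool.

```python
def sentence_accuracy(word_accuracy: float, sentence_length: int) -> float:
    """Probability that every word in a sentence is transcribed correctly.

    Assumption: each word is correct independently with probability
    `word_accuracy`, so the per-word probabilities simply multiply.
    """
    return word_accuracy ** sentence_length


# Print one row per word-level accuracy rate listed in the table.
for rate in (0.50, 0.67, 0.75, 0.85, 0.90, 0.95, 0.98, 0.99):
    eight = sentence_accuracy(rate, 8)
    ten = sentence_accuracy(rate, 10)
    print(f"{rate:.0%} word accuracy -> 8-word: {eight:.0%}, 10-word: {ten:.0%}")
```

Real speech recognition errors tend to cluster rather than occur independently, so these numbers are best read as a rough illustration of how quickly small per-word error rates add up.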
How Do Transcription Services Measure Their Accuracy Rate?
The industry standard for caption and transcript quality is a 99% accuracy rate. Accuracy measures punctuation, spelling, and grammar.
Many captioning competitors claim to meet a 99% accuracy rate, but a study we conducted, which compared the accuracy of 8 different files submitted to two vendors, showed otherwise.

In reality, the measured accuracy rates for the two captioning vendors fell between 84.7% and 94.4%. Every file we submitted came back with numerous spelling errors and inconsistencies.
When we asked one vendor how they measured their accuracy rate, they replied, “We don’t use an exact measurement to determine the 99%.”
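
For comparison, one common way to put a number on transcript accuracy is to count word-level edits (substitutions, insertions, and deletions) against a trusted reference transcript. The sketch below is an assumption about how such a measurement could work, not the method used in the study above or by any vendor, and the function names are hypothetical. Because it splits on whitespace and compares words exactly, spelling, capitalization, and punctuation mistakes all count as errors.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word substitutions, insertions, and deletions
    separating the hypothesis from the reference (Levenshtein distance)."""
    # previous[j] = distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from hypothesis
                current[j - 1] + 1,      # extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference_text: str, hypothesis_text: str) -> float:
    """Word-level accuracy: 1 minus (edit errors / reference word count)."""
    reference = reference_text.split()
    hypothesis = hypothesis_text.split()
    if not reference:
        raise ValueError("reference transcript is empty")
    errors = word_edit_distance(reference, hypothesis)
    # Many inserted words can push errors above the reference length,
    # so clamp the result at zero.
    return max(0.0, 1.0 - errors / len(reference))


# Made-up example sentences, for illustration only.
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumped over the lazy dog today"
print(f"{word_accuracy(reference, hypothesis):.1%}")
```

Running a check like this on a sample of delivered files gives a repeatable figure to compare against a vendor's stated accuracy rate.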

So why were there so many inconsistencies?
The answer lies in the process, which directly affects file accuracy. Both competitors cut a single file into smaller segments and distribute them among a pool of transcribers. They then piece the file back together and send it to the customer without quality assurance.
While this process may provide faster turnarounds, it leads to inconsistencies and errors in a single file.
Inaccurate captions mean more work for you, can lead to miscomprehension of content, and can hurt your brand.
Caption Quality Standards
The FCC released quality standards for closed captioning and transcription of all network and broadcast video, including online distribution of that content. The FCC rules are a helpful guideline for other industries.
Your transcription service should comply with the FCC’s standards for caption accuracy, synchronicity, program completeness, and caption placement. On accuracy, the FCC states, “Captions must match the spoken words in the dialogue, in their original language (English or Spanish), to the fullest extent possible.”
Captions must include essential nonverbal information, such as sound effects, music playing, and audience reactions, in order to be considered accurate.
Captions and subtitles should also preserve the tone and intent of the speaker. The ultimate goal: maintain the impact of the original performance as much as possible.
How Accuracy Affects SEO
If YouTube search rank or video SEO is your main objective, then accuracy is critical.
Transcription errors are not uniformly distributed. The most common errors happen with words that are most vital for search: names of products, people, and places, URLs, formulas, technical vocabulary, and acronyms. What this means is that even a slight reduction in accuracy rate (e.g. 98% instead of 99%) makes the content significantly less viable for search.
Keep in mind that using automatic speech recognition alone may register with Google as “automatically-generated gibberish” and could actually harm your SEO efforts.
Speaker Identification and Verbatim vs. Clean Read
Does your video subtitling company provide options for speaker identification? If not, what is the default? If you need something different, will the transcription service follow instructions correctly?
Do they allow you to choose between verbatim and clean read practices for transcription? Most people prefer a “clean read” transcript, where the transcriptionist removes words like “um” or “uh,” as well as stutters and unnecessary filler words that take away from the meaning of the sentence.

Verbatim transcripts capture every utterance that comes out of the speaker’s mouth, including “um,” “uh,” and stutters. They are usually much more frustrating to read and follow than clean read transcripts. However, for scripted television, where stutters are intentional, verbatim transcription is preferred.
Make sure the transcription service can provide you with the transcription style of your choice.
Consistency
Finally, it’s important to assess a vendor’s ability to maintain accuracy and consistency across many files. When testing out vendors, keep in mind that anyone can produce high accuracy for just a few files. Vendors should be tested with a large number of files containing a range of different types of content.
Summary
Your video transcription and captioning vendor should provide you with near-perfect captions. Inaccurate captions could even be detrimental to your accessibility and SEO initiatives.

This blog was originally published on June 19, 2015 by Emily Griffin and has since been updated.