How Accurate is Your Transcription & Subtitling Service?

June 19, 2015 BY EMILY GRIFFIN
Updated: January 4, 2018

Accuracy is often the most important quality to look for when hiring a video transcription and subtitling service. If you’re going to pay to outsource your transcription, you deserve an accurate transcript.

Whether you’re a media broadcaster who needs to meet certain FCC standards for closed caption accuracy, an educator who needs maximum accuracy for accessibility reasons, or someone who simply wants to avoid embarrassing caption errors, accuracy matters.

When choosing a captioning vendor, find a company that guarantees a transcript accuracy rate of 99% or higher. Investigate how the transcription company copes with accents, inarticulate speakers, poor audio quality, background noise, and complex vocabulary. Can they still guarantee that level of accuracy despite those challenges?

Automatic Speech Recognition

YouTube’s automatic captions use automatic speech recognition alone to create captions for YouTube videos: this is an example of a well-intentioned initiative that has produced some hilariously inaccurate captions.

Typically, automatic speech recognition produces transcripts that are about 60-70% accurate, which means that roughly 1 out of every 3 words is wrong. And when speech recognition is wrong, it’s usually spectacularly wrong (as in the example above).

Accuracy and Comprehension

The chart below shows how word-level accuracy rates from speech recognizers compound across a whole sentence, for a range of accuracy rates and for 8- and 10-word sentences. You can see how quickly sentence-level accuracy drops as more words are added to a sentence. For example, 67% accuracy means 1 out of every 3 words is incorrect. For an 8-word sentence, the likelihood that the recognizer got all 8 words correct is 0.67^8 ≈ 4%. Similarly, for a 10-word sentence, the likelihood of the recognizer getting all 10 words in a row correct is 0.67^10 ≈ 2%.

This explains why an accuracy rate of at least 99% is needed to provide an equivalent experience for deaf and hard-of-hearing viewers.

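Video Transcription Accuracy Rates

| Word-to-Word Accuracy | 1 of x Words Incorrect | 8-Word Sentence Accuracy | 10-Word Sentence Accuracy |
|-----------------------|------------------------|--------------------------|---------------------------|
| 50%                   | 1 of 2                 | 0%                       | 0%                        |
| 67%                   | 1 of 3                 | 4%                       | 2%                        |
| 75%                   | 1 of 4                 | 10%                      | 6%                        |
| 85%                   | 1 of 7                 | 27%                      | 20%                       |
| 90%                   | 1 of 10                | 43%                      | 35%                       |
| 95%                   | 1 of 20                | 66%                      | 60%                       |
| 98%                   | 1 of 50                | 85%                      | 82%                       |
| 99%                   | 1 of 100               | 92%                      | 90%                       |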
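
The sentence-level figures above can be reproduced with a few lines of Python. This is only a sketch: it assumes each word is recognized independently of its neighbors, so the chance of getting a whole sentence right is the word-level accuracy raised to the power of the sentence length. The function name `sentence_accuracy` is made up for this illustration.

```python
def sentence_accuracy(word_accuracy: float, sentence_length: int) -> float:
    """Probability that every word in a sentence is correct.

    Assumes word errors are independent, so the per-word accuracy is
    simply multiplied by itself once per word in the sentence.
    """
    return word_accuracy ** sentence_length


if __name__ == "__main__":
    # Word-level accuracy rates taken from the table above.
    rates = (0.50, 0.67, 0.75, 0.85, 0.90, 0.95, 0.98, 0.99)
    for rate in rates:
        eight = sentence_accuracy(rate, 8)
        ten = sentence_accuracy(rate, 10)
        print(f"{rate:.0%} word accuracy: "
              f"{eight:.0%} for 8 words, {ten:.0%} for 10 words")
```

Real recognizers tend to make errors in clusters rather than independently, so treat these numbers as a rough guide to how fast errors compound, not as a precise prediction.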


Caption Quality Standards

The FCC released quality standards for closed captioning of all network and broadcast video, including online distribution of that content. The FCC rules are a helpful guideline for other industries.

Your captioning vendor should comply with the FCC’s standards for caption accuracy, synchronicity, program completeness, and caption placement. On accuracy, the FCC states, “Captions must match the spoken words in the dialogue, in their original language (English or Spanish), to the fullest extent possible.”

Captions must include essential nonverbal information, such as sound effects, music playing, and audience reactions, in order to be considered accurate.

Captions and subtitles should also preserve the tone and intent of the speaker. The ultimate goal: maintain the impact of the original performance as much as possible.

How Accuracy Affects SEO

If YouTube search rank or video SEO is your main objective, then accuracy is critical.

Transcription errors are not uniformly distributed. The most common errors happen with words that are most vital for search: names of products, people, and places, URLs, formulas, technical vocabulary, and acronyms. What this means is that even a slight reduction in accuracy rate (e.g. 98% instead of 99%) makes the content significantly less viable for search.

Keep in mind that using automatic speech recognition alone may register with Google as “automatically-generated gibberish” and could actually harm your SEO efforts.

Speaker Identification and Verbatim vs. Clean Read

Does your video subtitling company provide options for speaker identification? If not, what is the default? If you need something different, will the transcription service follow instructions correctly?

Do they allow you to choose between verbatim and clean read practices for transcription? Most people prefer a “clean read” transcript, where the transcriptionist removes words like “um” or “uh,” as well as stutters and unnecessary filler words that take away from the meaning of the sentence.

Verbatim transcripts capture every utterance that comes out of the speaker’s mouth, including “um,” “uh,” and stutters. They are usually much more frustrating to read and follow than clean read transcripts. However, for scripted television, where stutters are intentional, verbatim transcription is preferred.

Make sure the captioning service can provide you with the transcription style of your choice.


Finally, it’s important to assess a vendor’s ability to maintain accuracy and consistency across many files. When testing out vendors, keep in mind that anyone can produce high accuracy for just a few files. Vendors should be tested with a large quantity of files containing a range of different types of content.
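
One way to run such a test is to score each vendor transcript against a reference transcript you trust, then pool the results across all files. The sketch below is an assumption about how you might do this, not a method described by the FCC or by any vendor: it counts word-level substitutions, insertions, and deletions (word edit distance) and reports accuracy as 1 minus the error rate. The function names and the simple lowercase, whitespace-based tokenization are hypothetical choices made for this illustration.

```python
from typing import Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def corpus_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled word accuracy over many (reference, vendor transcript) pairs."""
    errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        file_errors, file_words = word_errors(reference, hypothesis)
        errors += file_errors
        total_words += file_words
    if total_words == 0:
        raise ValueError("reference transcripts contain no words")
    return 1 - errors / total_words


if __name__ == "__main__":
    # Tiny made-up sample; a real test would use many full-length files.
    sample = [
        ("the quick brown fox", "the quick brown box"),
        ("captions must match the spoken words",
         "captions must match the spoken words"),
    ]
    print(f"Pooled word accuracy: {corpus_accuracy(sample):.1%}")
```

Pooling errors across all files, rather than averaging per-file scores, keeps a handful of short, easy files from hiding poor results on longer or more difficult content.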


Your video transcription and captioning vendor should provide you with near-perfect captions. Inaccurate captions could even be detrimental to your accessibility and SEO initiatives.

Learn more about How to Select the Right Closed Captioning Vendor.
