How Accurate is Your Transcription Service?

August 22, 2018 BY SOFIA ENAMORADO
Updated: June 3, 2019


Accuracy is often the most important quality to look for when hiring a video transcription service. If you’re going to pay to outsource your transcription, you deserve an accurate transcript.

Whether you’re a media broadcaster who needs to meet FCC standards for accuracy, an educator who needs maximum accuracy for accessibility reasons, or someone who simply wants to avoid embarrassing caption errors, accuracy matters.

When choosing a transcription vendor, find a company that guarantees a transcript accuracy rate of 99% or higher. Investigate how the transcription service copes with accents, inarticulate speakers, poor audio quality, background noise, and complex vocabulary. Can they still guarantee that level of accuracy despite those challenges?

Automatic Speech Recognition

Automatic speech recognition (ASR) is software that converts an audio file into text. ASR is fast and cheap, but often highly inaccurate.

YouTube’s automatic captions rely on automatic speech recognition alone. It’s a well-intentioned initiative that has produced some hilariously inaccurate captions.

YouTube ASR caption: "celebrate your tiny victories last thursday i also like in general."
Actual speech: "Now put the stir in stir fry. You guys can't see, but I'm lightly browning the chicken on either side. I feel so fancy, and also so fulfilled."


Typically, automatic speech recognition produces transcripts that are about 60-70% accurate, which means that roughly 1 out of every 3 words is wrong. And when speech recognition is wrong, it’s usually spectacularly wrong (as in the example above).

ASR also doesn’t include speaker identifications or important sound effects, and ASR transcripts are often riddled with inconsistencies in spelling and grammar.

ASR is a good first draft, but it’s important to have human editors review the transcript before finalizing it.

Accuracy and Comprehension

The chart below shows how word-level accuracy rates from speech recognizers compound at the sentence level, for a range of accuracy rates and for 8- and 10-word sentences.

You can see how quickly sentence-level accuracy drops as more words are added to a sentence. For example, 67% accuracy means 1 out of every 3 words is incorrect. For an 8-word sentence, the likelihood that the recognizer got all 8 words correct is 0.67^8 ≈ 4%. Similarly, for a 10-word sentence, the likelihood of the recognizer getting all 10 words in a row correct is 0.67^10 ≈ 2%.

This explains why an accuracy rate of at least 99% is needed to provide an equivalent experience for deaf and hard-of-hearing viewers.

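Video Transcription Accuracy Rates

| Word-to-Word Accuracy | 1 of x Words Incorrect | 8-Word Sentence Accuracy | 10-Word Sentence Accuracy |
|-----------------------|------------------------|--------------------------|---------------------------|
| 50%                   | 1 of 2                 | 0%                       | 0%                        |
| 67%                   | 1 of 3                 | 4%                       | 2%                        |
| 75%                   | 1 of 4                 | 10%                      | 6%                        |
| 85%                   | 1 of 7                 | 27%                      | 20%                       |
| 90%                   | 1 of 10                | 43%                      | 35%                       |
| 95%                   | 1 of 20                | 66%                      | 60%                       |
| 98%                   | 1 of 50                | 85%                      | 82%                       |
| 99%                   | 1 of 100               | 92%                      | 90%                       |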
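
The sentence-level figures in the chart can be reproduced with a few lines of Python. This is a minimal sketch, assuming each word is recognized independently of its neighbors (a simplification, since real recognition errors tend to cluster); the function name `sentence_accuracy` is made up for illustration.

```python
def sentence_accuracy(word_accuracy: float, sentence_length: int) -> float:
    """Chance that every word in a sentence is correct.

    Assumes each word is right with probability `word_accuracy`,
    independently of the other words (an illustrative simplification).
    """
    return word_accuracy ** sentence_length


# Print word accuracy alongside 8-word and 10-word sentence accuracy.
for word_accuracy in (0.50, 0.67, 0.75, 0.85, 0.90, 0.95, 0.98, 0.99):
    eight_words = sentence_accuracy(word_accuracy, 8)
    ten_words = sentence_accuracy(word_accuracy, 10)
    print(f"{word_accuracy:.0%}  {eight_words:.0%}  {ten_words:.0%}")
```

Because the sentence length is an exponent, each additional word multiplies the chance of a flawless sentence by the word accuracy again, which is why small differences in word accuracy produce large differences at the sentence level.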


How Do Transcription Services Measure Their Accuracy Rate?

The industry standard for caption and transcript quality is a 99% accuracy rate. Accuracy measures punctuation, spelling, and grammar.

Many captioning vendors claim to meet a 99% accuracy rate, but when we conducted a study comparing the accuracy of 8 different files across two vendors, we discovered otherwise.

Graph demonstrating accuracy rates: vendor accuracy rates range from 84.7% to 94.4%, the industry standard is 99%, and the 3Play Media measured accuracy rate is 99.6%.

In reality, the measured accuracy rates of the two captioning vendors fell between 84.7% and 94.4%. Every single file we submitted came back with numerous spelling errors and inconsistencies.

When we asked one vendor how they measured their accuracy rate, they replied, “We don’t use an exact measurement to determine the 99%.”
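
For comparison, an exact measurement is not hard to define. The sketch below is one common approach, not the method used in our study or by any vendor: it counts the word-level substitutions, insertions, and deletions needed to turn a vendor transcript into a verified reference transcript, then divides by the reference length. The function name `word_accuracy` is hypothetical, and because words are split on whitespace only, differences in spelling, capitalization, and attached punctuation all count as errors.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word-level accuracy of `hypothesis` measured against `reference`.

    Computed as 1 - (word edit distance / reference word count),
    floored at 0. Tokens are whitespace-separated, so punctuation and
    capitalization differences count as errors (an assumption).
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 1.0 if not hyp_words else 0.0

    # Classic edit-distance table, kept one row at a time.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current

    errors = previous[-1]
    return max(0.0, 1 - errors / len(ref_words))


correct = "Then, my favorite poet E.E. Cummings came along"
vendor = "Then, my favor poet E.E. Cummings came along"
print(f"{word_accuracy(correct, vendor):.1%}")
```

In this example, one wrong word out of eight gives an accuracy of 87.5% for that sentence. A real evaluation would run over whole files rather than single sentences.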

Here's a taste of the errors we saw:

Competitor transcript examples

Competitor: "Then, my favor poet E.E. Cummings came along"
Correct: "Then, my favorite poet E.E. Cummings came along"

Competitor: "Bob Nice is a co-inventor of integrated circuits"
Correct: "Bob Noyce is a co-inventor of integrated circuits."

Competitor: "...expand the coup of influence in more militant ways."
Correct: "...expand the scope of influence in more militant ways."


So why were there so many inconsistencies?

Well, the process directly impacts the file accuracy. Both competitors cut a single file into smaller segments that are then distributed among a pool of transcribers. Then they piece the file back together and send it to the customer – without quality assurance.

While this process may provide faster turnarounds, it leads to inconsistencies and errors in a single file.

Inaccurate captions mean more work for you, can lead to miscomprehension of content, and can hurt your brand.

Caption Quality Standards


The FCC released quality standards for closed captioning and transcription of all network and broadcast video, including online distribution of that content. The FCC rules are a helpful guideline for other industries.

Your transcription service should comply with the FCC’s standards for caption accuracy, synchronicity, program completeness, and caption placement. On accuracy, the FCC states, “Captions must match the spoken words in the dialogue, in their original language (English or Spanish), to the fullest extent possible.”

Captions must include essential nonverbal information, such as sound effects, music playing, and audience reactions, in order to be considered accurate.

Captions and subtitles should also preserve the tone and intent of the speaker. The ultimate goal: maintain the impact of the original performance as much as possible.

How Accuracy Affects SEO

If YouTube search rank or video SEO is your main objective, then accuracy is critical.

Transcription errors are not uniformly distributed. The most common errors happen with words that are most vital for search: names of products, people, and places, URLs, formulas, technical vocabulary, and acronyms. What this means is that even a slight reduction in accuracy rate (e.g. 98% instead of 99%) makes the content significantly less viable for search.

Keep in mind that using automatic speech recognition alone may register with Google as “automatically-generated gibberish” and could actually harm your SEO efforts.

Speaker Identification and Verbatim vs. Clean Read

Does your video subtitling company provide options for speaker identification? If not, what is the default? If you need something different, will the transcription service follow instructions correctly?

Do they allow you to choose between verbatim and clean read practices for transcription? Most people prefer a “clean read” transcript, where the transcriptionist removes words like “um” or “uh,” as well as stutters and unnecessary filler words that take away from the meaning of the sentence.

elements of a good transcript include speaker IDs, non speech elements, audio description, on screen text and a summary of the object


Verbatim transcripts capture every utterance that comes out of the speaker’s mouth, including “um,” “uh,” and stutters. They are usually much more frustrating to read and follow than clean read transcripts. However, for scripted television, where stutters are intentional, verbatim transcription is preferred.

Make sure the transcription service can provide you with the transcription style of your choice.


Finally, it’s important to assess a vendor’s ability to maintain accuracy and consistency across many files. When testing out vendors, keep in mind that anyone can produce high accuracy for just a few files. Vendors should be tested with a large number of files containing a range of different types of content.


Your video transcription and captioning vendor should provide you with near-perfect captions. Inaccurate captions could even be detrimental to your accessibility and SEO initiatives.


This blog was originally published on June 19, 2015 by Emily Griffin and has since been updated.
