How Accurate is Your Transcription & Subtitling Service?

    Accuracy is often the most important quality to look for when hiring a video transcription and subtitling service. If you’re going to pay to outsource your transcription, you deserve an accurate transcript.

    Whether you’re a media broadcaster who needs to meet certain FCC standards for closed caption accuracy, an educator who needs maximum accuracy for accessibility reasons, or someone who simply wants to avoid embarrassing caption errors, accuracy matters.

    When choosing a captioning vendor, find a company that guarantees a transcript accuracy rate of 99% or higher. Investigate how the transcription company copes with accents, inarticulate speakers, poor audio quality, background noise, and complex vocabulary. Can they still guarantee that level of accuracy despite those challenges?

    Automatic Speech Recognition

    YouTube’s automatic captions rely on automatic speech recognition alone. It is a well-intentioned initiative that has produced some hilariously inaccurate captions.

    Typically, automatic speech recognition produces transcripts that are about 60-70% accurate, which means that roughly 1 out of every 3 words is wrong. And when speech recognition is wrong, it’s usually spectacularly wrong.
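
    As a rough illustration of where a figure like "67% accurate" can come from, the sketch below scores a transcript against a human-checked reference. It assumes accuracy is measured as 1 minus the word error rate (word-level edit distance divided by the number of reference words), which is a common convention but not necessarily the method behind the numbers quoted here. The function name and sample sentences are made up for the example.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - WER, where WER = word-level edit distance / reference length.

    Assumption: simple lowercase, whitespace-split tokens; punctuation is
    not stripped, so "video." and "video" count as different words.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Dynamic-programming edit distance, computed one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from output
            insertion = current[j - 1] + 1   # extra word in the output
            current.append(min(substitution, deletion, insertion))
        previous = current

    errors = previous[-1]
    # Many insertions can push WER above 1, so clamp the result at 0.
    return max(0.0, 1.0 - errors / len(ref))


reference = "accurate captions help every viewer follow"
hypothesis = "accurate captains help every fewer follow"
print(f"{word_accuracy(reference, hypothesis):.0%}")  # 2 of 6 words wrong -> 67%
```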

    Accuracy and Comprehension

    The chart below shows how a speech recognizer’s word-level accuracy translates into sentence-level accuracy, for a range of accuracy rates and for 8- and 10-word sentences. You can see how quickly the chance of a fully correct sentence drops as more words are added. For example, 67% accuracy means 1 out of every 3 words is incorrect. For an 8-word sentence, assuming each word is recognized independently of the others, the likelihood that the recognizer got all 8 words correct is $0.67^{8} \approx 4\%$. Similarly, for a 10-word sentence, the likelihood of the recognizer getting all 10 words in a row correct is $0.67^{10} \approx 2\%$.

    This explains why an accuracy rate of at least 99% is needed to provide an equivalent experience for deaf and hard-of-hearing viewers.

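    Video Transcription Accuracy Rates

    | Word-to-Word Accuracy | 1 of x Words Incorrect | 8-Word Sentence Accuracy | 10-Word Sentence Accuracy |
    |-----------------------|------------------------|--------------------------|---------------------------|
    | 50%                   | 1 of 2                 | 0%                       | 0%                        |
    | 67%                   | 1 of 3                 | 4%                       | 2%                        |
    | 75%                   | 1 of 4                 | 10%                      | 6%                        |
    | 85%                   | 1 of 7                 | 27%                      | 20%                       |
    | 90%                   | 1 of 10                | 43%                      | 35%                       |
    | 95%                   | 1 of 20                | 66%                      | 60%                       |
    | 98%                   | 1 of 50                | 85%                      | 82%                       |
    | 99%                   | 1 of 100               | 92%                      | 90%                       |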
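
    The sentence-level figures in the table can be reproduced with a few lines of Python. This is a minimal sketch that assumes every word is right or wrong independently of its neighbors, so the chance of a fully correct sentence is the word accuracy raised to the power of the sentence length. The function name is illustrative only.

```python
def sentence_accuracy(word_accuracy: float, sentence_length: int) -> float:
    """Probability that every word in a sentence is correct.

    Assumes word errors are independent of one another.
    """
    if not 0.0 <= word_accuracy <= 1.0:
        raise ValueError("word_accuracy must be between 0 and 1")
    if sentence_length < 0:
        raise ValueError("sentence_length must be non-negative")
    return word_accuracy ** sentence_length


if __name__ == "__main__":
    rates = [0.50, 0.67, 0.75, 0.85, 0.90, 0.95, 0.98, 0.99]
    print(f"{'Word':>6} {'8-word':>8} {'10-word':>8}")
    for rate in rates:
        eight = sentence_accuracy(rate, 8)
        ten = sentence_accuracy(rate, 10)
        # Percentages are rounded to whole numbers, as in the table.
        print(f"{rate:>6.0%} {eight:>8.0%} {ten:>8.0%}")
```

    Real recognition errors tend to cluster rather than occur independently, so treat these numbers as an illustration of how quickly errors compound, not as a prediction for any particular file.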

    Caption Quality Standards

    The FCC released quality standards for closed captioning of all network and broadcast video, including online distribution of that content. The FCC rules are a helpful guideline for other industries.

    Your captioning vendor should comply with the FCC’s standards for caption accuracy, synchronicity, program completeness, and caption placement. On accuracy, the FCC states, “Captions must match the spoken words in the dialogue, in their original language (English or Spanish), to the fullest extent possible.”

    Captions must include essential nonverbal information, such as sound effects, music playing, and audience reactions, in order to be considered accurate.

    Captions and subtitles should also preserve the tone and intent of the speaker. The ultimate goal: maintain the impact of the original performance as much as possible.

    How Accuracy Affects SEO

    If YouTube search rank or video SEO is your main objective, then accuracy is critical.

    Transcription errors are not uniformly distributed. The most common errors happen with the words that are most vital for search: names of products, people, and places, URLs, formulas, technical vocabulary, and acronyms. This means that even a slight reduction in accuracy rate (e.g., 98% instead of 99%) makes the content significantly less viable for search.

    Keep in mind that using automatic speech recognition alone may register with Google as “automatically-generated gibberish” and could actually harm your SEO efforts.

    Speaker Identification and Verbatim vs. Clean Read

    Does your video subtitling company provide options for speaker identification? If not, what is the default? If you need something different, will the transcription service follow instructions correctly?

    Do they allow you to choose between verbatim and clean read practices for transcription? Most people prefer a “clean read” transcript, where the transcriptionist removes words like “um” or “uh,” as well as stutters and unnecessary filler words that take away from the meaning of the sentence.

    Verbatim transcripts capture every utterance that comes out of the speaker’s mouth, including “um,” “uh,” and stutters. They are usually much more frustrating to read and follow than clean read transcripts. However, for scripted television, where stutters are intentional, verbatim transcription is preferred.
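
    To make the difference concrete, here is a small sketch that turns a verbatim line into a rough clean read. It is only an illustration: the filler list is hypothetical, stutters are approximated as immediately repeated words, and a human transcriptionist would make far more careful judgment calls (for example, keeping a legitimate "had had").

```python
import re

# Hypothetical filler list; a real style guide would be longer.
FILLERS = {"um", "uh", "er", "ah"}


def _normalize(token: str) -> str:
    """Lowercase a token and strip punctuation so "Um," matches "um"."""
    return re.sub(r"[^\w']", "", token).lower()


def clean_read(verbatim: str) -> str:
    """Drop filler words and collapse immediately repeated words."""
    cleaned = []
    for token in verbatim.split():
        key = _normalize(token)
        if key in FILLERS:
            continue  # filler such as "um" or "uh"
        if cleaned and key and key == _normalize(cleaned[-1]):
            continue  # simple stutter, e.g. "I I think"
        cleaned.append(token)
    return " ".join(cleaned)


print(clean_read("So, um, I I think captions are uh really important."))
# prints: So, I think captions are really important.
```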

    Make sure the captioning service can provide you with the transcription style of your choice.


    Finally, it’s important to assess a vendor’s ability to maintain accuracy and consistency across many files. When testing out vendors, keep in mind that anyone can produce high accuracy for just a few files. Vendors should be tested with a large quantity of files containing a range of different types of content.


    Your video transcription and captioning vendor should provide you with near-perfect captions. Inaccurate captions could even be detrimental to your accessibility and SEO initiatives.

    Learn more about How to Select the Right Closed Captioning Vendor.

4 Responses to How Accurate is Your Transcription & Subtitling Service?

  1. sam scholz says:

    Video transcription is big, legal-wise, and a hot buzz topic. Not many people do on-the-fly transcripts. If my video provider made transcription available, that would be great.

    Clearly this is a space that is being targeted and pitched by a few hope-to-be major vendors.

  2. ALL of our business and client videos on our site and YouTube were captioned by 3PlayMedia, out of respect for and desire to appropriately reach ADA and ESL audiences while driving SEO.

    You owe it to your audiences to do this, as TED.com discovered as their viewership and SEO exploded when they added interactive transcripts and subtitles in 40+ languages – EXPLODED!

    YouTube’s automatic captions are, in a word, embarrassing. Another word: disservice. Counterproductive.

  3. Michael Tuccillo says:

    Looking for a tool that automatically removes non-verbal words from audio/video. Doing it by hand now. Is there a better automated way to do this?

    • Emily Griffin says:

      What sort of non-verbal words? Do you mean things like ‘um’ and ‘ah’? If so, then our default style of transcription, “clean-read” (as opposed to verbatim), does just that.
