Are Automatically Generated Captions and Transcripts Detrimental to Video SEO?

  • If you upload videos to YouTube, you’ve probably noticed that it automatically adds what it calls “YouTube automatic captions” to them. There’s a reason why Google and YouTube don’t index these captions: the accuracy level is too low. YouTube automatic captions (and any transcripts generated by automatic speech recognition, or ASR) typically provide about 60-70% accuracy, which means that roughly 1 in 3 words is wrong. Accuracy improves with good audio quality and simple content, but worsens with background noise, accents, or multisyllabic words.
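To make the "1 in 3 words" arithmetic concrete, here is a minimal Python sketch of one common way to score a transcript: count word-level errors (substitutions, insertions, deletions) against a human-verified reference and divide by the reference length. The article does not say how the 60-70% figure was measured, so treating accuracy as one minus the word error rate is an assumption, and the function names and sample sentences are purely illustrative.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level edit distance: substitutions + insertions + deletions."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the ASR
                            curr[j - 1] + 1,      # word inserted by the ASR
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - (errors / reference length), floored at 0."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1.0 - word_errors(reference, hypothesis) / ref_len)


# Hypothetical example: what was said vs. what an ASR engine might produce.
reference = "captions help search engines understand your video content"
hypothesis = "captions help surge engines understand you're video contempt"

errors = word_errors(reference, hypothesis)
accuracy = word_accuracy(reference, hypothesis)
print(f"{errors} errors, {accuracy:.1%} accuracy")
```

In this made-up sentence, three of the eight words are wrong, which puts it at 62.5% accuracy, inside the 60-70% range quoted above. Note that two of the three wrong words ("search" and "content") are exactly the kind of terms a page would want to rank for.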

    As we said, Google and YouTube don’t index YouTube’s automatic captions. However, there has been a trend of people downloading, renaming, and re-uploading the automatic captions to their YouTube videos so that search engines will recognize the captions as user-uploaded and begin to index them. On non-YouTube videos, users may similarly upload ASR-generated transcripts thinking that it will help their video SEO. After all, even if the transcript isn’t that accurate, it will add density to your keyword strategy, right?

    We’re going to go over why that’s not necessarily true, and why it might even be detrimental to your video SEO strategy (and to accessibility) to use ASR transcripts.

    Automatic Speech Recognition Is Detrimental to Video SEO

    Google’s Definition of Pure Spam

    One reason automatic captions might be detrimental to your video SEO strategy is that Google could mark your page as spam. “Pure spam” is the label Google gives to websites that display what it deems to be the most offensive spam tactics. Google marks a website as pure spam if the “site appears to use aggressive spam techniques such as automatically generated gibberish, cloaking, scraping content from other websites, and/or repeated or egregious violations of Google’s Webmaster Guidelines.”

    The critical part of this definition is the phrase “automatically generated gibberish.” Because about 1 in 3 words in an automatically generated transcript is wrong, and because ASR tends to be spectacularly wrong when it errs, publishing automatically generated transcripts on your website risks having Google identify them as automatically generated gibberish.

    If Google labels your site as pure spam, there is very little you can do to redeem your standing. Chances of recovery, even after addressing the issue, are slim, as the violation is so massive in Google’s eyes that they are unlikely to grant re-inclusion. Google in fact deems pure spam to be such an egregious violation that it has a live feed of sites recently removed from appearing in search results.

    If ASR produces low accuracy for your transcripts, take this as a warning: it’s not worth it! Publishing inaccurate transcripts could be a serious blow to your video SEO, and to your site as a whole.

    Will Automatic Captions Make You Rank for the Wrong Terms?

    If Google hasn’t marked your video page or your site as spam, it’s probably because your ASR is good enough to get through its spam filters. If you have good audio quality, if your content is relatively simple, and if your speaker has a clear voice without an accent, the ASR might have managed a higher accuracy rate. However, that doesn’t mean that your transcript is going to help you rank.

    Automatic speech recognition is never going to be perfect. Even if you tweak your content to get a very high ASR accuracy rate – say around 80-90% – it’s likely that the words that are wrong are the most important words in your transcript. The whole point of video SEO is to give Google additional, relevant information to better understand the content of your video. If the transcript is inaccurate, the terms you are ranking for will be inaccurate, as well.

    Are Automatic Captions Good Enough for Accessibility Requirements?

    Another thing to be aware of is that automatic speech recognition does not provide a high enough accuracy rate to be considered compliant with accessibility requirements. If your content is subject to the rules of the ADA, CVAA, Rehabilitation Act, or FCC regulations, then you will need more than ASR to provide accessible captions and transcripts.

    The FCC’s quality standards for captioning state, “In order to be accurate, captions must match the spoken words in the dialogue, in their original language (English or Spanish), to the fullest extent possible.” The standards go on to say that captions must use proper spelling, spacing between words, capitalization, and punctuation – and that in order for captions to be accurate, they must convey the tone and intent of the content. It is obvious, if you have ever seen ASR-generated captions, that they do not properly convey the tone or intent of the content, nor do they display accurate spelling, punctuation, or wording.

    While the ADA and Rehabilitation Act do not specify accuracy standards for captioning, they both require that captioning ensure effective communication. Accurate captions (with an accuracy rate of at least 99%) are the only way to ensure that those who are deaf or hard of hearing can understand video content. In fact, most accessibility advocates would argue that ASR-generated captions are actually detrimental to accessibility, just as they are to SEO.

2 Responses to Are Automatically Generated Captions and Transcripts Detrimental to Video SEO?

  1. James Rae says:

    Not sure where you got the 60/70% caption accuracy rate. As a Deaf person that is the biggest load of crap. Test it out your self. Turn the sound off and use the captions only.

    • Emily Griffin says:

      The lack of accuracy in automatic captioning is definitely frustrating, and even Google (the original source of our stat) admits that it’s not good enough. The 60-70% accuracy rate is an average. For some videos with one speaker talking clearly and slowly with high quality audio, captions are more accurate. But if there are overlapping speakers, background noise or music, mumbling speakers, speakers with a thick accent, etc., then accuracy is far below 60%.

      Check out this (captioned) video of an interview with YouTube regarding their autocaptioning accuracy: http://www.3playmedia.com/resources/video-highlights/#video-tutorial-1/48
