How to Extract Intelligent Keywords from a Video Transcript to Boost SEO
Updated: January 4, 2018
There are several known ways to use a video transcript to increase SEO. In past blog posts we discussed how to maximize the impact of a transcript on a web page and how to use an interactive transcript to improve video engagement and retention. We also described how video captions help with YouTube discoverability and viewership.
Another way to leverage video transcripts is to extract keywords and use them for video sitemap tags, YouTube tags, and on-page SEO. Keywords can also be used to view a transcript in “scan view” mode, which increases the font size of more important words, as in the example below.
New Approach to Keyword Extraction Using Natural Language Processing
A common, easy strategy to automatically generate keywords from a video transcript is to simply find the words that appear most frequently, while ignoring a known list of common but not-too-interesting words (called “stop words”), such as “and” and “the”. While this is a good starting point, our approach goes much further by using natural language processing to prioritize words based on their linguistic importance in the transcript. Here is how it works:
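For comparison, the frequency-based baseline described above can be sketched in a few lines of Python. This is only an illustration: the stop-word list is a tiny made-up sample, and the function name `frequent_keywords` is hypothetical, not part of any product.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "from", "we"}

def frequent_keywords(transcript, top_n=5):
    """Return the top_n most frequent non-stop words with their counts."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

sample = "The transcript is the text of the video. We extract keywords from the transcript."
print(frequent_keywords(sample, top_n=3))
```

A baseline like this treats every occurrence equally, which is why it tends to surface words that are frequent but not especially meaningful.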
Step 1 – Determine the Most Important Words Using Heuristics
The first step is to determine which words are more likely to be important to the transcript using several heuristically defined features of the kind often used in supervised machine learning. For example, proper nouns, words near the beginning and end of a document, and unique numbers (such as calendar years) can be given high priority. These results are treated only as suggestions at this point; no hard decisions are made other than to eliminate stop words.
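A minimal sketch of this kind of heuristic scoring is shown below. The specific features, weights, and thresholds are assumptions chosen for illustration (the post does not give them), and the proper-noun check is a crude capitalization test rather than real part-of-speech tagging.

```python
import re

# Tiny illustrative stop-word list (an assumption).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "from", "we"}

def initial_scores(transcript):
    """Assign each non-stop word a rough prior score from simple heuristics."""
    tokens = []  # (token, is_first_token_of_sentence)
    for sentence in re.split(r"[.!?]+\s*", transcript):
        for j, tok in enumerate(re.findall(r"[A-Za-z']+|\d+", sentence)):
            tokens.append((tok, j == 0))
    n = len(tokens)
    scores = {}
    for i, (tok, starts_sentence) in enumerate(tokens):
        word = tok.lower()
        if word in STOP_WORDS:
            continue  # the only hard decision made at this stage
        score = 1.0
        if tok[0].isupper() and not starts_sentence:
            score += 1.0  # capitalized mid-sentence: possibly a proper noun
        if re.fullmatch(r"(19|20)\d{2}", tok):
            score += 1.0  # looks like a calendar year
        if i < 0.1 * n or i >= 0.9 * n:
            score += 0.5  # near the beginning or end of the document
        scores[word] = max(scores.get(word, 0.0), score)
    return scores

sample = "We released captions for YouTube in 2018. The captions help viewers find every video."
print(initial_scores(sample))
```

The output is a dictionary of suggested starting scores, which later steps are free to override.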
Step 2 – Let the Words Vote
Next, we determine which words are important to the transcript by letting the words themselves “vote” in an algorithm called TextRank (Mihalcea & Tarau, 2004; see References). This algorithm is derived from PageRank, the method that Google uses to rank web pages in its search results. Essentially, each unique word has a fixed number of votes that it can split among other words in the transcript. How a word splits its votes is determined by the proximity of words to each other in a sentence. So, for example, in a document that repeatedly mentions “generating keywords”, the root words “generate” and “keyword” would vote heavily for each other in addition to receiving votes from surrounding words. Votes between these two keywords would also be counted from similar phrases such as “keyword generation”, “generating suggestions for keywords”, and so on, because the root words “keyword” and “generate” occur close to each other.
Each word is allowed to vote as many times as it gets voted for. This way, words that are voted for by other keywords are more likely to be counted as keywords themselves. This may seem a little circular, and it is. We start with the initial scores computed in Step 1, then use an iterative algorithm to adjust the scores until the final result satisfies this rule.
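The voting procedure can be sketched as a small weighted-graph iteration in the style of TextRank. This is a simplified illustration, not the production implementation: it skips stemming (so “keyword” and “keywords” are treated as different words), and the window size, the damping factor of 0.85, and the fixed iteration count are assumptions. The optional `initial` argument stands in for the Step 1 scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "from", "we"}

def build_graph(transcript, window=2):
    """Link words that occur within `window` positions of each other in a sentence."""
    graph = defaultdict(lambda: defaultdict(float))
    for sentence in re.split(r"[.!?]+", transcript.lower()):
        words = [w for w in re.findall(r"[a-z']+", sentence) if w not in STOP_WORDS]
        for i, w in enumerate(words):
            for v in words[i + 1 : i + 1 + window]:
                if v != w:
                    graph[w][v] += 1.0  # closer, repeated pairs accumulate weight
                    graph[v][w] += 1.0
    return graph

def textrank(graph, damping=0.85, iterations=50, initial=None):
    """Iteratively redistribute scores along edges until they settle."""
    scores = {w: (initial or {}).get(w, 1.0) for w in graph}
    for _ in range(iterations):
        new_scores = {}
        for w in graph:
            # Each neighbour v splits its current score among its own
            # neighbours in proportion to edge weight.
            received = sum(
                scores[v] * graph[v][w] / sum(graph[v].values())
                for v in graph[w]
            )
            new_scores[w] = (1 - damping) + damping * received
        scores = new_scores
    return scores

sample = "Generating keywords helps SEO. Keywords improve video SEO. Video keywords matter."
ranked = sorted(textrank(build_graph(sample)).items(), key=lambda kv: -kv[1])
print(ranked[:3])
```

Because a word's outgoing votes are scaled by its own current score, a word that collects many votes also has more influence on its neighbours, which is the circularity described above.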
We also determine which of the highest-scoring words make up consecutive phrases. So, continuing our example, the phrase “keyword generation” would be counted as a single phrase, while “generating suggestions for keywords” would only count toward the individual keywords and not any larger keyphrase (unless “suggestion” was also a highly voted keyword).
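One simple way to find such phrases is to scan each sentence and join runs of adjacent keywords. The sketch below assumes the keyword set has already been chosen from the top-scoring words; the function name `merge_keyphrases` is hypothetical, and how a phrase's score is derived from its member words is left out because the post does not specify it.

```python
import re

def merge_keyphrases(transcript, keywords):
    """Join runs of two or more adjacent keywords into keyphrases."""
    phrases = set()
    for sentence in re.split(r"[.!?]+", transcript.lower()):
        run = []
        # The trailing None is never a keyword, so it flushes the final run.
        for word in re.findall(r"[a-z']+", sentence) + [None]:
            if word in keywords:
                run.append(word)
            else:
                if len(run) > 1:
                    phrases.add(" ".join(run))
                run = []
    return phrases

text = "Keyword generation is useful. Generating suggestions for keywords takes time."
keywords = {"keyword", "generation", "generating", "keywords"}
print(merge_keyphrases(text, keywords))
```

In this example only “keyword generation” is merged, because “generating” and “keywords” in the second sentence are separated by words that are not in the keyword set.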
Step 3 – Give Extra Weight to Unique Words
Lastly, in order to select the keywords and keyphrases that are most specific to the transcript, we score the resulting keywords and keyphrases against a reference. The reference is a large collection of English documents that can be restricted to a specific field or category. Phrases that are very common in the reference are voted down relative to their given score, while phrases that are less common in the reference are voted up. This gives higher priority to phrases that are specific to the transcript, and lower priority to general phrases.
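The exact weighting formula is not given here, but the idea resembles inverse document frequency. The sketch below is an assumption-laden illustration: it multiplies each score by a smoothed IDF factor, uses naive substring matching to count reference documents, and the function name `reweight_by_reference` is hypothetical.

```python
import math

def reweight_by_reference(scores, reference_docs):
    """Scale each phrase score by how rare the phrase is in reference_docs."""
    n_docs = len(reference_docs)
    lowered = [doc.lower() for doc in reference_docs]
    adjusted = {}
    for phrase, score in scores.items():
        # Naive substring match; a real system would tokenize the reference.
        doc_freq = sum(1 for doc in lowered if phrase in doc)
        # Smoothed inverse document frequency: the factor is 1.0 for a phrase
        # found in every reference document and grows as the phrase gets rarer.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        adjusted[phrase] = score * idf
    return adjusted

reference = ["A video about cooking.", "Another video on travel.", "Notes on gardening."]
scores = {"video": 1.0, "keyword generation": 1.0}
print(reweight_by_reference(scores, reference))
```

With equal starting scores, “keyword generation” ends up ranked above “video” because it never appears in the reference collection.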
Final Word Scores
The final keyword scores tell us how important each word or phrase is to the transcript. After your videos have been processed, you can download keywords and keyphrases and their respective scores by selecting the Cloud transcript format, as shown below.
References – Mihalcea, R., & Tarau, P. (2004, July). TextRank: Bringing order into texts. In Proceedings of EMNLP (Vol. 4, No. 4, p. 275).