Transcript Alignment Webinar Recap: Answers, Tips and Demonstration
Yesterday, our Customer Happiness Manager, Dave Zylber, and our VP of Research & Development, Roger Zimmerman, conducted a webinar with our sales and development teams on the ins and outs of our automated transcript alignment service. In the summary below, they answer some of the most frequently asked questions about transcript alignment and share best practices for submitting transcripts for alignment.
Who Should Use Automatic Transcript Alignment?
Automatic transcript alignment is a great option for customers who already have transcripts for their video content. After transcripts are synchronized, customers can download captions and transcripts in a variety of formats, create translations and multilingual subtitles, and use the interactive transcript and captions plugins.
What Is the Difference Between Standard Captioning/Transcription & Automatic Transcript Alignment?
Our transcription and captioning service assumes you do not currently have a transcript or captions. When a video or audio file is uploaded for transcription, our system combines an automated process, called automatic speech recognition (ASR), with human editing to generate a near-perfect transcript. The alignment service, on the other hand, assumes you have a video and a transcript that is not synchronized, or time-stamped, to the audio. Our transcript alignment process is 100% automated, matching the words of your transcript to the speech in your video. Since transcript alignment doesn’t go through any human clean-up, it is important that your transcript file is formatted correctly to achieve the best results.
Transcript Alignment Best Practices
The duration of each file submitted for alignment should not exceed 2 hours.
Text Formatting Requirements
.TXT file: All transcripts must be submitted as unformatted plain text (.TXT) files (no tables, HTML markup, MS Word formatting, etc.).
Text and audio congruence: All text in the transcript must correspond to the spoken audio. Text that does not correspond to the spoken audio interferes with the alignment process. It is best to remove all annotations, directions, and other extraneous information. Also, large chunks of text that do not correspond to the spoken audio or, conversely, significant sections of spoken audio that do not correspond to the text will likely throw off the synchronization.
Speaker IDs: Speaker IDs must be ALL CAPITAL LETTERS followed by a colon. For example, SPEAKER 1: or BOBBY:
Non-speech elements: Occasional non-speech elements (e.g. [CLAPPING], [MUSIC PLAYING]) should not cause a problem.
Paragraph breaks: The processing assumes that each and every line-feed sequence in the transcript is intended to be a paragraph break. If this is not your intention, you must manually remove these line-feeds from your input transcript prior to upload.
Drag and drop the transcript: When uploading transcripts directly from your account interface, it is best to drag and drop the .TXT transcript file instead of pasting the text, which can introduce encoding errors.
FTP or API is better for large files: For large or numerous alignment submissions, FTP or the API is the best way to upload files.
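The speaker ID and paragraph break rules above can be checked before upload with a small script. The following is only a minimal sketch, not part of the alignment service: the function name prepare_transcript and both regular expressions are hypothetical, and it assumes that blank lines in your source transcript mark the paragraph breaks you actually intend. It joins hard-wrapped lines so that each remaining line-feed is a real paragraph break, and it flags labels that look like speaker IDs but are not in all capital letters.

```python
import re

# Assumed shape of a valid speaker ID: capital letters, digits, spaces, then a colon.
SPEAKER_ID = re.compile(r"^[A-Z][A-Z0-9 .'-]*:")
# Anything that looks like a short label followed by a colon, in any case.
ANY_CASE_LABEL = re.compile(r"^([A-Za-z][A-Za-z0-9 .'-]{0,30}):")


def prepare_transcript(raw: str) -> tuple[str, list[str]]:
    """Join hard-wrapped lines into paragraphs and flag non-capitalized speaker IDs.

    Assumption: blank lines in the input mark the intended paragraph breaks.
    """
    # Normalize Windows and old Mac line endings to plain line-feeds.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = []
    warnings = []
    for block in re.split(r"\n\s*\n", text):
        # Join wrapped lines so no stray line-feed remains inside a paragraph.
        paragraph = " ".join(line.strip() for line in block.split("\n") if line.strip())
        if not paragraph:
            continue
        label = ANY_CASE_LABEL.match(paragraph)
        if label and not SPEAKER_ID.match(paragraph):
            warnings.append(f"Speaker ID not in capitals: {label.group(1)!r}")
        paragraphs.append(paragraph)
    # One line-feed per paragraph, since every line-feed is read as a paragraph break.
    return "\n".join(paragraphs), warnings


if __name__ == "__main__":
    sample = "BOBBY: Hello and\nwelcome everyone.\n\nBobby: Thanks\r\nfor joining.\n"
    cleaned, problems = prepare_transcript(sample)
    print(cleaned)
    print(problems)
```

The warnings are only hints: a paragraph that begins with an ordinary word and a colon (for example "Note:") is flagged too, so review the list by hand rather than rewriting the labels automatically.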
Audio Quality and Content
Overlapping speakers: Overlapping speech is difficult to align because the transcript cannot accurately show the order in which the words were spoken.
Low-quality audio: Low-quality audio, which is usually reflected in the file's audio difficulty rating, can interfere with text alignment. See more information on audio difficulty.
Accents: Non-native accents work well as long as the audio quality is good.