Quick Start to Captioning [Transcript]

LILY BOND: Welcome, everyone, and thank you for joining this webinar entitled “Quick Start to Captioning.” My name is Lily Bond, and I’m the Marketing Manager here at 3Play Media. My presentation will last about 20 minutes, and we’ll save approximately 10 minutes at the end for questions.

So before we get started, I have a quick poll question for everyone. So the poll should read, “How do you foresee your captioning needs changing in the next year?” And you can select increasing significantly, increasing moderately, staying the same, or decreasing. So it looks like most people are expecting some increase, which is consistent with trends in captioning, for sure.

So I’m quickly going to go over an agenda, and then we’ll dive right in. I’m going to begin with some captioning basics, go over the benefits of captioning, go through the applicable accessibility laws, go through some of our captioning services and tools, and then, again, we’ll have about 10 minutes for Q&A at the end. And as always, feel free to type those into the window throughout the presentation.

So I’ll take it from the beginning. What are captions? Captions are text that has been time-synchronized with the media so that it can be read while you’re watching a video. And captions assume that the viewer cannot hear the audio at all, so the objective here is not only to convey the spoken content but also to convey any sound effects, speaker identification, and other non-speech elements.

Basically, the objective is to convey any sound that’s not visually apparent but that is integral to the plot. For example, you would definitely want to include the sound effects “keys jangling” if you hear the sound behind a locked door because it’s important to the plot development that someone is trying to get in. But you wouldn’t include it if it’s the sound of keys jangling in someone’s pocket while they’re walking down the street.

Captions originated in the early 1980s as a result of an FCC mandate specifically for broadcast television. Now, as online video becomes more and more an everyday part of our lives, the need for web captions has greatly expanded, and it continues to expand. As a result, captions are being applied across many different types of devices and media, especially as people become more aware of the benefits, and as laws become increasingly stringent.

So first, let’s go through some terminology to make sure that we’re all on the same page. The difference between captions and a transcript is that a transcript is not synchronized with the media. On the other hand, captions are time-coded so that they can be displayed at the right time while watching a video. For online media, transcripts are sufficient for audio only content, but captions are required anytime there’s a video component.

So the distinction between captions and subtitles is that captions assume that the viewer cannot hear, whereas subtitles assume that the viewer can hear but cannot understand the language. So that’s why captions include all relevant sound effects. Subtitles are really more about translating the content into a language that the viewer can understand.

The difference between closed and open captions is that closed captions allow the end user to turn the captions on and off. In contrast, open captions are burned into the video and cannot be turned off. With online video, you’ll usually see closed captions.

Post production versus real-time refers to the timing of when the captioning is done. Real-time captioning is done by live stenographers, whereas post production captioning is done offline and takes a few days. And there are advantages and disadvantages to each of these.

There are a lot of different caption formats that are used with specific media players. On the left, you’ll see a list of some of the more common caption formats and where you might need to use them. And then the image on the top right shows what a typical SRT caption file looks like. That’s the kind of caption file you would want to use for a YouTube video player, for example. And you can see that it has three caption frames, and each caption frame has a start time and an end time, followed by the text that appears in that time frame. And at the bottom right is an SCC file, which uses hex frames.
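[Editor's illustration] To make the SRT structure described above concrete, here is a minimal Python sketch that writes caption frames in the SRT layout: a frame number, a start and end time, and then the text shown during that time. The `CaptionFrame` class, the helper names, and the sample text are assumptions made up for this example. They are not taken from the slide or from any 3Play Media tool.

```python
from dataclasses import dataclass


@dataclass
class CaptionFrame:
    """One caption frame: when it appears, when it disappears, and its text."""
    index: int
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(frames: list[CaptionFrame]) -> str:
    """Serialize frames as SRT: index line, timing line, text, blank line."""
    blocks = [
        f"{f.index}\n"
        f"{to_srt_timestamp(f.start)} --> {to_srt_timestamp(f.end)}\n"
        f"{f.text}\n"
        for f in frames
    ]
    return "\n".join(blocks)


def to_transcript(frames: list[CaptionFrame]) -> str:
    """Drop the timing to get a plain transcript (not synchronized)."""
    return " ".join(f.text for f in frames)


if __name__ == "__main__":
    # Hypothetical sample frames, including a non-speech sound effect.
    sample = [
        CaptionFrame(1, 0.0, 2.5, "LILY BOND: Welcome, everyone."),
        CaptionFrame(2, 2.5, 5.0, "[KEYS JANGLING]"),
        CaptionFrame(3, 5.0, 8.0, "Thank you for joining us."),
    ]
    print(to_srt(sample))
    print(to_transcript(sample))
```

The same frames produce either a time-coded caption file or, with the timing dropped, a plain transcript, which is the distinction drawn earlier between captions and transcripts.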

So once a caption file is created, it needs to be associated with the corresponding video file. The way to do that depends on the type of media and video platform that you’re using. For sites like YouTube, all you have to do is upload the caption file for each video, and we call that a sidecar file. In other cases, like for iTunes, you actually need to encode the caption file onto the video.

And then another way to associate captions with the video is with open captions. I mentioned these when I talked about caption terminology, but again, those are burned directly into the video and cannot be turned off. And if you’re using one of the video platforms that we are partnered with such as Brightcove, Mediasite, Kaltura, or Ooyala, then this step becomes trivial because it all happens automatically.

So let’s talk about some of the benefits of captioning. The primary purpose of captions, and transcripts, obviously, is to provide accessibility for people who are hard of hearing. 48 million Americans experience hearing loss, and closed captions are the best way to make media content accessible to them. Outside of accessibility, though, people have discovered a number of other benefits to closed captioning.

So one of those benefits is that closed captions provide better comprehension to everyone. The Office of Communications in the UK conducted a study where they found that 80% of people who were using closed captions were not actually deaf or hard of hearing. So closed captions really provide increased comprehension in cases where the speaker has an accent, if the content is difficult to understand, if there’s background noise, or the viewer knows English as a second language. And captions also provide flexibility to view videos in noise-sensitive environments like offices, libraries, or gyms.

Captions also provide a strong basis for video search. And there are certain plug-ins that we offer that can make your videos searchable. People are used to being able to search for a term and go directly to that point, and that’s what our interactive transcript lets viewers do within a video. And I’ll go through that more later on.

For people who are interested in SEO, which is Search Engine Optimization, closed captions provide a text alternative for spoken content. Because search engines like Google can’t watch a video, this text is the only way for them to correctly index your videos. So Discovery Digital Networks did a study to see the impact of captions on their SEO, and they actually found that adding captions to their YouTube videos increased their views by 7.3%.

Another benefit of captions and transcripts is their reusability. University of Wisconsin found that 50% of their students were actually repurposing video transcripts as study guides, so they make a lot of sense for education. You can also take a transcript from a video and use it to quickly create infographics, white papers, case studies, and other docs.

So of course, once you have a caption file in English, you can translate that into foreign languages to create subtitles, which will make your video accessible to people on a more global scale.

And finally, captions may be required by law. And I’m going to dive into the federal accessibility laws right now.

So the first big accessibility law in the US was the Rehabilitation Act of 1973. And in particular, the parts that apply to captioning are Sections 508 and 504. Section 508 is a fairly broad law that requires federal communications and information technology to be accessible for government employees and the public. So this is where closed captioning requirements come in. Section 504 is basically an anti-discrimination law that requires equal access for disabled people with respect to electronic communications.

So Section 504 applies to both federal and federally funded programs, and Section 508 applies only to federal programs. However, any states that receive funding from the Assistive Technology Act are required to comply with Section 508. So often, that law will extend to state-funded organizations like colleges and universities because most states do receive funding from the Assistive Technology Act.

The Americans with Disabilities Act is a very broad law that is made up of five titles. It was enacted in 1990, but the ADA Amendments Act of 2008 expanded and broadened the definition of disability. So Title II and Title III of the ADA are the ones that pertain to video accessibility and captioning. Title II is for public entities, and Title III is for commercial entities. And this is the area that has had the most legal activity.

Title III requires equal access for places of public accommodation. So the gray area here is what constitutes a place of public accommodation. In the past, this was applied to physical structures (so, for example, requiring wheelchair ramps), but recently, that definition has been tested against online businesses.

One of the landmark lawsuits that happened a couple of years ago was the National Association of the Deaf versus Netflix. So the National Association of the Deaf sued Netflix on the grounds that a lot of their streaming movies didn’t have captions, and they cited Title III of the ADA. One of Netflix’s arguments was that they do not qualify as a place of public accommodation, but the courts ended up ruling in the end that Netflix does qualify.

So they ended up settling, and now Netflix has captioning on close to 100%, if not 100%, of all of their content at this point. So the interesting thing to come out of this case is that Netflix was considered a place of public accommodation, which sets a very profound precedent for the ADA’s application to the web and to online content, including for places like colleges and universities.

So another lawsuit to note is that Harvard and MIT were sued in February by the National Association of the Deaf for discriminating against the deaf and hard of hearing by not providing captioning for their online content. And the decision in this case will have huge implications on higher education.

An interesting thing to note is that edX, which is the online video platform launched by Harvard and MIT in 2012, recently entered into a separate settlement agreement with the Department of Justice. And so in that agreement, the Department of Justice, which has the duty of enforcing the ADA, required edX to provide accessibility measures for their online content, including closed captioning. And this settlement could play a large role in the current lawsuit since it indicates that the Department of Justice believes that online content is subject to the ADA.

And there are a couple of other ADA cases that don’t have decisions yet. One is against FedEx and another against Time Warner. And these decisions will further shape the scope of the ADA.

So the CVAA is the most recent accessibility act, and it was passed in October of 2010. It requires captioning for all online video that previously aired on television. So for example, this applies to publishers like Netflix and Hulu, or any network websites that stream previously aired episodes online. There are a lot of upcoming FCC updates to this law, but the biggest one is that starting in 2016, clips from television programs must be captioned when they go online. So for example, a two-minute excerpt from a show that you can view online would need captions. And with the CVAA, the copyright owner bears the responsibility for captioning.

So in February of 2014, the FCC came out with specific quality standards for captions, which was the first time that legal standards had been placed on things like accuracy. This applies to broadcast captions and online video that previously appeared on television, but they’re a good standard for all captions to follow.

So there are four parts. They’re all fairly self-explanatory. Captions must match the spoken words and include pertinent nonverbal information. They must coincide with the spoken words. They must run from start to finish. And they must not obscure important onscreen content.

In a documentary, for example, if the name and the occupation of the speaker are written on the bottom of the screen, the captions should be moved. Our solution to that is vertical caption placement, where we move captions to the top of the screen when we detect important information, and then we move the captions back once we’ve detected that that information is gone.

To talk a little bit about our company, 3Play Media is an MIT spinout. We’re still based in Cambridge, Massachusetts. For the last seven years, we’ve been providing captioning, transcription, and subtitling services to over 1,000 customers in higher education, government, enterprise, and media and entertainment.

Our goal is really to simplify the process of captioning and of transcription, which can be a barrier for a lot of people. We have a really user-friendly online account system, and offer fast and reliable turnaround, with a lot of options in terms of turnaround time. We have integrations with most of the leading video platforms and players, which can automate the process to make captioning even easier. And as I mentioned earlier, there are a lot of caption formats. We offer over 50 different output options, so you should never run into a problem there.

We also recently released a feature that allows you to import existing captions and subtitles, which gives you access to all of our tools. And these tools include various video search plugins that make your videos searchable and interactive, which I’ll go over more soon. We also now provide closed captioning and transcription for Spanish source content. And if you’re subject to the FCC’s rules, the FCC does have the same captioning requirements for Spanish and mixed English-Spanish content as it does for English content.

So accuracy is very important to us. We comply with all of the FCC’s quality standards, and have developed very strict best practices for our transcriptionists. We use a multi-step review process that delivers over 99% accuracy, even in cases of poor audio quality, multiple speakers, difficult content, or accents.

So typically, 2/3 of the work is done by a computer, and then the rest is done by transcriptionists. This makes our process more efficient than other vendors, but more importantly, it affords our transcriptionists the flexibility to spend more time on the finer details. For example, we diligently research difficult words or names or places, and we also put more care into ensuring correct grammar and punctuation.

We’ve also done a lot of work on the operational side of the business so that we can now actually match transcriptionists’ expertise to certain types of content. We have over 800 transcriptionists on staff, and they cover a broad range of disciplines. For example, if you send us tax-related content, we can match that content with a transcriptionist who has a financial background.

And of course, without exception, all of our work is done by professionally trained transcriptionists in the US. Every transcriptionist goes through a rigorous training program before they ever touch a real file, and they also go through a background check and enter into a confidentiality agreement.

Once your account is set up, the next thing is to upload your video content to us. There are a lot of different ways to do that. You can use our secure uploader in the account system, you can use FTP, our API, or you can use one of our integrations.

Our account system is all web-based, and there’s no software to install. And as I mentioned a little earlier, we have flexible turnaround options so that you can actually select the turnaround that you need. If it’s urgent, we have options for that, or if you have a more relaxed deadline, we have options for that as well.

We’ve also built integrations with the leading online video platforms and lecture capture systems, including Brightcove, Mediasite, Kaltura, Ooyala, YouTube. You can see a lot of the logos on the screen there. And if you’re using one of these platforms, then the process is even further simplified.

So these are some of the 50-plus output formats that we offer. After you’ve uploaded your media, it goes into processing. And when your captions are ready, you’ll receive an email alert.

You can log in to your account and download your files in many different transcript and caption formats. There’s no limit to the number of downloads or the number of formats, and there are a lot of features in the account system that you’ll have access to at this point. One of these features is the Captions Editor, which is an editing interface that lets you make changes to your transcripts or captions. And when you finalize your edits, they propagate to all outputs and plug-ins without having to reprocess that content.

So if you already have captions or subtitles, we have a new feature called Caption Import, which is a monthly subscription where you can import your captions and have access to all of our tools and plug-ins like the interactive transcripts. And you can also translate those captions into multilingual subtitles, convert them into other formats, and securely manage your assets. If you already have transcripts, you can use our automated transcript alignment service to create time-coded captions for your videos. Then you’d have access to all of the same tools as with our captioning service, like the translation and interactive transcripts.

One of our plug-ins which is available to you and included in the cost of captioning is the interactive transcript, which I have mentioned several times at this point. Basically, this is a time-synchronized transcript that highlights the words as they’re spoken in the video. You can click anywhere in the transcript to jump directly to that point in the video, or search for a term within the transcript and go to that point in the video. This is really popular and engaging for all viewers. It really appeals to the modern tendency to be able to search and find something immediately, and that’s a big limitation in video when it doesn’t have something like this associated with it.
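[Editor's illustration] The search-and-jump behavior described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: each word of the transcript is assumed to carry a start time, and the `TimedWord` class and both function names are invented for this example. It does not show how the actual interactive transcript plug-in is implemented.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One transcript word and the time (in seconds) at which it is spoken."""
    text: str
    start: float


def find_term(words: list[TimedWord], term: str) -> list[float]:
    """Return the start time of every occurrence of a one- or multi-word term."""
    needle = [w.lower() for w in term.split()]
    if not needle:
        return []
    # Compare case-insensitively and ignore punctuation attached to words.
    cleaned = [w.text.lower().strip(".,!?;:\"'") for w in words]
    hits = []
    for i in range(len(cleaned) - len(needle) + 1):
        if cleaned[i:i + len(needle)] == needle:
            hits.append(words[i].start)
    return hits


def word_index_at(words: list[TimedWord], t: float):
    """Return the index of the word to highlight at playback time t, or None
    if playback has not reached the first word. Assumes words are in time order."""
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1
    return i if i >= 0 else None
```

A player would seek to one of the times returned by `find_term` when the viewer picks a search result, and would call something like `word_index_at` as the video plays to decide which word to highlight.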

So interactive transcripts are really popular in education because it helps so much for students when they’re studying. For instance, imagine that a student wants to find a specific section of an hour-long lecture that they know is important for their exam. Rather than struggling to find that exact section, they can just search for the term and go directly to that point. They can also print or download the transcripts and highlight important sections to study later.

While we’ve built a lot of tools that are self-service and automated, much of our success is based on the fact that we give all of our customers lots of attention. We expect to walk people through the account tools, and we really enjoy building relationships with people. In December, we did a survey with our current customers, and the word cloud on the right highlights the most common terms mentioned. And as you can see, support stands out as one of the key aspects that our customers are happy with.

So before we start questions, I have one more poll question for you, which should read, “What is your greatest barrier to implementing captioning?” And the possible answers are cost and budget, resource time, technical challenges, or not sure I need to. So no surprises. Cost and budget is always at the top of that list. But resource time and technical challenges are big things to keep in mind as well. So with that, we’ll start answering some questions.

So I see a question here about how an integration works, specifically how a YouTube integration works. So our integrations are really simple. Basically, you would go into our account system.

So for a YouTube integration, you would log in to your 3Play Media account, select Upload, and then select Linked Account. Click YouTube. It’ll ask you to log in with your YouTube credentials.

And then all of your YouTube videos will appear in your 3Play Media account. And you can just select the videos that you want captioning for, submit them, and then when the captions are ready, we’ll automatically post them back to your YouTube videos. So you never have to worry about downloading caption files or what formats you might need. We’ll just automatically take care of that, and your captions will appear on your YouTube videos.

So a question here about how much our captioning service costs. Our pricing is all duration-based, so it’s based on the length of the video content that you submit to us. You can find our full pricing information on our website. And again, there are always volume discounts if you have a lot of content.

So a couple of questions about how we handle different types of content. So if the content is technical or if the speaker has an accent, how do we handle that? Good question.

We have a number of systems in place to handle technical content, as well as accents. So as I said, first, everything is going to go through automatic speech recognition. But then we do a lot with our editors to ensure that you have an accurate output.

So we have transcriptionists (as I said, over 800 of them) from a lot of different backgrounds. They go through really rigorous training. And what we can do is basically pair transcriptionists with types of content that they’re familiar with. So if you have financial content in your video, maybe we have a transcriptionist who used to be a banker. And so they do a really good job at pairing transcriptionists with content that they will be successful with.

And then another thing that we do is we have a really thorough QA process. So once your transcriptionist has gone through and edited your captions, we actually have a third level of review where a QA person will go in and research any terms that they weren’t sure about, research any things that were flagged, and just make sure that you have a perfect output.

And then another thing that we offer is that you can upload a cheat sheet with your video. So if you know that the speakers in your video have difficult names or if there are specific places that you know the spellings of, you can just upload that with your video, and it’ll help your transcriptionist out a lot in creating an accurate product.

So then the other thing we can do is that the editing interface which I showed earlier will help you. If there is a small term that was misspelled, if it was a name that we didn’t catch, you can go in and edit that yourself and just finalize the file, and we’ll reprocess that and propagate it to all of your outputs.

So a question here. “Do you do transcription in Spanish?” We do. We recently released Spanish captioning and transcription. It’s very similar to our English captioning and transcription product, but for Spanish source content. And so it would work the same way, where you would just select when you upload that the file is in Spanish rather than in English.

A question here about information on the accessibility laws in Canada. So Ontario has a very strict accessibility law called the AODA. And with that one, starting in January of 2014, all organizations, both public and private, with more than 50 employees must conform to WCAG 2.0 level A. And then there’s a series of deadlines that phase in over the next several years. So we have a white paper that details the specifics of the AODA, which you can download on our website for free, which might be a little bit more helpful with the specifics of laws in Canada.

So another question here is, “Is there a certain format that transcripts need to be in to utilize automated transcript alignment?” We just need a text file for that. You would just upload the plain text, and we would be able to process that.

A question here, “Is the vertical caption placement done automatically, or would we provide instructions to place the captions in a certain segment at the top of the screen?” So our vertical caption placement is all automated. We actually just analyze the pixels on each frame. And if we detect that there’s text on the screen, we’ll move the captions to the top of the screen when they obstruct that key text. And then when we detect that that element is gone, we’ll move the caption frame back to the bottom of the screen.
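[Editor's illustration] The placement decision described in this answer can be sketched as follows. The pixel analysis itself is not shown. The sketch simply assumes that some earlier detection step has already produced the time intervals during which on-screen text sits in the lower part of the frame. The `Caption` class, the function names, and the `"top"`/`"bottom"` labels are all assumptions for illustration, not the actual 3Play Media implementation.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds
    end: float    # seconds
    text: str


def overlaps(a_start: float, a_end: float, b_start: float, b_end: float) -> bool:
    """True if the two time ranges share any span of time."""
    return a_start < b_end and b_start < a_end


def place_captions(
    captions: list[Caption],
    lower_text_intervals: list[tuple[float, float]],
) -> list[str]:
    """Return 'top' for captions that would cover detected on-screen text,
    and 'bottom' (the default position) for all others."""
    placements = []
    for c in captions:
        obstructed = any(
            overlaps(c.start, c.end, s, e) for s, e in lower_text_intervals
        )
        placements.append("top" if obstructed else "bottom")
    return placements
```

Because each caption is checked on its own, captions return to the bottom of the screen as soon as they no longer overlap a detected interval, which matches the move-up, then move-back behavior described above.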

A question here about captioning YouTube videos that you did not create. We had a great webinar recently on copyright and fair use for captioning third party videos, which I highly recommend. You can find that on our webinars page.

But the short answer is that, yes, you can. So rather than uploading a file, you can insert a link to the YouTube video, and we will create captions for it. And then you can just use our captions plug-in, which is a single-line embed code that you would associate with the video embed, and that way you can view the captions for the video without having to republish the YouTube video.

I think we have time for one more question. It looks like someone is asking about the Section 508 refresh. So Section 508 was recently put before the Federal Register. And basically, it’s looking to update Section 508 accessibility requirements to include some more strict and specific accessibility measures for web and online content.

The proposed rule was published in February, and it is open for comment through the end of May. Once they’ve gotten all of the comments, then they can publish the final rule. But it’s unlikely that that will happen this year. But it’s a good thing to keep an eye out for, because the refresh would involve compliance with WCAG 2.0 standards, which are a lot more stringent than what we need to comply with now. And WCAG 2.0 is an international accessibility standard. And if you want to learn more about that, we have a webinar coming up on the 21st, which I highly recommend that you register for.

So that’s about all we have time for today. Thank you all so much for joining us. Have a great day.