Quick Start to Captioning [Webinar Transcript]
TOLE KHESIN: All right. So we’ll go ahead and get started. Thanks again for attending this webinar on closed captioning. My name is Tole Khesin.
We have about 30 minutes to cover the basics of captioning. We’ll try to make the presentation about 20 minutes, and leave the rest of the time for your questions. The best way to ask questions is to type them in the question window in the bottom right corner of your control panel. We’ll keep track of them and address them all at the end. Also, feel free to email or call us anytime after this webinar.
One thing to note is that this webinar is being recorded. And you will receive a link, probably tomorrow, to watch the recorded version of this webinar. And it will have captions and a searchable, interactive transcript.
So today we’re going to give you an overview of closed captioning, including some of the applicable legislation. We’ll talk about the services that we provide, and go over the process and workflow. So I’d like to start off with a little bit of global and national accessibility data from the World Health Organization and the US Census Bureau.
There are some pretty interesting trends that came out of these reports. One is that there are now over one billion people in the world who have some form of disability. Nearly one in five Americans aged 12 and older experience hearing loss severe enough to interfere with day-to-day communication.
And one of the most interesting conclusions that came out of this is that the number of disabled people is increasing rapidly and disproportionately with population growth. And you might ask, why is that happening? There are actually a number of reasons, but the main factor is medical and technological advancement.
For example, the survival rate for premature babies has increased significantly, which is great. But the side effect is that more babies are being born with disabilities. We’re also coming out of a decade of war. And as a result of modern armor, soldiers are 10 times more likely to survive an injury than in previous wars. And again, this is a very good thing. But it also means that they’re more likely to sustain an injury such as a hearing loss.
So all of this points to the fact that accessibility is a critical issue that will become even more prevalent in the years ahead. And captioning is obviously an important part of this, which is why we’re talking about it today.
So we’ll take it from the very beginning. What are captions? Captions are text that has been time-synchronized with the media so that it can be read while watching a video. Captions assume that a viewer can’t hear the audio at all. So the objective is not only to convey the spoken content, but also the sound effects, speaker identification, and other non-speech elements. Basically, the objective is to convey any sound that’s not visually apparent, but integral to the plot.
Captions originated in the early ’80s, as a result of an FCC mandate specifically for broadcast television. But now with the proliferation of online video in pretty much every facet of our lives, the need for web captions has expanded greatly. And as a result, captions are being applied across many different types of devices and media, especially as people become more aware of the benefits, and as laws become increasingly more stringent.
So we’ll talk a little bit about the terminology. So captions versus a transcript– so the difference there is that a transcript is not synchronized with the media. On the other hand, captions are time-coded, so they can be displayed at the right time while watching the video.
For online media, transcripts are sufficient for audio-only content, but captions are required anytime there’s a video component. And this doesn’t necessarily mean that there has to be a moving picture. For example, a slide-show presentation with an audio track would require captions.
Captions versus subtitles– the distinction here is that subtitles assume that the viewer can hear, but just can’t understand the language, whereas, captions– as I mentioned before– assume that the viewer can’t hear anything. So subtitles do not have the non-speech elements. And they’re usually associated with translation to a different language.
Closed versus open captions– so the difference is that closed captions allow the end user to turn the captions on and off. In contrast, open captions are burned into the video and can’t be turned off.
With online video, people have really moved away from open captions for many different reasons. The workflow is a lot more complicated. Open captions can obstruct important information in the video. And you have to retain multiple versions of the media assets. So for all of these reasons, people are really moving towards closed captions.
Post-production versus real-time– this really just refers to the timing of when the captioning is done. Real-time captioning is done by live stenographers, whereas, post-production captioning is done offline and usually takes a few days. And there are advantages and disadvantages to each type of process.
So we’ll talk a little bit about the accessibility laws. First of all, Sections 508 and 504– these are both from the Rehabilitation Act of 1973. Section 508 is a fairly broad law that requires that federal communications and information technology be accessible to employees and the public. For video, as I mentioned before, this means closed captions. For podcasts or audio-only content, transcripts are sufficient.
Section 504 basically has the same effect, but it has a bit of a different angle. It’s an anti-discrimination law that requires equal access for disabled people with respect to electronic communications. Both of these laws apply to all governmental agencies and certain public colleges and universities that receive federal funding, such as through the Assistive Technology Act.
Also, many states have enacted their own laws that mirror these federal laws. The ADA, or the Americans with Disabilities Act, is a very broad law that is comprised of five sections. Title II and Title III are the ones that pertain to video accessibility and captioning.
Title II is for public entities. And Title III is for commercial entities. And the latter is the one that has had the most legal activity. So Title III requires equal access. The way it’s written, it requires equal access for places of public accommodation.
And the gray area here is what constitutes a place of public accommodation. In the past, this was something that typically was applied to physical structures– for example, requirements for wheelchair access. But recently, that definition has been tested against online businesses.
And there has been some interesting case law that has really expanded that definition. In the recent case of NAD– that’s the National Association of the Deaf– versus Netflix, the court ruled that Netflix did, in fact, qualify as a place of public accommodation. And there is an ongoing lawsuit of the Greater Los Angeles Agency on Deafness versus Time Warner that cites the ADA because CNN’s videos lack captions in some cases.
The Americans with Disabilities Act was enacted in 1990. In 2008, it was expanded. And that expansion broadened the definition of what it means to have a disability.
So the most recent accessibility law is the CVAA, which is an abbreviation for the 21st Century Communications and Video Accessibility Act. It was passed in October 2010. And this law requires captioning for all online video that previously aired on television. It applies to publishers like Netflix and Hulu, and pretty much anybody publishing video content on a website if that content also aired on TV.
In February of this year, the FCC came out with very specific quality standards for captions. And I don’t want to get into this too much in this webinar. We actually have another webinar that you can view on our website that goes into a lot more detail about captioning quality, best practices and standards.
But just a quick overview. This FCC ruling was comprised of four parts. Number one, caption accuracy requires that the spoken words be transcribed very accurately. It does allow some leniency for live captioning. But for post-production captioning, it requires the maximum accuracy possible.
The second part was caption synchronization– this basically requires that the text be aligned very precisely with when it’s spoken. Program completeness– so there had been some complaints that captions sometimes dropped off before the program actually ended. And so now captions are required to run all the way from the beginning to the end of a program.
And the last piece of this is on-screen caption placement. Sometimes captions– which usually appear in the bottom third of the screen– tend to obstruct on-screen text or graphics that are crucial to the video. And in that case, those captions need to be relocated. So we actually provide an automated service that will relocate captions if the algorithm detects that there’s on-screen text at the bottom that could be obscured.
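To illustrate the idea behind automated caption relocation, here’s a minimal sketch. The region coordinates and overlap test are hypothetical simplifications for illustration, not 3Play’s actual algorithm:

```python
# Illustrative sketch (not 3Play's actual algorithm): if detected on-screen
# text overlaps the default bottom-third caption region, move the caption
# to the top of the frame. Regions are (y_top, y_bottom) fractions of
# frame height, with 0.0 at the top of the frame.

def place_caption(onscreen_text_regions, bottom=(0.8, 0.95), top=(0.05, 0.2)):
    """Return a caption region that avoids overlapping on-screen text."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]
    if any(overlaps(region, bottom) for region in onscreen_text_regions):
        return top   # relocate upward to avoid obscuring burned-in text
    return bottom    # default: lower third of the frame

# A lower-third graphic occupies y = 0.75..0.9, so the caption moves up.
print(place_caption([(0.75, 0.9)]))
```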
So a little bit about the benefits. The primary reason that people and companies add captions is as an accessibility accommodation for people with hearing disabilities. And this is obviously critical. But there are also a number of other reasons why companies decide to add captions. I just wanted to quickly run through these.
So it turns out that there was a study done in conjunction with the BBC. And they learned that 80% of the people who use captions actually aren’t deaf, and don’t have any hearing disabilities at all. They do it for a number of other reasons, but mainly because it helps them to understand the content better. In particular, for people who know English as a second language, it really helps to have captions turned on.
Captions also provide flexibility to view in a noise-sensitive environment, like a library or workplace, or if you’re using a computer that doesn’t have speakers. Captions can also be leveraged to make the videos searchable. And we provide a number of these plug-ins, an interactive transcript, and other tools that basically leverage the timed text data to let users search through the video and navigate to a specific point just by clicking on a word.
So the other piece of it is search engine optimization and discoverability. This is a very popular reason to transcribe and caption video content, because search engines like Google can’t actually watch a video. They know that a video is there, and they know the title of the video, sometimes they might know the tags, but they don’t know anything else about it.
And if you expose a transcript, it just provides the search engines with a much deeper and broader understanding of what that video is about. And so consequently, it will rank for more keywords in search results. And of course lastly, transcription and captioning is a source for translation, if that’s something that you’re interested in doing.
So a little bit about caption formats. There are many different types of caption formats. And here is an example, in the top right corner, of an SRT caption file. It’s a very simple caption format that’s very popular for web media.
What we see here are three caption frames. In the first one it says, “Hi, I’m Arne Duncan, the Secretary of Education.” And right above that, you can see there’s a time code. That’s when this caption frame appears. And then it disappears at three seconds. And then the next caption frame appears.
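To make the SRT structure concrete, here’s a minimal sketch in Python that builds a caption frame like the one described. The half-second start time is an assumption for illustration; the transcript only mentions the frame ending at three seconds:

```python
# Minimal sketch of an SRT caption frame. Each frame is: a sequence number,
# a "start --> end" time code line, the caption text, and a blank line.
# The start time below is an assumed value, purely for illustration.

def srt_frame(index, start, end, text):
    """Format one SRT caption frame as a string."""
    return f"{index}\n{start} --> {end}\n{text}\n"

frame = srt_frame(1, "00:00:00,500", "00:00:03,000",
                  "Hi, I'm Arne Duncan, the Secretary of Education.")
print(frame)
```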
So that is a common format. Below that is an SCC example. This is something that’s a lot more complicated and has a lot more capabilities beyond the SRT example. And also, it’s not even text. If you open it up, it’s actually hex code, so you can’t really read it. But this just shows you the different flavors of caption formats that are out there.
As for how you associate the caption file with the video file, there are a few different ways of doing it. With web media, the most common way is a sidecar file. What that means is that your caption file is a separate file from your video file. And your video player just references the caption file in order to display the captions in the player.
Another way to do it is to encode the closed captions with the video. And that’s a service that we offer as well. So you would have a video file where the captions are embedded as a separate track in the video file.
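As a sketch of what encoding captions as a separate track can look like with open-source tooling (ffmpeg and its mov_text subtitle codec are an assumption here, a common do-it-yourself route rather than the encoding service being described):

```python
# Sketch: embedding a sidecar SRT file as a closed-caption track in an MP4
# using ffmpeg. This is an assumed open-source approach, not necessarily
# how the vendor's encoding service works internally.
import subprocess

cmd = [
    "ffmpeg",
    "-i", "video.mp4",      # source video
    "-i", "captions.srt",   # sidecar caption file
    "-c", "copy",           # copy audio/video streams untouched
    "-c:s", "mov_text",     # encode the subtitles as an MP4 text track
    "video_captioned.mp4",
]
# subprocess.run(cmd, check=True)  # uncomment to run (requires ffmpeg)
print(" ".join(cmd))
```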
And then, lastly, open captions. So this is where you take the captions and you actually burn them into the video, so that they’re on all the time. And the user can’t turn them on or off. And that’s something that we can provide as well.
So a little bit about 3Play Media. So we focus on captioning and transcription and subtitling. This is our bread and butter. This is basically all we do. And we’re going on six years doing this.
We were born out of MIT. We’re based in Cambridge, Massachusetts. We work with over 800 customers in higher ed, media and entertainment, enterprise, government, all kinds of different companies.
Apart from the standard services of captioning, and subtitling, and transcription, we’ve built, pretty much, a comprehensive suite of tools and capabilities around these services. For example, we have a transcript alignment service where, if you have a video and you already have a plain transcript, you can upload that along with your video. And we use an automated process to automatically align that transcript and produce captions in all these different formats and give you the capability of using the video search plug-ins as well.
A little bit about accuracy and quality. So we use a multi-step review process that delivers more than 99% accuracy, even in cases of poor audio, multiple speakers, difficult content or accents. The way that it works is that, when we receive video, we first put it in speech recognition, which gets it to a point where it’s about 70% accurate on average.
We then have a professional transcriptionist who will go in and clean up the mistakes left behind by the computer. And subsequently, we’ll have another person who will go in and double-check grammar and punctuation, research difficult words, and just do an overall review. And by the time we’re done with it, it’s pretty much a flawless output that’s perfectly synced with the video. And all the work is done in the US.
Although it’s rarely necessary, occasionally an error will sneak in, especially with names of people or places if they’re a little obscure. So what we’ve developed in the account system is a captions and subtitles editor. Once your file has been processed, you can just go into this editor and make the change. There’s even a Find and Replace feature in there, if you want to use that.
And as soon as you click Save, that will propagate to all the output files. You don’t need to reprocess anything. That will change everything. And if you’re using any of the integrations– which I’ll talk about in one second, but let’s say you’re using the YouTube integration– it will actually resend that captions file to YouTube and update it over there as well. So it’s a pretty neat feature and pretty unique in this industry.
One thing we’ve found is that, although we’ve built a lot of self-serve tools, much of our success as a company is based on the fact that we give all of our customers lots of attention. And we expect to walk people through the account tools. And we enjoy building relationships with people. So support is something that’s really, really important to our business.
So with uploading files, we provide a number of different ways to do that. Once your account is set up, you can upload your video files using the web uploader. You can use FTP. You can use the API, if you want to automate that part of the workflow.
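As a hypothetical sketch of what automating uploads through an API might look like, the endpoint URL, field names, and auth scheme below are made up for illustration; the real API will differ, so consult the vendor’s API documentation:

```python
# Hypothetical sketch of automating video uploads via an HTTP API.
# API_URL and the Bearer auth scheme are invented for illustration only.
import urllib.request

API_URL = "https://api.example.com/v1/files"   # hypothetical endpoint

def build_upload_request(video_path, api_key):
    """Build (but don't send) an upload request for one video file."""
    with open(video_path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(API_URL, data=body, method="POST")
    req.add_header("Authorization", f"Bearer {api_key}")  # hypothetical auth
    req.add_header("Content-Type", "application/octet-stream")
    return req
```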
And we also have built-in integrations with all of the leading video platforms and lecture capture systems, including Brightcove, Mediasite, Kaltura, Ooyala, YouTube. Those are a few, but there are probably about a dozen there. And so we aim to make the captioning workflow as unobtrusive as possible.
We give you the ability to automate much of the workflow. And our captions and tools are compatible with most video players. And another thing to note is that the account is all web-based. And there’s no software to install.
So transcript alignment, this is something that I briefly mentioned. If you have a transcript along with the video, you can just upload that. And we’ll create all the captions and provide all the same outputs as if you had transcribed the video from scratch, but at a fraction of the cost. We produce over 50 different output formats.
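To illustrate the idea behind transcript alignment, here’s a simplified sketch: it transfers word-level timestamps from rough speech-recognition output onto a clean transcript using sequence matching. A production aligner is far more robust; this just shows the concept:

```python
# Illustrative sketch of transcript alignment: given rough ASR output with
# word-level timestamps and a clean human transcript, transfer the time
# codes onto the clean words wherever the two word streams line up.
import difflib

def align(asr_words, clean_words):
    """asr_words: list of (word, seconds). Returns (clean_word, seconds)
    for every clean word the matcher can line up with the ASR stream."""
    matcher = difflib.SequenceMatcher(
        a=[w for w, _ in asr_words], b=clean_words)
    aligned = []
    for a_start, b_start, size in matcher.get_matching_blocks():
        for k in range(size):
            aligned.append((clean_words[b_start + k],
                            asr_words[a_start + k][1]))
    return aligned

# "im" vs "i'm" is an ASR error, so that word gets no timestamp here.
asr = [("hi", 0.5), ("im", 0.9), ("arne", 1.2), ("duncan", 1.5)]
clean = ["hi", "i'm", "arne", "duncan"]
print(align(asr, clean))
```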
We also have an algorithm that extracts intelligent keywords. And we also provide automated vertical caption placement. This relates back to that FCC requirement where you have to move the caption frame so as not to obstruct any on-screen text. And then also, as I mentioned, we are able to provide these formats either as a sidecar file, or we can encode them in an M4V file.
If translation is something that you’re interested in, that is something that is integrated in the account. So once your files have been captioned, you can basically just select which languages you want to translate them to. You click Go. And then a few days later, or whenever you specify, those translations will be ready in whatever languages you need. And they’ll show up as subtitles. And they’ll be available for download in all those different formats.
This is something that a lot of video publishers are adding, either below or to the side of their videos. It shows you a much larger box with the transcript in it. And users can search through it. They can click on any word to jump to that exact point in the video.
The neat thing about these tools is they let you search not only within one video– but let’s say you have a library with hundreds of thousands of videos. These tools allow you to search across the entire library and jump to a very specific point of when that keyword was spoken. Very useful.
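A minimal sketch of that cross-library search, assuming each video’s captions are stored as time-coded text frames (the data layout here is an assumption for illustration):

```python
# Sketch of searching a library of time-coded transcripts: given a keyword,
# return (video_id, seconds) for every caption frame that mentions it, so
# a player can jump straight to the point where the word was spoken.

def search_library(library, keyword):
    """library: {video_id: [(seconds, text), ...]} -> list of hits."""
    keyword = keyword.lower()
    hits = []
    for video_id, frames in library.items():
        for seconds, text in frames:
            if keyword in text.lower():   # case-insensitive match
                hits.append((video_id, seconds))
    return hits

library = {
    "webinar-101": [(0.5, "Hi, I'm Arne Duncan"),
                    (42.0, "captions and subtitles")],
    "webinar-102": [(10.0, "closed captions for web video")],
}
print(search_library(library, "captions"))
```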
Last thing. This is a service that we started about a year ago. It’s called Captions For YouTube. This is a separate website, captionsforyoutube.com. It’s really just, we think, the easiest and fastest way to add captions to your YouTube videos.
Basically, you just log in with your YouTube credentials. You select which videos you want to have captioned. You press a button, and then everything is done automatically.
YouTube sends us the videos. We process them. We send the captions back. They just show up. On this particular site, everything is just optimized for YouTube.
Great. So at this point, I wanted to open it up for questions. As I mentioned, you can type your questions in the Questions window in the lower right corner of the control panel. There are a few resources here on this page that might be useful for you.
The top one will point you to a number of different video tutorials, FAQs, webinars, and how-to guides. The second link is for our online support and documentation site. There are literally hundreds of articles there about all the different transcription, captioning, and subtitling capabilities. And then that last link is for Captions for YouTube, which is what I just showed you a screenshot of.
Great. So we will just take a minute here to aggregate the questions, and we’ll be right back. Great. So thanks, everyone. So we have a number of questions here that we’ll go through.
So the first one is about the vertical caption placement. So which caption formats support vertical caption placement? So the formats that we currently support for that are WebVTT, SMPTE Timed Text, SCC, STL, and DFXP. So those are the five formats. You can get that information on our website.
Yes. So there’s a question here about Captions for YouTube and how that process differs from our standard process. So it’s actually exactly the same process. With Captions for YouTube, it’s basically just an interface on a separate website. All of the transcription and captioning processing is done by 3Play Media.
So when you upload captions at captionsforyoutube.com, it comes to 3Play Media. We process them. And then we send it back to YouTube.
3Play Media also has the same bi-directional integration in place with YouTube. But it’s just that a lot of people don’t need all of the other capabilities that are available through 3Play Media. They just want to caption YouTube videos. And that’s why we set up that site. But you do have all the same features available through 3Play Media.
While we’re on this topic, there’s another question about our integrations– specifically, how our integration with Kaltura works. And it works exactly the same way as with YouTube. Basically, you link your Kaltura account with your 3Play Media account.
And from that point on, you actually don’t even need to ever go into your 3Play Media account again. From within Kaltura, you just add tags to the videos that you want to have captioned. So you add a tag of 3Play to whatever videos you want to have captioned. And that’s really all you have to do.
Kaltura will send us those videos. We’ll create the captions. And then we’ll send them back. And then they’ll just show up. So the whole workflow is completely automated. And the process is basically identical in Brightcove, Ooyala, YouTube, Mediasite, Echo 360, and all of these video platforms and lecture capture systems.
So there’s a question here about translation and subtitling. And the question is, are your translations done with the video, or from the transcript alone? The way that works is we create captions, and we break the transcript up into caption frames using optimal linguistic standards. And then we’ll take those caption frames and submit them to the translator for translation.
And so basically, each caption frame will correspond to a subtitle frame. So to answer your question directly, in the translation process we use the transcript, although sometimes the translator may refer to the video for context, if necessary.
There’s a question here about whether this webinar is being recorded. And yes, it is being recorded. And everyone that registered for this webinar will receive a link to be able to watch this recorded version.
So there are a few questions here about captioning standards and how we create those caption frames and split up the captions on the screen. So this is something that we have put a lot of time into and have developed a lot of technology around. We have incorporated a lot of natural language processing, in order to make those caption frames as optimized as possible for the English language.
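As a highly simplified sketch of caption frame segmentation: real segmentation uses natural language processing and linguistic cues, as described above, while this version just packs words greedily up to a maximum frame length:

```python
# Highly simplified sketch of splitting a transcript into caption frames.
# Real segmentation uses linguistic cues (clause boundaries, speaker
# changes); this version just packs words greedily up to max_chars.

def split_into_frames(words, max_chars=32):
    """Group a list of words into caption-frame strings of bounded length."""
    frames, current = [], ""
    for word in words:
        candidate = (current + " " + word).strip()
        if len(candidate) > max_chars and current:
            frames.append(current)    # close the full frame
            current = word            # start a new one with this word
        else:
            current = candidate
    if current:
        frames.append(current)
    return frames

text = "Hi, I'm Arne Duncan, the Secretary of Education."
print(split_into_frames(text.split()))
```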
As for embedding the interactive transcript plug-in, the process is very similar to the way that you embed a video player. You embed the video player, then you embed this plug-in. And it will communicate with the video player and display the captions at the right time. It’s also searchable, and it supports multiple languages.
So there is a question here about speaker identification. And yes, we definitely support speaker identification. That’s something that you can control through the settings in the account system. We definitely account for different speakers. And there are different ways, different standards that we can use to display them.
Great. So I think that brings us to the end of our webinar. I wanted to thank everyone for joining us today. There are a few questions that we couldn’t get to in time, but we’ll definitely reach out offline.
As I mentioned before, a recording of this webinar with captions will be available tomorrow. And you’ll receive an email with a link to watch it. Thanks again. And I hope you have a great day.