Quick Start to Captioning – Webinar Transcript

JOSH MILLER: Hey, everyone. Thanks for joining us today. We are going to be talking about closed captioning today. My name is Josh Miller. We have about 30 minutes to cover the basics of closed captioning. The presentation itself is probably only going to be about 15, 20 minutes.

We’ll leave the rest of the time for questions. And the best way to ask questions is by typing them into the window on the bottom right corner of your control panel. We’ll keep track of them and address all the questions at the end. Certainly feel free to email us or get in touch with us directly if you have any specific questions that we don’t get to today. And I also just want to make sure that the sound is working all right as we’re getting going.

So for the agenda today, we’re going to, like I said, give an overview of closed captioning for web video. We’re going to talk a little bit about some of the applicable legislation. And we’re going to talk about some of the services we provide and talk about the actual process step by step.

So real quick, just what are closed captions? Captioning refers to the process of taking an audio or video track and transcribing it to text. Then we synchronize that text with the media. So closed captions are typically located underneath a video or overlaid on top. In addition to spoken words, captions convey all meaning and include sound effects. And this is a key difference from subtitles that we’ll talk about.

Closed captions originated in the early 1980s as the result of an FCC mandate that applied to broadcast television. And now that online video is rapidly becoming the dominant medium, captioning laws and practices are proliferating there as well. We’ll go over some key terminology here.

First, captioning versus transcription. A transcript is usually a text document without any timing information. On the other hand, captions are time synchronized with the media. You can make captions from a transcript by breaking up the text into smaller segments called caption frames and synchronizing each one of those segments with the media. And that way, each frame is displayed at the right time.
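As an illustration of the step just described, here is a minimal Python sketch of turning timed words into caption frames. It assumes word-level timings are already available (for example, from an alignment step); the `Word` and `Frame` types, the `words_to_frames` name, and the six-words-per-frame limit are hypothetical choices made for the example, not a description of any particular captioning system.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the media
    end: float


@dataclass
class Frame:
    text: str
    start: float
    end: float


def words_to_frames(words, max_words=6):
    """Group consecutive timed words into caption frames.

    Each frame starts when its first word starts and ends when its
    last word ends, so it is displayed while those words are spoken.
    """
    frames = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        frames.append(Frame(
            text=" ".join(w.text for w in chunk),
            start=chunk[0].start,
            end=chunk[-1].end,
        ))
    return frames
```

A plain transcript would be only the joined text; the `start` and `end` fields are what make each segment a caption frame.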

Next, captioning versus subtitling. The key difference between captions and subtitles is that subtitles are really intended for viewers who do not have a hearing impairment, but who may have an issue understanding the language. Subtitles are usually going to capture all the spoken content but not sound effects, and that’s because the expectation is that the person can hear. They just need to have it translated into another language. For web video, it’s possible to have multilingual subtitles and have multiple tracks for a given video.

Closed versus open captioning. The difference between closed captions and open captions is that closed captions specifically can be turned on and off by the viewers themselves. They can choose whether to show the captions. Open captions are burned into the video and can’t be turned off at all. Most web video players that you see online today can support closed captions, which is usually the preferable method.

And then post production versus real-time. Post production means that the captioning process itself occurs offline or after the video content’s been produced. It could take a few days to complete, depending on the situation, whereas real-time captioning is done by live captioners. And that means that the news or any kind of live broadcasting that you see with captions, that’s done by someone who’s usually a captioner or a stenographer who’s typing in real-time. And certainly, there are advantages or disadvantages of either process.

So captions are used in a number of different ways, clearly, and they’re being applied across many different types of media. And certainly as people become more aware of some of the benefits, and some of the laws are becoming more stringent, we’re seeing the use increase quite a bit as well.

So let’s talk a little bit about the accessibility laws that come into play. Section 508 is a fairly broad law that requires all federal electronic and information technology to be accessible to people with disabilities, including employees and the public. For video, this means that captions must be added. And for podcasts or audio files, a transcript is usually sufficient because the text doesn’t necessarily have to coincide with the visual.

Section 504 entitles people with disabilities to equal access to any program or activity that receives federal subsidy. Web-based communications for educational institutions and government agencies are covered by this as well. And Section 504 and 508 are both from the Rehabilitation Act of 1973, although Section 508 wasn’t added until the ’80s. But it’s important to note that many states have enacted legislation that’s very similar to Section 508 or 504. It might be known by a different name, and each state may have slightly different rules as to how content is handled.

Next is the ADA. That’s the Americans with Disabilities Act, which is from 1990. That covers federal, state, and local jurisdictions, and it applies to a range of domains, including employment, public entities, telecommunications, and places of public accommodation. The ADA Amendments Act of 2008 broadened the definition of disability to actually make it more in line with Section 504.

Now, the ADA is interesting because the NAD, the National Association of the Deaf, sued Netflix successfully. And the argument was that the ADA would apply to Netflix because they have such a wide-ranging audience. Netflix argued that the ADA should not apply because, based on the way the law is written, really it should only apply to places of public accommodation, meaning physical places of public accommodation.

But the ruling was that that’s exactly what Netflix is. They are a place of public accommodation and therefore must be accessible to everyone. So they were forced to add captions to quite a bit of their content.

When it comes to reading into this a little bit more, it’s actually a bit difficult, because the explanation was left pretty vague as to what this means. But we can certainly assume that if Netflix is considered a place of public accommodation, that means that web entities that are heavy in content are probably going to start to be looked at in a similar vein. And the laws may start applying in ways that aren’t quite as literal as we may have expected.

The 21st Century Communications and Video Accessibility Act was signed into law in October of 2010, and this law is often referred to as the CVAA. This law expands closed caption requirements from broadcast that we originally talked about, and it’s basically stating that all online video that previously aired on television must be accessible and closed captioned. And there’s a timeline based on the type of content that’s up online and the different milestones that have to be hit. So for example, if a show was on television and then is put up in its entirety online, it must have closed captions as of today. There are different rules for different types of content, such as archival programming or clips, and the timeline is different for each type of content.

So accessibility as a whole is actually a growing concern, which is pretty interesting in this case. So this is some data from the 2011 WHO Report on disability. And it states that more than a billion people in the world today have a disability of some sort. And nearly one in five Americans age 12 and older experience hearing loss severe enough to interfere with day-to-day communication, which is a much higher number than certainly I would expect.

The other interesting thing that’s happening here is just how much these numbers are growing. And so you kind of start to ask why, and you can draw some conclusions. And a lot of it has to do with just some of the medical and technological advances that we’ve seen over the last few years. For example, nowadays it’s far more common for babies that are born premature to be able to survive and have mostly normal lives. These are great things, but it also does mean that there is an increased rate of disabilities.

We’re more likely to survive car accidents. We have an aging population. These are all good things, but there are realities that come with that. The next part is we’ve been at war for more than a decade now, and with modern armor, soldiers are actually 10 times more likely to survive injuries than in the past. Again, this is a very good thing, but it brings to light other realities that we have to deal with.

Let’s talk about some of the benefits other than the obvious, having closed captions for the deaf and hard of hearing. Captions really do make a big difference for people who are viewing the content and speak English as a second language, for example. Captions and text in general improve comprehension and really do remove language barriers. It allows people to go at their own pace, really, if you’re giving them the text.

Captions can also compensate for poor audio quality or noisy backgrounds, even if the background issue isn’t necessarily in the video but maybe where you are. Maybe you’re in a sound-sensitive environment, like a workplace or a library. Having the captions, in some cases, becomes a necessity.

Search engine optimization becomes a real benefit when you have closed captions because a search engine can’t really listen or understand what’s in a video. It needs to be given the text to index what’s actually being said. Next, once the video’s been found, captions allow for that particular video to be searched and reused. This is especially important with long-form video. So if you’re looking for something very specific within a one-hour piece of content, you can actually search through the text rather than using that scrubber bar to try to find the exact segment and figure out where the segment is that you’re looking for.

The text being synchronized with the video actually becomes a very valuable tool to navigate and jump to different parts of the video. And we even have a number of interactive tools that provide exactly that, use the text to navigate to a specific point in a video. And then certainly, the other part that’s really interesting is if you have a global audience and you need to translate your content, the first thing you need to do is actually create the captions in English. And that becomes the stepping stone to subtitles in other languages.

So there are a number of different caption formats that are used, and it really depends on the type of media player you’re using. And each media player might take a slightly different format, which is really just the template that the file takes on. But what you see here– and this is an example of an SRT file, which is a pretty commonly used file format– is that the key information is the timing and the text itself. And any of these formats you see listed all have that same structure. There’s the time information and the text. Just the template or the code or however it’s constructed is what’s going to change from one file format to another.
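To show how little beyond timing and text an SRT file contains, here is a short Python sketch that serializes caption frames into SRT. Each SRT entry is a sequence number, a line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, the caption text, and a blank line. The function names and the `(start, end, text)` tuple layout are assumptions made for the example.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(frames):
    """Serialize (start, end, text) tuples into SRT-formatted text."""
    blocks = []
    for index, (start, end, text) in enumerate(frames, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # A blank line separates consecutive entries.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([
        (0, 2.5, "Hey, everyone."),
        (2.5, 5, "Thanks for joining us today."),
    ]))
```

Writing another format from the same frames would change only the template around the times and text, which is the point made above.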

So we offer a number of different formats, certainly, even more than what you see on this list. We even have a plug-in that allows you to display captions with media players that don’t have native support for captions. So YouTube, for example, has really, really good captioning support. It’s really easy to upload a caption file for any video that you have on your channel. Whereas Vimeo actually has not built in any captioning support, so you would need another tool to display captions. And that’s what the plug-in that we offer allows you to do. It allows you to tie into that Vimeo player and add a caption track.

So real quick, we’ll give you a little bit of background about 3Play Media. The inspiration for the company started when we were doing some work in the Spoken Language Lab at CSAIL, which is the computer science department at MIT. We were approached by MIT OpenCourseWare with the idea of applying speech technology to captioning for a more cost-effective solution. And we quickly recognized that speech recognition alone wouldn’t suffice, but it did provide an interesting starting point.

So we’ve developed an innovative transcription process that uses both technology and humans to yield high-quality transcripts with time synchronization that can be used as closed captions. And we’re constantly developing new products or ways to use the transcripts or captions. And we really look to our customers for feedback as to what’s useful, how else can we make this output valuable for you. And just quick, we work with a number of different types of organizations– everything from higher ed to entertainment to enterprise and government. We really do have different solutions for different types of organizations, and in each case, it should be relatively easy to tie into what you’re using.

So a quick overview of what we actually do. It’s a combination, obviously, of transcription and closed captioning. We also have other tools, though, such as automated transcript alignment. So if you have a transcript, we can create the closed captions from that transcript and tie it to the media that you have. We do have a translation option as well. So once you’ve created those captions, you can then translate them to create subtitles in other languages.

We have a number of interactive search tools. So I mentioned that idea that once you have the captions and time-synced text, you can use that as a search tool. And so we actually do package that up and provide tools to implement on your site very easily. And then we have a number of integrations and an API that make the workflow itself as easy as possible.

So every file goes through a three-step process. The idea is that the speech recognition will get used first, but it gets reviewed and completely cleaned up by a human and then another human QA check. And we’ve found that, on average, probably about 2/3 of the work gets done by a machine, whereas the rest is really all cleaned up and made as close to perfect as possible by the transcriptionists who are trained on our system. And so there’s a lot of work that’s gone into this operational process.

And we have hundreds of transcriptionists on staff now who are all trained on our system, and they’re all US-based. And that’s something that we find is really important for quality, that people who are true English speakers are the ones who are transcribing your English content. And there are a number of reasons for that.

One is certainly any colloquial language or more slang-type spoken content. Those types of files are really only going to be able to be completed by someone who really understands it. The other thing is we also can handle accents really well, and that’s because we have a wide range of people who have been exposed to many different types of content.

So everything from accents to different domain types, we can handle all that. And that’s also where, clearly, a machine can’t do it all. If there’s a strong accent, really, you’re going to need to have a human involved.

So as much as we’d like to say it’s going to be incredibly accurate, the reality is that there are going to be times that certain proper nouns or vocabulary can be very difficult to get exactly right. So we’ve built in the ability for you to make changes on the fly yourself. So if a name is misspelled or if you decide you even want to redact a phrase, that’s something that you can do very quickly. Press Save, and off you go. All the files that we offer get updated immediately.

And again, we’ve built a lot of this system to be self service. But the reality is that we expect– and we really think that much of our success is based on the fact that we give our customers a lot of attention. So we expect to walk people through the account tools. We enjoy building the relationships, and it’s that feedback that we get that really allows us to build out the new features, new product ideas. So feedback is something that we take really seriously.

Account setup with us is very, very quick. You can pay by credit card. You can be invoiced. You can institute POs.

We have a number of different security and user role settings built in, so you can give people access who really should only have certain types of access. Maybe someone can only upload. Other people can get billing. All that can be completely customized to the way you need it to work. So the setup process is meant to be very, very quick.

We have a number of ways that you can upload video content to us, everything from a secure web uploader to FTP to an API and then certainly integrations with a number of the leading media platforms. Our idea is to really make the captioning workflow as unobtrusive as possible, and really, we almost, in a way, become a black box. And so we allow you to automate as much of the workflow as you possibly can, and the tools become really compatible with many different video players. So this is a quick view of some of the platforms that we already integrate with, so it’s very easy to pull content out of these systems and then put captions back in pretty much automatically. And this is a list that’s constantly growing as well.

Another service that we offer that I mentioned is this automated transcript alignment. So if you have a transcript, it’s very easy to select that option and add a transcript to your video, and then we will synchronize it for you. Since it’s an automated service, it’s actually very quick. You’ll have it back within less than 24 hours at most, often back in about an hour or two. So that’s a very quick and cost-effective method if you already have a transcript.

This is a view of what it might look like if you want to download one of your files. In fact, I think we even offer more versions now than what you see here. But basically, this is a view of all the different closed caption and transcript formats that you can download at any time. So whether you’re using one of the integrated solutions to automate the workflow or if you’re just uploading content manually, you would have the option of downloading any of these formats yourself or over our API any time you want.

Once the captions are complete, you have the option to translate. So you can pick the language you want to translate into. And then you can even pick the price and quality level that fits your needs, so you can keep budget in mind if you need to.

And then as I mentioned, we have this captions plug-in. So this is something that you can embed with a media player on your site at any time, and it basically pulls the caption track into this little window. What’s really cool about this is that you can also search and basically navigate through the video based on the caption text, and that’s all built into the plug-in pretty much by default. And one of the big benefits is it works with a Vimeo player, which doesn’t actually have its own captioning support.

The interactive transcript that we offer basically provides a view of more of the transcript. Again, this is embeddable with many different media players. On a page, each word will highlight as it’s spoken. You can click on a word, jump to that part of the video. You can search within the video based on what’s actually being spoken, and you can even create clips. You could send a unique URL to people, and it would take them to the exact segment that you had selected.

We recently launched captionsforyoutube.com, which is a very, very easy way to tie your YouTube channel in and basically select which videos you want to have captioned and basically just pay with a credit card and automate the whole process. You’ll still get our full service, closed captioning service, where we’ll run it through our standard process. Humans are still going to be involved.

But from your perspective, if you have a YouTube channel, the workflow itself is completely automated for you. And you can basically post those captions back automatically by just signing in with your YouTube account. So it’s very, very easy to use and very streamlined specifically for YouTube content.

So we’re going to take a minute to aggregate some of the questions. If you have questions, now’s a great time to type them in. Again, feel free to reach out directly and ask questions. There are a couple links here that could be helpful as well. So we have a number of resources on our website and our support page that walk you through the captioning process and just provide a number of free resources, so feel free to check those out. But we’ll take one minute, and I will be right back.

So looking at the time, we may go a little bit beyond 2:30. If you need to drop off, that’s fine. But certainly feel free to stay on and ask other questions. So there are a couple questions about downloading content, where it can go, and even content ownership.

So the way we look at it is once we’ve processed content and created captions and transcripts, you, the content producer and the holder of the account, own all that output. That is yours, so you can do whatever you want with it. We store it for you so that you can come back in and download other formats if you need them.

But technically, you own it, and you can do whatever you want with it. You can download it any time. It’s accessible over our API or through the account system. And it’s really up to you as to what you do with it or where you put it. It’s totally in your control.

There are some pricing questions. We have a full pricing breakdown available on our website if you want to check it out. But basically, everything’s usage based, so you pay for however much content we actually process. And then we do offer volume discounts as well. So the more you process, the more you commit to processing, the lower that per minute or per hour rate will become. But the full breakdown and the volume tiers are all listed on our website, so that’s probably the easiest way to take a quick look.

Just a quick reminder. We are going to post a version of the webinar on our website. So if you are interested in looking it over, it will be posted, and you’ll receive a link. Having registered for this webinar, you’ll receive a link to that.

A question about the search that’s built into the interactive transcript. As of right now, it’s built in as a pure text search, so it’s a client-side text search. So it’ll return results based on what appears in the transcript itself. We do have a number of other search tools as well and are building out other search functionality, so we are going to be expanding some of the search capabilities soon.
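A pure text search over time-synced captions can be sketched in a few lines. The Python below is only an illustration of the idea (a real client-side search would run in the browser): it assumes caption frames are `(start, end, text)` tuples, and the `search_frames` name is hypothetical. It returns the start time of each matching frame, which is what lets a player jump to that point in the video.

```python
def search_frames(frames, query):
    """Return (start_seconds, text) for every frame whose text contains the query.

    The match is a case-insensitive substring test, so results reflect
    only what literally appears in the transcript.
    """
    needle = query.casefold()
    return [
        (start, text)
        for start, end, text in frames
        if needle in text.casefold()
    ]


if __name__ == "__main__":
    frames = [
        (0.0, 2.0, "Closed captions can be turned on and off."),
        (2.0, 4.0, "Open captions are burned into the video."),
        (4.0, 6.0, "Subtitles translate the spoken content."),
    ]
    for start, text in search_frames(frames, "captions"):
        print(f"{start:>5.1f}s  {text}")
```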

A good question about the account system, and there are a couple questions about this. One is, first, there’s nothing that you would have to install. Everything is web based, and everything is stored in the cloud. So you don’t have to install anything onto any of your machines.

And you can have multiple users access the same account. They’d have their own login, but you can have it be a shared account. So it can all be very centralized for everyone.

There are a couple of questions also about some of the media platforms we integrate with. Certainly, of the list you saw, those are all live. Plus there are a couple that we’re working on to add very soon, specifically Ensemble and Panopto. So those will be live hopefully very, very soon, but those are in the works.

A good question about copyright issues and commercial content. Basically, we take the stance of we’re not going to distribute the content at all. It’s your content. You have access to it, and we don’t give it to anyone else.

So in terms of copyright issues, things like that, we kind of leave it up to you, to be honest. And so it’s up to you to make sure you’re using the content appropriately. We see this most often in the education space, where maybe a movie’s being shown in a classroom. And the way we look at it, that’s completely reasonable. But ultimately, it would be up to you to make sure you’re not redistributing it in a way you’re not supposed to.

In terms of the translation services, there are some questions about how that works. Basically, we have a couple of translation partners that are tied into our account system. So once you’ve created captions or a transcript, you can very easily select a particular file and then choose to have it translated.

There are a couple things that we do to try to make that process easier. One is the UI itself is very easy to use. You can select whatever language you want and go ahead and have it translated. What will come back are essentially the same output files you see in English. You’ll get all the same options in the other languages that you’ve translated into.

And there are two other pieces to this that we think are pretty important. One is before you actually submit for translation, we have what’s called a translation profile. And we really recommend that everyone fill that out because it’s really a style guide. It’s your way of conveying all the contextual information, style information, and even particular vocabulary that should be translated in a certain way. And that information will be submitted with the file to the translator so they have really full context over how to do the work.

And then after that, when the file comes back, we also have an editing interface. We mentioned the editing interface for English. We also have an editing interface for translated subtitles. And it’d basically be English on one side, the target language on the other side, and you can go in and make changes as you need to. And again, those changes will update automatically on all of the different output files.

So there are some questions about what happens if the audio quality isn’t very good. So we can definitely handle a wide range of audio quality. There are times when, unfortunately, it’s not possible to transcribe. And if it’s helpful, the reality is if you really can’t hear it, there’s probably not much more we can do.

But we do have kind of a cutoff for what we would consider to be normal or good audio and then difficult audio. And that basically means that, for whatever reason– it could be background noise. It could be the person is really far away from the microphone– that we call it difficult. And that means it’s going to take a lot more human effort to transcribe. So we can definitely handle it.

Sometimes, there is a little bit of an extra charge to handle that, but it’s really dependent on the audio quality in the recording as opposed to the type of content. That’s an important distinction. So we can handle lots of difficult content in terms of the vocabulary that’s being used. That’s not what we would consider to be difficult necessarily. What we really care about is the recording quality.

So there are some questions about caption formatting. There is some flexibility in terms of how many lines are displayed per caption frame, how many characters are displayed per caption frame. So there are certain constraints that are built in by default, but there is some manipulation that can take place. And that’s something that definitely can be discussed based on your needs. By default, for what it’s worth, we put two lines of text per caption frame, and we start with the standard 32 characters per line. And that’s part of just the traditional captioning standards.
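The default layout mentioned here, two lines per caption frame and 32 characters per line, can be sketched with Python’s standard `textwrap` module. This is a simplified illustration: the `layout_frames` name is hypothetical, and it breaks only on character counts, without the timing or phrase-boundary considerations a real captioning process would apply.

```python
import textwrap


def layout_frames(text, chars_per_line=32, lines_per_frame=2):
    """Wrap text into lines of at most chars_per_line characters,
    then group the lines into frames of at most lines_per_frame lines."""
    lines = textwrap.wrap(text, width=chars_per_line)
    return [
        "\n".join(lines[i:i + lines_per_frame])
        for i in range(0, len(lines), lines_per_frame)
    ]


if __name__ == "__main__":
    sample = ("Captioning refers to the process of taking an audio or "
              "video track and transcribing it to text.")
    for frame in layout_frames(sample):
        print(frame)
        print("---")
```

Changing `chars_per_line` or `lines_per_frame` corresponds to the kind of flexibility described above.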

Questions about turnaround time. Basically, we have a system that allows you, the user, to choose every time you upload. The standard is four business days, but then we offer different options for one day, two day, same day, which is eight hours. So lots of flexibility in terms of the turnaround options, and that’s really up to you to choose.

So we’re going to call it a day here. Thank you for the questions. If we didn’t get to your questions, please feel free to reach out. We’ll also try to reach out to people if we didn’t answer any questions.

But definitely feel free to be in touch. We’re happy to answer other questions or even talk through the process in more detail if that’d be helpful. So thanks for joining us today, and you’ll receive an email when this is up online for you to view on demand.
