
Quick Start to Captioning [TRANSCRIPT]

KELLY MAHONEY: So with all of that information taken care of, it’s probably best for me to introduce myself next. My name is Kelly Mahoney. I’m a Content Marketing Specialist at 3Play Media, and I create content on all things accessibility and the cutting edge of captioning through platforms like our blog, social media, and presentations like these. Just a quick note on my physical appearance for anyone who may not be able to see me: I’m a young white woman with my brown hair clipped up in a bun, and I’m wearing a white collared shirt.

So let’s get into the real material for today. For most of our time together, I’ll be talking about captions, including live captions and closed captions. Then I’ll get into some of the most important laws related to accessibility compliance and the benefits of including captions. And after that, we’ll open the floor for our Q&A session.

So what are captions? Before we get into anything technical, we need to define the basics. The most important thing that you need to know about captions is that they are always time-synchronized. Captions are text that can be read while watching a video, and they’re typically indicated by the capital “CC” icon. They originated as an FCC mandate in the 1980s, and they’re provided as an accommodation for deaf and hard-of-hearing individuals.

They were originally required for all broadcast media, but their use has since been expanded to include online video and internet applications as well. And because captions are intended for an audience who cannot hear audio, they should also include relevant non-speech elements, like speaker identifications and sound effects, that are critical to a viewer’s understanding of what is going on in the environment.

One example of this: if you’re watching a show and someone on screen is unlocking a door and you can visually see their keys jingling, you would not need to caption that. But if the keys are jingling off-screen and it’s important to the plot for some reason, you would need to caption that so that every viewer has the same level of situational awareness.

Now, it’s important to distinguish between captions, subtitles, and transcripts, because in some places, like the UK, the words “captions” and “subtitles” are used interchangeably. But in the United States, the difference is actually pretty significant. What we mean when we’re referring to “captions” is what we just went over: these assume that the viewer cannot hear audio, they’re time-synchronized, and they include non-speech elements as an accommodation for an audience that cannot hear.

Subtitles, on the other hand, assume that the viewer can hear but can’t understand the audio. These are also time-synchronized, but their primary purpose is to translate any spoken dialogue from one language to another. And finally, transcripts are just a plain text version of any spoken dialogue. These are best for audio-only content, such as podcast episodes.

Captions are far and away the most common and what people are most familiar with, and there’s a variety of ways to create them. You could do it yourself. You could use Automatic Speech Recognition, or ASR, software. Or you could outsource the work to a captioning vendor. So the first way, if you have the time, is to create captions yourself. But when you factor in the time it takes to add those non-speech elements as well as synchronize the spoken dialogue to the audio, it can take an inexperienced captioner five to six times a video’s length to complete captions.

Additionally, this method could be pretty costly to scale. So a lot of places will follow the second method, which entails the use of ASR software. ASR engines are really helpful for creating a foundational transcript for that first round. However, they often do not provide a sufficient level of accuracy to be used independently, so this method does often require additional manual editing.

For any ASR beginners, we like to recommend YouTube’s automatic caption generator, because it will automatically align the transcript and the audio, breaking it up into correctly timed caption frames, so it’s user-friendly if you’ve never done anything like this before. But no matter what platform or software you’re using, certain factors will aid in a more accurate first-round transcription: high-quality audio, clearly spoken language, and one speaker speaking at a time.

Now, if you don’t feel like worrying about doing any of this, you could always outsource the work to a captioning vendor, like 3Play Media. And we like to say that we combine the best of both worlds, using proprietary technology with real human editors to provide top-notch video accessibility services. But no matter what company you decide to work with, you should always be looking for a minimum guarantee of 99% accuracy, because that’s the industry standard for accessible captions.

So speaking of accuracy rates, accuracy is crucial in determining the quality of closed captions. And like I said, the industry standard is 99% accuracy, which only allows for a 1% margin of error. To put this into practical terms, if you have a 10-minute video file containing 1,500 words, a 1% margin of error means there can only be 15 errors total. It would be nice to strive for perfection, but unfortunately, a 100% accurate caption file is not necessarily attainable, whether you’re using machine technology or a real human captioner.
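To make that arithmetic concrete, here’s a minimal sketch in Python; the function name is just for illustration, and it assumes accuracy is measured simply as correct words divided by total words.

# Error budget for a caption file at a given accuracy target.
def max_allowed_errors(total_words: int, accuracy: float = 0.99) -> int:
    # A 99% target leaves a 1% margin of error.
    return round(total_words * (1 - accuracy))

# The example from the talk: 1,500 words at 99% accuracy.
print(max_allowed_errors(1500))  # -> 15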

And there are also different ways of writing your captions, like verbatim or clean read. So if your video contains scripted content, you’ll probably want verbatim captions. If you’ve ever turned on captions for a Netflix show or some kind of movie, you’ll see that the “um’s” and the “uh’s” are included in the captions, because oftentimes they’re intentional and included in the scripted dialogue. Whereas for a lecture or a live presentation, like this one, you’ll want clean read captions, which will eliminate any of those filler words so that viewers can clearly read what’s actually being spoken about.

No matter what kind of captions you want, there are some general frame requirements to make sure that the captions are actually legible. Each individual caption frame should be around 1 to 3 lines with up to 32 characters per line. And like I said, captions should always be time-synchronized, but each frame should also appear on screen for a minimum of one second to make sure that viewers have enough time to read.
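As a rough illustration of those frame requirements, here’s a short Python sketch that checks a single caption frame against them; the exact limits vary by style guide, so treat these as the numbers from this talk rather than universal rules.

# Check one caption frame: 1 to 3 lines, up to 32 characters per line,
# and on screen for at least one second.
def frame_is_legible(lines: list[str], start_sec: float, end_sec: float) -> bool:
    if not 1 <= len(lines) <= 3:
        return False  # too many lines (or none at all)
    if any(len(line) > 32 for line in lines):
        return False  # a line is too long to read comfortably
    return end_sec - start_sec >= 1.0  # viewers need at least a second

print(frame_is_legible(["[keys jingling]", "Who's there?"], 12.0, 14.5))  # True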

And design-wise, it’s best to use a sans serif font and to place captions in the bottom center of the screen. But they should be moved if they’re blocking other important elements, like on-screen text or anything else that might be important to a viewer’s understanding.

And as for silences or any long pauses in dialogue, you’ll actually want the captions to disappear for a moment so that viewers don’t mistakenly believe that something is going on when it’s not. But if you want more resources or more guidance on what you should or shouldn’t caption, we’re going to send some links in the chat, including the Described and Captioned Media Program, or DCMP, as well as standards from the FCC and the Web Content Accessibility Guidelines, which are very, very helpful for digital accessibility. I’ll get into those in just a little bit. I’ll take a sip of water here before I move on.

All right. So after you’ve written your captions, you need to publish them, and there’s a variety of ways to do this as well. The most common way is as a sidecar file, which is essentially an additional file that stores the caption information and is uploaded alongside your video file. So if you’ve ever uploaded a video to YouTube and then attached a separate file for captions, you’ve already uploaded a sidecar file. This gives users the ability to toggle the captions on or off, so they can choose whether they’d like to view them or not.
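For reference, a sidecar file is just structured text. A single frame of an SRT file, one of the most common sidecar formats, is an index, a start and end timestamp, and the caption text; the timing and dialogue here are invented for illustration.

1
00:00:01,000 --> 00:00:03,500
Welcome, everyone.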

Another way to give viewers this ability is to encode the captions directly into the video, which is commonly found on kiosks or offline video. Open captions, on the other hand, are burned into the video and cannot be turned off. These are useful in circumstances like social media videos: some platforms, like Instagram and Twitter, don’t support the upload of an additional text file, so writing captions directly onto the video that you’re uploading to those platforms is a great way to sort of circumvent that.

And finally, integrations are like an automatic publishing process for your captions. They’re a preset workflow between your captioning process and your video publishing process, so that everything stays together and can be uploaded as a package.

All right. So now you’ve learned all about closed captions, and we’re going to dive into live captions. Closed captions are used for prerecorded content, and as the name suggests, live captions are used for live content happening in real time. Some examples are webinars like this one, fitness classes, or online learning environments. But similar to closed captions, live captions ensure the accessibility of your live event to deaf and hard-of-hearing individuals, as well as work to make your content more engaging for everyone.

The two primary ways to create live captions involve the use of automatic software, similar to the ASR software that we talked about earlier but a little more advanced, or a human stenographer. In either case, depending on the platform that you use, there are going to be slight delays in caption appearance, which we call “latency.” This is just to allow for the ASR software to process the words or for the human stenographer to type them. And I’ll go over that a little bit more in just a minute here.

But when discussing live captioning, these are some important terms to know before we get into anything too technical. “LPC” stands for “Live Professional Captioning.” This implies the use of a human professional who’s trained in real-time caption production somewhere in the workflow.

Another method of live caption production is called “CART,” or “Communication Access Realtime Translation.” This is a service for live captioning that also involves the use of human stenographers, who work remotely. And then finally, we have ASR, which, again, is “Automatic Speech Recognition.” This is used a lot in live automatic captioning solutions, and it implies that there is not a human involved in the caption creation process.

So unsurprisingly, just as with closed captions, accuracy is a determining factor in the quality of live captions. In pre-recorded content, we talk about that 99% industry standard. But since live captioning is happening in real time, the accuracy rate can be, and often actually is, a little bit lower. We still recommend striving for 80% to over 90% accuracy, and certain factors can aid in improving your caption quality.

So some things that can affect this: filler words like “um” and “uh”; homophones, or words that sound like other words, which can cause substitution errors and sometimes distort the meaning of sentences; background noise in the external environment; and whether or not humans are involved in the caption creation process.

So typically, humans tend to be more accurate but commit more omission errors, which means they’re more likely to skip words in order to keep up with the speed at which they need to transcribe. By comparison, ASR or automated technology can be less accurate, but it won’t skip words; it will substitute them. So instead of omission errors, you’re more susceptible to substitution errors, where the software picks a word it thinks it heard and substitutes it for whatever it did not understand. Live automatic captioning also does not include speaker identifications or non-speech elements, so those would have to be added manually as well.
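For anyone curious how accuracy rates like these are commonly measured, one standard metric is word error rate, or WER, which counts substitutions, omissions (deletions), and insertions against a reference transcript; accuracy is then 1 minus WER. Here’s a minimal Python sketch of that idea, not a description of any vendor’s actual measurement method.

# Word error rate via a classic edit-distance dynamic program over words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # omission (deletion)
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

# One substituted word out of six: accuracy = 1 - 1/6, roughly 83%.
print(1 - wer("the keys are jingling off screen",
              "the keys are jiggling off screen"))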

Another element affecting live caption quality is latency, which I mentioned earlier. Since live captions are generated in real time, there’s always going to be a little bit of latency, just to allow for the machine to process the words and spit them out, or for a human to process the words and write them down.

Average latency should be around three to five seconds, but this can also be impacted by things like streaming equipment and overall connection. Currently, there are no legislative guidelines or governing body for live captioning specifically, but some states do have standards in place regarding live caption quality in instances of live court reporting or sports events.

But to expand a little bit on the factors that can affect caption quality and accuracy, these are some best practices that can help ensure that, no matter what kind of live event you’re hosting, you can improve the quality of your captions. First and foremost, a strong network connection never hurt anybody. We all practically live online now, so we’re very familiar with the need for a strong Wi-Fi signal. And if you really want to make sure your connection never goes down, you could use a wired Ethernet cable instead.

Another factor that helps is good-quality audio. So if you have a microphone or a headset, it would be great to use it if and when appropriate; it enhances the quality and makes reception a lot easier. But if you don’t plan on investing, our next recommendation is to monitor your surroundings. Do your best to work in a quiet space, because computers aren’t as good as people at filtering out background noise. And also, try to have a single person speaking at a time. Just like many of us, computers get confused by many people speaking at once.

So now you have the background, the basics of closed captioning and live captioning. But what are the benefits? There are a lot of reasons that you should be captioning. Accessibility is the biggest one; you’ve already heard me talk about it a lot. More than 20% of Americans have hearing loss, and your content is inaccessible to them if you’re not providing captions. And not only are you making your content accessible to those who require this accommodation: a 2019 Verizon study found that 92% of users are actually watching video on mobile with the sound off, and if you’re not providing captions, those users could just be scrolling right past your content.

Video accessibility also offers tremendous benefits for things like improving your SEO, user experience, brand awareness, and engagement, because it makes your content more engaging and easily searchable. It can also be used in learning environments. Captions and transcripts can be used as a focus tool or repurposed as study guides to boost classroom comprehension.

And finally, one of the biggest benefits of providing captions is that you avoid legal trouble. So I’m going to go over some of the most important accessibility laws. Try not to get too bogged down. Remember, this is recorded, so you don’t have to take notes; you’ll have the recording afterward if you’d like to come back and reference it. The first major accessibility law in the US was the Rehabilitation Act of 1973.

It has two sections specifically impacting accessibility, Sections 504 and 508. Section 504 is a broad anti-discrimination law that requires equal access for disabled individuals to federal and federally funded programs. Section 508 requires that federal communications and information technology also be made completely accessible. And there was a Section 508 Refresh in 2017, I believe, that specifically referenced the Web Content Accessibility Guidelines, or WCAG, which was the first real reference to digital accessibility standards. It’s very, very exciting, but we’ll get into WCAG in a little more detail in just one second.

The second major accessibility law in the US was the Americans with Disabilities Act. Again, there are two sections specifically impacting video accessibility: Title II, applying to public entities, and Title III, applying to places of public accommodation. Places of public accommodation also include private organizations that offer a public accommodation, such as a hotel, a restaurant, or a library.

And this term “place of public accommodation” has actually been tried in a lot of legal contexts in terms of its application to online spaces, and it has often been extended to include them. One of the most frequent examples that we give in relation to this is Netflix. Netflix was sued for failing to provide closed captions. But because their streaming platform is considered a place of public accommodation, they were required to provide captions, and they did.

So moving on, the third most important accessibility law in the US is the 21st Century Communications and Video Accessibility Act, or the CVAA. This applies specifically to online video that has previously been broadcast on television: it requires that such video be captioned when uploaded online. And in terms of audio description, this document references rules set forward by the FCC in 2000. The FCC is sort of the governing body on captioning quality standards for broadcast media, so again, we sent a link if you want to check out more about what their standards require of captions in a legal context.

Now, a little bit more about the Web Content Accessibility Guidelines. This is an international set of guidelines that helps make digital content accessible to everyone, including those with disabilities, and outlines best practices for how to make your content universally perceivable, operable, understandable, and robust. But I’ll spare you all the details of how to do exactly that. On the screen, you can see that WCAG is divided into levels. There are certainly more than this, but you can see some examples of accessible features that are included in each of these levels.

And I’ll give you a brief overview of how these levels apply. Level A is the easiest to maintain and the easiest to satisfy. Level AA is what most people are aiming for and what you should aim for; this is the mid-level of standards, making both pre-recorded and live content accessible. That’s the biggest difference between Level A and Level AA: the second level requires that live content be accessible as well. And Level AAA is the most comprehensive, with the highest accessibility standard.

So again, I’ll spare you all the details of that very comprehensive document, but we did send the link and it’s a great set of standards for digital accessibility on anything and everything. I have reached the end of my scheduled presentation. But I do want to share that, if you’d still like to learn more, we offer a lot of really great free resources. On our website, you can find weekly blogs, free ebooks, checklists, and research studies. We also host monthly webinars, which I’ll share more about on the next slide.

And we also have a podcast, Allied, where you can hear from accessibility professionals across industries on a variety of topics, like the accessibility of gaming, architecture, the auto industry, and more. So to learn more about that, we’ll send a link to our website in the chat. These are some of our upcoming webinars. I’ll leave those up on the screen. And I think that we can go ahead and pivot to some questions in just a moment. Let me take a look back through the chat here.

It seems like my counterpart has done a really good job of answering people’s questions. I will go ahead and stop sharing my screen. It seems like we’ve got a few in. Will you be doing a similar webinar for audio description? Yes. We actually do host an Intro to Audio Description webinar on a recurring basis. I do not know, off the top of my head, when the next one is coming. But we did just send the link in the chat to our website, where you can definitely find all of our upcoming webinars. And I don’t even remember the last– [AUDIO OUT]

Someone asks, would captions still be required if no audience member requests them for a live event? This is where it gets a little tricky because, like I said, there is currently no governing body for live captioning specifically. But we always reference back to the Web Content Accessibility Guidelines, and we always recommend that you have live captions, because not only are they helpful to those who may request them, but they can be downloaded as a transcript and used for content in the future. They can also be repurposed as handouts or study guides, if any of that is applicable to the content you’re presenting.

Oh, it looks like our next Intro to Audio Description webinar is August 11 at 2:00 PM. That was just put in the chat if anyone missed that. The upcoming webinars are on our website. Let me go ahead and grab– [AUDIO OUT]

I had it up on my computer, and then I closed the tab. That’s how it always goes. You need it right after you close it. Let’s see. Perfect. It’s right in the chat. So upcoming webinars can be viewed on our website. Any recommendations for ensuring names and non-English words are spelled correctly in English language captions? That’s a really great question.

So at 3Play, I can only speak to our process and what I am familiar with, but we solve that by allowing users to upload word lists. So whoever you may be working with, if it’s possible to create a list of words that you think are likely to be misspelled or misconstrued, that can help cut back on errors from the captioner misunderstanding things they’re not familiar with, such as industry terms or, like you said, speaker names.

We have another question, from a theater artist, about the CVAA: does this legislation apply to streaming video created from live staged performances, something that’s becoming more commonplace due to the pandemic? That’s a really great question. I don’t know if we’ve received that question at that level of specificity before.

Reach out to us at 3Play if you would like to follow up on that, because I don’t know that I have the answer off the top of my head. I would like to assume yes. Again, we’re 3Play; we’re always going to suggest that you include captions no matter what, so we would always err on the side of caution. But that’s a really great question. I haven’t thought about it specifically in that context.

Another recommendation, if you’re interested in theater arts, is to listen to our podcast episode where we invited someone from the Lincoln Center for the Performing Arts to speak about accessibility in that space.

Is there any software that you would recommend for creating captions for prerecorded video, aside from doing it in Final Cut or Adobe Premiere? You can actually create your own captions using your computer; you don’t need a special program. I believe, on Mac, it would be TextEdit, and on PC it may be something different.

That’s perhaps more complicated than you’d like to go, so we typically recommend the YouTube caption generator. If you upload your video, it will automatically parse your transcript and timecode it, which is a little more user-friendly than just trying to type out your captions. Someone asked again what our podcast is called. It’s called Allied Podcast. Yes, thank you. Someone just sent the link in the chat to that episode that I referenced.

There will also be a blog in the chat. We’re looking for a link about DIY podcast creation. Not podcast creation, pardon me, caption creation: DIY caption file creation. It can be done through SRT files or through YouTube. We’re just trying to find that blog link. There we go. That just populated in the chat, so that’s another way that you can create captions on your own. Like I said, SRT files may be a little more heavy-duty than using something like YouTube, which can automatically create that first round for you. But again, we still recommend that a human editor take a look through to make sure that it’s truly accurate.
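To give a sense of how lightweight DIY caption file creation can be, here’s a Python sketch that writes a two-frame SRT file; the filename, timings, and text are invented for illustration.

# Write a minimal two-frame SRT sidecar file.
# Each frame is an index, a "start --> end" line, the text, and a blank line.
frames = [
    ("00:00:01,000", "00:00:03,500", "Welcome, everyone."),
    ("00:00:03,500", "00:00:06,000", "[keys jingling off-screen]"),
]

with open("captions.srt", "w", encoding="utf-8") as f:
    for i, (start, end, text) in enumerate(frames, start=1):
        f.write(f"{i}\n{start} --> {end}\n{text}\n\n")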

All righty. And it looks like that’s all we have time for today. Thank you guys for asking so many questions and being so involved. We hope you enjoyed this presentation. Remember, you’ll be receiving a recording as well as a link to the slide deck. And if you need any more resources, don’t hesitate to reach out. Thank you all for joining us.