
How Non-Profit Organizations Can Create Accessible Video [TRANSCRIPT]

SOFIA LEIVA: Thank you, everyone, for joining me today in the webinar How Nonprofit Organizations Can Create Accessible Video. My name is Sofia Leiva, and I’ll be presenting today. Today, we’re going to cover the basics of accessible video, how do you create accessible video, why should you create accessible video, who is 3Play Media, and then head over for the Q&A.

There’s many different creative ways that people can use videos. For example, here at 3Play, we use them for webinars. Many nonprofit organizations also do webinars, and they also record events and upload those videos to their websites. Many nonprofit organizations also make promo videos that they use to disseminate on their websites and things like that.

Now, I’d first like to start this webinar by talking about what accessibility actually is. So in order for something to be accessible, it must offer an equivalent experience to everyone, including those with a disability. This can refer to a physical location, but in the context of online accessibility, it refers to a disabled user’s access to electronic information. The content and design must provide the most convenient and all-encompassing experience possible to prevent any level of exclusion.

And you’ll see accessibility often referred to as A11Y, and this is just another term for accessibility. And it depicts how there are 11 letters between the A and the Y in accessibility and also can be used to describe the word “ally” because we’re all allies of accessibility.

Now, the formula for accessible video is pretty simple. What you need is captions, audio description, and transcripts, along with an accessible video player. Now, if your video player doesn’t allow you to publish captions or audio description or even transcripts, there are workarounds that we’ll talk about later on so that you can ensure that your videos are accessible.

Let’s first start talking about what are captions. Captions are a time-synchronized text that can be read while watching a video and are usually noted with a CC icon. Captions originated as an FCC mandate in the 1980s, but the use has expanded to online video and internet applications. Captions assume the viewer can’t hear, so they include the relevant sound effects, speaker identifications, and other non-speech elements to make it easier for the viewer to understand who is speaking.

Now, it’s important to distinguish between captions, subtitles, and transcripts. Captions assume the viewer can’t hear the audio. They’re time synchronized, and they include the relevant sound effects. You can spot if a video has captions when you see a CC icon.

Subtitles, on the other hand, assume the viewer can hear, but can’t understand the audio. The purpose is to translate the audio, and, like captions, they’re also time synchronized.

And transcripts are just a plain-text version of the audio. It’s not time synchronized, and it’s good for audio-only content. So if you’re creating podcasts, using transcripts is a way to make it accessible.

In the US, the distinction between captions and subtitles is important because they mean something different. But around the world, you’ll see that “subtitles” and “captions” are sort of used synonymously to describe captioning.

Now, what was the first television show to ever air with closed captions? Just a little fun trivia. If you know, you can chat it into the chat window. But the first show to ever be captioned was Julia Child’s The French Chef.

So how do you create captions? There are a few ways that you can create captions. You can do it yourself. You can use an automatic speech recognition software, or you can use a captioning vendor.

The first way is doing it yourself. And there’s many different programs that you can use and techniques that you can use to ensure that you create accurate captions. One thing to note, though, is that when you are creating captions yourself, it can be very time-consuming. So just be a little bit aware.

Now, the way that we recommend for you to do your own captions is to first use an automatic speech recognition software to transcribe the video. Now, automatic speech recognition, as we’ll talk about in a little bit, can be highly inaccurate. So you want to make sure that you go back and you edit it and you capture the speaker identifications and the non-speech elements so that it can be accessible.

Now, some softwares will time it for you so that it’s time synchronized. Otherwise, if it’s not, then you’ll have to do that manually. And that can also be a very time-consuming component.

And then after you have your transcript all ready and your captions, you want to convert it to the format that is accepted by your video player. And that’s something that you can check with your video player. The most common is something called an SRT file. So you’ll see a lot of videos like YouTube and Brightcove accept an SRT file.
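
As a rough illustration of what an SRT sidecar file contains, the sketch below builds one from a list of timed captions. Each SRT entry is a sequence number, a line with start and end times in `HH:MM:SS,mmm` form, and the caption text, with a blank line between entries. The function names and the example cue times are assumptions made up for this sketch; they are not part of any 3Play or video player tooling.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, caption_text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    # Entries are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


# Hypothetical cues: the times are invented for illustration only.
example_cues = [
    (0.0, 2.5, "SOFIA LEIVA: Thank you, everyone,\nfor joining me today."),
    (2.5, 4.0, "[MUSIC PLAYING]"),
]
srt_text = to_srt(example_cues)
```

Saving `srt_text` to a file ending in `.srt` gives the kind of sidecar file that players such as YouTube and Brightcove are described as accepting.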

Now, one way that we recommend for you to create your own captions is to use YouTube, because they have a software that automatically transcribes the caption for you. And then all you have to do is just go back and edit it because it also time-codes it for you, so it’s very easy. And the last thing is they also allow you to download the caption file. And you can download it in various formats. So then you can later use that to upload to whatever video player that you use.

Now, if you select to use a captioning vendor, we use sort of the same process that we recommend for you as an individual to caption it yourself. We first start with an automatic speech recognition software, and then we have two rounds of human cleanup just to ensure that it’s highly accurate.

Now, let’s talk about caption quality. When it comes to captioning, it’s important to follow the best practices for caption quality. The industry standard for spelling is 99% accuracy. And 99% accuracy, though close to perfection, means there’s still a 1% chance for error. So in a 10-minute file with 1,500 words, there is a leniency for 15 errors.
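
The arithmetic behind that figure can be sketched in a few lines. The function name `allowed_errors` is an assumption for illustration; the only inputs taken from the talk are the 99% accuracy standard and the 1,500-word file.

```python
def allowed_errors(word_count: int, accuracy_percent: int = 99) -> float:
    """Number of word errors that still meets the given accuracy standard."""
    return word_count * (100 - accuracy_percent) / 100


# The example from the talk: a 10-minute file with 1,500 words at 99% accuracy.
errors_in_example = allowed_errors(1500)
```

With these inputs the result is 15, which matches the leniency described above.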

Now, if your video is scripted content, then you’ll want to ensure that all the words are verbatim. So for a broadcast, you want to include ums and ahs because it’s scripted. But for lectures or live captioning, a clean read is preferable, meaning you’ll want to eliminate those filler words of ums and ahs.

Now, each caption frame should have around one to three lines with 32 characters per line, and the best font to use is a non-serif. You also want to ensure that they’re time-synchronized and last a minimum of a second on the screen so it gives the viewers enough time to read.
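
One way to apply those frame guidelines when captioning by hand is a small checker like the sketch below. It wraps text to 32-character lines, groups the lines into frames of at most three, and flags frames that are on screen for under a second. The constants come from the talk; the function names and the grouping strategy are assumptions for this sketch, and real captioning tools also weigh sentence boundaries and reading speed.

```python
import textwrap

MAX_CHARS_PER_LINE = 32      # from the talk
MAX_LINES_PER_FRAME = 3      # from the talk
MIN_SECONDS_ON_SCREEN = 1.0  # from the talk


def split_into_frames(text):
    """Wrap text to 32-character lines and group them three lines per frame."""
    lines = textwrap.wrap(text, width=MAX_CHARS_PER_LINE)
    return [
        lines[i:i + MAX_LINES_PER_FRAME]
        for i in range(0, len(lines), MAX_LINES_PER_FRAME)
    ]


def frame_problems(lines, start_seconds, end_seconds):
    """List the ways one caption frame breaks the guidelines above."""
    problems = []
    if not 1 <= len(lines) <= MAX_LINES_PER_FRAME:
        problems.append("frame should have one to three lines")
    if any(len(line) > MAX_CHARS_PER_LINE for line in lines):
        problems.append("a line is longer than 32 characters")
    if end_seconds - start_seconds < MIN_SECONDS_ON_SCREEN:
        problems.append("frame is on screen for less than one second")
    return problems


frames = split_into_frames(
    "One of the most challenging aspects of choosing a career "
    "is simply determining where our interests lie."
)
```

Calling `frame_problems(frame, start, end)` on each frame, with times taken from your caption file, returns an empty list when the frame follows all three guidelines.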

And another key thing to keep in mind is the caption placement. Typically, captions are placed in the lower center part of the screen, but should be moved when they’re in the way of important text or elements in the video.

Now, I just want to quickly show you the distinction of why accuracy matters. So what you’ll see on the screen is going to be a transcript that was transcribed by an automatic speech recognition software, and I just want you to pay attention to the differences between what is actually being said and what is noted in the transcript.


– One of the most challenging aspects of choosing a career is simply determining where our interests lie. Now, one common characteristic we saw in the majority of people we interviewed was a powerful connection with a childhood interest.

– For me, part of the reason why I work here is when I was five years old, growing up in Boston, I went to the New England Aquarium. And I picked up the horseshoe crab, and I touched a horseshoe crab. And I still remember that, and I still– I love–


SOFIA LEIVA: So at 3Play, this is why, in our process, we include human editors because often computers aren’t smart enough to distinguish between “horseshoe crab” and “four story.” Or for example, in this transcript, there was one where they said “New England Aquarium” and it was transcribed to “the new wing of the Koran.”

So you want to really make sure that you go and you edit the automatic speech recognition transcripts that you create when you’re creating captions, because these inaccuracies can really affect your brand. It can really affect the comprehension of the content. It can really affect the accessibility of it.

So how do you publish captions? So there’s several ways that you can publish captions. The first is by a sidecar file, which is basically what you would upload to a video player, like a YouTube or Brightcove or anything like that. And the most common, like I said before, is an SRT file.

You can also encode your captions into the file or onto the video. And so for example, this can be used for a kiosk or if you’re sharing a video offline that you want to make sure is accessible. One big use case of encoded captions is social media, because many social media accounts like Instagram and Snapchat don’t allow you to have a separate caption file. Many brands will encode them onto the video itself, and so that way there are just always captions available on that video.

Another way is open captions, which is just when you share something like a CD. You can turn the captions on and off.

And then lastly, it’s through integrations, which we’ll cover later on. But essentially, this is just a link between your caption provider and your video player, and it allows you to automatically publish those captions onto your video.

Now, why should you caption? There are many reasons why you should caption. The first one, as we discussed first, was for accessibility. Because there’s a big percentage of the population that is deaf or hard of hearing, you want to make sure that your videos are accessible to them.

Another factor to caption is because it makes it more accessible to the whole population. So a study found that 41% of videos are incomprehensible without sound or captions, which means that if someone doesn’t have headphones or is watching their video without sound, they’re unable to understand what your video is saying.

Now you may be wondering, are people even watching video with the sound off? And in studies that we’ve found, they actually are. So Facebook uncovered that 85% of videos are watched with the sound off. And this means that if your video relies heavily on sound, a lot of people are going to probably scroll past it.

Video accessibility has tremendous benefits for improving SEO as well, the user experience, your reach, and your brand. And a study by Liveclicker found that pages with transcripts earned an average of 16% more revenue than they did before transcripts were added. And according to another Facebook study, videos with captions have 135% greater organic search traffic.

Other reasons to caption is because captions help improve your brand recall. They help improve verbal memory and behavioral intent. And this was from a study by the Journal of the Academy of Marketing Science.

And then another study by OSU– they uncovered that captions are really beneficial in the educational space. So they found that the majority of students– 98.6%– said that they found captions helpful. Many reported that predominantly, it’s because it helps them focus, and many students also said that they use captions as a learning aid.

So you can imagine, if you have really complicated context with really difficult words, captions can really help clarify that. And it can help students or people watching your videos know where they are within a section, quickly go back, and make sure that they understood what the speaker was saying.

Other reasons that captions are really beneficial is because it allows you to include cool tools, like, for example– this is called a “playlist search.” So because the videos are transcribed, you can create this sort of playlist with all your videos. And people can go and search for keywords, and then it’ll show you the videos that have that keyword and allow you to jump to that video and watch from where that word is mentioned. And this is just another way of making your videos interactive.

Another tool is called the interactive transcript. And we use this a lot on our websites with our webinar recordings. And essentially, this is a time-synchronized transcript that highlights the words as they’re being spoken in a video. The transcript is hooked up to the video player, and they work in tandem to deliver an interactive experience for the viewer.

An interactive transcript grants the viewer the ability to search within a video, and then viewers can simply type a search term into the search bar to see every location where the keyword is spoken within the transcript. And if you click on that search word or if you click within the transcript, the video will automatically jump to that section where the word is being spoken. So this is just another way of making your videos really interactive, really providing more value to your users and viewers.
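
The search behavior described here can be sketched with a time-stamped word list. This is a minimal illustration, not how the 3Play interactive transcript is implemented: the data layout, the function name, and the word timings are all assumptions.

```python
def find_keyword(timed_words, keyword):
    """Return the start time in seconds of every word matching the keyword."""
    target = keyword.lower()
    return [
        start
        for start, word in timed_words
        if word.strip(".,!?").lower() == target
    ]


# Hypothetical word timings for a short piece of a transcript.
timed_words = [
    (0.0, "I"), (0.4, "touched"), (0.9, "a"),
    (1.0, "horseshoe"), (1.6, "crab."),
    (2.2, "And"), (2.5, "I"), (2.8, "still"), (3.1, "remember"), (3.6, "that."),
]
jump_points = find_keyword(timed_words, "crab")
```

A player hooked up to the transcript would then seek to one of the returned times when the viewer clicks a search result.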

Now let’s dive into what is audio description. So audio description is an accommodation for individuals who have vision loss around the world. So there are 245 million people who have some kind of vision loss. And so audio description really makes them able to understand the videos that they want to watch.

Essentially, what audio description does is it describes what’s going on in the video. And I’m going to play two quick clips, and what I’d like you to do is to close your eyes and see if you can distinguish between the one that’s described and the one that’s not described.




– Hello.

– From Tangled and Wreck-It Ralph, Disney. A carrot-nosed, coal-eyed snowman shuffles up to a purple flower peeping out of deep snow.

– Hello.

– He takes a deep sniff.

His nose lands on a frozen pond. A reindeer looks up and pants like a dog.



SOFIA LEIVA: So if you noticed, the described version tells you exactly what’s going on in the video. It paints the picture so that when you close your eyes, you can really visualize what is going on in the video. So this is what audio description is.

It’s often described as a newscaster narrating a game over the radio. It’s going to tell you all the relevant visual information, and it’s going to often describe– well, like I said, similar to a sports announcer on the radio. And like captions are demonstrated with a CC icon, audio descriptions are going to be shown with an AD icon.

Now, there is a difference between standard and extended description. You may be wondering, what if there’s not enough natural pauses within the video to include descriptions? One way around that is extended, where it allows you to pause the video to allow for longer descriptions wherever it’s needed.

So how do you create audio descriptions? As with captions, there are many ways to create audio descriptions. The first one is, for example, if you have a talking head video and it’s just a professor talking on the video camera, there is not really a lot going on. So you wouldn’t want to– there’s not much to describe.

But if he’s showing visuals, if he’s showing slides, that kind of thing, then those are the things that you want to include because they add value to the video and they add value to the person watching. And so if you’re recording a video, one good practice to do is to just narrate the visuals as you’re recording the video because then that way you don’t have to go back and add the audio descriptions.

Another way is to create a text description or a WebVTT file. It’s essentially going to be like a transcript, but you’re going to include the descriptions within the transcript. And that’s something that you can link to from the video that you’re showing, and someone can go read that through their screen reader. Or you can put it into your video player, and some video players that allow for audio description will be able to distinguish that WebVTT file and read out the audio descriptions.
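
For a sense of what such a WebVTT description file looks like, the sketch below writes one from a list of timed descriptions. A WebVTT file starts with the line `WEBVTT`, followed by cues whose timing line uses `HH:MM:SS.mmm` timestamps. The function names and cue times are assumptions for illustration; the description text is taken from the clip played earlier. In HTML, a file like this can be attached to a video with a `<track kind="descriptions">` element, though whether it gets read aloud depends on the player.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS.mmm timestamp used in WebVTT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(descriptions):
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    parts = ["WEBVTT"]
    for start, end, text in descriptions:
        timing = f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}"
        parts.append(f"{timing}\n{text}")
    # The header and each cue are separated by a blank line.
    return "\n\n".join(parts) + "\n"


# Hypothetical timings: the description text comes from the clip above.
descriptions = [
    (1.0, 6.0, "A carrot-nosed, coal-eyed snowman shuffles up to "
               "a purple flower peeping out of deep snow."),
    (8.0, 9.5, "He takes a deep sniff."),
]
vtt_text = to_webvtt(descriptions)
```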

Another way is to have your original audio and then also create a separate one with the voice descriptions and merge those together. And sometimes, you can include those as a separate video, link to it, that kind of thing. Or you can use a professional vendor as well.

So traditionally, the cost of audio description, if you outsource it, is around $15 to $75 per minute of video, so just be mindful of that. At 3Play, we’ve worked to sort of lower that cost and have really innovative solutions for audio description.

We used a synthetic voice, sort of like Alexa kind of voice, to cut down the costs, so it’s not going to be like the Frozen video where it’s very cinematic. This is more intended for education spheres, that kind of thing, where you don’t need to spend extra on the cinematic innovation. The pros of using synthesized speech are that it’s faster to process. It costs less, and it gives users the ability to manipulate the speed of it.

There are some cons. Because it’s a robot, you kind of lose that cinematic element of audio description. So those are just some things that you have to also be mindful of.

So how do you publish audio description? Unlike captions, many video players don’t support audio description. Some that do are the Able Player, the OzPlayer, Brightcove, JW Player, Kaltura, and Ooyala.

There’s workarounds to that. For example, at 3Play, we have something called the 3Play Plugin that allows you to publish captions. But other ways that you can do it is you can provide a secondary video with the descriptions. You can link to a secondary audio track. You can have the WebVTT track as well, or you can also include a text-merged-only transcript version.

Like captions, there are many benefits to audio description as well. At the beginning of this presentation, we talked about how much video is being produced and published. And with that said, we have to make sure that we’re complying with all the laws and to make video accessible to people with disabilities. And in this case, audio description makes a video more accessible to people who are blind or low-vision.

However, there are many benefits beyond that for the ones listed here. So for example, audio description allows users to view videos in an eyes-free environment. So many people say that they view Netflix with audio descriptions on, sort of using them like audio books. Individuals in the autistic spectrum find the audio description helps them to better understand emotional and social cues, which are only demonstrated through actions or facial expressions.

Listening is a key step in learning language and associating it with appropriate actions and behaviors, so the visual component of the video combined with the audio narration can help with the language development. The research on how the brain processes information reveals that there are two key channels– auditory and visual. So the visual component of the video combined with the audio narration can help with learning.

And it’s not listed here, but sometimes audio description is also required by law. And in that case, it benefits compliance.

So let’s talk a little bit about the accessibility laws. In particular for nonprofit organizations, the Americans with Disabilities Act is going to be a major accessibility law to follow in the US. And the Americans with Disabilities Act, also known as the ADA, has two sections that impact video, and that’s Title II and Title III. Title II applies to public entities, and Title III applies to places of public accommodation.

And although the ADA doesn’t have clear web accessibility regulations, it doesn’t mean that you can’t get sued for not having accessible video. In fact, the uncertainty in the law leaves space for the interpretation by judges. And in recent years, the law has extended to online accommodations through case law.

Because many nonprofits supply resources that are open to both private members and members of the public, they should assume that they fall under the category of place of public accommodation. And because of the way the courts have ruled on past digital accessibility cases, these organizations should work towards accurately captioning all video content to avoid potential accessibility lawsuits.

Another important sort of element of web accessibility that you should know is called WCAG, and this stands for Web Content Accessibility Guidelines. These are just guidelines. It’s not a law, but it’s referenced by many laws and lawsuits as sort of ways to make sure that your online content is accessible.

And there are three iterations of these guidelines, but the most widely used is going to be called WCAG 2.0. WCAG is split into three levels, so there’s level A, level AA, level AAA. And each one gets progressively more comprehensive, requires a little bit more work to make accessible, but it’s definitely not unattainable.

So for example, with level A, you want to include transcripts for audio-only content– so, for example, podcasts. You want to include captions for pre-recorded video, and you want to provide an audio or text alternative for audio description. In level AA, you want captions for pre-recorded, captions for live, audio description for pre-recorded video. And level AAA, you want to have a sign language track, extended audio description, and live transcript for audio only.
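
The levels build on each other, so a checklist for a given level includes everything from the levels below it. The sketch below captures the requirements as summarized in this talk; the wording is a paraphrase of the speaker’s list, not the text of the WCAG success criteria, and the names `WCAG_VIDEO_REQUIREMENTS` and `requirements_up_to` are assumptions for illustration.

```python
# Requirements as summarized in the talk, not quoted from the guidelines.
WCAG_VIDEO_REQUIREMENTS = {
    "A": [
        "transcripts for audio-only content",
        "captions for pre-recorded video",
        "audio or text alternative for audio description",
    ],
    "AA": [
        # Captions for pre-recorded video carry over from level A.
        "captions for live video",
        "audio description for pre-recorded video",
    ],
    "AAA": [
        "sign language track",
        "extended audio description",
        "live transcript for audio-only content",
    ],
}
LEVEL_ORDER = ["A", "AA", "AAA"]


def requirements_up_to(level):
    """Return every requirement needed to meet the given level."""
    last = LEVEL_ORDER.index(level)
    checklist = []
    for name in LEVEL_ORDER[: last + 1]:
        checklist.extend(WCAG_VIDEO_REQUIREMENTS[name])
    return checklist


aa_checklist = requirements_up_to("AA")
```

Here `aa_checklist` holds the three level A items followed by the two level AA items, which is the set an organization would work through when aiming for level AA.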

And many laws and lawsuits are going to mention these WCAG guidelines. And in particular, they’re going to require people to strive for WCAG level AA as a level to meet for accessibility compliance.

Now quickly, I’ll just dive into who is 3Play Media. We’re a video accessibility company that spun out of MIT in 2007 and are currently based in Boston. We provide a range of video accessibility services, from closed captioning, live automatic captioning, subtitles, and translations to audio description. We work with a range of industries, and our goal, really, is just to make video accessible and make it easier for you.

We offer a range of tools, a robust account system, fast turnaround options. We offer a lot of automated workflows, one-click search solution tools, things like the 3Play Plugin, the interactive transcript. Those all come free to you. And we also have an easy solution for audio description.

And more than anything, we love to educate people and provide resources on accessibility in video, so we do have a lot of free content on our website that you can peruse. We have blogs. We have free white papers, checklists, research papers, everything that you can download for free. We hold a lot of monthly webinars with accessibility experts, and we’re also releasing a free online video accessibility certification very soon. And if you want to learn more about that, you can visit 3playmedia.com/certification.

All right. That’s all I had. I would love to invite you now to– if you have any questions, we have a couple of minutes left. And I’d love to get to as many as possible. So if you have any questions, please type them into the Q&A window, and I’ll make sure to answer.

So someone is asking about our cost. And so our captioning starts at $2.50 per minute, but depending on how much content you have, that will definitely go down with bulk discounts. And that’s something that you can talk about with our sales reps. They’ll be happy to reach out and sort of work around your project availabilities.

We had another question about the legal requirements. So someone is asking, is there any legal requirement or accommodation for voice type? And I believe that’s in audio description, and the answer is no. However, it’s suggested that an audio description voice be distinguishable from other voices so that it’s not jarring or distracting. And so for example, if you have a male voice, you want to have a female voice that narrates the audio so it’s very distinct.

Someone is asking, what about live? They do annual conferences every year, and how would that work? So currently, our live solution is only for online content, so if you’re doing a Facebook live or a YouTube live.

I would say for an annual conference, because there’s a lot of factors that go into live captioning, our live solution would not work for that because it’s automatic. And so with automatic, you want to make sure that you have a quiet room where it’s very clear and the automatic recognition software can understand that. So I would recommend for annual conferences to use a live captioning solution where they can be there. Or they can also be remote and also have a human there to really make sure that it’s accurate.

And with that, there’s various live captioning providers. We use AI-Media for our webinars with live captioning. They’re super great, and they can work out with you the kinks of how that would work.

Someone is asking, how does an integration work? So I had mentioned that earlier. And so integrations are going to be a disparate system or platform that make it easier for you to share information between two different workflows. And so our integrations are designed to sort of automatically upload your captions to your videos so that it’ll [AUDIO OUT]

Hello? Someone said we lost audio. Are you able to hear me?

Someone mentioned that we lost audio, so I’m just going to switch to my computer to quickly answer this last question. So someone asked if 3Play offers subtitling in non-English languages, and the answer is yes. We offer it for a plethora of different languages, and that’s something that you can– I can put you in touch with a sales rep, and we’ll be happy to discuss your project needs and get you the pricing for that.

And if you do have any questions, I recommend reaching out to sales@3playmedia.com. And they’d be happy to help you get started with a demo, answer any questions you may have, and also connect you with the right rep to help you with your projects.

So thank you, everyone, for joining me today. I would love to invite you to check out our resources on our website. We have a lot of really great stuff. If you have any questions, you can feel free to email me at sofia@3playmedia.com. That’s S-O-F-I-A, and I’m happy to help answer that. Or reach out to our sales team at sales@3playmedia.com.