
Intro to Audio Description [TRANSCRIPT]

SAMANTHA SAULD: Hello, everyone. And thank you for joining me today for Intro to Audio Description. I’d like to take a moment to introduce myself. My name is Samantha Sauld, and I’m a content marketing specialist at 3Play Media. I create digital content on all things in accessibility, and my goal is really to just make the topic of accessibility approachable and digestible for everyone. A bit about me outside of work– in my free time, I love to immerse myself in all things health and wellness, I love to read, and I love cooking as well. And if you want to reach me after this presentation, I can be reached at samantha@3playmedia.com.

So before diving in, just a quick word about 3Play Media and what we do. We are a video accessibility company providing our customers with many services, including closed captioning, live captioning, transcription, subtitling and translation, and audio description services. We have an easy-to-use online platform where you can manage all of your video files from one place, and we have a variety of plug-ins and integrations for captioning and audio description that really help to simplify the process of creating accessible videos. Essentially, we’re a future-proof solution for all your video accessibility needs. And without further ado, let’s get started with the presentation.

So this presentation will go over several things, including what is audio description, how to create descriptions, the benefits of audio description, the legal landscape, integration, and then finally, we’ll end with how to publish description. And I’ve also incorporated several examples of audio description throughout this presentation so that you can get an idea of what it is and experience it for yourself if you haven’t before.

So the first question at hand is, what is audio description? So audio description is an accommodation for blind and low vision viewers. It’s a secondary audio track that plays in addition to the main audio track, and it’s often represented by an AD icon that is similar to the CC icon you would see for closed captions. But rather than tell you about it, I’ll show you some examples on the next slide.

So the video on the screen is undescribed. The thumbnail is a scene from the Disney film Frozen, with the snowman Olaf and the reindeer Sven. For this video, try watching without looking at the screen, and close your eyes, or just look away and see if you can figure out what is going on in the scene based on the audio alone.





– Oh. Hello.













OK, so for me at least– and I imagine for many of you, it’s difficult to understand what’s happening in the scene if you can’t see the screen. There isn’t any real dialogue to provide context. Really all we have to go off of are some verbal expressions and the musical track. So now let’s watch the described version of the same clip. This time you can close your eyes, you can look away, or look at the screen if you prefer, and listen to the audio description to see if it’s easier to picture what’s happening.


– From the creators of Tangled and Wreck-It Ralph, Disney. A carrot-nosed, coal-eyed snowman shuffles up to a purple flower peeping out of the deep snow.

– Hello.

– He takes a deep sniff.


His nose lands on a frozen pond. A reindeer looks up and pants like a dog.


Seeing the reindeer slip on the ice, the snowman smiles and moves towards him. Though, actually he’s running on the spot. The reindeer falls on his chin.

The snowman uses his arm as a crutch. The reindeer paddles his front legs. Head over heels, the snowman over the ice.

The reindeer does the breaststroke. The snowman rolls his body but flips onto his back. The reindeer’s tongue sticks to the ice.

The snowman throws his head. Twig arm and reindeer lips tug at the carrot.


The carrot flies off and lands in soft snow.

The reindeer goes after it with snowman and his body parts hanging on his tail.

The snowman puts himself back together again and glumly contemplates his noseless state.

The reindeer jams the carrot back in place and pants like a proud puppy.


The snowman pats him with a stick [INAUDIBLE]. Then he goes to sneeze. He grabs his nose with both hands.


His head shoots off.

Frozen, coming this winter in 3D.


SAMANTHA SAULD: So this time around, the description really makes up for the lack of dialogue in the scene. It does a great job of visually bringing these fictional characters to life like when the narrator says, the reindeer looks up and pants like a dog. The viewer knows that Sven, the reindeer, is kind of goofy. And the description paints a great picture of this whimsical scene.

I particularly like the description when the narrator says, the snowman puts himself back together again and glumly contemplates his noseless state. I think it’s a very creative description, but describes perfectly what’s happening on screen. I hope that you enjoy this example, and all of this is just to give you a glimpse of what audio description is about.

And the purpose of audio description is to narrate the relevant visual information, to describe the characters, what the scene looks like, and the actions that are going on, as you just witnessed in the Frozen example. It paints an image of the visual for those who can’t see the screen, such as blind and low vision viewers.

You could think of audio description as something similar to a sports broadcast. Up on the screen here I’ve got an image of a football player holding a football and running with it. Imagine for a moment that this player has just scored a touchdown. In this instance, the closed captions on screen would read something like, audience cheering, because the audience would be cheering.

And the broadcast may sound something like, there’s a pass to the player downfield. He catches the ball and touchdown, or he’s passed the five yard line and touchdown. The idea is that you could follow along with the game without having to see the screen. You could be in the kitchen with the game on in the living room and still have a great sense of what’s happening, and that’s similar to the goal of audio description, where one should have a great sense of what’s happening in the program without needing to see the visuals.

So there are two types of audio description, standard and extended. The Frozen example was an example of standard audio description. The audio description snippets were able to fit in the natural pauses within the video, and since there was no dialogue, there was a lot of space to insert descriptions without interrupting the scene.

Extended audio description allows you to add pauses to the video to make room for description as needed. So if your content is packed with dialogue, extended is a great option. It could be useful for more dense and complex content, such as lectures or presentations.

So now let’s talk a little bit about how to create audio description. The first option is a more proactive solution, and you can narrate at the time of the recording. For example, in a recorded lecture, the professor can describe the visuals on the slide. If you’re creating a talking head type of video where you have a slide deck, you can narrate the visual information on the slide as you go along presenting. And this allows you to eliminate the need to go through and add audio description in post-production, so it helps to cut costs.

There are also some other solutions. So you can potentially create a text-only description, essentially writing down all the visual information that’s happening in the video and making the text available to viewers. It’s important to note that this method loses much of the cinematic detail for the viewer, and it doesn’t offer the same amount of accommodation. So this method may be considered if you’re in a pinch.

Another option is to create a text-only description that is time-coded, just like you would time-code a caption file. So you can use this to create a WebVTT file similar to captions, but for audio description. And this is supported natively in HTML5, but most browsers and players don’t support the playing of descriptions in the same way that they support the playing of captions.
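To make the time-coded option concrete, here is a minimal sketch of how a descriptions file in WebVTT format could be generated. The helper names (`format_timestamp`, `build_description_vtt`) and the cue timings are assumptions made up for illustration; they are not taken from the presentation or from any 3Play tool.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_description_vtt(cues):
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Hypothetical cues: the timings are invented for this example.
cues = [
    (1.0, 5.5, "A carrot-nosed snowman shuffles up to a purple flower."),
    (7.0, 9.0, "He takes a deep sniff."),
]
vtt_text = build_description_vtt(cues)
print(vtt_text)
```

In HTML5, a file like this is referenced with a `<track>` element whose `kind` is `descriptions`. Whether a given browser or player actually voices those cues varies, which is the limitation mentioned above.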

And then here’s another way. So if you created a text description and have good recording equipment and video editing software, you can also record your own voice descriptions, merge them with the source audio, and output a second video with descriptions. And as you can probably tell, this option is a bit more time-consuming and involves more work. And then lastly, you can outsource to a professional description vendor. And one thing to consider for this method is that it does cost money to outsource, but in the end it will save you a ton of time.

So it’s good to know that there are quality standards for audio description. This is helpful if you’re creating descriptions from scratch, so that you can provide the best quality for viewers. And even if you’re outsourcing, you should know what to look out for to ensure that the third-party service follows quality standards.

So for quality standards, I’m going to turn my attention to the DCMP. The DCMP stands for the Described and Captioned Media Program. It is funded by the US Department of Education and administered by the National Association of the Deaf and provides helpful guidelines and standards to follow for audio description.

The DCMP has five main measures for quality. So according to the DCMP, a quality description is accurate, meaning there must be no errors in word selection, pronunciation, diction, or enunciation. It must be prioritized, meaning the content is essential to the intended learning and enjoyment of viewers. It must be equal, and equal access requires that the meaning and intention of the program be conveyed. It must be appropriate, meaning the intended audience is considered. And then finally it must be consistent, meaning both the description content and the voicing should match the style, tone, and pace of the program.

Now I’m going to show you another example of audio description for Disney’s The Lion King, and I’d like you to think about the quality of the description and if it matches the DCMP standards. Consider the accuracy– sorry about that. Consider the accuracy of the audio description track, whether it’s essential to the scene, if it’s appropriate for the intended audience, and if it flows well with the movie itself. So let’s watch the video.


– (SINGING) It’s the circle of life.

– Hundreds of animals gather at the bottom of Pride Rock, a tall, flat ledge that towers over the rest of the savanna. Zazu, a small blue bird with a large beak, flaps to the ledge. He bows to Mufasa, a powerful, dignified lion with a thick red mane.

– (SINGING) Till we find our place.

– Rafiki, an elderly baboon with white hair, slowly climbs up to the ledge and hugs Mufasa warmly. They walk back to a cave where Mufasa’s wife, Sarabi, cuddles a tiny lion cub in her paws.

– (SINGING) The circle of life.

– Smiling, Rafiki bends over Simba, the baby lion, and shakes his walking stick, which has two melons tied to it. Simba swats his paws at the melons playfully. Rafiki breaks one open. The wise old baboon dips his thumb in its juice and draws a line on Simba’s forehead. Then he takes a handful of sand and sprinkles it over him. [SNEEZE]

Simba’s parents, Mufasa and Sarabi, smile and lean their heads together. Rafiki, who is much smaller than the adult lions, takes Simba up in his arms. Carrying him like a baby, he walks slowly to the end of the ledge then holds Simba high in the air for all the animals to see.

– (SINGING) The circle of life.

– The antelopes jump up and wave their front hooves. The elephants raise their trunks in the air in a salute. The monkeys hop up and down, clapping with joy. The zebras paw the ground, sending up clouds of dust. High above them, Simba dangles from Rafiki’s arms, looking small and scared.

A ray of sparkling sunshine beams down on Simba like a spotlight. Far below, the animals bow down, their heads nearly touching the ground. From far away we see every animal from the savanna paying respect to their king’s new son.

– (SINGING) The circle.


SAMANTHA SAULD: All right, so I really like this audio description, particularly the detailed description of the characters and their actions in the scene. I do feel that if I close my eyes, I get a great picture of what’s on screen, and to me it feels reminiscent of what I’d hear in an audio book.

So when it comes to audio description, it’s important to paint a picture of the scene without overwhelming the viewer with too many details. I feel this description does well at balancing and sharing the relevant visual information that tells what the scene is all about so that the viewer is in the know.

From the DCMP, we learned what to describe, when to describe, and how to describe to create great descriptions like the one we just watched. It’s a great resource that I recommend everyone reference, whether you’re making your own descriptions or outsourcing. The link is www.descriptionkey.org/quality_description.html, and that should be in the chat as well.

All right, so we’ve covered some of the basics of audio description, but the next question is why we should describe our video content. And to answer that question, I’m going to talk about some of the many benefits of audio description. The number one benefit is that it provides accessibility. It’s an accommodation for blind and low vision viewers.

In 2015, the National Health Interview Survey found that 23.7 million Americans, which is actually about 10% of the population, have trouble seeing to some extent. So audio description is a critical accommodation for these viewers to have access to video content and entertainment. Audio description also provides a lot of flexibility to be able to view videos in eyes-free environments. So if you want to play a video, but you’re going to be looking away from the screen, like maybe you’re cooking or working on your computer while the television is playing, this gives you the flexibility to still grasp what’s going on visually. People have even reported using audio description to listen to a movie or videos as audio books.

So it’s also helpful for auditory learners. Research shows that the brain processes information in two channels. There’s both an auditory channel and a visual channel. And 20% to 30% of students say that they retain information best through sound. So description is really helpful for those auditory learners.

And another sort of out-of-the-box benefit of audio description is it combats inattentional blindness, which is a phenomenon where one actually fails to recognize visual information, even when it’s in plain sight. So we often have instances where we miss key visual elements until they’re pointed out to us. And audio description can bring attention to these often missed visuals.

Audio description is also really helpful for language development. Listening is a key step in learning and associating language with appropriate actions and behavior. So this can be beneficial for children’s language development. And then lastly, audio description is required by law, so there are three major accessibility laws in the US that relate to audio description. And we’ll cover those in the next section.

So the next section is covering the legal landscape of audio description. So the Rehabilitation Act of 1973 was the first legislation to address the notion of equal access for individuals with disabilities. Section 504 applies to federal programs and programs receiving federal funding, such as universities, and these entities must make themselves accessible to those with disabilities.

This may include providing audio description for visual media. Section 508 applies to federal programs, but can be applied on a state level through what we call little 508s. And 508 references WCAG 2.0, which has audio description requirements, and those organizations that fall under 508 must comply with WCAG 2.0. So WCAG stands for the Web Content Accessibility Guidelines, and it’s the international standard for web accessibility.

There are three levels of accessibility standards, A, AA, and AAA, with AAA being the highest level of accessibility. Audio description is required in WCAG guidelines for pre-recorded synchronized video media. WCAG also provides the success criteria for how to meet these requirements. And please take note that WCAG 2.1 is actually the most recent update, which is an extension of WCAG 2.0, but 2.1 is not referenced in any laws yet.

And then there’s the 21st Century Communications and Video Accessibility Act, or the CVAA, which enacted the goal to phase in audio description requirements between 2010 and 2020. The rules on description require that major networks provide 50 hours of described primetime and children’s programming per calendar quarter by 2012.

So in 2015, that 50 hour requirement expanded to the top 60 TV markets, and the FCC adopted an order that increased the requirements from 50 to 87 and 1/2 hours per calendar quarter for broadcast and cable networks. And a proposal was submitted under the CVAA to expand video description regulations by phasing them in additional markets each year for four years beginning on January 1 of this year. So you can expect more audio descriptions to come out this year.

Lastly, the Americans with Disabilities Act, or the ADA, is a broad anti-discrimination law and prohibits disability discrimination. It requires effective communication, which means providing assistive technology and services for content. Title II covers government entities and the services, activities, and programs they provide. The content and materials they offer must be accessible so as to not discriminate against people with disabilities. And this may include describing video content.

Title III covers places of public accommodation like restaurants, hotels, theaters, schools, and doctors’ offices. And under Title III, some precedent has been set that the ADA may apply to websites as well. There’s been a lot of action to enforce web accessibility under the ADA.

A couple of lawsuits involving audio description come to mind. So the first settlement under the ADA was the American Council of the Blind versus Netflix. And Netflix agreed to provide audio description for many of their streaming titles. Netflix now offers audio description for nearly all of its original titles and other select movies and TV shows which they license. So they really stepped it up due to the settlement.

Another lawsuit was against Hamilton, The Musical, which was sued for violating Title III of the ADA, for not providing descriptions for their show. So under this suit, Hamilton would potentially have to agree to describe their shows. I believe this case is still active, so there’s no conclusion yet as to what will come of this, but it really shines a light on the importance of audio description and access to entertainment.

And after all is said and done, why is audio description important? To me and to many others, it’s all about accessibility. We know that there are many benefits like we’ve discussed, but the fact that it provides equal access to people is a strong motivator to step up and offer audio description for video content. So this final section, which I called the audio description guide, will cover some of the video players that support audio description and our audio description process here at 3Play.

So our AD process– and when I say AD, I mean Audio Description– so our AD process at 3Play consists of three steps. We start with the time-coded transcript, which is actually what we use as the first step to create captions. From there, Human Describers go in and create time-coded descriptions based on the content. And then after that, the description file itself is time-coded and created using synthesized speech. So you have the audio description file. But from there, how do you publish it?

As far as platforms that natively support audio description tracks, there’s a pretty short list. There’s Able Player, OzPlayer, Brightcove, JW Player, Ooyala, Kaltura, and Wistia. There’s still a long way to go for audio description players, and hopefully this list will grow, but for now there are some other options you can turn to for publishing audio description.

The first option, of course, is to upload the audio description MP4 to your host video platform if it supports it. If it doesn’t support it, you can potentially publish two videos, one without the description and one with the description. And this is like the Frozen example I showed at the beginning. In this case, we have to utilize video editing software to burn or merge the audio description into the video itself and render it as one file. The last option is to have the MP4 file on hand and to provide it when somebody requests it or to post it directly on your website for viewers to access.

And with 3Play Media, there’s a fourth option. It’s called 3Play Plugin, which is a keyboard and screen reader accessible audio description plug-in. It allows your description to play with the video player without having to republish the video. So by using the plugin, you can post the video with a description track directly on your site.

The plugin then creates a fully interactive and accessible video experience for viewers, and it works with YouTube, Vimeo, Brightcove, and several other video platforms. From us, you’ll get either an iframe or JavaScript embed code, which you can then post right on your website. The goal of the plugin is to enable you to add audio description to your video easily.

And that is it for my presentation today. We have a few minutes left for questions. So just keep them coming in, and we’ll try to answer in time.

OK, so the first question is, what should you do if your video player doesn’t support audio description? So like I mentioned in the presentation, you can publish a second version of the movie or the video that contains audio description or provide a version of the video that contains extended audio description. If you’re a 3Play customer, like I mentioned, you can use the 3Play Plugin to add audio description to any video player.

The next question is, during a talking head lecture video, we may insert images to support a relevant talking point. For example, the instructor is talking about a specific historical person, so a picture of the person is added to the video. Does that type of image require a description to be compliant?

So audio description is necessary only to describe the relevant and important information in order to understand what’s going on, not really for aesthetic reasons. So if the image is just inserted to make the slides look better or more appealing, then description would not be necessary. But if it’s in there to help understand and explain the content of the video, then audio description would be necessary. That was a great question.

So the next question is, how do you manage audio description for videos that are full of human speech. It would be interesting to see how audio description is handled with lots of voice over. How does it describe when someone on screen is narrating or talking?

So as for specifically– specifically interested in higher education setting– sorry, let me just read this question again. OK, yes. So for this question, I would say that this is where extended audio description comes into play.

So with extended audio description, you’re actually able to pause the original video and make room for adding description. So basically, you just force a pause here as opposed to relying on the natural pauses within the video. So it just makes it a lot easier. Obviously, the video will end up being longer, but that way you’re able to get the descriptions in without taking away from the original dialogue.
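The effect of those forced pauses on the timeline can be sketched with a little arithmetic. The function below is a hypothetical illustration, not 3Play’s implementation: it assumes each description is given as the point in the source video where playback pauses and the length of the description audio, both in seconds.

```python
def plan_extended_description(video_length, descriptions):
    """Work out where each description plays once pauses are inserted.

    descriptions: list of (pause_at_seconds, description_seconds) pairs,
    with pause_at_seconds measured on the original video's timeline.
    Returns (schedule, new_length), where schedule holds the
    (start, end) of each description on the lengthened timeline.
    """
    schedule = []
    added = 0.0  # total pause time inserted so far
    for pause_at, duration in sorted(descriptions):
        start = pause_at + added  # earlier pauses push this one later
        schedule.append((start, start + duration))
        added += duration
    return schedule, video_length + added


# Hypothetical 10-minute lecture with two inserted descriptions.
schedule, new_length = plan_extended_description(600, [(30, 8), (120, 12)])
print(schedule)
print(new_length)
```

In this example the second description starts 8 seconds later than its original pause point, because the first pause has already lengthened the video. The finished video runs 20 seconds longer than the original, and none of the original dialogue is covered.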

The next question is, is the 3Play Plugin screen reader accessible. And yes it is. So the plugin is both keyboard and screen reader accessible, and we do this so that it’s fully accessible and that there are no issues with any of that.

Can an institution use audio description with YouTube videos they don’t own? So yes. The demo I showed, those Frozen videos were not our videos. They were videos that were posted on YouTube, and with the plugin, you can embed the YouTube video as long as you have the link to the YouTube video.

OK. And then I see a pricing question. And as a studio what are the rates per minute, per [INAUDIBLE], per hour of final video. So we charge per minute, and the pricing starts at $9 per minute, but we do offer volume-based discounts. So the more content you have, the more we’re able to bring that price down.

So the next question is, do you have complex description available, such as– such as with medical procedures. So yeah, so our describers go through a rigorous training process similar to that of our captioners. And we have individuals who can describe complex topics, such as medical topics.

And because our system works where our describers can choose what to work on, you can be sure that the people who are going to be doing the describing are competent and confident and that they know the materials well. All right, so that is all we have for questions. We are a little over, but thank you all so much for joining this webinar.