How an Accessibility Strategy Can Unlock the Power of Video [Transcript]

Hello, I’m Sean Brown, and I’d like to welcome you to our continuing series of live webinars documenting creative use of Mediasite all around the world. Today’s webinar is entitled, “How an Accessibility Strategy Can Unlock the Power of Video.”

We have a massive audience joining us today for this special webinar, and before I introduce our two guests who will be presenting today, there are a few housekeeping items I want to go over with you to help you get the most out of this presentation.

First, if you would like to see live captions of this proceeding, please click on the banner at the top of the video window, and you will be provided with a separate window that includes the live caption text. Second, if you are using a screen reader and have not already done so, you're welcome to press Control+Z. That will toggle between this view and the screen reader-optimized view for maximum accessibility.

Third, this presentation will be available on-demand for you to watch again, or to share with any other people in your organization as you see fit. This URL, when you play it on-demand, will have a full transcript in a separate closed-caption window.

If you have a question at any time during today's broadcast, there's a speech bubble underneath the video. If you click on this speech bubble, you're able to type in a question that will be relayed to me here in the studio, and I'll ask our guests at the appropriate time at the end of their prepared remarks.

And finally, under the Information button, you’ll find information about our presenters, as well as supplemental links, including the PowerPoint that is the basis for the prepared remarks of both presenters, as well as another opportunity to launch the live caption link area and other helpful information. And now to introduce my guests.

To my immediate left is Greg Kraus. He is the University IT Accessibility Coordinator at North Carolina State University. He works with developers, faculty, content creators, and administrators to consult on the accessibility of campus projects, provide training, and help set policy.

In the past, Greg was a Senior Instructional Technologist at NC State, where he worked with Moodle and other learning technologies, as well as Mediasite. He is also a software developer having created web applications with an emphasis on accessibility. Greg also founded a software company which creates accessible software for educational institutions.

He has also worked at Meredith College as an Academic Technology Specialist. Greg received his bachelor's degree in Computer Science from Texas Tech– go Red Raiders– and his master's from Duke University. Greg is currently serving as the leader of the IT Accessibility Constituent Group.

And to my far left is my old friend, Tole Khesin, who is the VP of Marketing at 3Play Media– and I am his customer: 3Play Media captions this webinar every month when you watch it on-demand. 3Play Media provides closed captioning and transcription services to make video accessible, searchable, and more engaging for institutions all over the world.

Tole oversees all marketing operations, and is responsible for promoting 3Play Media’s brand and driving demand. Tole has authored many papers, and often speaks on this topic of captioning best practices, accessibility laws, and emerging standards. He lives in Boston, Massachusetts, and has been one of the principals at 3Play Media since 2009.

So welcome to you both. I couldn’t have a better panel to talk today about accessibility. And Greg, you go first. And Tole, you’ll go right after him. So take it away, Greg.

Great, thanks. So what I want to talk about in my portion is video accessibility– and thinking a bit beyond what we usually think of with video accessibility, which is, oh, we have to caption it. There's a little CC button, and so we have to caption video.

There are actually a lot of other aspects to video accessibility that don't often get much consideration. So I'd like to take some time to think about what some of those issues are. But first of all, when we're talking about accessibility, I think it's important to understand: what do we mean when we talk about accessibility?

And so one way to think about making web content accessible is: can all people– regardless of any impairment– interact with your content? They might have a visual impairment– maybe complete vision loss, or low vision, where they're using screen magnification software.

Maybe they have a physical impairment, where they're not able to use a mouse– maybe they only use a keyboard to interact with a computer, or potentially speech recognition software. Or maybe they have an auditory impairment.

And oftentimes, we meet those needs with things like captioning. Or maybe there's some type of cognitive impairment– cognitive impairments can encompass a wide range of disabilities. It could mean anything from making sure we don't have distracting items on the side of the page that draw people's focus away from where they need to be concentrating, to effective use of white space. Or can text on your web pages be used with what we call literacy software, which might read the text aloud as it highlights the words?

So that's what I mean when I say, is it accessible? Can anybody with any of these types of impairments use the content? They might use a piece of assistive technology to help them out, but are they able to fully interact with this content?

So the three things I’m going to talk about today are, first of all, accessible media players. And then a topic you might not have heard of before, called audio description. And then talk a little bit about paying for captioning.

So what do I mean by accessible media players? Well, one of the common problems we find with media players is sometimes they require you to use a mouse to actually get them to work. If you want to play the video, you have to take your mouse, and drag it over and click the Play button.

But what if you can’t use a mouse? How do you actually get the media player to play? And so with your media player, are you able to play, pause, and fast forward, rewind, and adjust the volume, toggle between full-screen mode, toggle captions?

Or in the case of the Mediasite player, there’s some other information on here. There’s a little information bubble that gives you information on the presenters. There’s a little window for links that you can get to. Can you get to all that without using your mouse?

So anytime you're watching a video online, a really good test is to see whether you can actually use the media player without your mouse. Just set the mouse aside and use your Tab key. See if you can move around. That's one of the first steps we look at in terms of, can this media player actually work for people who have disabilities?

So for instance, here I've got a little screenshot of the YouTube player. And over on the far left, you're going to see the Play button with a little blue box highlighted around it. So one of the important things we want to make sure works is, if I'm not using a mouse– if I'm using my Tab key– as I press Tab, it's going to jump to the various links and buttons on my page. And I need to know where I am.

And so in this case, I use the Tab key. And I was able to move down to the Play button. So I knew where it was. So then I could press Enter and it would actually play. And then I could press it again and pause the video then.

So it’s not just, can it work with the keyboard? But can you actually follow where you are using a keyboard? And so there’s a lot of media players out there. And when you’re considering what media player to use, that’s a very important factor. It’s not just can I provide captioning? It’s can my other users with other types of disabilities actually interact with the player?
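To make that "set the mouse aside" test concrete, here is a minimal sketch– not YouTube's or Mediasite's actual code– of a player control that passes it: a native button that lands in the Tab order, announces its state, and toggles playback from the keyboard.

```typescript
// Minimal sketch of a keyboard-operable play/pause control.
// A native <button> is focusable by default and fires "click" for both
// Enter and Space, which is why it beats a styled <div> that only
// listens for mouse events.
function makeAccessiblePlayButton(video: HTMLVideoElement): HTMLButtonElement {
  const button = document.createElement("button");
  button.textContent = "Play";
  button.setAttribute("aria-label", "Play video");

  button.addEventListener("click", () => {
    if (video.paused) {
      video.play();
      button.textContent = "Pause";
      button.setAttribute("aria-label", "Pause video");
    } else {
      video.pause();
      button.textContent = "Play";
      button.setAttribute("aria-label", "Play video");
    }
  });

  return button;
}
```

The visible focus indicator described above (the little blue box) comes for free with a native button unless a stylesheet removes it, so the other half of the test is simply never setting `outline: none` without providing a replacement.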

Now I want to talk about a subject kind of beyond captioning. Sometimes this concept I’m going to talk about goes by the term audio description. Sometimes it’s called video descriptions or described video. It’s all basically the same thing.

So what is this? So if you think about the situation of someone who has a visual impairment– maybe they’re blind– and they’re watching a movie. Well, they can hear the audio track just fine, but how do they know what’s actually going on in the video? Someone moves across the stage, or someone performs some action. How do you actually know that just happened?

There's this thing called audio description, where an extra bit of audio gets inserted into the video that describes what's going on. And next time you're browsing movie showtimes online, look at the theaters, and you'll see, oftentimes, little icons next to some of the showings that say, "this movie has audio description." That service is provided through an extra piece of hardware that someone picks up, and they can listen to the descriptions through it.

Sometimes you’ll see this feature being offered if you go to plays. I know there’s one local production company that we go to in Raleigh where I see a little booth set up for people who need the service. And they just go over there and pick up the equipment they need, and people will actually describe what’s going on in the play.

So audio description– it’s actually a very challenging thing to create. And I think to get a sense of what I mean by audio description, I’m going to show you a sample video that I have. And the video I’m going to show you, it’s only about a minute and 20 seconds long. I’m just going to let it play through. And you’re not going to hear anything unusual for probably the first 20 seconds or so. But as it goes through, you’ll start to hear things.


We are committed to the notion that everyone should have an opportunity to participate in higher education, whether it be from the learning perspective, or the research perspective, or an opportunity to work here at this institution. We benefit from that, because we get to enjoy the talents and the skills of those people who come in, and also their perspective, which in many cases will be different from the perspective of others on campus. And so accessibility becomes a very important value at the university.

Words appear– Michael K. Young, President, University of Washington. Images of a teacher and students in classrooms and at computer stations. Text moves on a closed-circuit TV. Words appear– IT Accessibility: What Campus Leaders Have to Say. Tracy Mitrano, Director of IT Policy, Cornell University.

We’re a leading university globally. We want the best talent in the world for our students, our staff and our faculty. And we want to be sure if that talent has a disability, that they know that we are a welcoming community.

And we’re competing with other prestigious and highly accomplished institutions. Pablo Molina– we want to make sure that we can target the right candidates– Campus CIO– to join our community– Georgetown– regardless of their disability status.


So a couple of things you noticed there. One is that there was some extra audio going on in the breaks in the regular talking. And what that is– somebody went in and actually described what was going on in that video.

At first, there was this big gap while the title scene was going on, so there was a lot of space where people could stick extra content in. But you might have noticed near the end that the extra audio was inserted, it almost seemed, at the very last moment. That's because they have to fit those descriptions in where there are gaps in the audio, which can be very difficult, because it's a balancing act of what is the right amount of information to convey.

You don’t want to do too much. You don’t want to do too little. And you have to work within the given blank spaces that you have.

So creating audio descriptions is a very challenging task. But really, what I want us to start thinking about– as opposed to strictly audio descriptions– is how do you convey the information in your video? So in a case like this webinar, if we had audio description going on, it's basically a bunch of talking heads up here who are just talking.

And so in this case, there doesn’t have to be a lot of audio description. I mean, it could say something like, “three dashingly handsome men sitting behind a desk,” or something like that. But that’s not really all that important to the actual video that’s going on.

And so we have to think, what is the actual message I’m trying to convey? Again, if it’s a talking head, there’s not a lot. But what if I have a situation where I’m doing a math equation in class? So what if I’m writing this equation up on the board and I need to describe this? Do I need to put in audio saying, f of x equals f of 0 plus x, and so forth, as it’s going on, and try to fit that in in the middle of a lecture? That can be really challenging, because most professors don’t pause in the middle of their talking long enough to fit that kind of stuff in.

And a slide like this might have just way too much information to try to convey through an auditory track. There might be other ways to actually convey that information.

So the more important question to ask is, can my audience access the information I’m conveying? So sometimes, the information you’re trying to convey is already present in the audio and the captions. For instance, again, in this webinar, we do have PowerPoint slides going on.

But we are pretty much already restating everything that's on our slides. And so it wouldn't be necessary in this case for someone to come on and say, "Text comes on a screen– more important question to ask– can my audience access the information I'm conveying?" They don't have to say that, because I already said it.

So sometimes, the information you’re conveying is there. Maybe in the case of that math equation, if the instructor is actually verbalizing it as they write out the equation, nothing else has to be done. But it might be the opposite extreme, where there’s so much information that is being displayed that is non-auditory that it might be better just to convey that information in another format.

So going back to that math example, it might be better for someone with a visual impairment just to give them that math in a textual format, like in a Microsoft Word document formatted the right way so it works with their assistive technologies. That way they can still get the math content that the instructor was referring to.

So they can watch the video. When they get to the point they need to process more information, they can go and actually read their text file, and take their time doing it, instead of getting just those little, tiny snippets within the audio description.

So when I was thinking about presenting this topic, one of the things that was important for me is for you not to leave this presentation just absolutely freaking out about providing audio descriptions. Because you think, wow, I've never even thought about that. I don't even know where to begin with that.

You'll find audio descriptions are very time-consuming to make, and they're very costly to make. And they're not always the best solution in all cases.

What I really want you to walk away from this presentation with is beginning to think about how to provide all of your information in an accessible way. It might not be in the media platform itself– it might be that you provide alternative formats in some other way. But it's really about starting to think beyond captions to the other information that's being conveyed visually. Is there something important I need to convey in another way?

Now I want to switch gears for a sec here, and talk about paying for captioning. Now Tole’s going to talk a lot about the fundamentals of captioning and what that is. But I want to talk about one of the issues that we have to deal with on my campus. And a lot of campuses are having to deal with this.

So are you going to do it yourself? Now, these numbers aren't scientific– this is from personal experience. I've found that if you're a nonprofessional and you want to try to caption your own content yourself, if you get really good, it'll probably be a 3-to-1 time ratio, but more realistically it's probably a 5-to-1 ratio, meaning a one-hour video will probably take you about five hours to caption if you're not trained in this.

It’s a tough job to do. And there are people with specific skill sets that make it go as fast as they can make it go. So that’s just one– if it’s a five minute video, yeah, I’ll caption that myself. But when you’re talking about long videos, that’s one thing to consider, is do you have the resources to do that?

And then another thing to think about is what amount of video you need to caption. I'm not going to give a lot of specifics about pricing, because that can vary depending on what exactly you need. But an hour of video is going to be about $150.

But when you’re talking about running something like an online distance ed program, and we’re capturing thousands of hours of video– which is not uncommon at all– if you’re capturing 10,000 hours of video, to caption every bit of that is going to be somewhere between $1 to $1.5 million. And 10,000 hours is not out of the question for a program to capture, when you’re looking at the entirety of a program.
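For a back-of-the-envelope sense of how those numbers scale, here is a quick sketch using the rough figures above; the $100 low end is implied by the $1 million floor rather than a quoted rate.

```typescript
// Rough captioning budget using the approximate rates described above.
const hoursOfVideo = 10_000;
const lowRatePerHour = 100;   // USD per hour of video (implied low end)
const highRatePerHour = 150;  // USD per hour of video (quoted estimate)

console.log(
  `Estimated cost: $${(hoursOfVideo * lowRatePerHour).toLocaleString()}` +
  ` to $${(hoursOfVideo * highRatePerHour).toLocaleString()}`
);
// Estimated cost: $1,000,000 to $1,500,000
```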

And so those numbers become very daunting very quickly for how you actually pay for that. So what I want to think about is, how do we plan for captioning?

So thinking through this, when I look at NC State and the way we’re using some classroom capture systems and other video delivery systems, a single course for a semester can cost us about $8,000 to have captioned. So when you think on a per-course basis, that’s a big chunk of change to be able to put down.

So how do you start planning for those types of numbers? Because when you start looking at the entirety of it, and those numbers of $1 million to $1.5 million, how do we meet those needs to caption all that, but also maintain our commitment to providing accessible education? Because that’s a lot of money.

So how do we do that? The solution we’ve come up with at NC State is we’ve started a captioning grant. And so what this does is, we have money set aside centrally that helps pay for captioning both in what I call a reactive and a proactive fashion. So when we do have a student with a hearing impairment who needs the accommodation of captioning, an instructor can go and apply for this grant. And we’re able to pay for those captioning needs on an as-needed basis.

But we’re not satisfied with just doing the bare minimum, and saying we’re only going to do the people who need this for an accommodation. There’s a lot of other benefits to captioning that we’re going to touch on later. And so we want to encourage people to do the right thing. We want to gain all these other benefits from captioning.

So we also have money set aside for proactive captioning. So it's a grant process, where you tell us about your course and we start prioritizing based on: how many students are going to take this course? Are you going to reuse this content semester after semester? What volume of video are we talking about here?

And we’re then able to go and actually start providing funds to strategically caption courses where it’s going to have the biggest impact across the most users on our campus. So that’s my bit for planning for captioning. And I think there’ll probably be some more questions that come up later after Tole starts going into depth about captioning.


Great, great.

You’re up.

Thanks Greg. That was a really great presentation that, I think, really exposed some of the layers and complexities of implementing accessible media at a large institution like NC State. So for my part, I wanted to do a bit of a deeper dive into captioning. I wanted to start with the basics, what captions are, how captions are made. Then I’ll talk a little bit about some of the applicable accessibility laws and the benefits that come with captioning video.

First, I wanted to start with some recent accessibility data from the World Health Organization and also the United States Census Bureau. So there have been some really interesting data and trends that have come out of these reports.

First of all, there are now over a billion people in the world who have some sort of disability. In the US, about one out of five people older than 12 has some sort of disability, and many of those disabilities are hearing-related.

Another interesting thing is the growth rate– the number of people who have a disability is outpacing population growth. And you might ask, why is that happening? There are a number of reasons, but it really comes down to medical and technological advancements.

For example, premature babies are much more likely to survive today, which is obviously a wonderful thing, but the side effect is that they may have some sort of disability. We're also coming out of a decade of war, and we have a lot of veterans coming back. Another example in that area is that with modern armor, soldiers are much more likely to survive an injury than they were in the wars of the 1970s. This is obviously a great thing, but the side effect is that they may come out of it with some sort of disability, such as hearing loss.

And all of this sort of points to the fact that accessibility is a critical issue that will become even more prevalent in the years ahead. And captioning being an important part of that is really the reason why we’re talking about it today.

OK, so we’ll go to the very basics here. So what are captions? Captions are text that has been synchronized with the media so people can read along while watching the video. Captions convey all of the spoken content, and also the sound effects. So that would include speaker identification and other non-speech elements. Basically, the point of captions is to assume that the viewer can’t hear anything at all. And so anything that’s conveyed through audio that’s integral to the plot needs to be conveyed through the captions.

Captions originated in the early 1980s as a result of an FCC mandate specifically for broadcast television. But as online video has proliferated pretty much everywhere, the need for captions has expanded greatly. And so captions are being applied across all sorts of different media and devices, especially as people become more aware of the benefits, and as accessibility laws become more stringent.

We’ll talk a little bit about some of the most common terminology with respect to captions. First of all, captions versus a transcript– what is the difference? The difference is that captions are time coded. Basically, the way you create captions is you start out with a transcript, and you break it up into caption frames which are displayed on the screen at the right time.

Captions versus subtitles– the difference here is that captions, as I mentioned before, captions assume that the viewer can’t hear anything. So they also include not just the words that are spoken, but also the non-speech elements, and they identify the speakers. In contrast, subtitles assume that the viewer can hear everything, but just can’t understand the language. So subtitles are basically associated more with translating to other languages.

Closed versus open captions– this has to do with the way that the captions are rendered on the video player. Closed captions, which are by far the more common way of doing it, is that it’s a separate track on the video. And the main advantage of that is that it gives users the ability to turn them on or off. So on a video player when you see that CC button, that will enable or disable that track. In contrast, open captions are actually burned into the video. So there’s no way for the user, or anyone, to turn them off.

Post-production versus real-time really has to do with the timing of when the captioning process is done. For example, with this webinar presentation, we have live captioning going on. Since it's real time, that means there's a stenographer listening and typing as we speak. And there's usually a delay– usually five to seven seconds.

Post-production means that the event has already been recorded, and it’s submitted for captioning. And usually the captions become available a day or a few days later. There are advantages and disadvantages to each type of process.

Unfortunately, with captioning, there are a lot of different formats, depending on what type of media player you’re using. And so on the left here, you’ll see a table with some of the more common caption formats. But this is really just the tip of the iceberg. There are probably 50 different caption formats.

And on the right side, there are a couple of examples, just to show you what the captions look like under the hood. The one in the top right is called an SRT. This is a very common caption format that's used, for example, with YouTube. If you want to add captions to YouTube, you'll create an SRT caption file and upload it to YouTube.

And so what you'll see there– there are three caption frames. And in the first caption frame, it says, Hi, I'm Arne Duncan, the Secretary of Education. And right above that, those are the time codes– the time when that frame appears, and then when that frame disappears. And when it disappears, it gets replaced by the next caption frame. So that's a very simple one to create.
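For readers who want to see the format itself, here is a small sketch that writes caption frames out as SRT text; the time codes below are invented for illustration, and only the first frame's wording comes from the slide.

```typescript
interface CaptionFrame {
  startMs: number; // when the frame appears
  endMs: number;   // when it disappears and is replaced by the next frame
  text: string;
}

// Format milliseconds as an SRT time code: HH:MM:SS,mmm
function srtTime(ms: number): string {
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor(ms / 60_000) % 60;
  const s = Math.floor(ms / 1_000) % 60;
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1_000, 3)}`;
}

// An SRT file is just numbered frames: an index line, a "start --> end"
// line, the caption text, and a blank line between frames.
function toSrt(frames: CaptionFrame[]): string {
  return frames
    .map((f, i) => `${i + 1}\n${srtTime(f.startMs)} --> ${srtTime(f.endMs)}\n${f.text}`)
    .join("\n\n");
}

console.log(toSrt([
  { startMs: 0, endMs: 2500, text: "Hi, I'm Arne Duncan, the Secretary of Education." },
  { startMs: 2500, endMs: 5000, text: "Thanks for joining us." }, // hypothetical second frame
]));
// 1
// 00:00:00,000 --> 00:00:02,500
// Hi, I'm Arne Duncan, the Secretary of Education.
// ...
```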

The one below that is an example of an SCC caption format. And this is much more complicated. For one thing, it’s not readable to humans. It’s actually in hexadecimal representation. And this is a very common format that’s used not only for web video, but also for broadcast and DVD authoring.

So how do you associate captions with the video? Once you've made the captions, there are a few ways to associate them with the video. The most common way is to create a separate caption file. This is referred to as a sidecar file.

So you might have your video file, which might be an MP4 file, and then you might have your caption file, which might be an SCC file, or an SRT file, or any of the other formats. And what you do is you submit both of those to the video player, or you upload them to a server. And the video player basically just renders the video, and it points to the captions. And the video player is what renders the captions on top of the video.
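As a concrete sketch of the sidecar pattern, here is what that wiring looks like with the standard HTML5 track element. The file names are hypothetical, and note that browsers expect this particular sidecar in WebVTT format, so an SRT or SCC file would typically be converted first.

```typescript
// Sidecar-file sketch: the video and the captions stay separate assets,
// and the player renders the captions on top of the video.
const video = document.createElement("video");
video.src = "lecture.mp4";      // hypothetical video file
video.controls = true;

const track = document.createElement("track");
track.kind = "captions";        // closed captions: the viewer can toggle them
track.src = "lecture.vtt";      // hypothetical sidecar caption file (WebVTT)
track.srclang = "en";
track.label = "English";
track.default = true;           // show this track unless the viewer turns it off

video.appendChild(track);
document.body.appendChild(video);
```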

In some cases, you need to actually encode the captions with the video. So one example is with iTunes. If you’re uploading video content to iTunes, the captions need to be encoded with the video itself. And so it’ll be one asset, but there’ll be a track on that video asset which will be for the closed captions.

And then finally, it's also possible to burn the captions right into the video. And I feel like, especially with the emergence of online video, people are really moving away from burning captions into the video, for the reasons I mentioned before– it obscures information on the screen, and there's no way to turn it off. It also introduces a lot of workflow complexity, because now you have to have different videos– one with captions, one without– and that just makes things a lot more complicated.

All right, so we'll talk a little bit about some of the accessibility laws. In the US, there are three Federal laws, each having a different area of impact. So the first is the Rehabilitation Act, which was originally enacted in 1973.

And there are two parts to it that impact captioning: Section 508 and Section 504. This applies to all Federal agencies, and really any organization that takes Federal funding. And both of these are very broad provisions that apply to electronic communications and IT.

504 takes on a bit of a different angle, in that, it’s basically an anti-discrimination law. But they both sort of appeal to the same purpose. The ADA is the Americans with Disabilities Act. That was put into place in 1990. And that actually has five parts to it. This is a very broad law.

The two that impact closed captioning are Title II and Title III. Title II is for public entities, and Title III is for commercial entities. Now, Title III– the one that’s for commercial entities– is the one that’s really had a lot of legal action lately. And it’s been tested through civil litigation.

In particular, the landmark lawsuit a few years ago was the NAD– the National Association of the Deaf– versus Netflix, where they cited the ADA on the grounds that Netflix was not providing closed captions for many of the movies in their instant streaming service. And the thing about Title III of the ADA is that, in order for it to apply, you need to be considered a place of public accommodation.

And traditionally, that was used mostly for physical structures, such as the requirement for even commercial buildings to have wheelchair ramps. And this was the first time that this law was really used against, basically, just a website. And that’s what Netflix argued. They said, look, we’re not a place of public accommodation. We’re really just a website that streams movies.

And so the court eventually ruled that Netflix did, in fact, qualify as a place of public accommodation. And therefore, Title III of the ADA had to apply. And Netflix settled and ended up captioning close to 100% of their content. And this case has some very profound implications, because if Netflix qualifies as a place of public accommodation, then there’s certainly many other organizations– private organizations– that could also qualify as places of public accommodation, especially if they’re big enough to have a global impact.

There are a couple of other very big cases that are ongoing. One is against Time Warner. They were sued by GLAD– the Greater Los Angeles Agency on Deafness– because CNN does not have captions on a lot of their content. And then recently, just last month, there was a lawsuit filed against FedEx. So this is more in the context of corporate video. But FedEx was sued on the grounds that they did not provide sufficient accommodations to their employees who are deaf or had hearing disabilities.

The most recent law is the CVAA– that's the abbreviation; it's actually the 21st Century Communications and Video Accessibility Act. This was enacted just a couple of years ago, and it relates specifically to content that aired on television but is also being published on a website.

And as of July of this year, the FCC actually ruled that this law also covers video clips. So even in the case where you have, let's say, a 30-minute show, but you take a two-minute clip from that show and put it on YouTube, or really any website, then that needs to have captions as well. Just to clarify– the video clip requirement doesn't actually kick in until January of 2015. But right now, if you take the entire show and put it on a website somewhere, it has to have closed captions.

And the interesting thing about the CVAA, in contrast to prior laws and prior rules with the FCC, is that it’s now the copyright owner who bears the responsibility for providing captions. So previously, it was the cable networks that had the liability and responsibility to provide closed captioning for the shows. But now it’s the copyright owner.

So to give you an example, if you upload a video to YouTube, YouTube gets an unlimited worldwide license to display that video. But you, as the person who uploaded it, you are the copyright owner, and therefore it’s your responsibility to upload the captions for that content.

In July of this year, the FCC came out with actual standards for what caption quality is. This is something that’s really interesting, because in the past, it was a bit of a grey area. It wasn’t really clear what was adequate for caption quality.

And they broke it up into four criteria, four requirements. So the first is caption accuracy. And here, there’s very little leeway. They said that captions need to pretty much be flawless. They provided some leniency for live captioning, which is understandable. But for post-production captioning, it needs to be at least 99% accurate and pretty much flawless.

The second part is caption synchronization. They said that the words– the caption frames– need to coincide with the content that appears in the video. And the sync pretty much needs to be flawless there, as well.

The third part is program completeness. So prior to this ruling, there had been a number of complaints that sometimes captioning would drop off toward the end of a program. For example, there might be a scene after the credits that wouldn’t have captioning. And so they came out and said, look, we need to have captions from the very moment that this program starts to the point that it ends.

And then the last part of it is on-screen caption placement. And this has to do with where the captions appear on the screen. The default location for captions is the lower third– they appear sort of centered, or sometimes left-justified, but the point is that they're at the bottom of the screen.

But sometimes, you have some very important information there. You may have some other text there. You may have some text that introduces the speaker. And those captions can actually obstruct that content.

And so what the FCC said is that in the case where the captions are obstructing critical information on the screen, they need to be relocated. And so one of the things that we actually do at 3Play Media is– and this is a patented process that we’ve developed– is that we actually look at the pixels on each frame. And if we see that there’s a frame where the captions could potentially obstruct the content, we’ll automatically relocate the captions to a different part of the screen.

So I want to talk a little bit about the benefits. So the first and foremost benefit of captioning is that it provides an accommodation to people with hearing disabilities. This is critical.

In the US alone, there are 48 million people who have some degree of hearing loss– basically one out of six people age 12 and over, with hearing loss significant enough to interfere with day-to-day communications. This is a big number, and adding captions really helps a lot of people.

The other part of it is that captions really help you understand the content better, even if you don't have any hearing disability– they really help everyone. And they're very useful in cases where you can't turn on the speakers on your computer, or you're at work, or you're in a sound-sensitive environment. It's just much easier to consume the content when you don't have to rely on the sound.

And there's a really interesting study that was put out by Ofcom, the UK's Office of Communications. They found that 80% of people who use closed captions don't have any hearing disability at all. And this number I find really, really interesting and impressive– four out of five people who use captions don't have any hearing disability. It just goes to show that captions are really useful to everyone.

Video search– this is something that we've developed a lot of tools for. Once you've transcribed and captioned the video, you can leverage that timed-text data to make the video searchable. And we've built a number of tools– and I know the Mediasite software also has a number of tools– where you can search for a word within a video, and it'll show you exactly where that word appears within the video.

And you can jump to a specific point. This is really useful, especially when you have a lot of video content– like if you have a lot of lectures and you just want to search for a keyword and see where it appears. It's just very, very useful.

And we have actually worked on a study together with MIT OpenCourseWare. They polled hundreds of their students who have used these video search tools. And one of the pieces of data that came out of the report is that 97% of the users said that searchable transcripts enhance their educational experience. 97%. So that's a big number.

If you’re using video, especially for marketing purposes, and it’s important for people to find that video, then SEO– or Search Engine Optimization– is something that really benefits from transcribing and captioning your video. And we’ve done a number of studies with different organizations.

The biggest one that we did was with Discovery Digital Networks. They did a very large study over the course of six months, where they compared two groups of videos– one with closed captions, one without– and found that on average, over the lifetime of the videos, there was a 7.3% increase in views for the videos that did have captions, as compared to those that didn’t.

And the other thing that's really interesting is that transcripts and captions often get repurposed in many different ways. One example that I wanted to share is data from the University of Wisconsin, from a particular graduate class. They found that when they made transcripts available on videos, 50% of students used those transcripts as study guides. 50%.

So really captions and transcripts are not just about people who are deaf or have hearing disabilities. They really have much broader applications.

So some interesting data here. When we do presentations, we often run polls to ask people some questions about the state of their captioning. And on the left, there's some data on the question, what percentage of your online course videos have captions? And you can see there, it's kind of the distribution you'd more or less expect to see– three quarters of people have done some captioning, but very few have captioned all of their content.

The bar graph on the right is in response to the question, what is your biggest accessibility concern? And this is really interesting. On cost and budget, almost 30% said that was a big, big concern. That's understandable– captioning is, as Greg talked about before, a big expense item.

But the thing that's really interesting is that the biggest item people check off is resource time. And what that really relates to is that people sometimes don't know where to start. Sometimes they don't really know how the workflow is set up with their video platform. They just don't know where to begin and how to make it simple.

And so this is really where I feel our primary goal at 3Play Media lies: to make the process of captioning as simple as possible. And to that end, we've built an online account system with a myriad of different features that lets you see everything that you've uploaded. We store all your content indefinitely, and we have 50 different caption formats, so you can go and download whatever you need now. Or when new caption formats emerge, you can go and download them retroactively– even for content that you did in the past.

There are all kinds of different turnaround options. If you need to have captions urgently, we can have it back for you within the same day. We have all kinds of different automated workflows. For example, we have a beautiful integration with the Mediasite software, where you can link your 3Play Media account with your Mediasite account.

And then, in order to add captions to video, you literally just press a button. You just indicate which presentation you want to add captions to, and that’s it. In the back end, Mediasite will send us that video. We’ll create the captions, and then send it back. And then it’ll just show up. So it really, really can’t get any simpler.

We also allow you to import your existing captions and transcripts. If they're transcripts, we'll align them with the videos and create closed captions. Or if they're captions and you want to export different formats, that is completely available. Or using any of these video search tools– all of that is available. But the goal really is just to make the process as simple as possible.

All right. Sean, I’ll pass it on to you.

Oh my gracious. You two gentlemen have generated more questions in an episode than I’ve ever had before. So a great job, Tole. And great job, Greg. Here we go.

The first question is for me, from my old friend who's an expert in this area– R.T. Hamilton Brown– who reminds me to remind everyone that the presentation they both just did is combined into a single PDF that is available under the Information button– or, I'm sorry, in the Attachments icon. You can find that in the Information area and download it, which will be very useful, because a lot of people had questions about the statistics that you both cited.

The first question I want to give to– I'll put a jump ball up there, but I'll say this is specifically for Greg, from a friend at the University of Michigan. It says, "Hi, Greg. Ohio State is leading a group to work on a truly accessible media player." You talked about that Holy Grail, if you will. And we at Sonic Foundry try to develop as best we can, and it's good to know that's out there.

And second, he wanted to convey that he was in a conversation last week where people were balking at making audio description a requirement for post-secondary institutions. “How can we make the description process less onerous, in your opinion, so that it’s adopted as widely as it should be?” Either one of you, but start with Greg.

So first of all, I'll talk about the accessible media player work that's being done out of the Big 10 institutions. I mean, they're doing some good work there. This is a known problem, especially in the open source market for video players– there aren't a lot of great options out there for accessible media players.

I've had several conversations with the folks working there, and they're doing some good work. A lot of times, the more accessible media players are locked up in more proprietary systems, like a Mediasite system, or Kaltura, or something like that. So there is a big need for that kind of open source, "I just want to do a one-off video" option.

In terms of audio descriptions, that's a tough one. Because when you look at the cost involved with audio descriptions, it can be an order of magnitude more expensive to get audio descriptions made– at least made well– for a video than captions. And then you have to look at what value you're getting out of that.

And so I'm always going to fall back to: what is the most important thing to make sure you do? And that's making sure you express the content in accessible ways. And so it might mean putting it in alternate formats.

Now, there are some cutting-edge techniques to make audio description easier to produce. I've worked on a model, and there's one player out there a friend of mine wrote, called Able Player, that will take advantage of this. You can actually create the audio description just like you would a caption track. You can put the text in there, and you can timestamp it, so this bit of text ought to be said at a certain time.

And then we’re able to leverage the screen readers to basically say, OK, screen reader, read this bit of description now to the end user. So that way is a little easier, because there’s not as much post-production work and you can do it yourself. But it’s still a very time-consuming process.
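A rough sketch of that technique– timed description text spoken by the user's own screen reader instead of a pre-recorded narration track– might look like the following. This is not Able Player's actual code; it assumes the descriptions live in a WebVTT text track with kind set to "descriptions", and it relies on an aria-live region, which screen readers announce whenever its contents change.

```typescript
// Sketch only: copy each active description cue into an aria-live region
// so the user's screen reader speaks it at the right moment.
function wireTextDescriptions(video: HTMLVideoElement): void {
  const liveRegion = document.createElement("div");
  liveRegion.setAttribute("aria-live", "assertive");
  document.body.appendChild(liveRegion);

  for (const track of Array.from(video.textTracks)) {
    if (track.kind !== "descriptions") continue;
    track.mode = "hidden"; // fire cue events without drawing text on screen
    track.addEventListener("cuechange", () => {
      const cue = track.activeCues?.[0] as VTTCue | undefined;
      if (cue) liveRegion.textContent = cue.text; // screen reader announces this
    });
  }
}
```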

I think, honestly, it'll be a long time before we're really able to adequately tackle the audio description problem, because we still have so many problems just implementing captioning. For captioning, we understand the workflow and how to do it, and there are several dozen, if not hundreds, of companies that will do it for us.

We still haven’t figured that one out yet, in terms of actually implementing it fully on campuses and everywhere. So unfortunately, I think it’s going to be a while before we really solve the audio description problem. I think it’s something that everyone should– I encourage people to look toward doing, especially if it’s appropriate for the video that you’re delivering. But I don’t see a great future for it in the very near term, because we have so many captioning issues we’re working through.

I have a follow-up, as I ask Tole if he has any thoughts about it. And I'm going to put this follow-up out there as well, as I ask you if you have any thoughts about audio descriptions. A person from Buffalo State adds, "Are audio descriptions required by the laws you cited? Or is captioning sufficient? What could we do when creating the video to avoid needing audio descriptions?" And obviously, add any comments on how to make it easier.

Yeah, audio description is coming up more and more in conversations that we're having with our customers, as well. And it really is, Greg, as you said, an onerous thing. It's very difficult to do– not only because with educational content it's difficult to find the gaps to insert that content, but also because there aren't really a lot of standards around how it's done.

It’s, I think, as you mentioned before, it’s really somewhat of an art form to create audio description. And it’s very expensive to do. And so I would agree with that.

As far as the legal side goes, WCAG, I think, at Level AA requires audio description to some extent. But currently, that standard is not referenced by Section 508. So right now, it's really, at best, a grey area.

Got you. That really makes sense. And I wanted to add to that, because we do research in these areas as well, in our history and our research property coming out of Carnegie Mellon University. And just like Tole said, and like Greg has indicated, audio descriptions require a certain expertise with the subject matter that transcription doesn't.

You have to be able to hear. You have to be able to type amazingly fast. You have to be able to punctuate accurately. But transcription can be done by folks who aren't subject-matter experts, necessarily, correct? Is that a fair way to say it?

It can. You’d have to know, at a minimum, basic video editing techniques to be able to do that.

And what you’re describing.

Yeah, it takes a certain level of subject-matter expertise to do it. And then just learning what is actually the right amount. It’s that art form again. There are some groups that are starting to put together some best practices for that.

And the name of the group slips my mind– it's one of the Canadian groups; I've worked with some of their broadcasting people. They've started to put together some best practices for that. But it's a very new area for us to be getting into.

Understood. Now Tole, this one’s for you. I didn’t mean to cut you off. This one’s for you.

A lot of people have asked this question in different ways, but they want to know, since you’re a pioneer in this space with 3Play, will speech recognition technology advance to the point where humans are no longer necessary anywhere in the process? Should that be a goal or not, in your opinion?

Yeah, that is, indeed, a question that is asked a lot. And we have these conversations a lot.

Remember, a transcriptionist who works for you is currently transcribing this video, to create the transcript.

Right. Right. So speech recognition is a great technology, but specifically in the context of closed captioning for educational content, it's been a little over-hyped. I think most people have seen the results of speech recognition alone on YouTube, which provides those machine-generated captions on any video that you upload.

And so I can say without a doubt that speech recognition– the state of that technology right now– is absolutely insufficient for providing closed captions, because people are really relying on that content in education. Actually, I should give just a little background: we use speech recognition as part of our process. The way that it works is we'll take a video. We'll–

You use it all.

Yeah, we put it through speech recognition. Then we’ll have a professional transcriptionist who then will go through and clean up the mistakes left behind by the computer. But then we even have another person, a QA person, who will then go through and double-check grammar and punctuation, research difficult words.

So by the time we’re done with it, it’s pretty much flawless. But the speech recognition is that first step. And typically, speech recognition alone produces about 60% to 70% accuracy. Maybe more, if it’s a single speaker without accents or background noise. But that’s sort of what you’re looking at.

And what that means is that roughly one out of three words is wrong. And the thing about speech recognition is that when it's wrong, it's wrong spectacularly. I mean, it just completely sends you in the wrong direction.

And especially when you’re dealing in education, with a lot of esoteric terms and specialized vocabulary, those are the ones that typically it gets wrong. And those are the ones that people are relying on in order to understand that lecture.

So the other part of that question is the future of speech recognition. So from our point of view, we feel that we’re really on the cutting edge in this space. And we know the technology very well. And we feel that the technology is being refined. And there are minor improvements that are happening.

But in order to get to the point where we can rely only on speech recognition, I personally don’t think that’s going to happen in our lifetimes. I assume some other major, major advance has to happen before that happens.

I have a comment– and this is more for Greg, but Tole, you can jump in. There’s a lot of experts watching today. And a leader who’s helped us in the past, Marsha Orr, talked about scripting. And she said, “I recently heard a strong recommendation to script videos so that the script would be uploadable, and, therefore, facilitate all of the things we’ve been talking about today.” She said, “Yet, I’m an extemporaneous speaker.”

And she’s a great teacher, a fantastic teacher. And she feels restricted with a script. What are your feelings about this? The advantages or disadvantages of changing the way we present and adding a script, so that all of these accessibility issues, and many others, are enhanced? What are your thoughts?

So I totally agree with those thoughts. I'm an off-the-cuff speaker, and I find it very challenging to write out a transcript and speak from that. For those who can do that, I say go for it, because it makes the job of creating captions a whole lot easier.

The hard part of creating captions is the transcription part. For the synchronization, there are some tools out there that will let you do this– you can actually leverage some of YouTube's services to do a decent job with synchronization for free, and there are other companies out there that provide it for a fee, as well.

If you can do it, do it. Personally, because I’m presenting a lot on educational issues, I don’t want to take away from the quality of what I know I would do if I made a transcript, as opposed to just doing it live.

Totally makes sense.

Tole– and I think this is part of what Marsha was asking, because she's a pioneer in this stuff– you talked about how, if somebody was working with 3Play and they had a transcript, they could send it to you, and then you could work to align it, correct?


What if I had a script and it didn’t cover everything that was said? I had a scripted portion, but then the Q&A was extemporaneous? Or I had a script, but I varied from it? Is it a waste of time to give that to 3Play? Can 3Play do something with that?

Yeah, that's a great question. A lot of our customers upload scripts with the videos, and we can certainly synchronize them. And to address your question about what if the script is a little off– if it's a little off, not a problem. Basically, the way the algorithm works is that it will look for anchor words and try to find their phonetic counterparts. It will link those up, and it'll interpolate for the other words.

So as long as it can find enough of those anchor words, it works well. If the script diverges a lot, then the algorithm will have a harder time syncing it. But if the script is decent, it generally tends to work very well, especially in education. The things that really interfere with the algorithm and that synchronization are things like background music and background noise– those tend to make the timing drift. But with educational content, that doesn't happen very often. So it tends to work very well.
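To make the anchor-word idea concrete, here is a toy sketch– not 3Play Media's actual algorithm– in which every word between two anchors gets its time code by straight linear interpolation between the surrounding anchors.

```typescript
// Toy forced-alignment sketch: anchor words have known times from the
// recognizer; every word between two anchors gets a time by linear
// interpolation, spreading the gap evenly.
interface Anchor {
  index: number;   // position of the anchor word in the script
  timeSec: number; // time the recognizer heard it
}

function interpolateTimes(wordCount: number, anchors: Anchor[]): number[] {
  const times = new Array<number>(wordCount).fill(0);
  for (let a = 0; a < anchors.length - 1; a++) {
    const left = anchors[a];
    const right = anchors[a + 1];
    const span = right.index - left.index;
    for (let i = left.index; i <= right.index; i++) {
      const fraction = (i - left.index) / span; // 0 at left anchor, 1 at right
      times[i] = left.timeSec + fraction * (right.timeSec - left.timeSec);
    }
  }
  return times;
}

// Five words, anchors at word 0 (t = 0s) and word 4 (t = 8s):
console.log(interpolateTimes(5, [
  { index: 0, timeSec: 0 },
  { index: 4, timeSec: 8 },
])); // => [0, 2, 4, 6, 8]
```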

Excellent answer, both of you. So there’s so many questions. I’m gathering them into some categories. But here’s one that got a lot of hits. You talked about grants. You talk about grants on this show, people are going to ask about it.

And the number one question, I think, is best expressed by Brendan from the University of Denver. He says, "Money!" exclamation point. "Greg or Tole, can you talk more about how to engage people on campus to get buy-in to pay for captions? Tole's benefits slide is great. But do you have any insight on getting buy-in from people with the power to set aside central funds on campus, versus merely the instructional folks who would be more amenable to the benefits he cited?"

What was your process? And I know you’re working on some big stuff.

It's a long process to do that. There are several aspects to it. One is showing some of the benefits that you gain from it. The way we happened to secure some of our funds was through some education technology fees, which our student groups have some say over how they're spent.

They collect fees from Student Senates and things like that.

Got it.

So I created a little tutorial video demonstrating one of the things Tole talked about– searchability. So I had a class with about 25, 30 hours of video, and I played the role of the student.

And I just did this little two-minute demo of, oh, I remember the professor said this one phrase, but I don’t remember where it was. And there’s all these 30 hours of video. And this is using Mediasite, because we do use quite a bit of Mediasite on campus.

I was able to go and type that phrase into the search box. And then within a second, it showed me exactly where that phrase was said in the video. I could click on it, and it took me right there.

So you brought in the stakeholders. You said, all students all the time–

So that’s one aspect.

It can benefit everyone.

And when students see that, that all of a sudden is a lot more appealing to them than, oh, we’re doing this for this other group of people that doesn’t impact me. So that was one thing.

It's also key to talk to the people who ultimately control the money and show them what the real need is. I think that was probably one of the things that worked well for us on campus– putting some numbers down. We all talk about needing to caption our video, but do you have any idea how much video you're actually capturing?

And when we looked– we have a fairly robust distance education program with a lot of video being captured. And so we looked at what the numbers actually were, and what those impacts were. And we didn’t just talk about the big, multimillion dollar number for doing everything.

But it was showing, on a per-class basis, we're talking about taking an $8,000 hit. And you know, at a lot of campuses, you have colleges that are resource rich, and you have colleges where 98.9% of the budget goes to faculty salaries.

And so when you look at how that $8,000 hit is going to impact various colleges– if we have someone with a hearing impairment going through, for instance, something like a humanities department that might not have as many resources, well, they're going to be taking eight courses over the next three or four years, and that's a big hit. So we really try to couch this in terms of: it is the university's responsibility to provide these features. And so we, as a university, need to come up with a solution and find a way to make it so we can all do it together.

I want to get Tole in on this. And this is going to wrap up to be about our last question. But since everybody provided their email addresses with their questions– and you can keep asking the questions– these guys have agreed to help me get back to you offline, too, if we didn't get to your question on the air.

But to you, Tole, you have started and founded a leading business in this area. Who are your customers on campus? Who are your customers at corporations? What are their job titles usually? And that’ll help our audience who is asking this know who to go to to make those decisions and write that check.

Yeah, absolutely. So there are basically two groups of people that we work with at educational institutions. Often, we actually work with the multimedia coordinators. That is a big contingent. We also work with administrators, the IT administrators. And sometimes we work with–


Yeah, exactly, and sometimes we work with faculty. We’re seeing an increasing number of institutions that are centralizing the captioning.

That’s a huge question. You hit about 15 people there. Go ahead.

So, just a couple of institutions that we're working with– the University of Wisconsin, Harvard University, Brown University. Those are a few examples that come to mind, where they have actually centralized the captioning process.

You see that as a trend.


Because a lot of people are asking, is it central? Is it decentral? Is it the responsibility of the individual faculty member, the department, or the school?


You’re saying– to put words in your mouth– it went from being decentralized and program-based to being more centralized, like you said a university responsibility?

Well, can I add a nuance to that?


So centralized funding, yes; decentralized implementation of the captioning. So they come and get the money from me, and I basically just authorize them to spend funds. They are responsible for contacting a transcription/captioning company. They are responsible for incorporating the captions, because I don't have the staff to do the work for them. But I can allocate money so we can transfer it back to their departments.

Last word.

And Sean, just one last thing, coming back to the question about sources of grants and funding for captioning. So we have a number of our customers take advantage of grants. And what we’ve done is we basically aggregated all of the different sources that our customers have taken advantage of, and we actually put them together into a whitepaper.

If you go to our website, there is a whitepaper on sources of grants and funding specifically for captioning. So that may be useful.

Excellent! An excellent answer, excellent answers, excellent presentations. On behalf of our audience, I’d like to thank both of you for hitting a very important topic. I think we can do another whole hour on this, just drilling down further. And I’d like to thank all of you for joining us today. Thank you very much, and we will see you the next time.
