Strategies for Deploying Accessible Video Captioning – Transcript
JOSH MILLER: My name is Josh Miller. I’m one of the co-founders of 3Play Media, where we focus on premium transcription and closed captioning and subtitling services. We also view web captioning as an opportunity to really supercharge your video for the viewer experience, much more than just for those with hearing impairments.
We have a really great panel here. We’ve got people who are video experts who have all tackled the accessibility challenge in different ways. So we’ve got Tom Aquilone from Lockheed Martin, Wendy Collins from Infobase Learning, which owns Films on Demand, Piyush Patel from Digital Tutors, and Ben Labrum from Oracle, who oversees Oracle on Demand.
So I’m going to start by actually going through a very quick overview of some of the applicable legislation that affects web video from a captioning perspective. We’ll then certainly dive in with our panelists. And we’ll leave some time at the end for questions.
All right. So the first one that some people might hear a lot about is Section 508, which is a fairly broad law that requires all federal electronic and information technology to be accessible to people with disabilities, including employees and the public. So for video, this means that captions really do have to be added to that content when it goes up online. If you’ve got audio-only content, a transcript is really all you need.
Section 504 entitles people with disabilities to equal access to any program or activity that receives federal subsidy. Web-based communications for educational institutions and government agencies are both covered by this. Sections 504 and 508 are both part of the Rehabilitation Act of 1973, although Section 508 wasn’t added until 1986. Many states have also enacted legislation that mirrors these two federal laws.
The next one is the ADA. And the Americans with Disabilities Act of 1990 covers federal, state, and local jurisdictions. It applies to a range of domains, including employment, public entities, telecommunications, and places of public accommodation. The Americans with Disabilities Act was actually broadened in 2008, in terms of the definition of disability, to match with Section 504.
What is interesting about the ADA is that’s actually the law that was cited in the recent Netflix lawsuit, where they were successfully sued for not having enough of their content captioned. Netflix argued that ADA only applies to physical places and therefore should not apply to their streaming video service. But the judge ruled that the ADA does, in fact, apply to online content and that Netflix qualifies as a place of public accommodation.
This has pretty interesting implications for web content. It’s vague. It introduces new challenges. And they didn’t actually go into exact detail as to why Netflix is a place of public accommodation. But the thinking is enough people have easy access to it. Therefore it should be accessible.
And the more recent law, which is often referred to as the CVAA, is the 21st Century Communications and Video Accessibility Act, which was signed into law in October of 2010. And what that is basically saying is that any content that aired on television– so traditional television– must be captioned as well or better online than it was on television.
So there are a number of milestones to phase this in. Some have already passed, which is just the basic, ongoing, when a show is online after being aired on television, it should be captioned, very simply. One of the next milestones has to do with cut-up contents or clips that go up online. So if you alter that original show, it now also has to be captioned. And then starting in January, some of the devices and media players will have to accommodate stylistic changes so the viewer can actually change color, style, and really make a viewing experience that they prefer.
One of the platforms that has already done a really good job of this, if you want to see what this really means, is YouTube. So they actually allow you to change the color of your captions, change the size, the location, as the viewer, not the publishers. The viewer has complete control over what those captions look like.
So accessibility is clearly a growing concern, otherwise there wouldn’t be new laws about it. This is some data from a 2011 WHO report on disability. It states that more than 1 billion people in the world today have a disability. And nearly one in five Americans age 12 and older experience hearing loss severe enough to interfere with day to day communication. The other interesting conclusion is that the number of people requiring accessibility accommodations is actually rapidly on the rise.
So why is this? It’s actually pretty standard things that we’re dealing with every day. It’s an aging population. It’s better medical care. People who may not have survived a premature birth are now able to survive and have nearly a normal life, which is great.
We also just had a decade of war. People are coming back with disabilities. So the numbers of people who need some kind of accommodation are just on the rise.
So we look at captioning as having a number of different benefits because we’re focused on web captioning. That’s one of the beauties of what we’re talking about today is that the internet really does open up a lot of other possibilities other than just focusing on those with a specific need. It really does open up options for everyone viewing the content.
So I’m not going to go into too much detail now, because our panelists are going to be able to talk about it much better in real life. So we’re going to dive in. I have asked our panelists to introduce themselves by telling you what type of content they work with, who is the audience, who pays for the content, if it’s different from the viewers, so that we all understand how this is going up online, and how much of their content today is getting captioned. So we’re going to kick it off with Tom.
THOMAS AQUILONE: Only broadcasters can do live captioning. We can’t afford to do live captioning. Closed captioning takes too long.
These are the kind of things that we heard. And my name is Tom Aquilone. I work at Lockheed Martin, just outside Philadelphia, near Valley Forge, Pennsylvania.
We do have a portion of our webcasting that is live captioned. We have challenges avoiding the cost for that stuff. We have challenges doing a really good quality webcast. And doing both of those together is something that we really concentrate on doing and doing well.
The kind of webcasts we do are for our executives, for town hall meetings, for compliance trainings, for program managers to brief their employees, that kind of thing. We do around 400 webcasts a month. And Josh asked for us to tell you a little bit about how many we caption. We only caption about 10% of those at the highest level, some because of cost, some because of the audience and things like that. And we do webcasts to Lockheed Martin employees all around the world.
Who pays for our webcasting? Well, it is our presenters or our presenters’ organizations. If it’s an executive vice president or a communications organization or a program, program directors, they pay for closed captioning. They also pay for our webcasting services.
And then I already talked about the percentage, I guess. Who pays for it in the organizations, too? Our infrastructure for webcasting is paid for by the corporation. But each individual webcast, the production team, the captioning services, things like that, the Mediasite platform we use for webcasting, the Vitac live captioning we use for each and every one of those webcasts that have been identified are paid for by the presenter and their organization.
So I think I’ve covered the bullets you asked us to cover, Josh.
JOSH MILLER: Perfect. Thank you. And Wendy?
WENDY COLLINS: Good morning. I’m Wendy Collins with Infobase Learning. We have a web-based video platform that you can see on the screen there used to deliver educational video content to schools, colleges, and libraries around the country. So that makes our users predominately students, faculty, and librarians.
The people paying for the service are typically the institutions themselves, or the media librarians that are buying campus-wide access to what we like to think of as Netflix for educational content– so documentaries, educational instructional videos, really runs the gamut from biology videos to Shakespeare and everything in between. The platform has about 20,000 titles on it right now.
Last year at this time, I would have had to tell you that we were only captioning about 50% of those. This year, I’m happy to tell you that we’re captioning 100% of our new titles coming online. And we’ve been working with 3Play to go backwards. And I think we’re done. So we’re at about probably not 100%– 98%– because we’ll never be captioning some silent movies that we have and some foreign language films. So there is a small slice that we’re not captioning today, and probably never will.
JOSH MILLER: Thanks. Piyush?
PIYUSH PATEL: Hi. My name is Piyush Patel. I’m the CEO of Digital Tutors. We create a product that teaches people who make movies and games. So we focus on a very niche sector of the visual effects post-production world, everything from 3D to color correction and all the applications in between.
We have three kinds of customers who pay for the product. We have freelance artists who are transitioning jobs or building up skills, large studio installations where they deploy it across all of their staff, and then schools and government institutions that have art schools or 3D schools that want to augment their in-classroom experience with video-based content.
We have about 10% of our library closed-captioned. And we’ve got an initiative to caption all new content coming out. We have a Monday morning release. So 8 to 10 courses go live every Monday. And we’re trying to get those in the chamber within a small period of time to get captioned.
JOSH MILLER: All right. Ben?
BEN LABRUM: Hi, everybody. My name is Ben Labrum. I’m with Oracle Training on Demand. And what we did is really simple. We just went in to our Oracle University classroom and recorded our best instructors, put it on the web. And our students pay the same price for a 90-day subscription to Oracle Training on Demand that they would pay to go to your typical instructor-led training for five days. You’re there all week.
So that’s our model. It’s really simple. And we caption everything.
JOSH MILLER: Great. All right. So let’s dive right in. Tom, let’s get started with you. Why are you captioning your content?
THOMAS AQUILONE: We’ve had a number of requests by hearing impaired folks to provide captioning. It started out as a few individuals and then followed up through our communications organization with quite a number. And so for some time, we’ve been kind of scrambling how can we meet that need? And we started out with American Sign Language dual webcasts and then discovered the current process we’re using where we have a live webcast available and then follow that up with very quickly turned around closed captioning.
JOSH MILLER: Makes sense. Great. So, Wendy, you mentioned that you are now captioning pretty much all your content. But, obviously, at some point it was a 50% mark. So at what point in Infobase or Films Media Group’s life did accessibility become such a focus? How did you start executing that? How did you choose which half was getting captioned and which half wasn’t?
WENDY COLLINS: So it’s been an interesting journey. I’ve been in this educational content business for a while. And I saw really two major transitions. And they were kind of synced up with format changes.
So the first transition was VHS to DVD. And that, educationally, took longer than the consumer market, because schools had installed VHS players. They didn’t want to buy new equipment. So we were a little bit behind the consumer market on that front. But once that change really started to happen, we saw a demand go up by our customer base. And it really coincided with some of the laws that you referenced in terms of when schools started to abide by the letter of the law, of 508 and ADA, and mandate it.
We had a large revenue stream from California. It’s a big state, lots of schools. And right around that same time– 2002, right when the DVD format was becoming the predominant format– they put their foot down and said we won’t buy any educational content unless it’s captioned. So all of a sudden what had been like, oh, it’s a nice thing to do, became a mandatory thing to do.
But we were still left with decisions about which titles to caption, because it was a lot of money back then to do this on hard copy. So we looked at everything from sales history to what our customers were asking us for. Key producers, so if we got content from the Discovery Channel or BBC, we captioned that. Whereas some of our smaller producers lost out because we weren’t willing to invest in that content. So it wasn’t good. We were making probably bad choices in terms of what we were captioning.
Fast forward to when the online media began to transition as the dominant format and we took our content online, we saw that format shift really increase demand for, I think, two reasons. One, because it’s just out there more. And people have an expectation because YouTube does it or because of Netflix. There’s an expectation of any online content that it’s going to be captioned.
But secondly, our business model changed. So we went from selling individual titles, where you’re an educator or you’re a librarian. And you buy a specific DVD from us. And you could ask us to caption that one DVD if you had a student in a class that needed it to be.
Now all of a sudden, we went to an online subscription model where we’re offering access to 15,000, 20,000 videos that, if one’s captioned, why can’t they all be captioned? And the expectation really shifted when that business model shifted. So once you got off physical media and got into online, totally different game changer.
So at that point when that started to happen, we were still making tough decisions because of the way we were doing our captioning– the old traditional, send it out, get it transcribed, get the files back, build the DVD, put it online. And going to the new model, the speech-to-text recognition, really allowed us to make the decision to go all in.
JOSH MILLER: And Tom, you mentioned that the decision to caption is often with the content producer. Is that right? Do you get involved with how the captioning decision gets made? Or how does that work in your organization?
THOMAS AQUILONE: Well, for privacy reasons, the communications organization can’t share out, oh, we have this many people that are hearing impaired. But we do get those requests. Those requests come down the line. And so we know who some of those people are. And if a president, a vice president, a director or manager has somebody in their organization we know about, we’ll say, hey, you really should address this need. But on a higher level, we have begun asking them all the time, because it’s the right thing to do.
JOSH MILLER: Great. So Piyush, at what point in your business did you decide captioning was a priority? And why? I mean, you have a pretty interesting type of content with a very focused group of people. How did this come about? And how did you get it going?
PIYUSH PATEL: So the content is very technical. And to be able to search that content is a huge plus. So although we do have a small percentage of customers who are hearing impaired or challenged, the real motivation behind that was, yes, we want to get into some of the public schools, but it now makes our content much more searchable and rich. So twofold.
JOSH MILLER: Great. And, Ben, I think you probably have a pretty similar story. Maybe you could tell us a little bit about– you mentioned this– you went from a classroom model to this online model. Maybe tell us a little bit about how that all happened.
BEN LABRUM: Right, right. Well, we have instructors around the world. And we have students around the world. So it’s just like Piyush was just saying. So say we record an instructor in Atlanta, Georgia, somewhere in the South. And then there’s a person in India who’s more used to an English accent. They might not follow as well without the captions.
So like you said before, it’s just a way to add additional value. And I know this panel is about accessibility, but it’s really just a byproduct for us. We did it because the search was cool and because the bouncing ball interactive transcript was cool. So, yeah, the captions are great, but accessibility is like a happy accident for us.
JOSH MILLER: That’s really– yeah, Wendy, go ahead.
WENDY COLLINS: I just want to chime in and say I couldn’t agree more. We did it for the right reason. And now we’re finding ourselves building on the benefits that it brings.
Now that we’ve got 100% of our content captioned, we can introduce a full transcript cross-search. Whereas before, we were relying on metadata. We were relying on somebody tagged it or somebody wrote a little description about it. Now our customers have the ability to search every word of every video in our platform.
And just to build on the comment about the international audience, we also have an international audience. And what we did was take the transcripts and add Google Translate to the transcripts. So it’s real-time translatable into, I think, 63 languages they support. So the plug-in from Google Translate works with the plug-in from 3Play, and magic happens with a single click of a button, so that we can have not only English accessibility, but multi-language accessibility.
BEN LABRUM: You bring up an interesting point with the Google translator. It’s not always, when you do a straight computer translation–
WENDY COLLINS: It’s not perfect.
BEN LABRUM: It’s not going to be perfect. Do you ever have to go back and fix something when someone says, hey, this came out and said the wrong thing?
WENDY COLLINS: No, we put a disclaimer up that it’s as-is. And I think people understand that, in that it’s an English-centric product. And it’s just a value-add for our customers that might want it. But the cool bouncing ball factor works in different languages, which is really neat to see.
THOMAS AQUILONE: And you guys both made excellent points. We found an additional value of English as a second language, not just translation, but helping those people that might be challenged in reading or hearing–
WENDY COLLINS: The other way.
THOMAS AQUILONE: –a little bit, but now all of a sudden they get that reinforcement. And it, again, adds value.
BEN LABRUM: Yeah. I hate to hammer this but the last thing is it’s really like a cognitive reinforcement, too, right? Because if you’re paying someone with a $50 bill, you don’t want them to try to give you change for a $20. So you say, here’s a $50. And hopefully, they don’t try to shortchange you. So when you read it and you hear it at the same time, it helps you remember better.
JOSH MILLER: That’s a great point. So, Wendy, you also mentioned that you were dealing with the accessibility laws from a while back. And you started talking about the feature set that’s been enabled. So how has your view of the accessibility piece changed? Or how much is that still a driving factor at this point?
WENDY COLLINS: I think like we’ve all said, it sort of is a byproduct, like it’s almost become secondary just because now we have it and now we’re focusing on the features. But it’s definitely necessary. I mean, we couldn’t do business in some big states such as California. Again, as an example, they would not subscribe to our service. Any community college, any college institution, they have a mandate. They cannot buy any media that doesn’t have captions.
So it’s really important to us for that factor. But that’s just become a been there, done that. It is what it is. And it’s great. But we’re really focused on building out the features on top of it.
JOSH MILLER: Makes sense. And so Tom, in your role where you’re dealing with internal buyers, what is the hardest part of overseeing that type of process? And especially where accessibility is something you’re trying to push, what becomes some of the big challenges for you in that process?
THOMAS AQUILONE: Well, I started out by saying that live captioning can’t be done except by broadcasters. So we had a big bias inside the organization. They would see sporting events. They would see big events on television, or even the news, and when they see live captioning, they say, oh, that’s for them. And so getting past that bias.
Unfortunately, for us, we tried to do it ourselves for a while. And that meant high cost, long turnaround times. And the information, the webcast, was kind of stale after we took a week or two to come up with a closed captioning file. So that impression, overcoming that impression with our customers– that it was expensive and it took too long and only professionals out there in the broadcast world can do it– are the big hurdles.
And each time, we overcome that. I mention the cost, and we have to remind them they may have somebody in their organization that could benefit from this. But when they hear what the real cost is and how quickly they can have it, most of them say, why not?
But it’s that bias. Our trying to get it done ourselves, in what we thought was a cheap way at the beginning by keeping it in-house, was probably a disservice to our customers and to ourselves, struggling to do that when there are professionals out there that can keep the cost down for us and turn it around very quickly.
JOSH MILLER: Yep, that’s great. So let’s keep going on this budget question, because this is an interesting one for people. Ben, how do you guys do this? How do you justify the budget of the transcripts, captions? How do you account for that?
BEN LABRUM: It’s a good question. The way we do video, we hire a crew to come in. So it’s already not cheap. It’s not like we’re doing like you guys do, a lot of screen cap, Piyush. We do full video. And we have to have the crew there all day. So it’s already an expensive thing. But we really believe that this is the future. And so we’re making this investment already.
I figured it out. And transcription works out to be roughly like 7% of our overall production cost, not counting our internal headcount. So for us, it was baked in from the beginning, because really, we all know from being at Streaming Media East and our experiences that it’s more complicated than that. But it’s not really that big of a deal to just have a video on a web page. If we’re providing this premium content at a premium price, we need to add value any way we can. And that’s how we arrived at doing transcription.
JOSH MILLER: That’s great. Does anyone else want to add to that before we move on? All right. So this is good, though, because we’re talking about a process, right? So Piyush, you guys put out an enormous amount of content. So how do you guys look at captions and put practices in place to make that scalable?
PIYUSH PATEL: Right. So the key word is scalable. We started this adventure seven, eight years ago. We bought the foot pedals and the software and hired people to write. And it’s just not scalable. So we abandoned that. We went to computer-based and hardware-based. And that didn’t work out well.
And so we’ve built a whole pipeline that takes our content, shrinks it down, because the person transcribing it doesn’t need a huge video. That allows us to upload it to the servers where your guys can pull it down and transcribe it. And then it’s all automated. So the minute it’s done, it gets fired back into our system, checked off, transcript is available. And off we go. We’ve built some pipelines in place to make that process very efficient and seamless from human interaction.
JOSH MILLER: Great. Excellent.
WENDY COLLINS: We have, just in terms of workflow, interesting metrics data that still boggle my mind. When we were doing the old-fashioned way– the foot pedal– we would send it out. And the typical turnaround time from when we had the master in-house– because we’re licensing this content rather than producing it– we wouldn’t get the captioning back for two months, typically two months. And then we still had to build the DVD and get it out and make it available.
So from a two-month turnaround to an average of two to three days from when we have the master in-house now, same workflow. And we end up having the captioning available for our customers online. So I mean, you can’t even measure that. That’s a ridiculous amount of workflow compression. And the advantage is our customers get the content sooner. And they get it with the benefit of captioning.
JOSH MILLER: Right. And Tom, you’ve been at this for a while as well.
THOMAS AQUILONE: Yeah, and what Piyush said is exactly right. We tried to do it in-house, and tried exactly the road that you talked about. And now our process is we don’t even need to make a phone call. All we have to do is have the captioner listen in to the webcast. Either they can listen over the web, but there’s some delay there. But if they can connect by a phone call, then we get a link. And that second link opens up. It’s live. They can position anywhere they want on their page.
We have the interface that we typically use with our engineers and our presidents and vice presidents that has a PowerPoint slide on one side. There are audio and video on the other. And then this other window you can position anywhere on the frame.
And those two things, a phone call and the link so they can open up the window, makes life so much simpler. And then about two hours later, they give us the closed captioning file. And we’re ready to go for the next month, year, two years.
JOSH MILLER: Great. All right. So we tried to cover a bunch of different topics just now to hopefully stimulate some questions. So we will open it up for questions. So please raise your hands. I’m going to repeat the question just so that everyone can hear it, because we do not have a microphone to go out into the audience. Yes, go ahead.
AUDIENCE: I just wondered if any of the panelists had any experience with speech-to-text apps? I mean, obviously, they are not quite there yet. But are they good enough?
JOSH MILLER: Great. So, thank you for projecting your voice. I think everyone heard, which is great. But for the video, the question is about speech-to-text, and experimenting with some of the speech-to-text technology and applications out there, and whether any of them can be used effectively.
THOMAS AQUILONE: So we’ve tried them. Great question. We thought this was going to be the answer. And we tried about six different methods. And we came up with, again, doing it in-house, taking almost as much time as starting from scratch. And so doing it– caption– if it’s 80% there, it’s still gibberish. So you really got to be pretty close. 90%, 95%– 100% we’d like to be. And we get pretty darn close right now. So it hasn’t been fruitful for us.
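As a rough illustration of what figures like "80% there" or "95%" can mean in practice, the sketch below computes word error rate, a common way to compare an automatic transcript against a corrected one. The panel does not say how accuracy was measured, so the metric, the function name `word_error_rate`, and the sample sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; not a method the panel describes.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Classic dynamic-programming edit distance, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    corrected = "select the polygon and extrude the face"
    automatic = "select the pelican and extrude the face"
    wer = word_error_rate(corrected, automatic)
    print(f"word error rate: {wer:.2f}, accuracy: {1 - wer:.2f}")
```

One wrong word in a short sentence already moves the score noticeably, which is one way to read the comment that a caption file at 80% can still be unusable.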
WENDY COLLINS: We experimented with a few different tools out there. And we’re not happy with the results either. Again, serving the education market, they’re pretty critical of typos or gibberish. And so we had to be really careful.
One of the things we really like about the 3Play technology is that it’s pretty close to perfect. But it gives us tools to make instant corrections if we find an issue with it. And that, I think, of any system should be a requirement. Whatever speech-to-text or partial recognition software you’re using, make sure it has an ability to go in and make a change, because perfect is hard to achieve.
PIYUSH PATEL: And I can add to that, too. Our content’s highly technical content, so the time it takes to build out the dictionary. We experimented with a piece of hardware in November.
It’s a hardware-based speech to text. And the first sentence was– and I kid you not– it was something like, Britney Spears found a squirrel. And it was like, what? Wrong video. And it just thought these were words that it was trying to hear and understand. And the dictionary was probably a pop culture dictionary.
JOSH MILLER: Yeah, and I’ll just add what we do is a hybrid approach, where we use speech technology. But we clean it up with humans. Speech technology has improved quite a bit over the years. It really comes down to the application.
Closed captioning and transcription is very, very difficult to apply pure technology to without any other process. So there are applications out there where pure technology makes a lot of sense. Yeah, Peter?
AUDIENCE: My name’s Peter Crosby. We’re 3Play partners. And Wendy, your experience with Google Translate as being kind of good enough, how does everyone else handle translations for training or for technical or sensitive issues, compliance issues? How do you handle translations?
JOSH MILLER: Yeah, so question about how Google Translate might work in some cases, or not in others, and how other people have approached the translation question.
THOMAS AQUILONE: I’ll start. In rare cases, we have done translation, but rare. And when we do, we’ll have a service do that, and that will make sure that is accurate.
PIYUSH PATEL: Yeah, and from a software training perspective, it’s very difficult because we could change the transcript into, say, Japanese. But the application they’re using has a Japanese UI. And they’re looking at an English UI reading Japanese words. There’s just a huge disconnect. And plus, not having a native speaker, they don’t pick up on the subtle comments and thoughts that an English speaker would have talking to another English speaker. So for that reason, we don’t convert the text.
JOSH MILLER: There’s a question here.
AUDIENCE: Yeah, Wendy, before you mentioned the full-text search. Do you find that to be a double-edged sword? You get people who are very dissatisfied when their word comes up [INAUDIBLE] once of a 50-minute video?
JOSH MILLER: So the question is about full-text search. And can that be a double-edged sword of sorts. So if someone is searching for a particular keyword and maybe not finding what they’re hoping to find, is that potentially an issue?
WENDY COLLINS: So we really wrestled with this and tried a lot of different approaches. And where we landed was, because we’re delivering full videos– these are typically 40-, 60-minute videos– we do offer them segmented into smaller bites. But we opted to do the full transcript search on the full transcript of the video. And we return relevancy back for the full title.
So if you do a search for the word biotechnology, it’s going to tell you this video has the word biotechnology in it the most times. If you then think that looks interesting to you and you click through to view that video, in the transcript, we highlight everywhere the word biotechnology is. So we’re not returning sentence level or single-word level results. We’re rolling it up into the full title for that exact reason.
The second thing I want to say is that it’s an option. It’s not our default search. We make it an option for our users. So they can choose to search our title metadata, our segment level metadata, or the full transcripts. So we give them three choices to find the right fit for them.
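The title-level search described above can be sketched roughly as follows: count how often the term appears in each full transcript, rank titles by that count, and highlight the matches once a viewer opens a title. This is a minimal illustration only; the `Video` class, the function names, and the `[[...]]` highlight markers are hypothetical and are not drawn from the Films on Demand platform.

```python
import re
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    transcript: str


def _term_pattern(term: str) -> re.Pattern:
    # Whole-word, case-insensitive match for the search term.
    return re.compile(r"\b(" + re.escape(term) + r")\b", re.IGNORECASE)


def search_titles(videos: list[Video], term: str) -> list[tuple[str, int]]:
    """Return (title, match count) pairs, most mentions first."""
    pattern = _term_pattern(term)
    hits = [(v.title, len(pattern.findall(v.transcript))) for v in videos]
    hits = [hit for hit in hits if hit[1] > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def highlight(transcript: str, term: str) -> str:
    """Wrap every occurrence of the term in [[...]] markers."""
    return _term_pattern(term).sub(r"[[\1]]", transcript)


if __name__ == "__main__":
    library = [
        Video("Genes at Work", "Biotechnology is growing. Modern biotechnology matters."),
        Video("Farm Futures", "One mention of biotechnology here."),
        Video("Hamlet", "To be, or not to be."),
    ]
    for title, count in search_titles(library, "biotechnology"):
        print(title, count)
    print(highlight(library[1].transcript, "biotechnology"))
```

Rolling results up to the title, rather than returning every sentence-level hit, matches the design choice described in the answer: a single passing mention ranks low instead of appearing as a result on equal footing.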
JOSH MILLER: Go ahead.
AUDIENCE: Hi, Josh, I’m with T-Mobile. You’re our great partner. I can’t say how great of a turnaround time you guys do for us. So definitely use them.
JOSH MILLER: You want to just come up onstage?
AUDIENCE: And I work with internal video at T-Mobile and closed captioning that. My question is, I guess, more for Thomas and Ben. For scripted type of production, we have to closed caption, primarily because all of our frontline don’t have speakers on their machines. So they can’t hear video. All right?
But we’re getting into employee-generated video. And my question is, do you guys feel compelled to closed caption that as well? Or is there a tipping point where, well, only for produced video we’ll do closed caption. But employee-generated video, no.
JOSH MILLER: So this is a really interesting question that’s probably come up in a number of workplaces. So more and more enterprises are enabling their employees to generate their own content and put it up online to share with the company. So the question is, how do you handle that? How does that play into this captioning decision-making process?
BEN LABRUM: I would just tell them to caption it themselves. I know we’re with 3Play here and everything. But I do have a colleague. And we’re letting her borrow our video service.
She has a few videos that she’s produced, doing screen cap and voice-over with them. And there’s just not that much content. And she wrote it all. She did it herself.
So she can just type it up. And you can put in the time codes pretty easily. It’s just an XML file. And that would be one idea if you don’t want to pay, especially if you have a ton of this random stuff coming in. It may be that maybe you need to put that back on the person who created it. That’s my first reaction. I don’t know. Tom?
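To give a sense of what typing up a caption file with time codes might involve, here is a small sketch that writes timed text as XML. The panel does not say which XML format is meant, so the layout below is an assumption loosely modeled on the W3C TTML structure (`tt`, `body`, `div`, `p` with `begin` and `end` attributes). The helper names and sample cues are hypothetical, and a particular player may require additional attributes.

```python
import xml.etree.ElementTree as ET

# Assumption: a TTML-style layout; the source only says "an XML file".
TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def to_timecode(seconds):
    """Format a number of seconds as HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_caption_xml(cues, lang="en"):
    """Build caption XML from a list of (start_seconds, end_seconds, text) tuples."""
    ET.register_namespace("", TTML_NS)
    tt = ET.Element(f"{{{TTML_NS}}}tt", {f"{{{XML_NS}}}lang": lang})
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for start, end, text in cues:
        # One <p> element per caption, timed with begin/end attributes.
        p = ET.SubElement(
            div,
            f"{{{TTML_NS}}}p",
            {"begin": to_timecode(start), "end": to_timecode(end)},
        )
        p.text = text
    return ET.tostring(tt, encoding="unicode")


if __name__ == "__main__":
    sample = [(0.0, 2.5, "Welcome to the demo."), (2.5, 6.0, "First, open the settings page.")]
    print(build_caption_xml(sample))
```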
THOMAS AQUILONE: The first iteration of user-generated content at Lockheed Martin is coming out of Sunnyvale for us and our space systems company. And there, they have what they’ve termed, for lack of a better choice, SpaceTube, like YouTube. And so they have user-generated stuff.
And they made the exact same decision that Ben talked about. If this is important to you, if you feel that there’s value in adding a transcript, because you put this together yourself, you may have that source material. So that’s basically the answer. But I got to say the scripts are wonderful when we do our stuff, because then the material lends itself much easier in the professional productions for captions.
JOSH MILLER: Yeah. You have a question?
AUDIENCE: I was just wondering how the transcription services at 3Play deals with multiple speakers?
JOSH MILLER: So that’s aimed for us. I mean, I don’t want to spend too much time on what we do. But we have a process that does involve technology. And then there’s a human cleanup process. So that is part of the human cleanup process.
WENDY COLLINS: I just wanted to address the question about UGC content. Because even though we’re not delivering to corporate space, we do have a small component of UGC that can be uploaded to our system. And we made the decision to not caption, to not do anything.
And I don’t think it was the right decision, because there’s an expectation. Once they commingle content, and especially because we’re now at 100% of our content being captioned, the user may not be the same person who’s uploaded it. And they don’t understand why it’s not captioned.
So they, ultimately, are calling us to complain, even though it’s not our content. It’s just commingled with our content. So we’re actually considering allowing that to pass through the 3Play process as well because–
AUDIENCE: That’s very interesting.
WENDY COLLINS: –it’s a burden for us to have to explain why it’s not captioned. There’s this expectation that once it’s part of a universe, that it should all be the same.
AUDIENCE: I mean, you guys are pushing it back to the user. But I don’t know what tools are even available. Once they generate it and upload it, I don’t know what’s there for an employee to go and caption their own type of–
THOMAS AQUILONE: I don’t remember the name of one in particular. But I was just out at Mediasite’s users conference, Unleashed. And we had a session on captioning. And there were a number of educators that have shareware or freeware that can do it. And it’s only tedious to the point where how accurate do they want it to be and how much time do they want to invest in it.
So there’s tons of tools out there. And depending whether it’s going to be a DVD or if it’s web-based or whatever, but I’m told that it’s easy to do. Take that in quotes.
PIYUSH PATEL: You got to look at the cost, though. I mean, you’ve got highly paid, highly technical people. And you’re telling them to write up an XML file. I mean, what’s the cost-benefit, right? It’s just like, send off the file and get it done.
JOSH MILLER: You go ahead.
AUDIENCE: An observation and then a question. That’s all great. And that means a lot of people aren’t going to do those videos once they find that out, because then they’ve got to learn how to do that.
I’m at the State Department. And new forms of video-type content are coming out every day. And just like the people you’ve heard of in the big time encoding, where they’ve got 40 formats to deal with, we have the same thing coming out all the time. So just an observation. We don’t know what to do with that, either.
So we’re at the point where we have to decide because we’re government. And we’re supposed to be doing everything. But what is everything? What is video anymore? So it’s a big question.
JOSH MILLER: Interesting.
AUDIENCE: The question is, has anybody had any experience with teaching systems specific speakers– corporate executives and so on– having the systems taught with their particular speech patterns and word recognition? And has that ever worked for anybody?
JOSH MILLER: Great. So first, the observation that I think is a real issue is the fact that video technology is changing constantly, very fast. There are many, many formats to deal with, many devices to deal with. That adds another challenge for this accessibility issue as well.
And then the question was, in terms of experimenting with some of the speech technology out there, in terms of training the system to– so if you do have certain key executives who are speaking on an ongoing basis, being able to try to leverage some of the technology out there by training the systems.
THOMAS AQUILONE: Just tried it, and tried it, and tried it again. And it’s so inconsistent and nowhere near 50% accurate, in my experience. So we had to go another way.
But I want to comment on what you first asked about, because we have government customers who are required to caption everything. And so even though we as a corporation aren’t required to caption everything, our customers say to us, why aren’t you doing that?
So I’ve put UGC in a different box right now. It’s not actually my decision. But as we look at it, we’ve put UGC in a different box because we are trying to do everything we can to encourage our leadership, our managers, and then lower down our compliance training, all that stuff, to be captioned. So we’re in the same area as you on that.
JOSH MILLER: Yeah?
AUDIENCE: Talk about the extension of where this goes, we’re obviously talking about hearing impaired and the captioning. But then last year, they required a video descriptor, where if someone’s visually impaired, you actually have an audio that’s inserted that will tell you what’s going on.
Are any of you guys involved with that? And are you seeing it gain acceptance? And is anyone really requiring it other than the mandate that went into effect last year?
JOSH MILLER: Great. So the question is about the new audio description requirement, which is part of the accessibility law as well. And that’s for visually impaired people. And just so that everyone understands what that is, it’s literally an audio track that is describing what is happening in the video so that someone who can’t see what’s on the screen can actually hear a description of what’s happening within the video. So the question is, has anyone experimented with that?
PIYUSH PATEL: We did. We did back, I’m going to say, five or six years ago on DVD. We got a grant. We did about 100 of our titles. Without the grant money, it would not have been possible to do that. It would have been cost prohibitive.
We saw no return on that investment. We got no requests for that type of content from our customer base. And we ultimately stopped pursuing it. So it has not been driven by our customers as a need.
THOMAS AQUILONE: We have talked about that. And because people can see what’s going on, but not hear it, at least they’re not left with absolutely nothing. But unfortunately, on our side of it, if it’s a talking head of a leader of a company, it doesn’t communicate a whole lot when you see it.
JOSH MILLER: In the back?
AUDIENCE: So we’re a government contractor down in DC. And live streaming meetings have become a huge part of what we do. And we actually use a system very similar to what Tom described, where we have a captioning vendor who dials in on the phone line, back up. They’ll watch the feed. They give us a plug-in that goes into the micro-site that’s got the stream. And it’s got the pop-up and all that.
But we’ve had government customers ask us, why can’t there be a little captioning button in the bottom corner, like a YouTube video? I know there are systems out there. We haven’t seen one that I think is up to what we need it to be. But have any of you worked with systems like that?
JOSH MILLER: So the question’s about live streaming content and adding captioning to the live content. And right now, a lot of times what you see if there’s captioning, you open up this window. And it’s this window on the screen as well. It’s not the traditional little CC button that’s on a lot of archived or video on demand. So the question is, has anyone experimented with improving that experience?
THOMAS AQUILONE: So we tried to get that function to work right away. And there was talk with different developers and different manufacturers. And we arrived at this situation where we have another web browser pop up, and you can position it. But at least with the Mediasite product, once we have a SAMI file, that CC works. And it works great. And it’s built into the interface.
JOSH MILLER: And that’s after the fact? Is that on demand?
THOMAS AQUILONE: Yeah, closed captioning is after the fact. Yeah.
JOSH MILLER: Got it. Has anyone else experimented with that? Any other questions? Yeah?
AUDIENCE: So we’ve been talking about the business case. So I have a question that’s a little bit more technical, related to HTML5 and closed captioning and the challenges with that. Are you guys all Flash-based closed captioning? Or are you doing HTML5 closed captioning?
JOSH MILLER: Great. This is a really good question. And it goes back to what was brought up before, which is that there are many different media formats, many devices to deliver content to. How are people addressing this HTML5 issue? Especially iPhones, iPads– it’s almost like a whole other even type of HTML5, in a way, just with the HLS encoding, if anyone’s played with that. So the question is, how have you people addressed that issue?
THOMAS AQUILONE: Well, we’re Windows Media-based and Silverlight-based. And it works great that way. We’re right on the cusp of moving over to mobile devices and some of these other things. So we’re waiting to see how it’s going to work out.
WENDY COLLINS: Our primary delivery is Flash and H.264 flavors. But we are very plugged in to the HTML5, coming soon. We don’t deliver that way yet. But we’re doing a lot of experimentation and having some good results with it. Nothing that we’ve deployed yet to our customer base. But we’re actually really having good results with some early testing.
AUDIENCE: Closed captioning?
WENDY COLLINS: Closed captioning, correct. I mean, one of the advantages, again, just going back to, remember how I said the education market doesn’t move so fast? It took them a while to go from VHS to DVD. So we’ve got a little bit of time before the mobile revolution really hits the education market. It’s coming. We know it’s coming. But we’re not there yet.
PIYUSH PATEL: Yeah. And we’re going to modify the playback experience and just put our own bar in. And we can control that experience that way.
WENDY COLLINS: But let me just say that all the files that we’re getting, we’re collecting them with the intent that they’re going to be reusable, not one and done. We want to make sure that we had access to those files so that we can do things differently as new technologies emerge, as new formats come out.
AUDIENCE: Which caption format are you using?
WENDY COLLINS: I think right now– you might know better than I do.
JOSH MILLER: Well, it’s an SRT file with the Kaltura player.
WENDY COLLINS: Right. It’s the SRT. But we’re extracting other formats and archiving them for future use as well.
AUDIENCE: OK, so no WebVTT?
JOSH MILLER: Not yet. I mean, the WebVTT, for people who are interested, it’s an emerging standard that will likely be the HTML5 standard. It’s very similar to what today looks like an SRT file. The big difference with WebVTT is it will be able to handle more style information and things like that that are going to be key requirements of some of the legislation. So it’s a more user-friendly format, certainly. But it isn’t really being used too heavily yet.
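The similarity between the two formats can be shown with a rough conversion sketch. It relies on two well-known differences: a WebVTT file starts with a `WEBVTT` header line, and its timestamps use a period rather than a comma before the milliseconds. The function name `srt_to_webvtt` and the sample cues are made up for illustration, and the sketch ignores styling, positioning, and malformed input.

```python
import re

# Matches an SRT timestamp such as 00:00:01,000 (comma before milliseconds).
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text):
    """Convert simple SRT caption text to WebVTT text."""
    # Normalize line endings and drop a leading byte-order mark or stray blank lines.
    lines = srt_text.replace("\r\n", "\n").strip("\ufeff\n").split("\n")
    out = ["WEBVTT", ""]
    for line in lines:
        if "-->" in line:
            # Only timing lines are rewritten; caption text is left untouched.
            line = SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:04,000\nHello there.\n\n"
        "2\n00:00:05,500 --> 00:00:07,250\nSecond cue.\n"
    )
    print(srt_to_webvtt(sample))
```

The SRT sequence numbers are kept as they are, since WebVTT allows an optional identifier line before each cue's timing line.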
Any other last-minute questions? All right. Well, thanks, everyone, for joining today.
WENDY COLLINS: I think we got one in the corner.
JOSH MILLER: Oh, yeah. Yeah. Sorry.
[QUESTION FROM AUDIENCE]
JOSH MILLER: So that’s a great question. So the question for everyone is, how do you handle this search function? So we keep hearing about how this is one of the added benefits of the whole captioning process. So how is that actually being put together from an infrastructure standpoint and then deployed?
THOMAS AQUILONE: So with our Mediasite system, that’s part of the content management system. So when you do the search, and you have captioning, it goes through all that.
PIYUSH PATEL: Yeah, we grab the file and then store that within our database. And that becomes part of the full text search process.
WENDY COLLINS: Same here. We’re storing it in a SQL backend and searching accordingly.
JOSH MILLER: And one of the nice things about captioning in terms of search is that it is more of a pure text search. There are things that can be done to connect it with the video. But luckily, because of the format that it’s in, it doesn’t have to be too complicated.
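A minimal sketch of the store-and-search approach the panelists mention might look like the following. It uses Python's built-in `sqlite3` module as a stand-in for whatever SQL backend is actually in use. The table name `caption_cue`, its columns, and the sample rows are assumptions for illustration only. Keeping each caption's start time alongside its text is what lets a text match be connected back to a point in the video.

```python
import sqlite3


def build_index(cues):
    """Load (video_id, start_seconds, text) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE caption_cue (video_id TEXT, start_seconds REAL, text TEXT)"
    )
    conn.executemany("INSERT INTO caption_cue VALUES (?, ?, ?)", cues)
    conn.commit()
    return conn


def find_moments(conn, term):
    """Return (video_id, start_seconds, text) for every cue containing term.

    SQLite's LIKE is case-insensitive for ASCII by default. LIKE wildcards
    in the term (% and _) are not escaped in this sketch.
    """
    rows = conn.execute(
        "SELECT video_id, start_seconds, text FROM caption_cue "
        "WHERE text LIKE '%' || ? || '%' "
        "ORDER BY video_id, start_seconds",
        (term,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    # Hypothetical caption rows.
    sample = [
        ("vid-1", 12.0, "Welcome to the course."),
        ("vid-1", 95.5, "Biotechnology drives this field."),
        ("vid-2", 3.0, "An intro to biotechnology."),
    ]
    conn = build_index(sample)
    for video_id, start, text in find_moments(conn, "biotechnology"):
        print(video_id, start, text)
```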
Great. Well, thank you to our panelists. You did a great job, really appreciate all your time and insight on this. They’ve been nice enough to offer their contact information. So if you do have follow-up questions, they’ve offered their email addresses. So feel free to reach out to them and ask them after today.
I’ll be up here if you have questions about 3Play. Happy to answer any questions about how we do things. And I think we have a few minutes to stick around as well. So thanks, everyone, for joining us today.