Leveraging Closed Captions and Transcripts with Kaltura
CJ JOHNSON: All right. I guess we’re ready to kick things off here for the “Leveraging Closed Captions and Transcripts” session. My name is CJ Johnson. I’m a founder and head of product for 3Play Media, here with special guest stars Matt Bochniak from Johns Hopkins University and Wendy Collins from Infobase Learning.
So just the agenda for the session, I’m just going to give a quick intro to 3Play Media, who we are, what we do. Then I’ll hand it off to Matt to talk through his adventures in accessibility with Johns Hopkins. Then he’ll hand it off to Wendy, who will talk through some of the interesting things you can do with captions to leverage them for search and user engagement activities. Then we’ll end with a Q&A.
So 3Play Media. We started about seven years ago. We were a spin-out from MIT. Our focus is in using language technologies, such as automatic speech recognition and natural language processing, to create closed captions and transcripts for media. We use that as sort of a start to the process. One of our specialties is in cleaning up speech recognition output to make it over 99% accurate so that your captions, your subtitles, your transcripts aren’t sort of garbled messes of text. They end up being very accurate. We output them in a number of formats for a number of different use cases across the web world and across other media, such as iTunes and DVDs.
We’re based in Cambridge, Massachusetts. We have 800+ customers at this point in time across higher ed, enterprise, government, and other online media entities. We’ve enjoyed a very good relationship with Kaltura over the past five or six years. We’re actually one of the first apps on their app exchange and over time have grown to over 50 mutual customers, creating integrated solutions across the two platforms which has been a fantastic experience.
A quick overview of our products and services. I mentioned captioning is sort of the core element of what we do. We create timed text for media. That timed text can be derived into captioning formats such as SRT, SCC, and DFXP, if you guys are familiar with those– they’re used across the web– as well as formats for DVDs and iTunes, and transcripts for SEO. And we also do translation, so you can have subtitles in other languages.
If you have transcripts already that aren’t synchronized, we have a transcript alignment service so you can synchronize the text to the media. And you get all those outputs in the same way as if we did it from scratch. If you have a lot of media to process, or if you have systems in place and developers on your team, you can leverage our APIs to create automated workflows so that it can be a very manageable and automated process end to end.
And finally, we offer for user engagement video search plug-ins. I’ll show you an example of that. This is actually an example on one of Wendy’s sites. This is an interactive transcript. It plugs into the Kaltura player. And with one line of code on your website, you can create this widget that actually synchronizes the text word-for-word to the video so that as you watch it, the text will play along as the person is speaking. You can search through the video for a keyword and click a word in the document in order to seek the video to that point.
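To make the click-a-word-to-seek behavior concrete: an interactive transcript is really just a set of timed caption cues, and seeking means finding the cue that contains the word and jumping the player to that cue’s start time. Here’s a minimal sketch of that idea in Python, using a made-up snippet of SRT (one of the caption formats mentioned above); this is an illustration of the mechanism, not 3Play’s actual widget code.

```python
import re

# A tiny made-up SRT file: numbered cues, each with a timecode line and text.
SRT = """1
00:00:01,000 --> 00:00:04,000
Welcome to the session on closed captions.

2
00:00:04,500 --> 00:00:08,000
Transcripts make video searchable.
"""

def parse_srt(srt_text):
    """Parse SRT text into a list of (start_seconds, caption_text) cues."""
    cues = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        m = re.match(r"(\d+):(\d+):(\d+),(\d+)", lines[1])
        h, mnt, s, ms = map(int, m.groups())
        start = h * 3600 + mnt * 60 + s + ms / 1000
        cues.append((start, " ".join(lines[2:])))
    return cues

def seek_time(cues, keyword):
    """Return the start time of the first cue containing keyword, else None.
    A player widget would pass this value to its seek() call."""
    kw = keyword.lower()
    for start, text in cues:
        if kw in text.lower():
            return start
    return None

cues = parse_srt(SRT)
print(seek_time(cues, "searchable"))  # 4.5
```

Searching within a video, as described above, is the same loop: every cue whose text matches the query is a clickable hit, and its start time is the seek target.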
In terms of our integration with Kaltura, if you’re using the Kaltura KMC, you can actually input some Kaltura credentials into your 3Play account. It actually links the two accounts so they sync up to each other. That way when you go into the KMC, for each entry, you can just tag it or put it in a category called 3Play. And then those videos are pulled into the 3Play system. They’re captioned automatically. And we post it back to you.
We actually created a MediaSpace module for anyone using MediaSpace. That’s going to be launched in the next month or two. That is actually pretty similar to the KMC in that users of MediaSpace, when they upload a video, will be able to click a button to send it to 3Play for captioning. When it’s processed, it’ll automatically post back. Interactive transcripts will be available.
For large organizations, we have a role-based approval workflow. So that when someone requests captions, you can actually have an administrator who will approve the process for captioning.
So use cases and benefits of time-synchronized transcription– there’s accessibility for deaf and hard-of-hearing people. It’s great for people who have English as a second language. It allows flexibility for noise-sensitive environments, search, navigation, SEO, and can be used for translation.
So the question is– if you use your mobile phone, you can answer the question– what percentage of your videos have captions or transcripts? All right. So we have some responses. Looks like a lot of people have captions. Does anyone in here not have captioning for any video? One person. [INAUDIBLE]
All right. Great. So looks like the vast majority of people here are using captions. That’s wonderful to see. I’m going to hand it off to Matt now who’s going to talk about his experiences at Johns Hopkins in accessibility.
MATT BOCHNIAK: So my name is Matt Bochniak. I am the multimedia, I guess, supervisor for the Graduate School of Engineering at Johns Hopkins. We’re a part-time graduate engineering program. We have 11 fully online master’s degrees that you can get through Hopkins fully online.
It’s a worldwide program. So we have students all over the world. We know that because of how they register. And we also know that by our usage. With Kaltura, we can track where they’re coming from. And just last year– I pulled the numbers before coming up here– we had almost 2,000 students enrolled in fully online classes.
As an update on that number, we put another 100 in just last week. So we’re at 5,700 videos in our current catalog right now. We’re starting to develop for the fall term, so we’re seeing about 100 new videos a week of class content going into Kaltura.
The average runtime is about 15 minutes per video with our program. And I’ll show you a quick little example of a video right now. This is from our systems engineering program. And 3Play did the closed captioning.
[VIDEO PLAYBACK]

-Wow, they’re having a pretty bad day. This is CJ Utara, introducing you to Module 12. In this module, we’re going to be talking about the people aspects of effectively managing a project. In this presentation, we’ll be concentrating on conflict in project management.
Conflict typically has a negative connotation. It’s often looked at as a bad thing. But it can also be constructive as well. So in this presentation, we’ll concentrate on both aspects, destructive and constructive conflict, and how the project manager can effectively use it to–
[END VIDEO PLAYBACK]
MATT BOCHNIAK: Basic stuff. I mean, 3Play definitely does all of our closed captioning. They do a good job with it.
How many of you are in the education realm? Wow, so there’s a bunch of you guys. That’s good. Because I’m dealing with the same problems you guys are with closed captioning.
So our current policy right now in our division of Hopkins is that we closed caption videos for courses in which an enrolled student makes a formal request for accommodations. That’s generally when we do our closed captioning. So right now, we have a handful of students. We know who they are. We track them within our student record system. We see what classes they’re signed up for. And generally a week before that class runs, we submit those courses to 3Play to transcribe and closed caption.
We also have an open door policy with the faculty. No faculty member has requested it just yet. But if the faculty member requests it, we will also have them closed captioned or transcribed.
In the future, it’s going to be mandatory that we have all the videos closed captioned. I mean, it’s just a matter of time, guys. I ran the numbers before coming here. And it was around $325,000 to have our full catalog transcribed. So it’s a big expense that we’re looking at, and we’re going to have to figure out how to manage it moving forward.
So the strategy that we have come up with within our own department is that it’s delegated amongst all of our instructional technology staff. So how our division is set up is we have our faculty member working with instructional designers to basically design their course in the online realm. Once they start working and videos start coming in, then we start sending them to 3Play. And 3Play will do the closed captioning and transcribing. The transcripts we then put into Blackboard along with the video. So students have the ability to download the transcript or watch the videos with closed captioning.
I think one of the benefits of having courses closed captioned is that the student can both hear the instructor talk and read along on the screen. So you’re getting that information two ways. And I think that facilitates learning a little bit better.
So that’s one of the benefits. But it comes with a cost, obviously. And I think that everybody in higher ed is, like you, definitely looking at that. So, CJ?
CJ JOHNSON: Thanks a lot. So we have a second poll here. Same process as before. What is your main reason for adding captions? As we saw before, a lot of you are doing captioning. So we have accessibility, search, SEO, and other. Text your responses, please. And if there are any others out there, if you’d shout them out, just so we can hear some other ideas.
All right. Looks like we have a pretty clear winner in accessibility. It’s kind of the main purpose of captioning. Nice to see some people starting to leverage them for search and SEO. And no others out there? There’s the three categories, OK. Very interesting.
So I’ll hand it off to Wendy now so she can talk through how she is using captions not just for accessibility, but for some of these other categories– search, SEO, and user engagement.
WENDY COLLINS: If there had been an all option, I would’ve clicked it. Because I think what I’m going to talk about is, from my company’s perspective– and I’ll tell you a little bit about who we are in a second– we started off going down the transcript road for accessibility. And it quickly evolved into a lot of other benefits that we’ve seen using that technology. So I would’ve checked all.
So my name is Wendy Collins. And I work for a company called Infobase Learning. And we work in the education space with a lot of different types of media. Video is just one of those. But in order to deliver video to our customers, we’ve created a platform that’s built on top of Kaltura and integrated with 3Play, obviously.
It’s called Films on Demand. Any Films on Demand customers out there? Anybody using our platform to deliver educational videos? OK, so come talk to me later.
But we’ve been around for about eight years– nine years, actually. I’m proud to say that we launched the Films on Demand platform one month after YouTube. So we missed beating YouTube to the streaming video space race by a month.
But we’re very proud of how our platform has evolved. Certainly YouTube has gone on an unmistakable trajectory. But our platform is pretty impressive in its own right.
We currently have about 20,000 full-length videos that we serve up across all of our different content areas. And we break them down into segments to make them very usable in different learning management systems or different environments. So we’ve got about 350,000 video bites, if you will, that range from two minutes up to about five minutes. And they’re pretty amazing nuggets in and of themselves.
We represent about 850 different producers on our platform. And I’ll show you some examples in a second. We span across 40 different subject areas, which makes it probably the most comprehensive video platform in higher education.
We’ve surpassed 100 million views over the last few years. And we’re working with about 2,000 schools and about 13 million different users. So that’s just some of the interesting statistics.
Here is a quick snapshot of our content library. These are the 40 top-level subjects that we offer. As you can see, it’s all across the board. We have a tremendous amount in some of the larger subject areas like psychology, health, English, business. And then we’ve got a lot of other smaller repositories in the different subject areas that you see there. So it’s pretty comprehensive.
Here’s a snapshot of some of the key producers that we represent, some of the big names like the BBC, NBC News, PBS, TED Talks, the History Channel, on and on. Like I said, we’ve got over 850 producers. And this is just a snapshot of some of the marquee ones that we represent.
In terms of the video platform, we break it down into really five different key areas that we work on. The first is playback in the first column here. And these are the things that make users want to watch the video. We’ve got the player features that make it easy, like dynamic bitrate switching and HD quality. We’re really mobile-optimized at this point. And we provide different viewing experiences for widescreen versus 4-by-3 content.
And all of these things I’m talking about here are all built, like I said, on top of the Kaltura engine. So we look at integration and how we can take those video segments that I talked about and allow our users to plug them into their different systems that they use. And you can see some of the systems and the way that we do that there. In terms of the third column, tools, these are all the functionality that we’ve wrapped around our videos to make it easy to use in the different systems and for our different users.
The two I wanted to talk mostly about are the last two columns. We put a lot of emphasis on search and how we can help our users find and use the content that they need. And obviously accessibility, which is why we’re here today. In the search area, we’ve done everything from basic search all the way down to building out an XML gateway that allows different third-party systems to integrate with our content to pull it up into a federated search engine, for example, or different discovery tools that are out there.
On the accessibility side, we define accessibility two different ways. We define accessibility in terms of making sure that our content is viewable anywhere, anytime, on any platform. And we also define accessibility in terms of making sure that everybody can understand and comprehend it with the captions and the transcripts.
So I just wanted to give you a little bit of context about the platform as a whole. And then I wanted to kind of switch gears and talk about the transcripts and the captioning piece of it. This is a screen. You saw an early version that CJ showed you as well. But every video in our platform has a screen that looks like this.
Obviously, the video playback is very prominent. But so is the option for the closed captioning and interactive transcripts on the side. And we’re pretty proud of the fact that our transcripts get a lot of use, a lot of traction, a lot of visibility within our platform.
In terms of sort of the back story for how we started this process and where we are today, back in 2001, I think many of you in the room probably remember VHS. It was still the primary format. And we were captioning only about 30% of our content at the time in terms of the VHS format.
And the typical turnaround time for that was six months. So we’d actually ship the VHS tapes literally across the country, get a master back, and be able to service our customers with that six months later. We would spend about $500 a title. And only about 100 of our 600 new titles every year would have captioning because it was so cost-prohibitive.
In 2006, the first format change came with DVD. And we started to recognize the value of having our content captioned, especially in the education space. So we bumped it up to about 50% of our entire library. We made that investment.
We got the turn time down to about four months, I think mostly because we then started FedExing our content instead of slow shipping it. So we were still physically shipping masters across the country. The price point stayed about the same. And we increased just a little bit what we were able to do every year because it was still pretty cost-prohibitive.
In 2009, the streaming world changed a lot of things. We were still doing DVDs. But it was starting to turn to the streaming world. We started captioning about 65% of our content. So the universe was growing. We got our turnaround time down to about two to three months, our cost down to about $400 a title. But we were still only captioning a portion of our library. So we kept digging a bigger and bigger hole for ourselves as we went forward.
Fast forward to 2012 where we started working with Kaltura and 3Play and the streaming realm. And now our library stands at about 98% fully captioned and transcribed. The final 2% is really things that aren’t easily transcribed– silent movies, kind of hard to do. So we leave a little room there for not being able to say we’re 100%. We’ve got our turnaround time down to about two to three days. We could do it faster– I should say, 3Play could do it faster– but we meter that on our end to keep the flow of content coming into our system.
You can see our average cost per title. And we’re now going 100% all in with our new content coming into the system. We’ve upped our ingest process. So we’re doing about 5,000 new titles a year. And they’re all at that 98% mark in terms of being captioned.
So it’s a pretty good story on our front to be able to tell how we’ve moved into that realm. What is also a pretty good story is how we’ve started to evolve, as I said at the beginning, our concept of the value of captioning and the value of these transcripts. So like everybody else, we sort of started off in what we call Phase 1. We were just thinking about captions and transcripts and their value at the ADA compliance level.
We immediately got an initial benefit out of the printing element. This is something that we didn’t really think would get a lot of use. And it turned out that the first year we rolled out captioning in our online system, people just wanted to print stuff. It’s amazing. We live in an online world, but we’re still paper-based in a lot of ways. So whether for research purposes or educational takeaways, we were seeing a tremendous amount of printing of the transcripts, which was kind of a surprise to us. But it sort of came along for the ride when we introduced the captions.
And then we sort of went down this next path with search enhancements. And this is something that we had in the back of our minds when we started with the captioning online. And it’s definitely proven to be worth our investment in that area.
We’ve seen three different ways that searchability has really come into play with our transcripts. The first one there is to be able to search within a video. So if you look back on the screenshot or you’ve seen some of the presentations today, you can simply now search all the words, all the text within any video in the collection. High usage again.
The second searchability aspect comes with the option to be able to search across all of your videos. So we’ve got 20,000 videos. And we’ve created these segments to try to help our users be able to find what they’re looking for. But it’s still not as good as being able to search every word of every video in the 20,000. So we’ve really seen a big benefit to our users to be able to search across all of our content.
The third relates to SEO in a different way. We think of SEO not in terms of how can we get noticed on Google because we’re not a public site. We are a subscription service. But we’ve really leveraged the transcripts and the captioning to improve our internal SEO, to help the right videos rise to the top in our search engine, which we were unable to do when we were just basing it on our metadata, which was a title and a three-word description for a 60-minute video. It just wasn’t weighted correctly. Now we can really weight the relevancy and improve the SEO within our own site.
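The internal-SEO point above is essentially a relevance-weighting problem: once the full transcript is indexed, a video’s score for a query can be weighted by how often, and how distinctively, the query terms actually occur in what was said, rather than by a thin title and description. Here’s a minimal sketch of that idea using plain TF-IDF over toy transcript data; the video names and text are made up, and this is an illustration of the technique, not Infobase’s actual search engine.

```python
import math
from collections import Counter

# Toy catalog: each "transcript" is just a handful of words here, but the
# point is that transcript text gives the ranker far more signal than a
# title plus a three-word description would.
videos = {
    "intro-psych": "psychology perception memory cognition memory learning",
    "conflict-mgmt": "conflict project management destructive constructive conflict",
    "film-history": "cinema history silent film directors studio era",
}

docs = {vid: text.split() for vid, text in videos.items()}
# Document frequency: in how many transcripts does each term appear?
doc_freq = Counter(t for toks in docs.values() for t in set(toks))

def score(query, doc_tokens, doc_freq, n_docs):
    """Simple TF-IDF relevance of a query against one transcript."""
    tf = Counter(doc_tokens)
    s = 0.0
    for term in query.split():
        if term in tf:
            idf = math.log(n_docs / doc_freq[term])
            s += tf[term] * (idf + 1)  # +1 keeps corpus-wide terms from zeroing out
    return s

def rank(query):
    """Order video IDs by transcript relevance, best match first."""
    return sorted(docs,
                  key=lambda v: score(query, docs[v], doc_freq, len(docs)),
                  reverse=True)

print(rank("conflict management"))  # "conflict-mgmt" rises to the top
```

With metadata-only indexing, all three titles would look nearly identical to a query like this; weighting by transcript term frequency is what lets the right video rise to the top.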
So we’ve started down what we’re calling Phase 3, which is really not the final frontier, but maybe the next frontier in terms of our platform. We’ve taken the transcripts, and we’ve said, what else can we do with them? The first thing that we started to do, or we are doing, is we are converting them in our system on the fly to over 50 languages in our platform using Google Translate, which is a free plug-in. I’m sure everybody knows that. So we’re not reinventing the wheel here.
But what we’ve done is we’ve connected it up to the transcripts. And so real time, you are watching a video, and the transcript is playing alongside of it in French or Spanish or Portuguese. And the user doesn’t have to do anything other than click a button.
So it’s pretty powerful. And it does everything that the regular transcript does. You can click on a word. You can search. You can do all that. You can print. But it now has the ability to service a whole different population of ESL users or foreign students.
The next one we’ve done is to take our transcripts and use them as the foundation for allowing our users to create their own clips. So if our users don’t like how we’ve segmented the video into little chunks, they can now make their own. And the way they identify the in and out points is to actually use the transcripts. So we’ve tied that into functionality that lets the educators using our system hone in on the two minutes, the 30 seconds, the 10 seconds, whatever it is they want to plug into their learning management system, and specify those in and out points. The transcripts really allow them to hone in on just the pieces that they want.
The last piece here is something that we’re working on currently. For eight years now, we’ve been indexing our content manually. So this is a story, I think, that many content providers can relate to. Indexing is very time-intensive and very manual to do right. And we have now been experimenting with using the transcript content to autogenerate tags, or indexes, in our system. So we hope to be able to supplement, if not replace, how we’re currently doing our indexing by running the transcripts through one of several different programs that we’re experimenting with to output both semantic and real text tags, and have that be part of how we’re helping our users find our content. So instead of manually indexing or taxonomizing our content, we’d be extracting those tags from the transcripts.
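Wendy doesn’t name the tagging programs being evaluated, but the simplest version of tag extraction from a transcript is just frequency-based keyword selection. Here’s a minimal sketch, with a made-up transcript snippet and a tiny hand-picked stopword list; real tagging engines add stemming, phrase detection, and semantic lookups on top of something like this.

```python
import re
from collections import Counter

# Small hand-picked stopword list for the sketch; a real system would use
# a fuller list (e.g. from an NLP library).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "we",
             "this", "that", "be", "can", "as", "on", "for", "with", "are",
             "but", "also", "at", "use", "has", "often"}

def auto_tags(transcript, k=5):
    """Suggest k candidate index tags: the most frequent substantive terms.
    A stand-in for the commercial tagging engines mentioned in the talk."""
    words = re.findall(r"[a-z]+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 3)
    return [word for word, _ in counts.most_common(k)]

# Made-up transcript text, echoing the sample video from earlier in the session.
transcript = ("Conflict typically has a negative connotation. Conflict is often "
              "looked at as a bad thing, but conflict can also be constructive. "
              "The project manager can use constructive conflict effectively.")

print(auto_tags(transcript, 3))  # "conflict" and "constructive" lead the list
```

Even this crude version shows why transcripts beat sparse metadata as a tag source: the terms an instructor keeps repeating are usually exactly the ones a student would search for.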
So all of this would not be possible if we didn’t start with the first two there. And I think that’s the story that I wanted to try to tell, that you have to kind of think about the value. Matt talked about the cost. It is an investment to do transcripts and captioning. Whether you’re a university or you’re a publisher servicing a university, it’s an investment. But if you start to think about everything that becomes possible after you make that investment, it’s pretty powerful.
So I guess at this point, I’ll turn it back over to CJ.
CJ JOHNSON: All right. Thanks a lot. So I guess we’ll do a microphone share for any questions that folks have for any of us. Yes?
AUDIENCE: You mentioned the price [INAUDIBLE].
WENDY COLLINS: So can everybody hear me if I just talk? I guess I should probably talk in here. So our average duration over the years has changed a bit. And when we started, it was about 45 minutes per title because we were doing mostly broadcast videos. And once you take the commercials out, they’re really about 45 minutes.
Now, currently, we’re about 38 minutes per title. We’ve sort of gotten more content that’s coming in short form, which is sort of consistent with the web consumption of content in general. So about 38 minutes per title now.
CJ JOHNSON: Any other questions out there? Nobody? All right. Well, with that, then, thanks very much to Matt and Wendy. Thanks, everyone, and enjoy the rest of the conference.