Quick Start To Captioning – Webinar Transcript
JOSH MILLER: All right. Welcome, and thank you for attending this webinar on closed captioning. My name is Josh Miller. We have about 30 minutes to cover the basics of captioning. And we’re going to try to make the presentation about 15 minutes, and then leave the rest of the time for your questions.
The best way to ask questions is by typing them in the questions window in your control panel in the bottom right corner there. We’ll keep track of them, and address all the questions at the end, so feel free to type them in at any point. And certainly feel free to email or call us after the webinar– we’re happy to have a more in depth conversation if that would be helpful. And if anyone is following along on Twitter, the hashtag for this webinar will be #3PlayCaptioning, as you see on the screen here.
So today, we’re going to give you an overview of closed captioning for web video, including some of the applicable legislation. We’ll talk about the services that we provide as well, and go over just the basic process and workflow step by step.
What Are Closed Captions
So what are closed captions? We’re going to take it from the beginning. Captioning refers to the process of taking an audio track, transcribing it to text, and synchronizing it with the media. The idea is that the captions are actually showing up at the same time as the appropriate image. And closed captions are typically located underneath a video or overlaid on top of the lower part of that video.
In addition to spoken words, captions convey all other meaningful audio information, including sound effects. And this is a key difference from subtitles.
Captioning Vs Transcription
So some basic terminology. Captioning versus transcription. A transcript is usually a text document without any time information included. On the other hand, captions are time synchronized with the media. You can make captions from a transcript by breaking the text into small segments called caption frames, and synchronizing them with the media such that each caption frame is displayed at the right time.
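As a toy illustration of that idea– and this is just a sketch, not how any particular captioning service actually works– the following Python splits a transcript into caption frames and assigns them evenly spaced times. A real workflow aligns each frame to the actual audio rather than spacing frames evenly.

```python
# Toy sketch: turn a plain transcript into time-synchronized caption
# frames. Timings here are evenly spaced for illustration only; a real
# captioning workflow aligns each frame to the audio itself.

def make_caption_frames(transcript, duration_s, max_chars=40):
    """Split a transcript into caption frames of at most max_chars,
    spreading the frames evenly across the media duration."""
    words = transcript.split()
    frames, current = [], ""
    for word in words:
        candidate = (current + " " + word).strip()
        if len(candidate) > max_chars and current:
            frames.append(current)
            current = word
        else:
            current = candidate
    if current:
        frames.append(current)

    # Assign each frame an equal share of the total duration.
    per_frame = duration_s / len(frames)
    return [
        (round(i * per_frame, 2), round((i + 1) * per_frame, 2), text)
        for i, text in enumerate(frames)
    ]

frames = make_caption_frames(
    "Captions are time synchronized with the media so each caption "
    "frame is displayed at the right time.", duration_s=8.0)
for start, end, text in frames:
    print(f"{start:>5.2f} --> {end:>5.2f}  {text}")
```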
Captioning Vs Subtitling
Captioning versus subtitling. The difference between captions and subtitles is that subtitles are intended for viewers who do not have a hearing impairment, but may not understand the language. Subtitles capture the spoken content, but not the sound effects. For web video, it’s possible to create multilingual subtitles, and certainly you see that with many movies and some TV shows as well.
Real quick– there was one question that we wanted to address, and that is that this webinar will be posted online, and will have captions with the posted version, as well as a transcript. So for anyone who is interested in seeing a captioned version of this, it will be posted online, and we’ll be sending an email out with a link for that.
Closed Captions Vs Open Captions
Closed captioning versus open captioning. The difference here is that closed captions can be turned on or off by the viewer. Open captions are actually burned into the video, and cannot be turned off. Most web video players support closed captions, meaning that there is a little CC button or some way of turning those captions on and off.
Post Production Vs Real-Time Captions
Post production (offline) versus real-time (live). Post production means that the captioning process occurs offline, often after the fact, and could take up to a couple days to complete. Real time captioning is done by live captioners– often court stenographers, basically. And there are obviously advantages and disadvantages to each process. The decision as to what’s best depends on the type of content and the way it’s going to be published.
How Are Captions Used
So how are captions used? Although captions originated with broadcast television in the 1980s, nowadays, captions are being applied across many different types of media, especially as people become more aware of the benefits and as laws become increasingly more stringent. Since every video player and software application handles captions a little bit differently, we’ve actually created a number of how to guides, which you can find on our website, under the How it Works section.
Accessibility Laws – Section 508
To talk a little bit about some of the accessibility laws that come into play, Section 508 is a fairly broad law that requires all federal electronic and information technology to be accessible to people with disabilities, including employees and the public. For video, this means that captions must be added any time it’s posted on a website carrying federal content. For podcasts and audio files, a transcript is sufficient.
Accessibility Laws – Section 504
Section 504 entitles people with disabilities to equal access to any program or activity that receives federal subsidy. Web based communications for educational institutions and government agencies are covered by this as well. And section 504 and 508 are both from the Rehabilitation Act. Many states also enacted similar legislation to Sections 504 and 508, and often reference them specifically.
Accessibility Laws – CVAA
The Twenty-First Century Communications and Video Accessibility Act, signed into law in October of 2010, expands closed caption requirements to all online video that previously aired on television. This law is often referred to as the CVAA. And the idea of expanding the legislation beyond just television content is also being discussed, and was actually part of the original bill that was proposed, but it was trimmed down.
So a quick update on the timeline for content owners to implement processes to adhere to the new captioning rules with regard to the CVAA. So again, this is content that previously aired on television, and then is going up online.
So the date that has already passed, obviously, is September 30, 2012. That deadline requires captions for all pre-recorded programming– meaning shows like sitcoms or dramas, anything that was not aired live– that does not get edited before it goes online. So if a full show goes right up online as it was aired on television, it has to have captions now. That’s the way the rule reads. So any show like that is supposed to have captions online, regardless of the site that it’s being posted to.
So although the primary purpose for captions and transcripts is to provide an accommodation for people with hearing disabilities, which is absolutely critical, people have really discovered there are many other benefits as well, especially with the internet as the medium. So captions improve comprehension and remove language barriers for people who know English as a second language. This is especially important, and something that we hear quite a bit, with educational content.
Captions compensate for poor audio quality or a noisy background, and allow the media to be used in sound sensitive environments, like a workplace or a library. Chances are very good that if someone’s sitting at their desk, they really shouldn’t have the speakers on. So it makes sense to have another way of following along what’s going on.
From a search engine optimization point of view, captions make your video a lot more discoverable, because search engines are able to index it better. The search engines, and the internet in general, really are a text based world, and having the text equivalent of what is being said in the video content makes it possible for that content to be crawled properly.
Once your video has been found, captions allow it to be searched and reused. This is especially important with long form video. For example, if you’re looking for something in a one hour lecture, you can quickly search through the text instead of having to watch the whole thing, or using that scroll bar to find the exact point that you’re interested in. We actually provide a number of search and interactive tools that we talked about in some other webinars, and we certainly have plenty of information on our website, and definitely would encourage you to take a look at that. But the text itself is quite a valuable tool for video navigation.
And then finally, transcription and captioning is really the first step or requirement to translate video content into other languages. So now with the internet being able to distribute video content all over the world, that becomes something that’s worth paying attention to.
There are many different captions formats that are used with specific media players. The image on this slide shows what a typical SRT caption file looks like. Here, there are just three caption frames displayed. You can see that each caption frame has a defined start and stop time for that particular frame, where that chunk of text would show up.
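To make the format concrete, here is a small Python sketch that renders caption frames in SRT notation: each frame becomes a numbered block with start and stop timestamps in SRT’s HH:MM:SS,mmm style, followed by the caption text. The example frames are made up for illustration.

```python
# A minimal sketch of writing an SRT caption file from caption frames.

def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(frames):
    """Render (start_s, end_s, text) frames as SRT text."""
    blocks = []
    for i, (start, end, text) in enumerate(frames, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"

# Illustrative frames, including a non-speech cue.
frames = [
    (0.0, 2.5, "Welcome, and thank you for attending"),
    (2.5, 5.0, "this webinar on closed captioning."),
    (5.0, 7.2, "[APPLAUSE]"),
]
print(to_srt(frames))
```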
Once a caption file is created, it needs to be associated with the corresponding video file. The way to do that depends on the type of media and the video player platform that you’re using. For sites like YouTube, all you have to do is actually just upload a captions file to the video page that you’re logged into. Many other video platforms make the process easy as well. We offer a captions plugin that works with many web video players. Essentially, when you embed that video player on a site, and you embed the captions plugin, the plugin will kind of latch on to the video player to create that captions effect. And it even works with video players such as Vimeo, which does not actually support captions at all. So we’ll talk a little bit more about that in a second, but definitely another tool that can be used.
I will talk a little bit about us, 3Play Media, as a company, just to give you a bit of a picture of who we are. The inspiration for 3Play Media started when we were doing some work in the spoken language lab at CSAIL, the Computer Science and Artificial Intelligence Laboratory at MIT. We were approached by MIT OpenCourseWare with the idea of applying speech technology to closed captioning for a more cost effective solution. We quickly recognized that speech recognition alone does not suffice, but it did provide a starting point. And so from there, we developed an innovative transcription process that uses both technology and humans, and yields high quality transcripts with time synchronization.
So we’re constantly developing new products and ways to use transcripts, specifically time synchronized text. And we do that a lot with input from our customers, so we definitely value feedback, and value learning more about how video content is being used.
Overview of Services
Our focus is to provide premium quality transcription and captioning services. We can also translate into many different languages, and we have some unique interactive tools that I mentioned that all use that time synchronized text that we create to enhance search navigation for video content.
Accuracy & Quality
We use a multi-step review process that delivers more than 99% accuracy when it’s all done. Even in cases of poor audio quality, multiple speakers, difficult content, or accents, we’re able to achieve a very high accuracy rate. Typically, about 2/3 of the work is done by that speech recognition process, the computer. And the rest is done by transcriptionists. And it’s that gap filling that is really the hard part, and where we’ve built a lot of technology to facilitate the process.
What makes our process much more efficient, really, is that we’re using human intelligence much, much better. But what it also does is allow our transcriptionists a little more flexibility in the process, because they’re editing rather than transcribing from scratch. And so they’ll spend more time on the finer details. So for example, they’ll diligently research difficult words, names, and places, and will take more care to ensure correct grammar and punctuation throughout a transcript.
We’ve also done a lot of work on the operational side of the business, such as making it possible to match transcriptionists’ expertise to certain types of content. We have about 400 transcriptionists on staff, and they cover a broad range of disciplines. So if you send us tax related content, we can match that content with a transcriptionist who has a financial background.
And without exception, all of the work is done by professionally trained transcriptionists here in the United States. Every transcriptionist goes through a rigorous training program before they ever touch a file, and many will even go through background checks, and all will sign a confidentiality agreement to make sure that we’re protecting the content as much as possible.
Captions Text Editor
One thing we’ve found is no matter how hard we try, certain proper nouns or vocabulary can be very difficult to get right the first time. We’ve actually built the ability to allow you, the user, to make a change whenever you want. That means if a name is misspelled, or if you decide you want to just redact a portion of the text, you can just go in, make that change, press Save, and the changes will immediately propagate through all of the output files– that’s transcripts, captions, everything that you’ll have access to. So you never have to actually reprocess an entire file just because a single word may need to be changed.
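Conceptually– and this is only an illustration of the idea, not 3Play Media’s actual implementation– the reason a single edit can propagate everywhere is that every output is derived from one source transcript, so re-rendering is cheap and reprocessing the media is never needed:

```python
# Toy sketch: every output format is derived from a single source
# transcript, so one edit regenerates all of them. The "formats" here
# are stand-ins for real outputs like transcripts and caption files.

def render_outputs(transcript):
    """Derive every output format from the one source transcript."""
    return {
        "txt": transcript,
        "upper": transcript.upper(),          # stand-in for another format
        "word_count": len(transcript.split()),
    }

source = "My name is Josh Niller."

# Fix the misspelled name once, in the source...
source = source.replace("Niller", "Miller")
outputs = render_outputs(source)

# ...and the change shows up in every derived output.
print(outputs["txt"])     # My name is Josh Miller.
print(outputs["upper"])   # MY NAME IS JOSH MILLER.
```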
While we’ve built many tools that are self service or automated, much of our success as a company really is based on the fact that we give our customers a lot of attention. We expect to walk people through the process and through the account tools. We enjoy building those relationships. And it’s really through those conversations that we learn which other features and platform connections are worth developing– that’s how all of the different tools we’ve built came about. So we really do value feedback.
So real quick, a little bit about how the process works, and how you can get started. Getting an account set up is very quick. Everything is very flexible, in terms of the administration of it. You can pay by credit card online. You can get an invoice sent to you. So whatever your organization works best with, we can usually accommodate. We also have a number of security measures built into the account, so you can set privileges for different types of users as well.
Once your account’s set up, certainly the next step would be to upload the video content to us. There are a number of ways that you can do that. You can use the secure web uploader that is standard within the account, you can use FTP, or we have an API as well.
We’ve also built a number of integrations with the leading online video platforms and lecture capture systems, so they’re usable right out of the box. And that includes things like Brightcove, Mediasite, Kaltura, Ooyala, Echo360, and a number of others. So if you’re using one of these platforms, the process is much, much easier.
And we really aim to make the captioning workflow as unobtrusive as possible. That’s what this is all about. That’s why we offer these integrations. So we give you the ability to automate as much of the work flow as possible. And really, all the captions and tools are compatible with as many different web video players as we can make work.
And another thing to note is that the account system is completely web based. There is no software to install to make this work.
Captions and Transcript Outputs
So after you’ve uploaded the content, it’ll go through processing, and then you’ll have access to your captions and transcript files for download. The standard turnaround that we offer is four business days. For more urgent work, we also have one and two business day options, so you can choose a turnaround option on any upload.
When the files are complete, you’ll receive an email alert, so you know exactly when to come in and download your files. You’ll also be able to see a status update at any time if you do log in.
And then you can download any format of caption or transcript file that we offer. You can do that as many times as you want, whenever you want. And certainly, if you need to delete files after processing, we can do that as well. So if you want to make sure that the content is completely locked down, we have ways of doing that.
And certainly, while logged in, once the files are complete, you can access that editing interface that I mentioned, and make any changes. And like I said, any changes you make will be available for download immediately.
There are a number of other features in the account system that are helpful through this process. We can talk about that separately. We’ll focus on the captioning pieces right now.
We also, I should note, are keeping an eye on some of the emerging standards for HTML5 that the W3C is discussing. And so we’re involved in some of those discussions, and we’ll be tracking any format changes that we need to make.
The captions plugin is a free tool that I mentioned before, which allows you to add closed captions or multilingual subtitles to almost any video. It works with a number of video players that don’t support captions, as I mentioned, such as Vimeo. It also makes your video searchable and SEO friendly. So to install the captions plugin, you’re just inserting an embed code into the web page. The plugin will automatically communicate with that video player you’ve embedded, and then the captions data and search are all hosted by us, by 3Play Media, but you can even self host it as well if you want. So it’s really, really easy to install it and get going.
The captions plugin does work out of the box with a number of different video players. I think there are at least a dozen different video players that it already works with. Again, things like Brightcove, YouTube, Kaltura, Vimeo, JW Player, Flowplayer, Wistia, Ooyala. There are several others.
The caption plugin does make the content searchable as well, as I mentioned. What you’ll see on the right there is a little magnifying glass button. So you can actually search the captions text, to then navigate through the video based on the text of the captions.
And as I mentioned, the captions plugin does support multiple languages. So if you do go through the process of creating multiple language tracks, it’s very quick to add those tracks here as well. And what you see is an option to switch between languages when they’re available.
All right. So we’ve included a couple URLs here for you to actually reference in the future, if you have any questions or if you’d like to gather a little more information about us, or about the process. We have quite a bit of information just about the process in general– not necessarily even about using 3Play Media, but just captioning in general. So I’d encourage you to check that out. We also have a number of video tutorials about the process under that How it Works page, so I definitely would encourage you to take a look at that.
We’re going to take just a minute to gather some of the questions. If you do have a question, now is a great time to type it in. And we will be back in a second to start answering some of the questions.
All right. Some great questions here. So let’s start with the transcription and captioning process that we’ve created. It is a bit unique, so it is a little different. What we do is we take the content, we put it through speech recognition. We then edit that content, so we’ve actually built an editing platform, if you will, where someone is matched with the content, and they work on it. They essentially edit the existing transcript. They add the speaker identification, punctuation, correct all mistakes left by the speech recognition engine, add some of those non-verbal cues that I mentioned as well. And all that’s done in this one editing process, where the human goes through the entirety of the file.
That’s the second step. The third step is then another QA check, where someone will go through again, looking for any possible mistakes. We have a way of identifying possible spots to check. And so that’s kind of the three step process that we have. So it’s speech recognition, editing, QA.
The speech recognition portion is something that’s often asked about. Basically, what we see– because we deal with a number of different types of content– is an accuracy rate usually in the 60% range, say 65% or 66%, which essentially means a third of the words are wrong. So that’s pretty important to keep in mind, because for closed captioning, or even for SEO, that’s far too much inaccuracy to be considered even close to acceptable. So we wouldn’t ever use that alone.
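To put that accuracy figure in perspective, here is some back-of-the-envelope arithmetic. The 150 words per minute speaking rate is just an assumption for illustration:

```python
# Back-of-the-envelope sketch of what a 65% recognition accuracy means
# for caption quality: the fraction of words needing correction is
# simply 1 - accuracy.

def words_needing_correction(transcript_words, accuracy):
    """Estimate how many words an editor must fix at a given accuracy."""
    return round(transcript_words * (1 - accuracy))

# Assumption for illustration: a 30-minute talk at ~150 words per
# minute is ~4,500 words.
total_words = 30 * 150
print(words_needing_correction(total_words, 0.65))   # → 1575
print(words_needing_correction(total_words, 0.99))   # → 45
```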
The instances where we see extremely high accuracy rates for speech recognition are professionally trained speakers with professional recording equipment– a single speaker, no background noise whatsoever, and probably reading off of a script. Those are the cases where you see 90%, 95% accuracy. But it definitely tops out. Because the speech recognition engine itself has to be trained on a particular voice for it to be really accurate when those ideal conditions don’t exist.
So as you might imagine, with lectures, or even produced video content, as soon as you add a second speaker or sound effects, that accuracy rate goes down. So that’s what we’re really trying to account for. We’ve built in a number of tools to improve some of that output, but there’s a limit to that. Our focus is really how we can make the editing process easier and more efficient for a human to correct it properly.
One thing to clarify in terms of what we offer. Live versus recorded content– we don’t offer any live captioning. We only offer captioning for recorded content. Live captioning is done completely manually by what is usually a trained court stenographer, so someone who is typing very, very fast. That’s why you sometimes see some inaccuracies with live content. But those people are very specially trained, and are very good at what they do.
There are pros and cons to that. If you’re putting that content up online after the fact, it might not be appropriate, because there are some time delays sometimes, and it may be a little hard to accept as accurate captioning once the content’s recorded. But certainly for live, it’s a very good solution. But it’s not something we offer.
A number of questions about translation into multiple languages. We offer the ability to translate once we’ve created the English captions or transcripts. We have a process to do that, so that the time synchronization is kept intact. What we don’t offer, though, is transcription of non-English content. We really only can handle content that starts in English.
In terms of the captions formats and transcripts we offer– we showed a screen with a number of different formats– we charge based on the duration of the content. It’s a one time fee per file that then includes all the interactive plugins that we’ve mentioned, and all of the output formats that you saw there. So you could download the same captions file five times. You could download every single format. You’re not going to pay any extra. It’s really just that one time fee for the processing.
We do support a number of video players and platforms that we may not have mentioned specifically in this webinar, such as Camtasia, Panopto. We do have some documentation on those on our website under the How To section or How it Works section, so we do have some guides on how that works. So it’s definitely compatible with those as well as others.
Some questions about caption support, specifically with different players and the plugins. So obviously, we offer that captions plugin, and many different video players already have captioning support. Ultimately, it becomes a choice: if the captioning support is built in natively, it’s really a question of what works best for you in terms of the different functionality choices.
We’ll tell you that our captions plugin offers a little bit more functionality in terms of the actual search and things like that. But at the same time, many of the video players have built in support themselves for good reasons as well. So if you were to go to full screen size, our captions plugin would actually get cut out of that. So if that’s something you care about, maybe you’d want to use the native caption support. So it really does depend. It’s a publishing choice, and something we could definitely talk through in more detail.
Just to walk through some of the integrations, such as Kaltura, Brightcove, even Mediasite, Echo– those all are full end to end integrations, meaning you can link your video account with your 3Play account, so that the workflow is basically automated for you, so you don’t have to re-upload files. So in the case of Kaltura or Brightcove, you can actually tag files in your Kaltura account, if you will, which would send that file to us. We would transcribe and caption it, and then post the captions back to the appropriate file for you. That’s all you’d have to do.
It’s very similar with Mediasite or Echo360. There’s a quick process to link the two accounts, and then you would basically initiate a captioning request from your Mediasite account or from your Echo account. And that’ll send it to us. We would transcribe and caption it, and post it back automatically. So those end to end integrations really do make the process a lot easier.
And when I say the set up process is quick, it really is. We give you the exact instructions on how to do it. It really should take 5 to 10 minutes at most.
By default, we would host those captions files. We’re not changing the way you host the video at all, just the captions. And then there is the option to self host the captions. And we have instructions on how you would download the caption files from us, put them on your own server, and then point the plugin to those files on your server rather than on our server.
But one thing to note is we don’t offer the plugins without our transcription and captioning service. So our focus really is this captioning and transcription service. We offer the plugins as kind of an additional tool when you use our service. So we don’t offer the plugins as a standalone product.
There’s a couple questions about captions formats, especially with regard to a more standardized, universal captions format. There is quite a bit of talk about a format called WebVTT. That’s still evolving quite a bit, and the effort is headed up by the W3C. The idea there is that standardization is a good thing. It makes it easier for people to publish captions, which means more video will have captions, which is good for everyone.
That format is very similar to a number of formats that we already work with, that are pretty common to web video. But the exact specifics of the template are still being discussed. Certainly, that’s something we’re keeping an eye on, and we’re even involved in those discussions. And as soon as there is more of a decision on what that will look like, we’ll certainly offer that as well.
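As an illustration of how close the formats are, here is a minimal Python sketch that converts SRT text to WebVTT by adding the header and switching the timestamp separator. WebVTT was still being finalized at the time of this webinar, so treat this as the basic shape rather than a complete converter.

```python
# A small sketch converting an SRT caption file to WebVTT. The two
# formats are close: WebVTT adds a "WEBVTT" header and uses a period
# instead of a comma before the milliseconds in timestamps.

import re

def srt_to_vtt(srt_text):
    """Convert SRT text to a minimal WebVTT document."""
    # Swap the millisecond separator: 00:00:02,500 -> 00:00:02.500
    vtt_body = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt_text)
    return "WEBVTT\n\n" + vtt_body

srt = """1
00:00:00,000 --> 00:00:02,500
Welcome to the webinar.
"""
print(srt_to_vtt(srt))
```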
There were a number of questions about the accessibility laws, specifically Section 508, and when captions versus transcripts are actually required. Basically, the deciding factor is image. So if there’s an image– meaning if it’s video– then you really do need the captions. Because closed captions are timed with the video, the text conveys the right meaning in sync with what’s happening on screen. So you do need that timing function to go along with it.
Whereas with audio, a transcript will suffice, because there aren’t other cues that go along with the audio. It’s much more straightforward. It’s a little more one dimensional, I guess. You don’t have that aspect of needing to know exactly what’s happening on screen. So a transcript will suffice in the case where it’s audio only.
Also some questions about what if you’re starting from a transcript– which is a great question, because it certainly happens. As of right now, we only offer a full transcription and captioning service. It’s all one process. We are in the process of building out a transcript alignment service. So by the end of this year at the very latest, if not sooner, we’ll be able to let you upload a video and a transcript; we’ll synchronize those for you, and you’ll have all the same output options and access to the plugins that you currently have when we do the transcription and captioning from scratch. So that’s very much something we’re working on. It’s a great question, because it certainly happens quite often that you’d already have the text to start from.
All of our pricing– because that’s certainly been asked– is based on the duration of content. We actually post our pricing on our website. I believe you have to fill out a quick form to get the full breakdown of exactly how it works– meaning per minute or per hour, everything is prorated to the exact duration of each file. We also have volume discount details there and a little bit more about use of the plugins. So there’s a full breakdown that you can actually see on our website. I think what you’ll find is that for both transcription and captioning, what we’re essentially charging is basically the same as transcription from a number of other providers.
Another interesting question that has come up is support for iOS devices. Our plugins actually do work with iOS devices, certainly the iPad. Unfortunately, the iPhone makes things a bit more difficult, since it defaults to a full screen QuickTime player.
We do offer a captions format that’ll work with iOS video. As of right now, it is a bit more complicated in terms of how to publish that. We do have a guide that’s very detailed and walks you through exactly how to do that. But there is talk, also, amongst Apple and other video platforms about how to make that process easier. Certainly, part of that CVAA law does have a piece about caption requirements and support in devices. So we should be seeing more from Apple on that soon.
We are actually a bit over time. I apologize. We are absolutely available to talk more if you have other questions. I’m sorry if we didn’t get to everything. Thank you all for joining us. Thank you for your questions. Please do feel free to reach out if you have other questions. We’re happy to talk.