
Live Professional Captioning Launch Party [TRANSCRIPT]

KELLY MAHONEY: Welcome to everyone here. My name is Kelly Mahoney, and I’m on the marketing team here at 3Play Media. I want to thank everyone for joining us today to celebrate the launch of our live professional captioning service. We’re very excited to have you here.

And we’ll just go ahead and jump right into it. We have likely all heard by now that there has been a global pandemic happening. And live virtual events have exploded in popularity across industries as a result. And while this has been a very exciting and very convenient phenomenon, many organizations are still struggling to make these events completely accessible to all audiences.

So to better understand what kinds of barriers organizations face when it comes to live captioning, we’ve been listening to customer feedback and frustrations for the last year and a half. And for the same amount of time, we’ve been perfecting our product and working on a solution, which we’re very excited to share with you today.

So with that enthusiasm, we’re going to go ahead and dive into our presentation. Now it is my pleasure to introduce our keynote speaker for today’s event. Jill Brooks is the former president and chief operating officer of the National Captioning Institute and is still an all-out accessibility trailblazer. At 3Play, she’s been instrumental in leading the development of our live professional captions. So without further ado, please join me in welcoming 3Play’s senior director of live operations, Jill Brooks.

JILL BROOKS: Thank you, Kelly. I’m really happy to be here today to talk about the future of live closed captioning. But before we get into that, I’d like to take a look at live captioning’s past and present.

And as Kelly mentioned, I spent nearly 25 years at the National Captioning Institute, which was a trailblazer, being the first to perform real-time closed captioning 40 years ago, all the way back in 1981. And remembering back then, it obviously was a very different time technologically. There was not a way to instantly transport information the way that we are so used to now.

The closest way to do that perhaps and the most newfangled at the time, arguably, was the fax machine. So not surprisingly real-time captioning was a very labor-intensive process. Trained stenographers underwent additional months and years of training to learn how to caption. They would listen to the audio portion of the broadcast and stroke or write what they heard on a specialized steno machine.

And for those of you who aren’t familiar, a steno machine has 22 keys that are stroked in various combinations, similar to piano chords, that represent phonetic sounds. The phonetic steno input would be turned into English caption data and transmitted through a complex technical center that resembled maybe what you would think of as an olden day operator switchboard.

And that was connected via telephone lines to the encoder at the broadcast site. The caption data would then be sent out on line 21 of the television signal. So great, live captions have been created. They have been sent. But how could the end user see them? Well, they needed one of these decoder boxes. You can see here in the slide. It looks kind of like a set-top box, an old cable set-top box.

And you could just go down to your local Sears and buy one, bring it home, wire it up to your TV set, making sure not to disconnect your actual cable box or your VCR in the process, and there you go. You have access to captions, but only on a limited basis. You would also have to check the TV guide to see which programs actually had closed captioning because there were very few at that time.

And in the spirit of full disclosure, I have to say I had one of those decoder boxes, and I never got it to work. So I will blame that on user error.

And although my description of this was a little bit rudimentary, at the time, it was cutting edge and really allowed for media accessibility where none had existed before. And that was essentially how live captioning was done for the next 20-plus years. There were some advances. Most notable was the development of the decoder chip that essentially shrunk that set-top decoder box down into a chip that was eventually required to be built into all television sets.

There was also a federal mandate that required the vast majority of broadcasts to be captioned, including live broadcasts. And it was that mandate that precipitated the first big change in how real-time captioning was produced. In the early 2000s, several things were happening in the industry all at once.

Demand for live captioning was increasing sharply. And enrollment and graduation rates from steno court reporting schools were falling. So it boiled down to a supply and demand issue in two different ways. There was a lot of demand for trained or trainable steno captioners to keep up with the increasing supply of programming. And yet while that would indicate normally that rates for the service would increase, the opposite happened because now, due to the mandate, there were customers in the market who did not have the financial resources of the bigger companies, and yet they were required to provide captioning.

So how do you solve this problem of a decreasing labor pool and increasing demand? It wasn’t simply a matter of offering higher salaries to obtain captioners because there just were not enough to meet the industry demands. So interestingly, at that time, there was a brand new technology emerging, voice recognition technology.

And again, to set the landscape, this technology was in its infancy. And we could probably fill up the rest of our webinar time today with funny stories of the mistakes that Alexa or Siri have made just today. So you can imagine the state of voice recognition technology 10 years before Siri or Alexa even existed.

So nevertheless, how we solved the supply and demand problem at that time was to investigate the possibility of using voice recognition technology for real-time captioning. And this was a challenge that I took on personally in my career. And to make a very long story short, what I found was, yes, voice recognition technology could be used to create live closed captioning of a similar quality as steno. And here’s how we did it.

We started with user-dependent, or speaker-dependent, speech recognition software, meaning that the software was trained to an individual voice and manner of speaking. And through repetition, reinforcement, or correction of the chosen words, it would help to improve the recognition accuracy of the system over time. And that refinement of the voice profile in the software was ongoing and never-ending.

But the software was only one component. The person also had to be trained to do what is called voice writing, which is a methodology I developed for creating real-time captions using voice. Live voice writing is similar to steno captioning in that the captioner listens to the audio of the broadcast or the event at the same time that the audience hears it, but instead of keying what they hear onto a steno machine, the voice writer repeats every word into a microphone connected to specialized software. And from there, the caption data goes on its journey and is rendered as text on the screen.

So a voice writer is listening, speaking, editing, punctuating, reading, and correcting simultaneously. And all of this is happening from the time the word is spoken to the time that you see the word appear on the screen just a few seconds later. The introduction of voice writing as a way to create real-time captioning represented a true paradigm shift in the industry, which had been operating the same for over 20 years.

It helped to solve the supply and demand problem by creating a much larger potential labor pool than trained stenographers. And thus it expanded accessibility. People with various backgrounds could be trained in the discipline of voice writing.

But speaker-dependent voice writing also had its problems. As I mentioned, training the software to an individual person’s voice was time-consuming. And even then, the word recognition accuracy was not 100%.

So voice writers were trained on ways to improve upon it by adding words to the software’s vocabulary and creating fake words to represent terms, phrases, names, prefixes, and suffixes. So you can see on the slide some actual real examples of this technique, of what was spoken into the system in order to create the desired sentences, and listening to them was as if they were speaking a foreign language.

So these techniques improved recognition accuracy, but they created another problem. Voice writing was being performed by a group of people who were highly trained in a unique skill and using specialized software. Sound familiar?

This solution was just not scalable enough. So you might be wondering, what about ASR? Automatic speech recognition. You know, I’ve seen that before. It seems OK. And you’re right. ASR certainly has a role to play in captioning, but it is not the solution.

ASR is a good option if and when human-produced captions are not available and depending on the audience and type of event that needs captioning. In terms of providing accessibility, providing an equal experience, it simply is not robust enough at this time to achieve that.

So there are some notable differences between ASR and human captioning. On a positive note, ASR will caption a word for every utterance, which human captioners cannot do, particularly with fast-paced dialogue. Instead, they will omit certain unimportant words in order to preserve or present the most comprehensible captioning possible.

On the flip side of that, ASR will caption a word for every noise, whether it is the spoken word or not, whether it is relevant or not. Whereas a human captioner can convey important non-speech audible information, such as laughter, applause, a whistle, or a buzzer, appropriately to increase the comprehension of what the viewer is seeing. And obviously any irrelevant noises are just not captioned.

Some ASR systems will insert random punctuation. Others provide no punctuation at all, leading to one never ending run-on sentence. And human captioners can insert proper punctuation to, again, enhance the user’s understanding of what is being said.

With the exception of a few meeting platforms, ASR cannot detect who is speaking. Human captioners will indicate every time there is a change of speaker and attempt to identify that speaker by name. But the most notable difference may be that ASR recognition accuracy plummets in situations with multiple speakers talking over one another, a lot of background noise, speakers with accents, or any type of non-speech audible information. Whereas human captioners can use context to discern what is being communicated in those situations.

And finally, latency, or the time between when the word is spoken and when it appears as a caption on the screen, is generally lower for ASR than for human captioning. But now that you understand all that the human captioner is doing in that time, I think that that makes sense.

So the biggest difference I see between ASR and human captioning is comprehensibility. I have a slide here of an example of actual captions of a news broadcast clip. One was captioned by a human captioner and the other with an off-the-shelf ASR system. I should point out this was not produced by 3Play’s auto captioning system, which consistently and measurably outperforms the one that produced this example.

The speaker in this clip had an accent and was speaking extemporaneously, so did not speak in complete, well-constructed sentences. And I think this serves to illustrate how the ability of the human captioner to add proper punctuation and employ judicious editing can greatly increase the comprehensibility of captions.

So here we are another 20 years later, and although there have been some advancements in that time, like the ability to connect to clients via IP, or federal closed captioning quality mandates, for example, not much has changed in terms of how live captions are ordered, created, and delivered. Live captions are produced by steno, voice writing, and to a lesser degree ASR. Many captioning companies are finding it difficult to train captioners fast enough to keep up with the demand.

And the captioning process is kind of clunky and outdated, usually requiring customers to call or email their coverage requests into the captioning company. And then they’re left to wonder or worry if their event will be covered or to wonder and worry whether a technical glitch anywhere in the delivery pathway will leave their program without captions and worry that they may face an ADA complaint or even an FCC fine.

Meanwhile, as Kelly said at the beginning, content is being produced at a dizzying pace. So it sounds like it’s time for another paradigm shift to me, which is why we are all here today at the LPC launch party. And it’s why I’m so excited to have joined the team here at 3Play to help develop the live professional captioning service.

3Play’s unique approach to live captioning leverages technology to simplify the process through the entire lifecycle for both customers and captioners, while delivering the quality of human-generated captions. My mantra at 3Play is always keep it simple– simple to order, simple to caption, simple to deliver and display live captions on nearly any platform and any device.

Simplification allows the 3Play live professional captioning solution to be infinitely scalable. And it reduces the human touch points and thus the potential failure points. I don’t want to step on Craig or Steph’s toes here, and I’m sure they will explain this in much more detail. But what is different with 3Play’s offering and what will trigger this paradigm shift is the scalability that is achieved through the automation and the simplification of the process.

Ordering is done through a Customer Portal. So no need to call or email, eliminating that potential failure point at the order taking phase. All connection information and event-specific words and terms and event instructions are input with the order and automatically inserted into the captioner’s job page, from which the connections are made, again, automating the process and eliminating potential failures.

And 3Play’s live captioning tool, following the successful models used in our recorded captioning and audio description services, is intuitive and easy to use. So no need to use those nonsense words or spend hours training the system to a particular voice in order to get it to do what you want it to do.

And the human captioner is not responsible for establishing or monitoring those connections. That’s all automated. So in short, captioners can simply focus on captioning. But best of all, 3Play has solved the continuity problem. If there is an interruption in service of any kind, the captioner’s internet goes out, a power outage, an emergency of any kind, the live professional captioning will failover immediately to live auto captioning with no interruption in the service.

So simplification, scalability, increased accessibility. That is the future of live captioning.

KELLY MAHONEY: All righty. Thank you very much, Jill. So with that, before we move on, we’re going to go to our next poll here. We would like to know what is the biggest difference between human-generated captions and ASR? The options are speed or latency, comprehensibility, or voice.

All right. We’ve got most of our responses in, a few more, a few more. It seems like most people have gotten it correct. I’m going to go ahead and end the poll and share the results. The correct answer was comprehensibility. So we’re glad you’re paying attention, and you’re staying engaged.

So after learning a little bit more about what live captioning is and how it works, like I said, at 3Play, we spent a lot of time figuring out exactly what barriers prevented organizations from incorporating live captions into all of their events. Ultimately, we were able to outline identifiable pain points in the traditional live captioning process, and we use this information to shape our efforts in pain management, if you will.

So now for a closer look at the solution that our product offers, I’m happy to introduce 3Play’s senior vice president of sales and account management, Craig Herman.

CRAIG HERMAN: Thanks, Kelly. I really appreciate everyone attending and engaging here on this Wednesday afternoon. So as Jill has kind of kicked this off and has gone deep into where we have come from and where we are going, really what I hope to do here is bridge Jill and Steph– Steph is going to dive into the actual product and show how you order and how our platform works. What I really want to talk about is mostly the why.

What did we see– 3Play has been around for 13 years. And what did we see in the live market where we really felt we could make a difference? And so we started to really go out to everybody that’s here on these calls and a lot of our other customers and started to get some feedback on machines versus humans.

So basically, the overall preference– when we asked folks live auto versus human, what was the preference? And it predominantly came back, over 49%, with human-produced live captioning.

So there’s definitely a strong preference. And this was, again, early– or in 2020. And so we started working through what would we be able to put together based on what we’ve done with audio description, what we’ve done with pre-recorded captioning and things along those lines, and make this a product that could really disrupt the market.

And when Jill was talking about scalability, that’s a huge piece. But there are other pieces that go into it as well. And so as we started to really look at this and started to get into our beta and worked with some folks that are on this call today in the early stages, we really started to focus in on a few different areas. And I think where we’d like to start is accuracy.

As most folks who work with 3Play know, 3Play’s known in the industry to deliver 99.7% accuracy for our prerecorded content. Now, obviously, prerecorded content, live auto captioning, and live professional captioning are wildly different when it comes to percentages and accuracy. But the baseline of auto captioning is really where it all starts.

So on the prerecorded side, we have been doing captioning for 13 years. And in that, we have built up an AI that is extremely intelligent when it comes to speech recognition. So working off of all of that data over time, we’ve become a leader on the live auto captioning piece, or the ASR.

So our ASR puts us at the top at over 90%, which is extremely high. The challenge, though, is that even though it’s the highest when you look at other speech engines, like Microsoft or IBM and others, it still kind of leaves some things out. And we kind of– what’s the difference between eating Grandma and actually having dinner with Grandma? In this case, a comma.

And this is where a small– where the 5% can really have an impact, where it can dramatically change the entire context of what we’re trying to get across. So when we look at what we’re able to do by combining both our auto captioning and now our live professional captioning on top of that, we can actually get to 95%.

So with 95%, this really helps us when you start to think about background noise for live events, which is obviously very, very common, heavily accented speakers, multiple speakers, where you’re trying to track who is speaking when– again, ASR has a limitation there– but also making sure that even nuanced language we’re able to interpret and provide as accurately as possible.

So the difference between 90% and 95% is really where our focus is as we continue to kind of drive and really continue our quest for the most accurate captions, whether it’s prerecorded, whether it’s live. It doesn’t matter to us. We want to be the most accurate solution out there for everyone that’s on here.
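To put that gap in concrete terms, here is a rough back-of-the-envelope sketch. It assumes a conversational pace of about 150 words per minute, which is an assumption made for illustration, not a figure from the presentation.

```python
# Rough illustration of the 90% vs. 95% accuracy gap over a one-hour event.
# The 150 words-per-minute pace is an assumed conversational rate, not a 3Play figure.
words_per_minute = 150
event_minutes = 60
total_words = words_per_minute * event_minutes  # roughly 9,000 words in an hour

for accuracy in (0.90, 0.95):
    errors = total_words * (1 - accuracy)
    print(f"{accuracy:.0%} accuracy -> about {errors:.0f} misrecognized words per hour")

# At 90%, that is roughly 900 errors an hour; at 95%, roughly 450, so those extra
# 5 points cut the number of errors the audience has to read around about in half.
```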

So when we start to look at this, you know, what else did we see out in the market? We saw low accuracy obviously, but also we saw a few other things. One, complex and manual processes, folks emailing a week or two in advance to set up a session, a live event with a captioner, sometimes getting an email back, sometimes not. Sometimes someone shows up. Sometimes they don’t.

And I’ve got to also kind of figure this out with a number of different providers. Many times I’ve gotten multiple live partners, live captioning partners. But I’ve also got other ones that I use for prerecorded captions. I’ve got other ones that I’m using for audio description.

So I’ve got to manage a bunch of different partners. I got to manage a bunch of different vendors. And again, that’s pretty time consuming and can be complex and manual. And then what we’ve kind of talked about is really the captioning quality.

And this was across the board when we were talking to customers trying to get a sense of where the challenges were in the space. And it is, hey, as Kelly talked about, live events are becoming more and more, and probably forever now, hybrid, where the virtual experience has to be as good as the live experience, where I’m sitting there face to face. This is now the new standard. And so having punctuation missing, having inaccurate captions, is just not really accessible right now.

So as we continue to look at this, we built out our platform specifically with this in mind. So when we look at why 3Play is different, right, and I think if you– we’ve talked a lot about accuracy and quality, but then we start to talk about a few other things. One, the ease of use, no emailing and going back and forth.

Within the platform itself, you can actually jump inside and order the same way that you’re ordering captions today for prerecorded captions, same way you’re ordering translations, audio description, things along those lines. You’re using the same interface.

And with that, you’re using the same partner. So there are economies of scale. There is the ability to work with what I feel is the best account management team in the space on a daily basis, helping both with improving events as well as supporting events. And then the other piece is using automation, using our ASR to provide a failsafe.

So things happen. No one is going to be 100% when it comes to making sure that there is a live captioner there or there isn’t a break in technology, the connection. Things along those lines are going to happen. But what we’ve done is actually added a layer of ASR. So our 90% ASR is going to kick in if that live captioner loses connection for whatever reason.

So to the person that’s sitting and watching, for the most part, it’s going to be kind of unrecognizable that we’re running into this. As we were kind of going through this, we have a customer that does live events. They had this exact situation happen, where basically it was at a conference where accessibility was at the forefront. So making sure that they had captions for the actual people at the event was a requirement.

And an API token got mistyped in. The person couldn’t get into the session. And if you’ve ever been to a live conference or a seminar, you know that they’re pretty on top of each other when coordinating time. And so in this scenario, we lost about 20 minutes waiting to try to get the person connected to be able to caption the event for everyone that was in attendance.

And as you can imagine, that pushed the entire agenda out. Now, being able to fall back to ASR while you’re fixing the problem in the background would have easily taken that problem out, and everything would have continued on time.

So as we move forward here– and I’ll key up Steph– we’re super excited that we have an opportunity here to provide a solution that is really going to change the industry when it comes to live captioning and making it accessible for everyone. And one of our universities, North Idaho College, was one of our first clients as we started to really launch into live captioning. And they really gave us a great review on some of the things that they felt right out of the gate, so I think some of the things that we’ve talked about here really showed through for the folks at North Idaho.

And again, we’ve continued to launch. We’ve had great feedback as we’ve moved forward. We’ve also continued to build on the product and the scalability. So we’re super excited for 2022 to be part of this market. And I think we’ve got a really bright future as we start to work with clients.

So I’m going to stop talking. I’m going to let the product do some of the talking with Steph, who leads our engagement on the live side. And I will throw it back to Kelly.

KELLY MAHONEY: Thank you so much, Craig. Yeah, so before we move on just yet, we have another poll for you. And this time, we would like to know what is your biggest pain point when using live captioning today. Is it the scheduling aspect of things, the reliability, accuracy, customization, or captioner performance?

So that poll is live. Go ahead and tell us, what is your biggest pain point when using live captioning today? And if you have not yet used live captioning, just imagine what do you think might be your biggest pain point. All right. Let’s see.

Perfect. All right. It seems like most people have responded. I’m going to go ahead and end that poll, and I can share the results. It seems like accuracy is the top pain point for people, which makes sense because accurate captions are very important as we have just gone over.

So you have heard all about how our live captioning service works, why it’s beneficial, and what you can use it for. But if you’re anything like me, you’re a visual learner. So we’re going to show you now exactly what it is that we’ve been talking about.

So for a live demonstration of our product, I’m pleased to introduce 3Play’s senior product manager, Steph Laing.

STEPH LAING: Cool. Thanks, Kelly. Can you hear me OK? Excellent. So as Jill and Craig mentioned, we were seeing other professional captioning solutions in market that have really complex and manual processes, often requiring scheduling and event management over email, which really isn’t so scalable. And we’re also seeing frustration around the lack of transparency and reliability of these services.

So you don’t know if your event has been matched to a captioner without exchanging an email with another person. And as your event is starting, you don’t know whether your captioner is going to show up. And if the captioner is disconnected for any reason, captions are lost, and it’s really hard to recover from that.

So our goal in designing our service was to address these common issues, to provide a really easy to use, reliable, and transparent solution that can serve all of your captioning needs from live captioning all the way through post-production. And our captioning service is built on a consolidated platform, and that serves as the foundation for both delivery of our automated, as well as our professional captions.

So diving in, I’m going to attempt to share my screen. All right. Here we go. So diving in, here is our live captioning dashboard. So this is where you can see and manage all of your upcoming events and in-progress events.

So I can search and filter by platform and by date, and I can also see whether my upcoming events have been matched to a captioner, or in the case of auto captions, just whether they’ve been scheduled. And for my in-progress events, I can see the stream and caption status.

So I scheduled this test event to demonstrate what it looks like when 3Play is not receiving a stream. So these red warning signs are expected. It’s our interface telling me that I need to do something in order to get captions.

So from here, I want to dive into scheduling. And before we dive into the workflow itself, I wanted to note that all of our live scheduling workflows are also completely available via API for those looking for added efficiency. Cool.
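As a rough illustration of what an API-based scheduling workflow could look like, here is a minimal sketch in Python. The endpoint URL, field names, and authentication scheme are placeholders invented for this example, not documented 3Play API details; it only shows the kind of request such a workflow might send.

```python
# Hypothetical sketch only: the URL, field names, and auth header are placeholders,
# not documented 3Play API details.
import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder credential

event = {
    "platform": "zoom",                      # direct integrations mentioned: YouTube, Zoom, Brightcove
    "service_type": "professional",          # professional captions, with auto captions as failover
    "event_start": "2022-03-01T14:00:00Z",   # audience-facing start time
    "stream_start": "2022-03-01T13:45:00Z",  # when 3Play starts listening for the stream
    "wordlist": ["3Play", "Brightcove", "RTMP"],  # event-specific terms for the captioner and speech engine
}

response = requests.post(
    "https://api.example.com/live/events",   # placeholder URL
    json=event,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
response.raise_for_status()
print(response.json())
```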

So to review scheduling, I am in the Live Captioning section of the 3Play platform. I select Schedule Captions. And then from here, I select the platform on which I would like captions delivered.

So 3Play has integrations directly with YouTube, Zoom, and Brightcove. We also have the ability to generate captions from any stream as long as it’s in an RTMP format. So we don’t actually need to be integrated with the video platform itself in order to generate captions from it. So this could be compatible with platforms like Facebook or Twitter or Twitch.

So from here, I’m going to find my Zoom account and then select the event for which I want professional captions. So I’ll select test meeting 5. And once I choose the event, I’m going to select the service type, either auto or professional captions, which in this case is professional captions. And by selecting professional captions here, I’m actually selecting in effect both professional and auto captions.

And what I mean by that is all of our professionally captioned events have 3Play auto captions running as a persistently available failover during an event. So if the professional captioning feed is lost, 3Play intelligently fails over to auto captions. And then if we regain the feed from the professional captioner, we automatically transition back to the professional captioning feed. So at no point should you be without captions for your event.
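To make that failover behavior concrete, here is a minimal sketch of the selection logic, assuming the pipeline knows at each moment whether the professional captioner’s feed is live. The names and structure are illustrative only, not 3Play’s actual implementation.

```python
# Illustrative sketch of professional-to-auto caption failover; not 3Play's implementation.
from dataclasses import dataclass

@dataclass
class CaptionFrame:
    text: str
    source: str  # "professional" or "auto"

def select_caption(professional_feed_live: bool, pro_text: str, auto_text: str) -> CaptionFrame:
    """Prefer the professional feed; switch to auto captions the moment it drops,
    and switch back automatically once the professional feed is regained."""
    if professional_feed_live:
        return CaptionFrame(text=pro_text, source="professional")
    return CaptionFrame(text=auto_text, source="auto")

# Example: the professional feed drops for one update, then comes back.
print(select_caption(True, "Welcome to the event.", "welcome to the event"))
print(select_caption(False, "", "and now a word from our sponsor"))
print(select_caption(True, "And now a word from our sponsor.", "and now a word from our sponsor"))
```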

So you’ll notice on this page that there are three start times, the event start, the stream start, and the caption start. The event start time represents the audience-facing event start, and we use this time to key off our captioner check-in time, which is 20 minutes prior to that audience-facing event start. The stream start time indicates when 3Play is going to start listening for a stream from your event.

So if you’d like to verify that everything is set up and connected appropriately prior to that audience-facing event start, we definitely recommend selecting a stream start time in advance of the event so you can get everything set up and ready to go for that event start. And then the captioning start time represents when captions will start for your event.

So for professional captions, that is the same as that audience-facing event start. But we do train our captioners to check in 20 minutes prior and to deliver test captions prior to that event start time. So on the next page, we’re capturing all of the relevant detail that we need for our partners to be able to accurately caption your event.

So this estimated event duration lets our captioners know how long your event is. At the end of the estimated event duration, you can elect to continue with professional captions, or you can elect to continue with automatic captions for anything over 10 minutes. And then the event type here, this helps our captioners prepare for the format of the event.

So for example, if it’s a sporting event, our captioners know to probably expect fast speech, and they might study a team roster leading up to the event. And under event instructions, this is really a place for you to note any context around the event that would be helpful for our captioner to have in preparation. So this might include information like speaker notes, number of speakers, or even anticipated audio quality. And the speaker names here will automatically populate shortcuts for our captioner in the captioning interface.

Word lists added here serve two functions. They prime the speech engine that generates our auto captions that are persistently running in the background for our professionally captioned events. These words also populate in our captioning interface, and that gives our captioners an opportunity to study those words and also to create shortcuts for them as needed within our captioning interface. And the goal really is to ensure that the words that you care about most are captioned accurately.

And last but not least, we have some advanced settings. These govern general captioning latency, along with some failsafe settings that govern how long we wait for your stream to start, the maximum event length in case you forget to stop streaming, and, if you become disconnected from your event for any reason, how long you’ll be able to reconnect your event. And we also have a profanity filter here that allows you to reduce the number of F-bombs in your event.

And from there, I will select Schedule. And scheduling is complete. So navigating back to my dashboard, I can see test meeting 5 is successfully scheduled. And if I click into this, I can see further detail about my event. So I can see my stream status. The event hasn’t started yet. And I can also see that a captioner has not yet been matched to my event, which is expected, given I just scheduled it.

I also have access to an external web page here. And this is a URL that I can share with my audience in advance of the event. This URL, or rather this web page, is also fully responsive. So it works on both desktop and mobile devices. So if for whatever reason you are not seeing captions on the video platform itself, this is an excellent fallback to send to your audience to access captions.

So that’s a quick overview of scheduling. With that, I’ll hand it back to Kelly. And if you guys have any questions, we’re happy to answer those at the end.

KELLY MAHONEY: Thank you so much. Let me just reorganize myself here. Unpin you, all right. So before we move on, once again, we have our final poll. I’m just going to go ahead and get that up and running. This time, we are asking you what is the most exciting feature of live professional captioning. What are you most excited about out of everything that you’ve just heard? Is it the easy scheduling, the reliable failover, the customization tools, or the in-platform visibility?

All right, seeing lots of engagement. This is exciting. I’ll leave it up for just one more minute. Seems like most people who want to respond have responded.

All right, so it seems like most people are excited about the reliable failover, which is great, because yeah, a lot of the time if you lose your captioner, it can be a huge hassle to try and reconnect. But we’ve solved that problem for you.

So we’re glad to see everyone so engaged and so excited about our live professional captions and all the features that we’re offering. And I just want to take this moment to say that if you have been viewing captions throughout this presentation, then you’ve also been watching how our product works in real time.

So if you’re interested in using something like this for your organization’s virtual events, we’re actually just going to go one slide back. And we have a link that we’re going to send in the chat to a live captioning interest form. There you can share a little bit with us about your live video practices and needs, and we’ll be in touch.

The 3Play team will reach out with next steps, which could include a live testing of the service or a walkthrough of scheduling or our account system. Whatever you need, once you’re set up, you’re good to go, and you can start ordering LPC for your events directly through your 3Play account.

So like I mentioned before, we reserved the last few minutes here or so for a Q&A session. But before we take questions, I just want to give a quick plug encouraging you to take our state of captioning survey, which we will also send the link in the chat to, as well as in the follow-up email. This is a survey that allows us to predict captioning trends for 2022 and beyond. And you could be a part of our research study. So we would love if you would help us out with that.

We’ve received some really great questions already, so we’re just going to dive right in. We encourage you to keep asking them, and we’ll get to as many as we can in the time that we’ve been allotted. One of the first ones we received was asking the best ways to live caption on social media platforms, like YouTube, Facebook, or Twitter. And I just wanted to say that a lot of platforms offer automatic solutions for live captioning, so there may be something automatically built in, which is an affordable and convenient solution.

However, this ASR software is often proprietary and is not usually market tested. So we always recommend using a solution that includes the use of a highly trained professional. Another follow-up question similar to that one was whether our solution can be integrated into a social media platform for a live event.

In this case, I believe it would be a matter of the use case, what you’re trying to do on social media, if it’s recorded content or a live event. But I will pass this on to our team, sort of open the floor for Steph, Craig, and Jill. If any of you can speak to that better than I can, please go ahead.

STEPH LAING: For sure, Kelly. I’m happy to take it. So as long as we can access the audio from your event, we can generate captions for it. And like Kelly was saying, it does depend on how you’re looking for captions to be delivered. We offer 608 caption encoding. And I know that Facebook does support 608 encoded captions.

So if we can get access to your stream, we can encode captions into that stream and pass them on. And if you’re looking for other platforms that may not be able to accept an RTMP or don’t support 608 encoded captions, we can have our captioner actually join the event audio, generate captions, and we deliver those captions to our second screen. So there are a number of different ways that we can generate and deliver captions both in player and via a second screen.

KELLY MAHONEY: Awesome. And someone else is actually asking sort of in that same vein. Is there the capability to stream to multiple platforms at one time, like to YouTube and Facebook simultaneously, or would it be charged as two separate events?

STEPH LAING: So we don’t have the ability to simulcast through 3Play. However, our platform is compatible with applications and solutions that do simulcast. So if you choose to use an application like restream.io, you can send the stream from 3Play to Restream, and then Restream can then simulcast to whatever platforms you want.

KELLY MAHONEY: All right. Another sort of technical question– Steph normally does the heavy lifting in these Q&A sessions. But someone is asking a little bit more about censorship. I know you mentioned that when you were doing the product demo. They’re wondering, for ASR or human-generated solutions, how do they handle words that are interpreted as profanity?

STEPH LAING: Sure. We have two levels of profanity filters that we can apply to the output. If you’re interested in what words are filtered, I’m sure that we can send those to you. We unfortunately can’t add any profanity, but we can filter it out in a couple of different ways.

KELLY MAHONEY: Amazing. Another question that we’ve been getting is regarding recorded content. I’m not sure who would be best for this one, but just people curious about how quickly recorded content would be available if you use live captioning for a live event. Would it be essentially an immediate turnaround with the ability to distribute that recording to people, or would it take a little while for things to process?

STEPH LAING: So it kind of depends on where you’re choosing to save the video. I can speak to– so you can choose to save the event through 3Play, and that will allow you to upgrade to our 99.7% accurate transcription post event. If for whatever reason you forget to save the event in 3Play, you can always upload your local file to 3Play to allow that upgrade to take place.

The transcript that is generated for the live event is available within minutes of the event itself, and you can download it in a number of different formats. So if you choose to use that live transcript, it should be available pretty immediately. Through our integrations with video platforms, we’re also in some cases posting that transcript back to the platform itself.

So for Brightcove, that’s an example of that. And then if you’re looking to upgrade that transcript post event, we offer a number of different turnarounds, from two hours all the way to 10 days. So depending on what you’re looking for, we can accommodate.

KELLY MAHONEY: Fabulous. Someone else is asking: if they have a particularly awesome experience with a certain captioner, are they able to request that particular captioner come back and be their regular?

STEPH LAING: We don’t have a way to request that through our interface today. But if you are excited about a particular captioner, happy to get that feedback, and we can make an effort to have that captioner work on your events.

KELLY MAHONEY: Yeah. Let us know. If anyone here has an exceptionally wonderful experience with a live captioner, please let us know. We’d love to pass it on to them, but then also could maybe build that functionality out in the future. I don’t want to assign work to the development team.

Someone is asking about cost. Is live professional captioning cheaper than hiring a live captioner for an event of the same length? Craig, I know you come from the sales side of things. Would you have any insight on the cost comparison of live professional captioning? I think as a baseline response, human-involved, human-generated captions will always tend to be more expensive, but they are more accurate as well.

CRAIG HERMAN: Yeah, I think the question is really about having someone on site– I think that might be where they’re coming from. And absolutely, because we do– we can trade off between multiple live professional captioners, and they can come in from anywhere. Typically, we’re able to do it significantly cheaper. But as Kelly said, compared to live auto, obviously there’s more of a cost when you increase the accuracy. Absolutely.

KELLY MAHONEY: Awesome. Thank you. Another question. Someone’s asking about if an organization has the same type of event on a regular schedule, can you repeat that type of meeting automatically and sort of set up live captions to continue happening, or would it be something you would have to manually set up each time?

STEPH LAING: This question is about support for recurring meetings scheduling, and we are in the process of building that out. It’s not available today, but definitely something that we anticipate being available in the future.

KELLY MAHONEY: Awesome. Potentially along the same lines, someone’s asking whether this service is available only for English events. If yes, are non-English events in the pipeline as a capability in the future?

STEPH LAING: Yes, today we support English only, but definitely have non-English on the roadmap.

KELLY MAHONEY: Absolutely. All right, another, I guess, more technical question someone asks. When requesting service, is there a way to send those supplemental materials, like notes or a word list, as an attachment rather than having to copy and paste everything into that info box?

STEPH LAING: Soon. Not yet. But yes, we do anticipate building the ability to allow you to attach files. But for now, it’s text only. People have gotten around this by uploading materials or decks to, say, Google Drive or Microsoft OneDrive, so that’s a way to not have to copy and paste and to give us access to, say, a larger file.

KELLY MAHONEY: Absolutely. And one last question about something that may or may not be on the roadmap. Is captioning for traditional broadcast on the roadmap? I don’t know if we’re expanding quite that far just yet, but we can hope. We can hope. Let’s see.

All right, and then it looks like the last question we have in our queue for Q&A. Someone asks, what kind of interfacing do you have to do with individuals from the deaf and hard-of-hearing community to understand their needs as the primary beneficiaries of captions? Honestly, I think the easiest answer here is audience outreach.

If you are able to get in touch with your audience or those members of your audience who would be using captions, just speaking to the people who are using the captions is the most beneficial part of research that you could do. We also have a lot of information published on the 3Play blog about people who use captions who are not just deaf and hard-of-hearing. People who speak English as a second language can also find closed captions very helpful, as well as students.

Transcripts can be derived from captions that are used in live events and studied later, so for lectures or things like that that students would want to go back and review. So we will be sending all of these live captioning resources that we’ve talked about in an email. You have a lot of information coming your way, so just stay tuned. But everything that you need, anything that you’ve missed will be in that email.

Thank you all very much for joining us today, and we hope you have a wonderful rest of your day. Bye, everyone.