The “Human Touch” in Live Captioning Ensures Accuracy & Accessibility
Updated: December 6, 2021
Closed captions are an important factor in making video accessible for all audiences, and that includes live streamed events. After the pandemic renewed the popularity of virtual events, more organizations are looking for live captioning solutions, but keep in mind that not all live captions are created the same. Providing live captions matters, but providing them with a high level of accuracy is almost more important.
What are live captions?
Live closed captions are time-synchronized text that appears in real time. They can be used in many settings, such as virtual events, meetings, online classes, or performances.
It’s important that virtual events provide live captions in order to create a fully accessible experience for those who may be d/Deaf or hard of hearing, but captions also improve audience engagement and overall comprehension. Additionally, captions offer business benefits: they can boost search engine optimization and enhance the user experience.
However, to properly harness the benefits of live captions, they must be accurate. The accuracy of live captions often depends on how they were created: through automatic speech recognition (ASR) or by a human captioner.
Automatic vs. human-generated captions
There are several different ways to incorporate live captions, but the two most popular are live automatic captioning and live human captioning.
Live automatic captioning
While automatic captions are more readily available and less expensive (popular meeting platforms like Zoom can generate them), their accuracy rates are notoriously low.
Live automatic captions do not involve a human captioner and are generated by artificial intelligence (AI), specifically ASR. Because of this, errors in punctuation, speaker identification, and grammar are much more likely. In addition, AI doesn’t have the same capacity for contextualization as a human being, meaning that when ASR misunderstands a word, it may substitute something irrelevant or omit the word altogether.
While there is currently no definitive legal requirement for live captioning accuracy rates, existing federal and state captioning regulations for recorded content state that accessible accommodations must provide an experience equal to that of a hearing viewer. This condition, coupled with the tendency of automatic captions toward low accuracy, means that live automatic captions alone are not sufficient to provide an equitable experience for d/Deaf or hard of hearing viewers.
Live human captioning
By comparison, live human captioning is significantly more accurate and reliable. While neither AI nor human captioners can provide 100% accuracy, the most effective methods of live captioning incorporate both in order to get as close as possible.
There are two primary ways to include humans in live captioning workflows: CART and voice writing. Communication Access Real-time Translation (or CART for short) employs a skilled transcriber operating a stenotype keyboard to produce captions in real time, or as close to it as possible. The process of voice writing, on the other hand, consists of a few more components:
- The original speaker at the live event
- A highly trained voice writer
- Specially tuned ASR software
Whichever method is used, the human touch is irreplaceable in producing accurate, real-time captions. Once again, the lack of common standards for live captioning makes appropriately measuring accuracy a somewhat subjective endeavor. However, there is a generally accepted formula used to evaluate accuracy:
Accuracy (%) = (Total # words captioned − Incorrect words captioned) / Total # words captioned × 100
For example, if a captioner writes 10,000 words during a live event and 200 of those words were incorrect, the resulting accuracy rate would be 98%.
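The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula above, not a standard tool: the function name `caption_accuracy` and its parameter names are made up for this example, and the word counts are assumed to have been tallied already.

```python
def caption_accuracy(total_words: int, incorrect_words: int) -> float:
    """Return caption accuracy as a percentage.

    Hypothetical helper: applies the generally accepted formula
    (total - incorrect) / total x 100 to counts supplied by the caller.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= incorrect_words <= total_words:
        raise ValueError("incorrect_words must be between 0 and total_words")
    # Multiply before dividing to keep the floating-point result tidy.
    return 100 * (total_words - incorrect_words) / total_words


# The example from the text: 10,000 words captioned, 200 of them incorrect.
print(caption_accuracy(10_000, 200))  # 98.0
```

Note that the result is only as good as the count of incorrect words passed in; how that count is defined is a separate question.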
While this is a useful working definition, the number of “incorrect words captioned” in this equation doesn’t account for punctuation mistakes, omitted words, or substitutions. As seen in the example above, these types of errors can impact the understanding of a d/Deaf or hard of hearing viewer. To remedy this oversight in calculation, the FCC’s Report and Order on closed captioning quality specifies:
No matter how you calculate accuracy, it’s undeniable that human involvement in your live captioning workflow increases your chances of producing accurate, comprehensible, and engaging captions at your next virtual event.