The “Human Touch” in Live Captioning Ensures Accuracy & Accessibility

November 18, 2021 BY KELLY MAHONEY
Updated: December 6, 2021

Closed captions are an important factor in making video accessible for all audiences, and that includes live streamed events. Since the pandemic renewed the popularity of virtual events, more organizations are looking for live captioning solutions. Keep in mind, though, that not all live captions are created equal. Providing live captions is only the first step; it’s just as crucial that those captions be highly accurate.

What are live captions?

Live closed captions are time-synchronized text that appears in real time. They can be used in a multitude of settings, such as virtual events, meetings, online classes, or performances.

It’s important that virtual events provide live captions in order to create a fully accessible experience for those who may be d/Deaf or hard of hearing, but captions can also improve audience engagement and overall comprehension. They offer business benefits as well: captions can boost search engine optimization and enhance the user experience.

However, to properly harness the benefits of live captions, they must be accurate. The accuracy of live captions often depends on how they were created: through automatic speech recognition (ASR) or by a human captioner.


Automatic vs. human-generated captions

There are several different ways to incorporate live captions, but the two most popular are live automatic captioning and live human captioning.

Live automatic captioning

While automatic captions are more readily available and less expensive (popular meeting platforms like Zoom can generate them), their accuracy rates are notoriously low.

Live automatic captions do not involve a human captioner; they are generated by artificial intelligence (AI), specifically ASR. Because of this, the likelihood of errors in punctuation, speaker identification, and grammar greatly increases. In addition, AI doesn’t have the same capacity for contextualization as a human being. When ASR misunderstands a word, it may substitute something irrelevant or omit the word altogether.

 

Errors can affect viewer understanding

Omission errors can drastically change the meaning of a sentence! Consider the following pair of sentences, which differ by a single word:

“The flash flood warning for Suffolk County has been lifted.”

“The flash flood warning for Suffolk County has NOT been lifted.”

Not every omission counts as an error, however: industry standards find it acceptable to omit stammers, false starts, and filler words like ‘um’ or ‘ah.’
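To make the omission example concrete, here is a minimal Python sketch that lines up what was spoken against what was captioned and sorts the differences into omissions, insertions, and substitutions. It is an illustration only, not how any particular captioning vendor scores captions. The function names, the filler list, and the choice of which sentence was actually spoken are all assumptions made for this example, and punctuation is deliberately stripped, so punctuation errors are not counted here.

```python
from difflib import SequenceMatcher

# Assumption for illustration: fillers that may be dropped without penalty.
FILLERS = {"um", "ah"}
PUNCTUATION = ".,!?;:\"'"


def normalize(text):
    """Lowercase, strip surrounding punctuation, and drop filler words."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w and w not in FILLERS]


def word_errors(spoken, captioned):
    """Classify word-level differences between speech and captions."""
    ref = normalize(spoken)
    cap = normalize(captioned)
    errors = {"omitted": [], "inserted": [], "substituted": []}
    matcher = SequenceMatcher(None, ref, cap, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":        # spoken, but missing from the captions
            errors["omitted"].extend(ref[i1:i2])
        elif tag == "insert":      # captioned, but never spoken
            errors["inserted"].extend(cap[j1:j2])
        elif tag == "replace":     # spoken words swapped for other words
            errors["substituted"].append(
                (" ".join(ref[i1:i2]), " ".join(cap[j1:j2]))
            )
    return errors


# Assumption: the sentence containing "NOT" is what the speaker said.
spoken = "The flash flood warning for Suffolk County has NOT been lifted."
captioned = "The flash flood warning for Suffolk County has been lifted."
print(word_errors(spoken, captioned))
# {'omitted': ['not'], 'inserted': [], 'substituted': []}
```

A single omitted word is all it takes to reverse the meaning, while a dropped ‘um’ or ‘ah’ is filtered out before the comparison and never shows up as an error.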

While there is currently no definitive legal requirement for live captioning accuracy rates, existing federal and state captioning regulations for recorded content state that accessible accommodations must provide an experience equal to that of a hearing viewer. Given this requirement and the tendency of automatic captions toward low accuracy, live automatic captions alone are not sufficient to provide an equitable experience for d/Deaf or hard of hearing viewers.


Live human captioning

By comparison, live human captioning is significantly more accurate and reliable. While neither AI nor human captioners can provide 100% accuracy, the most effective methods of live captioning incorporate both in order to get as close as possible.

There are two primary ways to include humans in live captioning workflows: CART and voice writing. Communication Access Real-time Translation (or CART for short) employs a skilled transcriber operating a stenotype keyboard to produce captions in real time, or as close to it as possible. The process of voice writing, on the other hand, consists of a few more components:

  • The original speaker at the live event
  • A highly trained voice writer
  • Specially tuned ASR software

Whichever method is used, the human touch is irreplaceable in producing accurate, real-time captions. Once again, the lack of common standards for live captioning makes appropriately measuring accuracy a somewhat subjective endeavor. However, there is a generally accepted formula used to evaluate accuracy:

While this is a useful working definition, the number of “incorrect words captions” in this equation doesn’t account for punctuation mistakes, words that are omitted, or substitutions. As seen in the example above, these types of errors can impact the understanding of a d/Deaf or hard of hearing viewer. To remedy this oversight in calculation, the FCC’s Report and Order on closed captioning quality specifies:

[We should] consider “accuracy” of captions to be a measurement of the percentage of correct words out of total words in the program, calculated by subtracting number of errors from total number of words in the program, dividing that number by total number of words in the program and converting that number to a percentage. (Federal Communications Commission)
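The FCC’s wording translates directly into arithmetic. The short Python sketch below follows the quoted description step by step; the function name and the sample numbers are assumptions chosen for illustration, and how errors get counted in the first place is left to whoever is scoring the captions.

```python
def fcc_accuracy(total_words, errors):
    """Accuracy as described in the FCC quote, returned as a percentage.

    Subtract the number of errors from the total number of words,
    divide by the total number of words, and convert to a percentage.
    """
    if total_words <= 0:
        raise ValueError("total_words must be a positive number")
    if not 0 <= errors <= total_words:
        raise ValueError("errors must be between 0 and total_words")
    return 100 * (total_words - errors) / total_words


# Hypothetical program: 1,000 words spoken, 50 caption errors.
print(f"{fcc_accuracy(1000, 50):.1f}%")  # 95.0%

# The flash flood sentence: 11 words spoken, 1 word ("NOT") omitted.
print(f"{fcc_accuracy(11, 1):.1f}%")     # 90.9%
```

Note that the flash flood sentence still scores about 90.9% even though the omission reverses its meaning, which is one reason a percentage alone can’t tell you how much a given error hurts viewer understanding.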

No matter how you calculate accuracy, it’s undeniable that human involvement in your live captioning workflow increases your chances of producing accurate, comprehensible, and engaging captions at your next virtual event.


