The “Human Touch” in Live Captioning Ensures Accuracy & Accessibility

November 18, 2021 BY KELLY MAHONEY
Updated: December 6, 2021

Closed captions are an important factor in making video accessible to all audiences, and that includes live streamed events. With the pandemic driving renewed interest in virtual events, more organizations are looking for live captioning solutions. Keep in mind, though, that not all live captions are created equal. Providing live captions is essential, and making sure they are highly accurate is nearly as important, if not more so.

What are live captions?

Live closed captions are time-synchronized text that appears in real time. They can be used in many settings, such as virtual events, meetings, online classes, or performances.

It’s important that virtual events provide live captions in order to create a fully accessible experience for those who may be d/Deaf or hard of hearing, but captions can also improve audience engagement and overall comprehension. They offer business benefits as well: captions can boost search engine optimization and enhance the user experience.

However, live captions must be accurate to deliver these benefits. The accuracy of live captions often depends on how they were created: through automatic speech recognition (ASR) or by a human captioner.


Automatic vs. human-generated captions

There are several different ways to incorporate live captions, but the two most popular are live automatic captioning and live human captioning.

Live automatic captioning

Automatic captions are more readily available and less expensive (popular meeting platforms like Zoom can generate them), but their accuracy rates are notoriously low.

Live automatic captions do not involve a human captioner; they are generated by artificial intelligence (AI), specifically ASR. As a result, errors in punctuation, speaker identification, and grammar are much more likely. In addition, AI doesn’t have the same capacity for contextualization as a human being. When ASR misunderstands a word, it may substitute something irrelevant or omit the word altogether.

Errors can affect viewer understanding

Omission errors can drastically change the meaning of a sentence! Consider the following example:

“The flash flood warning for Suffolk County has been lifted.”

“The flash flood warning for Suffolk County has NOT been lifted.”

However, industry standards find it acceptable to omit stammers, false starts, and filler words like ‘um’ or ‘ah.’

While there is currently no definitive legal requirement for live captioning accuracy rates, existing federal and state captioning regulations for recorded content state that accessible accommodations must provide an experience equal to that of a hearing viewer. This condition, coupled with the tendency of automatic captions toward low accuracy, means that live automatic captions alone are not sufficient to provide an equitable experience for d/Deaf or hard of hearing viewers.


Live human captioning

By comparison, live human captioning is significantly more accurate and reliable. While neither AI nor human captioners can provide 100% accuracy, the most effective methods of live captioning incorporate both in order to get as close as possible.

There are two primary ways to include humans in live captioning workflows: CART and voice writing. Communication Access Real-time Translation (or CART for short) employs a skilled transcriber operating a stenotype keyboard to produce captions in real time, or as close to it as possible. The process of voice writing, on the other hand, consists of a few more components:

The original speaker at the live event
A highly trained voice writer
Specially tuned ASR software

Whichever method is used, the human touch is irreplaceable in producing accurate, real-time captions. Once again, the lack of common standards for live captioning makes appropriately measuring accuracy a somewhat subjective endeavor. However, there is a generally accepted formula used to evaluate accuracy:

Calculating caption accuracy

Accuracy = ((Total words captioned - Incorrect words captioned) / Total words captioned) x 100

For example, if a captioner writes 10,000 words during a live event and 200 of those words were incorrect, the resulting accuracy rate would be 98%.
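As a quick check of the arithmetic, the formula can be written as a small Python function. This is a minimal sketch: the function name and its argument names are illustrative and not part of any captioning standard or tool.

```python
def caption_accuracy(total_words: int, incorrect_words: int) -> float:
    """Return caption accuracy as a percentage.

    Implements: (total words - incorrect words) / total words x 100.
    The name and signature are hypothetical, chosen for illustration.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= incorrect_words <= total_words:
        raise ValueError("incorrect_words must be between 0 and total_words")
    # Multiply before dividing to keep the result as exact as possible.
    return 100 * (total_words - incorrect_words) / total_words


# The example from the text: 10,000 words captioned, 200 of them incorrect.
print(caption_accuracy(10_000, 200))  # 98.0
```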

While this is a useful working definition, the number of “incorrect words captioned” in this equation doesn’t account for punctuation mistakes, omitted words, or substitutions. As seen in the flash flood example above, these types of errors can impact the understanding of a d/Deaf or hard of hearing viewer. To remedy this oversight in calculation, the FCC’s Report and Order on closed captioning quality specifies:

[We should] consider “accuracy” of captions to be a measurement of the percentage of correct words out of total words in the program, calculated by subtracting number of errors from total number of words in the program, dividing that number by total number of words in the program and converting that number to a percentage.

Federal Communications Commission
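One way to count errors against the words actually spoken, so that omissions and substitutions are included, is to align the caption with a reference transcript at the word level. The sketch below does this with a standard edit-distance calculation. It is an illustration only, not the FCC’s measurement procedure: the function names are hypothetical, punctuation and capitalization are ignored as a simplifying assumption, and the flash flood example assumes the sentence containing “NOT” is what was actually said.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_errors(reference: str, caption: str) -> tuple[int, int]:
    """Return (errors, total reference words).

    Errors are the minimum number of word substitutions, omissions, and
    insertions needed to turn the caption into the reference
    (word-level Levenshtein distance).
    """
    ref, hyp = tokenize(reference), tokenize(caption)
    prev = list(range(len(hyp) + 1))  # distances for an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word omitted from the caption
                curr[j - 1] + 1,     # extra word inserted in the caption
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1], len(ref)


def accuracy_against_reference(reference: str, caption: str) -> float:
    """Percentage of correct words out of the total words spoken."""
    errors, total = word_errors(reference, caption)
    if total == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 100 * (total - errors) / total)


spoken = "The flash flood warning for Suffolk County has NOT been lifted."
captioned = "The flash flood warning for Suffolk County has been lifted."
print(word_errors(spoken, captioned))  # (1, 11)
print(round(accuracy_against_reference(spoken, captioned), 1))  # 90.9
```

A single omitted word still yields an accuracy above 90% here, even though the caption reverses the meaning of the sentence. That is a reminder that a percentage alone does not capture how severe an error is.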

No matter how you calculate accuracy, it’s undeniable that human involvement in your live captioning workflow increases your chances of producing accurate, comprehensible, and engaging captions at your next virtual event.

