Editing Auto-Captions, Auto-Captions Only, or Re-Captioning

August 25, 2017 BY PATRICK LOFTUS
Updated: February 2, 2021

Nobody’s perfect. Not even computers…

While artificial intelligence still isn’t that great at understanding human speech, reliance on automatic captioning is pervasive. In fact, YouTube recently announced that its auto-captioning feature, which relies on artificial intelligence for speech recognition, has added automatic captions to over 1 billion videos.

Automatic captions, or auto-captions, rely on automatic speech recognition (ASR) software to convert spoken audio into on-screen text. The problem is that this technology cannot be relied on by itself to produce captions and transcripts that accurately reproduce the spoken audio, and inaccurate captions exclude people who cannot hear it.

For our multi-industry report, the 2017 State of Captioning, we surveyed over 1,400 people and asked how captioning is done at their organization.

One of the questions we asked in our survey was, “Do you use automatic captions?” The responses are shown in the chart below:

Column chart showing use of automatic captions: 50.97% don't use automatic captions; 26.68% do, but with cleanup; 18.81% do for some videos; 3.53% do for all videos.

Most organizations (50.97%) do not use auto-captions at all. Of the remaining 49.03% that do use auto-captions, more than half (26.68% of all respondents) clean them up in post-production. However, a shockingly high share, roughly 22% of respondents, said their organization relies on auto-captions alone for some (18.81%) or all (3.53%) of their videos.
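For readers who want to check the arithmetic, the sketch below recomputes these shares from the four chart values. It is only an illustration: the variable names are ours and not part of the report, and it assumes the chart values are percentages of all respondents.

```python
from decimal import Decimal

# Survey shares (percent of all respondents), as given in the chart description.
no_auto = Decimal("50.97")
with_cleanup = Decimal("26.68")
some_videos = Decimal("18.81")
all_videos = Decimal("3.53")

# Everyone who uses auto-captions in any form.
any_auto = Decimal("100") - no_auto

# Respondents who rely on auto-captions alone for some or all videos.
auto_only = some_videos + all_videos

# Portion of auto-caption users who clean the captions up in post-production.
cleanup_share = (with_cleanup / any_auto * 100).quantize(Decimal("0.1"))

print(f"Use auto-captions at all: {any_auto}%")
print(f"Rely on auto-captions only: {auto_only}%")
print(f"Auto-caption users who clean up: {cleanup_share}%")
```

Decimal is used instead of floats so the sums of two-decimal percentages stay exact.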

To give organizations an idea of which process works best for their needs, we’re going to take a look at the major pros and cons of publishing videos with “cleaned-up” auto-captions, relying on auto-captions only, and re-captioning videos entirely in post-production.

Editing Auto-Captions

The most efficient captioning processes combine both speech recognition and human editing.

Speech recognition software is often free or very inexpensive. YouTube is probably the best-known source of auto-captions because it automatically adds them to user-uploaded videos and has a free interface that allows users to edit those captions for timing, spelling, and punctuation accuracy.

Image: YouTube’s free auto-caption editing interface.

To get free, accurate, high-quality captions using YouTube, you would have to:

1. Upload all your videos to YouTube.
2. Let the ASR produce a rough transcript.
3. Go into the editor and clean up the spelling, grammar, punctuation, and speaker labels.
4. Edit the time codes to align caption frames with the spoken audio.

Boom. Voilà. Free closed captions! Repeat steps 1 through 4 for each video.
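YouTube’s editor handles the time-code step interactively, but if you have downloaded a caption file, a simple timing fix can also be scripted. The sketch below is a minimal example under stated assumptions: it assumes the captions are in SubRip (SRT) format and that every caption is off by the same amount, and the function names shift_timestamp and shift_srt are hypothetical. Captions that drift unevenly still need manual editing.

```python
import re

# SRT timestamps look like 00:01:02,500 (hours:minutes:seconds,milliseconds).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match: re.Match, offset_ms: int) -> str:
    """Return one timestamp moved by offset_ms, clamped at zero."""
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis + offset_ms
    total = max(total, 0)  # a caption cannot start before the video does
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_srt(srt_text: str, offset_ms: int) -> str:
    """Shift every caption in an SRT document by offset_ms (negative = earlier)."""
    shifted = []
    for line in srt_text.splitlines(keepends=True):
        # Only timing lines contain the "-->" separator; caption text is untouched.
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: shift_timestamp(m, offset_ms), line)
        shifted.append(line)
    return "".join(shifted)


sample = "1\n00:00:01,000 --> 00:00:03,500\nNobody is perfect.\n"
print(shift_srt(sample, 1500))  # captions now appear 1.5 seconds later
```

A positive offset delays the captions and a negative one moves them earlier, which covers the common case where the whole track is consistently ahead of or behind the audio.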

So, when planning on using a workflow that relies on editing auto-captions, consider these pros and cons…

Pros:

Free in most cases
Works well with very clean, simple audio
Captions can be downloaded in a few common formats

Cons:

Doesn’t work well with audio that has multiple speakers, difficult accents, fast speakers, or background noise
Editing can be a huge time commitment
All your videos will have to be on YouTube

Using Auto-Captions Only

As stated earlier, auto-captions are usually free. That is why they are so attractive and why roughly 22% of respondents said their organization relies on them alone for all or some of its videos.

They are also infamously inaccurate. Not only does this frustrate those of us who cannot or do not want to hear the audio, but it could also put your organization in a lot of legal trouble.

Organizations that rely exclusively on ASR to caption their videos risk excluding deaf and hard of hearing users who want to watch that video content, too. Harvard and MIT are currently engaged in a lawsuit brought by the National Association of the Deaf (NAD) for using inaccurate captions (made using only ASR) on their online course videos.

So, when planning on using a workflow that relies solely on auto-captions, consider these pros and cons…

Pros:

They’re free
Works well with very clean, simple audio
Captions can be downloaded in a few common formats

Cons:

Doesn’t work well with audio that has multiple speakers, difficult accents, fast speakers, or background noise
Proper spelling, grammar, punctuation, time-synchronization, and speaker labels will be lost
You exclude those who cannot hear and those who do not want the audio on
You can put your organization at legal risk for discriminating against deaf and hard of hearing users

Re-Captioning in Post-Production

This method works best if you’ve used live captions — which usually don’t have time stamps to align with the spoken audio — or have a video publishing process that isn’t compatible with using YouTube’s auto-captioning feature every time.

Editing Live Captions

There are some great live captioning services out there, and many of the good ones are very accurate. However, using a live captioning service and editing those captions in post-production is often expensive and fraught with difficulties.

Column chart showing how organizations caption a recorded video after a live-streamed event. 17.84% of organizations use live captions for the recording of a video; 30.48% edit and republish the live captions; 51.67% get them recaptioned.

Chart representing answers to the question, “Do you re-use live captions?”

Because live captions don’t translate well in post-production, most organizations (51.67%) get video recordings of live events re-captioned.

So the two best options are either to caption your videos in-house or to outsource the work to a premium captioning vendor to ensure quality standards are met.

So, when planning on using a workflow that relies on re-captioning in post-production, consider these pros and cons…

Pros:

Highest caption accuracy possible
With a captioning vendor, significant time is saved
Peace of mind knowing your caption quality will be consistent

Cons:

Not free if using a captioning vendor
Heavy time commitment for in-house captioning

Captioning needs are increasing year after year.

So, for organizations that want to accommodate both those who watch video without sound and those who cannot hear, it is in their best interests to seek the most efficient and scalable captioning solution that works for their purposes. Whether it’s in-house, through a third-party service, or a combination of both, what matters most is that your process produces captions that are accurate and high-quality.

—

Read our full report, the 2017 State of Captioning.

If you’re looking for a vendor that leverages both speech recognition and human editing to produce high-quality captions, check out our pricing today!
