Editing Auto-Captions, Auto-Captions Only, or Re-Captioning

August 25, 2017 BY PATRICK LOFTUS
Updated: February 2, 2021

Nobody’s perfect. Not even computers…

While artificial intelligence still isn’t that great at understanding human speech, reliance on automatic captioning is pervasive. In fact, YouTube recently announced that its auto-captioning feature, which relies on artificial intelligence for speech recognition, has added automatic captions to over 1 billion videos.

Automatic captions, or auto-captions, rely on automatic speech recognition (ASR) software to convert spoken audio into on-screen text. The problem is that this technology cannot, on its own, produce captions and transcripts that accurately reproduce the spoken audio, and inaccurate captions exclude people who cannot hear it.

For our multi-industry report, the 2017 State of Captioning, we surveyed over 1,400 people and asked how captioning is done at their organizations.

One of the questions we asked in our survey was, “Do you use automatic captions?” Data from the results yielded the chart below:

Column chart showing use of automatic captions: 50.97% don't use automatic captions; 26.68% do, but with cleanup; 18.81% do for some videos; 3.53% do for all videos.

Most organizations (50.97%) do not use auto-captions at all. Of the remaining 49.03% of organizations that do use auto-captions, more than half (26.68% of all respondents) clean them up in post-production. However, a shockingly high 22.34% of respondents (18.81% for some videos plus 3.53% for all videos) said their organization relies on auto-captioning only.
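As a quick check on the arithmetic, the shares discussed here can be recomputed from the four figures in the chart. This is only a sketch: the dictionary keys and variable names are our own labels, and the percentages are copied from the chart as published.

```python
# Percentages copied from the chart (share of survey respondents).
# The keys are illustrative labels, not the survey's own wording.
shares = {
    "no_auto_captions": 50.97,
    "auto_with_cleanup": 26.68,
    "auto_for_some_videos": 18.81,
    "auto_for_all_videos": 3.53,
}

# Everyone who uses auto-captions in any form.
uses_auto = round(100 - shares["no_auto_captions"], 2)

# Respondents relying on auto-captions alone, for some or all videos.
auto_only = round(
    shares["auto_for_some_videos"] + shares["auto_for_all_videos"], 2
)

# Among auto-caption users, the fraction that cleans the captions up.
cleanup_fraction = shares["auto_with_cleanup"] / uses_auto

print(f"Use auto-captions in some form: {uses_auto}%")
print(f"Rely on auto-captions only: {auto_only}%")
print(f"Clean up, as a share of auto-caption users: {cleanup_fraction:.1%}")
```

Because the published figures are rounded, the four bars add up to 99.99% rather than exactly 100%.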

To give organizations an idea of which process works best for their needs, we’re going to take a look at the major pros and cons of publishing videos with “cleaned-up” auto-captions, relying on auto-captions only, and re-captioning videos entirely in post-production.

Editing Auto-Captions

The most efficient captioning processes combine both speech recognition and human editing.

Speech recognition software is often free or very inexpensive. YouTube is probably the best-known source of auto-captions because it automatically adds them to user-uploaded videos and has a free interface that allows users to edit those captions for timing, spelling, and punctuation accuracy.

Image: YouTube’s free auto-caption editing interface.

To get free, accurate, high-quality captions using YouTube, you would have to:

1. Upload all your videos to YouTube.
2. Let the ASR produce a rough transcript.
3. Go into the editor and clean up the spelling, grammar, punctuation, and speaker labels.
4. Edit the time codes to align caption frames with the spoken audio.

Boom. Voilà. Free closed captions! Repeat steps 1 through 4 for each video.
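In YouTube's editor, the time-code step is done by hand. If you download the captions instead, a uniform timing drift can be corrected with a short script. The sketch below assumes the captions are in SubRip (.srt) format, where time codes look like 00:01:02,500; the function names `shift_timecode` and `shift_srt` are hypothetical helpers, not part of any YouTube tool, and the sample captions are made up.

```python
import re
from datetime import timedelta

# SRT time codes look like 00:01:02,500 (hours:minutes:seconds,milliseconds).
TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timecode(match: re.Match, offset: timedelta) -> str:
    """Return one time code moved by `offset`, clamped at zero."""
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    original = timedelta(
        hours=hours, minutes=minutes, seconds=seconds, milliseconds=millis
    )
    shifted = max(original + offset, timedelta(0))
    # Convert back to whole milliseconds, then split into SRT fields.
    total_ms = shifted // timedelta(milliseconds=1)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_srt(srt_text: str, offset_seconds: float) -> str:
    """Shift every time code in an SRT document by `offset_seconds`."""
    offset = timedelta(seconds=offset_seconds)
    return TIMECODE.sub(lambda m: shift_timecode(m, offset), srt_text)


# Made-up sample: two caption frames that appear 0.75 seconds too early.
sample = """1
00:00:01,000 --> 00:00:03,500
Nobody's perfect.

2
00:00:03,600 --> 00:00:06,000
Not even computers.
"""

print(shift_srt(sample, 0.75))
```

This only handles a constant offset. Captions that drift by different amounts over the course of a video still need frame-by-frame editing.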

So, when planning on using a workflow that relies on editing auto-captions, consider these pros and cons…

Pros:

Free in most cases
Works well with very clean, simple audio
Captions can be downloaded in a few common formats

Cons:

Doesn’t work well with audio that has multiple speakers, difficult accents, fast speakers, or background noise
Editing can be a huge time commitment
All your videos will have to be on YouTube

Using Auto-Captions Only

As stated earlier, auto-captions are often free. That is why they are so attractive and why 22.34% of respondents said their organizations rely on them alone as a video captioning solution.

They are also infamously inaccurate. Not only do they frustrate those of us who cannot or do not want to hear the audio, but they could also put your organization in a lot of legal trouble.

Organizations that rely exclusively on ASR to caption their videos risk excluding deaf and hard of hearing users who want to watch that video content, too. Harvard and MIT are currently engaged in a lawsuit brought by the National Association of the Deaf (NAD) for using inaccurate captions (made using only ASR) on their online course videos.

So, when planning on using a workflow that relies solely on auto-captions, consider these pros and cons…

Pros:

They’re free
Works well with very clean, simple audio
Captions can be downloaded in a few common formats

Cons:

Doesn’t work well with audio that has multiple speakers, difficult accents, fast speakers, or background noise
Proper spelling, grammar, punctuation, time-synchronization, and speaker labels will be lost
You exclude those who cannot hear and those who do not want the audio on
You can put your organization at legal risk for discriminating against deaf and hard of hearing users

Re-Captioning in Post-Production

This method works best if you’ve used live captions — which usually don’t have time stamps to align with the spoken audio — or have a video publishing process that isn’t compatible with using YouTube’s auto-captioning feature every time.

Editing Live Captions

There are some great live captioning services out there, and many of the good ones are very accurate. However, using a live captioning service and editing those captions in post-production is often expensive and fraught with difficulties.

Column chart showing how organizations caption a recorded video after a live-streamed event. 17.84% of organizations use live captions for the recording of a video; 30.48% edit and republish the live captions; 51.67% get them recaptioned.

Chart representing answers to the question, “Do you re-use live captions?”

Because live captions don’t translate well in post-production, most organizations (51.67%) get video recordings of live events re-captioned.

So the two best options are either to caption your videos in-house or to outsource the work to a premium captioning vendor to ensure quality standards are met.

So, when planning on using a workflow that relies on re-captioning in post-production, consider these pros and cons…

Pros:

Highest caption accuracy possible
With a captioning vendor, significant time is saved
Peace of mind knowing your caption quality will be consistent

Cons:

Not free if using a captioning vendor
Heavy time commitment for in-house captioning

Captioning needs are increasing year after year.

So, for organizations that want to accommodate both viewers who watch video without sound and viewers who cannot hear, it is in their best interest to seek the most efficient and scalable captioning solution for their purposes. Whether it’s in-house, through a third-party service, or a combination of both, what matters most is that your process produces accurate, high-quality captions.

—

Read our full report, the 2017 State of Captioning.

If you’re looking for a vendor that leverages both speech recognition and human editing to produce high-quality captions, check out our pricing today!
