Editing Auto-Captions, Auto-Captions Only, or Re-Captioning

August 25, 2017 BY PATRICK LOFTUS
Updated: February 2, 2021

An old 1950s-style wind-up robot with a speech bubble that says, “I think therefore I ham”
Nobody’s perfect. Not even computers…

While artificial intelligence still isn’t that great at understanding human speech, reliance on automatic captioning is pervasive. In fact, YouTube recently announced that its auto-captioning feature, which relies on artificial intelligence for speech recognition, has added automatic captions to over 1 billion videos.

Automatic captions, or auto-captions, rely on automatic speech recognition (ASR) software to convert spoken audio into text on-screen. The problem is that this kind of technology cannot be relied on alone to produce captions and transcripts that accurately reproduce the spoken audio, and inaccurate captions exclude people who cannot hear it.

For our multi-industry report, the 2017 State of Captioning, we surveyed over 1,400 people and asked how captioning is done at their organization.

One of the questions we asked in our survey was, “Do you use automatic captions?” The responses are summarized in the chart below:

Column chart showing use of automatic captions: 50.97% don't use automatic captions; 26.68% do, but with cleanup; 18.81% do for some videos; 3.53% do for all videos.

Most organizations (50.97%) do not use auto-captions at all. Of the remaining 49.03% that do use auto-captions, more than half (26.68% of all respondents) clean them up in post-production. However, a shockingly high 22.23% of respondents said their organization relies on auto-captioning only for all or some of their videos.
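
The “more than half” figure can be checked against the chart values. Here is a quick sketch of that arithmetic in Python; the variable names are ours, chosen only for illustration.

```python
# Survey shares taken from the chart above (percent of all respondents).
no_auto_captions = 50.97
auto_with_cleanup = 26.68

# Respondents who use auto-captions in any form.
any_auto_captions = 100 - no_auto_captions

# Share of auto-caption users who clean their captions up afterwards.
cleanup_share = auto_with_cleanup / any_auto_captions * 100

print(f"{any_auto_captions:.2f}% of respondents use auto-captions")
print(f"{cleanup_share:.1f}% of those clean them up")
```

In other words, roughly 54% of the organizations that use auto-captions at all edit them before publishing.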

To give organizations an idea of which process works best for their needs, we’re going to take a look at the major pros and cons of publishing videos with “cleaned-up” auto-captions, relying on auto-captions only, and re-captioning videos entirely in post-production.

Editing Auto-Captions

The most efficient captioning processes combine both speech recognition and human editing.

Speech recognition software is often free or very inexpensive. YouTube is probably the best-known source of auto-captions because it automatically adds them to user-uploaded videos and has a free interface that allows users to edit those captions for timing, spelling, and punctuation accuracy.

a screenshot of YouTube's caption editor interface
Image: YouTube’s free auto-caption editing interface.

To get free, accurate, high-quality captions using YouTube, you would have to:

  1. Upload all your videos to YouTube.
  2. Let the ASR produce a rough transcript.
  3. Go into the editor and clean up the spelling, grammar, punctuation, and speaker labels.
  4. Edit the time codes to align caption frames with the spoken audio.
  5. Boom. Voilà. Free closed captions! Repeat steps 1 through 4 for each video.
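
Step 4 is usually done by hand in the editor, but when an entire caption track is early or late by a constant amount, the fix can be scripted. The sketch below is only an illustration and rests on assumptions that go beyond this article: that the captions have been downloaded as a SubRip (.srt) file, and that the function names `shift_timestamp` and `shift_captions` are our own hypothetical helpers, not part of any YouTube tool.

```python
import re

# Matches an SRT-style timestamp: hours:minutes:seconds,milliseconds
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset_ms):
    """Return one timestamp moved by offset_ms, never going below zero."""
    h, m, s, ms = (int(group) for group in match.groups())
    total_ms = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    total_ms = max(total_ms, 0)
    total_s, ms = divmod(total_ms, 1000)
    total_m, s = divmod(total_s, 60)
    h, m = divmod(total_m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_captions(srt_text, offset_ms):
    """Shift every timestamp in an SRT document by offset_ms milliseconds."""
    return TIMESTAMP.sub(lambda match: shift_timestamp(match, offset_ms), srt_text)


# Hypothetical two-cue caption file used only for this example.
sample = """1
00:00:01,000 --> 00:00:03,500
Nobody's perfect.

2
00:00:03,600 --> 00:00:05,000
Not even computers.
"""

# Delay every caption by a quarter of a second.
print(shift_captions(sample, 250))
```

A constant offset only covers the simplest case. Captions that drift over the length of a video, or frames that are misaligned one by one, still need manual attention. The pattern would also rewrite any timestamp-like string that happens to appear in the caption text itself.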

So, when planning on using a workflow that relies on editing auto-captions, consider these pros and cons…

Pros

  • Free in most cases
  • Works well with very clean, simple audio
  • Captions can be downloaded in a few common formats

Cons

  • Doesn’t work well with audio that has multiple speakers, difficult accents, fast speakers, or background noise
  • Editing can be a huge time commitment
  • All your videos will have to be on YouTube

Using Auto-Captions Only

screencap of YouTube video showing gibberish automatic captions

As stated earlier, auto-captions are free. That is why they are so attractive, and why 22.23% of respondents said their organization relies on them alone for all or some of its videos.

They are also infamously inaccurate. Not only do inaccurate captions frustrate those of us who cannot or do not want to hear the audio, but they could also put your organization in a lot of legal trouble.

Organizations that rely exclusively on ASR to caption their videos risk excluding deaf and hard of hearing users who want to watch that video content, too. Harvard and MIT are currently engaged in a lawsuit brought by the National Association of the Deaf (NAD) for using inaccurate captions (made using only ASR) on their online course videos.

So, when planning on using a workflow that relies solely on auto-captions, consider these pros and cons…

Pros

  • They’re free
  • They work well with very clean, simple audio
  • Captions can be downloaded in a few common formats

Cons

  • Doesn’t work well with audio that has multiple speakers, difficult accents, fast speakers, or background noise
  • Proper spelling, grammar, punctuation, time-synchronization, and speaker labels will be lost
  • You exclude those who cannot hear and those who do not want the audio on
  • You can put your organization at legal risk for discriminating against deaf and hard of hearing users

Re-Captioning in Post-Production

This method works best if you’ve used live captions — which usually don’t have time stamps to align with the spoken audio — or have a video publishing process that isn’t compatible with using YouTube’s auto-captioning feature every time.

Editing Live Captions

There are some great live captioning services out there, and many of the good ones are very accurate. However, using a live captioning service and editing those captions in post-production is often expensive and fraught with difficulties.

Column chart showing how organizations caption a recorded video after a live-streamed event. 17.84% of organizations use live captions for the recording of a video; 30.48% edit and republish the live captions; 51.67% get them recaptioned.
Chart representing answers to the question, “Do you re-use live captions?”

Because live captions don’t translate well in post-production, most organizations (51.67%) get video recordings of live events re-captioned.

So the two best options are either to caption your videos in-house or to outsource the work to a premium captioning vendor to ensure quality standards are met.

So, when planning on using a workflow that relies on re-captioning in post-production, consider these pros and cons…

Pros

  • Highest caption accuracy possible
  • With a captioning vendor, significant time is saved
  • Peace of mind knowing your caption quality will be consistent

Cons

  • Not free if using a captioning vendor
  • Heavy time commitment for in-house captioning

Captioning needs are increasing year after year.

So, for organizations that want to accommodate both those who watch video without sound and those who cannot hear, it is in their best interests to seek the most efficient and scalable captioning solution that works for their purposes. Whether it’s in-house, through a third-party service, or a combination of both, what matters most is that your process produces captions that are accurate and high-quality.

Read our full report, the 2017 State of Captioning.

If you’re looking for a vendor that leverages both speech recognition and human editing to produce high-quality captions, check out our pricing today!
