The Problem with Using Auto-Captions in Education

April 2, 2020 BY JACLYN LEDUC
Updated: March 30, 2021


Educational institutions have a responsibility to ensure accessible learning materials for their students, including providing accurately captioned videos.* The question is, are institutions that rely on auto-captions providing truly accessible videos?

In the 2020 State of Captioning study, which measures captioning behaviors across industries, 64 percent of respondents reported using auto-captions for educational content. Yet, institutions that solely use auto-captions for recorded videos are not necessarily delivering on accessibility.

As the use of online videos in classrooms continues to grow, institutions must strive to meet video accessibility standards. So, do auto-captions meet quality standards? How accurate are they? Why might auto-captions be problematic for recorded video content?



The Auto-Caption Dilemma

Auto-captions use automatic speech recognition (ASR) technology to transcribe audio and synchronize the text with video to create closed captions. Platforms like YouTube offer auto-captioning tools for free, which is a large part of their appeal.

Relying solely on ASR to generate captions for recorded videos is typically detrimental to caption accuracy.



At its best, ASR can typically generate captions that are about 80-90 percent accurate. Still, several conditions must be met to reach that accuracy: little or no background noise, minimal grammatical errors, limited mispronunciations, and excellent audio quality. If any of these conditions isn't met, auto-caption accuracy can drop as low as 50 percent. That means that out of every ten words, five could be captioned incorrectly. With an accuracy rate that low, viewers are left to follow along with confusing and often inaccurate information.
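To make the arithmetic concrete, here is a minimal Python sketch of one common way to estimate word-level caption accuracy: count the word substitutions, insertions, and deletions needed to turn the auto-caption into a human-verified transcript, then divide by the length of that transcript. The function names and sample sentences are hypothetical, and this is only an illustration under those assumptions, not the measurement method of any particular captioning vendor. Real accuracy measures may also weigh punctuation, speaker identification, and sound effects.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the hypothesis (auto-caption) into the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from the caption
                            curr[j - 1] + 1,      # extra word in the caption
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def caption_accuracy(reference: str, hypothesis: str) -> float:
    """Word-level accuracy: 1 minus errors divided by reference length."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript must not be empty")
    errors = word_edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / ref_len)


# Hypothetical ten-word sentence and an auto-caption with five wrong words.
reference = "the mitochondria is the powerhouse of the cell producing energy"
auto_caption = "a mitochondria is a power of the sell producing entropy"

print(f"{caption_accuracy(reference, auto_caption):.0%}")  # prints "50%"
```

In this made-up example, half of the words are wrong, which is the 50 percent scenario described above: every other word a student reads could be misleading.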

The key to 99 percent or higher caption accuracy is human review.


ASR can be a useful gateway tool for accurate captions since it does the bulk of the transcribing work. Once an auto-captioning tool generates a rough transcript for recorded video content, a human should conduct quality assurance and make necessary edits to ensure caption accuracy.

Caption Accuracy Goes a Long Way

The importance of digital media in education is increasing as more educators use videos in their curriculum. Educational videos may come in the form of recorded lectures, study materials, and recorded presentations. All of that content must be accessible, and yes, that means it should be accurately captioned.



For educational content especially, accuracy is critical. Students who are deaf or hard of hearing rely on captions to consume video materials for their courses, and they cannot do so effectively, if at all, when the captions are inaccurate.


Accurate captions also benefit all students: 80 percent of people who use captions are not deaf or hard of hearing. Many students without disabilities use captions to improve their learning, comprehension, and focus. One study by the University of South Florida St. Petersburg found that 42 percent of students use closed captions to help maintain focus.

At a minimum, accurate captions are a powerful learning tool that helps all students. At their most essential, they give students with hearing disabilities equal opportunities in learning environments. Both reasons should motivate institutions to rethink their auto-caption use and to ensure high-quality, accessible captions for all recorded video content.

Caption Accuracy as a Legal Standard

Recent legal cases have established accurate captions as a legal expectation for educational institutions.


In 2015, the National Association of the Deaf (NAD) filed a class-action lawsuit against Harvard University for allegedly violating the Americans with Disabilities Act and the Rehabilitation Act by failing to provide accurate and comprehensive captioning for online educational videos.

The case between the NAD and Harvard was unique because it brought forth issues with the accuracy and comprehensiveness of the university’s captions.

After four years of litigation, the NAD and Harvard University settled on November 27, 2019. The settlement contained specific requirements for caption accuracy, stating that captions should be on par with 3Play Media’s 99 percent accuracy rate.

“In the case of video files, to overlay or externally embed synchronized visual text for speech and, consistent with WCAG 2.1 AA, provide nondialogue audio information needed to understand the program content, including sound effects, music, laughter, speaker identification, and location on a digital media file at an accuracy rate equal to that offered by a vendor captioning service such as 3PlayMedia and in a manner consistent with industry standards regarding synchronicity, completeness, and proper placement;”

(Case 3:15-cv-30023-KAR Document 201-1)

The outcome of the NAD v. Harvard case (as well as the NAD v. MIT case) has set a legal precedent regarding caption accuracy. The new guidelines for Harvard and MIT may be a strong motivator for other institutions to focus on caption quality and perhaps move away from solely relying on auto-captions.

*Please note, for this post, we’re specifically referring to auto-captions for recorded video content.
