The Problem with Using Auto Captions in Higher Education

August 1, 2023 BY REBECCA KLEIN

Higher education institutions have a responsibility to provide accessible learning materials for their students, including accurately captioned videos. The question is, are institutions that rely on auto captions providing truly accessible content?

In the 2023 State of Captioning, which measures captioning behaviors across industries, 45% of respondents reported using auto captions for educational content. But institutions that only use auto captions for recorded videos are not necessarily delivering on accessibility.

As the use of captioning in classrooms continues to grow, institutions must strive to meet accessibility standards. So, do auto captions meet quality standards? How accurate are they? Why might auto captions in higher education be problematic for recorded and live video content? In this blog, we’ll answer these questions and more.

Why Auto Captions in Higher Education Present a Problem for Students

Auto captions use automatic speech recognition (ASR) technology to transcribe audio and synchronize text with videos to create closed captions. Platforms like YouTube offer auto captioning tools for free, which is why auto captions are often so appealing.

However, relying on ASR alone to caption recorded videos undermines caption accuracy. The 2023 State of Automatic Speech Recognition report, which analyzes speech-to-text technology as it applies to captioning and transcription, found that even the most accurate solutions on the market achieved at most 93% accuracy on non-specialized content with great audio quality, and most engines tested scored considerably lower. Higher education classes often cover complex subjects with difficult terminology, such as medical school lectures, which ASR often fails to transcribe accurately.

Additionally, reaching even that limited accuracy requires several conditions: little or no background noise, minimal grammatical errors, minimal mispronunciations, and excellent audio quality. If any of these critical audio conditions isn’t met, auto caption accuracy can drop as low as 57.5%, according to our 2023 report. That means that out of every ten words, between four and five could be captioned incorrectly. With an accuracy rate that low, viewers are left to follow confusing and often inaccurate information.
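To make the arithmetic behind these accuracy figures concrete, here is a minimal Python sketch. It assumes that "accuracy" means the share of words captioned correctly and that errors are spread evenly through a transcript; the report may measure accuracy differently. The function name `expected_errors` is illustrative and not part of any captioning tool.

```python
def expected_errors(accuracy_pct: float, words: int) -> float:
    """Estimate how many words are captioned incorrectly.

    Assumption: accuracy is a simple word-level percentage and
    errors are evenly distributed across the transcript.
    """
    if not 0 <= accuracy_pct <= 100:
        raise ValueError("accuracy_pct must be between 0 and 100")
    return words * (100 - accuracy_pct) / 100


# Accuracy figures quoted in this article.
scenarios = [
    ("best ASR result, ideal audio", 93.0),
    ("ASR with poor audio conditions", 57.5),
    ("human-reviewed captions", 99.0),
]

for label, accuracy in scenarios:
    per_ten = expected_errors(accuracy, 10)
    per_hundred = expected_errors(accuracy, 100)
    print(f"{label}: {accuracy}% accurate, roughly {per_ten:.2f} "
          f"wrong words per 10 and {per_hundred:.1f} per 100")
```

Under these assumptions, 57.5% accuracy works out to 4.25 incorrect words in every ten, while 99% accuracy works out to about one incorrect word in every hundred.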

The key to 99% or higher caption accuracy is human involvement.

ASR can be a useful tool for accurate captions since it does the bulk of the transcribing work. Once an auto captioning tool generates a rough transcript for recorded video content, a human should conduct quality assurance and make necessary edits to ensure caption accuracy.

Caption Accuracy Goes a Long Way, Especially in Higher Education

The importance of digital media in education is increasing as more educators use videos in their curricula. Educational videos may come in the form of recorded lectures, study materials, and recorded presentations. All of that content must be accessible, which means it should be accurately captioned.

For educational content, accuracy is especially critical. Students who are deaf or hard of hearing rely on captions to consume video materials for their courses. Without accurate captions, these students are left with an inequitable learning experience.

Accurate captions also benefit all students: 80% of people who use captions are not deaf or hard of hearing. Many students without disabilities use captions as a tool to improve their learning, comprehension, and focus. One study by the University of South Florida St. Petersburg found that 42% of students use closed captions to help maintain focus.

At a minimum, accurate captions are a powerful learning tool that helps all students. At their most important, they are what give students with hearing disabilities equal opportunities in learning environments. Both reasons should motivate institutions to rethink their use of auto captions and to ensure high-quality, accessible captions for all recorded video content.

Caption Accuracy as a Legal Standard

Recent legal cases have established accurate captions as a legal expectation for educational institutions.

In 2015, the National Association of the Deaf (NAD) filed a class-action lawsuit against Harvard University for allegedly violating the Americans with Disabilities Act and the Rehabilitation Act by failing to provide accurate and comprehensive captioning for online educational videos.

The case between the NAD and Harvard was unique because it brought forth issues with the accuracy and comprehensiveness of the university’s captions.

After four years of litigation, the NAD and Harvard University settled on November 27, 2019. The settlement contained specific requirements for caption accuracy, specifying that captions should be on par with 3Play Media’s 99% accuracy rate.

“In the case of video files, to overlay or externally embed synchronized visual text for speech and, consistent with WCAG 2.1 AA, provide nondialogue audio information needed to understand the program content, including sound effects, music, laughter, speaker identification, and location on a digital media file at an accuracy rate equal to that offered by a vendor captioning service such as 3PlayMedia and in a manner consistent with industry standards regarding synchronicity, completeness, and proper placement;”
Source: Case 3:15-cv-30023-KAR Document 201-1 (https://creeclaw.org/wp-content/uploads/2019/11/NAD-v-Harvard-Consent-Decree.pdf)

The outcome of the NAD v. Harvard case (as well as the NAD v. MIT case) has set a legal precedent regarding caption accuracy. The new guidelines for Harvard and MIT may be a strong motivator for other institutions to focus on caption quality and move away from solely relying on auto captions.

This blog was originally published in April 2020 by Jaclyn Leduc and has since been updated for accuracy, clarity, and comprehensiveness.
