How Can Advancements in Speech Recognition Help to Create Better Captioning?

January 6, 2020 BY ALEX FLEMING
Updated: January 7, 2020

When it comes to regulation and compliance, people usually think first of the banking and financial sectors. Over the years, legislative regulation has become one of the most important areas of attention and innovation in these sectors, and organizations have invested huge sums to protect not only themselves but also their customers' data and security. While we all want our private data, and especially our banking data, to be safe, other industries place equal importance on complying with their own regulations, both to deliver better services to their end-users and to avoid fines.

Closed captioning is a crucial element of video content. It not only makes content more accessible but also enables more people to engage with and enjoy it. Adding text to video content also improves searchability, both for end-users and for content providers who want to ensure their consumers can quickly find the content they are looking for.

Speech recognition has come a long way in recent years. In 2019, more organizations than ever began looking to this technology as a practical and scalable tool to enhance their businesses and add value to existing business processes. Word error rates for automatically generated transcripts have continued to fall, with English, one of the world's most spoken languages, leading the way. But transcription isn't captioning!


Styling, formatting, and specialized language and terminology all need to be considered, and the requirements for delivering compliant captions differ across industries and across the platforms on which content is available. Frequent updates to these requirements add another layer of complexity. Automatic speech recognition significantly reduces the human effort required to convert the bulk of the speech in media content into text for use in captioning.
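To make the gap between a transcript and captions concrete, here is a sketch of one small formatting step: grouping timed words from ASR output into single-line caption cues and rendering them in the SubRip (.srt) format. The `Word` and `Cue` shapes are hypothetical, and the 32-character line limit (the row width of CEA-608 broadcast captions) and 6-second maximum cue duration are illustrative defaults, not values taken from any regulation or product. Real caption specifications also cover line breaks, reading rate, placement and non-speech information.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with timings in seconds (hypothetical ASR output shape)."""
    text: str
    start: float
    end: float


@dataclass
class Cue:
    """One caption cue: the text shown on screen and when it is shown."""
    text: str
    start: float
    end: float


def _make_cue(words):
    # A cue spans from the first word's start to the last word's end.
    return Cue(" ".join(w.text for w in words), words[0].start, words[-1].end)


def words_to_cues(words, max_chars=32, max_duration=6.0):
    """Greedily pack timed words into single-line caption cues.

    A new cue is started when adding the next word would make the line longer
    than max_chars characters or make the cue last longer than max_duration
    seconds. A single word longer than max_chars still gets its own cue.
    """
    cues = []
    current = []
    for word in words:
        if current:
            candidate = " ".join(w.text for w in current + [word])
            too_wide = len(candidate) > max_chars
            too_long_on_screen = word.end - current[0].start > max_duration
            if too_wide or too_long_on_screen:
                cues.append(_make_cue(current))
                current = []
        current.append(word)
    if current:
        cues.append(_make_cue(current))
    return cues


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render cues as a SubRip (.srt) document: index, time range, text, blank line."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n{cue.text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    words = [
        Word("Speech", 0.0, 0.4), Word("recognition", 0.4, 1.0),
        Word("has", 1.0, 1.2), Word("come", 1.2, 1.4), Word("a", 1.4, 1.5),
        Word("long", 1.5, 1.8), Word("way", 1.8, 2.1),
    ]
    print(to_srt(words_to_cues(words, max_chars=20)))
```

The greedy packing keeps the example short; a production captioner would also try to break lines at phrase boundaries and respect speaker changes, which is exactly the kind of work that remains after raw transcription.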

Accuracy

Recent advancements in automatic speech recognition services mean that organizations have never been better placed to offload larger portions of raw transcription to machines. The word error rate (WER), the usual proxy for measuring accuracy, continues to fall across more languages. This matters because more territories are looking to accessibility leaders such as North America and applying similar captioning legislation in their own regions.
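WER counts the substitutions S, deletions D and insertions I needed to turn a machine transcript into a reference transcript of N words, so WER = (S + D + I) / N. The sketch below computes it with a standard word-level edit distance. The normalization step (lowercasing and dropping punctuation) is an assumption for illustration; published benchmarks each use their own normalization rules, and those rules change the score. Scoring this way also ignores the punctuation and capitalization that matter for captions.

```python
import re


def normalize(text):
    """Lowercase and keep only word characters, so punctuation and case are not scored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Return (S + D + I) / N using a word-level Levenshtein alignment."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # At the start of iteration i, prev[j] holds the minimum number of edits
    # that turn the first i-1 reference words into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "The cat sat on the mat."
    hyp = "the cat sat on a mat"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")  # prints "WER = 0.167" (1 substitution / 6 words)
```

Because insertions are counted, WER can exceed 1.0 when a transcript contains many extra words.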

Even with these continued reductions in transcription errors, meeting the accuracy standards set by the Federal Communications Commission (FCC) in the United States and by other regulatory bodies remains a challenge without human editors. Automatic transcription lets human editors focus on the parts of delivering perfect captions that automation does not yet handle well. These include requirements that are part of the legislation but not directly related to ASR, such as non-speech elements and the formatting of captions after transcription.

Punctuation

The words are a vital part of the transcript output, but other elements can also significantly reduce the effort of turning transcripts into captions. Advanced punctuation, covering a richer set of punctuation characters, together with better capitalization brings even raw transcripts one step closer to a natural-language text representation and to North American regulation. The regulation states that 'punctuation should be used for maximum clarity in the text'. Better punctuation also makes machine transcripts easier for human editors to read, helping them make the required changes and create captions faster than ever while preserving the best possible accuracy.


Customization

The Americans with Disabilities Act (ADA), which guarantees equal opportunity for persons with disabilities, also sets out that captions should preserve and identify slang and accents. Human editors can handle this, but customization within the ASR engine means the solution can be tailored to deliver the best possible accuracy with limited editing.

Making it fast and easy to add difficult terms such as names, accents, abbreviations, acronyms, and other specialist, industry-specific or content-specific language to the recognition model gives users control over ASR capabilities and lets them adapt it to a diverse range of content. From documentaries and feature films to online videos and music videos, customization helps any media type be transcribed with fewer errors, further reducing the time and heavy lifting required of humans in the captioning process.

Captioning is a complex and heavily regulated area of the media industry. It may look relatively straightforward, but there are rules that must be followed to ensure regulations are met and that content is properly accessible. Automatic speech recognition has proven its value in the captioning process, and continued advancements in machine learning will extend that value to more languages and more challenging audio. For this reason, an ASR solution for captioning requires not only best-in-class accuracy but also a wide breadth of languages, customization, and continued improvements that help deliver better captions.

—
Read the full 2019 State of ASR Report to learn more!

This is a guest blog written by Alex Fleming, Product Marketing Manager at Speechmatics.
