How Can Advancements in Speech Recognition Help to Create Better Captioning?
Updated: January 7, 2020
When it comes to regulation and compliance, the first industries that come to mind are often banking and finance. Over the years, legislative regulation in these sectors has become one of the most important areas of consideration and innovation. Organizations have invested huge sums to protect not only themselves but also the data and security of their customers. While we all want our private data, and especially our banking data, to be safe, other industries place equal importance on adhering to their own regulations, both to deliver better services to their end users and to protect themselves from fines.
Closed captioning is a crucial element of video content. It not only makes content more accessible; it also enables more people to engage with and enjoy the content. Additionally, adding text to video content makes it more searchable, both for end users and for content providers who want to ensure their consumers can quickly find the content they want.
Speech recognition has come a long way in recent years. In 2019, more organizations than ever started looking to this technology as a practical and scalable tool to enhance their businesses and to add value to existing business processes. Word error rates within automatically generated transcription have continued to improve, with English leading the way as one of the world’s most spoken languages. But transcription isn’t captioning!
Styling, formatting, industry-specific language and terminology, and the different platforms on which content is available are all considerations, and each brings different requirements when it comes to delivering compliant captions. Constant updates to these requirements add a further layer of complexity. Automatic speech recognition significantly reduces the human effort required to convert the bulk of speech within media content into text for use in captioning.
Recent advancements in automatic speech recognition services mean that organizations have never been so empowered to offload larger portions of raw transcription to machines. The word error rate (often the proxy for measuring accuracy) now delivered by these solutions continues to be driven down across more languages. This is a good thing as more territories are looking to leaders in accessibility like North America and applying similar captioning legislation in their regions.
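For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the machine transcript into a reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not the scoring method of any particular vendor; the function name `word_error_rate` and the simple lowercase, whitespace-based tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate of `hypothesis` against `reference`.

    Assumption: words are compared case-insensitively after splitting on
    whitespace. Real evaluations usually apply more careful normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, the rate can exceed 1.0 when the machine transcript contains many extra words, which is one reason it is treated as a proxy for accuracy rather than a direct measure of it.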
Even with these continued reductions in transcription errors, the accuracy required by the Federal Communications Commission (FCC) in North America and by other regulatory bodies is a challenge to meet without human editors. Automatic transcription means that human editors can focus on the complexities of delivering perfect captions that automation does not yet cover. This includes elements that are part of the legislation but not directly related to ASR, such as non-speech elements and the format of captions after transcription.
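To make the gap between a raw transcript and a caption file concrete, the sketch below groups word-level timings into short cues and writes them in the widely used SubRip (SRT) layout. It is only an illustration under stated assumptions: the `Word` structure, the `words_to_srt` helper, and the 32-character cue limit are hypothetical choices for this example, not the output format of any specific ASR service or a rule taken from the regulations discussed here. Non-speech elements and speaker identification would still need to be added by an editor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """A transcribed word with start and end times in seconds (assumed shape)."""
    text: str
    start: float
    end: float


def format_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_chars: int = 32) -> str:
    """Group words into cues of at most `max_chars` characters and render SRT.

    Assumption: a single word longer than `max_chars` gets its own cue.
    """
    cues: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        if current and len(candidate) > max_chars:
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_timestamp(cue[0].start)
        end = format_timestamp(cue[-1].end)
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [Word("Hello", 0.0, 0.4), Word("world.", 0.5, 1.0)]
    print(words_to_srt(sample))
```

Splitting purely on character count is the simplest possible rule; a production workflow would more likely break cues at punctuation and pauses, which is where better punctuation in the raw transcript helps.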
The words are a vital part of the transcript output. However, other elements can significantly reduce the effort of transforming transcripts into captions. Advanced punctuation, with a rich set of punctuation characters, and better capitalization mean that transcripts, even in their raw form, are one step closer to a text-based representation of natural language and to meeting North American regulation. The regulation states that ‘punctuation should be used for maximum clarity in the text’. Better punctuation also makes machine transcripts easier for human editors to read, so they can make the required changes and create captions more quickly while preserving the best possible accuracy.
The Americans with Disabilities Act (ADA), tasked with equal opportunity for persons with disabilities, also sets out that captions should preserve and identify slang or accents. While human editors can achieve this, customization within the ASR means that a solution can be tailored to deliver the best possible accuracy with limited editing.
Enabling the fast and effortless inclusion of difficult terms, such as names, accents, abbreviations, acronyms, and other specialist, industry-specific or content-specific language, in the recognition model gives users the tools to take control of ASR capabilities and adapt to a diverse range of content. From documentaries and feature films to online videos and music videos, customization ensures that any media type can be transcribed without error, further reducing the time and heavy lifting required of humans in the captioning process.
Captioning is a complex and heavily regulated area of the media industry. It might look relatively straightforward; however, there are rules that need to be followed to ensure regulations are met and content is properly accessible. Automatic speech recognition technology has proven its value in the captioning process, and continued advancements in machine learning will further increase that value across more languages and more challenging audio. It is for this reason that an ASR solution requires not only best-in-class accuracy but also a wide breadth of languages, customization, and the ability to deliver continued improvements to help deliver better captions.
Read the full 2019 State of ASR Report to learn more!
This is a guest blog written by Alex Fleming, Product Marketing Manager at Speechmatics.