How Can Advancements in Speech Recognition Help to Create Better Captioning?
Updated: January 7, 2020
When regulation and compliance come up, the first thought often goes to banking and financial services. Over the years, regulation in these sectors has driven both heavy scrutiny and real innovation, and organizations have invested huge sums to protect not only themselves but also their customers' data and security. Yet banking is not alone: other industries place equal weight on regulatory compliance, both to deliver better services to their end-users and to protect themselves from fines.
Closed captioning is a crucial element of video content. It makes content more accessible and enables more people to engage with and enjoy it. Adding text to video also enhances searchability, helping both end-users and content providers ensure that consumers can quickly find the content they want.
Speech recognition has come a long way in recent years. In 2019, more organizations than ever started looking to this technology as a practical and scalable tool to enhance their businesses and to add value to existing business processes. Word error rates within automatically generated transcription have continued to improve, with English leading the way as one of the world’s most spoken languages. But transcription isn’t captioning!
Styling, formatting, and the bespoke language and terminology of different industries and of the platforms where content is available all carry their own requirements when it comes to delivering compliant captions, and constant updates to those requirements add a further layer of complexity. Automatic speech recognition significantly reduces the human effort required to convert the bulk of the speech within media content into text for use in captioning.
Recent advancements in automatic speech recognition services mean that organizations have never been so empowered to offload larger portions of raw transcription to machines. The word error rate (often the proxy for measuring accuracy) now delivered by these solutions continues to be driven down across more languages. This is a good thing as more territories are looking to leaders in accessibility like North America and applying similar captioning legislation in their regions.
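Word error rate, the proxy for accuracy mentioned above, is conventionally defined as the word-level edit distance (substitutions plus deletions plus insertions) between a reference transcript and the ASR hypothesis, divided by the number of reference words. A minimal sketch (the function name and sample sentences are illustrative):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") over six reference words
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Note that a lower WER does not automatically mean compliant captions: punctuation, casing, and non-speech cues are typically scored separately or not at all.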
Even with these continued reductions in transcription errors, the accuracy required by the Federal Communications Commission (FCC) in the United States and by other regulatory bodies is a challenge without the addition of human editors. Automatic transcription means that human editors can focus on the complexities of delivering perfect captions that automated transcription cannot yet handle. This includes elements that are part of the legislation but not directly related to ASR, such as non-speech elements and the formatting of captions post transcription.
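To make "the formatting of captions post transcription" concrete, here is a minimal sketch that turns timed transcript segments into SubRip (SRT) cues, one common caption format. The segment tuples and function names are invented for illustration; real pipelines also enforce line lengths, reading speed, and cue positioning.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(segments):
    """segments: list of (start_sec, end_sec, text) -> SRT document string."""
    cues = []
    for i, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)

# Non-speech elements like [applause] are added by editors, not the ASR
print(to_srt([(0.0, 2.5, "Welcome back to the show."),
              (2.5, 4.0, "[applause]")]))
```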
The words are a vital part of the transcript output; however, other elements can significantly reduce the effort of transforming transcripts into captions. The inclusion of advanced punctuation, with a rich set of punctuation characters and better capitalization, brings transcripts one step closer, even in their raw form, to a text-based representation of natural language and to North American regulation, which states that 'punctuation should be used for maximum clarity in the text'. Better punctuation also makes machine transcripts easier for human editors to read, accelerating their ability to make the required changes and create captions faster than ever while preserving the best possible accuracy.
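One concrete way punctuation reduces effort: a punctuated transcript can be split at sentence boundaries into caption-sized lines automatically, whereas every break in an unpunctuated word stream is a guess. A minimal sketch (the 42-character limit is a common captioning convention, not a quoted regulation):

```python
import re

def split_into_caption_lines(transcript: str, max_chars: int = 42):
    """Split a punctuated transcript into caption-friendly lines:
    break at sentence-ending punctuation first, then wrap long sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    lines = []
    for sentence in sentences:
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) > max_chars and current:
                lines.append(current)
                current = word
            else:
                current = candidate
        if current:
            lines.append(current)
    return lines

print(split_into_caption_lines(
    "Punctuation makes segmentation possible. Without it, every break is a guess."))
```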
The Americans with Disabilities Act (ADA), which mandates equal opportunity for persons with disabilities, also sets out that captions should preserve and identify slang and accents. While this can be handled by human editors, customization within the ASR means that a solution can be tailored to deliver the best possible accuracy with limited editing.
Enabling the fast and effortless inclusion of difficult terms (names, accents, abbreviations, acronyms, and other specialist, industry-specific or content-specific language) into the recognition model gives users the tools to take control of ASR capabilities and adapt to a diverse range of content. From documentaries and feature films to online videos and music videos, customization helps ensure that any media type can be transcribed with minimal errors, further reducing the time and heavy lifting required of humans in the captioning process.
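As one illustration of what feeding custom terms into the recognition model can look like, a custom-dictionary request might resemble the following JSON fragment. The field names follow the `additional_vocab` pattern in Speechmatics' published API, but treat the exact shape as an assumption and check the current documentation; other vendors use different schemas.

```json
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "additional_vocab": [
      { "content": "Speechmatics" },
      { "content": "gnocchi", "sounds_like": ["nyokee", "nokey"] }
    ]
  }
}
```

The optional `sounds_like` hints tell the recognizer how an unusual spelling is pronounced, which is what makes names and brand terms come out correctly without an editor's pass.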
Captioning is a complex and heavily regulated area of the media industry. It might look relatively straightforward from the outside, but there are rules that must be followed to ensure regulations are met and content is properly accessible. Automatic speech recognition technology has proven its value in the captioning process, and continued advancements in machine learning will extend that value across more languages and more challenging audio. It is for this reason that an ASR solution requires not only best-in-class accuracy but also a wide breadth of languages, customization, and the ability to deliver continued improvements to help create better captions.
Read the full 2019 State of ASR Report to learn more!
This is a guest blog written by Alex Fleming, Product Marketing Manager, at Speechmatics.