Dog Training and Machine Learning: What They Have In Common
Updated: May 19, 2021
Although sometimes it seems we’re eerily close, machines haven’t replaced us yet. Yes, machines can make faster and more complex decisions, but it’s pretty easy to break one. Also, machines still can’t process logic that they haven’t been taught.
Try out some unexpected questions on your favorite voice assistant. A 2018 study found that Amazon’s Alexa attempted answers to just over 50 percent of the questions it was asked, and about 80 percent of those answers were correct. Amazon started crowdsourcing answers for Alexa from users in 2018, and in 2019, answer quantity went up while measured quality took a more subjective turn. One user responded to the question “How do dolphins breed?” with “Dolphins are mammals and breathe with the lungs,” presumably assuming “breed” was meant to be “breathe.”
Andrew Ng recently affirmed that machine learning models may shine on curated test sets yet struggle on applications beyond a controlled environment. “So even though at a moment in time, on a specific data set, we can show this works, the clinical reality is that these models still need a lot of work to reach production… All of AI, not just healthcare, has a proof-of-concept-to-production gap,” Ng says.
Training Artificial Intelligence
Artificial intelligence (AI) can’t teach itself yet – and a recent article from the Harvard Business Review asserts that the secret to AI is people, seemingly underscoring Ng’s point with a similar theme. From personal experience, including a recent tour in adtech and now the video accessibility tech world, this rings true. Both solution spaces are highly dependent on AI – specifically machine learning – to offer value at scale. Machine learning (ML) applications in adtech include optimization of media and consumer pairings, identity, fraud detection and audience propensity, to name a few. All require training or a “truth set”.
Machine learning applications in video accessibility are equally diverse, with the most obvious being automated speech recognition (ASR). 3Play incorporates machine learning in a myriad of processes, including determining expected transcription job difficulty, flagging likely errors in a transcript, and automatically training customer-specific language models with continuously updated truth sets. 3Play has been training this process for 13 years, which starts to explain our position as the premium service provider in the captioning and video accessibility space.
Allow me to expand on the “secret to AI is people,” especially regarding transcription. Training in any capacity (whether it be training machine learning, training a new puppy, or training for a marathon) isn’t always easy. Just this morning, in fact, my dog, Fluffy, chewed up a new carpet in my dining room. No kidding. I needed to take the opportunity to teach Fluffy that chewing on the carpet is not appreciated, in hopes he might be discouraged. This, in theory, is not so different from training AI. If AI mangles an accented speaker’s dialogue, or struggles through obscure or specialized terminology, the training set must be updated for that model to learn and handle similar challenges in the future. The fuller and higher the quality of the training set, the more effectively the model learns.
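As a rough illustration of that feedback loop, here is a minimal sketch of recording human-verified corrections into a truth set for later retraining. All names and structure here are hypothetical examples, not 3Play’s actual pipeline:

```python
# Hypothetical sketch: collecting (model output, verified text) pairs
# into a truth set so a model can be retrained on fuller, higher-quality data.

def update_truth_set(truth_set, asr_output, human_correction):
    """Record a correction pair whenever the human edit differs from ASR output."""
    if asr_output != human_correction:
        truth_set.append({"hypothesis": asr_output,
                          "reference": human_correction})
    return truth_set

truth_set = []
update_truth_set(truth_set, "how do dolphins breed",
                 "how do dolphins breathe")
print(len(truth_set))  # one correction recorded
```

In a real system the stored pairs would feed periodic retraining; the point is simply that every human edit becomes training signal rather than a one-off fix.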
3Play Media & Artificial Intelligence
As you may or may not know, 3Play transcription has refined the same fundamental process for caption production since 2008. ASR, and especially 3Play’s application of ASR, has since improved significantly, in part because we’ve directly trained it, and in part because we’ve augmented general ASR training with customer-specific mappings of common corrections via bespoke and proprietary post-ASR process models.
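To give a sense of what a customer-specific correction mapping could look like, here is a deliberately simplified sketch. The mapping contents and function names are hypothetical illustrations, not 3Play’s proprietary models:

```python
# Illustrative sketch: a post-ASR pass that replaces known common
# misrecognitions with customer-approved terms. Real systems would be
# context-aware; this shows only the basic idea of a correction map.

CUSTOMER_CORRECTIONS = {
    "three play": "3Play",
    "a s r": "ASR",
}

def apply_corrections(text, corrections):
    """Apply each known correction to the raw ASR text."""
    for wrong, right in corrections.items():
        text = text.replace(wrong, right)
    return text

print(apply_corrections("three play uses a s r", CUSTOMER_CORRECTIONS))
# → "3Play uses ASR"
```

A naive string replacement like this can over-correct; that is one reason learned post-processing models, rather than flat lookup tables, are worth the investment.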
Running automation on files to produce text is easy – that’s why AI generated captions are cheap, sometimes free, and worth every cent.
Training machine learning models correctly is hard and expensive – just like training a puppy. That’s why you should care that 3Play patented our editing training and contractor processes. Not just because we’ve been doing it longer (we have) and are objectively, consistently producing the highest-accuracy output (we are), but because both the volume and quality of human-edited training corrections matter. That dolphins don’t breed with their lungs is exactly the kind of detail we want right in captions that reflect directly on a brand, school, or program. 3Play Media is detecting words (not pneumonia) in captioning content that teaches people critical skills, perhaps even how to detect pneumonia, and you probably don’t want to be treated by someone who had bad captions on that lesson. Accuracy matters.
3Play invented the original hybrid machine-human transcription process in 2008, combining ASR and AI with human editing and quality assurance (QA) review. We’ve filed multiple patents describing this process and enhancements to it every year from 2011 to 2021, and we remain busy. The rise of the marketplace model enabled 3Play to articulate and file a patent application for our contractor job market in 2011. Our contractors, along with 3Play technology, have been training our technology-driven process to improve each year for 13 years. That’s 5 years earlier than most newer market entrants began developing a product.
Improving on 99.6% average transcript accuracy is challenging. 3Play devotes an entire third of our process to raising transcript accuracy from 98% to 99.6%, and we’re currently running multiple machine learning models and tooling experiments to push it higher. The last 0.4% can be subjective, trivial, or simply a formatting preference as language and communication continue to evolve. A machine alone won’t get us all the way there, just as dogs won’t train themselves anytime soon, but we should expect the right tech-enabled processes to continue making real gains.
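For context on what figures like 98% and 99.6% mean, transcript accuracy is commonly derived from word error rate (WER), which counts word substitutions, insertions, and deletions against a reference transcript. A minimal, self-contained sketch:

```python
# Minimal sketch: word accuracy as 1 - WER, computed with a standard
# dynamic-programming edit distance over words.

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i ref words and first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

ref = "dolphins are mammals and breathe with their lungs"
hyp = "dolphins are mammals and breed with the lungs"
accuracy = 1 - word_error_rate(ref, hyp)
print(round(accuracy, 3))  # 2 word errors over 8 words -> 0.75
```

Note how two wrong words out of eight already drop accuracy to 75%; at caption scale, closing the gap from 98% to 99.6% means eliminating four of every five remaining errors.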
This blog post was written by John Slocum, Vice President of Product at 3Play Media.