Speech Recognition Gaffe of the Week: Bike Police Caption Fail

March 15, 2013, by Shannon K. Murphy
Updated: January 4, 2018

Once again, we revisit Rhett and Link for another hilarious caption fail.

The video is funny, yes. But it's also the perfect opportunity to discuss why YouTube's auto captions have inaccuracies, many of which come down to characteristics of the audio.

Sound Quality: As you watch this video, you'll notice the errors start right off the bat, largely because of police sirens in the background. It's no surprise YouTube's speech recognition software got this wrong; even humans sometimes struggle to understand speech over loud background noise. This is one of the reasons captions and subtitles are helpful to us all.

Speech Quality: Fast speech, accents, or words slurred together without clear enunciation often cause problems for speech recognition.

Complex Vocabulary: At one point, Link quickly spouts off the phrase "bullet propulsion devices." Since this is an unusual way of saying the simpler and more apt "gun," it's no wonder the system doesn't recognize the jargon.

When a customer uploads a video to 3Play Media, we also run it through speech recognition, but then a professional transcriptionist reviews every word. After this round of editing, a quality assurance manager conducts a secondary review, researching difficult words and checking punctuation. This is how we’re able to achieve 99%+ accuracy for transcripts and captions, despite cases of poor audio quality, multiple speakers, difficult content, and accents. We try not to pick on YouTube too much. After all, they don’t have our great team! While our advanced technology enables competitive prices, it’s our stringent, multi-step human review that delivers quality.

Learn more about our Transcription Process

Read the free report: 2017 State of Captioning.
