3Play Media Reshapes Video Landscape
Searchable, Interactive Transcripts Enhance Content
Kathryn M. O’Neill | MIT ILP
What if you could search a video as easily as you can search text—even click on a word in a transcript and jump right to the segment you want to watch? 3Play Media, a new company founded by four graduates of the MIT Sloan School of Management, is making it easy and affordable to create such “interactive transcripts”—technology that promises to revolutionize the way videos are watched and studied.
“I see this really as a disruptive technology,” said Laurie Everett, director of MIT World, which features full-length videos of high-profile MIT lectures. “I think we’ll look back at some point and wonder how we ever managed to work with video without captions and transcripts, because this technology is going to become ubiquitous.”
“Captions have been considered something you have to do,” said Josh Miller MBA ’09, cofounder of 3Play Media with Chris “CJ” Johnson ’02 MBA ’08, Chris Antunes MBA ’08, and Jeremy Barron MBA ’08. “Interactive transcripts add sexiness to functionality that accomplishes essentially the same thing. Rather than say, ‘You should be captioning your video,’ we can say, ‘Look how much better your user experience can be.’”
3Play’s interactive transcripts enable users to watch a video in one window and simultaneously follow along with the transcript in another—the transcript is completely synchronized, highlighting each word as it is spoken. Every word of the transcript is also a link, so users can search for a term, click on that spot in the transcript, and immediately be taken to the corresponding moment in the video.
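The mechanics described above—each word carrying a timestamp, so a click seeks the video and playback highlights the current word—can be sketched in a few lines. This is an illustrative model only; the data structure, names, and timings below are assumptions for the sketch, not 3Play's actual format.

```python
# Sketch of a time-synced transcript: each word stores its start time,
# so clicking a word can seek the player, and the player's clock can
# drive word highlighting. Illustrative data and names only.
from bisect import bisect_right

# (start_seconds, word) pairs, as a speech-alignment step might emit
transcript = [
    (0.0, "What"), (0.2, "if"), (0.35, "you"), (0.5, "could"),
    (0.8, "search"), (1.3, "a"), (1.4, "video"),
]

def seek_time(word_index):
    """Timestamp to jump the video to when a transcript word is clicked."""
    return transcript[word_index][0]

def highlight_index(current_time):
    """Index of the word being spoken at current_time, so the transcript
    can highlight each word as it is spoken."""
    starts = [t for t, _ in transcript]
    return max(0, bisect_right(starts, current_time) - 1)

print(seek_time(4))          # clicking "search" seeks to 0.8 seconds
print(highlight_index(1.0))  # at t=1.0s, "search" (index 4) is highlighted
```

The key design point is that the transcript and the video share a single timeline, so search, seeking, and highlighting are all lookups into the same word-level alignment.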
The Industrial Liaison Program (ILP) will be the first to pair videos with interactive transcripts from 3Play when it rolls out the video portion of its new website this January. All of ILP’s 2009 videos—faculty presentations from its Technology and the Corporation conference series—will be fully interactive at that time, according to Michael Lawson, ILP’s director of corporate communications.
“Video paired with interactive transcripts really underscores the value and utility of video,” Lawson said. “Video content can be very powerful, but most business executives don’t have the time to wade through a video hoping they’ll find useful content. And, up to now, there’s been no quick way to make that assessment.”
The interactive videos will allow ILP members to search lectures for relevant content and share specific segments with colleagues—making the material far more useful for busy executives, Lawson said.
Another major benefit of 3Play’s transcripts is accessibility for the hearing impaired, but accurate transcripts are also useful for non-English speakers and for those who need help with the technical terms used in some lectures. Not surprisingly, those who work with video at MIT have felt the lack of affordable transcription services keenly.
“If you have a transcript, and it’s searchable, that’s huge,” said Everett, who has been working for years to improve the accessibility of MIT World, which streams about 4,000 videos a day to audiences all over the world. (More than half of MIT World’s viewers are from outside North America.) “I literally get mail from all around the world every day. A typical email will say, ‘I am Chinese, I would like to study this, can I get a transcript?’”
Currently MIT World is only able to provide short synopses of the talks it features, said Everett, who is seeking funding to add 3Play’s interactive transcripts to the site.
Similarly, ILP previously supplied only abstracts for its talks, and indexing its videos was a major headache, Lawson said. “Something needs to be indexed for it to come up in a search,” he said, and that used to mean someone needed to watch the video and pick out keywords manually. Now, 3Play’s transcripts allow full-text searching.
But how do they do it?
While transcription services have been around for years, the process has traditionally been manual and costly.
Speech-recognition software is part of 3Play’s solution, but speech-recognition transcripts of lectures will almost always contain errors, according to James Glass, a principal research scientist at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and head of CSAIL’s Spoken Language Systems Group. “Although speech recognition output can be useful for search, for users who desire readable transcripts, there remains the need for humans to clean up speech recognition mistakes as well as hesitations, false starts, and other artifacts produced by the talker,” Glass said. “Humans also need to add capitalization and punctuation in order to produce a more readable transcript.”
“3Play’s founders know from working with Jim Glass and speech-to-text technology as it is now, that we’re at about 80 percent accuracy,” Everett said. And while researchers are working to improve the technology, that next 20 percent is especially hard, because speech recognition technology can be tripped up by specialized vocabulary, accents, false starts, even background noise, she said.
Johnson of 3Play got interested in interactive transcription while working with Glass on a project to provide search capability for academic lectures, using videos from MIT World and MIT’s OpenCourseWare. “We made a prototype that’s available where people could browse through these videos, they could enter search terms, and we’d show them where there was a hit,” Glass said, adding that the prototype also allowed people to jump to a particular point in the lecture and see a synchronized transcript. But getting transcripts of the lectures was fairly expensive, and that’s where Johnson saw a need to fill, he said.
“While everyone else is trying to innovate on 80 to 100, [3Play] tried to focus on correcting the 80 percent document to 100 percent,” Everett said. “They have used technology to innovate around the correction. That’s the first piece of the magic, and it’s a big part of the disruption.”
3Play uses its own patent-pending software to streamline a person’s ability to make the transcript 100 percent accurate. “You can’t get around the fact that a human has to review the [transcript]. The technology is designed to make that process as efficient as possible,” Antunes said.
Here’s how 3Play works, according to the company’s website. First, a computer uses automatic speech recognition technology to produce a draft transcript. Then, a professional transcriptionist uses 3Play’s proprietary software to edit every word of the transcript. And finally, a quality assurance manager conducts a final review of the transcripts and captions to ensure exceptional quality.
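The three-stage workflow above can be sketched schematically. The functions here are mocked stand-ins—the article does not describe 3Play's proprietary editing software—so this shows only the shape of the pipeline: machine draft, human correction, final review.

```python
# Schematic of the three-stage workflow: ASR draft -> human edit -> QA.
# All three stages are mocked; real systems would call a speech
# recognizer and interactive editing tools here.

def asr_draft(audio_file):
    """Stage 1: speech recognition produces a rough, error-prone draft
    (roughly 80 percent accurate, per the article). Mocked output."""
    return "what if you could serch a video"

def human_edit(draft):
    """Stage 2: a transcriptionist corrects every word. Mocked as a
    simple fix-up; real editing is interactive."""
    return draft.replace("serch", "search").capitalize()

def qa_review(edited):
    """Stage 3: a QA manager signs off before release. Mocked as a few
    basic readability checks."""
    assert edited and edited[0].isupper() and "  " not in edited
    return edited

final = qa_review(human_edit(asr_draft("lecture.wav")))
print(final)
```

The efficiency the founders describe comes from the division of labor: the machine does the bulk typing, and the human effort is concentrated on correction and review rather than transcription from scratch.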
The result is “astounding,” said Everett, who spent 10 years working in media access at WGBH before coming to MIT in 2001, and noted that the process is significantly faster than conventional transcription services. In fact, according to Johnson, 3Play is able to create interactive transcripts up to three times faster than typical transcription in Word.
Plus, 3Play’s result is much more than just a transcript, or even closed captioning. Every word is a link, time-synched to the video. Click on any word and the video jumps to that spot.
“We deal with all different speakers, all different noise conditions. The process of actually taking that document, putting it in a Word document and making it readable for a person is not a trivial problem,” Johnson said.
But that’s not the only problem 3Play addresses. The company has also found innovative ways to make the transcripts useful, Everett said. For example, the transcript can be output as a printable document, turned into a caption for a Flash player, or used to generate tagwords.
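As one illustration of reusing the same timed transcript across output formats, the snippet below renders a timed line as an SRT-style caption cue. SRT is a common caption format chosen here for illustration only; the article does not specify which formats 3Play emits beyond printable documents and Flash captions.

```python
# Render one timed transcript line as an SRT-style caption cue.
# SRT cues are: index, "HH:MM:SS,mmm --> HH:MM:SS,mmm", then the text.

def to_srt_cue(index, start, end, text):
    def fmt(t):
        # Convert seconds to HH:MM:SS,mmm
        h, rem = divmod(int(t), 3600)
        m, s = divmod(rem, 60)
        ms = int(round((t - int(t)) * 1000))
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
    return f"{index}\n{fmt(start)} --> {fmt(end)}\n{text}\n"

print(to_srt_cue(1, 0.0, 2.5, "What if you could search a video?"))
```

Because every output format is derived from one word-level alignment, captions, printable transcripts, and tagwords all stay consistent with each other and with the video.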
“Nothing that I’ve seen lets you do all this in a simple easy package like this,” said Kris Brewer, webmaster and community liaison for TechTV, MIT’s user-generated content website. Brewer plans to implement 3Play’s interactive transcripts within the next few months to provide more accessibility as well as new features—such as allowing TechTV’s users to create “video mashups,” a customized sampling of video clips that maintains references to the original videos, so viewers can follow links to view more.
For ILP, the goal is to make MIT research more accessible to its corporate members, allowing one representative to reference key portions of a video and disseminate that throughout the company. But ILP is just the first to roll out interactive transcripts, Lawson said; others at MIT are already lined up to be next.
“If ILP has these features on the videos it will help all of us to show that this is how it works, this is what we need to do,” said Everett, explaining that proving the usefulness of interactive transcripts will help her and others at MIT secure funding. “We all want to get this done. There is no priority higher than this one.”
Copyright © 2009,2010 Massachusetts Institute of Technology