
HTML5 Video Captioning


HTML is the markup language used to render almost every page on the web. HTML5 is the latest version, and it adds many useful features, including a native video standard that lets developers add video to a web page without third-party plugins such as Flash. The new standard also makes it much easier to publish accessible video through closed captioning.

This article provides an overview of how HTML5 will improve and standardize accessible video through captioning. Although HTML5 is still evolving, most browsers have already adopted the basic video features. The hope is that we will also be able to converge on a single web captioning format. Although we’re not quite there yet, this article examines the two caption formats being considered.

Why is video captioning so difficult in HTML?

In the current version of HTML, there is no standard for showing a video on a web page. Almost all videos are shown through plugins, like Flash, QuickTime, Silverlight, and RealPlayer. The problem with this approach is that there is no standardization across different browsers and devices. And although web publishers try to build redundancies and fallback provisions to maximize compatibility, it’s practically impossible to publish video that works universally. As a consequence, publishing closed captions has been difficult and unreliable because both the caption format and encoding method depend on the video publishing technology used.

How does HTML5 simplify web video and accessibility?

HTML5 is a major step forward for standardizing video across web browsers and devices, and thus for simplifying closed captioning. The idea is that web video will be based on an open, universal standard that works everywhere. HTML5 natively supports video without the need for third-party plugins. A video can be added to a web page using the video element, which makes it almost as simple as adding an image. The track element, placed inside the video element, can then be used to display closed captions, subtitles, text video descriptions, chapter markers, or other time-aligned metadata.

The HTML code below shows how these elements work:

<video controls width="320" height="240">
  <source type="video/mp4" src="my_video_file.mp4">
  <track src="captions_file.vtt" label="English captions" kind="captions" srclang="en-us" default>
</video>

The attributes of the track element work like this:

src – specifies the name and location (URL) of the captions or subtitles file
label – specifies the user-visible title of the track
kind – specifies the type of time-aligned text. The options are captions, subtitles, chapters, descriptions, or metadata.
srclang – specifies the language of the track text as a language tag, such as en-us
default – specifies that this track is enabled by default. Note that a single video element can contain multiple track elements, for example one per language.
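
As a rough sketch of how several track elements can sit inside one video element, the markup below pairs English captions with Spanish subtitles and a chapters track. All file names, and the Spanish and chapters tracks themselves, are hypothetical placeholders rather than files referenced in this article, and actual behavior depends on the browser's support for the track element.

```html
<!-- All file names below are hypothetical placeholders. -->
<video controls width="320" height="240">
  <source type="video/mp4" src="my_video_file.mp4">
  <!-- English captions, enabled by default. At most one captions or
       subtitles track should carry the default attribute. -->
  <track src="captions_en.vtt" kind="captions" srclang="en" label="English captions" default>
  <!-- Spanish subtitles; srclang is required when kind is "subtitles". -->
  <track src="subtitles_es.vtt" kind="subtitles" srclang="es" label="Subtítulos en español">
  <!-- Chapter markers for navigation; these are not rendered as captions. -->
  <track src="chapters_en.vtt" kind="chapters" srclang="en" label="Chapters">
</video>
```

In a browser that supports the track element, the label values are what a viewer would typically see in the player's caption menu, which is why each track gets a distinct label.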

Will HTML5 include a standard caption format?

Currently, there are two competing caption formats being considered. In part, this is because there are two groups collaborating on HTML5: the Web Hypertext Application Technology Working Group (WHATWG) and the World Wide Web Consortium (W3C).

WHATWG has developed and proposed the WebVTT (Web Video Text Tracks) caption format, which is a new, human-readable text format that consists of optional cue identifiers (often line numbers), time ranges, and text with formatting options. WebVTT is similar to the widely established SRT format, but accommodates text formatting, positioning, and rendering options (pop-up, roll-on, paint-on).

W3C has proposed using TTML (Timed Text Markup Language), which is a widely established XML format supported in Adobe Flash and Microsoft Silverlight and used by sites like Netflix and Hulu.

To show how the two caption formats work, Microsoft built an HTML5 captioning prototype that demonstrates both formats in HTML5.

3Play Media has been participating in the development of captioning standards through the Web Media Text Tracks Community Group, which was created to advance this area of HTML5 and improve web captioning solutions.

Although the current HTML5 spec supports both caption formats, it appears that the WebVTT format is gaining ground on TTML. The hope is that we will converge on a single caption format, which would greatly simplify the process of publishing accessible video.

WebVTT caption format

A WebVTT caption file is a plain text file with a .vtt extension. The file begins with the header “WEBVTT” (optionally followed by text on the same line, as in “WEBVTT FILE”), then a blank line, then a series of cues, each made up of a timing line and its corresponding text. Optional cue settings allow you to control the line position, text position, and alignment. You can also add styling to the text within the cue itself. The example below demonstrates a bold <b> element.


WEBVTT FILE

00:00:13.000 --> 00:00:16.100
<b>ARNE DUNCAN:</b> I'll start and
then turn it over to you.

00:00:16.100 --> 00:00:20.100
It's so critically important
that parents be actively engaged

TTML caption format

<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body><div>
    <p begin="00:00:13.00" end="00:00:16.10">
      ARNE DUNCAN: I'll start and then turn it over to you.
    </p>
    <p begin="00:00:16.10" end="00:00:20.10">
      It's so critically important that parents be actively engaged
    </p>
  </div></body>
</tt>

When will the HTML5 video captioning features be ready for web-wide use?

The W3C and WHATWG have developed specifications for how video and captions should work in browsers. Although these standards are still being refined, it’s now up to the browser developers (Microsoft, Google, Mozilla, and Apple) to adopt these standards and build in the functionality. That will take some time. Although there appears to be broad consensus around video standardization, there are still some open issues hampering universal adoption. The reality is that browser developers have their own technical, legal, and business agendas.

Although the new <video> element is already supported by most browsers, there has been no consensus on a single video format (MP4, WebM, and Ogg are being considered). Also, most of the advanced video features are not yet ready for use. Unfortunately this includes the <track> element, which is required to publish captions and subtitles.
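
Until a single video format wins out, one common workaround is to list the same video in several formats and let the browser pick. The markup below is a minimal sketch of that approach, assuming the video has already been encoded three times. The file names are hypothetical placeholders, and the captions track only takes effect in browsers that support the track element.

```html
<!-- All file names below are hypothetical placeholders. -->
<video controls width="320" height="240">
  <!-- The browser uses the first source whose format it can play. -->
  <source type="video/mp4" src="my_video_file.mp4">
  <source type="video/webm" src="my_video_file.webm">
  <source type="video/ogg" src="my_video_file.ogv">
  <track src="captions_file.vtt" label="English captions" kind="captions" srclang="en-us" default>
  <!-- Fallback content, shown only by browsers without video element support. -->
  <p>Your browser does not support HTML5 video.
     <a href="my_video_file.mp4">Download the video</a> instead.</p>
</video>
```

Listing sources this way costs extra encoding and storage, but it keeps the page markup identical across browsers.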

On May 25, 2011, the W3C announced “Last Call”, which was an invitation for communities inside and outside of W3C to provide feedback on whether the HTML5 technical requirements have been satisfied. The target for releasing HTML5 as a W3C Recommendation was set for 2014, and the hope is that it will gain web-wide adoption over the subsequent few years.
