HTML5 Video: How Captioning Works

November 11, 2015 BY EMILY GRIFFIN
Updated: January 4, 2018

HTML is the markup language used to render almost every page on the web. HTML5 is the latest version, and it’s replete with incredibly useful features, including a universal video standard that lets developers add video to a web page without using any third-party plugins, such as Flash. The new standard also makes it much easier to publish accessible video with closed captioning.

How HTML5 Improves Video Accessibility on the Web

Most browsers have adopted the basic video features offered by HTML5, as have popular cloud-based media players like Video.js and JW Player. Some browsers offer better video accessibility than others.

Best of all, HTML5 video offers a standardized caption format that is supported across the major desktop browsers.

Why Was Video Captioning in HTML so Difficult?

In older versions of HTML, there was no standard for rendering a video on a web page. Almost all videos were shown through plugins, like Flash, QuickTime, Silverlight, or RealPlayer.

Without standardization, you ran into compatibility conflicts across different browsers and devices. And although web publishers tried to build redundancies and fallback provisions to maximize compatibility, it was practically impossible to publish video that worked universally.

As a consequence, publishing closed captions was difficult and unreliable because both the caption format and the encoding method depended on the video publishing technology used.

How HTML5 Captioning Works

HTML5 is a major step forward for standardizing video across web browsers and devices, and thus simplifying closed captioning. The idea is that web video will be based on an open, universal standard that works everywhere.

HTML5 natively supports video without the need for third-party plugins. A video can be added to a web page using the video element, which makes it almost as simple as adding an image.

The track element can be used to display closed captions, subtitles, audio descriptions, chapter markers, or other timed-text data.

The HTML code below shows how these elements work:

<video width="320" height="240" controls>
  <source type="video/mp4" src="my_video_file.mp4" />
  <track src="captions_file.vtt" label="English captions" kind="captions" srclang="en-us" default />
</video>

The attributes of the track element work like this:

src – specifies the URL location of the caption file
label – specifies the title of the track
kind – specifies the type of time-aligned text. The options are captions, subtitles, chapters, descriptions, or metadata.
srclang – specifies the language
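default – specifies that this track is enabled by default. Note that a video element can contain multiple track elements (for example, one per language), but no more than one captions or subtitles track should carry the default attribute.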
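
When a video carries several track elements, a script can choose which one is displayed through the text track objects the browser exposes on the video element. The sketch below is an illustration only and not part of the markup above: `TrackLike` and `showTrack` are hypothetical names, and the function relies solely on the `kind`, `language`, and `mode` properties of the standard `TextTrack` interface.

```typescript
// Minimal shape of the properties used here. The browser's TextTrack
// objects (available as video.textTracks) expose these same fields.
interface TrackLike {
  kind: string;                              // "captions", "subtitles", "chapters", ...
  language: string;                          // value of the track's srclang attribute
  mode: "disabled" | "hidden" | "showing";   // current display state
}

// Hypothetical helper: show the first caption or subtitle track in the
// requested language and disable every other caption/subtitle track.
// Returns true if a matching track was found.
function showTrack(tracks: ArrayLike<TrackLike>, lang: string): boolean {
  const wanted = lang.toLowerCase();
  let found = false;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track.kind !== "captions" && track.kind !== "subtitles") {
      continue; // leave chapters, descriptions, and metadata untouched
    }
    const matches = !found && track.language.toLowerCase() === wanted;
    track.mode = matches ? "showing" : "disabled";
    if (matches) {
      found = true;
    }
  }
  return found;
}
```

In a browser, this would be called with the track list of a video element, for example `showTrack(video.textTracks, "en-us")`, where `video` is assumed to be a reference to the element on the page.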

What is the Standard Caption Format for HTML5 Video?

For several years, two caption formats competed for dominance in HTML5 video. In part, this is because there are two groups collaborating on HTML5: the Web Hypertext Application Technology Working Group (WHATWG) and the World Wide Web Consortium (W3C).

WHATWG developed and proposed the WebVTT (Web Video Text Tracks) caption format, a user-friendly plain-text format that consists of optional cue identifiers, timecodes, and caption text with formatting options. WebVTT is similar to the widely established SRT format, but accommodates text formatting, positioning, and rendering options (pop-up, roll-on, paint-on).

W3C proposed using TTML (Timed Text Markup Language), a widely established XML format supported in Adobe Flash, Microsoft Silverlight, and Microsoft Office Mix.

Eventually, W3C decided to support WebVTT. WebVTT is now the accepted standard format for HTML5 video.
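
Because WebVTT is so close to SRT, existing SRT caption files can usually be converted with very little work. The sketch below is a minimal illustration under stated assumptions: `srtToVtt` is a hypothetical helper name, the input is assumed to be well-formed SRT, and styling or positioning information is not translated. It relies on two differences between the formats: a WebVTT file must start with the WEBVTT header, and WebVTT timecodes use a period rather than a comma before the milliseconds. The SRT sequence numbers are left in place, where they act as cue identifiers.

```typescript
// Hypothetical helper: convert SubRip (SRT) caption text to WebVTT.
// Assumes well-formed SRT input; styling and positioning are not handled.
function srtToVtt(srt: string): string {
  const body = srt
    .replace(/^\uFEFF/, "")   // drop a byte-order mark if present
    .replace(/\r\n?/g, "\n")  // normalize line endings
    // On timing lines only, swap the SRT comma for the WebVTT period.
    .replace(
      /^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})/gm,
      "$1.$2 --> $3.$4"
    )
    .trim();
  // A WebVTT file starts with the WEBVTT header followed by a blank line.
  return "WEBVTT\n\n" + body + "\n";
}
```

Restricting the replacement to lines that begin with a timecode pair keeps commas inside the caption text untouched.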

WebVTT Caption Format

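The WebVTT caption format is a plain-text file with a .vtt extension. The file begins with the header “WEBVTT” (optionally followed by descriptive text on the same line, as in “WEBVTT FILE”), then a blank line, then the cues and their corresponding text.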

There are several parameters that allow you to control the line position, text position, and alignment. You can also add styling to the text within the cue itself. The example below demonstrates a bold <b> element.

WEBVTT

00:00:13.000 --> 00:00:16.100
<b>ARNE DUNCAN:</b> I'll start and
then turn it over to you.

00:00:16.100 --> 00:00:20.100
It's so critically important
that parents be actively engaged
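
A file in this format can also be generated from caption data rather than written by hand. The following is a minimal sketch, assuming cue times are available in seconds; the `Cue` shape and the `formatTimestamp` and `buildVtt` helpers are hypothetical names introduced for illustration. Cue text is written out as-is, so it may contain WebVTT markup such as a <b> tag.

```typescript
// One caption cue: start and end times in seconds, plus the text to display.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Format a time in seconds as a WebVTT timestamp, hh:mm:ss.ttt.
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "." + pad(ms, 3);
}

// Build a complete WebVTT document: the header, then one block per cue,
// with a blank line between blocks.
function buildVtt(cues: Cue[]): string {
  const blocks = cues.map(
    (cue) =>
      formatTimestamp(cue.start) + " --> " + formatTimestamp(cue.end) + "\n" + cue.text
  );
  return ["WEBVTT"].concat(blocks).join("\n\n") + "\n";
}
```

Calling `buildVtt` with two cues that start at 13 and 16.1 seconds produces a file with the same layout as the example above.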
