Everything You Need to Know About HTML5 Video Captioning

Updated: October 23, 2018

HTML5 video captioning makes accessibility a whole lot easier.

HTML is the markup language used to render almost every page on the web. HTML5 is the latest version, and it’s replete with incredibly useful features, including a universal video standard that lets developers add video to a web page without using any third-party plugins. The new standard also makes it much easier to publish accessible video with closed captioning.

How HTML5 Improves Video Accessibility on the Web

Most browsers have adopted the basic video features offered by HTML5, as have popular media players like Video.js and JW Player. Some browsers offer better video accessibility than others.

Best of all, HTML5 video offers a standardized caption format that works across modern desktop browsers.

Why Was Video Captioning in HTML so Difficult?

In older versions of HTML, there was no standard for rendering a video on a web page. Almost all videos were shown through plugins, like QuickTime, Silverlight, or RealPlayer.

Without standardization, you ran into compatibility conflicts across different browsers and devices. And although web publishers tried to build redundancies and fallback provisions to maximize compatibility, it was practically impossible to publish video that worked universally.

As a consequence, publishing closed captions was difficult and unreliable because both the caption format and the encoding method depended on the video publishing technology used.

How HTML5 Video Captioning Works

HTML5 is a major step forward for standardizing video across web browsers and devices and thus simplifying closed captioning. The idea is that web video will be based on an open, universal standard that works everywhere.

HTML5 natively supports video without the need for third-party plugins. A video can be added to a web page using the video element, which makes it almost as simple as adding an image.

The track element can be used to display closed captions, subtitles, audio descriptions, chapter markers, or other timed-text data.

The HTML code below shows how these elements work:

<video width="320" height="240" controls>
<source type="video/mp4" src="my_video_file.mp4" />
<track src="captions_file.vtt" label="English captions" kind="captions" srclang="en-us" default />
</video>

The attributes of the track element work like this:

src – specifies the URL location of the caption file
label – specifies the title of the track
kind – specifies the type of time-aligned text. The options are captions, subtitles, chapters, descriptions, or metadata.
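srclang – specifies the language of the track text as a language tag, such as en or es
default – specifies that this track is enabled by default. Note that a video can contain multiple track elements, for example one per language, but only one caption or subtitle track should be marked as default.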
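
If your pages are generated by a script or template, the attributes above can be assembled programmatically. The following is a minimal Python sketch, not part of any library: track_tag is a hypothetical helper name, and the file name and label are the ones used in the example above.

```python
from html import escape

# Allowed values for the kind attribute, as listed above.
TRACK_KINDS = {"captions", "subtitles", "chapters", "descriptions", "metadata"}


def track_tag(src, label, kind="captions", srclang="en", default=False):
    """Return a <track> element as a string (hypothetical helper)."""
    if kind not in TRACK_KINDS:
        raise ValueError(f"unsupported track kind: {kind!r}")
    attrs = [
        f'src="{escape(src, quote=True)}"',
        f'label="{escape(label, quote=True)}"',
        f'kind="{kind}"',
        f'srclang="{escape(srclang, quote=True)}"',
    ]
    if default:
        # default is a boolean attribute: its presence alone enables the track.
        attrs.append("default")
    return "<track " + " ".join(attrs) + ">"


print(track_tag("captions_file.vtt", "English captions", srclang="en-us", default=True))
```

Escaping the attribute values keeps a label that contains quotes or ampersands from breaking the markup.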

What is the Standard Caption Format for HTML5 Video?


For several years, two caption formats competed for dominance in HTML5 video. In part, this is because there are two groups collaborating on HTML5: the Web Hypertext Application Technology Working Group (WHATWG) and the World Wide Web Consortium (W3C).

WHATWG developed and proposed the WebVTT (Web Video Text Tracks) caption format, a user-friendly text format that consists of optional cue identifiers, cue timings, and text with formatting options. WebVTT is similar to the widely established SRT format, but accommodates text formatting, positioning, and rendering options (pop-up, roll-on, paint-on).

W3C proposed using TTML (Timed Text Markup Language), which is a widely established XML format supported in Adobe Flash and Microsoft Silverlight.

Eventually, W3C decided to support WebVTT. WebVTT is now the accepted standard format for HTML5 video.

WebVTT Caption Format

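The WebVTT caption format is a UTF-8 text file with a .vtt extension. The file begins with the header “WEBVTT” (optionally followed by a space and a short description, as in “WEBVTT FILE”) and a blank line, followed by cues and their corresponding text. Each cue is a timing line with the text beneath it, and cues are separated by blank lines.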

There are several parameters that allow you to control the line position, text position, and alignment. You can also add styling to the text within the cue itself. The example below demonstrates a bold <b> element.


WEBVTT FILE

00:00:13.000 --> 00:00:16.100
<b>ARNE DUNCAN:</b> I'll start and
then turn it over to you.

00:00:16.100 --> 00:00:20.100
It's so critically important
that parents be actively engaged
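
If you are creating a caption file yourself, a short script can produce the structure described above. This is a minimal Python sketch under stated assumptions: format_timestamp and build_webvtt are hypothetical helper names, cue times are given in seconds, and the cue settings string (here line:0 align:start) is written on the timing line after the end time.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues):
    """Build WebVTT text from (start, end, text, settings) tuples.

    settings is an optional string of cue settings, or None.
    """
    blocks = ["WEBVTT"]
    for start, end, text, settings in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        if settings:
            timing += " " + settings
        blocks.append(timing + "\n" + text)
    # The header and each cue are separated by one blank line.
    return "\n\n".join(blocks) + "\n"


cues = [
    (13.0, 16.1, "<b>ARNE DUNCAN:</b> I'll start and\nthen turn it over to you.", "line:0 align:start"),
    (16.1, 20.1, "It's so critically important\nthat parents be actively engaged", None),
]
print(build_webvtt(cues))
```

Save the result with UTF-8 encoding and a .vtt extension so browsers can read it as a caption track.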

How to Add Captions to an HTML5 Video

  1. First, you’ll need a WebVTT caption file. You can create one yourself, convert an existing caption file, or hire a professional captioning company.
  2. Upload a copy of your caption file to the same folder as your video file.
  3. Add a track tag inside the video’s HTML code. You can include information for:
    • src – the URL location of the caption file on your server
    • label – the title of the track as it displays for the viewer
    • kind – the type of time-aligned text. The options are captions, subtitles, chapters, descriptions, or metadata.
    • srclang – the language of the track text, given as a language tag such as en or es
    • default – makes this track enabled by default. Note that a video can contain multiple track elements, but only one caption or subtitle track should be marked as default.
  4. Here is an example of what your code could look like for a video with both English captions and Spanish subtitles:

    <video width="320" height="240" controls>
    <source type="video/mp4" src="/my_video_file.mp4">
    <track src="/captions_file.vtt" label="English" kind="captions" srclang="en-us" default>
    <track src="/Spanish_captions_file.vtt" label="Spanish" kind="subtitles" srclang="es">
    </video>
  5. Save your changes. Your video should now display a CC icon.
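
For step 1, if you already have captions in SRT format, the conversion to WebVTT is mostly mechanical: add the header and change the comma in each timestamp to a period. The Python sketch below illustrates the idea under simplifying assumptions. The name srt_to_webvtt is hypothetical, and the function does not handle every variation of SRT found in the wild, so check the output in a browser before publishing.

```python
import re

# SRT timestamps use a comma before the milliseconds: 00:00:13,000
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text):
    """Convert SRT caption text to WebVTT text (simplified sketch)."""
    # Drop a byte order mark if present and normalize line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    out_lines = []
    for line in text.split("\n"):
        if "-->" in line:
            # WebVTT uses a period instead of a comma in timestamps.
            line = SRT_TIME.sub(r"\1.\2", line)
        out_lines.append(line)
    body = "\n".join(out_lines).strip("\n")
    # The SRT sequence numbers are kept; WebVTT reads them as cue identifiers.
    return "WEBVTT\n\n" + body + "\n"


sample_srt = (
    "1\n"
    "00:00:13,000 --> 00:00:16,100\n"
    "I'll start and\n"
    "then turn it over to you.\n"
)
print(srt_to_webvtt(sample_srt))
```

Write the converted text to a file with a .vtt extension, encoded as UTF-8, and reference that file in the src attribute of the track tag.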

This article was originally published as “HTML5 Video: How Captioning Works” on November 11, 2015, by Emily Griffin. It has since been updated.
