Introducing Vertical Caption (or Subtitle) Placement

March 24, 2014 BY ROGER ZIMMERMAN
Updated: June 3, 2019

What Is Vertical Caption (or Subtitle) Placement?

Vertical caption (or subtitle) placement is our patent-pending solution to a common problem: some videos have text at the bottom of the screen that is important for understanding the video. For example, a documentary may include the names and titles of interview subjects as they are first introduced. An “auction” reality TV show may display the current price of an item as it is bid up by the participants. Or an engineering lecture may show a formula near the bottom of the screen.

Vertical Caption Placement Demo

In each of these cases, placing a caption at the bottom of the frame, as is usually done, would obscure text that is important for the viewer to see. Instead, the captions should be placed at the top of the screen during these time periods (as long as the top of the screen does not also contain such text). It is also important for such placed captions to be vertically “stable”: they should not jump around in the middle of a sentence, and they should stay in one position long enough that their movement does not overly distract the viewer.

Our vertical caption placement functionality meets all of these requirements, and does so completely automatically, cost-effectively, and as part of the standard captioning and subtitling workflow.

Play the video below to see a demonstration of vertical caption placement.

How Does It Work?

The caption placement algorithm works in several stages:

First, every frame of the video is examined by a text detection algorithm that uses a statistical model to determine the likelihood that text exists at the bottom of the frame. This statistical model was developed using many thousands of hand-labeled video frames and is designed to identify regions where letter-like patterns appear. The algorithm does not attempt to “read” the text. It is interested only in determining the probability that text-like patterns are present in the search region.

Next, the algorithm searches across time, comparing the vertical pixel location of the purported text regions for every frame. The algorithm then computes a time-dependent probability for the text, taking into account the per-frame probabilities as well as the positional stability across frames.
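
To make the idea of a time-dependent probability concrete, here is a minimal sketch of one way per-frame probabilities and positional stability could be combined over a window of frames. It is not our production algorithm: the function name, the `max_row_jitter` tolerance, and the choice of multiplying an average probability by a stability factor are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def window_text_probability(frame_probs, frame_rows, max_row_jitter=4.0):
    """Combine per-frame text probabilities with positional stability.

    frame_probs:    per-frame probability (0..1) that text is present.
    frame_rows:     per-frame vertical pixel position of the candidate text.
    max_row_jitter: hypothetical tolerance, in pixels, for how much the
                    text position may wander before it counts as unstable.
    """
    if not frame_probs or len(frame_probs) != len(frame_rows):
        raise ValueError("need one row position per frame probability")

    avg_prob = mean(frame_probs)

    # Burned-in text tends to sit at the same row from frame to frame,
    # so a large spread in row position lowers the combined score.
    jitter = pstdev(frame_rows)
    stability = max(0.0, 1.0 - jitter / max_row_jitter)

    return avg_prob * stability
```

With this sketch, three frames at probability 0.9 and an identical row position keep a score of about 0.9, while the same probabilities with rows at 400, 300, and 420 drop to 0.0 because the candidate region is not positionally stable.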

Then, if the time-dependent text probability is high enough, the algorithm checks for burned-in text at the top of the video, over the same time region. The bottom and top text probabilities are compared: if the bottom probability is higher, captions are moved to the top of the video; if the top probability is higher, the caption is left in the bottom position. Usually, the top probability is very close to zero because it is uncommon for text to exist at the top.
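
The comparison described here can be sketched as a small decision function. The `threshold` value of 0.5 is a placeholder, since the actual cutoff for "high enough" is not stated, and treating a tie as "leave at the bottom" is likewise an assumption.

```python
def choose_position(bottom_prob, top_prob, threshold=0.5):
    """Return "top" or "bottom" for a time region.

    bottom_prob: time-dependent probability of text at the bottom.
    top_prob:    time-dependent probability of text at the top.
    threshold:   hypothetical cutoff for "high enough" bottom probability.
    """
    # Below the cutoff there is no reason to move the caption at all.
    if bottom_prob < threshold:
        return "bottom"

    # Move to the top only when the bottom is the more likely home of
    # burned-in text; ties stay at the bottom (an assumption).
    return "top" if bottom_prob > top_prob else "bottom"
```

For example, `choose_position(0.9, 0.05)` yields `"top"`, while `choose_position(0.7, 0.95)` yields `"bottom"` because the top of the frame is more likely to contain text.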

Finally, the algorithm applies time and continuity constraints to all time regions where captions are to be repositioned to the top of the video. In particular, if any part of a sentence is to be placed at the top, then all captions in that sentence will also be placed at the top. Or, if a sentence is very short, such that the captions would jump back and forth between top and bottom locations, the algorithm may choose to leave the caption at the more common location (e.g., instead of going top-bottom-top, it may resolve to do top-top-top).
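
The two continuity rules above might look like the following in code. This is an illustrative sketch only: the input layout (one dictionary per sentence), the `min_seconds` cutoff for a "very short" sentence, and the rule of flipping a short sentence only when both neighbors agree are assumptions, not a description of the real implementation.

```python
def stabilize(sentences, min_seconds=2.0):
    """Return one position ("top" or "bottom") per sentence.

    sentences:   list of dicts with keys
                 "duration" (seconds) and
                 "caption_positions" (list of "top"/"bottom", one per caption).
    min_seconds: hypothetical cutoff below which a sentence is "very short".
    """
    # Rule 1: if any caption in a sentence goes to the top,
    # every caption in that sentence goes to the top.
    positions = [
        "top" if "top" in s["caption_positions"] else "bottom"
        for s in sentences
    ]

    # Rule 2: a very short sentence sandwiched between two sentences
    # that share the other position adopts that position, so
    # top-bottom-top becomes top-top-top.
    smoothed = list(positions)
    for i in range(1, len(sentences) - 1):
        prev_pos, next_pos = positions[i - 1], positions[i + 1]
        is_short = sentences[i]["duration"] < min_seconds
        if is_short and prev_pos == next_pos and positions[i] != prev_pos:
            smoothed[i] = prev_pos

    return smoothed
```

Given a five-second sentence with one top caption, a one-second sentence at the bottom, and another five-second sentence at the top, the sketch returns top for all three.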

The vertical caption placement process is applied to the core timed text document, associating the top and bottom placement indicators with each frame. When you request an output format that supports caption (or subtitle) placement (see below), the core document is converted to that format, with the placement information translated in a format-compatible manner. The downloaded caption (or subtitle) file can then be used with a video player that supports caption placement.
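
As an illustration of what a format-compatible translation can look like, the sketch below writes placement into a WebVTT file using the standard `line` cue setting, where `line:0` puts a cue on the first line at the top of the video and cues without the setting keep the default bottom placement. The tuple layout of `cues` and the function names are hypothetical, and this is not necessarily how our own converter encodes placement.

```python
def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(cues):
    """Build WebVTT text from (start, end, text, position) tuples.

    position is "top" or "bottom"; this tuple layout is an assumption
    made for the example.
    """
    lines = ["WEBVTT", ""]
    for start, end, text, position in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        if position == "top":
            # line:0 is the first line from the top of the video.
            timing += " line:0"
        lines += [timing, text, ""]
    return "\n".join(lines)


print(to_webvtt([
    (1.0, 4.0, "Our first guest is an engineer.", "top"),
    (4.0, 7.5, "She joined the team last year.", "bottom"),
]))
```

Other formats express position differently, so the same top and bottom indicators would be mapped to each format's own placement mechanism.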

Caption Placement Limitations

  • It is not possible to use the Same Day service in combination with caption placement due to the increased processing time.
  • The caption placement algorithm is designed to err on the side of moving captions to the top, which results in fewer cases of bottom text obscuration. On rare occasions, the captions are moved to the top needlessly.
  • It is not possible to perform vertical caption placement on videos that have continuous burned-in text at the bottom of the frame (e.g., burned-in time code).
  • Videos that have burned-in text at the top of the frame can be put through vertical caption placement, but we will NOT check the top of the frame prior to placement; that is, the top-of-frame text probability will always be assumed to be zero.
  • Vertical caption placement can be run retroactively on files that have already been processed. However, it must be done within 60 days after file completion. The process requires the source video files, which are deleted from our system after 60 days.
  • A per-minute price increment applies. See pricing details.

Supported Output Formats

Below are the caption/subtitle output formats that support vertical placement.

  • SCC
  • STL
  • WebVTT
  • DFXP
  • SMPTE-TT
