Opened 4 years ago

Closed 4 years ago

#4915 closed enhancement (fixed)

WebVTT decoder doesn't handle html escapes

Reported by: RiCON
Owned by:
Priority: minor
Component: avcodec
Version: git-master
Keywords: webvtt
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

The WebVTT spec specifies a dozen HTML escapes that should be handled, including '&gt;', '&lt;' and '&amp;'. The decoder does not convert these back to the characters they represent.
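The requested conversion can be sketched roughly as below. This is a minimal Python illustration of the idea, not the FFmpeg decoder's code (which is C). The function name `unescape_webvtt` and the table `WEBVTT_ESCAPES` are hypothetical, and the table is an assumed subset of named escapes that should be checked against the WebVTT spec.

```python
import re

# Assumed, illustrative subset of named escapes; consult the WebVTT spec
# for the authoritative list.
WEBVTT_ESCAPES = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&nbsp;": "\u00a0",  # no-break space
    "&lrm;": "\u200e",   # left-to-right mark
    "&rlm;": "\u200f",   # right-to-left mark
}

# Matches anything shaped like a named escape: '&', letters, ';'.
_ESCAPE_RE = re.compile(r"&[A-Za-z]+;")


def unescape_webvtt(text: str) -> str:
    """Replace known escapes in one pass; unknown ones are left untouched."""
    return _ESCAPE_RE.sub(
        lambda m: WEBVTT_ESCAPES.get(m.group(0), m.group(0)), text
    )


if __name__ == "__main__":
    print(unescape_webvtt("Tom &amp; Jerry &lt;3"))
```

Doing the replacement in a single pass matters: input such as `&amp;lt;` should become the literal text `&lt;`, not `<`, which chained string replacements could get wrong. Python's standard `html.unescape` handles the full HTML entity set, which is broader than this table.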

FFmpeg version:

% ffmpeg -i htmlescapes.vtt out.srt
ffmpeg version N-75818-g8135b1e Copyright (c) 2000-2015 the FFmpeg developers
  built with gcc 5.2.0 (Rev4, Built by MSYS2 project)

Attached are an example .vtt file, the result produced by this build, and the expected result.
Examples of these HTML escapes in real use can be found by downloading the subtitles for any video on Comedy Central's site with a tool such as youtube-dl. Example:

% youtube-dl --all-subs "http://www.cc.com/video-clips/52dpzm/the-daily-show-with-trevor-noah-terrible--unending-national-tragedies"

Attachments (4)

out.srt (52 bytes) - added by RiCON 4 years ago.
Resulting .srt from ffmpeg
proper.srt (41 bytes) - added by RiCON 4 years ago.
Proper .srt with escapes converted
cc.vtt (11.0 KB) - added by RiCON 4 years ago.
Example of WebVTT with escapes as downloaded using youtube-dl
htmlescapes.vtt (275 bytes) - added by RiCON 4 years ago.
Added more test tags and replacements

Change History (6)

Changed 4 years ago by RiCON

Resulting .srt from ffmpeg

Changed 4 years ago by RiCON

Proper .srt with escapes converted

Changed 4 years ago by RiCON

Example of WebVTT with escapes as downloaded using youtube-dl

Changed 4 years ago by RiCON

Added more test tags and replacements

comment:2 Changed 4 years ago by RiCON

  • Resolution set to fixed
  • Status changed from new to closed