Opened 6 years ago

Closed 6 years ago

#4915 closed enhancement (fixed)

WebVTT decoder doesn't handle html escapes

Reported by: RiCON
Owned by:
Priority: minor
Component: avcodec
Version: git-master
Keywords: webvtt
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

The WebVTT spec specifies a set of HTML escapes that should be handled, including '&gt;', '&lt;' and '&amp;'. The decoder doesn't convert these back to the characters they represent ('>', '<' and '&').
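To illustrate the expected conversion, here is a minimal Python sketch. It is not the FFmpeg fix (the decoder is written in C); the function name `unescape_cue_text` and the table are hypothetical, and the table assumes the six named escapes that the WebVTT cue text syntax appears to define. Numeric character references are deliberately left out.

```python
import re

# Assumed set of named escapes from the WebVTT cue text syntax.
WEBVTT_ESCAPES = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&lrm;": "\u200e",   # left-to-right mark
    "&rlm;": "\u200f",   # right-to-left mark
    "&nbsp;": "\u00a0",  # no-break space
}

# One alternation over all known escapes, so the text is scanned only once.
_ESCAPE_RE = re.compile("|".join(re.escape(k) for k in WEBVTT_ESCAPES))


def unescape_cue_text(text: str) -> str:
    """Replace each known escape in a single left-to-right pass.

    A single pass matters: "&amp;lt;" must become "&lt;", not "<".
    Unknown sequences such as "&foo;" are left untouched.
    """
    return _ESCAPE_RE.sub(lambda m: WEBVTT_ESCAPES[m.group(0)], text)


if __name__ == "__main__":
    print(unescape_cue_text("1 &lt; 2 &amp;&amp; 3 &gt; 2"))
```

Doing the replacement in one pass, rather than chaining one `str.replace` call per escape, avoids decoding the same text twice.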

FFmpeg version:

% ffmpeg -i htmlescapes.vtt out.srt
ffmpeg version N-75818-g8135b1e Copyright (c) 2000-2015 the FFmpeg developers
  built with gcc 5.2.0 (Rev4, Built by MSYS2 project)

Attached are an example .vtt file, the result produced by this build, and the expected result.
Examples of where these HTML escapes are used can be found by downloading the subtitles for any video on Comedy Central's site with a tool such as youtube-dl. Example:

% youtube-dl --all-subs "http://www.cc.com/video-clips/52dpzm/the-daily-show-with-trevor-noah-terrible--unending-national-tragedies"
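To check whether a downloaded subtitle file actually exercises this bug, the escapes it contains can be counted. The following is a rough Python sketch; the helper name `count_escapes` and the pattern are assumptions made for illustration, and the pattern matches anything shaped like a character reference, not only the escapes the WebVTT spec names.

```python
import re
from collections import Counter

# Matches named (&amp;), decimal (&#39;) and hex (&#x27;) character references.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def count_escapes(vtt_text: str) -> Counter:
    """Return how often each escape-like sequence occurs in the text."""
    return Counter(ENTITY_RE.findall(vtt_text))


if __name__ == "__main__":
    import sys

    # Usage: python count_escapes.py some_file.vtt
    with open(sys.argv[1], encoding="utf-8") as fh:
        for escape, n in count_escapes(fh.read()).most_common():
            print(f"{n:6d}  {escape}")
```

The `-->` in cue timing lines is not counted, since the pattern requires a leading `&` and a trailing `;`.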

Attachments (4)

out.srt (52 bytes) - added by RiCON 6 years ago.
Resulting .srt from ffmpeg
proper.srt (41 bytes) - added by RiCON 6 years ago.
Proper .srt with escapes converted
cc.vtt (11.0 KB) - added by RiCON 6 years ago.
Example of WebVTT with escapes as downloaded using youtube-dl
htmlescapes.vtt (275 bytes) - added by RiCON 6 years ago.
Added more test tags and replacements


Change History (6)

by RiCON, 6 years ago

Attachment: out.srt added

Resulting .srt from ffmpeg

by RiCON, 6 years ago

Attachment: proper.srt added

Proper .srt with escapes converted

by RiCON, 6 years ago

Attachment: cc.vtt added

Example of WebVTT with escapes as downloaded using youtube-dl

by RiCON, 6 years ago

Attachment: htmlescapes.vtt added

Added more test tags and replacements

comment:2 by RiCON, 6 years ago

Resolution: fixed
Status: new → closed