Opened 9 years ago
Closed 9 years ago
#4915 closed enhancement (fixed)
WebVTT decoder doesn't handle html escapes
Reported by: | RiCON | Owned by: | |
---|---|---|---|
Priority: | minor | Component: | avcodec |
Version: | git-master | Keywords: | webvtt |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
The WebVTT spec defines several HTML escapes that should be handled, including '&gt;', '&lt;' and '&amp;'. The decoder doesn't convert these back to the proper characters.
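For illustration, here is a minimal Python sketch of the conversion being requested. It is not the FFmpeg decoder code. The function name `unescape_webvtt` and the table `WEBVTT_ESCAPES` are hypothetical, and the list of six escapes is my reading of the WebVTT cue text syntax, so treat it as an assumption.

```python
import re

# Escapes from the WebVTT cue text syntax (assumed list, see note above).
WEBVTT_ESCAPES = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&lrm;": "\u200e",   # left-to-right mark
    "&rlm;": "\u200f",   # right-to-left mark
    "&nbsp;": "\u00a0",  # no-break space
}

# Match only the known escapes; anything else is left untouched.
_ESCAPE_RE = re.compile(r"&(?:amp|lt|gt|lrm|rlm|nbsp);")


def unescape_webvtt(text: str) -> str:
    """Replace known WebVTT escapes in a single left-to-right pass."""
    return _ESCAPE_RE.sub(lambda m: WEBVTT_ESCAPES[m.group(0)], text)


if __name__ == "__main__":
    print(unescape_webvtt("a &lt; b &amp;&amp; c &gt; d"))
```

Doing the replacement in one pass means the output of one substitution is never rescanned, so a cue containing `&amp;lt;` becomes the literal text `&lt;` rather than `<`.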
FFmpeg version:
% ffmpeg -i htmlescapes.vtt out.srt
ffmpeg version N-75818-g8135b1e Copyright (c) 2000-2015 the FFmpeg developers
built with gcc 5.2.0 (Rev4, Built by MSYS2 project)
Attached are an example vtt file, the result with this build, and the proper result.
Examples of where these HTML escapes are used can be found by getting the subtitles for any video on Comedy Central's site using something like youtube-dl. Example:
% youtube-dl --all-subs "http://www.cc.com/video-clips/52dpzm/the-daily-show-with-trevor-noah-terrible--unending-national-tragedies"
Attachments (4)
Change History (6)
by , 9 years ago
by , 9 years ago
Example of WebVTT with escapes as downloaded using youtube-dl
comment:1 by , 9 years ago
Forgot to link to the spec: http://dev.w3.org/html5/webvtt/#dfn-webvtt-cue-amp-escape
comment:2 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fixed in 53886d6955134be8acc26f336bdf068fd970669d.
Resulting .srt from ffmpeg