Opened 16 months ago
Last modified 11 months ago
#9808 new defect
Curly brace characters in WebVTT subtitle files not encoded correctly
| Reported by: | Gavin Llewellyn | Owned by: | |
|---|---|---|---|
| Priority: | normal | Component: | avcodec |
| Version: | unspecified | Keywords: | |
| Cc: | Gavin Llewellyn, tfischer | Blocked By: | |
| Blocking: | | Reproduced by developer: | no |
| Analyzed by developer: | no | | |
Description
Summary of the bug:
I am trying to add subtitles from a VTT file to an MP4 video. However, where the VTT file has a pair of curly braces, I only see a backslash character in the MP4 when played back with QuickTime Player.
I can see the same issue when using ffmpeg to convert a VTT file to an SRT file.
How to reproduce:
% ./ffmpeg -i curly_braces.vtt output.srt
ffmpeg version N-107064-g7adeeff91f-tessus Copyright (c) 2000-2022 the FFmpeg developers
built with Apple clang version 11.0.0 (clang-1100.0.33.17)
configuration: --cc=/usr/bin/clang --prefix=/opt/ffmpeg --extra-version=tessus --enable-avisynth --enable-fontconfig --enable-gpl --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libfreetype --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libmysofa --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvmaf --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-version3 --pkg-config-flags=--static --disable-ffplay
libavutil 57. 26.100 / 57. 26.100
libavcodec 59. 33.100 / 59. 33.100
libavformat 59. 24.100 / 59. 24.100
libavdevice 59. 6.100 / 59. 6.100
libavfilter 8. 40.100 / 8. 40.100
libswscale 6. 6.100 / 6. 6.100
libswresample 4. 6.100 / 4. 6.100
libpostproc 56. 5.100 / 56. 5.100
Input #0, webvtt, from 'curly_braces.vtt':
Duration: N/A, bitrate: N/A
Stream #0:0: Subtitle: webvtt
Output #0, srt, to 'output.srt':
Metadata:
encoder : Lavf59.24.100
Stream #0:0: Subtitle: subrip
Metadata:
encoder : Lavc59.33.100 srt
Stream mapping:
Stream #0:0 -> #0:0 (webvtt (native) -> subrip (srt))
Press [q] to stop, [?] for help
size= 0kB time=00:00:03.00 bitrate= 0.3kbits/s speed=4.44e+03x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2040.000000%
The input file:
$ cat curly_braces.vtt
WEBVTT
1
00:00:01.000 --> 00:00:06.000
{
2
00:00:02.000 --> 00:00:07.000
}
3
00:00:03.000 --> 00:00:08.000
{}
The output file:
$ cat output.srt
1
00:00:01,000 --> 00:00:06,000
\{
2
00:00:02,000 --> 00:00:07,000
\}
3
00:00:03,000 --> 00:00:08,000
\
As far as I can tell from the WebVTT spec, these characters do not need to be escaped in the VTT file: https://www.w3.org/TR/webvtt1/#webvtt-cue-text-span
Attachments (1)
Change History (3)
comment:1 by , 11 months ago
| Component: | undetermined → avcodec |
|---|
comment:2 by , 11 months ago
| Cc: | added |
|---|



I can confirm this bug. It occurs when a WebVTT file is decoded in libavcodec/webvttdec.c, in the function webvtt_event_to_ass. See also the comment on line 40: "escape to avoid ASS markup conflicts".
Although removing line 40 would "solve" this particular bug, it may introduce new problems in other areas of subtitle handling, especially ASS markup.
My proposed solution is to change the function webvtt_encode_frame in libavcodec/webvttenc.c to "unescape" the two affected characters, "{" and "}".
I am going to attach a patch that implements this change.
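
For illustration only, the unescaping step described above could look roughly like the sketch below. This is not the attached patch and not existing FFmpeg code: the helper name unescape_braces and its calling convention are assumptions made for the example, and a real change inside webvtt_encode_frame would have to fit the encoder's own buffer handling.

```c
/* Hypothetical helper (not part of FFmpeg): copy src into dst while
 * turning the escaped sequences "\{" and "\}" back into "{" and "}".
 * Any other backslash is copied unchanged.
 * Assumption: dst has room for at least strlen(src) + 1 bytes. */
static void unescape_braces(char *dst, const char *src)
{
    while (*src) {
        /* A backslash directly followed by a brace is an escape:
         * drop the backslash and keep only the brace. */
        if (src[0] == '\\' && (src[1] == '{' || src[1] == '}'))
            src++;
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

With this helper, the escaped strings "\{" and "\}" come back out as "{" and "}". Whether the third cue from the report ("{}") can be recovered this way depends on the escaped text reaching the encoder intact, which this sketch does not address.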