Opened 23 months ago

Last modified 18 months ago

#9808 new defect

Curly brace characters in WebVTT subtitle files not encoded correctly

Reported by: Gavin Llewellyn
Owned by:
Priority: normal
Component: avcodec
Version: unspecified
Keywords:
Cc: Gavin Llewellyn, tfischer
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
I am trying to add subtitles from a VTT file to an MP4 video. However, wherever a cue in the VTT file contains a pair of curly braces ("{}"), I see only a backslash character when the MP4 is played back in QuickTime Player.

I can see the same issue when using ffmpeg to convert a VTT file to an SRT file.

How to reproduce:

% ./ffmpeg -i curly_braces.vtt output.srt
ffmpeg version N-107064-g7adeeff91f-tessus Copyright (c) 2000-2022 the FFmpeg developers
  built with Apple clang version 11.0.0 (clang-1100.0.33.17)
  configuration: --cc=/usr/bin/clang --prefix=/opt/ffmpeg --extra-version=tessus --enable-avisynth --enable-fontconfig --enable-gpl --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libfreetype --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libmysofa --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvmaf --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-version3 --pkg-config-flags=--static --disable-ffplay
  libavutil      57. 26.100 / 57. 26.100
  libavcodec     59. 33.100 / 59. 33.100
  libavformat    59. 24.100 / 59. 24.100
  libavdevice    59.  6.100 / 59.  6.100
  libavfilter     8. 40.100 /  8. 40.100
  libswscale      6.  6.100 /  6.  6.100
  libswresample   4.  6.100 /  4.  6.100
  libpostproc    56.  5.100 / 56.  5.100
Input #0, webvtt, from 'curly_braces.vtt':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Subtitle: webvtt
Output #0, srt, to 'output.srt':
  Metadata:
    encoder         : Lavf59.24.100
  Stream #0:0: Subtitle: subrip
    Metadata:
      encoder         : Lavc59.33.100 srt
Stream mapping:
  Stream #0:0 -> #0:0 (webvtt (native) -> subrip (srt))
Press [q] to stop, [?] for help
size=       0kB time=00:00:03.00 bitrate=   0.3kbits/s speed=4.44e+03x    
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2040.000000%

The input file:

$ cat curly_braces.vtt 
WEBVTT

1
00:00:01.000 --> 00:00:06.000
{

2
00:00:02.000 --> 00:00:07.000
}

3
00:00:03.000 --> 00:00:08.000
{}

The output file:

$ cat output.srt 
1
00:00:01,000 --> 00:00:06,000
\{

2
00:00:02,000 --> 00:00:07,000
\}

3
00:00:03,000 --> 00:00:08,000
\

Note that, as far as I can tell from the WebVTT spec, these characters do not need to be escaped in a VTT file: https://www.w3.org/TR/webvtt1/#webvtt-cue-text-span
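The spec's definition of a cue text span excludes only line breaks, `&` and `<`, so text escaped for WebVTT output never needs to touch braces. A minimal sketch, assuming a NUL-terminated ASCII/UTF-8 C string; `vtt_escape_cue_text` is a hypothetical helper, not an FFmpeg function, and `>` is escaped only as a precaution.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: escape plain text for WebVTT cue text.
 * '&' and '<' must be written as character references; '>' is escaped
 * too as a precaution. '{' and '}' are ordinary characters and are
 * copied through unchanged, as are newlines (they separate spans).
 * Caller frees the result. */
char *vtt_escape_cue_text(const char *in)
{
    assert(in != NULL);
    /* Worst case: every byte becomes "&amp;" (5 bytes). */
    char *out = malloc(strlen(in) * 5 + 1);
    if (!out)
        return NULL;
    char *o = out;
    for (const char *p = in; *p; p++) {
        const char *rep = NULL;
        switch (*p) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        }
        if (rep) {
            size_t n = strlen(rep);
            memcpy(o, rep, n);
            o += n;
        } else {
            *o++ = *p;  /* includes '{' and '}' */
        }
    }
    *o = '\0';
    return out;
}
```

With this rule the third cue, `{}`, is written back out as `{}`, while `a<b & c>d` becomes `a&lt;b &amp; c&gt;d`.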

Attachments (1)

webvtt-unescape.diff (1.7 KB) - added by tfischer 18 months ago.
Unescaping '{' and '}'


Change History (3)

comment:1 by tfischer, 18 months ago

Component: undetermined → avcodec

I can confirm this bug. It happens when a WebVTT file is decoded in libavcodec/webvttdec.c, function webvtt_event_to_ass. See also line 40's comment "escape to avoid ASS markup conflicts".
Although removing line 40 would "solve" this particular bug, it may introduce new problems in other areas of subtitle handling, esp. ASS markup.
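One plausible reading of the SRT output, including the lone backslash for cue 3, can be sketched under two assumptions: that the decoder's escaping simply prefixes `{` and `}` with a backslash (as the line 40 comment suggests), and that the SRT encoder's ASS parsing treats `{\` as the start of an override block. Both functions below are hypothetical simplifications, not FFmpeg's webvttdec.c or ass_split.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the decoder-side escaping: prefix '{' and '}'
 * with a backslash so they are not read as ASS override delimiters.
 * Caller frees the result. */
char *ass_escape_braces(const char *in)
{
    assert(in != NULL);
    char *out = malloc(strlen(in) * 2 + 1);  /* worst case: every byte escaped */
    if (!out)
        return NULL;
    char *o = out;
    for (const char *p = in; *p; p++) {
        if (*p == '{' || *p == '}')
            *o++ = '\\';
        *o++ = *p;
    }
    *o = '\0';
    return out;
}

/* Simplified model (an assumption) of a consumer that treats "{\" as the
 * start of an override block and drops everything up to the next '}'.
 * Anything else, including a lone "\{" or "\}", is kept as text.
 * Caller frees the result. */
char *ass_strip_override_blocks(const char *in)
{
    assert(in != NULL);
    char *out = malloc(strlen(in) + 1);  /* output is never longer than input */
    if (!out)
        return NULL;
    char *o = out;
    const char *p = in;
    while (*p) {
        const char *end;
        if (p[0] == '{' && p[1] == '\\' && (end = strchr(p + 2, '}'))) {
            p = end + 1;  /* skip the whole "{\...}" block */
            continue;
        }
        *o++ = *p++;
    }
    *o = '\0';
    return out;
}
```

Under these assumptions `{}` is escaped to `\{\}`, and the `{\}` inside it is then swallowed as an empty override block, leaving `\`. A single `{` or `}` becomes `\{` or `\}`, which contains no `{\` sequence and so survives, matching cues 1 and 2.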

My proposed solution is to change function webvtt_encode_frame in libavcodec/webvttenc.c to "unescape" the two affected characters "{" and "}".

I am going to attach a patch that implements this feature.
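The proposed direction can be sketched as a small unescaping pass. This is not the attached webvtt-unescape.diff (its contents are not reproduced here); `ass_unescape_braces` is a hypothetical helper that only reverses the brace escaping described in comment:1 and leaves other ASS backslash sequences, such as `\N`, alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper in the spirit of the proposal: turn "\{" back into
 * "{" and "\}" back into "}", leaving every other backslash sequence
 * untouched. Caller frees the result. */
char *ass_unescape_braces(const char *in)
{
    assert(in != NULL);
    char *out = malloc(strlen(in) + 1);  /* output is never longer than input */
    if (!out)
        return NULL;
    char *o = out;
    for (const char *p = in; *p; p++) {
        if (p[0] == '\\' && (p[1] == '{' || p[1] == '}'))
            p++;  /* drop the backslash, keep the brace */
        *o++ = *p;
    }
    *o = '\0';
    return out;
}
```

For example, `\{\}` comes back out as `{}`, while `line1\Nline2` is left as it is.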

by tfischer, 18 months ago

Attachment: webvtt-unescape.diff added

Unescaping '{' and '}'

comment:2 by tfischer, 18 months ago

Cc: tfischer added