Opened 4 years ago
Last modified 4 years ago
#7637 new defect
movenc.c does not properly handle subtitle durations (including pauses) exceeding INT_MAX - 1 microseconds
|Reported by:||erikbs||Owned by:|
|Blocking:||Reproduced by developer:||no|
|Analyzed by developer:||no|
Summary of the bug:
Take this sample VTT/SRT file:
WEBVTT

1
00:35:47.484 --> 00:35:50.000
Durations that exceed the signed int max value break the program

The first timecode translates to 2 147 484 000 microseconds, which is slightly greater than the greatest value a signed 32-bit integer can hold (i.e. INT_MAX = 2 147 483 647). From what I understand, empty subtitle frames are written when there are no subtitles to display, which in this case means that a 35 min ~48 sec long empty frame is supposed to be written first. This exceeds the max int value and in ways unknown to me breaks the output file. The problem will occur in all of the following cases:
- The first text block starts after more than 35 min 47 sec
- The duration of any text block exceeds 35 min 47 sec
- The time between any two consecutive text blocks exceeds 35 min 47 sec
However, it does not occur if there is more than 35 min 47 sec left of the video after the last text block has been shown (I believe this is because the subtitles stream stops right after the last block, but I’m not sure – in the past ffmpeg would extend the last text block so that it would not end until the video/audio did, at least that happened when extracting the subtitles as an SRT file).
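The arithmetic behind the 35 min 47 sec threshold can be checked with a short sketch. This is only an illustration, not code from FFmpeg; the helper names `timecode_to_us` and `exceeds_int32` are made up for this example, and `INT32_MAX` is used in place of `INT_MAX` on the assumption that `int` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an HH:MM:SS.mmm timecode to microseconds (hypothetical helper). */
int64_t timecode_to_us(int hours, int minutes, int seconds, int millis)
{
    assert(hours >= 0 && minutes >= 0 && seconds >= 0 && millis >= 0);
    int64_t total_ms =
        (((int64_t)hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
    return total_ms * 1000;
}

/* Return 1 if the value does not fit in a signed 32-bit integer, else 0. */
int exceeds_int32(int64_t us)
{
    return us > INT32_MAX;
}
```

With these helpers, `timecode_to_us(0, 35, 47, 484)` gives 2147484000, which `exceeds_int32` flags, while 00:35:47.483 (2147483000) still fits.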
How to reproduce:
With input.mp4 being any mp4 video file that is at least 35 minutes and 50 seconds long and input.vtt being a text file containing the lines above, consider the following command line and output:
% ffmpeg -i input.mp4 -i 'input.vtt' -c copy -c:s mov_text test.mp4 -y
ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)
  configuration: --prefix=/opt/local --enable-swscale --enable-avfilter --enable-avresample --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-librsvg --enable-libtheora --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libsoxr --enable-libspeex --enable-libass --enable-libbluray --enable-lzma --enable-gnutls --enable-fontconfig --enable-libfreetype --enable-libfribidi --disable-libjack --disable-libopencore-amrnb --disable-libopencore-amrwb --disable-libxcb --disable-libxcb-shm --disable-libxcb-xfixes --disable-indev=jack --enable-opencl --disable-outdev=xv --enable-audiotoolbox --enable-videotoolbox --enable-sdl2 --disable-securetransport --mandir=/opt/local/share/man --enable-shared --enable-pthreads --cc=/usr/bin/clang --arch=x86_64 --enable-x86asm --enable-libx265 --enable-gpl --enable-postproc --enable-libx264 --enable-libxvid
  libavutil 56. 22.100 / 56. 22.100
  libavcodec 58. 35.100 / 58. 35.100
  libavformat 58. 20.100 / 58. 20.100
  libavdevice 58. 5.100 / 58. 5.100
  libavfilter 7. 40.101 / 7. 40.101
  libavresample 4. 0. 0 / 4. 0. 0
  libswscale 5. 3.100 / 5. 3.100
  libswresample 3. 3.100 / 3. 3.100
  libpostproc 55. 3.100 / 55. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand : mp42
    minor_version : 0
    compatible_brands: isommp42
    creation_time : 2013-12-12T07:49:32.000000Z
  Duration: 01:35:52.42, start: 0.000000, bitrate: 497 kb/s
    Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 640x262, 398 kb/s, 25 fps, 25 tbr, 50 tbn, 50 tbc (default)
    Metadata:
      handler_name : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 95 kb/s (default)
    Metadata:
      creation_time : 2013-12-12T07:50:59.000000Z
      handler_name : IsoMedia File Produced by Google, 5-11-2011
Input #1, webvtt, from 'input.vtt':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Subtitle: webvtt
Output #0, mp4, to 'test.mp4':
  Metadata:
    major_brand : mp42
    minor_version : 0
    compatible_brands: isommp42
    encoder : Lavf58.20.100
    Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 640x262, q=2-31, 398 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 95 kb/s (default)
    Metadata:
      creation_time : 2013-12-12T07:50:59.000000Z
      handler_name : IsoMedia File Produced by Google, 5-11-2011
    Stream #0:2: Subtitle: mov_text (tx3g / 0x67337874)
    Metadata:
      encoder : Lavc58.35.100 mov_text
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #1:0 -> #0:2 (webvtt (native) -> mov_text (native))
Press [q] to stop, [?] for help
frame=41005 fps=0.0 q=-1.0 size= 102912kB time=00:35:47.48 bitrate= 392.6kbits/
[mp4 @ 0x7fa8b2807600] Application provided duration: 2147484000 / timestamp: 2147484000 is out of range for mov/mp4 format
[mp4 @ 0x7fa8b2807600] pts has no value
frame=78181 fps=78176 q=-1.0 size= 188160kB time=00:52:07.28 bitrate= 492.9kbit
frame=114025 fps=76013 q=-1.0 size= 275200kB time=01:16:01.11 bitrate= 494.3kbi
frame=139645 fps=69820 q=-1.0 size= 339712kB time=01:33:06.27 bitrate= 498.2kbi
frame=143809 fps=69301 q=-1.0 Lsize= 350392kB time=01:35:52.39 bitrate= 499.0kbits/s speed=2.77e+03x
video:280077kB audio:67411kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.835837%
The produced file is valid, but the subtitles track is messed up.
The result is that the subtitle line is printed out as soon as the movie starts, instead of being shown at the correct time offset. If the file contains more than one text block, they are shown right after each other, but each with the correct duration.
In the Terminal printout, there will be an “Application provided duration: ####” warning for the text block that breaks the encoding and then one for each text block following it. There will also be a “pts has no value” warning for all of them.
Note that in the encoding status lines (“frame=... fps=... […] time=...”), the timecode will always start with the first “bad” value, and not increase until the encoder has passed that timecode. This provides a clue; it means that something goes wrong even before the actual encoding starts.
I have not checked how (or if) MP4Box tackles this.
In this case I used the mov_text codec for the subtitles, but movtextenc.c does not seem to care about durations at all, so the problem lies somewhere else (i.e. it’s not mov_text specific).
I noticed that in movenc.c, AVPacket structs actually are integrity checked, and a warning is printed out if the duration exceeds INT_MAX. However, this event is not handled. Instead, the packet is thrown away before any frame is written, which I guess is why the text blocks stack up at the beginning of the video with my test files.
If a subtitle frame (empty or not) will have a duration of more than INT_MAX microseconds, ffmpeg should instead split it into identical frames with durations of up to INT_MAX - 1. The mov_write_packet function in libavformat/movenc.c is probably where it must be done, cf. how it already calls mov_write_subtitle_end_packet() after the last subtitle packet/frame has been written.
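The splitting proposed above could look roughly like the following. This is a minimal standalone sketch, not a patch against movenc.c: `split_duration` and `MAX_CHUNK` are hypothetical names, and the sketch assumes the duration is expressed in the same units the INT_MAX check applies to. It only computes the chunk lengths; writing the corresponding packets is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Longest chunk allowed: INT_MAX - 1, assuming a 32-bit int. */
#define MAX_CHUNK ((int64_t)INT32_MAX - 1)

/*
 * Split `duration` into consecutive chunks of at most MAX_CHUNK each.
 * Up to `cap` chunk lengths are stored in `out`; the return value is the
 * total number of chunks needed, which may be larger than `cap`.
 */
size_t split_duration(int64_t duration, int64_t *out, size_t cap)
{
    assert(duration >= 0);
    size_t n = 0;
    while (duration > 0) {
        int64_t chunk = duration > MAX_CHUNK ? MAX_CHUNK : duration;
        if (n < cap)
            out[n] = chunk;   /* store only while there is room */
        n++;
        duration -= chunk;
    }
    return n;
}
```

For the sample file, splitting 2147484000 yields two chunks: one of 2147483646 and one of 354. Each chunk would then be written as its own frame with identical content.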
Change History (4)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
|Component:||avcodec → undetermined|
|Keywords:||mov mov_text added|
comment:3 by , 4 years ago
|Component:||undetermined → avformat|
@flx90: Thank you, that looks like a much better workaround solution than using e.g. an underscore like I tried (lines consisting of plain spaces were just ignored).
PS. Adding the component “avformat”, see my original post where I point out that libavformat/movenc.c is where a patch should be applied (the encoder fails to respect that the lengths of subtitle lines and pauses are restricted to INT_MAX - 1 μs).
comment:4 by , 4 years ago
Update: while it does complain about invalid tags written by ffmpeg, MP4Box seems to handle the subtitles perfectly, both when it comes to writing them and when it comes to extracting them from an existing MP4 file.
I'm facing exactly the same problem.
Currently putting <i></i> for 1 ms as a workaround in the middle of long stretches without subtitles.
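A rough sketch of how such filler cues could be generated for a WebVTT file. All names here (`MAX_GAP_MS`, `format_vtt_time`, `filler_start_ms`, `print_filler_cue`) are made up for illustration, and the limit is an assumption derived from INT_MAX microseconds rounded down to whole milliseconds; whether ffmpeg accepts a pause of exactly that length has not been verified.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed limit: longest pause in whole ms that stays below INT_MAX us. */
#define MAX_GAP_MS ((int64_t)(INT32_MAX / 1000))   /* 2147483 ms */

/* Format a millisecond offset as a WebVTT timestamp (HH:MM:SS.mmm). */
void format_vtt_time(int64_t ms, char *buf, size_t len)
{
    assert(ms >= 0);
    snprintf(buf, len, "%02d:%02d:%02d.%03d",
             (int)(ms / 3600000), (int)(ms / 60000 % 60),
             (int)(ms / 1000 % 60), (int)(ms % 1000));
}

/* Start time of a 1 ms filler cue for the gap, or -1 if none is needed. */
int64_t filler_start_ms(int64_t gap_start_ms, int64_t gap_end_ms)
{
    assert(gap_end_ms >= gap_start_ms);
    if (gap_end_ms - gap_start_ms <= MAX_GAP_MS)
        return -1;
    return gap_start_ms + MAX_GAP_MS;
}

/* Write a 1 ms cue containing only empty italics tags. */
void print_filler_cue(FILE *out, int64_t start_ms)
{
    char from[32], to[32];
    format_vtt_time(start_ms, from, sizeof from);
    format_vtt_time(start_ms + 1, to, sizeof to);
    fprintf(out, "%s --> %s\n<i></i>\n\n", from, to);
}
```

For a gap longer than twice the limit, the caller would repeat the check starting from the end of the filler cue just written.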