Opened 8 months ago
Last modified 8 months ago
#10583 new defect
lavf fails to probe ass file with utf16le encoding
Reported by: | llyyr | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avformat |
Version: | git-master | Keywords: | ass |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
Summary of the bug:
ffplay/ffmpeg can't open an ASS subtitle file with UTF-16LE encoding.
How to reproduce:
% ffplay [input] -vf subtitles=filename=utf16le.ass
ffplay version 6.0 Copyright (c) 2003-2023 the FFmpeg developers
  built with gcc 13 (SUSE Linux)
  configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --incdir=/usr/include/ffmpeg --extra-cflags='-O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -ffat-lto-objects -g' --optflags='-O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -ffat-lto-objects -g' --disable-htmlpages --enable-pic --disable-stripping --enable-shared --disable-static --enable-gpl --enable-version3 --disable-openssl --enable-gnutls --enable-ladspa --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libdc1394 --enable-libdrm --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libjack --enable-libjxl --enable-librist --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopenh264-dlopen --enable-libopus --enable-libpulse --enable-librav1e --enable-librubberband --enable-libsvtav1 --enable-libsoxr --enable-libspeex --enable-libssh --enable-libsrt --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libxml2 --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lto --enable-lv2 --enable-libmfx --enable-vaapi --enable-vdpau --enable-version3 --enable-libfdk-aac-dlopen --enable-nonfree --enable-libvo-amrwbenc --enable-libx264 --enable-libx265 --enable-libxvid
  libavutil 58. 2.100 / 58. 2.100
  libavcodec 60. 3.100 / 60. 3.100
  libavformat 60. 3.100 / 60. 3.100
  libavdevice 60. 1.100 / 60. 1.100
  libavfilter 9. 3.100 / 9. 3.100
  libswscale 7. 1.100 / 7. 1.100
  libswresample 4. 10.100 / 4. 10.100
  libpostproc 57. 1.100 / 57. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '(HDR HEVC 10-bit BT.2020 59.940fps) Camp by Sony.mp4':
  Metadata:
    major_brand : isom
    minor_version : 1
    compatible_brands: isom
    creation_time : 2016-02-03T08:01:30.000000Z
  Duration: 00:02:07.15, start: 0.000000, bitrate: 75806 kb/s
  Stream #0:0[0x1](und): Video: hevc (Main 10) (hvc1 / 0x31637668), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x2160 [SAR 1:1 DAR 16:9], 75620 kb/s, 59.94 fps, 59.94 tbr, 60k tbn (default)
    Metadata:
      creation_time : 2016-02-03T07:59:49.000000Z
      handler_name : Video Media Handler
      vendor_id : [0][0][0][0]
      encoder : HEVC Coding
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 192 kb/s (default)
    Metadata:
      creation_time : 2016-02-03T07:59:49.000000Z
      handler_name : Sound Media Handler
      vendor_id : [0][0][0][0]
[Parsed_subtitles_0 @ 0x7f4fd5507000] libass API version: 0x1701000
[Parsed_subtitles_0 @ 0x7f4fd5507000] libass source: tarball: 0.17.1
[Parsed_subtitles_0 @ 0x7f4fd5507000] Shaper: FriBidi 1.0.13 (SIMPLE) HarfBuzz-ng 8.1.1 (COMPLEX)
[Parsed_subtitles_0 @ 0x7f4fd5507000] Unable to open utf16le.ass0
This can be reproduced on the oldest version available on distros (4.4.4) as well as git master.
I applied the following diff:
diff --git a/libavformat/assdec.c b/libavformat/assdec.c
index bf7b8a73a2..90bce9ff9d 100644
--- a/libavformat/assdec.c
+++ b/libavformat/assdec.c
@@ -43,6 +43,9 @@ static int ass_probe(const AVProbeData *p)
 
     ff_text_read(&tr, buf, sizeof(buf));
 
+    for (int i = 0; i < sizeof(buf); i++)
+        printf("Character at index %d: %d)\n", i, buf[i]);
+
     if (!memcmp(buf, "[Script Info]", 13))
         return AVPROBE_SCORE_MAX;
 
and got the following output:
Character at index 0: (ASCII: -17)
Character at index 1: [ (ASCII: 91)
Character at index 2: S (ASCII: 83)
Character at index 3: c (ASCII: 99)
Character at index 4: r (ASCII: 114)
Character at index 5: i (ASCII: 105)
Character at index 6: p (ASCII: 112)
Character at index 7: t (ASCII: 116)
Character at index 8:   (ASCII: 32)
Character at index 9: I (ASCII: 73)
Character at index 10: n (ASCII: 110)
Character at index 11: f (ASCII: 102)
Character at index 12: o (ASCII: 111)
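A possible reading of the -17 at index 0: it is the byte 0xEF printed with %d from a signed char, and 0xEF is the first byte of the UTF-8 encoding of U+FEFF (the byte order mark). The sketch below only illustrates that arithmetic. It assumes buf in ass_probe() is an array of signed 8-bit char; encode_utf8_bmp and as_signed_byte are hypothetical helpers written for this note, not FFmpeg functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFmpeg): encode a code point from the
 * Basic Multilingual Plane as UTF-8. Surrogates are not rejected.
 * Returns the number of bytes written to out (1 to 3). */
static size_t encode_utf8_bmp(uint16_t cp, unsigned char out[3])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: the value %d shows when the byte is held in a
 * signed 8-bit char (assumption: two's complement, as on common platforms). */
static int as_signed_byte(unsigned char b)
{
    return (int)(int8_t)b;
}
```

Calling encode_utf8_bmp(0xFEFF, out) yields the three bytes EF BB BF, and as_signed_byte(0xEF) yields -17, which matches the first value in the output above.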
Attachments (1)
Change History (3)
Attachment utf16le.ass added, 8 months ago
comment:1 by mkver, 8 months ago (follow-up: 2)
Your file contains two UTF-16 LE BOMs at the beginning; the subtitles.c code strips the first one away and when streamcopying said file, the resulting file gets an UTF-8 BOM from the second input BOM (otherwise there would be no BOM as usual). The only bug I encountered is in ff_text_peek_r8(): Its undoing of the read is incorrect in case the text reader's buffer still contains other characters (which are currently implicitly discarded). But this is irrelevant for all current users of ff_text_peek_r8().
comment:2, 8 months ago
Replying to mkver:
> Your file contains two UTF-16 LE BOMs at the beginning; the subtitles.c code strips the first one away and when streamcopying said file, the resulting file gets an UTF-8 BOM from the second input BOM (otherwise there would be no BOM as usual).
Ah I see, I thought this would be supported since I have quite a few files with two BOMs at the beginning. Would a patch that adds support for correctly probing such files be accepted?
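The double-BOM condition discussed in the comments can be checked on the raw bytes of a file before any conversion. The following is a minimal sketch, not a proposed patch: count_utf16le_boms is a hypothetical helper that does not exist in FFmpeg, and it relies only on the fact that a UTF-16LE BOM is the byte pair FF FE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an FFmpeg API: count the consecutive
 * UTF-16LE BOMs (byte pair FF FE) at the start of a raw buffer. */
static size_t count_utf16le_boms(const unsigned char *buf, size_t len)
{
    size_t n = 0;

    assert(buf != NULL || len == 0);
    /* 2 * n never exceeds len, so the subtraction cannot wrap around. */
    while (len - 2 * n >= 2 &&
           buf[2 * n] == 0xFF && buf[2 * n + 1] == 0xFE)
        n++;
    return n;
}
```

A file like the attached one would be expected to give a count of 2, while a normally encoded UTF-16LE file gives 1. Note that the same leading bytes FF FE also begin a UTF-32LE BOM (FF FE 00 00), which this sketch does not distinguish.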