Opened 7 months ago

Last modified 7 months ago

#10583 new defect

lavf fails to probe ass file with utf16le encoding

Reported by: llyyr
Owned by:
Priority: normal
Component: avformat
Version: git-master
Keywords: ass
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

ffplay/ffmpeg cannot open an ASS subtitle file with UTF-16LE encoding.

How to reproduce:

% ffplay [input] -vf subtitles=filename=utf16le.ass
ffplay version 6.0 Copyright (c) 2003-2023 the FFmpeg developers
  built with gcc 13 (SUSE Linux)
  configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --incdir=/usr/include/ffmpeg --extra-cflags='-O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -ffat-lto-objects -g' --optflags='-O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -ffat-lto-objects -g' --disable-htmlpages --enable-pic --disable-stripping --enable-shared --disable-static --enable-gpl --enable-version3 --disable-openssl --enable-gnutls --enable-ladspa --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libdc1394 --enable-libdrm --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libjack --enable-libjxl --enable-librist --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopenh264-dlopen --enable-libopus --enable-libpulse --enable-librav1e --enable-librubberband --enable-libsvtav1 --enable-libsoxr --enable-libspeex --enable-libssh --enable-libsrt --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libxml2 --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lto --enable-lv2 --enable-libmfx --enable-vaapi --enable-vdpau --enable-version3 --enable-libfdk-aac-dlopen --enable-nonfree --enable-libvo-amrwbenc --enable-libx264 --enable-libx265 --enable-libxvid
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '(HDR HEVC 10-bit BT.2020 59.940fps) Camp by Sony.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 1
    compatible_brands: isom
    creation_time   : 2016-02-03T08:01:30.000000Z
  Duration: 00:02:07.15, start: 0.000000, bitrate: 75806 kb/s
  Stream #0:0[0x1](und): Video: hevc (Main 10) (hvc1 / 0x31637668), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 3840x2160 [SAR 1:1 DAR 16:9], 75620 kb/s, 59.94 fps, 59.94 tbr, 60k tbn (default)
    Metadata:
      creation_time   : 2016-02-03T07:59:49.000000Z
      handler_name    : Video Media Handler
      vendor_id       : [0][0][0][0]
      encoder         : HEVC Coding
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 192 kb/s (default)
    Metadata:
      creation_time   : 2016-02-03T07:59:49.000000Z
      handler_name    : Sound Media Handler
      vendor_id       : [0][0][0][0]
[Parsed_subtitles_0 @ 0x7f4fd5507000] libass API version: 0x1701000 
[Parsed_subtitles_0 @ 0x7f4fd5507000] libass source: tarball: 0.17.1
[Parsed_subtitles_0 @ 0x7f4fd5507000] Shaper: FriBidi 1.0.13 (SIMPLE) HarfBuzz-ng 8.1.1 (COMPLEX)
[Parsed_subtitles_0 @ 0x7f4fd5507000] Unable to open utf16le.ass0   

This can be reproduced with the oldest version available in distributions (4.4.4) as well as with git master.
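The attachment is 921 KB, so a much smaller file with the same leading bytes may be enough to reproduce the problem. The sketch below is an assumption-based helper, not part of FFmpeg: `utf16le_with_boms` and `write_double_bom_ass` are hypothetical names, the input text is assumed to be ASCII, and the short header is illustrative rather than taken from the attachment. It only recreates the structure that comment:1 identifies as the trigger, namely two FF FE marks before `[Script Info]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `boms` UTF-16LE byte order marks (FF FE)
 * followed by the ASCII text `s` encoded as UTF-16LE into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small. */
static size_t utf16le_with_boms(const char *s, int boms,
                                unsigned char *out, size_t cap)
{
    assert(s != NULL && out != NULL && boms >= 0);
    size_t len = strlen(s), n = 0;

    if (cap < 2 * ((size_t)boms + len))
        return 0;
    for (int b = 0; b < boms; b++) {
        out[n++] = 0xFF; /* BOM U+FEFF, little-endian */
        out[n++] = 0xFE;
    }
    for (size_t i = 0; i < len; i++) {
        out[n++] = (unsigned char)s[i]; /* ASCII low byte */
        out[n++] = 0x00;                /* zero high byte */
    }
    return n;
}

/* Hypothetical helper: write a tiny double-BOM ASS header to `path`.
 * Returns 0 on success, -1 on failure. */
int write_double_bom_ass(const char *path)
{
    static const char text[] = "[Script Info]\r\nScriptType: v4.00+\r\n";
    unsigned char buf[256];
    size_t n = utf16le_with_boms(text, 2, buf, sizeof(buf));
    FILE *f;

    if (n == 0)
        return -1;
    f = fopen(path, "wb");
    if (!f)
        return -1;
    if (fwrite(buf, 1, n, f) != n) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling `write_double_bom_ass("double_bom.ass")` from a small `main` should produce a 74-byte file; passing it to the command above in place of `utf16le.ass` is one way to check whether the duplicated BOM alone triggers the failure.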

I applied the following diff to libavformat/assdec.c to print the bytes the probe compares:

diff --git a/libavformat/assdec.c b/libavformat/assdec.c
index bf7b8a73a2..90bce9ff9d 100644
--- a/libavformat/assdec.c
+++ b/libavformat/assdec.c
@@ -43,6 +43,9 @@ static int ass_probe(const AVProbeData *p)
 
     ff_text_read(&tr, buf, sizeof(buf));
 
+    for (int i = 0; i < sizeof(buf); i++)
+        printf("Character at index %d: %c (ASCII: %d)\n", i, buf[i], buf[i]);
+
     if (!memcmp(buf, "[Script Info]", 13))
         return AVPROBE_SCORE_MAX;
 

and got the following output:

Character at index 0:  (ASCII: -17)
Character at index 1: [ (ASCII: 91)
Character at index 2: S (ASCII: 83)
Character at index 3: c (ASCII: 99)
Character at index 4: r (ASCII: 114)
Character at index 5: i (ASCII: 105)
Character at index 6: p (ASCII: 112)
Character at index 7: t (ASCII: 116)
Character at index 8:   (ASCII: 32)
Character at index 9: I (ASCII: 73)
Character at index 10: n (ASCII: 110)
Character at index 11: f (ASCII: 102)
Character at index 12: o (ASCII: 111)
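In the output, index 0 holds -17, which is how a signed `char` prints the byte 0xEF. That is the first byte of U+FEFF (the BOM character) encoded as UTF-8 (EF BB BF), so the text reader appears to pass the second BOM through as an ordinary character, and the 13-byte `memcmp` against `[Script Info]` fails. The sketch below checks that arithmetic; `utf8_encode_bmp` and `looks_like_ass` are hypothetical helpers written for illustration, not FFmpeg functions, and the encoder only handles BMP code points.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encode one BMP code point (no surrogates) as
 * UTF-8. Returns the number of bytes written to out (1 to 3). */
static size_t utf8_encode_bmp(unsigned cp, unsigned char out[3])
{
    assert(cp <= 0xFFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Same comparison as the probe's: the first 13 bytes must be the header. */
static int looks_like_ass(const char *buf)
{
    return memcmp(buf, "[Script Info]", 13) == 0;
}
```

Only one of the three BOM bytes survives in the probe buffer; comment:1 attributes the missing BB BF bytes to ff_text_peek_r8().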

Attachments (1)

utf16le.ass (921.1 KB) - added by llyyr 7 months ago.


Change History (3)

by llyyr, 7 months ago

Attachment: utf16le.ass added

comment:1 by mkver, 7 months ago

Your file contains two UTF-16 LE BOMs at the beginning; the subtitles.c code strips the first one away, and when streamcopying said file, the resulting file gets a UTF-8 BOM from the second input BOM (otherwise there would be no BOM, as usual). The only bug I encountered is in ff_text_peek_r8(): its undoing of the read is incorrect in case the text reader's buffer still contains other characters (which are currently implicitly discarded). But this is irrelevant for all current users of ff_text_peek_r8().
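The peek problem described above can be illustrated with a simplified model. This is a hypothetical reimplementation, not FFmpeg code: the `Reader` type and `reader_*` functions are invented names, the model omits FFmpeg's end-of-file checks and other encodings, and it assumes, as this comment describes, that the peek undoes its read by leaving only the returned byte in the buffer. The input is the second BOM followed by `[Script Info]`, already decoded to UTF-16 code units.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: decodes UTF-16 code units (BMP only) and hands
 * them out one UTF-8 byte at a time. */
typedef struct {
    const unsigned short *units; /* input code units */
    size_t n_units, unit_pos;
    unsigned char buf[3];        /* pending UTF-8 bytes of current char */
    size_t buf_pos, buf_len;
} Reader;

static int reader_r8(Reader *r)
{
    unsigned cp;
    if (r->buf_pos < r->buf_len)
        return r->buf[r->buf_pos++];
    r->buf_pos = r->buf_len = 0;
    if (r->unit_pos >= r->n_units)
        return 0; /* end of input */
    cp = r->units[r->unit_pos++];
    if (cp < 0x80) {
        r->buf[r->buf_len++] = (unsigned char)cp;
    } else if (cp < 0x800) {
        r->buf[r->buf_len++] = (unsigned char)(0xC0 | (cp >> 6));
        r->buf[r->buf_len++] = (unsigned char)(0x80 | (cp & 0x3F));
    } else {
        r->buf[r->buf_len++] = (unsigned char)(0xE0 | (cp >> 12));
        r->buf[r->buf_len++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        r->buf[r->buf_len++] = (unsigned char)(0x80 | (cp & 0x3F));
    }
    return r->buf[r->buf_pos++];
}

/* Undo by replacing the buffer with the returned byte only:
 * the remaining bytes of a multi-byte character are discarded. */
static int reader_peek_buggy(Reader *r)
{
    int c;
    if (r->buf_pos < r->buf_len)
        return r->buf[r->buf_pos];
    c = reader_r8(r);
    r->buf_pos = 0;
    r->buf_len = 1;
    r->buf[0] = (unsigned char)c;
    return c;
}

/* Undo by stepping back one byte, keeping the rest pending. */
static int reader_peek_fixed(Reader *r)
{
    int c;
    if (r->buf_pos < r->buf_len)
        return r->buf[r->buf_pos];
    c = reader_r8(r);
    if (r->buf_pos > 0)
        r->buf_pos--;
    return c;
}

static void reader_read(Reader *r, char *dst, size_t n)
{
    assert(r != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (char)reader_r8(r);
}
```

With the "buggy" undo, a single peek followed by a 13-byte read yields 0xEF followed by `[Script Info` (no closing bracket), the same pattern as the debug output earlier in this ticket. With the fixed undo the buffer starts with EF BB BF instead, so this change alone would still not let a double-BOM file pass the `[Script Info]` check.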

in reply to:  1 comment:2 by llyyr, 7 months ago

Replying to mkver:

Your file contains two UTF-16 LE BOMs at the beginning; the subtitles.c code strips the first one away and when streamcopying said file, the resulting file gets a UTF-8 BOM from the second input BOM (otherwise there would be no BOM as usual).

Ah, I see. I thought this would be supported, since I have quite a few files with two BOMs at the beginning. Would a patch that adds support for correctly probing such files be accepted?
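One possible shape for such a patch is to let the probe skip any number of BOMs before looking for the header. The function below is a hypothetical sketch: `ass_header_after_boms` is not an FFmpeg function, and the real `ass_probe()` reads through a text reader (`tr` in the diff above) rather than a raw byte buffer. It assumes the input has already been converted to UTF-8, where every extra U+FEFF appears as EF BB BF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe helper: skip any number of UTF-8 BOMs (EF BB BF)
 * and leading CR/LF bytes, then check for the ASS header. */
static int ass_header_after_boms(const unsigned char *p, size_t size)
{
    static const char header[] = "[Script Info]";
    const size_t header_len = sizeof(header) - 1;
    size_t i = 0;

    assert(p != NULL || size == 0);
    for (;;) {
        if (size - i >= 3 && p[i] == 0xEF && p[i + 1] == 0xBB && p[i + 2] == 0xBF)
            i += 3; /* skip a whole BOM */
        else if (i < size && (p[i] == '\r' || p[i] == '\n'))
            i++;    /* skip a blank-line byte */
        else
            break;
    }
    return size - i >= header_len && memcmp(p + i, header, header_len) == 0;
}
```

Because a lone 0xEF directly followed by `[` is still rejected, a patch built on the text reader would likely also need the ff_text_peek_r8() fix from comment:1, so that all three BOM bytes reach the probe.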
