Opened 5 years ago

Closed 4 years ago

#3363 closed defect (fixed)

ffprobe silently drops non-ASCII metadata in VQF files

Reported by: trejkaz
Owned by:
Priority: important
Component: ffprobe
Version: git-master
Keywords:
Cc:
Blocked By:
Blocking:
Reproduced by developer: yes
Analyzed by developer: no

Description

Summary of the bug:
How to reproduce:

% ffprobe -show_format -show_streams -print_format json test.vqf 

% ffprobe -version
ffprobe version N-60503-g28975cb-tessus
built on Jan 28 2014 18:43:59 with llvm-gcc 4.2.1 (LLVM build 2336.1.00)
configuration: --prefix=/Users/tessus/data/ext/ffmpeg/sw --as=yasm --extra-version=tessus --disable-shared --enable-static --disable-ffplay --enable-gpl --enable-pthreads --enable-postproc --enable-libmp3lame --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libxvid --enable-libspeex --enable-bzlib --enable-zlib --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libxavs --enable-version3 --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvpx --enable-libgsm --enable-libopus --enable-libmodplug --enable-fontconfig --enable-libfreetype --enable-libass --enable-libbluray --enable-filters --enable-runtime-cpudetect
libavutil      52. 63.100 / 52. 63.100
libavcodec     55. 49.101 / 55. 49.101
libavformat    55. 28.100 / 55. 28.100
libavdevice    55.  7.100 / 55.  7.100
libavfilter     4.  1.101 /  4.  1.101
libswscale      2.  5.101 /  2.  5.101
libswresample   0. 17.104 /  0. 17.104
libpostproc    52.  3.100 / 52.  3.100

[json @ 0x103000000] 1 invalid UTF-8 sequence(s) found in string 'Bl?mchen', replaced with

The value ffprobe emits is "Blchen".

The value it emitted before the fix for #2502 was "Bl�mchen" (invalid character intentional), which, although it contains an invalid character, at least retained all the valid characters. Current builds drop the "m" as well as the invalid character.

The value I would like to see, however, is "Blümchen".

If the issue is that the VQF module is doing something wrong to convert to Unicode, it would be good to get that fixed.

If the issue is that VQF is one of those legacy formats where the encoding isn't known, would it be possible to have some way to specify the system encoding? I can't just change the encoding of the entire system, because doing that in a cross-platform way is not really practical.
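
To illustrate what a caller-specified encoding would buy, here is a minimal Python sketch of the fallback done on the consumer side. It assumes the author tag in the sample is stored as ISO-8859-1, so that "ü" is the single byte 0xFC; that is a guess about this file, not something the VQF format is known to guarantee. The helper name `decode_tag` and its `fallback` parameter are made up for this example and are not part of ffprobe.

```python
# Assumption: the raw "author" tag bytes are ISO-8859-1, so "ü" is
# stored as the single byte 0xFC. This is a guess about the sample file.
raw = b"Bl\xfcmchen"


def decode_tag(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Try strict UTF-8 first, then a caller-chosen legacy encoding.

    Hypothetical helper for illustration; not an ffprobe API.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


# With the fallback, the intended text comes back intact.
print(decode_tag(raw))

# Without it, a replace-on-error decoder keeps the valid characters and
# substitutes U+FFFD for the bad byte, like the pre-#2502 output.
print(raw.decode("utf-8", errors="replace"))
```

The fallback only helps when the caller knows, or can guess, the legacy encoding, which is why an option to specify it would be useful.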

There is a sample exhibiting the issue in the mplayer samples:

http://samples.mplayerhq.hu/vqf/handinha.vqf

Change History (4)

comment:1 Changed 5 years ago by cehoyos

Is this not reproducible with ffmpeg?

comment:2 follow-up: Changed 5 years ago by trejkaz

ffmpeg outputs:

Input #0, vqf, from '/Users/trejkaz/Downloads/handinha.vqf':
  Metadata:
    title           : Hand in Hand (Gewalt ist doof!)
    comment         : http://bluemchen.koti.com.pl
    copyright       : Edel Records GmbH
    filename        : handinha.vqf
    author          : Bl?mchen
    size            : 300441

So it hasn't lost the character, but it has still mangled it.

comment:3 in reply to: ↑ 2 Changed 5 years ago by saste

Replying to trejkaz:

This depends on our UTF-8 decoding mechanism.

The '?' (the invalid byte) and the character that follows it are interpreted together as a single invalid UTF-8 sequence, so both are consumed. We could add a new flag for lazy decoding (restart from the second byte if the whole sequence is invalid, which seems to be what the terminal does), or allow setting the text encoding.
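
The difference between the two strategies can be sketched with a simplified decoder model in Python. This is not FFmpeg's code: the function names, the `lazy` flag, and the `replacement` parameter are hypothetical, and the input assumes the tag contains the byte 0xFC followed by "m". The model treats 0xFC as the lead byte of a historical 6-byte sequence, which is why the next byte gets read as a would-be continuation byte.

```python
def expected_length(lead: int) -> int:
    """Sequence length implied by a lead byte, or 0 if it cannot start one.

    Accepts the historical 5- and 6-byte forms so 0xFC counts as a lead byte.
    """
    if lead < 0x80:
        return 1
    if lead < 0xC0:
        return 0  # stray continuation byte
    for limit, length in ((0xE0, 2), (0xF0, 3), (0xF8, 4), (0xFC, 5), (0xFE, 6)):
        if lead < limit:
            return length
    return 0  # 0xFE and 0xFF are never valid


def decode(data: bytes, lazy: bool, replacement: str = "") -> str:
    """Decode UTF-8, substituting `replacement` for invalid sequences.

    lazy=False: the byte that failed the continuation check is consumed too.
    lazy=True:  decoding restarts at the byte right after the bad lead byte.
    """
    out = []
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 1:
            out.append(chr(data[i]))
            i += 1
            continue
        if n == 0:
            out.append(replacement)
            i += 1
            continue
        j = i + 1
        while j < i + n and j < len(data) and (data[j] & 0xC0) == 0x80:
            j += 1
        if j == i + n:
            try:
                out.append(data[i:j].decode("utf-8"))
            except UnicodeDecodeError:  # overlong, surrogate, or 5/6-byte form
                out.append(replacement)
            i = j
        else:
            out.append(replacement)
            i = i + 1 if lazy else min(j + 1, len(data))
    return "".join(out)


raw = b"Bl\xfcmchen"  # assumed raw tag bytes
print(decode(raw, lazy=False))                   # the "m" is swallowed
print(decode(raw, lazy=True, replacement="?"))   # the "m" survives
```

Under these assumptions the greedy mode reproduces the "Blchen" value from the report, while the lazy mode keeps every valid character.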

comment:4 Changed 4 years ago by cehoyos

  • Priority changed from normal to important
  • Reproduced by developer set
  • Resolution set to fixed
  • Status changed from new to closed
  • Version changed from unspecified to git-master

The ffprobe issue was a regression afaict.

Fixed by Michael in a31547ce
