Opened 3 years ago

Last modified 3 years ago

#8964 new enhancement

Wrong character encoding for AVI container

Reported by: malaterre
Owned by:
Priority: wish
Component: avformat
Version: git-master
Keywords: avi
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
How to reproduce:

% echo $LANG
en_US.UTF-8
% ffmpeg -y -i input.mp4 -c:v copy -c:a copy -metadata title="Un Monstre à Paris" -metadata comment="An exercise with unicode" output.avi
% ffprobe -v error -hide_banner -show_format -print_format json -i output.avi              
[...]
            "title": "Un Monstre à Paris",

The above is incorrect; it should instead print:

% mediainfo output.avi 
[...]
Movie name                               : Un Monstre à Paris

See:

Attachments (1)

RjICQ.png (17.2 KB) - added by malaterre 3 years ago.
Title as printed from Windows 8.1 session


Change History (10)

by malaterre, 3 years ago

Attachment: RjICQ.png added

Title as printed from Windows 8.1 session

comment:1 by malaterre, 3 years ago

Last edited 3 years ago by malaterre

comment:2 by malaterre, 3 years ago

% ffmpeg -y -i input.mp4 -c:v copy -c:a copy -metadata title="$(echo 'Un Monstre à Paris' | iconv -t cp1252)" -metadata comment="An exercise with unicode" output.avi

gives:

% mediainfo output.avi
[...]
Movie name                               : Un Monstre à Paris
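
The difference between the two commands can be illustrated at the byte level. The sketch below is not FFmpeg code; it assumes that the bytes given to -metadata are stored in the file unchanged (which matches the behaviour reported here) and that the reading tool decodes them with a Latin code page. The helper name read_as_latin is hypothetical.

```python
TITLE = "Un Monstre à Paris"

# Bytes a UTF-8 shell (LANG=en_US.UTF-8) passes on the command line.
utf8_bytes = TITLE.encode("utf-8")
# Bytes produced by piping the title through `iconv -t cp1252`.
cp1252_bytes = TITLE.encode("cp1252")


def read_as_latin(raw: bytes) -> str:
    """Decode the way a reader that assumes a Latin code page would (assumption)."""
    return raw.decode("cp1252")


# 'à' is two bytes (C3 A0) in UTF-8 but a single byte (E0) in cp1252.
print(utf8_bytes.hex(" "))
print(cp1252_bytes.hex(" "))

# UTF-8 bytes read as cp1252 turn 'à' into 'Ã' followed by a no-break space.
print(repr(read_as_latin(utf8_bytes)))
# cp1252 bytes read as cp1252 round-trip to the intended title.
print(repr(read_as_latin(cp1252_bytes)))
```

Under those assumptions, only the cp1252-encoded title survives a Latin-assuming reader intact, which is consistent with the iconv workaround above.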

comment:3 by malaterre, 3 years ago

Version: unspecified → 4.1.4

comment:4 by mkver, 3 years ago

What makes you believe that the character encoding used by avi is defined at all?

comment:5 by malaterre, 3 years ago

This is based on the following comment:

RIFF: The internal encoding of RIFF strings (eg. in AVI and WAV files) is assumed to be Latin unless otherwise specified by the RIFF CSET chunk or the "-charset RIFF=CHARSET" option.

comment:6 by malaterre, 3 years ago

I did double check the RIFF specification from:

It states:

Specifies the code page used for file elements. If the CSET
chunk is not present, or if this field has value zero, assume
standard ISO 8859/1 code page (identical to code page
1004 without code points defined in hex columns 0, 1, 8,
and 9).
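
The rule quoted above can be sketched as a small lookup. This is only an illustration, not how FFmpeg or any other tool implements it. It assumes that the CSET chunk sits directly inside the RIFF form, that its first field is a 16-bit little-endian code page number, and that a non-zero code page N maps to the Python codec name "cpN". The function names are hypothetical.

```python
import struct
from typing import Optional


def find_cset_code_page(data: bytes) -> Optional[int]:
    """Return the code page from a top-level CSET chunk, or None if there is none."""
    if len(data) < 12 or data[0:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    pos = 12  # skip 'RIFF', the form size and the form type (e.g. 'AVI ')
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"CSET" and size >= 2 and pos + 10 <= len(data):
            # Assumed layout: the code page is the first 16-bit field.
            (code_page,) = struct.unpack_from("<H", data, pos + 8)
            return code_page
        pos += 8 + size + (size & 1)  # chunk data is padded to an even length
    return None


def riff_text_encoding(data: bytes) -> str:
    """Pick a text encoding following the rule quoted from the specification."""
    code_page = find_cset_code_page(data)
    if not code_page:  # chunk absent, or field equal to zero
        return "iso-8859-1"
    return f"cp{code_page}"  # naive mapping, an assumption of this sketch
```

With such a lookup, a file without a CSET chunk would have its INFO strings decoded as ISO 8859-1, which is the behaviour requested in this ticket.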

comment:7 by malaterre, 3 years ago (in reply to: comment:4)

Replying to mkver:

What makes you believe that the character encoding used by avi is defined at all?

Which specification were you looking at?

comment:8 by Carl Eugen Hoyos, 3 years ago

Component: ffprobe → avformat
Keywords: avi added
Priority: normal → wish
Version: 4.1.4 → git-master

For future tickets: Please remember to test current FFmpeg git head and provide the command line you tested together with the complete, uncut console output to make your tickets valid.

comment:9 by Carl Eugen Hoyos, 3 years ago

Summary: ffprobe: Wrong character encoding for AVI container → Wrong character encoding for AVI container