Opened 4 months ago

Last modified 3 months ago

#11240 open defect

[Windows] Non-ASCII characters in "FFREPORT" may produce garbled filename or fail

Reported by: m.feriati
Owned by:
Priority: normal
Component: tools
Version: git-master
Keywords: FFREPORT filename
Cc: MasterQuestionable
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description (last modified by m.feriati)

Summary of the bug
This has been around for a long time, possibly years.
On Windows, if the filename set in the FFREPORT environment variable contains accented characters, the log file is created under a garbled name instead of the requested one, even though the log's contents record the correct filename.

ffplay started on 2024-10-11 at 16:22:37
Report written to "00 - hétérogénéisé.log"
Log level: 40
Command line:
ffplay -hide_banner -noborder -i "00 - h\xe9t\xe9rog\xe9n\xe9is\xe9.mp4" -vf "scale=1280:-2,setsar=1"
Initialized direct3d renderer.

Changing the console code page did not help. I tried:

chcp 1252
chcp 65001 :: my default console code page
chcp 437

expected behavior: produce 00 - hétérogénéisé.log
actual behavior: produces 00 - hÃ©tÃ©rogÃ©nÃ©isÃ©.log

hint
The intended name can be recovered by encoding the garbled log filename to bytes using code page 1252 and then decoding those bytes as UTF-8.
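
The recovery described in the hint can be sketched in Python. This is a minimal illustration, assuming the garbled name is exactly the UTF-8 byte sequence displayed through code page 1252; `recover_name` is a hypothetical helper and not part of FFmpeg.

```python
def recover_name(garbled: str) -> str:
    """Undo 'UTF-8 bytes read as cp1252' mojibake (hypothetical helper)."""
    # Map each character back to the byte it was decoded from ...
    raw = garbled.encode("cp1252")
    # ... then decode those bytes as the UTF-8 they really are.
    return raw.decode("utf-8")


# Assumed example of a garbled name as it would appear on disk.
garbled = "00 - hÃ©tÃ©rogÃ©nÃ©isÃ©.log"
print(recover_name(garbled))
```

If the name was garbled through a different ANSI code page, the `"cp1252"` argument would have to change accordingly.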

fyi
This applies to all the ffmpeg binaries.

How to reproduce:

> set "FFREPORT=file=00 - hétérogénéisé.log:level=40"
> ffplay -hide_banner -noborder -i "in\00 - hétérogénéisé.mp4" -vf scale=1280:-2,setsar=1

>ffmpeg -version
ffmpeg started on 2024-10-11 at 16:15:29
Report written to "00 - hétérogénéisé.log"
Log level: 40
ffmpeg version 7.1-full_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers

Change History (13)

comment:1 by m.feriati, 3 months ago

Component: undetermined → ffmpeg
Description: modified (diff)

comment:2 by MasterQuestionable, 3 months ago

Cc: MasterQuestionable added
Component: ffmpeg → avutil
Keywords: accented removed
Summary: accented characters in FFREPORT variable create wrong log filename → [Windows] Non-ASCII characters in command incorrectly reflected in "-report"
Type: defect → enhancement

͏    Likely Windows problem.
͏    See also: https://trac.ffmpeg.org/ticket/11241#comment:5

͏    Windows is known to be buggy with Unicode support.
͏    I guess this one is probably caused by the shell and OS's handling.
͏    Likely not addressable from FFmpeg.

in reply to:  2 ; comment:3 by Marton Balint, 3 months ago

Component: avutil → ffmpeg
Summary: [Windows] Non-ASCII characters in command incorrectly reflected in "-report" → non-ASCII characters in report filename create wrong log filename on Windows
Type: enhancement → defect

Replying to MasterQuestionable:

͏    Likely Windows problem.
͏    See also: https://trac.ffmpeg.org/ticket/11241#comment:5

͏    Windows is known to be buggy with Unicode support.
͏    I guess this one is probably caused by the shell and OS's handling.
͏    Likely not addressable from FFmpeg.

No, ffmpeg has full Unicode filename support, even on Windows. The problem is likely that the FFREPORT file is opened using fopen() instead of fopen_utf8().

in reply to:  3 comment:4 by m.feriati, 3 months ago

Replying to Marton Balint:

Replying to MasterQuestionable:

͏    Likely Windows problem.
͏    See also: https://trac.ffmpeg.org/ticket/11241#comment:5

͏    Windows is known to be buggy with Unicode support.
͏    I guess this one is probably caused by the shell and OS's handling.
͏    Likely not addressable from FFmpeg.

No, ffmpeg has full Unicode filename support, even on Windows. The problem is likely that the FFREPORT file is opened using fopen() instead of fopen_utf8().

Thanks, Marton.
This is further confirmed by the content of the generated log file:
Report written to "00 - hétérogénéisé.log"
which means that the FFREPORT environment variable is read correctly by the ffmpeg binary.
ffmpeg only fails at the point where it creates the log file: the filename is not handed to the OS in the character encoding it expects.

See https://github.com/FFmpeg/FFmpeg/blob/db7b4fc89fb18d5ff0a1426bd433c234555a3fff/fftools/opt_common.c#L1210
report_file = fopen(filename.str, "w");
located in the function int init_report(const char *env, FILE **file)
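
The effect of handing UTF-8 bytes to a narrow-character fopen() can be simulated in Python. This is only a sketch, assuming the active ANSI code page is 1252 (it differs from system to system); `name_seen_by_ansi_api` is a hypothetical helper, and nothing here calls FFmpeg or the Windows API.

```python
def name_seen_by_ansi_api(name: str, ansi_codepage: str = "cp1252") -> str:
    """Simulate a narrow-character API reading UTF-8 bytes as ANSI text."""
    # The program holds the filename as UTF-8 bytes ...
    utf8_bytes = name.encode("utf-8")
    # ... but the narrow API interprets them in the ANSI code page (assumed).
    return utf8_bytes.decode(ansi_codepage)


# "é" is written as the precomposed code point U+00E9.
requested = "00 - h\u00e9t\u00e9rog\u00e9n\u00e9is\u00e9.log"
print(name_seen_by_ansi_api(requested))     # non-ASCII characters come out garbled
print(name_seen_by_ansi_api("report.log"))  # pure ASCII is unaffected
```

ASCII-only names survive because their bytes are identical in UTF-8 and in code page 1252, which would explain why the problem only shows up with non-ASCII filenames.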

comment:5 by MasterQuestionable, 3 months ago

Summary: non-ASCII characters in report filename create wrong log filename on Windows → [Windows] Non-ASCII characters as "-report" filename produced garbled filename

͏    Would you clarify somewhat... what went wrong exactly?
͏    Which part of the output log had unexpected content?

͏    Is it only the output log's filename wrong?
͏    (I thought it was the filenames in log's content...)

in reply to:  5 comment:6 by m.feriati, 3 months ago

Replying to MasterQuestionable:

͏    Would you clarify somewhat... what went wrong exactly?
͏    Which part of the output log had unexpected content?

͏    Is it only the output log's filename wrong?
͏    (I thought it was the filenames in log's content...)

Indeed, it is only the output log's filename that is encoded in the wrong code page.

comment:7 by MasterQuestionable, 3 months ago

͏    Then this should be addressable.
͏    And likely just the cause mentioned by Marton.

͏    Note Windows does not necessarily support full-Unicode for the filename. (UTF-16 no surrogate)
͏    See also: https://github.com/exiftool/exiftool/issues/253#issuecomment-2063406000

͏    ----

͏    The purpose is to hint the theoretical boundary:
͏    UTF-16 no surrogate cannot represent every possibility of UTF-8.
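
For reference, how UTF-8 and UTF-16 encode a code point outside the Basic Multilingual Plane can be checked in Python. The sketch below only shows the encodings themselves; it makes no claim about what a particular Windows API or filesystem accepts.

```python
import struct

ch = "\U0001F600"  # a code point outside the BMP, chosen for illustration

utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-le")

# Split the UTF-16 bytes into their two 16-bit code units.
units = struct.unpack("<2H", utf16)

print(len(utf8))                # number of UTF-8 bytes
print([hex(u) for u in units])  # the two UTF-16 code units (a surrogate pair)
```

A representation limited to single 16-bit units, with no surrogate pairs, would have no way to express such a code point.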

in reply to:  7 comment:8 by m.feriati, 3 months ago

Replying to MasterQuestionable:

͏    Then this should be addressable.
͏    And likely just the cause mentioned by Marton.

͏    Note Windows does not necessarily support full-Unicode for the filename. (UTF-16 no surrogate)
͏    See also: https://github.com/exiftool/exiftool/issues/253#issuecomment-2063406000

For sure, it doesn't.
The scope of this issue is only to create a log file whose name matches an already existing input file.

comment:9 by m.feriati, 3 months ago

Keywords: accented added
Summary: [Windows] Non-ASCII characters as "-report" filename produced garbled filename → [Windows] filenames with non-ASCII characters in report environment variable produces garbled log filename

comment:10 by MasterQuestionable, 3 months ago

͏    I believe it should also be reproducible specifying the filename in whatsoever manner.
͏    Also nothing specific to "accented". (apparently any non-ASCII would trap)

͏    Worth notice: title length matters.

comment:11 by m.feriati, 3 months ago

Summary: [Windows] filenames with non-ASCII characters in report environment variable produces garbled log filename → [Windows] filenames with non-ASCII characters in FFREPORT variable produces garbled log filename

comment:12 by MasterQuestionable, 3 months ago

Component: ffmpeg → tools
Keywords: accented removed
Summary: [Windows] filenames with non-ASCII characters in FFREPORT variable produces garbled log filename → [Windows] Non-ASCII characters in "FFREPORT" may produce garbled filename or fail

͏    It appears "-report" only supports custom options via "FFREPORT" environment variable:
͏    https://ffmpeg.org/ffmpeg.html#Generic-options
͏    (yes, I don't use it...)

͏    Specifically tried below:
(Windows CMD)
͏    SET "FFREPORT=file=中.txt" & ffmpeg -i "中.mp4"
͏    SET "FFREPORT=file=é.txt" & ffmpeg -i "中.mp4"
͏    SET "FFREPORT=file=А.txt" & ffmpeg -i "中.mp4"
͏    .
͏    “Failed to open report "中.txt": Invalid argument”
͏    Garbled filename for else.
͏    Note "А" is Cyrillic.
͏    "é" is "e" + "́". (not "é")
͏    [ https://github.com/MasterInQuestion/Markup/blob/main/AAA.htm ]
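
The distinction noted above between the two forms of "é" can be made concrete. This is a small Python sketch using only the standard `unicodedata` module; it illustrates the difference in encoding and is not a diagnosis of this ticket.

```python
import unicodedata

decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The two strings render alike but are different code point sequences.
print(decomposed == precomposed)

# NFC normalization composes the pair into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# Their UTF-8 byte sequences differ, so a byte-level mix-up garbles them differently.
print(decomposed.encode("utf-8"), precomposed.encode("utf-8"))
```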

comment:13 by m.feriati, 3 months ago

Keywords: filename added
Status: new → open
Version: unspecified → git-master

It is worth mentioning that the environment variable is read as a UTF-8 string:

env = getenv_utf8("FFREPORT");

https://github.com/FFmpeg/FFmpeg/blob/3330b733d3eb912ee60a90a163ef8ee5f44ff4c0/fftools/cmdutils.c#L566
