Opened 20 months ago

Closed 9 months ago

#11240 closed defect (fixed)

[Windows] Non-ASCII characters in "FFREPORT" may produce garbled filename or fail

Reported by: m.feriati
Owned by: Zhao Zhili <quink@noreply.code.ffmpeg.org>
Priority: normal
Component: tools
Version: git-master
Keywords: FFREPORT filename
Cc: MasterQuestionable
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description (last modified by m.feriati)

Summary of the bug
This has been around for a long time, maybe years.
On Windows, if the filename set in the FFREPORT environment variable contains accented characters, the log file is created under a different, garbled name, although the filename recorded inside the log is correct.

ffplay started on 2024-10-11 at 16:22:37
Report written to "00 - hétérogénéisé.log"
Log level: 40
Command line:
ffplay -hide_banner -noborder -i "00 - h\xe9t\xe9rog\xe9n\xe9is\xe9.mp4" -vf "scale=1280:-2,setsar=1"
Initialized direct3d renderer.

also tried different console code pages with no success

chcp 1252
chcp 65001 :: my default console code page
chcp 437

expected behavior: produces 00 - hétérogénéisé.log
actual behavior: produces 00 - hÃ©tÃ©rogÃ©nÃ©isÃ©.log

hint
the target name can be recovered by reading the log filename bytes in code page 1252 and then rewritten as utf8
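
The recovery described in the hint can be sketched in Python. This is only an illustration and not part of FFmpeg: `recover_name` is a hypothetical helper, and it assumes the name was garbled by reading its UTF-8 bytes as code page 1252.

```python
def recover_name(garbled: str) -> str:
    """Undo mojibake caused by reading UTF-8 bytes as code page 1252.

    Assumption: the garbling happened exactly this way, as the hint suggests.
    """
    # Map the garbled characters back to the bytes that are really on disk...
    raw = garbled.encode("cp1252")
    # ...then decode those bytes as the UTF-8 they were meant to be.
    return raw.decode("utf-8")


# "00 - hétérogénéisé.log", written with escapes so every é is U+00E9.
expected = "00 - h\u00e9t\u00e9rog\u00e9n\u00e9is\u00e9.log"

# Simulate the garbling: UTF-8 bytes displayed as code page 1252.
garbled = expected.encode("utf-8").decode("cp1252")

print(garbled)                # the name as it shows up on disk
print(recover_name(garbled))  # the intended name
```

The helper raises `UnicodeEncodeError` or `UnicodeDecodeError` if the name was not garbled this way, so it is safer to try it on a copy of the file name first.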

fyi
this applies to all ffmpeg binaries.

How to reproduce:

> set "FFREPORT=file=00 - hétérogénéisé.log:level=40"
> ffplay -hide_banner -noborder -i "in\00 - hétérogénéisé.mp4" -vf scale=1280:-2,setsar=1

>ffmpeg -version
ffmpeg started on 2024-10-11 at 16:15:29
Report written to "00 - hétérogénéisé.log"
Log level: 40
ffmpeg version 7.1-full_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers

Change History (16)

comment:1 by m.feriati, 20 months ago

Component: undetermined → ffmpeg
Description: modified

comment:2 by MasterQuestionable, 20 months ago

Cc: MasterQuestionable added
Component: ffmpeg → avutil
Keywords: accented removed
Summary: accented characters in FFREPORT variable create wrong log filename → [Windows] Non-ASCII characters in command incorrectly reflected in "-report"
Type: defect → enhancement

͏    Likely Windows problem.
͏    See also: https://trac.ffmpeg.org/ticket/11241#comment:5

͏    Windows is known to be buggy with Unicode support.
͏    I guess this one is probably caused by the shell and OS's handling.
͏    Likely not addressable from FFmpeg.

Last edited 20 months ago by MasterQuestionable

in reply to:  2 comment:3 by Marton Balint, 20 months ago

Component: avutil → ffmpeg
Summary: [Windows] Non-ASCII characters in command incorrectly reflected in "-report" → non-ASCII characters in report filename create wrong log filename on Windows
Type: enhancement → defect

Replying to MasterQuestionable:

͏    Likely Windows problem.
͏    See also: https://trac.ffmpeg.org/ticket/11241#comment:5

͏    Windows is known to be buggy with Unicode support.
͏    I guess this one is probably caused by the shell and OS's handling.
͏    Likely not addressable from FFmpeg.

No, ffmpeg has full Unicode filename support even on Windows. The problem is likely that the FFREPORT file is opened using fopen() instead of fopen_utf8().

in reply to:  3 comment:4 by m.feriati, 20 months ago

Replying to Marton Balint:

Replying to MasterQuestionable:

͏    Likely Windows problem.
͏    See also: https://trac.ffmpeg.org/ticket/11241#comment:5

͏    Windows is known to be buggy with Unicode support.
͏    I guess this one is probably caused by the shell and OS's handling.
͏    Likely not addressable from FFmpeg.

No, ffmpeg has full Unicode filename support even on Windows. The problem is likely that the FFREPORT file is opened using fopen() instead of fopen_utf8().

Thanks, Marton.
This is further confirmed by the content of the generated log file:
Report written to "00 - hétérogénéisé.log"
which means that the FFREPORT environment variable is read correctly by the ffmpeg binary.
ffmpeg only fails when it creates the log file, using the wrong character encoding for its name.

see https://github.com/FFmpeg/FFmpeg/blob/db7b4fc89fb18d5ff0a1426bd433c234555a3fff/fftools/opt_common.c#L1210
report_file = fopen(filename.str, "w");
located in the function int init_report(const char *env, FILE **file)
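
The mismatch discussed above can be modelled in a few lines of Python. This is a rough sketch, not FFmpeg code: it assumes the system's ANSI code page is 1252 and treats a plain fopen() as interpreting the name's bytes in that code page, while a UTF-8-aware open decodes them as UTF-8 before they reach the wide-character API. Both function names are hypothetical.

```python
ANSI_CODE_PAGE = "cp1252"  # assumption: the system's ANSI code page


def name_seen_by_narrow_fopen(utf8_name: bytes) -> str:
    """Model of fopen(): the bytes are interpreted in the ANSI code page."""
    return utf8_name.decode(ANSI_CODE_PAGE)


def name_seen_by_utf8_aware_open(utf8_name: bytes) -> str:
    """Model of a UTF-8-aware open: decode as UTF-8, hand over as UTF-16."""
    utf16 = utf8_name.decode("utf-8").encode("utf-16-le")
    return utf16.decode("utf-16-le")


# "00 - hétérogénéisé.log", written with escapes so every é is U+00E9.
requested = "00 - h\u00e9t\u00e9rog\u00e9n\u00e9is\u00e9.log"

# The program holds the name as UTF-8 after reading the environment variable.
utf8_bytes = requested.encode("utf-8")

print(name_seen_by_narrow_fopen(utf8_bytes))     # garbled name
print(name_seen_by_utf8_aware_open(utf8_bytes))  # requested name
```

Pure ASCII names come out identical on both paths, which would explain why the problem only shows up with non-ASCII characters.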

comment:5 by MasterQuestionable, 20 months ago

Summary: non-ASCII characters in report filename create wrong log filename on Windows → [Windows] Non-ASCII characters as "-report" filename produced garbled filename

͏    Would you clarify somewhat... what went wrong exactly?
͏    Which part of the output log had unexpected content?

͏    Is it only the output log's filename wrong?
͏    (I thought it was the filenames in log's content...)

in reply to:  5 comment:6 by m.feriati, 20 months ago

Replying to MasterQuestionable:

͏    Would you clarify somewhat... what went wrong exactly?
͏    Which part of the output log had unexpected content?

͏    Is it only the output log's filename wrong?
͏    (I thought it was the filenames in log's content...)

indeed it is only the output log's filename that is encoded in the wrong code page.

Last edited 20 months ago by m.feriati

comment:7 by MasterQuestionable, 20 months ago

͏    Then this should be addressable.
͏    And likely just the cause mentioned by Marton.

͏    Note Windows does not necessarily support full-Unicode for the filename. (UTF-16 no surrogate)
͏    See also: https://github.com/exiftool/exiftool/issues/253#issuecomment-2063406000

͏    ----

͏    The purpose is to hint the theoretical boundary:
͏    UTF-16 no surrogate cannot represent every possibility of UTF-8.

Last edited 20 months ago by MasterQuestionable
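
The boundary hinted at in this comment can be made concrete with a short Python sketch. It only shows which characters need a surrogate pair in UTF-16; whether a particular Windows API or file system accepts such names is not something this example settles. The helper names are hypothetical.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def needs_surrogates(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


# U+00E9 (é) and U+4E2D (中) are in the BMP; U+1F600 is not.
for sample in ["\u00e9", "\u4e2d", "\U0001F600"]:
    print(
        f"U+{ord(sample):04X}: {utf16_code_units(sample)} code unit(s), "
        f"surrogates needed: {needs_surrogates(sample)}"
    )
```

The characters used in this ticket (é, 中, А) all fit in a single code unit, so the surrogate question does not affect the reported bug.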

in reply to:  7 comment:8 by m.feriati, 20 months ago

Replying to MasterQuestionable:

͏    Then this should be addressable.
͏    And likely just the cause mentioned by Marton.

͏    Note Windows does not necessarily support full-Unicode for the filename. (UTF-16 no surrogate)
͏    See also: https://github.com/exiftool/exiftool/issues/253#issuecomment-2063406000

for sure, it doesn't.
the issue's scope is to create a log file whose name matches an already existing input file.

comment:9 by m.feriati, 20 months ago

Keywords: accented added
Summary: [Windows] Non-ASCII characters as "-report" filename produced garbled filename → [Windows] filenames with non-ASCII characters in report environment variable produces garbled log filename

comment:10 by MasterQuestionable, 20 months ago

͏    I believe it should also be reproducible specifying the filename in whatsoever manner.
͏    Also nothing specific to "accented". (apparently any non-ASCII would trap)

͏    Worth notice: title length matters.

comment:11 by m.feriati, 20 months ago

Summary: [Windows] filenames with non-ASCII characters in report environment variable produces garbled log filename → [Windows] filenames with non-ASCII characters in FFREPORT variable produces garbled log filename

comment:12 by MasterQuestionable, 19 months ago

Component: ffmpeg → tools
Keywords: accented removed
Summary: [Windows] filenames with non-ASCII characters in FFREPORT variable produces garbled log filename → [Windows] Non-ASCII characters in "FFREPORT" may produce garbled filename or fail

͏    It appears "-report" only supports custom options via "FFREPORT" environment variable:
͏    https://ffmpeg.org/ffmpeg.html#Generic-options
͏    (yes, I don't use it...)

͏    Specifically tried below:
(Windows CMD)
͏    SET "FFREPORT=file=中.txt" & ffmpeg -i "中.mp4"
͏    SET "FFREPORT=file=é.txt" & ffmpeg -i "中.mp4"
͏    SET "FFREPORT=file=А.txt" & ffmpeg -i "中.mp4"
͏    .
͏    “Failed to open report "中.txt": Invalid argument”
͏    Garbled filename for else.
͏    Note "А" is Cyrillic.
͏    "é" is "e" + "́". (not "é")
͏    [ https://github.com/MasterInQuestion/Markup/blob/main/AAA.htm ]
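
The distinction between the two forms of "é" noted above can be checked in Python. This is a small illustration using the standard `unicodedata` module; it says nothing about which form a given shell or file system will actually store.

```python
import unicodedata

decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# The two strings render alike but are different code point sequences.
print(decomposed == precomposed)

# Their UTF-8 encodings differ as well (3 bytes versus 2 bytes).
print(decomposed.encode("utf-8").hex(" "))
print(precomposed.encode("utf-8").hex(" "))

# NFC normalization composes the pair into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Because the byte sequences differ, the two forms would also be garbled differently when their UTF-8 bytes are read in a single-byte code page.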

comment:13 by m.feriati, 19 months ago

Keywords: filename added
Status: new → open
Version: unspecified → git-master

It is worth mentioning that the environment variable is read as a UTF-8 string:

env = getenv_utf8("FFREPORT");

https://github.com/FFmpeg/FFmpeg/blob/3330b733d3eb912ee60a90a163ef8ee5f44ff4c0/fftools/cmdutils.c#L566

comment:14 by m.feriati, 10 months ago

Hello, any updates to share?
The fix is described in the current discussion; is anyone available to apply it?

Last edited 10 months ago by m.feriati

comment:15 by MasterQuestionable, 9 months ago

͏    Didn't notice any apparent patch..?
͏    Try submit a clear one?
͏    https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls

comment:16 by Zhao Zhili <quink@noreply.code.ffmpeg.org>, 9 months ago

Owner: set to Zhao Zhili <quink@noreply.code.ffmpeg.org>
Resolution: fixed
Status: open → closed

In 2a1d5dd7/ffmpeg:

fftools: use fopen_utf8 to open FFREPORT

Should fix #11240
