Opened 4 months ago
Last modified 3 months ago
#11240 open defect
[Windows] Non-ASCII characters in "FFREPORT" may produce garbled filename or fail
Reported by: | m.feriati | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | tools |
Version: | git-master | Keywords: | FFREPORT filename |
Cc: | MasterQuestionable | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
Summary of the bug
this has been around for a long time, maybe years.
on Windows, if the filename set in the FFREPORT environment variable contains accented characters, the log file is created under a garbled name, although the "Report written to" line inside the log shows the correct filename.
ffplay started on 2024-10-11 at 16:22:37
Report written to "00 - hétérogénéisé.log"
Log level: 40
Command line:
ffplay -hide_banner -noborder -i "00 - h\xe9t\xe9rog\xe9n\xe9is\xe9.mp4" -vf "scale=1280:-2,setsar=1"
Initialized direct3d renderer.
also tried different console code pages with no success
chcp 1252
chcp 65001
:: my default console code page
chcp 437
expected behavior: produce 00 - hétérogénéisé.log
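actual behavior: produces a garbled name in which each accented character appears as its UTF-8 bytes read in code page 1252 (for example "Ã©" instead of "é")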
hint
the intended name can be recovered by encoding the garbled log filename as code page 1252 bytes and then decoding those bytes as UTF-8
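The hint can be sketched in C as follows. This is only an illustration under stated assumptions: the garbled name is available as a UTF-8 string, every character in it came from code page 1252, and bytes that code page 1252 leaves undefined were passed through as C1 control characters. The names `cp1252_byte_for`, `decode_utf8` and `recover_report_name` are hypothetical and not part of FFmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unicode code points for CP1252 bytes 0x80..0x9F.
 * 0 marks the five bytes that CP1252 leaves undefined. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Map a code point back to the CP1252 byte that displays as it.
 * Returns -1 if no CP1252 byte corresponds to the code point. */
static int cp1252_byte_for(uint32_t cp)
{
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return (int)cp;                 /* ASCII and Latin-1 map to themselves */
    if (cp >= 0x80 && cp <= 0x9F)       /* assumption: undefined slots pass through */
        return cp1252_high[cp - 0x80] ? -1 : (int)cp;
    for (int i = 0; i < 32; i++)
        if (cp1252_high[i] == cp)
            return 0x80 + i;
    return -1;
}

/* Decode one UTF-8 sequence of at most 3 bytes (enough for CP1252).
 * Returns the number of bytes consumed, or 0 on error. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    return 0;
}

/* Rebuild the intended UTF-8 name from its CP1252-garbled form.
 * Writes a NUL-terminated string to out; returns 0 on success, -1 on error. */
int recover_report_name(const char *garbled, char *out, size_t out_size)
{
    const unsigned char *s = (const unsigned char *)garbled;
    size_t n = 0;

    assert(garbled && out);
    if (out_size == 0)
        return -1;
    while (*s) {
        uint32_t cp;
        size_t len = decode_utf8(s, &cp);
        int byte = len ? cp1252_byte_for(cp) : -1;

        if (byte < 0 || n + 1 >= out_size)
            return -1;
        out[n++] = (char)byte;          /* this byte is part of the original UTF-8 */
        s += len;
    }
    out[n] = '\0';
    return 0;
}
```

A caller would pass the garbled name and a buffer, then rename the file if the function returns 0. A name containing a character outside code page 1252 makes the function return -1, which suggests the file was not garbled in this particular way.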
fyi
this applies to all ffmpeg binaries.
How to reproduce:
> set "FFREPORT=file=00 - hétérogénéisé.log:level=40"
> ffplay -hide_banner -noborder -i "in\00 - hétérogénéisé.mp4" -vf scale=1280:-2,setsar=1

> ffmpeg -version
ffmpeg started on 2024-10-11 at 16:15:29
Report written to "00 - hétérogénéisé.log"
Log level: 40
ffmpeg version 7.1-full_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers
Change History (13)
comment:1 by , 3 months ago
Component: | undetermined → ffmpeg |
---|---|
Description: | modified (diff) |
follow-up: 3 comment:2 by , 3 months ago
Cc: | added |
---|---|
Component: | ffmpeg → avutil |
Keywords: | accented removed |
Summary: | accented characters in FFREPORT variable create wrong log filename → [Windows] Non-ASCII characters in command incorrectly reflected in "-report" |
Type: | defect → enhancement |
follow-up: 4 comment:3 by , 3 months ago
Component: | avutil → ffmpeg |
---|---|
Summary: | [Windows] Non-ASCII characters in command incorrectly reflected in "-report" → non-ASCII characters in report filename create wrong log filename on Windows |
Type: | enhancement → defect |
Replying to MasterQuestionable:
͏ Likely Windows problem.
͏ See also: https://trac.ffmpeg.org/ticket/11241#comment:5
͏ Windows is known to be buggy with Unicode support.
͏ I guess this one is probably caused by the shell and OS's handling.
͏ Likely not addressable from FFmpeg.
No, ffmpeg has full Unicode filename support even on Windows. The problem is likely that the FFREPORT file is opened using fopen() instead of fopen_utf8().
comment:4 by , 3 months ago
Replying to Marton Balint:
Replying to MasterQuestionable:
͏ Likely Windows problem.
͏ See also: https://trac.ffmpeg.org/ticket/11241#comment:5
͏ Windows is known to be buggy with Unicode support.
͏ I guess this one is probably caused by the shell and OS's handling.
͏ Likely not addressable from FFmpeg.
No, ffmpeg has full Unicode filename support even on Windows. The problem is likely that the FFREPORT file is opened using fopen() instead of fopen_utf8().
thanks marton,
this is furthermore confirmed by the content of the generated log file:
Report written to "00 - hétérogénéisé.log"
which means that the ffreport environment variable is properly read by the ffmpeg binary.
ffmpeg just fails when it comes to creating the log file with the proper character encoding.
see https://github.com/FFmpeg/FFmpeg/blob/db7b4fc89fb18d5ff0a1426bd433c234555a3fff/fftools/opt_common.c#L1210
report_file = fopen(filename.str, "w");
located in the function int init_report(const char *env, FILE **file)
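The change Marton suggests amounts to converting the UTF-8 name to UTF-16 and using the wide-character open call on Windows. A minimal sketch of that idea follows. `open_report_utf8` is a hypothetical name, and this is not FFmpeg's actual fopen_utf8() implementation, which may differ in details. The sketch relies on the Win32 calls MultiByteToWideChar and _wfopen, and falls back to plain fopen() on other systems.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: open a file whose name is UTF-8 encoded.
 * mode must be a short ASCII mode string such as "w". */
FILE *open_report_utf8(const char *utf8_name, const char *mode)
{
    assert(utf8_name && mode);
#ifdef _WIN32
    wchar_t wmode[8];
    wchar_t *wname;
    FILE *f;
    int len, i;

    /* Widen the ASCII mode string character by character. */
    for (i = 0; mode[i] && i < 7; i++)
        wmode[i] = (wchar_t)mode[i];
    wmode[i] = L'\0';

    /* The first call returns the required length in UTF-16 units, NUL included. */
    len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                              utf8_name, -1, NULL, 0);
    if (len <= 0)
        return NULL;
    wname = malloc((size_t)len * sizeof(*wname));
    if (!wname)
        return NULL;
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8_name, -1, wname, len);

    /* The wide call does not depend on the process ANSI code page. */
    f = _wfopen(wname, wmode);
    free(wname);
    return f;
#else
    /* Other systems take the bytes of the name as they are. */
    return fopen(utf8_name, mode);
#endif
}
```

With a helper of this kind, the call quoted above would become something like report_file = open_report_utf8(filename.str, "w"); in the sketch's terms.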
follow-up: 6 comment:5 by , 3 months ago
Summary: | non-ASCII characters in report filename create wrong log filename on Windows → [Windows] Non-ASCII characters as "-report" filename produced garbled filename |
---|
͏ Would you clarify somewhat... what went wrong exactly?
͏ Which part of the output log had unexpected content?
͏ Is it only the output log's filename wrong?
͏ (I thought it was the filenames in log's content...)
comment:6 by , 3 months ago
Replying to MasterQuestionable:
͏ Would you clarify somewhat... what went wrong exactly?
͏ Which part of the output log had unexpected content?
͏ Is it only the output log's filename wrong?
͏ (I thought it was the filenames in log's content...)
indeed it is only the output log's filename that is encoded in the wrong code page.
follow-up: 8 comment:7 by , 3 months ago
͏ Then this should be addressable.
͏ And likely just the cause mentioned by Marton.
͏ Note Windows does not necessarily support full-Unicode for the filename. (UTF-16 no surrogate)
͏ See also: https://github.com/exiftool/exiftool/issues/253#issuecomment-2063406000
͏ ----
͏ The purpose is to hint the theoretical boundary:
͏ UTF-16 no surrogate cannot represent every possibility of UTF-8.
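To make the boundary concrete: code points up to U+FFFF fit in one UTF-16 unit, while anything above needs a surrogate pair. The sketch below only shows the encoding arithmetic. `utf16_encode` is a hypothetical helper, and the sketch says nothing about which Windows APIs or file systems accept surrogate pairs in filenames.

```c
#include <assert.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is
 * not a valid Unicode scalar value. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                       /* out of range, or a lone surrogate */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* Basic Multilingual Plane: one unit */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain, split 10 + 10 */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

The characters used elsewhere in this ticket (é, 中, А) are all below U+10000, so each takes a single unit and surrogates do not come into play for them.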
comment:8 by , 3 months ago
Replying to MasterQuestionable:
͏ Then this should be addressable.
͏ And likely just the cause mentioned by Marton.
͏ Note Windows does not necessarily support full-Unicode for the filename. (UTF-16 no surrogate)
͏ See also: https://github.com/exiftool/exiftool/issues/253#issuecomment-2063406000
for sure, it doesn't.
the issue's scope is to create a log file whose name matches an already existing input file.
comment:9 by , 3 months ago
Keywords: | accented added |
---|---|
Summary: | [Windows] Non-ASCII characters as "-report" filename produced garbled filename → [Windows] filenames with non-ASCII characters in report environment variable produces garbled log filename |
comment:10 by , 3 months ago
͏ I believe it should also be reproducible specifying the filename in whatsoever manner.
͏ Also nothing specific to "accented". (apparently any non-ASCII would trap)
͏ Worth notice: title length matters.
comment:11 by , 3 months ago
Summary: | [Windows] filenames with non-ASCII characters in report environment variable produces garbled log filename → [Windows] filenames with non-ASCII characters in FFREPORT variable produces garbled log filename |
---|
comment:12 by , 3 months ago
Component: | ffmpeg → tools |
---|---|
Keywords: | accented removed |
Summary: | [Windows] filenames with non-ASCII characters in FFREPORT variable produces garbled log filename → [Windows] Non-ASCII characters in "FFREPORT" may produce garbled filename or fail |
͏ It appears "-report" only supports custom options via "FFREPORT" environment variable:
͏ https://ffmpeg.org/ffmpeg.html#Generic-options
͏ (yes, I don't use it...)
͏ Specifically tried below:
(Windows CMD)
͏ SET "FFREPORT=file=中.txt" & ffmpeg -i "中.mp4"
͏ SET "FFREPORT=file=é.txt" & ffmpeg -i "中.mp4"
͏ SET "FFREPORT=file=А.txt" & ffmpeg -i "中.mp4"
͏ .
͏ The first fails with: “Failed to open report "中.txt": Invalid argument”
͏ The other two produce garbled filenames.
͏ Note "А" is Cyrillic.
͏ "é" is "e" + "́". (not "é")
͏ [ https://github.com/MasterInQuestion/Markup/blob/main/AAA.htm ]
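The difference between the two spellings of "é" shows up in the bytes. The sketch below writes both forms as explicit UTF-8 and counts their code points. The names `utf8_code_points`, `e_precomposed` and `e_decomposed` are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a UTF-8 string by skipping continuation bytes
 * (those of the form 10xxxxxx). Assumes well-formed UTF-8 input. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    assert(s);
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* The two spellings, written as explicit bytes. */
static const char e_precomposed[] = "\xC3\xA9";   /* U+00E9 */
static const char e_decomposed[]  = "e\xCC\x81";  /* U+0065 followed by U+0301 */
```

The precomposed form is one code point in two bytes, the decomposed form is two code points in three bytes. The two therefore name different files unless something normalizes them along the way.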
comment:13 by , 3 months ago
Keywords: | filename added |
---|---|
Status: | new → open |
Version: | unspecified → git-master |
it is worth mentioning that the environment variable is read as a UTF-8 string:
env = getenv_utf8("FFREPORT");
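For readers unfamiliar with the pattern, reading an environment variable as UTF-8 on Windows usually means fetching the UTF-16 value and converting it. The following is a sketch of that idea, not FFmpeg's getenv_utf8() itself. `getenv_as_utf8` is a hypothetical name, and the 256-unit limit on the variable name is an arbitrary choice for the example. It uses the real calls _wgetenv, MultiByteToWideChar and WideCharToMultiByte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: return a malloc'ed UTF-8 copy of an environment
 * variable, or NULL if it is unset. The caller frees the result. */
char *getenv_as_utf8(const char *name)
{
    assert(name);
#ifdef _WIN32
    wchar_t wname[256];
    const wchar_t *wval;
    char *val;
    int len;

    if (!MultiByteToWideChar(CP_UTF8, 0, name, -1, wname, 256))
        return NULL;
    wval = _wgetenv(wname);             /* UTF-16 value, no ANSI code page involved */
    if (!wval)
        return NULL;
    /* The first call returns the required size in bytes, NUL included. */
    len = WideCharToMultiByte(CP_UTF8, 0, wval, -1, NULL, 0, NULL, NULL);
    if (len <= 0)
        return NULL;
    val = malloc((size_t)len);
    if (val)
        WideCharToMultiByte(CP_UTF8, 0, wval, -1, val, len, NULL, NULL);
    return val;
#else
    const char *val = getenv(name);
    size_t size;
    char *copy;

    if (!val)
        return NULL;
    size = strlen(val) + 1;
    copy = malloc(size);
    if (copy)
        memcpy(copy, val, size);        /* bytes are taken as they are */
    return copy;
#endif
}
```

This matches the observation in the ticket: the name arrives intact as UTF-8, so the garbling has to happen later, when the file is opened.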