Opened 4 months ago

Last modified 4 months ago

#11241 new enhancement

CC_IDENT macro generation doesn't consider non-UTF-8 encodings, causing warnings and a scrambled log when compiling on Windows

Reported by: violet
Owned by:
Priority: minor
Component: build system
Version: git-master
Keywords: CC_IDENT
Cc: violet, MasterQuestionable
Blocked By:
Blocking:
Reproduced by developer: yes
Analyzed by developer: yes

Description (last modified by violet)

Summary of the bug:
How to reproduce:

ENV: Win10 with Simplified-Chinese encoding "gb2312", msys2, ffmpeg 7.1+ master branch, VS2022

% ./configure
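 \src\ffmpeg\ffmpeg-7.1\config.h(1): warning C4828: 文件包含在偏移 0x66e 处开始的字符,该字符在当前源字符集中无效(代码页 65001)
 (In English: warning C4828: The file contains a character starting at offset 0x66e that is illegal in the current source character set (codepage 65001))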

Because the build forces the "/utf-8" cflag, the compiler reads config.h as UTF-8, and the generated CC_IDENT looks like:

#define CC_IDENT "���� x64 �� Microsoft (R) C/C++ �Ż������� 19.41.34123 ��"

It is actually a Chinese string encoded in "gb2312" (in English: "Microsoft (R) C/C++ Optimizing Compiler Version 19.41.34123 for x64"):

#define CC_IDENT "用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.41.34123 版"
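
The garbling can be reproduced outside the build. The following is a minimal Python sketch, assuming the banner reaches config.h as GB2312 bytes and is then read as UTF-8 (code page 65001). The variable names are illustrative and not part of configure.

```python
# Banner text as quoted in this ticket (cl.exe on a Simplified-Chinese system).
banner = "用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.41.34123 版"

# Assumption: the bytes written to config.h use the system code page (GB2312).
raw = banner.encode("gb2312")

# Reading those bytes as UTF-8 replaces each undecodable byte with U+FFFD,
# while the ASCII part of the banner survives unchanged.
garbled = raw.decode("utf-8", errors="replace")

# Decoding with the code page that produced the bytes recovers the text.
recovered = raw.decode("gb2312")

print(garbled.count("\ufffd"), "replacement characters in the garbled banner")
print("ASCII part intact:", "Microsoft (R) C/C++" in garbled)
print("round trip OK:", recovered == banner)
```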

(slow fix)
I think it might be better to support non-UTF-8 encodings in the CC_IDENT macro generation code.
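
One way the slow fix could work is to normalize the banner to UTF-8 before it is written to config.h. The sketch below is a hedged illustration in Python, not configure's actual code: `banner_to_utf8` and its `legacy_codec` parameter are hypothetical names, and the legacy code page is assumed to be known (here GB2312).

```python
def banner_to_utf8(raw: bytes, legacy_codec: str = "gb2312") -> bytes:
    """Return the compiler banner re-encoded as UTF-8.

    legacy_codec is an assumption: the code page the compiler used.
    """
    try:
        raw.decode("utf-8")  # already valid UTF-8 (includes plain ASCII)
        return raw
    except UnicodeDecodeError:
        # Not UTF-8: interpret as the legacy code page, then re-encode.
        return raw.decode(legacy_codec, errors="replace").encode("utf-8")


original = "用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.41.34123 版"
fixed = banner_to_utf8(original.encode("gb2312"))
print("normalized correctly:", fixed.decode("utf-8") == original)
print("ASCII banner untouched:", banner_to_utf8(b"gcc 13.2.0") == b"gcc 13.2.0")
```

The "try UTF-8 first" check is only a heuristic: a short legacy string can happen to be valid UTF-8, so a real fix would more likely take the code page from the system than guess it.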

(fast fix)
Or, let ./configure take CC_IDENT from the environment or from a parameter: alongside ./configure --extra-cflags, add another option, ./configure --cc-ident. This way, users can supply CC_IDENT manually in the correct encoding.
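
The precedence proposed for the fast fix can be sketched as follows. This is a Python model of the logic only. The `--cc-ident` option and the `CC_IDENT` environment variable are the proposal from this ticket, not existing configure features, and `resolve_cc_ident` and `config_h_line` are hypothetical helper names.

```python
import argparse
import os


def resolve_cc_ident(argv, environ, detected):
    """Pick CC_IDENT: --cc-ident first, then $CC_IDENT, then the detected banner."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--cc-ident", default=None)  # hypothetical option
    args, _unknown = parser.parse_known_args(argv)   # other options are ignored
    if args.cc_ident is not None:
        return args.cc_ident
    if environ.get("CC_IDENT"):
        return environ["CC_IDENT"]
    return detected


def config_h_line(cc_ident):
    """Format the define, escaping backslashes and double quotes for C."""
    escaped = cc_ident.replace("\\", "\\\\").replace('"', '\\"')
    return f'#define CC_IDENT "{escaped}"'


argv = ["--cc-ident=Microsoft C/C++ 19.41.34123 x64", "--extra-cflags=-O2"]
print(config_h_line(resolve_cc_ident(argv, os.environ, "detected banner")))
```

A user-supplied value that is plain ASCII or valid UTF-8 avoids the C4828 warning without configure needing to know the system code page.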

Change History (6)

comment:1 by violet, 4 months ago

Description: modified (diff)

comment:2 by violet, 4 months ago

Priority: normal → critical

comment:3 by violet, 4 months ago

Analyzed by developer: set
Reproduced by developer: set

comment:4 by MasterQuestionable, 4 months ago

Cc: MasterQuestionable added
Priority: critical → minor
Type: defect → enhancement

UTF-8 has become the de facto standard for all plain text.
If you use some legacy, arbitrarily defined charset, normalize your input first.

comment:5 by violet, 4 months ago

I'm afraid UTF-8 is the de facto standard only on Linux and the Internet, not on the Windows desktop. There is plenty of non-UTF-8 Windows desktop software all over the world.

If you reproduce this problem on a non-English Windows, you will find that CC_IDENT is generated automatically by truncating the C compiler's description of itself. Unfortunately, the C compiler in VS, i.e. cl.exe, describes itself in the current system encoding, not UTF-8. Changing the system encoding just to satisfy ffmpeg is unacceptable: other desktop software would crash.

Before reporting this issue, I had already tried numerous times to make ffmpeg use a UTF-8 MY_CC_IDENT string that I converted myself, but ./configure overrides my correct CC_IDENT again and again. That's why I suggested adding a --cc-ident parameter to let the user decide CC_IDENT if necessary.

In short, the ffmpeg configure script needs to provide a way for non-UTF-8 users to override CC_IDENT. Adding a parameter like --cc-ident is the simplest way.

comment:6 by MasterQuestionable, 4 months ago

So this is essentially a compiler issue.
Compilation is, in many cases, unjustifiably complicated.
And this goes well beyond the scope of FFmpeg, into programming languages and hardware in general.
