
#11241 new enhancement

CC_IDENT macro generation ignores non-UTF-8 encodings, causing warnings and a garbled log when compiling on Windows

Reported by:	violet	Owned by:
Priority:	minor	Component:	build system
Version:	git-master	Keywords:	CC_IDENT
Cc:	violet, MasterQuestionable	Blocked By:
Blocking:		Reproduced by developer:	yes
Analyzed by developer:	yes

Description (last modified by violet)

How to reproduce:

ENV: Win10 with Simplified-Chinese encoding "gb2312", msys2, ffmpeg 7.1+ master branch, VS2022

% ./configure
 \src\ffmpeg\ffmpeg-7.1\config.h(1): warning C4828: The file contains a character starting at offset 0x66e that is illegal in the current source character set (codepage 65001)

Because the "/utf-8" compiler flag is forced, the CC_IDENT generated in config.h is read as UTF-8 and looks like this:

#define CC_IDENT "���� x64 �� Microsoft (R) C/C++ �Ż������� 19.41.34123 ��"

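The string is actually Chinese text encoded in "gb2312" (the localized form of the banner "Microsoft (R) C/C++ Optimizing Compiler Version 19.41.34123 for x64"):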

#define CC_IDENT "用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.41.34123 版"

(slow fix)
I think it might be better to support non-UTF-8 encodings in the code that generates the CC_IDENT macro.
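
As a rough illustration of the slow fix, the compiler banner could be transcoded to UTF-8 before it is written into config.h. The sketch below assumes an `iconv` binary is on PATH and that the caller already knows the source charset. `banner_to_utf8` is a hypothetical helper and is not part of FFmpeg's configure.

```shell
#!/bin/sh
# Hypothetical helper, not part of FFmpeg's configure.
# Usage: banner_to_utf8 SRC_CHARSET TEXT
# Prints TEXT transcoded from SRC_CHARSET to UTF-8.
# Returns non-zero if iconv is missing or the conversion fails.
banner_to_utf8() {
    src_charset="$1"
    raw_banner="$2"
    converted=$(printf '%s' "$raw_banner" |
        iconv -f "$src_charset" -t UTF-8 2>/dev/null) || return 1
    printf '%s\n' "$converted"
}

# Example input: the GB2312 bytes D3 C5 BB AF (the word 优化 from the
# banner), written as octal escapes so this file itself stays ASCII.
raw=$(printf '\323\305\273\257')
banner_to_utf8 GB2312 "$raw" || echo "conversion failed" >&2
```

The helper reports failure instead of guessing, so a caller could fall back to the unconverted banner or to a user-supplied string when the charset is unknown.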

(fast fix)
Alternatively, let ./configure take CC_IDENT from the environment or from a parameter: add an option ./configure --cc-ident, analogous to ./configure --extra-cflags. Users could then supply CC_IDENT manually in the correct encoding.
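
A minimal sketch of how the fast fix could behave, assuming a configure-style option loop. The `--cc-ident` option and both helper functions are hypothetical and only illustrate the proposal. They do not exist in FFmpeg's configure.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed --cc-ident option.

# parse_cc_ident [configure args...]
# Prints the value given via --cc-ident=VALUE, or nothing if absent.
parse_cc_ident() {
    cc_ident_override=""
    for opt in "$@"; do
        optval="${opt#*=}"
        case "$opt" in
            --cc-ident=*) cc_ident_override="$optval" ;;
        esac
    done
    printf '%s' "$cc_ident_override"
}

# pick_cc_ident DETECTED [configure args...]
# Prefers the user-supplied override; otherwise keeps the
# auto-detected compiler banner.
pick_cc_ident() {
    detected="$1"
    shift
    override=$(parse_cc_ident "$@")
    if [ -n "$override" ]; then
        printf '%s\n' "$override"
    else
        printf '%s\n' "$detected"
    fi
}

# Example: the override wins over the detected banner.
pick_cc_ident "auto-detected banner" --cc-ident="MSVC 19.41.34123 x64"
```

With this shape, the detected banner is still used by default, so builds on systems that already produce a valid string would be unaffected.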

Change History (6)

comment:1 by violet, 4 months ago

Description:	modified (diff)

comment:2 by violet, 4 months ago

Priority:	normal → critical

comment:3 by violet, 4 months ago

Analyzed by developer:	set
Reproduced by developer:	set

comment:4 by MasterQuestionable, 4 months ago

Cc:	MasterQuestionable added
Priority:	critical → minor
Type:	defect → enhancement

UTF-8 has become the de facto standard for everything plain text.
If you use some legacy, arbitrarily defined charset, normalize your input first.

comment:5 by violet, 4 months ago

I'm afraid UTF-8 is the de facto standard only on Linux and the Internet, not on the Windows desktop. Plenty of non-UTF-8 Windows desktop software is in use all over the world.

If you reproduce this problem on a non-English Windows, you will find that CC_IDENT is generated automatically by truncating the C compiler's description. Unfortunately, the C compiler in VS, i.e. cl.exe, describes itself in the current system encoding, not UTF-8. Changing the system encoding just to satisfy ffmpeg is unacceptable, because other desktop software would crash.
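
For illustration, a configure-style script could at least detect when the banner is not plain ASCII before writing it into config.h. The check below is deliberately conservative: it also flags valid UTF-8 text, since it only looks for bytes outside printable ASCII. `has_non_ascii` is a hypothetical name, not something found in FFmpeg's configure.

```shell
#!/bin/sh
# Hypothetical check, not part of FFmpeg's configure.
# Usage: has_non_ascii TEXT
# Succeeds (returns 0) if TEXT contains any byte other than
# printable ASCII, tab, or newline.
has_non_ascii() {
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '\011\012\040-\176')
    [ -n "$leftover" ]
}

# Example: a banner that starts with two GB2312-encoded bytes.
banner=$(printf '\323\305 x64 Microsoft (R) C/C++ 19.41.34123')
if has_non_ascii "$banner"; then
    echo "warning: compiler banner is not plain ASCII" >&2
fi
```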

Before reporting this issue, I tried numerous times to make ffmpeg use a UTF-8 MY_CC_IDENT string that I had converted myself. But ./configure just overrides my correct CC_IDENT again and again. That's why I suggested adding a --cc-ident parameter to let the user set CC_IDENT when necessary.

In short, the ffmpeg configure script needs to provide a way for non-UTF-8 users to override CC_IDENT. Adding a parameter like --cc-ident is the simplest way.

comment:6 by MasterQuestionable, 4 months ago

So this is essentially a compiler issue.
Compilation is in many cases unjustifiably complicated.
And this goes well beyond the scope of FFmpeg: it concerns programming languages and hardware in general.
