Changes between Initial Version and Version 1 of Ticket #7203, comment 3


Timestamp:
May 14, 2018, 4:01:00 AM (23 months ago)
Author:
mkver
This file uses ID3v2.3 tags. The TIT2 frame (the frame containing the title) is as follows in hex: 0x54 49 54 32 00 00 00 0A 00 00 00 C4 EE F0 EE E6 EA E0 20 31. According to [http://id3.org/id3v2.3.0#ID3v2_frame_overview the standard], the 0x00 that follows the four-byte size field and the two flag bytes indicates that the frame uses ISO-8859-1 as its encoding, an encoding that does not contain Cyrillic characters. For such purposes Unicode could (and should) be used, but isn't. This is a bug in the tool that created said file, not in FFmpeg.
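To make the byte layout concrete, here is a minimal Python sketch that splits the quoted frame into its fields. The helper name `parse_v23_text_frame` is made up for this illustration and is not part of FFmpeg or any tagging library; it assumes a single, uncompressed ID3v2.3 text frame.

```python
import struct

# The frame bytes quoted in the comment: ID, size, flags, encoding byte, text.
frame = bytes.fromhex("54495432" "0000000A" "0000" "00" "C4EEF0EEE6EAE02031")

def parse_v23_text_frame(data):
    """Split one ID3v2.3 text frame into (id, size, flags, encoding, text bytes).

    Hypothetical helper for illustration only; no unsynchronisation,
    compression or encryption handling.
    """
    # 10-byte header: 4-byte frame ID, 4-byte big-endian size, 2 flag bytes.
    frame_id, size, flags = struct.unpack(">4sIH", data[:10])
    body = data[10:10 + size]
    # In a text frame the first body byte is the text encoding
    # (0x00 = ISO-8859-1, 0x01 = Unicode); the rest is the text itself.
    return frame_id.decode("ascii"), size, flags, body[0], body[1:]

frame_id, size, flags, encoding, text = parse_v23_text_frame(frame)
# frame_id == "TIT2", size == 10, flags == 0, encoding == 0
# text is the nine title bytes C4 EE F0 EE E6 EA E0 20 31
```

The size of 10 covers the encoding byte plus the nine title bytes, which is why the byte that matters here is the third 0x00 after the size field.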
Btw: The last nine bytes are the actual title; in Windows-1251 they would be read as "Дорожка 1"; in the Cyrillic DOS code page 866 that you are referring to, they read as "─юЁюцър 1". In ISO-8859-1 they read as "Äîðîæêà 1". FFmpeg's output to the console is encoded as UTF-8, but cmd.exe (that you seem to use) expects applications to use the native legacy code page of the system (for Russian Windows versions, this is usually code page 866; cmd.exe is, by the way, Unicode compatible). The UTF-8 that FFmpeg writes to the console is 0xC3 84 C3 AE C3 B0 C3 AE C3 A6 C3 AA C3 A0 20 31. In CP 866, 0xC3 is "├" whereas 0x84 is "Д". That six of the seven characters of the word (seem to) have been preserved does not really have a deeper meaning. It is accidental.
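The chain of reinterpretations described above can be reproduced with Python's built-in codecs. This is only a sketch of the effect, not of what FFmpeg or cmd.exe do internally; the variable names are chosen for this illustration.

```python
# The nine title bytes from the TIT2 frame.
title = bytes.fromhex("C4EEF0EEE6EAE02031")

as_cp1251 = title.decode("cp1251")   # "Дорожка 1", presumably what the tagger meant
as_latin1 = title.decode("latin-1")  # "Äîðîæêà 1", what the encoding byte declares
as_cp866 = title.decode("cp866")     # "─юЁюцър 1", the raw bytes read as CP 866

# The declared ISO-8859-1 text is written to the console as UTF-8 ...
utf8_out = as_latin1.encode("utf-8")
# ... and a console that assumes CP 866 decodes those bytes again.
console = utf8_out.decode("cp866")   # "├Д├о├░├о├ж├к├а 1"

# Every second character of the word is the decoded UTF-8 continuation byte.
second_bytes = console[1:14:2]       # "До░ожка"
matches = sum(a == b for a, b in zip(second_bytes, as_cp1251))
# matches == 6: only "р" is lost
```

Each UTF-8 continuation byte here equals the original byte minus 0x40, and that offset happens to map the Windows-1251 letters А to п onto the same letters in CP 866. The letter "р" (0xF0) lands on 0xB0, which is "░" in CP 866, so the resemblance breaks there.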