Opened 5 years ago

Last modified 5 years ago

#7970 new enhancement

Guess character encoding of ID3v1 tags

Reported by: Jyrki Vesterinen
Owned by:
Priority: normal
Component: avformat
Version: git-master
Keywords: id3v1
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

This ticket is essentially a variant of #7203. That ticket was closed because its file uses ID3v2 tags, and the tag in question was encoded as Windows-1251 but labeled as ISO-8859-1.

However, as far as I can tell, ID3v1 has no way to specify the character encoding. FFmpeg simply assumes UTF-8, and that assumption is not always correct. The attached file has the "artist" field encoded as Shift-JIS. Its value is すずき けいこ.
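To illustrate the problem, here is a minimal Python sketch (not FFmpeg code). It assumes the usual 128-byte ID3v1 layout, with fixed-width fields and no encoding marker. The helper names `build_tag` and `read_artist` are hypothetical and exist only for this example.

```python
import struct

# Assumed ID3v1 layout (128 bytes, no field for the character encoding):
# "TAG", title[30], artist[30], album[30], year[4], comment[30], genre[1]
ID3V1 = struct.Struct("3s30s30s30s4s30sB")

ARTIST = "すずき けいこ"


def build_tag(artist_bytes: bytes) -> bytes:
    """Build a hypothetical ID3v1 tag with only the artist field set."""
    # struct pads each "Ns" field with NUL bytes up to its fixed width.
    return ID3V1.pack(b"TAG", b"", artist_bytes, b"", b"", b"", 255)


def read_artist(tag: bytes) -> bytes:
    """Return the raw artist bytes, cut at the first NUL padding byte."""
    magic, _title, artist, _album, _year, _comment, _genre = ID3V1.unpack(tag)
    if magic != b"TAG":
        raise ValueError("not an ID3v1 tag")
    return artist.split(b"\x00", 1)[0]


raw = read_artist(build_tag(ARTIST.encode("shift_jis")))

# The tag itself carries only bytes; the reader has to pick an encoding.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print("valid UTF-8:", utf8_ok)
print("as Shift-JIS:", raw.decode("shift_jis"))
```

The same bytes are invalid as UTF-8 but decode cleanly as Shift-JIS, and nothing in the tag tells the reader which one was intended.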

It would be great if FFmpeg attempted to heuristically detect the character encoding of ID3v1 tags.
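As a rough idea of what such a heuristic could look like, here is a Python sketch. It is not a proposal for the actual FFmpeg implementation: the function name `guess_decode`, the candidate list, and its order are all assumptions made for illustration. It accepts the bytes as UTF-8 if they validate strictly, otherwise tries a short list of legacy encodings, and finally falls back to Latin-1, which accepts any byte sequence.

```python
# Hypothetical candidate list; a real detector would need a better-informed
# choice (or statistical detection), since many byte strings are valid in
# several legacy encodings at once.
CANDIDATES = ("shift_jis", "cp1251")


def guess_decode(raw: bytes, candidates=CANDIDATES):
    """Return (encoding_name, text) for the first encoding that decodes raw."""
    # Strict UTF-8 validation rarely succeeds by accident on legacy text,
    # so it is tried first.
    try:
        return "utf-8", raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Try each legacy candidate in order; the first clean decode wins.
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue

    # Latin-1 maps every byte to a code point, so this cannot fail.
    return "latin-1", raw.decode("latin-1")


print(guess_decode("すずき けいこ".encode("shift_jis")))
print(guess_decode(b"plain ASCII artist"))
```

A first-match approach like this is easy to fool, because the order of the candidates decides the result whenever more than one of them decodes without error. That is the main reason a dedicated detection library looks more attractive than hand-written rules.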

Attachments (1)

shift_jis_artist_name.mp3 (14.7 KB) - added by Jyrki Vesterinen 5 years ago.


Change History (2)

by Jyrki Vesterinen, 5 years ago

Attachment: shift_jis_artist_name.mp3 added

comment:1 by Jyrki Vesterinen, 5 years ago

I looked into how much effort it would take to implement such heuristics. If it were easy enough (e.g. by reusing an existing function somewhere), I could implement it myself and send a pull request.

It looks like FFmpeg doesn't currently have automatic character set detection anywhere. See #4054 for related discussion.

The mpv media player uses the uchardet library for character set detection: https://github.com/mpv-player/mpv/blob/c9e7473d67893d9248bedf63530a1e0325a3036a/misc/charset_conv.c#L136

It seems that implementing this would require pulling in a new library, either uchardet or something else. Such a change would be much larger than what I'm willing to do.
