Opened 10 years ago

Last modified 10 years ago

#4054 new enhancement

libavformat: subtitles: provide a mechanism to guess subtitle character encoding

Reported by: Ridley Combs Owned by:
Priority: wish Component: avformat
Version: git-master Keywords: sub
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no


Sometimes, especially when ffmpeg is being called programmatically, it is difficult or impossible for the caller (or user) to know the character encoding of a subtitle file. It'd be useful for libavformat to provide a mechanism to detect the encoding if an option is set, using some combination of universalchardet, enca, or libguess.
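As a rough illustration of what the cheapest layer of such a guess could look like, here is a minimal C sketch. It is not universalchardet, enca or libguess, and it is not any existing libavformat API. The names `guess_sub_encoding`, `is_valid_utf8` and the `guess_result` values are hypothetical. It only checks for a byte-order mark and then for well-formed UTF-8. Anything else is reported as unknown, which is where a real detector library would have to take over.

```c
#include <stddef.h>

/* Hypothetical result codes; not part of any libavformat API. */
typedef enum {
    GUESS_UNKNOWN = 0,  /* no BOM and not valid UTF-8: hand off to a real detector */
    GUESS_UTF8_BOM,
    GUESS_UTF16LE_BOM,
    GUESS_UTF16BE_BOM,
    GUESS_UTF8          /* no BOM, but well-formed UTF-8 (pure ASCII also lands here) */
} guess_result;

/* Returns 1 if buf[0..len) is well-formed UTF-8: no overlong forms, no
 * surrogates, nothing above U+10FFFF. As a text heuristic, NUL bytes are
 * also rejected, since they suggest UTF-16/32 data without a BOM. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;        /* number of continuation bytes */
        unsigned int cp; /* decoded code point */
        if (c == 0x00) return 0;
        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;                    /* stray continuation or invalid lead byte */
        if (i + n >= len) return 0;       /* sequence truncated at end of buffer */
        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;                     /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                     /* out of range or UTF-16 surrogate */
        i += n + 1;
    }
    return 1;
}

/* Hypothetical first-pass guess over the raw subtitle file bytes. */
static guess_result guess_sub_encoding(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) return GUESS_UTF8_BOM;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) return GUESS_UTF16LE_BOM;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) return GUESS_UTF16BE_BOM;
    if (is_valid_utf8(buf, len)) return GUESS_UTF8;
    return GUESS_UNKNOWN;
}
```

The UTF-8 check works because legacy 8-bit text (for example Latin-1 "café" stored as `63 61 66 E9`) almost never happens to form valid multi-byte sequences. It cannot tell legacy encodings apart from each other, though, which is exactly the part the ticket suggests delegating to an external library.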

Change History (2)

comment:1 by gjdfgh, 10 years ago

Cc: added

Some things to note:

  • no subtitle charset detector is good or reliable enough on its own; you will always end up with multiple plausible guesses, and you will want the user to choose among them, etc.
  • I think it's wrong to add detection directly to (or below) the subtitle demuxers. Instead, maybe there should be a function that guesses the subtitle charset from a list of packets (you could provide a convenience function that does this using the libavformat internal packet queue)
  • the actual charset conversion should live somewhere else too, and could perhaps operate on the packets (or you could pass the guess to libavcodec through its subtitle charset option, `sub_charenc`)
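The design suggested in the first two points (guess from a list of packets, return several ranked candidates, and leave the choice to the caller) can be sketched roughly as follows. Everything here is hypothetical: `sub_packet`, `charset_guess` and `guess_charsets` are illustrative names, not libavformat types, and the two candidates (UTF-8 and Windows-1252) stand in for whatever a real detector would propose. Each candidate is scored by the fraction of packets that are consistent with it; nothing is converted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one subtitle packet's payload. */
typedef struct {
    const unsigned char *data;
    size_t size;
} sub_packet;

/* One ranked candidate: an iconv-style charset name plus a score in [0, 1]. */
typedef struct {
    const char *charset;
    double score;
} charset_guess;

/* Structural UTF-8 check only: rejects invalid lead bytes (C0, C1, F5+),
 * stray continuation bytes and truncated sequences, but does not catch every
 * overlong or surrogate sequence. */
static int utf8_structurally_valid(const unsigned char *p, size_t n)
{
    size_t i = 0;
    while (i < n) {
        size_t extra;
        if (p[i] < 0x80) extra = 0;
        else if (p[i] >= 0xC2 && p[i] <= 0xDF) extra = 1;
        else if ((p[i] & 0xF0) == 0xE0) extra = 2;
        else if (p[i] >= 0xF0 && p[i] <= 0xF4) extra = 3;
        else return 0;
        if (extra >= n - i) return 0;     /* not enough bytes left */
        for (size_t k = 1; k <= extra; k++)
            if ((p[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}

/* Windows-1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined. */
static int cp1252_defined(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] == 0x81 || p[i] == 0x8D || p[i] == 0x8F || p[i] == 0x90 || p[i] == 0x9D)
            return 0;
    return 1;
}

/* Fills out[0..1] with candidates sorted by descending score and returns the
 * count. On a tie (e.g. pure ASCII) UTF-8 stays first. */
static size_t guess_charsets(const sub_packet *pkts, size_t npkts, charset_guess out[2])
{
    size_t utf8_ok = 0, cp1252_ok = 0;
    for (size_t i = 0; i < npkts; i++) {
        if (utf8_structurally_valid(pkts[i].data, pkts[i].size)) utf8_ok++;
        if (cp1252_defined(pkts[i].data, pkts[i].size)) cp1252_ok++;
    }
    double total = npkts ? (double)npkts : 1.0;
    out[0].charset = "UTF-8";
    out[0].score = utf8_ok / total;
    out[1].charset = "WINDOWS-1252";
    out[1].score = cp1252_ok / total;
    if (out[1].score > out[0].score) {    /* swap only on a strict improvement */
        charset_guess tmp = out[0];
        out[0] = out[1];
        out[1] = tmp;
    }
    return 2;
}
```

Returning the whole ranked list, rather than a single answer, is what lets a player or frontend show the alternatives to the user. The chosen name could then be handed to whatever does the conversion, for example libavcodec's `sub_charenc` option.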

Also, this should probably be discussed on the mailing list. The bug tracker sucks for this purpose.

comment:2 by Carl Eugen Hoyos, 10 years ago

Keywords: sub added; subtitles removed