Opened 3 years ago

Last modified 3 years ago

#4054 new enhancement

libavformat: subtitles: provide a mechanism to guess subtitle character encoding

Reported by: 11rcombs Owned by:
Priority: wish Component: avformat
Version: git-master Keywords: sub
Cc: nfxjfg@googlemail.com Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Sometimes, especially when ffmpeg is being called programmatically, it is difficult or impossible for the caller (or user) to know the character encoding of a subtitle file. It'd be useful for libavformat to provide a mechanism to detect the encoding if an option is set, using some combination of universalchardet, enca, or libguess.

Change History (2)

comment:1 Changed 3 years ago by gjdfgh

  • Cc nfxjfg@googlemail.com added

Some things to note:

  • No subtitle charset detector is good enough on its own. You will always end up with multiple guesses and want the user to select one of them.
  • I think it's wrong to add detection directly to (or below) the subtitle demuxers. Instead, maybe there should be a function that guesses the subtitle charset from a list of packets (you could provide a convenience function that does this using the libavformat internal packet queue).
  • The actual subtitle conversion should live somewhere else too, and maybe work on the packets (or you could pass the chosen charset to libavcodec via its sub_charenc option).
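The packet-based approach suggested above can be sketched in plain C. This is not an existing FFmpeg API: `guess_sub_charsets()` and its helpers are hypothetical names, and the fixed fallback list (`CP1252`, `ISO-8859-15`) is an assumption standing in for what universalchardet, enca or libguess would actually rank. The sketch looks at all packet payloads together and returns several ranked candidates rather than one answer, so a caller (or user) can still choose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if buf[0..len) is well-formed UTF-8 (no overlong forms,
 * no surrogates, nothing above U+10FFFF), 0 otherwise. */
static int is_valid_utf8(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t c = buf[i];
        size_t n;      /* number of continuation bytes */
        uint32_t cp;   /* decoded code point */
        if (c < 0x80) {
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07;
        } else {
            return 0;  /* stray continuation byte or invalid lead byte */
        }
        if (i + n >= len)
            return 0;  /* sequence truncated at end of packet */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;
        i += n + 1;
    }
    return 1;
}

/* Appends cs to out[] unless it is already present or out[] is full. */
static void add_candidate(const char **out, int *n, int max_out, const char *cs)
{
    for (int i = 0; i < *n; i++)
        if (strcmp(out[i], cs) == 0)
            return;
    if (*n < max_out)
        out[(*n)++] = cs;
}

/* Hypothetical helper: inspects the payloads of all subtitle packets and
 * writes a ranked list of candidate charsets to out[], most likely first.
 * Returns the number of candidates written. */
static int guess_sub_charsets(const uint8_t *const *pkts, const size_t *sizes,
                              int nb_pkts, const char **out, int max_out)
{
    /* Assumed fallback list for non-UTF-8 text; a real detector would
     * rank legacy encodings statistically instead. */
    static const char *const legacy[] = { "CP1252", "ISO-8859-15" };
    int all_ascii = 1, all_utf8 = 1, n = 0;

    assert(nb_pkts >= 0 && max_out >= 0);
    for (int i = 0; i < nb_pkts; i++) {
        if (!is_valid_utf8(pkts[i], sizes[i]))
            all_utf8 = 0;
        for (size_t j = 0; j < sizes[i]; j++)
            if (pkts[i][j] >= 0x80)
                all_ascii = 0;
    }
    if (all_utf8)
        add_candidate(out, &n, max_out, "UTF-8");
    if (!all_ascii) /* pure ASCII decodes identically in every candidate */
        for (size_t i = 0; i < sizeof(legacy) / sizeof(legacy[0]); i++)
            add_candidate(out, &n, max_out, legacy[i]);
    return n;
}
```

Checking every packet before deciding matters because a single non-ASCII line late in the file is often the only evidence; the first candidate could then be handed to libavcodec's sub_charenc option while the rest are offered to the user.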

Also, this should probably be discussed on the mailing list. The bug tracker sucks for this purpose.

comment:2 Changed 3 years ago by cehoyos

  • Keywords sub added; subtitles removed