Opened 12 years ago
Closed 12 years ago
#2431 closed enhancement (fixed)
Detect if subtitle streams do not contain valid utf-8
Reported by: | Nick | Owned by: | |
---|---|---|---|
Priority: | wish | Component: | avcodec |
Version: | git-master | Keywords: | sub srt |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
ffmpeg subtitle encoding of special characters does not work correctly (with special characters of German, French, Spanish, etc.)
Problem Description:
Importing and encoding subtitles that contain special characters with ffmpeg, from text-based *.srt and *.ass subtitle files into an *.mp4 or *.mkv container, does not work correctly!
Special characters from languages like German (e.g. ä/Ä, ö/Ö, ü/Ü, ß), French, Spanish and so on, imported and encoded with ffmpeg as a "mov_text" subtitle stream into an mp4 container or as an "srt" subtitle stream into an mkv container, are not displayed correctly by various media players and stand-alone Blu-Ray players!
(All tested special characters are included in ISO-8859-1)
I tested the following media players (Windows versions):
VLC 1.1.11
VLC 2.0.5
MPC-HC 1.6.6 (Media Player Classic - Home Cinema)
XBMC 12.0
SMPlayer 0.8.4
and ...
Philips stand-alone Blu-Ray player BDP5180
Philips Blu-Ray Home Cinema System
(Annotation to SMPlayer:
By default, SMPlayer uses a new ass library to convert srt subtitles internally to an ass subtitle stream and renders them like a normal ass/ssa subtitle. To force SMPlayer into the "normal subtitle mode", go to the SMPlayer menu "Options" >> "Preferences" >> "Subtitles" >> "Font and colors" and activate "Enable normal subtitles")
Examples of subtitles incorrectly imported/encoded with ffmpeg:
Import and encoding of a *.srt subtitle text file as "mov_text" subtitle stream within an mp4 file:
ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 mov_text -metadata:s:s:0 language=ger ffmpeg_mov-text.mp4
or
Import and encoding of a *.srt subtitle file as "SRT" subtitle stream within an mkv file:
ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 srt -metadata:s:s:0 language=ger ffmpeg_srt-subtitle.mkv
(content of "subtitle_test.srt" see below)
"ASS" subtitle streams in mkv files created by ffmpeg are compatible with most of these players, but not with all of them:
ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 ass -metadata:s:s:0 language=ger ffmpeg_ass-subtitle.mkv
For comparison:
MP4Box creates fully compliant subtitle streams in mp4 containers that work with all these media players!
I used MP4Box 0.4.6 (rev2698) to create the compliant subtitle streams (other versions of MP4Box also work).
MP4Box command line:
mp4box -add input.mp4#video -add input.mp4#audio -add subtitle_test.srt:lang=de -new mp4box_mov-text.mp4
If you use ffmpeg to convert the subtitle stream of an mp4 file created by MP4Box into another mp4 file or an mkv file, that subtitle stream also works correctly. Example code:
Step 1:
mp4box -add input.mp4#video -add input.mp4#audio -add subtitle_test.srt:lang=de -new mp4box_mov-text.mp4
Step 2:
ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 srt mp4box_mov-text_converted-by-ffmpeg-to-srt.mkv
ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 ass mp4box_mov-text_converted-by-ffmpeg-to-ass.mkv
My (Philips) stand-alone Blu-Ray player can read "SRT" subtitle streams from mkv files only; therefore I have to create a fully compliant mkv file at the moment in two steps:
mp4box -add input.mp4#video -add input.mp4#audio -add subtitle_test.srt:lang=de -new mp4box_mov-text.mp4
ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 srt mp4box_mov-text_converted-by-ffmpeg-to-srt.mkv
My little "standard test" to create some test files including a subtitle stream…
For this test I use a Windows 32-bit environment with the following files:
input.mp4 (a small mp4 file as input, duration ~30-60 seconds, including one AVC/H.264 video stream and one AAC audio stream)
subtitle_test.srt (a subtitle test file with different special characters, content see below)
subtitle_test.bat (a batch file to create all test files automatically, content see below)
ffmpeg.exe (the current version from http://ffmpeg.zeranoe.com/builds/win32/static/)
MP4Box.exe (e.g. http://www.videohelp.com/download/MP4Box-0.4.6-rev2698.zip )
Easy to use: Put all these files together in one folder and run the "subtitle_test.bat".
Then play the created files in a media player and activate the subtitle.
"subtitle_test.srt":
1
00:00:05,000 --> 00:00:14,000
German special characters: Ä/ä, Ö/ö, Ü/ü, ß

2
00:00:15,000 --> 00:00:19,000
French special characters: Æ/æ, À/à, Â/â, È/è, É/é, Ê/ê, Ë/ë, Î/î, Ï/ï, Ô/ô, Ù/ù, Û/û, Ç/ç, Ü/ü, ÿ

3
00:00:20,000 --> 00:00:24,000
Italian special characters: À/à, È/è, É/é, Ò/ò, Ù/ù

4
00:00:25,000 --> 00:00:29,000
Spanish special characters: ¡, ¿, ª, º, Á/á, É/é, Í/í, Ñ/ñ, Ó/ó, Ú/ú, Ü/ü

5
00:00:30,000 --> 00:00:55,000
These are printable characters of ISO-8859-1:
(*str >= 32 && *str < 128) || (*str >= 160 && *str <= 255)
"subtitle_test.bat":
@echo on
@rem This batch file requires following files: ffmpeg.exe, MP4Box.exe, input.mp4, subtitle_test.srt
@echo ============================================================================
ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 mov_text -metadata:s:s:0 language=ger ffmpeg_mov-text.mp4
@echo ============================================================================
ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 srt -metadata:s:s:0 language=ger ffmpeg_srt-subtitle.mkv
@echo ============================================================================
ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 ass -metadata:s:s:0 language=ger ffmpeg_ass-subtitle.mkv
@echo ============================================================================
mp4box -add input.mp4#video -add input.mp4#audio -add subtitle_test.srt:lang=de -new mp4box_mov-text.mp4
@echo ============================================================================
ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 srt mp4box_mov-text_converted-by-ffmpeg-to-srt.mkv
@echo ============================================================================
ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 ass mp4box_mov-text_converted-by-ffmpeg-to-ass.mkv
@echo ============================================================================
@pause
Please contact me if you want to get all my original test files in one zip! (all together <4 MB)
Here is the output of a complete test run, started with "subtitle_test.bat":
============================================================================
C:\Video>ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 mov_text -metadata:s:s:0 language=ger ffmpeg_mov-text.mp4
ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr 1 2013 12:44:46 with gcc 4.8.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab bcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --ena bschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enabl vs --enable-libxvid --enable-zlib
  libavutil 52. 24.100 / 52. 24.100
  libavcodec 55. 2.100 / 55. 2.100
  libavformat 55. 1.100 / 55. 1.100
  libavdevice 55. 0.100 / 55. 0.100
  libavfilter 3. 48.105 / 3. 48.105
  libswscale 2. 2.100 / 2. 2.100
  libswresample 0. 17.102 / 0. 17.102
  libpostproc 52. 2.100 / 52. 2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomiso2avc1mp41
    creation_time : 2013-03-24 21:33:20
    encoder : Lavf54.63.100
  Duration: 00:01:00.02, start: 0.023220, bitrate: 85 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], 20 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : VideoHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 58 kb/s
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : SoundHandler
Input #1, srt, from 'subtitle_test.srt':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Subtitle: subrip
Output #0, mp4, to 'ffmpeg_mov-text.mp4':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomiso2avc1mp41
    encoder : Lavf55.1.100
    Stream #0:0(und): Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 720x404 [SAR 1:1 DAR 180:101], q=2-31, 20 kb/s, 25 fps, 12800 tbn, 12800 tbc
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : VideoHandler
    Stream #0:1(eng): Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, stereo, 58 kb/s
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : SoundHandler
    Stream #0:2(ger): Subtitle: mov_text ([8][0][0][0] / 0x0008)
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #1:0 -> #0:2 (subrip -> mov_text)
Press [q] to stop, [?] for help
frame= 1500 fps=0.0 q=-1.0 Lsize= 629kB time=00:01:00.02 bitrate= 85.8kbits/s
video:151kB audio:432kB subtitle:0 global headers:0kB muxing overhead 7.752755%
============================================================================
C:\Video>ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 srt -metadata:s:s:0 language=ger ffmpeg_srt-subtitle.mkv
ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr 1 2013 12:44:46 with gcc 4.8.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab bcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --ena bschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enabl vs --enable-libxvid --enable-zlib
  libavutil 52. 24.100 / 52. 24.100
  libavcodec 55. 2.100 / 55. 2.100
  libavformat 55. 1.100 / 55. 1.100
  libavdevice 55. 0.100 / 55. 0.100
  libavfilter 3. 48.105 / 3. 48.105
  libswscale 2. 2.100 / 2. 2.100
  libswresample 0. 17.102 / 0. 17.102
  libpostproc 52. 2.100 / 52. 2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomiso2avc1mp41
    creation_time : 2013-03-24 21:33:20
    encoder : Lavf54.63.100
  Duration: 00:01:00.02, start: 0.023220, bitrate: 85 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], 20 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : VideoHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 58 kb/s
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : SoundHandler
Input #1, srt, from 'subtitle_test.srt':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Subtitle: subrip
Output #0, matroska, to 'ffmpeg_srt-subtitle.mkv':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomiso2avc1mp41
    encoder : Lavf55.1.100
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], q=2-31, 20 kb/s, 25 fps, 1k tbn, 12800 tbc
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : VideoHandler
    Stream #0:1(eng): Audio: aac ([255][0][0][0] / 0x00FF), 44100 Hz, stereo, 58 kb/s
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : SoundHandler
    Stream #0:2(ger): Subtitle: srt
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #1:0 -> #0:2 (subrip -> srt)
Press [q] to stop, [?] for help
frame= 1500 fps=0.0 q=-1.0 Lsize= 611kB time=00:01:00.02 bitrate= 83.4kbits/s
video:151kB audio:432kB subtitle:1 global headers:0kB muxing overhead 4.759849%
============================================================================
C:\Video>ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 ass -metadata:s:s:0 language=ger ffmpeg_ass-subtitle.mkv
ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr 1 2013 12:44:46 with gcc 4.8.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab bcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --ena bschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enabl vs --enable-libxvid --enable-zlib
  libavutil 52. 24.100 / 52. 24.100
  libavcodec 55. 2.100 / 55. 2.100
  libavformat 55. 1.100 / 55. 1.100
  libavdevice 55. 0.100 / 55. 0.100
  libavfilter 3. 48.105 / 3. 48.105
  libswscale 2. 2.100 / 2. 2.100
  libswresample 0. 17.102 / 0. 17.102
  libpostproc 52. 2.100 / 52. 2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomiso2avc1mp41
    creation_time : 2013-03-24 21:33:20
    encoder : Lavf54.63.100
  Duration: 00:01:00.02, start: 0.023220, bitrate: 85 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], 20 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : VideoHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 58 kb/s
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : SoundHandler
Input #1, srt, from 'subtitle_test.srt':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Subtitle: subrip
Output #0, matroska, to 'ffmpeg_ass-subtitle.mkv':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomiso2avc1mp41
    encoder : Lavf55.1.100
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], q=2-31, 20 kb/s, 25 fps, 1k tbn, 12800 tbc
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : VideoHandler
    Stream #0:1(eng): Audio: aac ([255][0][0][0] / 0x00FF), 44100 Hz, stereo, 58 kb/s
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : SoundHandler
    Stream #0:2(ger): Subtitle: ssa
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #1:0 -> #0:2 (subrip -> ass)
Press [q] to stop, [?] for help
frame= 1500 fps=0.0 q=-1.0 Lsize= 612kB time=00:01:00.02 bitrate= 83.5kbits/s
video:151kB audio:432kB subtitle:1 global headers:0kB muxing overhead 4.759022%
============================================================================
C:\Video>mp4box -add input.mp4#video -add input.mp4#audio -add subtitle_test.srt:lang=de -new mp4box_mov-text.mp4
IsoMedia import - track ID 1 - Video (size 720 x 404)
IsoMedia import - track ID 2 - Audio (SR 44100 - 2 channels)
Timed Text (SRT) import - text track 720 x 404, font Serif (size 18)
Saving mp4box_mov-text.mp4: 0.500 secs Interleaving
============================================================================
C:\Video>ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 srt mp4box_mov-text_converted-by-ffmpeg-to-srt.mkv
ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr 1 2013 12:44:46 with gcc 4.8.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab bcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --ena bschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enabl vs --enable-libxvid --enable-zlib
  libavutil 52. 24.100 / 52. 24.100
  libavcodec 55. 2.100 / 55. 2.100
  libavformat 55. 1.100 / 55. 1.100
  libavdevice 55. 0.100 / 55. 0.100
  libavfilter 3. 48.105 / 3. 48.105
  libswscale 2. 2.100 / 2. 2.100
  libswresample 0. 17.102 / 0. 17.102
  libpostproc 52. 2.100 / 52. 2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'mp4box_mov-text.mp4':
  Metadata:
    major_brand : isom
    minor_version : 1
    compatible_brands: isom
    creation_time : 2013-04-03 20:31:53
  Duration: 00:01:20.00, start: 0.000000, bitrate: 62 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], 20 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : VideoHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 58 kb/s
    Metadata:
      creation_time : 2013-04-03 20:31:53
      handler_name : GPAC ISO Audio Handler
    Stream #0:2(deu): Subtitle: mov_text (tx3g / 0x67337874)
    Metadata:
      creation_time : 2013-04-03 20:31:53
      handler_name : Imported with GPAC 0.4.6-DEV-rev
Output #0, matroska, to 'mp4box_mov-text_converted-by-ffmpeg-to-srt.mkv':
  Metadata:
    major_brand : isom
    minor_version : 1
    compatible_brands: isom
    encoder : Lavf55.1.100
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], q=2-31, 20 kb/s, 25 fps, 1k tbn, 12800 tbc
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : VideoHandler
    Stream #0:1(eng): Audio: aac ([255][0][0][0] / 0x00FF), 44100 Hz, stereo, 58 kb/s
    Metadata:
      creation_time : 2013-04-03 20:31:53
      handler_name : GPAC ISO Audio Handler
    Stream #0:2(deu): Subtitle: srt
    Metadata:
      creation_time : 2013-04-03 20:31:53
      handler_name : Imported with GPAC 0.4.6-DEV-rev
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #0:2 -> #0:2 (mov_text -> srt)
Press [q] to stop, [?] for help
frame= 1500 fps=0.0 q=-1.0 Lsize= 612kB time=00:01:00.02 bitrate= 83.5kbits/s
video:151kB audio:432kB subtitle:1 global headers:0kB muxing overhead 4.777075%
============================================================================
C:\Video>ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 ass mp4box_mov-text_converted-by-ffmpeg-to-ass.mkv
ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr 1 2013 12:44:46 with gcc 4.8.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab bcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --ena bschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enabl vs --enable-libxvid --enable-zlib
  libavutil 52. 24.100 / 52. 24.100
  libavcodec 55. 2.100 / 55. 2.100
  libavformat 55. 1.100 / 55. 1.100
  libavdevice 55. 0.100 / 55. 0.100
  libavfilter 3. 48.105 / 3. 48.105
  libswscale 2. 2.100 / 2. 2.100
  libswresample 0. 17.102 / 0. 17.102
  libpostproc 52. 2.100 / 52. 2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'mp4box_mov-text.mp4':
  Metadata:
    major_brand : isom
    minor_version : 1
    compatible_brands: isom
    creation_time : 2013-04-03 20:31:53
  Duration: 00:01:20.00, start: 0.000000, bitrate: 62 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], 20 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : VideoHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 58 kb/s
    Metadata:
      creation_time : 2013-04-03 20:31:53
      handler_name : GPAC ISO Audio Handler
    Stream #0:2(deu): Subtitle: mov_text (tx3g / 0x67337874)
    Metadata:
      creation_time : 2013-04-03 20:31:53
      handler_name : Imported with GPAC 0.4.6-DEV-rev
Output #0, matroska, to 'mp4box_mov-text_converted-by-ffmpeg-to-ass.mkv':
  Metadata:
    major_brand : isom
    minor_version : 1
    compatible_brands: isom
    encoder : Lavf55.1.100
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], q=2-31, 20 kb/s, 25 fps, 1k tbn, 12800 tbc
    Metadata:
      creation_time : 2013-03-24 21:33:20
      handler_name : VideoHandler
    Stream #0:1(eng): Audio: aac ([255][0][0][0] / 0x00FF), 44100 Hz, stereo, 58 kb/s
    Metadata:
      creation_time : 2013-04-03 20:31:53
      handler_name : GPAC ISO Audio Handler
    Stream #0:2(deu): Subtitle: ssa
    Metadata:
      creation_time : 2013-04-03 20:31:53
      handler_name : Imported with GPAC 0.4.6-DEV-rev
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #0:2 -> #0:2 (mov_text -> ass)
Press [q] to stop, [?] for help
frame= 1500 fps=0.0 q=-1.0 Lsize= 612kB time=00:01:00.02 bitrate= 83.5kbits/s
video:151kB audio:432kB subtitle:1 global headers:0kB muxing overhead 4.776069%
============================================================================
Attachments (9)
Change History (36)
by , 12 years ago
Attachment: | ffmpeg_subtitle_test.zip added |
---|
comment:1 by , 12 years ago
Keywords: | sub srt added; ffmpeg subtitle encoding of special characters removed |
---|---|
Priority: | important → normal |
Version: | unspecified → git-master |
Is this a regression?
follow-up: 3 comment:2 by , 12 years ago
Your SRT file is not in UTF-8, and you did not specify any -sub_charenc option.
follow-up: 4 comment:3 by , 12 years ago
Replying to Cigaes:
Your SRT file is not in UTF-8, and you did not specify any -sub_charenc option.
Yes, my srt file is not (pre-)encoded in UTF-8!
This srt file is (Windows/ANSI) plain text only, like other srt files created by tools such as "Sub Rip" or subtitle files you can download from internet databases.
(All special characters used in my srt file are included in the Windows default code page CP-1252 and in ISO-8859-1.)
The missing UTF-8 encoding of the subtitle stream inside the mp4/mkv container seems to be the problem!
(If you open the created container files with a hex editor you can see that the subtitle stream was not converted to UTF-8.)
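To illustrate what such a hex view shows, here is a minimal Python sketch (not FFmpeg code; the helper name `hex_bytes` is made up for this illustration) that prints the byte sequences of a few of the test characters in CP-1252 and in UTF-8. A container holding the single-byte form where players expect UTF-8 would plausibly explain the broken display.

```python
# Compare how the same characters are stored in CP-1252 and in UTF-8.
# hex_bytes is a hypothetical helper, used only for this illustration.

def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as a spaced, upper-case hex string."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

samples = ["ä", "ß", "é", "ñ"]

for ch in samples:
    # One byte per character in CP-1252, two bytes per character in UTF-8.
    print(ch, "cp1252:", hex_bytes(ch, "cp1252"), "| utf-8:", hex_bytes(ch, "utf-8"))
```

For "ä" the CP-1252 form is the single byte E4, while the UTF-8 form is the two bytes C3 A4.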
Shouldn't FFmpeg detect by itself whether the subtitle source file is plain text only or UTF-8, and select the right encoding mode automatically?
For FFmpeg.exe (Windows build) I did not find any documentation on using the -sub_charenc option and its possible parameters. (Is an automatic detection mode also available?)
Could you please show me a working command line example using -sub_charenc with ffmpeg.exe?
follow-up: 5 comment:4 by , 12 years ago
Replying to Nick:
Shouldn't FFmpeg detect by itself whether the subtitle source file is plain text only or UTF-8, and select the right encoding mode automatically?
How?
For FFmpeg.exe (Windows build) I did not find any documentation on using the -sub_charenc option
- http://ffmpeg.org/ffmpeg-codecs.html#Codec-Options
- $ ffmpeg -h full
follow-up: 6 comment:5 by , 12 years ago
Replying to cehoyos:
How?
If the source srt/ass text file is encoded in UTF-8, this can be detected from the UTF-8 BOM sequence 0xEF,0xBB,0xBF (Byte Order Mark).
If no UTF-8 BOM is found, it should be a plain text file, which should be converted from CP-1252/ISO-8859-1 to UTF-8 by default.
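The heuristic proposed here can be sketched in a few lines of Python. This is only an illustration of the idea, not how FFmpeg or MP4Box work; the function names and the `fallback` parameter are assumptions made for the example. Note that UTF-8 input without a BOM would be misread by the fallback branch.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 BOM bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

def decode_by_bom(data: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 when a BOM is present, otherwise use `fallback`.

    Hypothetical helper: BOM-less UTF-8 ends up in the fallback branch
    and is decoded with the wrong code page.
    """
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # Python's cp1252 codec raises UnicodeDecodeError for the few bytes
    # that CP-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    return data.decode(fallback)

print(decode_by_bom(codecs.BOM_UTF8 + "ä".encode("utf-8")))  # BOM present
print(decode_by_bom("ä".encode("cp1252")))                   # plain ANSI text
print(decode_by_bom("ä".encode("utf-8")))                    # BOM-less UTF-8, misread
```

The third call shows the weak spot: valid UTF-8 without a BOM is treated as CP-1252 and comes out as two wrong characters.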
Replying to cehoyos:
- http://ffmpeg.org/ffmpeg-codecs.html#Codec-Options
- $ ffmpeg -h full
For option -sub_charenc I found only:
-sub_charenc <string> .D...S set input text subtitles character encoding
and
‘sub_charenc encoding (decoding,subtitles)’ Set the input subtitles character encoding.
This seems to be a decoder option only!(?) What can be used as <encoding string>? cp1252, ...?
Sorry, I cannot get it to work with this option. Please show me a working command line example!
comment:6 by , 12 years ago
Replying to Nick:
Replying to cehoyos:
How?
If the source srt/ass text file is encoded in UTF-8, this can be detected from the UTF-8 BOM sequence 0xEF,0xBB,0xBF (Byte Order Mark).
Isn't the presence of a Byte Order Mark strictly optional?
comment:7 by , 12 years ago
Yes, it is. And there are dozens of possible encodings besides ISO-8859-1 and windows-1252 (which are not the same thing, although the differences are small).
follow-up: 9 comment:8 by , 12 years ago
You are right, the presence of the UTF-8 BOM is optional, but there are various software tools which can detect the right encoding type (meaning ANSI text, UTF-8 with BOM or UTF-8 without BOM, but not the code page).
I tested MP4Box with *.srt files in ANSI, UTF-8 and UTF-8 w/o BOM. MP4Box seems to detect the encoding type and creates the same result in all three cases! It is possible!
Another example is the open source tool Notepad++, which can also detect the encoding type. Maybe you can find methods to detect the right encoding type in the source code of such tools.
ISO-8859-1 and CP-1252 are not exactly the same, but the special characters used in my "subtitle_test.srt" are the same in both! Hence the little comment in my srt file ;-) ...
"These are printable characters of ISO-8859-1:
(*str >= 32 && *str < 128) || (*str >= 160 && *str <= 255)"
... for this range it is exactly the same.
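This claim can be checked with a short Python sketch that uses Python's built-in `latin-1` and `cp1252` codecs (the helper name `in_test_range` is made up for the example). It decodes every byte with both codecs and compares the results inside and outside the range quoted above.

```python
def in_test_range(b: int) -> bool:
    """Same condition as in the srt comment: printable ISO-8859-1 bytes."""
    return (32 <= b < 128) or (160 <= b <= 255)

def decode_byte(b: int, encoding: str) -> str:
    # errors="replace" turns bytes a codec leaves undefined into U+FFFD
    # instead of raising an exception.
    return bytes([b]).decode(encoding, errors="replace")

# Bytes in the quoted range that decode identically in both code pages.
same = [b for b in range(256)
        if in_test_range(b) and decode_byte(b, "latin-1") == decode_byte(b, "cp1252")]

# All byte values where the two code pages disagree.
differing = [b for b in range(256)
             if decode_byte(b, "latin-1") != decode_byte(b, "cp1252")]

print(len(same), "bytes in the test range decode identically")
print("differing bytes:", [hex(b) for b in differing])
```

The only bytes where the two code pages disagree are 0x80 to 0x9F, which lie outside the quoted range; CP-1252 places characters such as the euro sign there, while ISO-8859-1 defines control codes.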
For most European languages like French, German, Italian, Spanish and more, it is enough to use CP-1252 or ISO-8859-1 as the default.
More important for the imported subtitle file is the question:
"Is it plain text or is it already UTF-8?"
My proposal to select a default code page for every subtitle stream:
- If no language is defined for the subtitle stream or the language is unknown:
--> use CP-1252 as default (or ISO-8859-1)
- If a language is defined (e.g. with -metadata:s:s:0 language=ger):
--> use a selection table to set automatically a code page
- If a dedicated code page is selected by an option like "-sub_charenc":
--> use that setting instead of the other ones
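The priority order in this proposal could be sketched as follows in Python. The table contents and all names (`LANGUAGE_CODEPAGES`, `choose_codepage`) are hypothetical and only illustrate the idea; nothing like this table is known to exist in FFmpeg, and a language does not reliably determine the encoding of a file.

```python
from typing import Optional

# Hypothetical selection table: language tag -> Windows code page.
# The entries are illustrative guesses, not an authoritative mapping.
LANGUAGE_CODEPAGES = {
    "ger": "cp1252",
    "fre": "cp1252",
    "ita": "cp1252",
    "spa": "cp1252",
    "rus": "cp1251",
    "gre": "cp1253",
}

DEFAULT_CODEPAGE = "cp1252"

def choose_codepage(explicit: Optional[str] = None,
                    language: Optional[str] = None) -> str:
    """Pick a code page: explicit option first, then language, then default."""
    if explicit:
        # An option like -sub_charenc overrides everything else.
        return explicit
    if language:
        # Unknown languages fall back to the default code page.
        return LANGUAGE_CODEPAGES.get(language.lower(), DEFAULT_CODEPAGE)
    return DEFAULT_CODEPAGE

print(choose_codepage())                                   # no hints at all
print(choose_codepage(language="rus"))                     # language table
print(choose_codepage(explicit="cp1250", language="ger"))  # explicit wins
```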
comment:9 by , 12 years ago
Replying to Nick:
- If a language is defined (e.g. with -metadata:s:s:0 language=ger):
--> use a selection table to set automatically a code page
Wouldn't this break subtitle files that are UTF-8 encoded?
comment:10 by , 12 years ago
Replying to cehoyos:
Wouldn't this break subtitle files that are UTF-8 encoded?
I meant that only for converting (ANSI) plain text into UTF-8. Subtitle streams which are already encoded in UTF-8 should not be re-encoded.
--> check if srt file is plain text or UTF-8
--> if plain text only then encode to UTF-8 with selected code page setting
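These two steps can be sketched in Python by attempting a strict UTF-8 decode and converting only when it fails. This is a minimal illustration under the assumption that "plain text" means "not valid UTF-8"; the function name `to_utf8` and the `codepage` parameter are invented for the example. Some CP-1252 byte sequences happen to be valid UTF-8 as well, so the check can be fooled.

```python
def to_utf8(data: bytes, codepage: str = "cp1252") -> bytes:
    """Return `data` unchanged if it is valid UTF-8, else convert it.

    Hypothetical helper: `codepage` is the assumed source encoding for
    input that fails strict UTF-8 decoding.
    """
    try:
        data.decode("utf-8")  # strict check, raises on invalid sequences
        return data           # already UTF-8 (pure ASCII also ends up here)
    except UnicodeDecodeError:
        return data.decode(codepage).encode("utf-8")

ansi = "Ä/ä, Ö/ö, Ü/ü, ß".encode("cp1252")
utf8 = "Ä/ä, Ö/ö, Ü/ü, ß".encode("utf-8")

print(to_utf8(ansi) == utf8)  # ANSI input is converted
print(to_utf8(utf8) == utf8)  # UTF-8 input is left alone
```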
Converting ISO-8859-1 to UTF-8 as the default case should be simple, see:
http://www.unicode.org/faq/utf_bom.html
http://stackoverflow.com/questions/4059775/convert-iso-8859-1-strings-to-utf-8-in-c-c
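As a sketch of why the ISO-8859-1 case is simple: every ISO-8859-1 byte equals its Unicode code point, so bytes below 0x80 are copied and all others become a two-byte UTF-8 sequence. The Python function below (its name is made up for this example) spells out the bit operations and compares the result with Python's built-in codecs.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8 by hand."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            # ASCII range: identical in both encodings.
            out.append(b)
        else:
            # Code points U+0080..U+00FF need two bytes: 110xxxxx 10xxxxxx.
            out.append(0xC0 | (b >> 6))    # lead byte, 0xC2 or 0xC3
            out.append(0x80 | (b & 0x3F))  # continuation byte
    return bytes(out)

every_byte = bytes(range(256))
reference = every_byte.decode("latin-1").encode("utf-8")
print(latin1_to_utf8(every_byte) == reference)
```

This only covers ISO-8859-1. CP-1252 differs in the bytes 0x80 to 0x9F, which need a lookup table instead of the bit shift.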
by , 12 years ago
Attachment: | subtitle_test.srt added |
---|
follow-up: 12 comment:11 by , 12 years ago
Component: | undetermined → avcodec |
---|---|
Priority: | normal → wish |
Summary: | ffmpeg subtitle encoding of special characters does not working correctly → Detect if subtitle streams do not contain valid utf-8 |
Type: | defect → enhancement |
I still see several problems with your approach, so while it is not certain to be accepted, I guess you could try implementing something:
- Invalid utf-8 files are rare, so not all cases would be covered
- FFmpeg can only scan the first bytes of the subtitle stream to guess the encoding; this works for the file you uploaded, but not in the general case
- What about utf-8 encoded subtitles that contain an error (i.e. a 0xC0 or 0xC1)? They would suddenly be broken and users would report a regression.
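The last two points can be illustrated with a small Python sketch (the function names and the sample cues are invented for the example, and it says nothing about how FFmpeg probes input). A file whose first cue is pure ASCII passes a check on its first bytes although a later cue is CP-1252, and a stray 0xC0 byte makes otherwise valid UTF-8 fail a strict check.

```python
import codecs

def first_invalid_offset(data: bytes) -> int:
    """Offset of the first byte that breaks strict UTF-8 decoding, or -1."""
    try:
        data.decode("utf-8")
        return -1
    except UnicodeDecodeError as exc:
        return exc.start

def prefix_looks_like_utf8(data: bytes, probe_size: int) -> bool:
    """Check only the first `probe_size` bytes, as a probe would."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte sequence cut off by the slice.
        decoder.decode(data[:probe_size], final=False)
        return True
    except UnicodeDecodeError:
        return False

cue1 = b"1\n00:00:05,000 --> 00:00:14,000\nPlain ASCII line\n\n"
cue2 = b"2\n00:00:15,000 --> 00:00:19,000\n" + "Grüße".encode("cp1252") + b"\n"
data = cue1 + cue2

print(prefix_looks_like_utf8(data, len(cue1)))  # the probe sees only ASCII
print(first_invalid_offset(data))               # the full scan finds the 0xFC byte
print(first_invalid_offset(b"ok \xc0\x80"))     # 0xC0 is never valid in UTF-8
```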
Since you know the encoding of your subtitle file, I suggest using -sub_charenc
follow-up: 13 comment:12 by , 12 years ago
Replying to cehoyos:
Since you know the encoding of your subtitle file, I suggest using -sub_charenc
Ok, I know the encoding of my subtitle file, but please show me:
How can I import and encode my "subtitle_test.srt" file into an mp4 or mkv file using "-sub_charenc"???
follow-up: 14 comment:13 by , 12 years ago
Replying to Nick:
Replying to cehoyos:
Since you know the encoding of your subtitle file, I suggest using -sub_charenc
Ok, I know the encoding of my subtitle file, but please show me:
How can I import and encode my "subtitle_test.srt" file into an mp4 or mkv file using "-sub_charenc"???
As you told me in comment:5 (I didn't know before, I had never tested it): with -sub_charenc cp1252
$ ffmpeg -f lavfi -i testsrc -sub_charenc cp1252 -i subtitle_test.srt -vcodec mpeg4 -scodec mov_text -t 60 out.mov
comment:14 by , 12 years ago
Replying to cehoyos:
As you told me in comment:5 (I didn't know before, I had never tested it): with -sub_charenc cp1252
$ ffmpeg -f lavfi -i testsrc -sub_charenc cp1252 -i subtitle_test.srt -vcodec mpeg4 -scodec mov_text -t 60 out.mov
I tested your command line and tried to adapt it for my test files:
ffmpeg -i input.mp4 -sub_charenc cp1252 -i subtitle_test.srt -vcodec copy -acodec copy -scodec mov_text output.mp4
In both cases I get the following error messages:
C:\Video>ffmpeg -i input.mp4 -sub_charenc cp1252 -i subtitle_test.srt -vcodec copy -acodec copy -scodec mov_text output.mp4
ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr 1 2013 12:44:46 with gcc 4.8.0 (GCC)
[...]
[subrip @ 024fc540] Character encoding subtitles conversion needs a libavcodec built with iconv support for this codec
Input #1, srt, from 'subtitle_test.srt':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Subtitle: subrip
[subrip @ 024fc540] Character encoding subtitles conversion needs a libavcodec built with iconv support for this codec
[...]
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #1:0 -> #0:2 (subrip -> mov_text)
Error while opening decoder for input stream #1:0
The same happens if I try to import an ASS subtitle file, like:
ffmpeg -i input.mp4 -sub_charenc cp1252 -i subtitle_test.ass -vcodec copy -acodec copy -scodec mov_text output.mp4
"Character encoding subtitles conversion needs a libavcodec built with iconv support for this codec" (?) -->
Where can I find a Windows build of ffmpeg.exe built with iconv support for srt/ass?
Would activating "iconv support" for SRT and ASS subtitle streams/files by default already be the solution to my problem? This sounds like a practical solution!
follow-up: 16 comment:15 by , 12 years ago
If you cannot compile yourself please request a ffmpeg binary with iconv support wherever you downloaded the binary you are using (assuming iconv works on Windows, I don't know).
comment:16 by , 12 years ago
Replying to cehoyos:
If you cannot compile yourself please request a ffmpeg binary with iconv support wherever you downloaded the binary you are using (assuming iconv works on Windows, I don't know).
?
I downloaded this ffmpeg binary from the official download site:
http://ffmpeg.org/download.html --> FFmpeg Windows Builds --> http://ffmpeg.zeranoe.com/builds/
Can this only be requested individually, or could it become the default for all future "official" Windows binaries?
comment:18 by , 12 years ago
I asked on zeranoe.com to build a version of ffmpeg with --enable-iconv to support option -sub_charenc
A new Windows build of ffmpeg built with --enable-iconv is now available:
http://ffmpeg.zeranoe.com/builds/win32/static/ffmpeg-20130406-git-7775992-win32-static.7z
http://ffmpeg.zeranoe.com/builds/readme/win32/static/ffmpeg-20130406-git-7775992-win32-static-readme.txt
Thanks to zeranoe.com!
I tested this version and now -sub_charenc is working (more or less)!
The (first) characters are converted correctly from Windows ANSI to UTF-8, but now I found another problem. If the -sub_charenc option is used to convert an srt file, then the last character of every subtitle paragraph is missing or not converted correctly! (see screenshot)
It seems to be a problem in the "subtitles character encoding conversion" function!
This problem seems to be independent of whether the last character is an "ordinary" character or a special character. Furthermore, it is independent of whether the imported srt file is a Windows/DOS or a Unix text file.
Could somebody please check this problem using option "-sub_charenc" with a Linux build of ffmpeg and/or avcodec created with --enable-iconv?
Command line:
ffmpeg -i input.mp4 -sub_charenc CP1252 -i subtitle_test_2.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 srt -metadata:s:s:0 language=ger ffmpeg_srt-subtitle_2.mkv
(test files see attachment)
by , 12 years ago
Attachment: | subtitle_test_2.srt added |
---|
by , 12 years ago
Attachment: | subtitle_test_2.bat added |
---|
by , 12 years ago
Attachment: | sub_charenc_parameters.txt added |
---|
by , 12 years ago
Attachment: | subtitle_test_2_screenshots.gif added |
---|
comment:19 by , 12 years ago
I confirm the problem on the last char. A patch has been sent to the devel mailing-list.
comment:20 by , 12 years ago
Thanks, I am looking forward to a solution!
A short note in this ticket when the patch has been applied would be nice.
follow-up: 22 comment:21 by , 12 years ago
The problem with the last character is fixed.
Detecting and reporting invalid UTF-8 is still undecided.
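Until such detection exists in ffmpeg itself, a subtitle file can be checked by hand before it is passed to ffmpeg. The following is only a minimal sketch, assuming the iconv command-line tool is installed; the function names is_valid_utf8 and check_subtitles are hypothetical helpers and not part of ffmpeg.

```shell
#!/bin/sh
# Hypothetical helper (not part of ffmpeg): succeeds if the file decodes as UTF-8.
# Converting UTF-8 to UTF-8 changes nothing, but iconv exits with a non-zero
# status when it meets a byte sequence that is not valid UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one line per subtitle file with the result.
check_subtitles() {
    for f in "$@"; do
        if is_valid_utf8 "$f"; then
            echo "$f: valid UTF-8"
        else
            echo "$f: not valid UTF-8, consider -sub_charenc"
        fi
    done
}
```

A file that contains only ASCII characters also passes this check, since ASCII is a subset of UTF-8.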
comment:22 by , 12 years ago
Replying to Cigaes:
The problem with the last character is fixed.
Detecting and reporting invalid UTF-8 is still undecided.
Yes, I understand, but it is more important to have a version that works without that -sub_charenc bug. ffmpeg can now be used to import and encode subtitle files.
For information only: In which source code file was this bug fixed?
by , 12 years ago
Attachment: | last_char_bug-fix.png added |
---|
follow-up: 24 comment:23 by , 12 years ago
With the fix for the "last char" problem, the subtitles character encoding conversion now works correctly. I tested the latest Windows build of FFmpeg and "-sub_charenc" now works as expected! (without the UTF-8 auto detection!)
Windows build version tested:
http://ffmpeg.zeranoe.com/builds/win32/static/ffmpeg-20130408-git-9dc88ac-win32-static.7z
Short summary for other users:
If you want to import a plaintext subtitle file including special characters with ffmpeg/avcodec, then...
- ffmpeg/avcodec must be built with "--enable-iconv"
- option "-sub_charenc [code page]" is needed to enable the subtitles character encoding conversion for subtitle source files
- libavcodec should be version 55.2.100 (8th April 2013) or newer
(e.g. import of Windows ANSI *.srt/*.ass/*.ssa subtitle files with German, French or other special characters)
Command line examples
Import a subtitle file (copy video/audio streams, without re-encoding):
ffmpeg -i input.mp4 -sub_charenc ISO8859-1 -i subtitle.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 mov_text -metadata:s:s:0 language=ger output.mp4
ffmpeg -i input.mkv -sub_charenc ISO-8859-1 -i subtitle.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 srt -metadata:s:s:0 language=fre output.mkv
Import subtitle and re-encode video/audio streams:
ffmpeg -i input.avi -sub_charenc CP1252 -i subtitle.srt -vcodec h264 -acodec ac3 -scodec ass -metadata:s:s:0 language=ita output2.mkv
ffmpeg -i input.mov -sub_charenc WINDOWS-1252 -i subtitle.srt -vcodec mpeg4 -acodec mp3 -scodec mov_text -metadata:s:s:0 language=spa output2.mp4
(code page parameters for option -sub_charenc see attachment)
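If no ffmpeg build with iconv support is at hand, the subtitle file could also be converted to UTF-8 beforehand and then imported without -sub_charenc. This is only a sketch, assuming the iconv command-line tool is installed; to_utf8 and the file name subtitle_utf8.srt are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of ffmpeg): convert a subtitle file to UTF-8.
# Usage: to_utf8 SOURCE_CHARSET input.srt output.srt
# The exit status is that of iconv, so a failed conversion can be detected.
to_utf8() {
    iconv -f "$1" -t UTF-8 "$2" > "$3"
}

# Possible use, following the examples above (file names are placeholders):
#   to_utf8 CP1252 subtitle.srt subtitle_utf8.srt
#   ffmpeg -i input.mkv -i subtitle_utf8.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 srt output.mkv
```

The source charset still has to be known or guessed correctly. If the wrong one is given, iconv may succeed but produce the wrong characters.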
Additional notice:
Various Philips stand-alone Blu-ray players and Samsung TVs can read "SRT" subtitle streams in "mkv" files.
follow-up: 25 comment:24 by , 12 years ago
follow-up: 26 comment:25 by , 12 years ago
Replying to cehoyos:
Replying to Nick:
- ffmpeg/avcodec must be built with "--enable-iconv"
This is not correct.
Are you sure?
I mean the import of plaintext subtitle files using the subtitles character encoding conversion (-sub_charenc).
Looking in source "...\ffmpeg\libavcodec\utils.c":
#if CONFIG_ICONV
    iconv_t cd = iconv_open("UTF-8", avctx->sub_charenc);
    if (cd == (iconv_t)-1) {
        av_log(avctx, AV_LOG_ERROR, "Unable to open iconv context "
               "with input character encoding \"%s\"\n", avctx->sub_charenc);
        ret = AVERROR(errno);
        goto free_and_end;
    }
    iconv_close(cd);
#else
    av_log(avctx, AV_LOG_ERROR, "Character encoding subtitles "
           "conversion needs a libavcodec built with iconv support "
           "for this codec\n");
    ret = AVERROR(ENOSYS);
    goto free_and_end;
#endif
... and also in function recode_subtitle()
"conversion needs a libavcodec built with iconv support for this codec\n"
and
where are iconv_open() and iconv_close() defined?
(see also error message in comment:14)
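The iconv_open() probe quoted above can be imitated on the command line to find out whether a given code page name is accepted at all. This is only a rough sketch: it assumes the iconv command-line tool is installed, and that tool is not necessarily the same iconv implementation that ffmpeg was linked against. The function name charset_known is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of ffmpeg): succeeds if the iconv tool can
# open a conversion from the given charset to UTF-8. Empty input is used,
# so only the opening of the conversion is tested, much like the
# iconv_open()/iconv_close() pair in the quoted source.
charset_known() {
    iconv -f "$1" -t UTF-8 </dev/null >/dev/null 2>&1
}

# Possible use before calling ffmpeg with -sub_charenc:
#   charset_known CP1252 && echo "CP1252 is accepted by iconv"
```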
comment:26 by , 12 years ago
Replying to Nick:
Replying to cehoyos:
Replying to Nick:
- ffmpeg/avcodec must be built with "--enable-iconv"
This is not correct.
Are you sure?
Yes.
iconv is a system library that gets included if present; --enable-iconv is the default and therefore does not have to be specified (it makes no difference).
Subtitle test files (incomplete)