Opened 3 years ago

Closed 3 years ago

#2431 closed enhancement (fixed)

Detect if subtitle streams do not contain valid utf-8

Reported by: Nick
Owned by:
Priority: wish
Component: avcodec
Version: git-master
Keywords: sub srt
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

ffmpeg subtitle encoding of special characters does not work correctly (with special characters of German, French, Spanish, etc.)

Problem Description:

Importing and encoding subtitles that contain special characters from text-based *.srt and *.ass subtitle files into an *.mp4 or *.mkv container with ffmpeg does not work correctly!
Special characters from languages like German (e.g. ä/Ä, ö/Ö, ü/Ü, ß), French, Spanish and so on, imported and encoded with ffmpeg as a "mov_text" subtitle stream in an mp4 container or as an "srt" subtitle stream in an mkv container, are not compatible with various media players and stand-alone Blu-Ray players!
(All tested special characters are included in ISO-8859-1.)

I tested the following media players (Windows versions):
VLC 1.1.11
VLC 2.0.5
MPC-HC 1.6.6 (Media Player Classic - Home Cinema)
XBMC 12.0
SMPlayer 0.8.4
and ...
Philips stand-alone Blu-Ray player BDP5180
Philips Blu-Ray Home Cinema System

(Note on SMPlayer:
By default, SMPlayer uses a new ass library to convert srt subtitles internally to an ass subtitle stream and renders it like a normal ass/ssa subtitle. To force SMPlayer into the "normal subtitle mode", go to the SMPlayer menu "Options" >> "Preferences" >> "Subtitles" >> "Font and colors" and activate "Enable normal subtitles".)

Examples of subtitles incorrectly imported/encoded with ffmpeg:

Import and encoding of a *.srt subtitle text file as "mov_text" subtitle stream within an mp4 file:

ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 mov_text -metadata:s:s:0 language=ger ffmpeg_mov-text.mp4

or
Import and encoding of a *.srt subtitle file as "SRT" subtitle stream within an mkv file:

ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 srt -metadata:s:s:0 language=ger ffmpeg_srt-subtitle.mkv

(content of "subtitle_test.srt" see below)


"ASS" subtitle streams into mkv files created by ffmpeg are compatible with most of these players but not with all:

ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 ass -metadata:s:s:0 language=ger ffmpeg_ass-subtitle.mkv


For comparison:

MP4Box creates fully compliant subtitle streams in mp4 containers that work with all these media players!
I used MP4Box 0.4.6 (rev2698) to create compliant subtitle streams (other versions of MP4Box also work).
MP4Box command line:

mp4box -add input.mp4#video -add input.mp4#audio -add subtitle_test.srt:lang=de -new mp4box_mov-text.mp4

If you use ffmpeg to convert a subtitle stream of an mp4 file created by MP4Box to another mp4 file or an mkv file, that subtitle stream also works correctly. Example:

Step 1:

mp4box -add input.mp4#video -add input.mp4#audio -add subtitle_test.srt:lang=de -new mp4box_mov-text.mp4

Step 2:

ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 srt mp4box_mov-text_converted-by-ffmpeg-to-srt.mkv
ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 ass mp4box_mov-text_converted-by-ffmpeg-to-ass.mkv


My (Philips) stand-alone Blu-Ray player can read "SRT" subtitle streams from mkv files only; therefore I currently have to create a fully compliant mkv file in two steps:

mp4box -add input.mp4#video -add input.mp4#audio -add subtitle_test.srt:lang=de -new mp4box_mov-text.mp4

ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 srt mp4box_mov-text_converted-by-ffmpeg-to-srt.mkv


My little "standard test" to create some test files including a subtitle stream…

For this test I use a Windows 32-bit environment with the following files:

input.mp4 (a small mp4 file as input, duration ~30-60 seconds, including one AVC/H.264 video stream and one AAC audio stream)
subtitle_test.srt (a subtitle test file with different special characters, content see below)
subtitle_test.bat (a batch file to create all test files automatically, content see below)
ffmpeg.exe (the current version from http://ffmpeg.zeranoe.com/builds/win32/static/)
MP4Box.exe (e.g. http://www.videohelp.com/download/MP4Box-0.4.6-rev2698.zip )

Easy to use: Put all these files together in one folder and run the "subtitle_test.bat".
Then play the created files in a media player and activate the subtitle.

"subtitle_test.srt":

1
00:00:05,000 --> 00:00:14,000
German special characters: 
Ä/ä, Ö/ö, Ü/ü, ß

2
00:00:15,000 --> 00:00:19,000
French special characters: 
Æ/æ, À/à, Â/â, È/è, É/é, Ê/ê, Ë/ë, 
Î/î, Ï/ï, Ô/ô, Ù/ù, Û/û, Ç/ç, Ü/ü, ÿ

3
00:00:20,000 --> 00:00:24,000
Italian special characters: 
À/à, È/è, É/é, Ò/ò, Ù/ù

4
00:00:25,000 --> 00:00:29,000
Spanish special characters: 
¡, ¿, ª, º, Á/á, É/é, Í/í, Ñ/ñ, Ó/ó, Ú/ú, Ü/ü

5
00:00:30,000 --> 00:00:55,000
These are printable characters of ISO-8859-1:
(*str >= 32 && *str < 128) || (*str >= 160 && *str <= 255)

"subtitle_test.bat":

@echo on
@rem This batch file requires following files: ffmpeg.exe, MP4Box.exe, input.mp4, subtitle_test.srt

@echo ============================================================================
ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 mov_text -metadata:s:s:0 language=ger ffmpeg_mov-text.mp4

@echo ============================================================================
ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 srt -metadata:s:s:0 language=ger ffmpeg_srt-subtitle.mkv

@echo ============================================================================
ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 ass -metadata:s:s:0 language=ger ffmpeg_ass-subtitle.mkv

@echo ============================================================================
mp4box -add input.mp4#video -add input.mp4#audio -add subtitle_test.srt:lang=de -new mp4box_mov-text.mp4

@echo ============================================================================
ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 srt mp4box_mov-text_converted-by-ffmpeg-to-srt.mkv

@echo ============================================================================
ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 ass mp4box_mov-text_converted-by-ffmpeg-to-ass.mkv

@echo ============================================================================
@pause

Please contact me if you want to get all my original test files in one zip! (all together <4 MB)

Here is the output of a complete test run, started with "subtitle_test.bat":

============================================================================

C:\Video>ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 mov_text -metadata:s:s:0 language=ger ffmpeg_mov-text.mp4

ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr  1 2013 12:44:46 with gcc 4.8.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab
bcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --ena
bschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enabl
vs --enable-libxvid --enable-zlib
  libavutil      52. 24.100 / 52. 24.100
  libavcodec     55.  2.100 / 55.  2.100
  libavformat    55.  1.100 / 55.  1.100
  libavdevice    55.  0.100 / 55.  0.100
  libavfilter     3. 48.105 /  3. 48.105
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    creation_time   : 2013-03-24 21:33:20
    encoder         : Lavf54.63.100
  Duration: 00:01:00.02, start: 0.023220, bitrate: 85 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], 20 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 58 kb/s
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : SoundHandler
Input #1, srt, from 'subtitle_test.srt':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Subtitle: subrip
Output #0, mp4, to 'ffmpeg_mov-text.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf55.1.100
    Stream #0:0(und): Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 720x404 [SAR 1:1 DAR 180:101], q=2-31, 20 kb/s, 25 fps, 12800 tbn, 12800 tbc
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, stereo, 58 kb/s
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : SoundHandler
    Stream #0:2(ger): Subtitle: mov_text ([8][0][0][0] / 0x0008)
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #1:0 -> #0:2 (subrip -> mov_text)
Press [q] to stop, [?] for help
frame= 1500 fps=0.0 q=-1.0 Lsize=     629kB time=00:01:00.02 bitrate=  85.8kbits/s
video:151kB audio:432kB subtitle:0 global headers:0kB muxing overhead 7.752755%
============================================================================

C:\Video>ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 srt -metadata:s:s:0 language=ger ffmpeg_srt-subtitle.mkv

ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr  1 2013 12:44:46 with gcc 4.8.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab
bcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --ena
bschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enabl
vs --enable-libxvid --enable-zlib
  libavutil      52. 24.100 / 52. 24.100
  libavcodec     55.  2.100 / 55.  2.100
  libavformat    55.  1.100 / 55.  1.100
  libavdevice    55.  0.100 / 55.  0.100
  libavfilter     3. 48.105 /  3. 48.105
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    creation_time   : 2013-03-24 21:33:20
    encoder         : Lavf54.63.100
  Duration: 00:01:00.02, start: 0.023220, bitrate: 85 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], 20 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 58 kb/s
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : SoundHandler
Input #1, srt, from 'subtitle_test.srt':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Subtitle: subrip
Output #0, matroska, to 'ffmpeg_srt-subtitle.mkv':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf55.1.100
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], q=2-31, 20 kb/s, 25 fps, 1k tbn, 12800 tbc
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac ([255][0][0][0] / 0x00FF), 44100 Hz, stereo, 58 kb/s
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : SoundHandler
    Stream #0:2(ger): Subtitle: srt
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #1:0 -> #0:2 (subrip -> srt)
Press [q] to stop, [?] for help
frame= 1500 fps=0.0 q=-1.0 Lsize=     611kB time=00:01:00.02 bitrate=  83.4kbits/s
video:151kB audio:432kB subtitle:1 global headers:0kB muxing overhead 4.759849%
============================================================================

C:\Video>ffmpeg -i input.mp4 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 ass -metadata:s:s:0 language=ger ffmpeg_ass-subtitle.mkv

ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr  1 2013 12:44:46 with gcc 4.8.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab
bcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --ena
bschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enabl
vs --enable-libxvid --enable-zlib
  libavutil      52. 24.100 / 52. 24.100
  libavcodec     55.  2.100 / 55.  2.100
  libavformat    55.  1.100 / 55.  1.100
  libavdevice    55.  0.100 / 55.  0.100
  libavfilter     3. 48.105 /  3. 48.105
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    creation_time   : 2013-03-24 21:33:20
    encoder         : Lavf54.63.100
  Duration: 00:01:00.02, start: 0.023220, bitrate: 85 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], 20 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 58 kb/s
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : SoundHandler
Input #1, srt, from 'subtitle_test.srt':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Subtitle: subrip
Output #0, matroska, to 'ffmpeg_ass-subtitle.mkv':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf55.1.100
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], q=2-31, 20 kb/s, 25 fps, 1k tbn, 12800 tbc
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac ([255][0][0][0] / 0x00FF), 44100 Hz, stereo, 58 kb/s
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : SoundHandler
    Stream #0:2(ger): Subtitle: ssa
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #1:0 -> #0:2 (subrip -> ass)
Press [q] to stop, [?] for help
frame= 1500 fps=0.0 q=-1.0 Lsize=     612kB time=00:01:00.02 bitrate=  83.5kbits/s
video:151kB audio:432kB subtitle:1 global headers:0kB muxing overhead 4.759022%
============================================================================

C:\Video>mp4box -add input.mp4#video -add input.mp4#audio -add subtitle_test.srt:lang=de -new mp4box_mov-text.mp4

IsoMedia import - track ID 1 - Video (size 720 x 404)
IsoMedia import - track ID 2 - Audio (SR 44100 - 2 channels)
Timed Text (SRT) import - text track 720 x 404, font Serif (size 18)
Saving mp4box_mov-text.mp4: 0.500 secs Interleaving
============================================================================

C:\Video>ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 srt mp4box_mov-text_converted-by-ffmpeg-to-srt.mkv

ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr  1 2013 12:44:46 with gcc 4.8.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab
bcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --ena
bschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enabl
vs --enable-libxvid --enable-zlib
  libavutil      52. 24.100 / 52. 24.100
  libavcodec     55.  2.100 / 55.  2.100
  libavformat    55.  1.100 / 55.  1.100
  libavdevice    55.  0.100 / 55.  0.100
  libavfilter     3. 48.105 /  3. 48.105
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'mp4box_mov-text.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 1
    compatible_brands: isom
    creation_time   : 2013-04-03 20:31:53
  Duration: 00:01:20.00, start: 0.000000, bitrate: 62 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], 20 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 58 kb/s
    Metadata:
      creation_time   : 2013-04-03 20:31:53
      handler_name    : GPAC ISO Audio Handler
    Stream #0:2(deu): Subtitle: mov_text (tx3g / 0x67337874)
    Metadata:
      creation_time   : 2013-04-03 20:31:53
      handler_name    : Imported with GPAC 0.4.6-DEV-rev
Output #0, matroska, to 'mp4box_mov-text_converted-by-ffmpeg-to-srt.mkv':
  Metadata:
    major_brand     : isom
    minor_version   : 1
    compatible_brands: isom
    encoder         : Lavf55.1.100
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], q=2-31, 20 kb/s, 25 fps, 1k tbn, 12800 tbc
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac ([255][0][0][0] / 0x00FF), 44100 Hz, stereo, 58 kb/s
    Metadata:
      creation_time   : 2013-04-03 20:31:53
      handler_name    : GPAC ISO Audio Handler
    Stream #0:2(deu): Subtitle: srt
    Metadata:
      creation_time   : 2013-04-03 20:31:53
      handler_name    : Imported with GPAC 0.4.6-DEV-rev
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #0:2 -> #0:2 (mov_text -> srt)
Press [q] to stop, [?] for help
frame= 1500 fps=0.0 q=-1.0 Lsize=     612kB time=00:01:00.02 bitrate=  83.5kbits/s
video:151kB audio:432kB subtitle:1 global headers:0kB muxing overhead 4.777075%
============================================================================

C:\Video>ffmpeg -i mp4box_mov-text.mp4 -map 0 -c copy -c:s:0 ass mp4box_mov-text_converted-by-ffmpeg-to-ass.mkv

ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr  1 2013 12:44:46 with gcc 4.8.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab
bcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --ena
bschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enabl
vs --enable-libxvid --enable-zlib
  libavutil      52. 24.100 / 52. 24.100
  libavcodec     55.  2.100 / 55.  2.100
  libavformat    55.  1.100 / 55.  1.100
  libavdevice    55.  0.100 / 55.  0.100
  libavfilter     3. 48.105 /  3. 48.105
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'mp4box_mov-text.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 1
    compatible_brands: isom
    creation_time   : 2013-04-03 20:31:53
  Duration: 00:01:20.00, start: 0.000000, bitrate: 62 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], 20 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 58 kb/s
    Metadata:
      creation_time   : 2013-04-03 20:31:53
      handler_name    : GPAC ISO Audio Handler
    Stream #0:2(deu): Subtitle: mov_text (tx3g / 0x67337874)
    Metadata:
      creation_time   : 2013-04-03 20:31:53
      handler_name    : Imported with GPAC 0.4.6-DEV-rev
Output #0, matroska, to 'mp4box_mov-text_converted-by-ffmpeg-to-ass.mkv':
  Metadata:
    major_brand     : isom
    minor_version   : 1
    compatible_brands: isom
    encoder         : Lavf55.1.100
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 720x404 [SAR 1:1 DAR 180:101], q=2-31, 20 kb/s, 25 fps, 1k tbn, 12800 tbc
    Metadata:
      creation_time   : 2013-03-24 21:33:20
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac ([255][0][0][0] / 0x00FF), 44100 Hz, stereo, 58 kb/s
    Metadata:
      creation_time   : 2013-04-03 20:31:53
      handler_name    : GPAC ISO Audio Handler
    Stream #0:2(deu): Subtitle: ssa
    Metadata:
      creation_time   : 2013-04-03 20:31:53
      handler_name    : Imported with GPAC 0.4.6-DEV-rev
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #0:2 -> #0:2 (mov_text -> ass)
Press [q] to stop, [?] for help
frame= 1500 fps=0.0 q=-1.0 Lsize=     612kB time=00:01:00.02 bitrate=  83.5kbits/s
video:151kB audio:432kB subtitle:1 global headers:0kB muxing overhead 4.776069%
============================================================================

Attachments (9)

ffmpeg_subtitle_test.zip (2.2 MB) - added by Nick 3 years ago.
Subtitle test files (incomplete)
subtitle_test.srt (570 bytes) - added by cehoyos 3 years ago.
subtitle_test.ass (984 bytes) - added by Nick 3 years ago.
ASS subtitle test file
subtitle_test_2.srt (741 bytes) - added by Nick 3 years ago.
subtitle_test_2.bat (1.1 KB) - added by Nick 3 years ago.
sub_charenc_parameters.txt (3.8 KB) - added by Nick 3 years ago.
subtitle_test_2_screenshots.gif (41.0 KB) - added by Nick 3 years ago.
input.mp4 (627.6 KB) - added by Nick 3 years ago.
last_char_bug-fix.png (24.3 KB) - added by Nick 3 years ago.

Change History (36)

Changed 3 years ago by Nick

Subtitle test files (incomplete)

comment:1 Changed 3 years ago by cehoyos

  • Keywords sub srt added; ffmpeg subtitle encoding of special characters removed
  • Priority changed from important to normal
  • Version changed from unspecified to git-master

Is this a regression?

comment:2 follow-up: Changed 3 years ago by Cigaes

Your SRT file is not in UTF-8, and you did not specify any -sub_charenc option.

comment:3 in reply to: ↑ 2 ; follow-up: Changed 3 years ago by Nick

Replying to Cigaes:

Your SRT file is not in UTF-8, and you did not specify any -sub_charenc option.


Yes, my srt file is not (pre-)encoded in UTF-8!
This srt file is plain (Windows/ANSI) text only, like other srt files created by tools such as "Sub Rip" or subtitle files you can download from internet databases.
(All special characters used in my srt file are included in the Windows default code page CP-1252 and in ISO-8859-1.)

The missing UTF-8 encoding of the subtitle stream inside the mp4/mkv container seems to be the problem!
(If you open the created container files with a hex editor, you can see that the subtitle stream was not converted to UTF-8.)
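
As an illustration of what the hex editor reveals, here is a minimal Python sketch (not part of FFmpeg; the helper name `hex_bytes` is made up for this example) that prints the bytes of the German test characters in CP-1252 and in UTF-8. A stream that still shows single bytes such as E4 for "ä" has not been converted.

```python
# Compare how the German test characters are stored in CP-1252 and in UTF-8.
SAMPLE = "Ää Öö Üü ß"

def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

# "ä" is one byte in CP-1252 but a two-byte sequence in UTF-8.
cp1252_dump = hex_bytes("ä", "cp1252")
utf8_dump = hex_bytes("ä", "utf-8")

print("cp1252:", hex_bytes(SAMPLE, "cp1252"))
print("utf-8: ", hex_bytes(SAMPLE, "utf-8"))
```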

Shouldn't FFmpeg detect by itself whether the subtitle source file is plain text or UTF-8, and select the right encoding mode automatically?

For ffmpeg.exe (Windows build) I did not find any documentation on using the -sub_charenc option and its possible parameters. (Is an automatic detection mode also available here?)
Could you please show me a working command line example using -sub_charenc with ffmpeg.exe?

comment:4 in reply to: ↑ 3 ; follow-up: Changed 3 years ago by cehoyos

Replying to Nick:

Shouldn't FFmpeg detect by itself whether the subtitle source file is plain text or UTF-8, and select the right encoding mode automatically?

How?

For ffmpeg.exe (Windows build) I did not find any documentation on using the -sub_charenc option

comment:5 in reply to: ↑ 4 ; follow-up: Changed 3 years ago by Nick

Replying to cehoyos:

How?

If the source srt/ass text file is encoded in UTF-8, this can be detected by the UTF-8 BOM sequence 0xEF,0xBB,0xBF (Byte Order Mark).
If no UTF-8 BOM is found, it should be a plain text file, which should be converted from CP-1252/ISO-8859-1 to UTF-8 by default.
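
The BOM rule proposed here can be sketched in a few lines of Python. This is only an illustration of the idea, not FFmpeg code; the function name `guess_charenc_by_bom` and the `cp1252` fallback are assumptions taken from the proposal.

```python
import codecs

def guess_charenc_by_bom(data: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical helper: pick an encoding name based only on the UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return "utf-8-sig"                # Python codec that strips the BOM
    return fallback                       # assumed default from the proposal

with_bom = codecs.BOM_UTF8 + "Ä/ä".encode("utf-8")
without_bom = "Ä/ä".encode("utf-8")

print(guess_charenc_by_bom(with_bom))
print(guess_charenc_by_bom(without_bom))
```

Note that the second call returns the fallback even though the input is valid UTF-8, so a BOM test alone cannot recognise UTF-8 files saved without a BOM.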


Replying to cehoyos:

For option -sub_charenc I found only:

 -sub_charenc       <string>     .D...S set input text subtitles character encoding

and

‘sub_charenc encoding (decoding,subtitles)’

    Set the input subtitles character encoding.

This seems to be a decoder option only(?). What can be used as the <encoding string>? cp1252, ...?
Sorry, I cannot get this option to work. Please show me a working command line example!

comment:6 in reply to: ↑ 5 Changed 3 years ago by cehoyos

Replying to Nick:

Replying to cehoyos:

How?

If the source srt/ass text file is encoded in UTF-8, this can be detected by the UTF-8 BOM sequence 0xEF,0xBB,0xBF (Byte Order Mark).

Isn't the presence of a Byte Order Mark strictly optional?

comment:7 Changed 3 years ago by Cigaes

Yes, it is. And there are dozens of possible encodings besides ISO-8859-1 and windows-1252 (which are not the same thing, although the differences are small).

comment:8 follow-up: Changed 3 years ago by Nick

You are right, the presence of the UTF-8 BOM is optional, but there are various software tools that can detect the encoding type (meaning ANSI text, UTF-8 with BOM or UTF-8 without BOM, but not the code page).
I tested MP4Box with *.srt files in ANSI, UTF-8 and UTF-8 w/o BOM. MP4Box seems to detect the encoding type and creates the same result in all three cases! It is possible!
Another example is the open source tool Notepad++, which can also detect the encoding type. Maybe you can find methods to detect the encoding type in the source code of such tools.

ISO-8859-1 and CP-1252 are not exactly the same, but the special characters used in my "subtitle_test.srt" are the same in both! Hence the little comment in my srt file ;-) ...
"These are printable characters of ISO-8859-1:
(*str >= 32 && *str < 128) || (*str >= 160 && *str <= 255)"

... for this range they are exactly the same.
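
The claim about this byte range can be checked with Python's built-in `iso-8859-1` and `cp1252` codecs. The sketch below is an illustration only; `decode_or_none` is a helper invented for this example, and it treats bytes that a codec leaves undefined as `None`.

```python
def decode_or_none(byte_value: int, encoding: str):
    """Decode a single byte; return None if the codec leaves it undefined."""
    try:
        return bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        return None

# The "printable" range quoted from subtitle_test.srt.
printable = [b for b in range(256) if 32 <= b < 128 or 160 <= b <= 255]

# True if both code pages agree on every byte of that range.
same_in_printable = all(
    decode_or_none(b, "iso-8859-1") == decode_or_none(b, "cp1252")
    for b in printable
)

# All byte values on which the two code pages disagree.
differing = [
    b for b in range(256)
    if decode_or_none(b, "iso-8859-1") != decode_or_none(b, "cp1252")
]

print(same_in_printable, [hex(b) for b in differing])
```

The two code pages disagree only on bytes 0x80 to 0x9F, which hold control codes in ISO-8859-1 and characters such as the euro sign in CP-1252.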

For most European languages like French, German, Italian, Spanish and more, it is sufficient to use CP-1252 or ISO-8859-1 as the default.
More important for the imported subtitle file is the question:
"Is it plain text or is it already UTF-8?"


My proposal to select a default code page for every subtitle stream:

  • If no language is defined for the subtitle stream or the language is unknown:
    --> use CP-1252 as default (or ISO-8859-1)
  • If a language is defined (e.g. with -metadata:s:s:0 language=ger):
    --> use a selection table to set a code page automatically
  • If a dedicated code page is selected by an option like "-sub_charenc":
    --> use that setting instead of the other ones
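
The priority order proposed in this list could look roughly like the following Python sketch. It is not FFmpeg behaviour; the function `select_codepage` and the contents of the language table are hypothetical and serve only to show the order of the three rules.

```python
from typing import Optional

# Hypothetical language -> code page table; the entries are illustrative only.
DEFAULT_CODEPAGES = {
    "ger": "cp1252",
    "fre": "cp1252",
    "rus": "cp1251",
    "gre": "cp1253",
}

def select_codepage(explicit: Optional[str] = None,
                    language: Optional[str] = None) -> str:
    """Pick a code page: explicit option first, then language table, then default."""
    if explicit:                        # e.g. a value passed via -sub_charenc
        return explicit
    if language in DEFAULT_CODEPAGES:   # e.g. from -metadata:s:s:0 language=ger
        return DEFAULT_CODEPAGES[language]
    return "cp1252"                     # unknown or missing language

print(select_codepage(language="rus"))
print(select_codepage(explicit="iso-8859-15", language="ger"))
```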

comment:9 in reply to: ↑ 8 Changed 3 years ago by cehoyos

Replying to Nick:

  • If a language is defined (e.g. with -metadata:s:s:0 language=ger):
    --> use a selection table to set a code page automatically

Wouldn't this break subtitle files that are UTF-8 encoded?

comment:10 Changed 3 years ago by Nick

Replying to cehoyos:

Wouldn't this break subtitle files that are UTF-8 encoded?


I meant that only for encoding (ANSI) plain text into UTF-8. Subtitle streams which are already encoded in UTF-8 should not be re-encoded.
--> check if the srt file is plain text or UTF-8 --> if it is plain text only, then encode it to UTF-8 with the selected code page setting

Converting ISO-8859-1 to UTF-8 as the default case should be simple, see:
http://www.unicode.org/faq/utf_bom.html
http://stackoverflow.com/questions/4059775/convert-iso-8859-1-strings-to-utf-8-in-c-c
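
The check-then-convert step described in this comment can be sketched in Python as follows. This is an assumption-laden illustration, not how FFmpeg works: the function name `to_utf8` is invented, "plain text" is taken to mean "does not decode as strict UTF-8", and the code page defaults to CP-1252.

```python
def to_utf8(data: bytes, codepage: str = "cp1252") -> bytes:
    """Return `data` unchanged if it is valid UTF-8, else convert it from `codepage`."""
    try:
        data.decode("utf-8")          # strict check; raises on invalid sequences
        return data                   # already UTF-8: do not re-encode
    except UnicodeDecodeError:
        # Assumed legacy text: reinterpret with the selected code page.
        return data.decode(codepage).encode("utf-8")

ansi_line = "Ä/ä, Ö/ö, Ü/ü, ß".encode("cp1252")
utf8_line = "Ä/ä, Ö/ö, Ü/ü, ß".encode("utf-8")

print(to_utf8(ansi_line) == utf8_line)
print(to_utf8(utf8_line) == utf8_line)
```

A real implementation would also have to decide what to do with bytes that the chosen code page leaves undefined, which this sketch simply lets fail.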

Changed 3 years ago by cehoyos

comment:11 follow-up: Changed 3 years ago by cehoyos

  • Component changed from undetermined to avcodec
  • Priority changed from normal to wish
  • Summary changed from ffmpeg subtitle encoding of special characters does not working correctly to Detect if subtitle streams do not contain valid utf-8
  • Type changed from defect to enhancement

I still see several problems with your approach, so while it is not sure to get accepted, I guess you could try implementing something:

  • Invalid utf-8 files are rare, so not all cases would be covered
  • FFmpeg can only scan the first bytes of the subtitle stream to guess the encoding; this works for the file you uploaded, but not in the general case
  • What about utf-8 encoded subtitles that contain an error (i.e. a 0xC0 or 0xC1 byte)? They would suddenly be broken and users would report a regression.
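
The last concern can be illustrated with a short Python sketch (an illustration only, with invented names): a single stray 0xC0 byte makes an otherwise valid UTF-8 stream fail a strict validity check, and a whole-stream fallback to CP-1252 would then garble the text that was fine before.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Strict check: True only if every byte sequence is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

good = "Grüße".encode("utf-8")
damaged = good + b"\xc0"   # 0xC0 can never occur in well-formed UTF-8

# Treating the whole damaged stream as CP-1252 also garbles the valid part.
fallback_text = damaged.decode("cp1252")

print(is_valid_utf8(good), is_valid_utf8(damaged))
print(fallback_text)
```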

Since you know the encoding of your subtitle file, I suggest using -sub_charenc

comment:12 in reply to: ↑ 11 ; follow-up: Changed 3 years ago by Nick

Replying to cehoyos:

Since you know the encoding of your subtitle file, I suggest using -sub_charenc


Ok, I know the encoding of my subtitle file, but please show me:
How can I import and encode my "subtitle_test.srt" file into an mp4 or mkv file using "-sub_charenc"?

comment:13 in reply to: ↑ 12 ; follow-up: Changed 3 years ago by cehoyos

Replying to Nick:

Replying to cehoyos:

Since you know the encoding of your subtitle file, I suggest using -sub_charenc

Ok, I know the encoding of my subtitle file, but please show me:
How can I import and encode my "subtitle_test.srt" file into an mp4 or mkv file using "-sub_charenc"?

As you told me in comment:5 (I didn't know before, I had never tested it): with -sub_charenc cp1252
$ ffmpeg -f lavfi -i testsrc -sub_charenc cp1252 -i subtitle_test.srt -vcodec mpeg4 -scodec mov_text -t 60 out.mov

comment:14 in reply to: ↑ 13 Changed 3 years ago by Nick

Replying to cehoyos:

As you told me in comment:5 (I didn't know before, I had never tested it): with -sub_charenc cp1252
$ ffmpeg -f lavfi -i testsrc -sub_charenc cp1252 -i subtitle_test.srt -vcodec mpeg4 -scodec mov_text -t 60 out.mov


I tested your command line and tried to adapt it to my test files:

ffmpeg -i input.mp4 -sub_charenc cp1252 -i subtitle_test.srt -vcodec copy -acodec copy -scodec mov_text output.mp4

In both cases I get the following error messages:

C:\Video>ffmpeg -i input.mp4 -sub_charenc cp1252 -i subtitle_test.srt -vcodec copy -acodec copy -scodec mov_text output.mp4

ffmpeg version N-51511-g599866f Copyright (c) 2000-2013 the FFmpeg developers
  built on Apr  1 2013 12:44:46 with gcc 4.8.0 (GCC)

[...]
[subrip @ 024fc540] Character encoding subtitles conversion needs a libavcodec built with iconv support for this codec
Input #1, srt, from 'subtitle_test.srt':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Subtitle: subrip
[subrip @ 024fc540] Character encoding subtitles conversion needs a libavcodec built with iconv support for this codec

[...]
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #1:0 -> #0:2 (subrip -> mov_text)
Error while opening decoder for input stream #1:0

The same happens if I try to import an ASS subtitle file, like:
ffmpeg -i input.mp4 -sub_charenc cp1252 -i subtitle_test.ass -vcodec copy -acodec copy -scodec mov_text output.mp4

"Character encoding subtitles conversion needs a libavcodec built with iconv support for this codec" (?) -->
Where can I find a Windows build of ffmpeg.exe built with iconv support for srt/ass?

Would enabling "iconv support" for SRT and ASS subtitle streams/files by default already solve my problem? That sounds like a practical solution!
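
Editorial sketch: with a build that lacks iconv support, one possible workaround is to convert the subtitle file to UTF-8 before handing it to ffmpeg, so that no conversion inside libavcodec is needed. The Python below is only an illustration under that assumption; the function names `recode_to_utf8` and `convert_subtitle_file` are hypothetical and not part of ffmpeg.

```python
from pathlib import Path


def recode_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if a byte is not defined in source_encoding.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_subtitle_file(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Write a UTF-8 copy of the subtitle file src to dst.

    The file is handled as raw bytes, so line endings are left untouched.
    """
    Path(dst).write_bytes(recode_to_utf8(Path(src).read_bytes(), source_encoding))


if __name__ == "__main__":
    # Hypothetical file names, matching the ones used in this ticket.
    convert_subtitle_file("subtitle_test.srt", "subtitle_test_utf8.srt")
```

The converted file can then be given to ffmpeg as a normal input, without -sub_charenc.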

Changed 3 years ago by Nick

ASS subtitle test file

comment:15 follow-up: Changed 3 years ago by cehoyos

If you cannot compile it yourself, please request an ffmpeg binary with iconv support from wherever you downloaded the binary you are using (assuming iconv works on Windows, I don't know).

comment:16 in reply to: ↑ 15 Changed 3 years ago by Nick

Replying to cehoyos:

If you cannot compile it yourself, please request an ffmpeg binary with iconv support from wherever you downloaded the binary you are using (assuming iconv works on Windows, I don't know).

?
I downloaded this ffmpeg binary from the official download site:
http://ffmpeg.org/download.html --> FFmpeg Windows Builds --> http://ffmpeg.zeranoe.com/builds/

Can this only be requested individually, or can it become the default for all future "official" Windows binaries?

comment:17 Changed 3 years ago by cehoyos

I don't know, please ask in the Zeranoe forum.

comment:18 Changed 3 years ago by Nick

I asked on zeranoe.com for a build of ffmpeg with --enable-iconv to support the -sub_charenc option.
A new Windows build of ffmpeg built with --enable-iconv is now available:
http://ffmpeg.zeranoe.com/builds/win32/static/ffmpeg-20130406-git-7775992-win32-static.7z
http://ffmpeg.zeranoe.com/builds/readme/win32/static/ffmpeg-20130406-git-7775992-win32-static-readme.txt
Thanks to zeranoe.com!

I tested this version and now -sub_charenc is working (more or less)!
The (first) characters are converted correctly from Windows ANSI to UTF-8, but now I have found another problem: if the -sub_charenc option is used to convert an srt file, the last character of every subtitle paragraph is missing or not converted correctly! (see screenshot)
It seems to be a problem in the "subtitles character encoding conversion" function!
The problem occurs regardless of whether the last character is an "ordinary" character or a special character. It also occurs regardless of whether the imported srt file is a Windows/DOS or a Unix text file.

Could somebody please check this problem using the "-sub_charenc" option with a Linux build of ffmpeg and/or avcodec created with --enable-iconv?

Command line:

ffmpeg -i input.mp4 -sub_charenc CP1252 -i subtitle_test_2.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 srt -metadata:s:s:0 language=ger ffmpeg_srt-subtitle_2.mkv

(test files see attachment)

Changed 3 years ago by Nick

comment:19 Changed 3 years ago by Cigaes

I confirm the problem on the last char. A patch has been sent to the devel mailing-list.

comment:20 Changed 3 years ago by Nick

Thanks, I am looking forward to the fix!
A short note in this ticket once the patch has been applied would be nice.

comment:21 follow-up: Changed 3 years ago by Cigaes

The problem with the last character is fixed.

Detecting and reporting invalid UTF-8 is still undecided.
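
Editorial sketch: the detection discussed here amounts to checking whether the subtitle bytes decode as UTF-8. The Python below is a minimal illustration of that check, not the approach chosen in libavcodec; the function name `first_invalid_utf8_offset` is hypothetical.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    None means the whole buffer is valid UTF-8 (plain ASCII included).
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


if __name__ == "__main__":
    # Hypothetical file name, matching the one used in this ticket.
    with open("subtitle_test.srt", "rb") as handle:
        offset = first_invalid_utf8_offset(handle.read())
    if offset is None:
        print("file is valid UTF-8")
    else:
        print(f"invalid UTF-8 at byte offset {offset}; consider -sub_charenc")
```

A Windows ANSI file containing characters such as ü or ß fails this check, because their single-byte codes are not valid UTF-8 sequences.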

comment:22 in reply to: ↑ 21 Changed 3 years ago by Nick

Replying to Cigaes:

The problem with the last character is fixed.

Detecting and reporting invalid UTF-8 is still undecided.

Yes, I understand, but it is more important to have a version in which -sub_charenc works without that bug. ffmpeg can now be used to import and encode subtitle files.

For information only: In which source code file was this bug fixed?

Changed 3 years ago by Nick

comment:23 follow-up: Changed 3 years ago by Nick

The subtitle character encoding conversion now works correctly with the bug fix for the "last char" problem. I tested the latest Windows build of FFmpeg and "-sub_charenc" now works as expected! (without the UTF-8 auto detection!)

Windows build version tested:
http://ffmpeg.zeranoe.com/builds/win32/static/ffmpeg-20130408-git-9dc88ac-win32-static.7z


Short summary for other users:

If you want to import a plaintext subtitle file including special characters with ffmpeg/avcodec, then...

  • ffmpeg/avcodec must be built with "--enable-iconv"
  • the option "-sub_charenc [code page]" is needed to enable character encoding conversion for subtitle source files
  • libavcodec should be version 55.2.100 (8th April 2013) or newer

(e.g. import of Windows ANSI *.srt/*.ass/*.ssa subtitle files with German, French or other special characters)

Command line examples

Import a subtitle file (copy video/audio streams, without re-encoding):

ffmpeg -i input.mp4 -sub_charenc ISO8859-1 -i subtitle.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 mov_text -metadata:s:s:0 language=ger output.mp4

ffmpeg -i input.mkv -sub_charenc ISO-8859-1 -i subtitle.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 srt -metadata:s:s:0 language=fre output.mkv

Import subtitle and re-encode video/audio streams:

ffmpeg -i input.avi -sub_charenc CP1252 -i subtitle.srt -vcodec h264 -acodec ac3 -scodec ass -metadata:s:s:0 language=ita output2.mkv

ffmpeg -i input.mov -sub_charenc WINDOWS-1252 -i subtitle.srt -vcodec mpeg4 -acodec mp3 -scodec mov_text -metadata:s:s:0 language=spa output2.mp4

(for the code page parameters accepted by option -sub_charenc, see the attachment)
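
Editorial sketch: for scripted use, the stream-copy examples above can be assembled as an argument list. The Python below only mirrors the options already shown in those command lines; the function name `build_subtitle_import_command` and its parameters are hypothetical.

```python
from typing import List


def build_subtitle_import_command(
    video: str,
    subtitle: str,
    output: str,
    charenc: str,
    subtitle_codec: str,
    language: str,
) -> List[str]:
    """Build an ffmpeg argument list that copies video/audio and adds one subtitle stream."""
    return [
        "ffmpeg",
        "-i", video,
        # -sub_charenc is an input option, so it must precede the subtitle input.
        "-sub_charenc", charenc,
        "-i", subtitle,
        "-map", "0:v", "-map", "0:a", "-c", "copy",
        "-map", "1", "-c:s:0", subtitle_codec,
        "-metadata:s:s:0", f"language={language}",
        output,
    ]


if __name__ == "__main__":
    command = build_subtitle_import_command(
        "input.mp4", "subtitle.srt", "output.mp4", "CP1252", "mov_text", "ger"
    )
    print(" ".join(command))
```

The list can be passed to `subprocess.run(command, check=True)` once ffmpeg is on the PATH.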

Additional note:
Various Philips stand-alone Blu-ray players and Samsung TVs can read "SRT" subtitle streams in "mkv" files.

comment:24 in reply to: ↑ 23 ; follow-up: Changed 3 years ago by cehoyos

Replying to Nick:

  • ffmpeg/avcodec must be built with "--enable-iconv"

This is not correct.

comment:25 in reply to: ↑ 24 ; follow-up: Changed 3 years ago by Nick

Replying to cehoyos:

Replying to Nick:

  • ffmpeg/avcodec must be built with "--enable-iconv"

This is not correct.

Are you sure?
I mean the import of plaintext subtitle files using the subtitles character encoding conversion (-sub_charenc).

Looking in source "...\ffmpeg\libavcodec\utils.c":

#if CONFIG_ICONV
                    iconv_t cd = iconv_open("UTF-8", avctx->sub_charenc);
                    if (cd == (iconv_t)-1) {
                        av_log(avctx, AV_LOG_ERROR, "Unable to open iconv context "
                               "with input character encoding \"%s\"\n", avctx->sub_charenc);
                        ret = AVERROR(errno);
                        goto free_and_end;
                    }
                    iconv_close(cd);
#else
                    av_log(avctx, AV_LOG_ERROR, "Character encoding subtitles "
                           "conversion needs a libavcodec built with iconv support "
                           "for this codec\n");
                    ret = AVERROR(ENOSYS);
                    goto free_and_end;
#endif

... and also in function recode_subtitle()

"conversion needs a libavcodec built with iconv support for this codec\n"
and
where are iconv_open() and iconv_close() defined?

(see also error message in comment:14)

comment:26 in reply to: ↑ 25 Changed 3 years ago by cehoyos

Replying to Nick:

Replying to cehoyos:

Replying to Nick:

  • ffmpeg/avcodec must be built with "--enable-iconv"

This is not correct.

Are you sure?

Yes.
iconv is a system library that gets included if present, --enable-iconv is the default and therefore does not have to be specified (it makes no difference).

comment:27 Changed 3 years ago by cehoyos

  • Resolution set to fixed
  • Status changed from new to closed

Fixed by Nicolas George.
