Opened 7 years ago

Closed 5 years ago

Last modified 20 months ago

#6021 closed defect (fixed)

tx3g / mov_text subtitles are not encoded correctly in some specific cases

Reported by: erikbs Owned by:
Priority: normal Component: avcodec
Version: git-master Keywords: utf8 mov_text
Cc: erikbs, rwijnsma@xs4all.nl Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Consider the following command:

ffmpeg -i input.mp4 -i input.srt -c:a copy -c:v copy -c:s mov_text output.mp4

for converting SRT subtitles into 3GPP timed text (TTXT) and embedding them inside an MPEG4 container. Until recently, ffmpeg ignored any formatting/styling in the SRT file and just converted the raw text and timestamps instead. That produced files that had no problems.

In SRT files, the start and end points of the text to be formatted are determined by tags, e.g. <i> and </i>. In TTXT subtitles, the start and end points are instead saved as numbers. Currently ffmpeg measures these values in bytes, but it looks like they should be measured in characters instead. For example, the Chinese character 我 consists of three bytes, but is considered a single character.

The problem arises when I try to convert an SRT where a line contains multibyte characters and a formatted string, and there are less than X characters between the formatted string and the end of the line, where X is the difference between the length of the line in bytes and the length in characters. Take for example this SRT line:

1
00:00:01,000 --> 00:00:02,000
The character 我 consists of three bytes
<i>this string will cause problems</i>

Measured in characters, the formatted string starts at position 40 and ends at position 71 (i.e. the character at position 71 is not part of the string). Measured in bytes (excluding the tags of course), the string starts at position 42 and ends at position 73, which are the values ffmpeg stores inside the output file. When I open this file in QuickTime Player on Mac OS X, it seems to expect that these numbers be measured in characters. Since 我 is counted as one character, it proceeds to read off the end of the line (i.e. past the 73rd byte), resulting in an instant, brutal crash. VLC, which appears to handle errors better, either tries to correct the error or just ignores the formatting when invalid data is found.

I used MP4Box to convert the SRT to TTXT and to extract the TTXT from the MP4 generated by ffmpeg using

mp4box -ttxt input.srt # convert SRT to TTXT
mp4box -ttxt 3 output.mp4 # extract the third stream (subtitles)

When I compared the output files, it immediately became clear that MP4Box counts characters while ffmpeg counts bytes. During testing I was able to confirm that VLC counts in the same way as MP4Box and QuickTime: in characters, not bytes. It should also be mentioned that the standalone TTXT files, which are XML files, contain the properties fromChar and toChar, further indicating that we should count characters and not bytes. When stored inside an MPEG4 container, the TTXT files are “compressed” into some binary format I do not fully understand instead of using XML style tags. By replacing the correct bytes in the file produced by ffmpeg (byte count --> character count) using a hex editor, the file played correctly in VLC and QuickTime (with the correct letters italicized), and I also got MP4Box to extract an SRT that looked correct (without the hex editing, the SRT generated by MP4Box from the file produced by ffmpeg had the tags in wrong place).

The erroneous data seems to be written by the function encode_styl in the file libavcodec/movtextenc.c, at line 108-109 to be precise. Here the raw byte positions are written to the files. These are passed to the function through an MovTextContext struct, which has a member called style_attributes – an array where each element corresponds to a formatted string. Each element in this array is another struct, having members such as style_start and style_end. At the moment I have no idea where these values are produced, but the encode_styl function writes them to the file.

To correct the bug, the code that counts bytes and writes these values to the struct that is eventually passed to encode_styl should be corrected, so that it counts characters instead. I guess ffmpeg already has code for counting utf8 characters, but if it has not, then this article presents various ways of doing it.

Change History (20)

comment:1 by erikbs, 7 years ago

Version: 3.2.1git-master

comment:2 by erikbs, 7 years ago

Okay, so I took a closer look at the TTXT format documenation, and it clearly states that text lengths are counted in characters (not bytes). (Note that this document only touches the XML form. In binary form, the length of the different blocks (text, style etc.) appear to be counted in bytes.)

I also played around with the code, and think I have found a solution:

  1. In the MovTextContext struct, introduce a new member variable for the text position as characters (without replacing the existing text_pos, which is measured in bytes and needed for storing the length of the blocks)
  2. Increase the new variable by counting UTF8 characters instead of bytes (according to the format documentation, only UTF8 is supported anyway). A simple for+if will do (loop through all bytes and only count those where the two most significant bits are not 10)
  3. Replace all instances of text_pos with the new variable in code regarding style.

My modified code produces files that play correctly in QuickTime and contain subtitle streams equivalent to what MP4Box produces (MP4Box also sets height/width, ffmpeg does not, but this does not seem to matter). I have tested it on subtitles with nested styles too. My modifications may not be the prettiest, but the code works as far as I can see. I have not touched movtextdec.c, so when extracting TTXT subtitles from an MP4, ffmpeg generates SRT files with misplaced style tags.

For movtextenc.c I have created a patch. Any help with submitting it is highly appreciated. Me + git = disaster, and I do not even know if it is in the correct format (I used diff -u file1 file2):

--- movtextenc.c	2016-02-26 21:15:02.000000000 +0000
+++ movtextenc.c	2016-12-15 02:29:03.000000000 +0000
@@ -70,6 +70,7 @@
     uint8_t style_fontsize;
     uint32_t style_color;
     uint16_t text_pos;
+    uint16_t text_pos_char;
 } MovTextContext;
 
 typedef struct {
@@ -216,10 +217,10 @@
             }
 
             s->style_attributes_temp->style_flag = 0;
-            s->style_attributes_temp->style_start = AV_RB16(&s->text_pos);
+            s->style_attributes_temp->style_start = AV_RB16(&s->text_pos_char);
         } else {
             if (s->style_attributes_temp->style_flag) { //break the style record here and start a new one
-                s->style_attributes_temp->style_end = AV_RB16(&s->text_pos);
+                s->style_attributes_temp->style_end = AV_RB16(&s->text_pos_char);
                 av_dynarray_add(&s->style_attributes, &s->count, s->style_attributes_temp);
                 s->style_attributes_temp = av_malloc(sizeof(*s->style_attributes_temp));
                 if (!s->style_attributes_temp) {
@@ -230,10 +231,10 @@
                 }
 
                 s->style_attributes_temp->style_flag = s->style_attributes[s->count - 1]->style_flag;
-                s->style_attributes_temp->style_start = AV_RB16(&s->text_pos);
+                s->style_attributes_temp->style_start = AV_RB16(&s->text_pos_char);
             } else {
                 s->style_attributes_temp->style_flag = 0;
-                s->style_attributes_temp->style_start = AV_RB16(&s->text_pos);
+                s->style_attributes_temp->style_start = AV_RB16(&s->text_pos_char);
             }
         }
         switch (style){
@@ -248,7 +249,7 @@
             break;
         }
     } else {
-        s->style_attributes_temp->style_end = AV_RB16(&s->text_pos);
+        s->style_attributes_temp->style_end = AV_RB16(&s->text_pos_char);
         av_dynarray_add(&s->style_attributes, &s->count, s->style_attributes_temp);
 
         s->style_attributes_temp = av_malloc(sizeof(*s->style_attributes_temp));
@@ -273,7 +274,7 @@
             break;
         }
         if (s->style_attributes_temp->style_flag) { //start of new style record
-            s->style_attributes_temp->style_start = AV_RB16(&s->text_pos);
+            s->style_attributes_temp->style_start = AV_RB16(&s->text_pos_char);
         }
     }
     s->box_flags |= STYL_BOX;
@@ -284,11 +285,11 @@
     MovTextContext *s = priv;
     if (color_id == 2) {    //secondary color changes
         if (s->box_flags & HLIT_BOX) {  //close tag
-            s->hlit.end = AV_RB16(&s->text_pos);
+            s->hlit.end = AV_RB16(&s->text_pos_char);
         } else {
             s->box_flags |= HCLR_BOX;
             s->box_flags |= HLIT_BOX;
-            s->hlit.start = AV_RB16(&s->text_pos);
+            s->hlit.start = AV_RB16(&s->text_pos_char);
             s->hclr.color = color | (0xFF << 24);  //set alpha value to FF
         }
     }
@@ -302,7 +303,10 @@
 {
     MovTextContext *s = priv;
     av_bprint_append_data(&s->buffer, text, len);
-    s->text_pos += len;
+    s->text_pos += len; // length of text in bytes
+    for (int i = 0; i < len; i++) // length of text in characters
+        if ((text[i] & 0xC0) != 0x80)
+            s->text_pos_char ++;
 }
 
 static void mov_text_new_line_cb(void *priv, int forced)
@@ -310,6 +314,7 @@
     MovTextContext *s = priv;
     av_bprint_append_data(&s->buffer, "\n", 1);
     s->text_pos += 1;
+    s->text_pos_char += 1;
 }
 
 static const ASSCodesCallbacks mov_text_callbacks = {
@@ -328,6 +333,7 @@
     size_t j;
 
     s->text_pos = 0;
+    s->text_pos_char = 0;
     s->count = 0;
     s->box_flags = 0;
     s->style_entries = 0;

comment:3 by Carl Eugen Hoyos, 7 years ago

Please commit your patch locally with git commit (mention the ticket number please), produce a patch file with git format-patch and send the patch file to the development mailing list. You can undo your local commit with git reset HEAD^ then.

comment:4 by Carl Eugen Hoyos, 7 years ago

Keywords: ttxt tx3g subtitles mp4 removed

in reply to:  3 comment:5 by erikbs, 7 years ago

Thanks, the patch has now been added to the list.

Two things remain:

  1. Fixing the decoder as well
  2. UTF-16, as is required by the format specification (only Big Endian UTF-16 is required, UTF-16 is optional)

I will take a look at the first one, but probably not the second, at least not now. According to the specification, UTF-8 is to be assumed unless there is a UTF-16 BOM, and I am afraid I will break something if I try to solve this without knowing the code. Besides, I have never encountered UTF-16 encoded subtitles and UTF-16 support is broken both with and without my patch (in different ways).

PS. Ticket #5429 is about the same problem. I did not realize that my ticket was a duplicate until after I had submitted it.

Last edited 7 years ago by erikbs (previous) (diff)

comment:6 by erikbs, 7 years ago

New patch submitted for correctly decoding styles when multibyte UTF-8 characters are involved.

About UTF-16:
Given that the byte length, uint64_t L, of the string, char *text, is known, the number of UTF-16 characters in the string can be calculated as follows:

uint64_t utf16_char_len(const char *text, uint64_t L) {
    uint64_t l = 0;
    uint16_t c = 0, start = 0;
    uint16_t m[2] = {0xFC00, 0xDC00}; // Bit masks
    
    if (L >= 2) c = ((uint16_t)text[0] << 8) + (uint8_t)text[1];
    switch (c) {
        case 0xFFFE:    // Little Endian, swap mask byte order
            m[0] = 0x00FC; m[1] = 0x00DC;
        case 0xFEFF:
            start = 2;  // Skip the BOM
        default:
            for (uint64_t i = start; i < L; i += 2)
                if (((((uint16_t)text[i] << 8) |
                    (uint8_t )text[i + 1] ) & m[0]) != m[1]) l++;
    }
    return l;
}

This code expects to be fed valid UTF-16 data and assumes Big Endian when no BOM is present.

The format specification only requires Big Endian support, but it demands that the BOM be present for UTF-16. Exactly how this is supposed to be encoded I don’t know.

comment:7 by Carl Eugen Hoyos, 7 years ago

Cc: erikbs added

Sorry if I misunderstand but I believe FFmpeg cannot output UTF-16 subtitles.
In any case, UTF-16 should not be part of your patch fixing UTF-8.

comment:8 by erikbs, 7 years ago

Ah, I see. That explains the error messages I received when trying to reencode a UTF-16 SRT into another UTF-16 SRT.

UTF-16 is not part of the two patches I have submitted for UTF-8.

comment:9 by CoRoNe, 7 years ago

When can I expect this patch to be integrated?
I'm asking because I'm having a somewhat similar situation.

Recently I found out that my Samsung Smart TV (2014) doesn't like tx3g subtitle-streams generated by FFMpeg. Whenever I mux subtitles as tx3g with -c:s mov_text, Media Player Classic - HC on my pc never has any problems rendering them, but my tv does. Each time I start such a video generated by FFMpeg, lines only randomly appear.
If I generate the same video with MP4Box, then all lines appear normally on my tv.
Before I encountered this issue I've tried -fix_sub_duration and several -movflags-options, but all to no avail.

I'd like to see if erikbs's suggested patch could also fix my issue.

comment:10 by CoRoNe, 7 years ago

I'm a complete novice when it comes to compiling stuff from source, but in the end, with the help of ffmpeg-windows-build-helpers I've tested your suggested patch, erikbs, but it does nothing for my issue, because my Samsung SMART TV still randomly shows the subtitle lines. Again, the same subtitles muxed with MP4Box and all lines are shown.
I guess I'll mux to Matroska for the time being.

comment:11 by erikbs, 7 years ago

There are several attributes that ffmpeg does not care about when writing mov_text data. You can use MP4Box to dump the subtitles stream to XML format using the following command:

mp4box -ttxt 3 FILENAME.mp4

As you can see, attributes such as width and height are all zero in XML files dumped from ffmpeg generated MP4s. The MP4s containing subtitles converted from SRT by MP4Box have non-zero values in these fields instead, as is evident when dumping the subtitles stream to XML/TTXT. It is possible that the missing values are causing your problem. It is also possible that the patch was not applied correctly (the one I posted above is in a format that is slightly different from the one in which files are supposed to be submitted to Patchwork). Feel free to upload a sample video that is affected by the problem. You can create a 30 second sample from a longer video using this command:

ffmpeg -i INPUT_FILE.mp4 -c copy -c:s copy -ss HH:MM:SS -t 00:00:30 OUTPUT_FILE.mp4

where HH:MM:SS is the start time (hours, minutes, seconds).

It could of course also be that my patch is onto something, but is incomplete. Thus other relevant questions are:

  1. Are the vanishing subtitle lines styled? (look for <b>, <i> or <font in the SRT)
  2. Do the lines contain special characters? Multibyte characters? Characters missing from the TV’s fonts? Line breaks?
  3. Is there anything special about the lines that are shown? (Special characters, styling, length, …)
  4. Do the subtitles vanish randomly or are the same lines displayed/hidden every time?

comment:12 by erikbs, 7 years ago

I have submitted new patches by the way:

https://patchwork.ffmpeg.org/patch/2795/ (encoder)

https://patchwork.ffmpeg.org/patch/2796/ (decoder)

These hopefully meet all the requirements (no reindentation etc.)

comment:13 by CoRoNe, 7 years ago

I only saw your posts just now, because I didn't get email-notification.

It is also possible that the patch was not applied correctly (the one I posted above is in a format that is slightly different from the one in which files are supposed to be submitted to Patchwork).

Last time I didn't actually apply your patch the official way. I just manually edited movtextenc.c. I've also done this now, because of line-numbers in your patch not being up-to-date anymore.

Feel free to upload a sample video that is affected by the problem.

See http://rwijnsma.home.xs4all.nl/files/test/

Thus other relevant questions are: ...

Everything is just plain text. The subtitles vanish randomly.

I've successfully compiled FFMpeg with your changes, but after creating another test-video ('VPWON_1269922(FFMpeg-mux_t32)patch.mp4') my TV didn't display subtitles at all anymore.
No problem with MPC-HC on my pc.

commandline used:

ffmpeg.exe -i "full-episode" -i "VPWON_1269922.srt" -t 32 -c copy -c:s mov_text
-metadata:s:s language=dut "VPWON_1269922(FFMpeg-mux_t32)patch.mp4"

comment:14 by erikbs, 5 years ago

Resolution: fixed
Status: newclosed

Closing the ticket, since I no longer experience this bug. I have looked at the code and the solution is more complicated than necessary, but it appears to do roughly the same as I suggested.

comment:15 by Carl Eugen Hoyos, 5 years ago

Fixed by Philip Langdale in af043b839c38e850af1184fd6be691f8475c048e

comment:16 by erikbs, 5 years ago

Well, better late than never. Given that it took too full years to fix the problem even though I provided a detailed explanation and submitted a patch 4-5 times (including here), I do not think I will bother trying again.

comment:17 by CoRoNe, 5 years ago

Cc: rwijnsma@xs4all.nl added

First time I've been looking at this again in 2 years time. I believe my issue is different than erikbs', because this fix didn't do anything for me.
Whenever I mux subtitles it will always be to a matroska container, because muxing mov_text in a mp4 container was problematic and after testing today I can say it still is.
Ran above command again with a (obviously) newer FFmpeg binary, but now my Samsung Smart TV (2014) doesn't show/render any subtitles at all. My tv has no problems at all with the video generated by MP4Box and shows all subtitle lines.
Could it be that FFmpeg doesn't fully adhere to the ISO/IEC 14496-17 standard? There has to be a reason why my tv doesn't like the FFmpeg generated MP4[tx3g] files afterall.

in reply to:  17 comment:18 by erikbs, 5 years ago

Replying to CoRoNe:

First time I've been looking at this again in 2 years time. I believe my issue is different than erikbs', because this fix didn't do anything for me.
Whenever I mux subtitles it will always be to a matroska container, because muxing mov_text in a mp4 container was problematic and after testing today I can say it still is.
Ran above command again with a (obviously) newer FFmpeg binary, but now my Samsung Smart TV (2014) doesn't show/render any subtitles at all. My tv has no problems at all with the video generated by MP4Box and shows all subtitle lines.
Could it be that FFmpeg doesn't fully adhere to the ISO/IEC 14496-17 standard? There has to be a reason why my tv doesn't like the FFmpeg generated MP4[tx3g] files afterall.

Absolutely possible (actually I am even convinced that it does not, considering how it was overlooked that character and byte count are two different things). Through my investigation, I discovered that MP4Box writes a subtitle box size while ffmpeg does not (or, more precisely, it apparently sets the box size to 0x0). Perhaps that is why the subtitles are not shown on the TV?

Another thing to try is to encode the subtitles into an MP4 file with ffmpeg as usual, then use MP4Box to demux and remux the tx3g subtitles generated by ffmpeg (mp4box -ttxt 3 FILENAME.mp4 etc.). If that works, then ffmpeg somehow messes up the muxing, not the encoding. If it still does not work, then there is something wrong with its tx3g encoder.

comment:19 by CoRoNe, 5 years ago

I discovered that MP4Box writes a subtitle box size while ffmpeg does not (or, more precisely, it apparently sets the box size to 0x0). Perhaps that is why the subtitles are not shown on the TV?

Do you have a patch for this by any chance? I'm not a hero with C/C++.

...demux and remux

This works. My tv renders the subtitles perfectly. So I'm guessing something is going wrong with the muxing process.
Comparing the 2 mp4-files in MediaInfo I noticed the one created by FFmpeg mentioning "Muxing mode: sbtl" and "Alternate group: 3", where as the one created by MP4Box does not.

And for everyone's info:

--- VPWON_1269922-t32_MP4Box.ttxt       2019-04-05 23:33:21.500000000 +0200
+++ VPWON_1269922-t32_FFmpeg_demux-MP4Box.ttxt  2019-04-05 23:16:37.475016000 +0200
@@ -1,12 +1,12 @@
 <?xml version="1.0" encoding="UTF-8" ?>
 <!-- GPAC 3GPP Text Stream -->
 <TextStream version="1.1">
-<TextStreamHeader width="400" height="60" layer="0" translation_x="0" translation_y="0">
+<TextStreamHeader width="0" height="0" layer="0" translation_x="0" translation_y="0">
 <TextSampleDescription horizontalJustification="center" verticalJustification="bottom" backColor="0
 0 0 0" verticalText="no" fillTextRegion="no" continuousKaraoke="no" scroll="None">
 <FontTable>
 <FontTableEntry fontName="Serif" fontID="1"/>
 </FontTable>
-<TextBox top="0" left="0" bottom="60" right="400"/>
+<TextBox top="0" left="0" bottom="0" right="0"/>
 <Style styles="Normal" fontID="1" fontSize="18" color="ff ff ff ff"/>
 </TextSampleDescription>
 </TextStreamHeader>

This appears to be the only output difference between MP4Box's and FFmpeg's tx3g encoder.
'VPWON_1269922-t32_MP4Box.ttxt': mp4box -ttxt "VPWON_1269922-t32.srt"
'VPWON_1269922-t32_FFmpeg_demux-MP4Box.ttxt': mp4box -ttxt 3 "VPWON_1269922(FFMpeg-mux_t32).mp4"

comment:20 by Balling, 20 months ago

Yep, my TV cannot play Muxing mode: sbtl files at all!!

Note: See TracTickets for help on using tickets.