Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#3393 closed defect (invalid)

Interlaced H.264 packets are split causing MP4 STTS

Reported by: wim_arbor Owned by:
Priority: normal Component: undetermined
Version: git-master Keywords: h264 mov
Cc: Blocked By:
Blocking: Reproduced by developer: yes
Analyzed by developer: no

Description

Summary of the bug:
When remuxing an MPEG-TS containing interlaced H.264 into MP4, the two fields of each video frame are split into separate packets. Software such as MediaInfo uses the STTS to determine the frame rate, so it reports 50 fps instead of 25 fps.

How to reproduce:

% ffmpeg.exe -i h264_aac_576i_tff.ts -c:a copy -c:v copy -bsf:a aac_adtstoasc -async 1 h264_aac_576i_tff.mp4

ffmpeg version N-60700-g07b4b0c Copyright (c) 2000-2014 the FFmpeg developers
  built on Feb 17 2014 15:45:12 with gcc 4.8.2 (GCC)
  configuration: --pkg-config=pkg-config --prefix=/home/arbor/software/packages/win32 --enable-memalign-hack --arch=x86 --target-os=mingw32 --cross-prefix=i686-w64-mingw32- --enable-libfaac --enable-libx264 --enable-gpl --enable-nonfree --disable-w32threads
  libavutil      52. 64.100 / 52. 64.100
  libavcodec     55. 52.102 / 55. 52.102
  libavformat    55. 33.100 / 55. 33.100
  libavdevice    55. 10.100 / 55. 10.100
  libavfilter     4.  1.102 /  4.  1.102
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
  libpostproc    52.  3.100 / 52.  3.100
Input #0, mpegts, from 'h264_aac_576i_tff.ts':
  Duration: 00:00:02.80, start: 600.000000, bitrate: 9671 kb/s
  Program 1
    Stream #0:0[0x1011]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt470bg), 720x576 [SAR 16:15 DAR 4:3], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x1100]: Audio: aac ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 127 kb/s
File 'h264_aac_576i_tff.mp4' already exists. Overwrite ? [y/N] y
Output #0, mp4, to 'h264_aac_576i_tff.mp4':
  Metadata:
    encoder         : Lavf55.33.100
    Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 720x576 [SAR 16:15 DAR 4:3], q=2-31, 25 fps, 90k tbn, 90k tbc
    Stream #0:1: Audio: aac ([64][0][0][0] / 0x0040), 48000 Hz, stereo, 127 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[mp4 @ 03820060] pts has no value
    Last message repeated 68 times
[mp4 @ 03820060] pts has no value
    Last message repeated 1 times
frame=  142 fps=0.0 q=-1.0 Lsize=    3205kB time=00:00:02.78 bitrate=9444.8kbits/s
video:3159kB audio:43kB subtitle:0 data:0 global headers:0kB muxing overhead 0.096098%

h264_aac_576i_tff.ts has been uploaded to /incoming on upload.ffmpeg.org

Attachments (1)

0003-avcodec-h264-merge-fields-fix-duration.patch (677 bytes) - added by wim_arbor 5 years ago.
fixes duration of output file after previous patches have been applied

Change History (11)

comment:1 Changed 5 years ago by wim_arbor

Output of ffmpeg.exe -v 9 -loglevel 99 -i h264_aac_576i_tff.ts

ffmpeg version N-60700-g07b4b0c Copyright (c) 2000-2014 the FFmpeg developers
  built on Feb 17 2014 15:45:12 with gcc 4.8.2 (GCC)
  configuration: --pkg-config=pkg-config --prefix=/home/arbor/software/packages/win32 --enable-memalign-hack --arch=x86 --target-os=mingw32 --cross-prefix=i686-w64-mingw32- --enable-libfaac --enable-libx264 --enable-gpl --enable-nonfree --disable-w32threads
  libavutil      52. 64.100 / 52. 64.100
  libavcodec     55. 52.102 / 55. 52.102
  libavformat    55. 33.100 / 55. 33.100
  libavdevice    55. 10.100 / 55. 10.100
  libavfilter     4.  1.102 /  4.  1.102
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
  libpostproc    52.  3.100 / 52.  3.100
Splitting the commandline.
Reading option '-v' ... matched as option 'v' (set logging level) with argument '9'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument '99'.
Reading option '-i' ... matched as input file with argument 'h264_aac_576i_tff.ts'.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option v (set logging level) with argument 9.
Successfully parsed a group of options.
Parsing a group of options: input file h264_aac_576i_tff.ts.
Successfully parsed a group of options.
Opening an input file: h264_aac_576i_tff.ts.
[mpegts @ 035e5f40] Format mpegts probed with size=2048 and score=100
[mpegts @ 035e5f40] stream=0 stream_type=1b pid=1011 prog_reg_desc=HDMV
[mpegts @ 035e5f40] stream=1 stream_type=f pid=1100 prog_reg_desc=HDMV
[mpegts @ 035e5f40] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0
[mpegts @ 035e5f40] All programs have pmt, headers found
[h264 @ 003da520] no picture
[mpegts @ 035e5f40] All info found
[mpegts @ 035e5f40] After avformat_find_stream_info() pos: 0 bytes read:1036432 seeks:2 frames:81
Input #0, mpegts, from 'h264_aac_576i_tff.ts':
  Duration: 00:00:02.80, start: 600.000000, bitrate: 9671 kb/s
  Program 1
    Stream #0:0[0x1011], 43, 1/90000: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt470bg), 720x576 [SAR 16:15 DAR 4:3], 1/50, 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x1100], 38, 1/90000: Audio: aac ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 127 kb/s
Successfully opened the file.
At least one output file must be specified
[AVIOContext @ 035ee5a0] Statistics: 1036432 bytes read, 2 seeks

comment:2 Changed 5 years ago by wim_arbor

The output of mp4dump below shows:

moov/trak/mdia/mdhd

    timescale = 90000
    duration = 255600
    duration(ms) = 2840

moov/trak/mdia/minf/stbl/stts

    entry_count = 1
    entry        0 = sample_count=142, sample_duration=1800

Framerate calculation: 90000/1800 = 50 fps
or: 2840/142 = 20 ms, 1000/20 = 50 fps
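
As a rough illustration of the calculation above (a sketch added for clarity, not code taken from MediaInfo or any other tool), the average sample rate implied by an stts table can be computed like this. The function name `stts_frame_rate` is made up for this example; the input values are the ones from the mp4dump output.

```python
from fractions import Fraction

def stts_frame_rate(timescale, stts_entries):
    """Average sample rate implied by an stts table.

    stts_entries is a list of (sample_count, sample_duration) pairs,
    with durations expressed in units of 1/timescale seconds.
    """
    total_samples = sum(count for count, _ in stts_entries)
    total_ticks = sum(count * duration for count, duration in stts_entries)
    return Fraction(total_samples * timescale, total_ticks)

# Values from the mp4dump output: timescale 90000, 142 samples of 1800 ticks.
rate = stts_frame_rate(90000, [(142, 1800)])
print(float(rate))  # 50.0
```

Any tool that equates "samples per second" with "frames per second" ends up with 50 for this file, because each sample holds a single field.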

Output of mp4dump --verbosity 1 h264_aac_576i_tff.mp4
Note: irrelevant parts have been replaced with (...)

[ftyp] size=8+24
  major_brand = isom
  minor_version = 200
  compatible_brand = isom
  compatible_brand = iso2
  compatible_brand = avc1
  compatible_brand = mp41
[free] size=8+0
[mdat] size=8+3278041
[moov] size=8+3991
  [mvhd] size=12+96
    timescale = 1000
    duration = 2840
    duration(ms) = 2840
  [trak] size=8+2261
    [tkhd] size=12+80, flags=3
      enabled = 1
      id = 1
      duration = 2840
      width = 768.000000
      height = 576.000000
    [edts] size=8+28
      [elst] size=12+16
        entry count = 1
        entry/segment duration = 2840
        entry/media time = 3600
        entry/media rate = 1
    [mdia] size=8+2125
      [mdhd] size=12+20
        timescale = 90000
        duration = 255600
        duration(ms) = 2840
        language = und
      [hdlr] size=12+33
        handler_type = vide
        handler_name = VideoHandler
      [minf] size=8+2040
        [vmhd] size=12+8, flags=1
          graphics_mode = 0
          op_color = 0000,0000,0000
        [dinf] size=8+28
          [dref] size=12+16
            [url ] size=12+0, flags=1
              location = [local to file]
        [stbl] size=8+1976
          [stsd] size=12+176
            entry-count = 1
            [avc1] size=8+164
              data_reference_index = 1
              width = 720
              height = 576
              compressor =
              [avcC] size=8+62
                Configuration Version = 1
                Profile = High
                Profile Compatibility = 0
                Level = 41
                NALU Length Size = 4
                Sequence Parameter = [67 64 00 29 ac 2c a4 02 d0 91 7f e0 02 00 01 e9 41 41 41 50 00 00 03 00 10 00 00 03 03 2e 4a 00 02 71 00 00 07 a1 27 f1 8e 0e d0 a1 48 90]
                Picture Parameter = [68 e9 8d 35 25]
              [pasp] size=8+8
          [stts] size=12+12
            entry_count = 1
            entry        0 = sample_count=142, sample_duration=1800
          [stss] size=12+20
            entry_count = 4
          [ctts] size=12+356
            entry_count = 44
          [stsc] size=12+232
            entry_count = 19
            entry        0 = first_chunk=1, first_sample*=1, chunk_count*=1, samples_per_chunk=3, sample_desc_index=1
(...)
            entry       18 = first_chunk=129, first_sample*=139, chunk_count*=0, samples_per_chunk=4, sample_desc_index=1
          [stsz] size=12+576
            sample_size = 0
            sample_count = 142
          [stco] size=12+520
            entry_count = 129
            entry        0 = 48
(...)
            entry      128 = 3175684
  [trak] size=8+1508
(...)
  [udta] size=8+90
    [meta] size=12+78
      [hdlr] size=12+21
        handler_type = mdir
        handler_name =
      [ilst] size=8+37
        [.too] size=8+29
          [data] size=8+21
            type = 1
            lang = 0
            value = Lavf55.33.100
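
For anyone who wants to read the stts entries without mp4dump, the following is a minimal sketch of a box walker, assuming a plain (non-fragmented) MP4 that fits in memory. The helper names `iter_boxes` and `find_stts` are invented for this example, and only the container boxes on the path to stts are descended into.

```python
import struct

# Container boxes on the path moov/trak/mdia/minf/stbl.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}

def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the box type.
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:
            # Box extends to the end of the enclosing container.
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed box, stop walking
        yield box_type, pos + header, pos + size
        pos += size

def find_stts(data, start=0, end=None):
    """Yield one list of (sample_count, sample_delta) pairs per stts box."""
    for box_type, p0, p1 in iter_boxes(data, start, end):
        if box_type in CONTAINERS:
            yield from find_stts(data, p0, p1)
        elif box_type == b"stts":
            # FullBox header: 1 byte version + 3 bytes flags, then entry_count.
            (entry_count,) = struct.unpack_from(">I", data, p0 + 4)
            yield [struct.unpack_from(">II", data, p0 + 8 + 8 * i)
                   for i in range(entry_count)]
```

Reading the file into `data` with `open(path, "rb").read()` and calling `list(find_stts(data))` gives one list of entries per track, so the video track of the file above would be expected to contribute `[(142, 1800)]`.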

comment:3 Changed 5 years ago by wim_arbor

Note that after this patch to h264_parse() in libavcodec/h264_parser.c:

diff --git a/libavcodec/h264_parser.c b/libavcodec/h264_parser.c
index 4432871..564ae14 100644
--- a/libavcodec/h264_parser.c
+++ b/libavcodec/h264_parser.c
@@ -471,6 +471,7 @@ static int h264_parse(AVCodecParserContext *s,
         }
     }

+    s->flags |= PARSER_FLAG_COMPLETE_FRAMES;
     if (s->flags & PARSER_FLAG_COMPLETE_FRAMES) {
         next = buf_size;
     } else {

an MP4 is generated that is almost correct:

moov/trak/mdia/mdhd

    timescale = 90000
    duration = 253800
    duration(ms) = 2820

moov/trak/mdia/minf/stbl/stts

    entry_count = 2
    entry        0 = sample_count=70, sample_duration=3600
    entry        1 = sample_count=1, sample_duration=1800

So the first 70 samples are marked with the correct duration of 3600; only the last sample still has the wrong duration of 1800.

But this change is not a valid patch, as it causes problems elsewhere. The correct way is probably to fix this in h264_find_frame_end(), also in libavcodec/h264_parser.c. But that is far more complex.
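
The stts tables quoted in this ticket can be cross-checked against the reported mdhd durations. This is only an arithmetic sketch using the numbers from the mp4dump excerpts; the variable and function names are made up for the example.

```python
def stts_total_duration(stts_entries):
    """Sum of sample_count * sample_duration over all stts entries, in ticks."""
    return sum(count * duration for count, duration in stts_entries)

TIMESCALE = 90000
unpatched = [(142, 1800)]           # one sample per field
patched = [(70, 3600), (1, 1800)]   # one sample per frame, last one too short

for name, entries in (("unpatched", unpatched), ("patched", patched)):
    ticks = stts_total_duration(entries)
    print(name, ticks, ticks * 1000 // TIMESCALE, "ms")
# unpatched 255600 2840 ms
# patched 253800 2820 ms
```

The patched file comes out 1800 ticks (20 ms) shorter, which is exactly the amount by which the last sample's duration falls short of 3600.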

comment:4 Changed 5 years ago by cehoyos

  • Keywords h264 added
  • Reproduced by developer set
  • Status changed from new to open

Input sample contains 71 video frames:

$ ffmpeg -i h264_aac_576i_tff.ts -f null -
ffmpeg version N-60700-g07b4b0c Copyright (c) 2000-2014 the FFmpeg developers
  built on Feb 18 2014 09:15:18 with gcc 4.7 (SUSE Linux)
  configuration: --enable-gpl
  libavutil      52. 64.100 / 52. 64.100
  libavcodec     55. 52.102 / 55. 52.102
  libavformat    55. 33.100 / 55. 33.100
  libavdevice    55. 10.100 / 55. 10.100
  libavfilter     4.  1.102 /  4.  1.102
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
  libpostproc    52.  3.100 / 52.  3.100
Input #0, mpegts, from 'h264_aac_576i_tff.ts':
  Duration: 00:00:02.80, start: 600.000000, bitrate: 9671 kb/s
  Program 1
    Stream #0:0[0x1011]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt470bg), 720x576 [SAR 16:15 DAR 4:3], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x1100]: Audio: aac ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 127 kb/s
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf55.33.100
    Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p, 720x576 [SAR 16:15 DAR 4:3], q=2-31, 200 kb/s, 90k tbn, 25 tbc
    Stream #0:1: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (h264 -> rawvideo)
  Stream #0:1 -> #0:1 (aac -> pcm_s16le)
Press [q] to stop, [?] for help
[null @ 0x1980400] Encoder did not produce proper pts, making some up.
frame=   71 fps=0.0 q=0.0 Lsize=N/A time=00:00:02.84 bitrate=N/A
video:7kB audio:512kB subtitle:0 data:0 global headers:0kB muxing overhead -100.004142%

142 frames are remuxed:

$ ffmpeg -i h264_aac_576i_tff.ts -vcodec copy -strict -2 out.mov
ffmpeg version N-60700-g07b4b0c Copyright (c) 2000-2014 the FFmpeg developers
  built on Feb 18 2014 09:15:18 with gcc 4.7 (SUSE Linux)
  configuration: --enable-gpl
  libavutil      52. 64.100 / 52. 64.100
  libavcodec     55. 52.102 / 55. 52.102
  libavformat    55. 33.100 / 55. 33.100
  libavdevice    55. 10.100 / 55. 10.100
  libavfilter     4.  1.102 /  4.  1.102
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
  libpostproc    52.  3.100 / 52.  3.100
Input #0, mpegts, from 'h264_aac_576i_tff.ts':
  Duration: 00:00:02.80, start: 600.000000, bitrate: 9671 kb/s
  Program 1
    Stream #0:0[0x1011]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt470bg), 720x576 [SAR 16:15 DAR 4:3], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x1100]: Audio: aac ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 127 kb/s
Output #0, mov, to 'out.mov':
  Metadata:
    encoder         : Lavf55.33.100
    Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p, 720x576 [SAR 16:15 DAR 4:3], q=2-31, 25 fps, 90k tbn, 90k tbc
    Stream #0:1: Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (aac -> aac)
Press [q] to stop, [?] for help
[mov @ 0x29384c0] pts has no value
    Last message repeated 68 times
[mov @ 0x29384c0] pts has no value
    Last message repeated 1 times
frame=  142 fps=0.0 q=-1.0 Lsize=    3171kB time=00:00:02.78 bitrate=9345.6kbits/s
video:3159kB audio:8kB subtitle:0 data:0 global headers:0kB muxing overhead 0.129618%

The framerate is incorrect for out.mov:

$ ffmpeg -i out.mov -strict -2 out2.mov
ffmpeg version N-60700-g07b4b0c Copyright (c) 2000-2014 the FFmpeg developers
  built on Feb 18 2014 09:15:18 with gcc 4.7 (SUSE Linux)
  configuration: --enable-gpl
  libavutil      52. 64.100 / 52. 64.100
  libavcodec     55. 52.102 / 55. 52.102
  libavformat    55. 33.100 / 55. 33.100
  libavdevice    55. 10.100 / 55. 10.100
  libavfilter     4.  1.102 /  4.  1.102
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
  libpostproc    52.  3.100 / 52.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'out.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 512
    compatible_brands: qt
    encoder         : Lavf55.33.100
  Duration: 00:00:02.84, start: 0.014667, bitrate: 9148 kb/s
    Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt470bg), 720x576 [SAR 16:15 DAR 4:3], 9113 kb/s, 50 fps, 50 tbr, 90k tbn, 50 tbc (default)
    Metadata:
      handler_name    : DataHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 23 kb/s (default)
    Metadata:
      handler_name    : DataHandler
Output #0, mov, to 'out2.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 512
    compatible_brands: qt
    encoder         : Lavf55.33.100
    Stream #0:0(eng): Video: mpeg4 (mp4v / 0x7634706D), yuv420p, 720x576 [SAR 16:15 DAR 4:3], q=2-31, 200 kb/s, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : DataHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : DataHandler
Stream mapping:
  Stream #0:0 -> #0:0 (h264 -> mpeg4)
  Stream #0:1 -> #0:1 (aac -> aac)
Press [q] to stop, [?] for help
frame=  142 fps=0.0 q=2.0 Lsize=     142kB time=00:00:02.84 bitrate= 408.4kbits/s dup=71 drop=0
video:129kB audio:8kB subtitle:0 data:0 global headers:0kB muxing overhead 2.731602%

comment:5 Changed 5 years ago by cehoyos

  • Keywords mov added

comment:6 Changed 5 years ago by cehoyos

A patch can be found on the mailing list:
http://thread.gmane.org/gmane.comp.video.ffmpeg.devel/175525

Changed 5 years ago by wim_arbor

fixes duration of output file after previous patches have been applied

comment:7 follow-up: Changed 5 years ago by wim_arbor

What I understand from the discussion on the mailing list is that merging the fields into field pairs violates the ISO specification.

AVC sample: An AVC sample is an access unit as defined in ISO/IEC 14496‐10

access unit: A set of NAL units that are consecutive in decoding order and contain exactly one primary coded picture. (...) The decoding of an access unit always results in a decoded picture.

Each (PAFF) field is encoded as a separate picture, so a sample in an MP4 file may only contain a single field.

So software which uses the sample count in the MP4 file to determine the frame rate is simply wrong. This includes mediainfo, vlc, quicktime and gspot. The same applies to other encoders which generate such files. I tested sorenson squeeze with the intel and mainconcept encoder. Both merged fields.
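
To illustrate the point (a hypothetical sketch, not how any of the tools named here actually work): a reader that knows which samples are field pictures could count each of them as half a frame. The `is_field` flag is an assumption of this example; in practice it would have to be derived from the bitstream, for instance from `field_pic_flag` in the slice header, which is not shown here.

```python
from fractions import Fraction

def frame_rate_from_samples(timescale, samples):
    """Frame rate for a track whose samples may be single fields.

    samples is an iterable of (duration_ticks, is_field) tuples, one per
    MP4 sample. A field sample counts as half a frame, a frame sample as
    a whole frame. The is_field flag is hypothetical input for this sketch.
    """
    half_frames = 0
    total_ticks = 0
    for duration, is_field in samples:
        half_frames += 1 if is_field else 2
        total_ticks += duration
    return Fraction(half_frames * timescale, 2 * total_ticks)

# The remuxed file from this ticket: 142 field samples of 1800 ticks each.
samples = [(1800, True)] * 142
print(float(frame_rate_from_samples(90000, samples)))  # 25.0
```

With the field flag taken into account, the same stts data yields 25 fps rather than 50.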

comment:8 Changed 5 years ago by wim_arbor

  • Resolution set to invalid
  • Status changed from open to closed

Closing as invalid because it violates the ISO spec.

I have seen other encoders produce files this way, thus violating the spec, but I can't prove there is a de facto standard that disagrees with the ISO spec.

comment:9 in reply to: ↑ 7 ; follow-up: Changed 5 years ago by cehoyos

Replying to wim_arbor:

What I understand from the discussion on the mailing list is that merging the fields into field pairs violates the ISO specification.

AVC sample: An AVC sample is an access unit as defined in ISO/IEC 14496‐10

access unit: A set of NAL units that are consecutive in decoding order and contain exactly one primary coded picture. (...) The decoding of an access unit always results in a decoded picture.

Each (PAFF) field is encoded as a separate picture, so a sample in an MP4 file may only contain a single field.

I don't know much about H.264 but I would have expected that it needs two PAFF fields to get a decoded picture.

So software which uses the sample count in the MP4 file to determine the frame rate is simply wrong. This includes mediainfo, vlc, quicktime and gspot. The same applies to other encoders which generate such files.

I tested sorenson squeeze with the intel and mainconcept encoder. Both merged fields.

This sounds to me as if we should do the same, particularly if there is no playback application that fails for such output files.

comment:10 in reply to: ↑ 9 Changed 5 years ago by wim_arbor

Replying to cehoyos:

Replying to wim_arbor:

What I understand from the discussion on the mailing list is that merging the fields into field pairs violates the ISO specification.

AVC sample: An AVC sample is an access unit as defined in ISO/IEC 14496‐10

access unit: A set of NAL units that are consecutive in decoding order and contain exactly one primary coded picture. (...) The decoding of an access unit always results in a decoded picture.

Each (PAFF) field is encoded as a separate picture, so a sample in an MP4 file may only contain a single field.

I don't know much about H.264 but I would have expected that it needs two PAFF fields to get a decoded picture.

You should not read "picture" here as the ordinary English word, but as the term defined in the same spec:

picture: A collective term for a field or a frame.

And related definitions:

decoded picture: A decoded picture is derived by decoding a coded picture. A decoded picture is either a decoded frame, or a decoded field. A decoded field is either a decoded top field or a decoded bottom field.

coded picture: A coded representation of a picture. A coded picture may be either a coded field or a coded frame. Coded picture is a collective term referring to a primary coded picture or a redundant coded picture, but not to both together.

So software which uses the sample count in the MP4 file to determine the frame rate is simply wrong. This includes mediainfo, vlc, quicktime and gspot. The same applies to other encoders which generate such files.

I tested sorenson squeeze with the intel and mainconcept encoder. Both merged fields.

This sounds to me as if we should do the same, particularly if there is no playback application that fails for such output files.

AFAIK there is no application which fails with the current output either. Just some software reports a wrong framerate for such files. The reaction on the mailing list is very clear that this violates the spec. And I agree now after reading it.

I will report a bug to MainConcept after I have checked their newest codec version. If they don't agree, I will report back here. That will take some time, and I did not want to keep this ticket open in the meantime.

But of course feel free to implement this if you want. Currently I have to merge the patches myself to get a custom version.
