#3393 closed defect (invalid)
Interlaced H.264 packets are split causing MP4 STTS
Reported by: | wim_arbor | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | undetermined |
Version: | git-master | Keywords: | h264 mov |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | yes |
Analyzed by developer: | no | | |
Description
Summary of the bug:
When remuxing an MPEG-TS containing interlaced H.264 into MP4, the two fields of each video frame are split into separate packets. Software such as MediaInfo uses the STTS to determine the frame rate, so it reports 50 fps instead of 25 fps.
How to reproduce:
% ffmpeg.exe -i h264_aac_576i_tff.ts -c:a copy -c:v copy -bsf:a aac_adtstoasc -async 1 h264_aac_576i_tff.mp4
ffmpeg version N-60700-g07b4b0c Copyright (c) 2000-2014 the FFmpeg developers
  built on Feb 17 2014 15:45:12 with gcc 4.8.2 (GCC)
  configuration: --pkg-config=pkg-config --prefix=/home/arbor/software/packages/win32 --enable-memalign-hack --arch=x86 --target-os=mingw32 --cross-prefix=i686-w64-mingw32- --enable-libfaac --enable-libx264 --enable-gpl --enable-nonfree --disable-w32threads
  libavutil 52. 64.100 / 52. 64.100
  libavcodec 55. 52.102 / 55. 52.102
  libavformat 55. 33.100 / 55. 33.100
  libavdevice 55. 10.100 / 55. 10.100
  libavfilter 4. 1.102 / 4. 1.102
  libswscale 2. 5.101 / 2. 5.101
  libswresample 0. 17.104 / 0. 17.104
  libpostproc 52. 3.100 / 52. 3.100
Input #0, mpegts, from 'h264_aac_576i_tff.ts':
  Duration: 00:00:02.80, start: 600.000000, bitrate: 9671 kb/s
  Program 1
    Stream #0:0[0x1011]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt470bg), 720x576 [SAR 16:15 DAR 4:3], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x1100]: Audio: aac ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 127 kb/s
File 'h264_aac_576i_tff.mp4' already exists. Overwrite ? [y/N] y
Output #0, mp4, to 'h264_aac_576i_tff.mp4':
  Metadata:
    encoder : Lavf55.33.100
    Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 720x576 [SAR 16:15 DAR 4:3], q=2-31, 25 fps, 90k tbn, 90k tbc
    Stream #0:1: Audio: aac ([64][0][0][0] / 0x0040), 48000 Hz, stereo, 127 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[mp4 @ 03820060] pts has no value
  Last message repeated 68 times
[mp4 @ 03820060] pts has no value
  Last message repeated 1 times
frame= 142 fps=0.0 q=-1.0 Lsize= 3205kB time=00:00:02.78 bitrate=9444.8kbits/s
video:3159kB audio:43kB subtitle:0 data:0 global headers:0kB muxing overhead 0.096098%
h264_aac_576i_tff.ts has been uploaded to /incoming on upload.ffmpeg.org
Patches should be submitted to the ffmpeg-devel mailing list and not this bug tracker.
Attachments (1)
Change History (11)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
The output of mp4dump below shows:
moov/trak/mdia/mdhd
timescale = 90000
duration = 255600
duration(ms) = 2840
moov/trak/mdia/minf/stbl/stts
entry_count = 1
entry 0 = sample_count=142, sample_duration=1800
Framerate calculation: 90000/1800 = 50 fps
or: 2840/142 = 20 ms, 1000/20 = 50 fps
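The calculation above can be written as a short Python sketch. The function name `stts_frame_rate` is made up for illustration, and the numbers are the ones from the mp4dump output quoted in this comment. It computes the average sample rate implied by an stts table, which equals the frame rate only if every sample is a whole frame.

```python
from fractions import Fraction


def stts_frame_rate(timescale, entries):
    """Average sample rate implied by an stts table.

    entries is a list of (sample_count, sample_duration) pairs, with
    durations expressed in units of 1/timescale seconds.
    """
    total_samples = sum(count for count, _ in entries)
    total_ticks = sum(count * duration for count, duration in entries)
    return Fraction(total_samples * timescale, total_ticks)


# mdhd timescale and the single stts entry reported by mp4dump.
rate = stts_frame_rate(90000, [(142, 1800)])
print(float(rate))  # 50.0
```

A tool that reads the rate this way sees 50 samples per second, which is the field rate of this file rather than its frame rate.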
Output of mp4dump --verbosity 1 h264_aac_576i_tff.mp4
Note: irrelevant parts are replaced with (...)
[ftyp] size=8+24
  major_brand = isom
  minor_version = 200
  compatible_brand = isom
  compatible_brand = iso2
  compatible_brand = avc1
  compatible_brand = mp41
[free] size=8+0
[mdat] size=8+3278041
[moov] size=8+3991
  [mvhd] size=12+96
    timescale = 1000
    duration = 2840
    duration(ms) = 2840
  [trak] size=8+2261
    [tkhd] size=12+80, flags=3
      enabled = 1
      id = 1
      duration = 2840
      width = 768.000000
      height = 576.000000
    [edts] size=8+28
      [elst] size=12+16
        entry count = 1
        entry/segment duration = 2840
        entry/media time = 3600
        entry/media rate = 1
    [mdia] size=8+2125
      [mdhd] size=12+20
        timescale = 90000
        duration = 255600
        duration(ms) = 2840
        language = und
      [hdlr] size=12+33
        handler_type = vide
        handler_name = VideoHandler
      [minf] size=8+2040
        [vmhd] size=12+8, flags=1
          graphics_mode = 0
          op_color = 0000,0000,0000
        [dinf] size=8+28
          [dref] size=12+16
            [url ] size=12+0, flags=1
              location = [local to file]
        [stbl] size=8+1976
          [stsd] size=12+176
            entry-count = 1
            [avc1] size=8+164
              data_reference_index = 1
              width = 720
              height = 576
              compressor =
              [avcC] size=8+62
                Configuration Version = 1
                Profile = High
                Profile Compatibility = 0
                Level = 41
                NALU Length Size = 4
                Sequence Parameter = [67 64 00 29 ac 2c a4 02 d0 91 7f e0 02 00 01 e9 41 41 41 50 00 00 03 00 10 00 00 03 03 2e 4a 00 02 71 00 00 07 a1 27 f1 8e 0e d0 a1 48 90]
                Picture Parameter = [68 e9 8d 35 25]
              [pasp] size=8+8
          [stts] size=12+12
            entry_count = 1
            entry 0 = sample_count=142, sample_duration=1800
          [stss] size=12+20
            entry_count = 4
          [ctts] size=12+356
            entry_count = 44
          [stsc] size=12+232
            entry_count = 19
            entry 0 = first_chunk=1, first_sample*=1, chunk_count*=1, samples_per_chunk=3, sample_desc_index=1
            (...)
            entry 18 = first_chunk=129, first_sample*=139, chunk_count*=0, samples_per_chunk=4, sample_desc_index=1
          [stsz] size=12+576
            sample_size = 0
            sample_count = 142
          [stco] size=12+520
            entry_count = 129
            entry 0 = 48
            (...)
            entry 128 = 3175684
  [trak] size=8+1508
    (...)
  [udta] size=8+90
    [meta] size=12+78
      [hdlr] size=12+21
        handler_type = mdir
        handler_name =
      [ilst] size=8+37
        [.too] size=8+29
          [data] size=8+21
            type = 1
            lang = 0
            value = Lavf55.33.100
comment:3 by , 11 years ago
Note that after this patch to h264_parse() in libavcodec/h264_parser.c:
diff --git a/libavcodec/h264_parser.c b/libavcodec/h264_parser.c
index 4432871..564ae14 100644
--- a/libavcodec/h264_parser.c
+++ b/libavcodec/h264_parser.c
@@ -471,6 +471,7 @@ static int h264_parse(AVCodecParserContext *s,
         }
     }
+    s->flags |= PARSER_FLAG_COMPLETE_FRAMES;
     if (s->flags & PARSER_FLAG_COMPLETE_FRAMES) {
         next = buf_size;
     } else {
an MP4 is generated which is almost correct:
moov/trak/mdia/mdhd
timescale = 90000
duration = 253800
duration(ms) = 2820
moov/trak/mdia/minf/stbl/stts
entry_count = 2
entry 0 = sample_count=70, sample_duration=3600
entry 1 = sample_count=1, sample_duration=1800
So the first 70 samples have the correct duration of 3600, but the last sample still has the wrong duration of 1800.
But this change is not a valid patch, as it causes problems elsewhere. The correct way is probably to fix this in h264_find_frame_end(), also in libavcodec/h264_parser.c, but that is way more complex.
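As a sanity check on the two stts tables quoted in this ticket, the following Python sketch sums sample_count × sample_duration for each table. The helper name `stts_total_duration` is hypothetical; the entries are copied from the mp4dump excerpts for the unpatched and the patched build.

```python
def stts_total_duration(entries):
    """Total track duration in timescale ticks for an stts table.

    entries is a list of (sample_count, sample_duration) pairs.
    """
    return sum(count * duration for count, duration in entries)


unpatched = [(142, 1800)]          # stts without the parser change
patched = [(70, 3600), (1, 1800)]  # stts with the parser change

print(stts_total_duration(unpatched))  # 255600
print(stts_total_duration(patched))    # 253800
print(stts_total_duration(unpatched) - stts_total_duration(patched))  # 1800
```

Both totals agree with the mdhd durations reported for the respective files. Since 71 samples of 3600 ticks would also total 255600, the patched file comes up exactly one field duration (1800 ticks, or 20 ms at a timescale of 90000) short.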
comment:4 by , 11 years ago
Keywords: | h264 added |
---|---|
Reproduced by developer: | set |
Status: | new → open |
Input sample contains 71 video frames:
$ ffmpeg -i h264_aac_576i_tff.ts -f null -
ffmpeg version N-60700-g07b4b0c Copyright (c) 2000-2014 the FFmpeg developers
  built on Feb 18 2014 09:15:18 with gcc 4.7 (SUSE Linux)
  configuration: --enable-gpl
  libavutil 52. 64.100 / 52. 64.100
  libavcodec 55. 52.102 / 55. 52.102
  libavformat 55. 33.100 / 55. 33.100
  libavdevice 55. 10.100 / 55. 10.100
  libavfilter 4. 1.102 / 4. 1.102
  libswscale 2. 5.101 / 2. 5.101
  libswresample 0. 17.104 / 0. 17.104
  libpostproc 52. 3.100 / 52. 3.100
Input #0, mpegts, from 'h264_aac_576i_tff.ts':
  Duration: 00:00:02.80, start: 600.000000, bitrate: 9671 kb/s
  Program 1
    Stream #0:0[0x1011]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt470bg), 720x576 [SAR 16:15 DAR 4:3], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x1100]: Audio: aac ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 127 kb/s
Output #0, null, to 'pipe:':
  Metadata:
    encoder : Lavf55.33.100
    Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p, 720x576 [SAR 16:15 DAR 4:3], q=2-31, 200 kb/s, 90k tbn, 25 tbc
    Stream #0:1: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (h264 -> rawvideo)
  Stream #0:1 -> #0:1 (aac -> pcm_s16le)
Press [q] to stop, [?] for help
[null @ 0x1980400] Encoder did not produce proper pts, making some up.
frame= 71 fps=0.0 q=0.0 Lsize=N/A time=00:00:02.84 bitrate=N/A
video:7kB audio:512kB subtitle:0 data:0 global headers:0kB muxing overhead -100.004142%
142 frames are remuxed:
$ ffmpeg -i h264_aac_576i_tff.ts -vcodec copy -strict -2 out.mov
ffmpeg version N-60700-g07b4b0c Copyright (c) 2000-2014 the FFmpeg developers
  built on Feb 18 2014 09:15:18 with gcc 4.7 (SUSE Linux)
  configuration: --enable-gpl
  libavutil 52. 64.100 / 52. 64.100
  libavcodec 55. 52.102 / 55. 52.102
  libavformat 55. 33.100 / 55. 33.100
  libavdevice 55. 10.100 / 55. 10.100
  libavfilter 4. 1.102 / 4. 1.102
  libswscale 2. 5.101 / 2. 5.101
  libswresample 0. 17.104 / 0. 17.104
  libpostproc 52. 3.100 / 52. 3.100
Input #0, mpegts, from 'h264_aac_576i_tff.ts':
  Duration: 00:00:02.80, start: 600.000000, bitrate: 9671 kb/s
  Program 1
    Stream #0:0[0x1011]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt470bg), 720x576 [SAR 16:15 DAR 4:3], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x1100]: Audio: aac ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 127 kb/s
Output #0, mov, to 'out.mov':
  Metadata:
    encoder : Lavf55.33.100
    Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p, 720x576 [SAR 16:15 DAR 4:3], q=2-31, 25 fps, 90k tbn, 90k tbc
    Stream #0:1: Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (aac -> aac)
Press [q] to stop, [?] for help
[mov @ 0x29384c0] pts has no value
  Last message repeated 68 times
[mov @ 0x29384c0] pts has no value
  Last message repeated 1 times
frame= 142 fps=0.0 q=-1.0 Lsize= 3171kB time=00:00:02.78 bitrate=9345.6kbits/s
video:3159kB audio:8kB subtitle:0 data:0 global headers:0kB muxing overhead 0.129618%
The framerate is incorrect for out.mov:
$ ffmpeg -i out.mov -strict -2 out2.mov
ffmpeg version N-60700-g07b4b0c Copyright (c) 2000-2014 the FFmpeg developers
  built on Feb 18 2014 09:15:18 with gcc 4.7 (SUSE Linux)
  configuration: --enable-gpl
  libavutil 52. 64.100 / 52. 64.100
  libavcodec 55. 52.102 / 55. 52.102
  libavformat 55. 33.100 / 55. 33.100
  libavdevice 55. 10.100 / 55. 10.100
  libavfilter 4. 1.102 / 4. 1.102
  libswscale 2. 5.101 / 2. 5.101
  libswresample 0. 17.104 / 0. 17.104
  libpostproc 52. 3.100 / 52. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'out.mov':
  Metadata:
    major_brand : qt
    minor_version : 512
    compatible_brands: qt
    encoder : Lavf55.33.100
  Duration: 00:00:02.84, start: 0.014667, bitrate: 9148 kb/s
    Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt470bg), 720x576 [SAR 16:15 DAR 4:3], 9113 kb/s, 50 fps, 50 tbr, 90k tbn, 50 tbc (default)
    Metadata:
      handler_name : DataHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 23 kb/s (default)
    Metadata:
      handler_name : DataHandler
Output #0, mov, to 'out2.mov':
  Metadata:
    major_brand : qt
    minor_version : 512
    compatible_brands: qt
    encoder : Lavf55.33.100
    Stream #0:0(eng): Video: mpeg4 (mp4v / 0x7634706D), yuv420p, 720x576 [SAR 16:15 DAR 4:3], q=2-31, 200 kb/s, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name : DataHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name : DataHandler
Stream mapping:
  Stream #0:0 -> #0:0 (h264 -> mpeg4)
  Stream #0:1 -> #0:1 (aac -> aac)
Press [q] to stop, [?] for help
frame= 142 fps=0.0 q=2.0 Lsize= 142kB time=00:00:02.84 bitrate= 408.4kbits/s dup=71 drop=0
video:129kB audio:8kB subtitle:0 data:0 global headers:0kB muxing overhead 2.731602%
comment:5 by , 11 years ago
Keywords: | mov added |
---|
comment:6 by , 11 years ago
A patch can be found on the mailing list:
http://thread.gmane.org/gmane.comp.video.ffmpeg.devel/175525
by , 11 years ago
Attachment: | 0003-avcodec-h264-merge-fields-fix-duration.patch added |
---|
Fixes the duration of the output file after the previous patches have been applied.
follow-up: 9 comment:7 by , 11 years ago
What I understand from the discussion on the mailing list is that merging the fields into field pairs violates the ISO specification.
AVC sample: An AVC sample is an access unit as defined in ISO/IEC 14496‐10
access unit: A set of NAL units that are consecutive in decoding order and contain exactly one primary coded picture. (...) The decoding of an access unit always results in a decoded picture.
Each (PAFF) field is encoded as a separate picture, so a sample in an MP4 file may only contain a single field.
So software which uses the sample count in the MP4 file to determine the frame rate is simply wrong. This includes MediaInfo, VLC, QuickTime and GSpot. The same applies to other encoders which generate such files. I tested Sorenson Squeeze with the Intel and MainConcept encoders. Both merged fields.
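To illustrate the point, here is a minimal Python sketch of how a tool could derive the frame rate once it knows whether the samples are fields. The function and the `samples_are_fields` flag are hypothetical: the flag is not stored in the stts box and would have to be derived from the bitstream, for example from `field_pic_flag` in the slice headers. The sketch also assumes every sample in the track is of the same kind, which a PAFF stream does not guarantee.

```python
from fractions import Fraction


def frame_rate(timescale, sample_duration, samples_are_fields):
    """Frame rate for a track whose samples all share one duration.

    samples_are_fields is a hypothetical flag: True if each sample holds
    a single field, False if each sample holds a whole frame.
    """
    samples_per_frame = 2 if samples_are_fields else 1
    return Fraction(timescale, sample_duration * samples_per_frame)


# Timescale and sample duration from the remuxed file in this ticket.
print(frame_rate(90000, 1800, samples_are_fields=True))   # 25
print(frame_rate(90000, 1800, samples_are_fields=False))  # 50
```

Treating every sample as a frame gives the 50 fps that the listed tools report, while accounting for field samples gives the expected 25 fps.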
comment:8 by , 11 years ago
Resolution: | → invalid |
---|---|
Status: | open → closed |
Closing as invalid because the requested behaviour (merging both fields into one sample) violates the ISO spec.
I have seen other encoders produce files this way, thus violating the spec, but I can't prove that a de facto standard exists which disagrees with the ISO spec.
follow-up: 10 comment:9 by , 11 years ago
Replying to wim_arbor:
> What I understand from the discussion on the mailing list is that merging the fields into field pairs violates the ISO specification.
> AVC sample: An AVC sample is an access unit as defined in ISO/IEC 14496‐10
> access unit: A set of NAL units that are consecutive in decoding order and contain exactly one primary coded picture. (...) The decoding of an access unit always results in a decoded picture.
> Each (PAFF) field is encoded as a separate picture, so a sample in an MP4 file may only contain a single field.
I don't know much about H.264 but I would have expected that it needs two PAFF fields to get a decoded picture.
> So software which uses the sample count in the MP4 file to determine the frame rate is simply wrong. This includes MediaInfo, VLC, QuickTime and GSpot. The same applies to other encoders which generate such files.
> I tested Sorenson Squeeze with the Intel and MainConcept encoders. Both merged fields.
This sounds to me as if we should do the same, particularly if there is no playback application that fails for such output files.
comment:10 by , 11 years ago
Replying to cehoyos:
> Replying to wim_arbor:
> > What I understand from the discussion on the mailing list is that merging the fields into field pairs violates the ISO specification.
> > AVC sample: An AVC sample is an access unit as defined in ISO/IEC 14496‐10
> > access unit: A set of NAL units that are consecutive in decoding order and contain exactly one primary coded picture. (...) The decoding of an access unit always results in a decoded picture.
> > Each (PAFF) field is encoded as a separate picture, so a sample in an MP4 file may only contain a single field.
> I don't know much about H.264 but I would have expected that it needs two PAFF fields to get a decoded picture.
You should not read "picture" here as the everyday English word, but as the term defined in the same spec:
picture: A collective term for a field or a frame.
And related definitions:
decoded picture: A decoded picture is derived by decoding a coded picture. A decoded picture is either a decoded frame, or a decoded field. A decoded field is either a decoded top field or a decoded bottom field.
coded picture: A coded representation of a picture. A coded picture may be either a coded field or a coded frame. Coded picture is a collective term referring to a primary coded picture or a redundant coded picture, but not to both together.
> > So software which uses the sample count in the MP4 file to determine the frame rate is simply wrong. This includes MediaInfo, VLC, QuickTime and GSpot. The same applies to other encoders which generate such files.
> > I tested Sorenson Squeeze with the Intel and MainConcept encoders. Both merged fields.
> This sounds to me as if we should do the same, particularly if there is no playback application that fails for such output files.
AFAIK there is no application which fails with the current output either. Some software merely reports a wrong frame rate for such files. The reaction on the mailing list was very clear that merging the fields violates the spec, and after reading the spec I now agree.
I will report a bug to MainConcept after I have checked their newest codec version. If they don't agree, I will report back here. That will take some time, and I did not want to keep this ticket open in the meantime.
But of course feel free to implement this if you want to; currently I have to merge the patches myself to get a custom version.
Output of ffmpeg.exe -v 9 -loglevel 99 -i h264_aac_576i_tff.ts