Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#3871 closed defect (wontfix)

FFmpeg MD5 output different with same data #2

Reported by: ahthovaikied Owned by:
Priority: normal Component: avformat
Version: 2.2.4 Keywords: aac h264 mkv
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Computing the MD5 of media streams produces different results on two of my machines with the same input data.

I learned my lesson in bug https://trac.ffmpeg.org/ticket/3524#comment:9 and I am calculating the checksums WITHOUT decoding.

The two configurations are:

  • Ubuntu 14.04, Core i7 950, FFmpeg compiled with:
    --enable-gpl --enable-version3 --enable-nonfree --disable-runtime-cpudetect --disable-ffserver --disable-encoder=vorbis --disable-encoder=aac --enable-x11grab --enable-libfdk-aac --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopus --enable-librtmp --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxvid --disable-debug --extra-libs=-ldl --cpu=corei7
    
  • Ubuntu 14.04, Atom D525, FFmpeg compiled with:
    --enable-gpl --enable-version3 --enable-nonfree --disable-runtime-cpudetect --disable-ffserver --disable-ffplay --disable-encoders --disable-decoders --disable-debug --cpu=atom
    

I get different MD5s for several video files I tried, e.g. http://www.auby.no/files/video_tests/h264_720p_hp_3.1_600kbps_aac_mp3_dual_audio_harry_potter.mkv.

Sample output:

$ ffmpeg -i h264_720p_hp_3.1_600kbps_aac_mp3_dual_audio_harry_potter.mkv -c:v copy -c:a copy -f md5 -
ffmpeg version N-65758-g746095b Copyright (c) 2000-2014 the FFmpeg developers
  built on Aug 19 2014 20:11:25 with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
  configuration: --enable-gpl --enable-version3 --enable-nonfree --disable-runtime-cpudetect --disable-ffserver --disable-ffplay --disable-encoders --disable-decoders --disable-debug --cpu=atom
  libavutil      54.  5.100 / 54.  5.100
  libavcodec     56.  0.101 / 56.  0.101
  libavformat    56.  1.100 / 56.  1.100
  libavdevice    56.  0.100 / 56.  0.100
  libavfilter     5.  0.100 /  5.  0.100
  libswscale      3.  0.100 /  3.  0.100
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  0.100 / 53.  0.100
[matroska,webm @ 0x259dc20] Could not find codec parameters for stream 2 (Audio: mp3, 48000 Hz, 2 channels, 160 kb/s): unspecified frame size
Consider increasing the value for the 'analyzeduration' and 'probesize' options
Guessed Channel Layout for  Input Stream #0.1 : stereo
Guessed Channel Layout for  Input Stream #0.2 : stereo
Input #0, matroska,webm, from 'h264_720p_hp_3.1_600kbps_aac_mp3_dual_audio_harry_potter.mkv':
  Metadata:
    title           : Harry Potter 4[Eng-Hindi]Dual.Audio BRRIP 720p-=[champ_is_here]=-
    encoder         : libebml v1.0.0 + libmatroska v1.0.0
    creation_time   : 2011-01-07 07:42:53
  Duration: 00:00:57.61, start: 0.000000, bitrate: 613 kb/s
    Stream #0:0(eng): Video: h264 (High), yuv420p, 1280x528 [SAR 1:1 DAR 80:33], 25 fps, 25 tbr, 1k tbn, 47.95 tbc
    Metadata:
      title           : -=[champ_is_here]=-
    Stream #0:1(eng): Audio: aac, 48000 Hz, 2 channels
    Metadata:
      title           : -=[champ_is_here]=-
    Stream #0:2(hin): Audio: mp3, 48000 Hz, 2 channels, 160 kb/s
    Metadata:
      title           : -=[champ_is_here]=-
Output #0, md5, to 'pipe:':
  Metadata:
    title           : Harry Potter 4[Eng-Hindi]Dual.Audio BRRIP 720p-=[champ_is_here]=-
    encoder         : Lavf56.1.100
    Stream #0:0(eng): Video: h264, yuv420p, 1280x528 [SAR 1:1 DAR 80:33], q=2-31, 25 fps, 23.98 tbn, 23.98 tbc
    Metadata:
      title           : -=[champ_is_here]=-
    Stream #0:1(eng): Audio: aac, 48000 Hz, stereo
    Metadata:
      title           : -=[champ_is_here]=-
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 12 >= 12
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 35 >= 35
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 59 >= 59
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 83 >= 83
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 106 >= 106
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 129 >= 129
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 153 >= 153
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 176 >= 176
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 199 >= 199
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 222 >= 222
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 246 >= 246
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 270 >= 270
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 293 >= 293
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 317 >= 317
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 340 >= 340
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 364 >= 364
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 387 >= 387
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 410 >= 410
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 434 >= 434
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 457 >= 457
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 481 >= 481
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 505 >= 505
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 528 >= 528
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 551 >= 551
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 574 >= 574
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 598 >= 598
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 622 >= 622
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 645 >= 645
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 668 >= 668
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 692 >= 692
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 715 >= 715
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 739 >= 739
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 762 >= 762
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 786 >= 786
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 809 >= 809
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 832 >= 832
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 856 >= 856
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 879 >= 879
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 903 >= 903
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 926 >= 926
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 949 >= 949
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 973 >= 973
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 996 >= 996
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1020 >= 1020
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1044 >= 1044
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1067 >= 1067
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1090 >= 1090
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1114 >= 1114
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1137 >= 1137
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1161 >= 1161
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1184 >= 1184
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1208 >= 1208
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1231 >= 1231
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1254 >= 1254
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1278 >= 1278
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1301 >= 1301
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1325 >= 1325
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1348 >= 1348
[md5 @ 0x273cca0] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1371 >= 1371
MD5=487e837f7c08ee07cf44e8b473911a06
frame= 1439 fps=0.0 q=-1.0 Lsize=       0kB time=00:00:57.56 bitrate=   0.0kbits/s
video:2917kB audio:257kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

Here is the summary of the results I get :

So it seems it has been fixed on git master, and the right MD5 is 487e837f7c08ee07cf44e8b473911a06.

Can anyone confirm this is a known bug that has been fixed?
And if yes, is it planned to backport the fix on the 2.2 branch?

Thank you

Change History (20)

comment:1 Changed 5 years ago by kurosu

  • Keywords aac h264 added

I can confirm git master produces MD5=487e837f7c08ee07cf44e8b473911a06 here.

You can alternately replace -c:a copy with -an and -c:v copy with -vn to narrow down whether audio or video causes the different MD5. You can also use git bisect to narrow down the commit that changed the MD5.
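A finer-grained variant of this narrowing is to dump one checksum per copied packet instead of a single MD5, so that the output of two builds can be diffed line by line. The sketch below assumes both builds include the framemd5 muxer; dump_packets and compare_builds are hypothetical helper names, and comment lines starting with # are filtered out on the assumption that they may carry build-specific header text.

```shell
#!/bin/sh
# dump_packets FFMPEG_BINARY INPUT_FILE
# Prints one line per copied packet (stream index, timestamps, size, MD5)
# using the framemd5 muxer; header lines starting with '#' are dropped.
dump_packets() {
    "$1" -loglevel quiet -i "$2" -map v -map a -c copy -f framemd5 - \
        | grep -v '^#'
}

# compare_builds BINARY_A BINARY_B INPUT_FILE
# Diffs the per-packet dumps of two builds; returns the exit status of diff
# (0 when the dumps are identical, 1 when they differ).
compare_builds() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    dump_packets "$1" "$3" > "$tmp_a"
    dump_packets "$2" "$3" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Runs only when the script is given exactly three arguments, for example:
#   sh compare.sh ./ffmpeg_2.2.7 ./ffmpeg_git_master ../ref.mkv
if [ "$#" -eq 3 ]; then
    compare_builds "$1" "$2" "$3"
fi
```

If the diff shows the same packets in a different order, or the same payload checksums with different timestamps, the difference is in how packets are delivered rather than in their contents.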

Unfortunately, as is, it's difficult for a developer to determine whether there is still a bug, since that requires systems that produce conflicting results.

comment:2 Changed 5 years ago by cehoyos

Could you elaborate on what you are trying to show with your tests?
I didn't try to reproduce yet, but at least for some input files you can certainly get different md5 outputs with your exact command line if you are using different configure lines. The md5 values may also change depending on the version you test (again without changing the command line). All this cannot be surprising so I wonder now what exactly you want to test...
Please also note that not every change in behaviour is a bug fix, sometimes changes are committed because developers believe that the new behaviour makes more sense (but without claiming the old behaviour indicated a bug).

comment:3 Changed 5 years ago by cehoyos

Additionally, please note that your command line can be considered ambiguous for the given input sample (there is not much indication which audio stream is the best one, to quote the documentation), so I don't think there is any reason to expect identical md5 output for different FFmpeg versions.

comment:4 Changed 5 years ago by ahthovaikied

Replying to cehoyos:

Additionally, please note that your command line can be considered ambiguous for the given input sample (there is not much indication which audio stream is the best one, to quote the documentation), so I don't think there is any reason to expect identical md5 output for different FFmpeg versions.

You are right, so I added '-map' switches when relevant in my tests below to remove the ambiguity (although I seriously doubt the code in any version would do other than taking the first audio stream).

Replying to kurosu:

You can alternately replace -c:a copy with -an and -c:v copy with -vn to narrow down whether audio or video causes the different MD5.

Here are the results of my tests (done on the core i7):

  • When taking all streams, md5 do not match:
    $ ./ffmpeg_2.2.7 -loglevel quiet -i ../ref.mkv -map v -map a -c:v copy -c:a copy -f md5 -
    MD5=7b057ee0bbc1333af5955d7f534dbdb3
    $ ./ffmpeg_git_master -loglevel quiet -i ../ref.mkv -map v -map a -c:v copy -c:a copy -f md5 -
    MD5=db38f94668ac6f033b714877eb42e354
    
  • When taking only the video stream, md5 match:
    $ ./ffmpeg_2.2.7 -loglevel quiet -i ../ref.mkv -c:v copy -an -f md5 -
    MD5=5d015e35d9cf9253bf4896baae4b60d6
    $ ./ffmpeg_git_master -loglevel quiet -i ../ref.mkv -c:v copy -an -f md5 -
    MD5=5d015e35d9cf9253bf4896baae4b60d6
    
  • When taking only audio streams, md5 match:
    $ ./ffmpeg_2.2.7 -loglevel quiet -i ../ref.mkv -map a -vn -c:a copy -f md5 -
    MD5=9ac8559b4e2ba521e233567c04869b92
    $ ./ffmpeg_git_master -loglevel quiet -i ../ref.mkv -map a -vn -c:a copy -f md5 -
    MD5=9ac8559b4e2ba521e233567c04869b92
    

What can I conclude? It seems the problem is occurring only when feeding the MD5 calculation with all streams...
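One reading consistent with these three results, offered as an assumption rather than a confirmed diagnosis: if the md5 muxer hashes packet payloads in the order they are written, two builds that deliver identical packets per stream but interleave the streams differently will agree on every single-stream MD5 and disagree on the combined one. The sketch below illustrates that effect with md5sum on made-up strings standing in for packets; none of the values are real FFmpeg output.

```shell
#!/bin/sh
# Made-up stand-ins for packet payloads (not real H.264/AAC data).
v1='video-packet-1'; v2='video-packet-2'
a1='audio-packet-1'; a2='audio-packet-2'

# digest PAYLOAD...: MD5 of the payloads concatenated in the given order.
digest() {
    printf '%s' "$@" | md5sum | cut -d' ' -f1
}

# Each stream on its own: packet order within a stream never changes.
video_only=$(digest "$v1" "$v2")
audio_only=$(digest "$a1" "$a2")

# Both streams together, with two interleavings of the very same packets.
interleaved_a=$(digest "$v1" "$a1" "$v2" "$a2")
interleaved_b=$(digest "$v1" "$v2" "$a1" "$a2")

echo "video only:      $video_only"
echo "audio only:      $audio_only"
echo "interleaving A:  $interleaved_a"
echo "interleaving B:  $interleaved_b"

if [ "$interleaved_a" != "$interleaved_b" ]; then
    echo "same packets, different order: the combined digests differ"
fi
```

Under that assumption, a checksum computed per stream would be insensitive to interleaving, whereas a single checksum over all streams would not be.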

comment:5 follow-up: Changed 5 years ago by ahthovaikied

Replying to cehoyos:

Could you elaborate on what you are trying to show with your tests?

I am trying to calculate the MD5 of media data contained in a file.
For the same input file, I expect the MD5 calculation to be identical across different machines unless I alter the file by adding, removing, modifying or reordering streams.
The MD5 calculation should also be reproducible across different FFmpeg versions, otherwise I think it's a bug (fixed or introduced).

Replying to cehoyos:

I didn't try to reproduce yet, but at least for some input files you can certainly get different md5 outputs with your exact command line if you are using different configure lines. The md5 values may also change depending on the version you test (again without changing the command line). All this cannot be surprising so I wonder now what exactly you want to test...

I am not following you here. Why should the FFmpeg build configuration have any influence on the MD5 produced?
I do not decode the streams, only feed them through the MD5 calculation. If I have the MKV demuxer and the ability to calculate the MD5 in an FFmpeg build, it should absolutely ALWAYS produce the same MD5.

comment:6 Changed 5 years ago by ahthovaikied

Replying to kurosu:

You can also use git bisect to narrow down the commit that changed the MD5.

Yup, I am trying that.
However I had trouble when running :

git bisect start
git bisect good HEAD
git bisect bad n2.2.7
git bisect run script_that_builds_ffmpeg_and_compare_md5

I had the error
Bisecting: a merge base must be tested
And then after the first build:
'bisect_state bad' returned error code 3

I am trying between HEAD and HEAD~5000 now.

EDIT:
Well it's not better:

$ git bisect start
$ git bisect good HEAD
$ git bisect bad HEAD~5000
Some good revs are not ancestor of the bad rev.
git bisect cannot work properly in this case.
Maybe you mistake good and bad revs?

Does anyone know how I can fix this and do a git bisect between git master head and tag n2.2.7?
Thanks

comment:7 in reply to: ↑ 5 ; follow-up: Changed 5 years ago by cehoyos

Replying to ahthovaikied:

Replying to cehoyos:

Could you elaborate on what you are trying to show with your tests?

I am trying to calculate the MD5 of media data contained in a file.

My question was: Why do you want to calculate the MD5 of the demuxer output? What kind of bugs (problems, regressions) are you hoping to find or to avoid?

For the same input file, I expect the MD5 calculation to be identical across different machines unless I alter the file by adding, removing, modifying or reordering streams.

As said, this is missing several conditions like FFmpeg version and compilation options.

The MD5 calculation should also be reproducible across different FFmpeg versions, otherwise I think it's a bug (fixed or introduced).

Why?
As said, behaviour changes are possible without a bug being fixed or introduced.
This is of course different for decoder output if a specification exists that requests bitexact output as for H.264 or a sample implementation as for VP8.

Replying to cehoyos:

I didn't try to reproduce yet, but at least for some input files you can certainly get different md5 outputs with your exact command line if you are using different configure lines. The md5 values may also change depending on the version you test (again without changing the command line). All this cannot be surprising so I wonder now what exactly you want to test...

I am not following you here. Why should the FFmpeg build configuration have any influence on the MD5 produced?

Since libavformat (the demuxer) depends on libavcodec you shouldn't be surprised that demuxers produce different output depending on the compilation options used.

I do not decode the streams, only feed them through the MD5 calculation. If I have the MKV demuxer and the ability to calculate the MD5 in an FFmpeg build, it should absolutely ALWAYS produce the same MD5.

No.

Concerning the bisect: Did you find a version that produces the output you want and a version that produces a different output on the same system and with the same compilation options?

comment:8 in reply to: ↑ 7 ; follow-up: Changed 5 years ago by ahthovaikied

Replying to cehoyos:

My question was: Why do you want to calculate the MD5 of the demuxer output? What kind of bugs (problems, regressions) are you hoping to find or to avoid?

I am not doing this to find regressions. For reasons I won't go into here, I need to calculate a checksum that is unique to the media streams of a file, and I need to be able to reproduce the calculation in various environments (different CPUs and build options; I can freeze the FFmpeg version, however). This allows me, for example, to detect a single-bit corruption of the file.
I could calculate the MD5 of the whole file, but:

  • I need to embed the MD5 in the file once it's calculated
  • I want to be able to alter metadata without changing the MD5

Replying to cehoyos:

For the same input file, I expect the MD5 calculation to be identical across different machines unless I alter the file by adding, removing, modifying or reordering streams.

As said, this is missing several conditions like FFmpeg version and compilation options.

The MD5 calculation should also be reproducible across different FFmpeg versions, otherwise I think it's a bug (fixed or introduced).

Why?
As said, behaviour changes are possible without a bug being fixed or introduced.
This is of course different for decoder output if a specification exists that requests bitexact output as for H.264 or a sample implementation as for VP8.

Replying to cehoyos:

I didn't try to reproduce yet, but at least for some input files you can certainly get different md5 outputs with your exact command line if you are using different configure lines. The md5 values may also change depending on the version you test (again without changing the command line). All this cannot be surprising so I wonder now what exactly you want to test...

I am not following you here. Why should the FFmpeg build configuration have any influence on the MD5 produced?

Since libavformat (the demuxer) depends on libavcodec you shouldn't be surprised that demuxers produce different output depending on the compilation options used.

I do not decode the streams, only feed them through the MD5 calculation. If I have the MKV demuxer and the ability to calculate the MD5 in an FFmpeg build, it should absolutely ALWAYS produce the same MD5.

No.

OK, it seems we disagree on only one thing:
You are saying that a demuxer does not necessarily have a bit exact, reproducible output, and that it could change depending on build options or CPU, am I correct?
I can perfectly understand that a decoder can have a non reproducible output, because of floating point errors, integer vs floating point calculations, or because a decoder decides to ignore that type of frame to go faster, etc.
But how can a demuxer not be bit exact?
Isn't there a standard that precisely describes for example that in a Matroska file, if byte w has value x, then the chunk from offset y to z is part of a video stream? Then how can the implementation have any latitude to ignore, add or change some bytes of that stream?

Replying to cehoyos:

Concerning the bisect: Did you find a version that produces the output you want and a version that produces a different output on the same system and with the same compilation options?

Yes, see my comment above (https://trac.ffmpeg.org/ticket/3871#comment:4 ), tests were done on the same core i7 PC, with the same build options.

comment:9 in reply to: ↑ 8 ; follow-up: Changed 5 years ago by cehoyos

Replying to ahthovaikied:

OK, it seems we disagree on only one thing:
You are saying that a demuxer does not necessarily have a bit exact, reproducible output, and that it could change depending on build options or CPU, am I correct?

Yes, and the same is true for (at least some) decoders.
(And please remember that in ticket #3524 we found out that you can get different output with identical build options and identical CPU apparently depending on the compiler, and that nobody so far claimed that the used compiler is buggy.)

Since demuxers can depend on libavcodec, one implies the other.

Isn't there a standard that precisely describes for example that in a Matroska file, if byte w has value x, then the chunk from offset y to z is part of a video stream?

If you have a Matroska file for which libavformat returns broken packets with a default configuration, please report it here (or on the user mailing list).
But I really wouldn't rely on FFmpeg internals to check for file validity. You can rely on the output of the decoders but please remember that we do find bugs in decoders regularly.

Replying to cehoyos:

Concerning the bisect: Did you find a version that produces the output you want and a version that produces a different output on the same system and with the same compilation options?

Yes, see my comment above (https://trac.ffmpeg.org/ticket/3871#comment:4 ), tests were done on the same core i7 PC, with the same build options.

Sorry, I expected you to write something like "yes, it works with version x but fails with y". Anyway, please try:

$ make distclean
$ git bisect reset
$ git bisect start
$ git checkout x
$ git bisect good
$ git checkout y
$ git bisect bad

Then build and test and depending on the result either use make distclean && git bisect good or make distclean && git bisect bad to continue testing and find the version introducing the problem you see. I run bisects on FFmpeg every day so I can help you if needed, but you may have to explain how I can reproduce the problem without using a script.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 5 years ago by ahthovaikied

Replying to cehoyos:

(And please remember that in ticket #3524 we found out that you can get different output with identical build options and identical CPU apparently depending on the compiler, and that nobody so far claimed that the used compiler is buggy.)

My understanding is that it was due to the use of avx instructions, and that in that case they did change the decoder output.
I am now trying to calculate MD5 without decoding, so this is unrelated.

Replying to cehoyos:

Since demuxers can depend on libavcodec, one implies the other.

I don't understand this dependency, especially since I built FFmpeg with --disable-encoders --disable-decoders.

Replying to cehoyos:

If you have a Matroska file for which libavformat returns broken packets with a default configuration, please report it here (or on the user mailing list).

I am not able to characterize 'broken' here; however, I know that on the same system with different FFmpeg versions (or with version 2.2.7 on a different CPU), the demuxing operation does not produce the same output.
The results in comment:4, which show that taking streams individually produces the same MD5 but taking them together does not, make me quite suspicious.
You seem to think this is normal, can you please explain why the demuxing process could produce different output with the same file?

Replying to cehoyos:

But I really wouldn't rely on FFmpeg internals to check for file validity.

Since 'MD5' is an exposed output format, can you really consider this internal?

Replying to cehoyos:

Sorry, I expected you to write something like "yes, it works with version x but fails with y". Anyway, please try:

$ make distclean
$ git bisect reset
$ git bisect start
$ git checkout x
$ git bisect good
$ git checkout y
$ git bisect bad

Then build and test and depending on the result either use make distclean && git bisect good or make distclean && git bisect bad to continue testing and find the version introducing the problem you see. I run bisects on FFmpeg every day so I can help you if needed, but you may have to explain how I can reproduce the problem without using a script.

Thanks for your help.
I get a difference of MD5 between tag n2.2.7 and tag n2.3.
I tried your command sequence and I get the same error as before: the warning "Bisecting: a merge base must be tested" appears after the first git bisect good, and once I have tested the two "anchors", git bisect bad returns error code 3 and git does not change the current HEAD.

To check if current HEAD is good or bad, run: ./ffmpeg -loglevel quiet -i ref.mkv -map v -map a -c:v copy -c:a copy -f md5 - (ref.mkv being the file whose URL can be found in my first post).
If the output is MD5=db38f94668ac6f033b714877eb42e354, it's "good", otherwise "bad".

comment:11 in reply to: ↑ 10 ; follow-up: Changed 5 years ago by cehoyos

Replying to ahthovaikied:

Replying to cehoyos:

(And please remember that in ticket #3524 we found out that you can get different output with identical build options and identical CPU apparently depending on the compiler, and that nobody so far claimed that the used compiler is buggy.)

My understanding is that it was due to the use of avx instructions

Neither jamal nor I were able to reproduce an issue with avx so I would say this is an impossible explanation.

Replying to cehoyos:

Since demuxers can depend on libavcodec, one implies the other.

I don't understand this dependency

There is nothing to understand, it is just a fact: Just run ldd libavformat on a shared library to see the dependency.

especially since I built FFmpeg with --disable-encoders --disable-decoders.

Meaning you cannot compare the behaviour with a build using default configuration (at least for some files and some command lines).

Replying to cehoyos:

If you have a Matroska file for which libavformat returns broken packets with a default configuration, please report it here (or on the user mailing list).

I am not able to characterize 'broken' here; however, I know that on the same system with different FFmpeg versions (or with version 2.2.7 on a different CPU), the demuxing operation does not produce the same output.
The results in comment:4, which show that taking streams individually produces the same MD5 but taking them together does not, make me quite suspicious.

Yes, it indicates that you refuse to believe me that testing demuxers without decoders has very limited use.

You seem to think this is normal, can you please explain why the demuxing process can produce different output with the same file?

I explained several times that there are multiple reasons (or explanations) why the output can be different (including different FFmpeg configuration). Please note that you refuse to explain your usecase.

Replying to cehoyos:

But I really wouldn't rely on FFmpeg internals to check for file validity.

Since 'MD5' is an exposed output format, can you really consider this internal?

Nothing about the md5 output format is internal. Testing demuxer output means imo that you rely on FFmpeg internals that may change, be it because of a bugfix or because the behaviour of the demuxer changes.

Replying to cehoyos:

Sorry, I expected you to write something like "yes, it works with version x but fails with y". Anyway, please try:

$ make distclean
$ git bisect reset
$ git bisect start
$ git checkout x
$ git bisect good
$ git checkout y
$ git bisect bad

Then build and test and depending on the result either use make distclean && git bisect good or make distclean && git bisect bad to continue testing and find the version introducing the problem you see. I run bisects on FFmpeg every day so I can help you if needed, but you may have to explain how I can reproduce the problem without using a script.

Thanks for your help.

I get a difference of MD5 between tag n2.2.7 and tag n2.3.

Since n2.3 is "newer" than n2.2.7, did you mark n2.3 (and all revisions that behave the same) as bad and n2.2.7 (and all revisions with the same output) as good? This is the only way git bisect works, and it will show you the commit that introduces the change.
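The labelling rule can be tried on a throwaway repository before spending hours of FFmpeg builds on it. In the sketch below, the repository, the file named digest, and the strings old-md5 and new-md5 are all invented for illustration; the older state is marked good, the newer state is marked bad, and git bisect run finds the first commit at which the digest changed.

```shell
#!/bin/sh
set -e
# Throwaway repository; nothing here touches a real FFmpeg checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "bisect-demo@example.invalid"
git config user.name "Bisect Demo"

# Ten commits; the invented "digest" file changes at commit 7.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -lt 7 ]; then
        echo old-md5 > digest
    else
        echo new-md5 > digest
    fi
    echo "$i" > counter
    git add digest counter
    git commit -q -m "commit $i"
    i=$((i + 1))
done

oldest=$(git rev-list --max-parents=0 HEAD)

# Older behaviour is "good" and newer behaviour is "bad", regardless of
# which of the two outputs one actually prefers.
git bisect start >/dev/null
git bisect bad HEAD >/dev/null
git bisect good "$oldest" >/dev/null

# The test command exits 0 (good) as long as the old digest is present.
git bisect run grep -q old-md5 digest >/dev/null

# After the run, refs/bisect/bad points at the first "bad" commit.
first_changed=$(git log -1 --format=%s refs/bisect/bad)
echo "first commit with the new digest: $first_changed"
git bisect reset >/dev/null 2>&1
```

With the ten commits created above, the script reports commit 7. For a real bisect, the grep command would be replaced by a script that builds FFmpeg and compares the MD5 it prints.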

comment:12 in reply to: ↑ 11 ; follow-up: Changed 5 years ago by ahthovaikied

Replying to cehoyos:

There is nothing to understand, it is just a fact: Just run ldd libavformat on a shared library to see the dependency.

I believe you :)
I just don't understand the logic behind the fact that both libraries have a dependency on each other, since to my understanding (admittedly limited about FFmpeg's codebase and codecs internals), (de)muxing and encoding/decoding are independent operations.

Replying to cehoyos:

especially since I built FFmpeg with --disable-encoders --disable-decoders.

Meaning you cannot compare the behaviour with a build using default configuration (at least for some files and some command lines).

I can if you believe this can lead to interesting results. I did add these switches only to speed up compilation, and because I believed they had no impact on the codepath I use for my use case.

Replying to cehoyos:

Replying to cehoyos:

If you have a Matroska file for which libavformat returns broken packets with a default configuration, please report it here (or on the user mailing list).

I am not able to characterize 'broken' here; however, I know that on the same system with different FFmpeg versions (or with version 2.2.7 on a different CPU), the demuxing operation does not produce the same output.
The results in comment:4, which show that taking streams individually produces the same MD5 but taking them together does not, make me quite suspicious.

Yes, it indicates that you refuse to believe me that testing demuxers without decoders has very limited use.

What do you mean?
I can perfectly understand that most users won't need the MD5 stuff, but for example remuxing streams in another file without transcoding is a pretty common use case, no?

Replying to cehoyos:

I explained several times that there are multiple reasons (or explanations) why the output can be different (including different FFmpeg configuration).

Yes, you gave me factors that influence the difference I am seeing, but not the root logical explanation.
Sorry if I am insistent. It's the logic that I don't understand. I'm no media format expert, so I will give you a simple example of what I don't understand.
Let's say you write a SRT subtitle file with the characters "123" inside it. I know it's probably not a valid subtitle file but this is just for the example. You mux this in a Matroska file. So the result is a Matroska file with a single stream that is the SRT subtitle data. Now let's say you want to demux the Matroska file to get the original SRT data again.
What I don't understand is how the demuxing operation can result in anything other than the original SRT file with the 3 chars.
I am probably missing something stupid, but please explain it to me.

Replying to cehoyos:

Please note that you refuse to explain your usecase.

What do you want to know?
I believe I did explain my use case in the beginning of comment:8.

Replying to cehoyos:

Nothing about the md5 output format is internal. Testing demuxer output means imo that you rely on FFmpeg internals that may change, be it because of a bugfix or because the behaviour of the demuxer changes.

So let's say I want to copy streams (without transcoding) in another Matroska file. That is the same command line except you replace md5 by matroska, and stdout by a filepath. How is that different?
Or are you saying that in this example, the Matroska muxer may symmetrically cancel some differences in the demuxer output?
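To make the comparison concrete, here is a minimal sketch that builds both command lines as argument lists and shows where they differ. The flags are the ones used elsewhere in this ticket; the helper name stream_copy_cmd and the output path out.mkv are made up for illustration.

```python
def stream_copy_cmd(src, fmt, dst):
    """Return an ffmpeg argv that copies audio and video without transcoding."""
    return [
        "ffmpeg", "-loglevel", "quiet",
        "-i", src,
        "-map", "a", "-map", "v",
        "-c:v", "copy", "-c:a", "copy",
        "-f", fmt, dst,
    ]

# Checksum of the demuxed packets, written to stdout.
md5_cmd = stream_copy_cmd("ref.mkv", "md5", "-")
# Remux into a new Matroska file (out.mkv is a hypothetical path).
remux_cmd = stream_copy_cmd("ref.mkv", "matroska", "out.mkv")

# Collect the positions at which the two command lines differ.
diff = [(a, b) for a, b in zip(md5_cmd, remux_cmd) if a != b]
print(diff)
```

Only the output format and the destination change; the input, stream mapping and copy options are identical in both cases.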

Replying to cehoyos:

Since 2.3 is "newer" than n2.2.7 you did mark n2.3 (and all revisions that behave the same) as bad and n2.2.7 (and all revisions with the same output) as good? This is the only way git bisect works and it will show you the commit that introduces the change.

Yes I know how git bisect works. That is what I did.
I believe the error is due to the fact that n2.2.7 is not a direct ancestor of n2.3.
I believe you should be able to reproduce the error by running:

  git bisect start
  git bisect good n2.3
  git bisect bad n2.2.7
  git bisect bad -> this command appears to succeed, but returns 3, and git does not select another HEAD to continue the bisect

comment:13 in reply to: ↑ 12 ; follow-up: Changed 5 years ago by cehoyos

Replying to ahthovaikied:

So let's say I want to copy streams (without transcoding) in another Matroska file. That is the same command line except you replace md5 by matroska, and stdout by a filepath. How is that different?

It isn't, it can fail for some files depending on your configure options.

Please understand that this isn't necessarily the case for your example in this ticket (your bisect will show once you try it as suggested below), I just want to explain to you that imo your whole process is flawed (at least with the configure line you are apparently using) because it will fail for some examples. And I am not convinced that a demuxer's output is specified the way you seem to believe it is (and the way you test it).

Replying to cehoyos:

Since 2.3 is "newer" than n2.2.7 you did mark n2.3 (and all revisions that behave the same) as bad and n2.2.7 (and all revisions with the same output) as good? This is the only way git bisect works and it will show you the commit that introduces the change.

Yes I know how git bisect works. That is what I did.

Funny that your output shows that you tried something different that cannot work (and this has nothing to do with FFmpeg).

comment:14 in reply to: ↑ 13 Changed 5 years ago by ahthovaikied

Replying to cehoyos:

Replying to ahthovaikied:

So let's say I want to copy streams (without transcoding) in another Matroska file. That is the same command line except you replace md5 by matroska, and stdout by a filepath. How is that different?

It isn't, it can fail for some files depending on your configure options.

Please understand that this isn't necessarily the case for your example in this ticket (your bisect will show once you try it as suggested below), I just want to explain to you that imo your whole process is flawed (at least with the configure line you are apparently using) because it will fail for some examples. And I am not convinced that a demuxer's output is specified the way you seem to believe it is (and the way you test it).

I get your point of view.
Unless you can explain to me why demuxing logically depends on decoding (see my SRT example question above), I think there is no point in repeating the same arguments.

Replying to cehoyos:

Funny that your output shows that you tried something different that cannot work (and this has nothing to do with FFmpeg).

Why the sarcastic tone? Maybe you can tell me what you think I did wrong.
I just tried on another machine and I get the same error 3.
As you don't seem to believe me, here is the full Bash output (some parts are in French):

$ git bisect reset
Pas de bissection en cours.
$ git bisect start
$ git checkout n2.3
Note: checking out 'n2.3'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD est maintenant sur bef4d9b... RELEASE_NOTES: update
$ git bisect good
$ git checkout n2.2.7
La position précédente de HEAD était bef4d9b... RELEASE_NOTES: update
HEAD est maintenant sur 49fa398... Changelog: add entry for proresenc
$ git bisect bad
Bisecting: a merge base must be tested
$ ./configure --enable-gpl \
>               --enable-version3 \
>               --enable-nonfree \
>               --disable-runtime-cpudetect \
>               --disable-ffserver \
>               --disable-ffplay \
>               --disable-encoders \
>               --disable-decoders \
>               --disable-filters \
>               --disable-debug \
>               --cpu=$(get_cpu_arch) &> /dev/null
$ make -j 8 &> /dev/null
$ ./ffmpeg -loglevel quiet -i ../ref.mkv -map a -map v -c:v copy -c:a copy -f md5 -
MD5=d43af925610b84575d609dedf3954310
$ make distclean 
$ git bisect bad
The merge base e4a6310cce5c1663f68253c50f364fc0c055f05a is bad.
This means the bug has been fixed between e4a6310cce5c1663f68253c50f364fc0c055f05a and [bef4d9bf87f755be62c8cc35b1c333596e41b3c6].
$ echo $?
3
$ git rev-parse HEAD
e4a6310cce5c1663f68253c50f364fc0c055f05a

As you can see the last command returned an error and git bisect did not continue (the HEAD is the same).
I believe you can get the same behaviour if you try to do the same on your machine.

comment:15 follow-up: Changed 5 years ago by cehoyos

Please read my comment:11 again, you are trying the opposite of what I suggested there.

comment:16 in reply to: ↑ 15 Changed 5 years ago by ahthovaikied

Replying to cehoyos:

Please read my comment:11 again, you are trying the opposite of what I suggested there.

Sorry, I get it now: good revisions should be marked as "bad".
Thanks for your help :)
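The point about which revision to mark "good" and which "bad" can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the history is linear (the real FFmpeg history has merges, which is why git asked for a merge base to be tested), and the commit names and MD5 labels are invented. The search always converges on the first commit showing the newer behaviour, so the older behaviour has to play the role of "good" and the newer one the role of "bad", regardless of which output is actually the desired one.

```python
def first_new(history, behaves_new):
    """Binary-search a linear history for the first commit with the new behaviour.

    history[0] is assumed to show the old behaviour and history[-1] the new one,
    mirroring `git bisect good <old>` and `git bisect bad <new>`.
    """
    lo, hi = 0, len(history) - 1   # lo: old behaviour, hi: new behaviour
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if behaves_new(history[mid]):
            hi = mid               # corresponds to `git bisect bad`
        else:
            lo = mid               # corresponds to `git bisect good`
    return history[hi]

# Hypothetical linear history; the MD5 output changes at commit "c5".
commits = [f"c{i}" for i in range(8)]
md5_of = {c: ("old-md5" if i < 5 else "new-md5") for i, c in enumerate(commits)}

found = first_new(commits, lambda c: md5_of[c] == "new-md5")
print(found)
```

Swapping the two labels would ask the search to find a change going backwards in history, which is what the failed attempt above amounted to.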

So I ran a git bisect with my script, here is the bisect log:

git bisect start
# bad: [bef4d9bf87f755be62c8cc35b1c333596e41b3c6] RELEASE_NOTES: update
git bisect bad bef4d9bf87f755be62c8cc35b1c333596e41b3c6
# good: [49fa398858df1a1e425740672de5fb4819b4d947] Changelog: add entry for proresenc
git bisect good 49fa398858df1a1e425740672de5fb4819b4d947
# good: [e4a6310cce5c1663f68253c50f364fc0c055f05a] update for 2.2
git bisect good e4a6310cce5c1663f68253c50f364fc0c055f05a
# good: [8522dd380b2a0f98cfaafcf0ae64bd46ac031ae1] Merge commit 'c7603b3c243331057300337a61464e6ac4a605cb'
git bisect good 8522dd380b2a0f98cfaafcf0ae64bd46ac031ae1
# good: [da53de07306a301830b234a38bc103c6af9ded7c] tests: add adpcm trellis tests
git bisect good da53de07306a301830b234a38bc103c6af9ded7c
# bad: [a9f7972844b70c8e94520f52080884bb1507171f] Merge commit '1b04eb20f7e3f0a71f73ba91efcc3d60a435e443'
git bisect bad a9f7972844b70c8e94520f52080884bb1507171f
# good: [9025072e6c25ffd4507f0268b53743f9c4d52cd6] avcodec/h264_slice: support skipping loop filtering for non key frames
git bisect good 9025072e6c25ffd4507f0268b53743f9c4d52cd6
# bad: [e3fd263f0b73e4425192d6dd1ab18027ecaa35db] Show duration for large asf files as written in the file header.
git bisect bad e3fd263f0b73e4425192d6dd1ab18027ecaa35db
# bad: [1ebc77bc7d68748598878c08c85a571b526a729f] Merge commit '49a242687cf44f86570b706db3c5912ff06bc6c2'
git bisect bad 1ebc77bc7d68748598878c08c85a571b526a729f
# good: [0608bc65025a29b2ec56aa17dd76d76ed730be11] swresample/audioconvert: fix () in FMT_PAIR_FUNC()
git bisect good 0608bc65025a29b2ec56aa17dd76d76ed730be11
# good: [bd148ce07de08dcd03178e869bacf1e1ef6358df] Merge commit 'cfbdd7ffbd9fe14d110fd1bb89bf52f0f7bde016'
git bisect good bd148ce07de08dcd03178e869bacf1e1ef6358df
# good: [88514378bac99872265dad28072fb30160b26bfa] avcodec/ass: move playres parameters below scripttype
git bisect good 88514378bac99872265dad28072fb30160b26bfa
# bad: [1d54f5108477938268d51162be536cecd746e56a] avformat/mux: simplify ff_choose_timebase()
git bisect bad 1d54f5108477938268d51162be536cecd746e56a
# bad: [ac293b66851f6c4461eab03ca91af59d5ee4e02e] Merge commit '194be1f43ea391eb986732707435176e579265aa'
git bisect bad ac293b66851f6c4461eab03ca91af59d5ee4e02e
# bad: [194be1f43ea391eb986732707435176e579265aa] lavf: switch to AVStream.time_base as the hint for the muxer timebase
git bisect bad 194be1f43ea391eb986732707435176e579265aa
# first bad commit: [194be1f43ea391eb986732707435176e579265aa] lavf: switch to AVStream.time_base as the hint for the muxer timebase

The commit that introduced the change is this one:

commit 194be1f43ea391eb986732707435176e579265aa
Author: Anton Khirnov <>
Date:   Sun May 18 12:12:59 2014 +0200

    lavf: switch to AVStream.time_base as the hint for the muxer timebase
    
    Previously, AVStream.codec.time_base was used for that purpose, which
    was quite confusing for the callers. This change also opens the path for
    removing AVStream.codec.
    
    The change in the lavf-mkv test is due to the native timebase (1/1000)
    being used instead of the default one (1/90000), so the packets are now
    sent to the crc muxer in the same order in which they are demuxed
    (previously some of them got reordered because of inexact timestamp
    conversion).

EDIT : Email removed.
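As a rough illustration of how inexact timestamp conversion can reorder packets, consider the sketch below. It is not FFmpeg's code: the rescale and mux_time helpers, the two packets and the 1/25 time base are all invented for the example, and only the 1/1000 native Matroska time base comes from the commit message. Two packets that are demuxed in timestamp order swap places once one stream's timestamps are rounded to a coarser time base.

```python
from fractions import Fraction

def rescale(ts, src_tb, dst_tb):
    """Convert a non-negative timestamp between time bases, rounding half up."""
    exact = ts * src_tb / dst_tb
    return int(exact + Fraction(1, 2))

MS = Fraction(1, 1000)  # native Matroska time base mentioned in the commit

# (stream index, timestamp in milliseconds), listed in demux order.
packets = [(0, 1041), (1, 1042)]

def mux_time(pkt, time_bases):
    """Time in seconds as seen after rounding to the stream's muxer time base."""
    stream, ts = pkt
    tb = time_bases[stream]
    return rescale(ts, MS, tb) * tb

# Case 1: every stream keeps the native time base, so no rounding occurs.
native = sorted(packets, key=lambda p: mux_time(p, {0: MS, 1: MS}))

# Case 2 (hypothetical): stream 1 is rounded to a coarser 1/25 time base.
coarse_tbs = {0: MS, 1: Fraction(1, 25)}
coarse = sorted(packets, key=lambda p: mux_time(p, coarse_tbs))

print(native)
print(coarse)
```

In the second case the packet at 1042 ms is rounded down to 1.040 s and therefore sorts before the packet at 1041 ms, so a checksum over the interleaved packets would change even though every individual stream is untouched.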

Last edited 5 years ago by ahthovaikied

comment:17 follow-up: Changed 5 years ago by cehoyos

  • Component changed from avcodec to avformat
  • Keywords mkv added; md5 removed

I don't think this will (or should) be backported but I will leave the ticket open for a few days to give others the possibility to comment.

comment:18 in reply to: ↑ 17 Changed 5 years ago by ahthovaikied

Replying to cehoyos:

I don't think this will (or should) be backported but I will leave the ticket open for a few days to give others the possibility to comment.

Thank you. If it isn't backported, I'll just switch to the 2.3 branch.

comment:19 Changed 5 years ago by cehoyos

  • Resolution set to wontfix
  • Status changed from new to closed

Feel free to send a patch for 2.2 to the developer mailing list if you want more people to comment.

comment:20 Changed 5 years ago by ahthovaikied

I am having a similar problem with some files, but this time it is reproducible with git master. I'll open a new ticket.
