Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#1876 closed defect (invalid)

vc1dec.c multi-threading decode crash issue

Reported by: DonMoir Owned by:
Priority: important Component: avcodec
Version: unspecified Keywords: vc1 crash
Cc: onemda@gmail.com Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

In vc1_decode_frame there is:

if (ff_msmpeg4_decode_init(avctx) < 0

ff_vc1_decode_init_alloc_tables(v) < 0)

If this entire statement is not locked down in a multi-threaded multi-video environment, then crash and burn.

No way I know of to reproduce crash using ffmpeg command line tools.

The way I produce crash is to pump more than one WMV3 files at ffmpeg simultaneously. Multiple instances of a file are using for playback, thumbnails, and other things. Then when calling avcodec_decode_video2 you will get spurious crashes depending on the current state of things.

All other file formats I have tested seem to be ok. Just any format that makes use of vc1_decode_frame is suspect.

The file I used for testing is located here: (283 MB)

http://sms.pangolin.com/temp/WMV3_vc1_decode_frame_crash.wmv

Change History (40)

comment:1 Changed 4 years ago by cehoyos

  • Priority changed from critical to important

Please provide a backtrace etc. as explained on http://ffmpeg.org/bugreports.html

comment:2 Changed 4 years ago by cehoyos

  • Keywords vc1 crash added

comment:3 follow-up: Changed 4 years ago by DonMoir

There is not going to be any backtrace to look at as you can't get it to crash with ffmpeg command line tools and I am using a windows app to create the multi-threaded situation.

Beyond that, crashes are spurious and can be anywhere. This is do to memory overwrites or similar.

Mostly though, the point of crash is at avcodec_decode_video2 and for reasons I have already stated.

Last edited 4 years ago by DonMoir (previous) (diff)

comment:4 Changed 4 years ago by DonMoir

If you think you have of a way I can reproduce the problem with your tools then let me know.

comment:5 in reply to: ↑ 3 Changed 4 years ago by cehoyos

Replying to DonMoir:

There is not going to be any backtrace to look at as you can't get it to crash with ffmpeg command line tools and I am using a windows app to create the multi-threaded situation.

Could you explain why "using a windows app" makes providing a backtrace not going to happen?

comment:6 follow-up: Changed 4 years ago by DonMoir

Can you explain how I am suppose to create a backtrace with a windows app using MSVC where the MinGW debug symbols are not compatible? The only backtrace I get outside of a few app calling functions is a crash in avcodec_decode_video2 but can happen elsewhere and spurious as mentioned.

A thread problem has already been acknowledged in the mentioned code as well.

If you want an asm listing I can give that but it is probably not necessary.

Last edited 4 years ago by DonMoir (previous) (diff)

comment:7 in reply to: ↑ 6 Changed 4 years ago by cehoyos

Replying to DonMoir:

Can you explain how I am suppose to create a backtrace with a windows app using MSVC where the MinGW debug symbols are not compatible?

Compile with MSVC to get debug symbols.

comment:8 follow-ups: Changed 4 years ago by DonMoir

ffmpeg c99 code is not compatible with MSVC c++ code. That is MSVC won't compile c99 code.

I have seen some mention of c99 to c89 but not dealing with that right now.

So I take it you have no tools for testing multi-threaded issues ?

But look at it this way. The crash can essentially happen anywhere and just all depends. So you want a worthless crash report when we already know where the problem is ?

Last edited 4 years ago by DonMoir (previous) (diff)

comment:9 in reply to: ↑ 8 Changed 4 years ago by cehoyos

Replying to DonMoir:

ffmpeg c99 code is not compatible with MSVC c++ code.

FFmpeg is tested regularly with MSVC compiler, please see http://fate.ffmpeg.org/
(And of course http://ffmpeg.org/platform.html#Microsoft-Visual-C_002b_002b)

comment:10 in reply to: ↑ 8 Changed 4 years ago by cehoyos

Replying to DonMoir:

So you want a worthless crash report when we already know where the problem is ?

Patches that fix bugs are even more welcome than bug reports, you can send them directly to ffmpeg-devel, no need to open a ticket, just include an explanation in your mail.

comment:11 follow-up: Changed 4 years ago by DonMoir

Knowing where the problem is and getting in the optimal fix are a couple different things. I was hoping for more input on mailing list but only minimal response. I can kind of understand that since most do not use multi-video and multi-threaded in combination.

If there is a consensus on locking the mentioned code statement then the fix is very easy.

If there is not a consensus then we are back to square one wondering what the best fix is.

comment:12 in reply to: ↑ 11 Changed 4 years ago by cehoyos

Replying to DonMoir:

Knowing where the problem is and getting in the optimal fix are a couple different things. I was hoping for more input on mailing list but only minimal response.

A safe way of getting more input is sending a patch that fixes your crash.

comment:13 Changed 4 years ago by DonMoir

What you call my crash has already been fixed locally and just trying to relate to you that crash is there waiting to happen to someone else. As you have said many times, "Crashes are important to us".

When will I have the time to read thru the patch rules, the submission rules, and the FATE rules? I just don't know.

Several days behind now tracking this issue.

comment:14 Changed 4 years ago by cehoyos

  • Resolution set to needs_more_info
  • Status changed from new to closed

Please reopen this ticket if you can explain how the crash you are seeing can be reproduced or if you can provide a backtrace.

comment:15 Changed 4 years ago by DonMoir

It can be hard to reproduce and it may take me about 10 tries. It came to me as a bug from a beta tester where it would happen on occasion. So then I wrote some code to trigger it more often. Running, opening multiple instances of the above file at the same time is the best way I can reproduce it. It does not happen anywhere else except in code that calls vc1_decode_frame. The crash can be anywhere because of memory overwrites etc.

My only current way to fix this is to have a unique build. After looking around some more and hints from Hendrik, it appears the fix is best put in vc1_decode_frame.

The problem code in vc1_decode_frame is:

    if (!s->context_initialized) {
        if (ff_msmpeg4_decode_init(avctx) < 0 || ff_vc1_decode_init_alloc_tables(v) < 0)
            goto err;
        s->low_delay = !avctx->has_b_frames || v->res_sprite;
        if (v->profile == PROFILE_ADVANCED) {
            s->h_edge_pos = avctx->coded_width;
            s->v_edge_pos = avctx->coded_height;
        }
    }

The offending statement is:

        if (ff_msmpeg4_decode_init(avctx) < 0 || ff_vc1_decode_init_alloc_tables(v) < 0)

After adding avpriv_lock_codec and avpriv_unlock_codec since they don't exist.

A couple ways to organize the fix but this way produces no addional testing or assignments:

    if (!s->context_initialized) {
        avpriv_lock_avcodec ();
        if (ff_msmpeg4_decode_init(avctx) < 0 || ff_vc1_decode_init_alloc_tables(v) < 0) {
            avpriv_unlock_avcodec ();
            goto err;
        }
        avpriv_unlock_avcodec ();
        s->low_delay = !avctx->has_b_frames || v->res_sprite;
        if (v->profile == PROFILE_ADVANCED) {
            s->h_edge_pos = avctx->coded_width;
            s->v_edge_pos = avctx->coded_height;
        }
    }

I don't like this way with the 2 unlocks and I would probably restructure to an initialize_context function but trying to keep it straight forward here.

If there is any way else to fix it with less impact, I don't know.

Version 4, edited 4 years ago by DonMoir (previous) (next) (diff)

comment:16 Changed 4 years ago by DonMoir

  • Resolution needs_more_info deleted
  • Status changed from closed to reopened

comment:17 Changed 4 years ago by heleppkes

I would suggest to simply submit a patchset which introduces the new lock functions, and uses them here (separate patches). We can discuss alternative cleanup ideas on the mailing list better then here.

comment:18 Changed 4 years ago by cehoyos

  • Resolution set to needs_more_info
  • Status changed from reopened to closed

comment:19 follow-up: Changed 4 years ago by DonMoir

There is not much more info I can provide and it appears being arrogant is more important than a crash. For anyone who actually understands the issue, the detail provided is more than enough.

comment:20 Changed 4 years ago by richardpl

  • Analyzed by developer set
  • Cc onemda@gmail.com added
  • Reproduced by developer set
  • Resolution needs_more_info deleted
  • Status changed from closed to reopened

Please stop keep trying to close bugs which are not yet fixed.

comment:21 in reply to: ↑ 19 Changed 4 years ago by cehoyos

Replying to DonMoir:

There is not much more info I can provide

Is there a backtrace that I missed?
Or any code a developer wanting to fix the problem could test?

To me it appears as if you did not provide any relevant information at all.

and it appears being arrogant is more important than a crash.

Yes, this is exactly the feeling I have about your reports.
You never provide any necessary information (including falsely pretending this isn't possibly, see above), instead you expect the developers to fill in the missing details. Combined with the fact that you refuse to send patches (that you say are easy to produce or you have even already tested them but time does not allow you to send them) and your constant ignoring of mailing list rules makes your attitude truly a seldom example of arrogance!

Last edited 4 years ago by cehoyos (previous) (diff)

comment:22 follow-up: Changed 4 years ago by DonMoir

Yeah Carl, to you it probably does seem like there is not enough detail. For a programmer who understands it is more than enough.

If you had sufficient tools so I could show you the problem then even you could understand it.

comment:23 in reply to: ↑ 22 Changed 4 years ago by cehoyos

Replying to DonMoir:

Yeah Carl, to you it probably does seem like there is not enough detail. For a programmer who understands it is more than enough.

Thank you!

If you had sufficient tools so I could show you the problem then even you could understand it.

Could you tell me which tools I need?

comment:24 follow-up: Changed 4 years ago by DonMoir

By the way, I have many more things to do then dealing with ffmpeg. Finding bugs in ffmpeg can take a lot of time. Reporting them sometimes takes even more time then it took to find them.

I could spend all the time working just with ffmpeg but I can't afford to do that.

It seems quite reasonable to me that if I point you to the actual bug which may have taken days to find, someone could spend an hour on it rather than wasting time on this ridiculous correspondence.

You would need tools that can spin off multiple instances of a file in multiple threads and play them at the same time for this particular problem.

Even then it can be hard to reproduce. Sometimes I can get it to crash right away and sometimes not. But I can always get it to crash.

Pretending my ass. The crash report is not very relevant but I don't expect you to understand that either.

Last edited 4 years ago by DonMoir (previous) (diff)

comment:25 in reply to: ↑ 24 Changed 4 years ago by cehoyos

Replying to DonMoir:

By the way, I have many more things to do then dealing with ffmpeg.

Guess what: You are not the only one.

Finding bugs in ffmpeg can take a lot of time. Reporting them sometimes takes even more time then it took to find them.

I already told you numerous times: It is ok if you add an analysis to a report, it is not ok if the analysis replaces the actual report.

I could spend all the time working just with ffmpeg but I can't afford to do that.

It seems quite reasonable to me that if I point you to the actual bug which may have taken days to find, someone could spend an hour on it rather than wasting time on this ridiculous correspondence.

Yes, I agree: Instead of sharing insults, please provide a little more information!
And apart from that: What should the developer do this hour? Don't you think it would make much more sense to work on a ticket that contains all necessary information? Or one where the reporter makes us believe he is actually interested in helping?

You would need tools that can spin off multiple instances of a file in multiple threads and play them at the same time for this particular problem.

Names?
Links?

Even then it can be hard to reproduce. Sometimes I can get it to crash right away and sometimes not. But I can always get it to crash.

Then please provide the backtrace when it crashes, consider reading http://ffmpeg.org/bugreports.html if you believe I invented the idea of a backtrace because you provide such useful reports.

Pretending my ass. The crash report is not very relevant but I don't expect you to understand that either.

See above.

comment:26 Changed 4 years ago by DonMoir

I have no clue what tools you can use. Someone in the ffmpeg group should probably add it to the list of things to do.

The developer will know what to do as outlined above.

comment:27 Changed 4 years ago by DonMoir

Which crash report would you like ? pick a number 1 thru 10. It can crash anywhere do to memory overwrites and so not relevant.

I have also made steps in the direction of providing patches. I now have a linux machine with all necessary build tools. But still patches from me will have to wait.

comment:28 Changed 4 years ago by thegeek

I suggest using a mingw-compiled gdb to produce backtraces, this has worked for me before.
I think you should be able to run your custom reproduction tools under gdb and then have gdb catch the exception (hopefully with backtrace) when it occurs.

comment:29 Changed 4 years ago by DonMoir

Can't do that geek, because the app is a windows app and complex. Again stack trace in this scenario is not useful anyway.

comment:30 Changed 4 years ago by cehoyos

MSVC compilation is regularly tested, you can provide Windows backtraces now.

Which of course brings me to the next question: Why do you need a Linux box?

comment:31 Changed 4 years ago by DonMoir

Yeah I suppose if I wanted to mess around with c99-to-c89 and spend more time on dealing with that I could do that. But if c99-to-c89 is so bullet proof why don't you just convert over all of ffmpeg to c89.

Most people I know went to c++ from c89 and c99 did not even exist back then. Even if you did not want to use classes etc, c++ is a better c.

The ffmpeg group chose to use the syntax version of c that is used less often and so all hassle with that has been passed on to everyone else.

linux gives me a cleaner and more well defined environment, grabbing libraries, etc for dealing with ffmpeg rather than dealing with msys.

Last edited 4 years ago by DonMoir (previous) (diff)

comment:32 Changed 4 years ago by DonMoir

Appears to be fixed. I don't have a WMV3 file where the width and height changes so could not test very well in vc1_decode_frame when it has to be reinitialized but I faked it and seems ok.

comment:33 Changed 4 years ago by cehoyos

  • Resolution set to invalid
  • Status changed from reopened to closed

comment:34 follow-up: Changed 3 years ago by michael

  • Resolution changed from invalid to fixed

Ticket reporter confirmed that this has been fixed
The ticket may be invalid too but we should care more about what is fixed and what not.

comment:35 in reply to: ↑ 34 Changed 3 years ago by cehoyos

Replying to michael:

Ticket reporter confirmed that this has been fixed

How can I backport the (important looking) fix to our release branches?

comment:36 Changed 3 years ago by DonMoir

The problem was, it was known to be a problem and god knows I spent a lot of time in mailing list and here on it and tracking it down, but was never closed properly and so difficult to tell exactly who fixed it and what release. You can probably search the mailing list to find more info if you want. It is indeed fixed and I beat the hell out of it with hundreds of files.

comment:37 Changed 3 years ago by DonMoir

By the way, I already mentioned this above, but in this case, neither command line applied and a back trace was useless. Command line tools were not capable of reproducing it because of their single file instance nature. Back trace was useless because it could crash anywhere in totally unrelated ffmpeg code or in my own code. It varied every time and it was nuts.

While it was hard to reproduce, it would happen every now and then with normal application usage. I ended up having to automate it so I could get it to happen at will. I wrote some code that would open up many instances of an offending file at the same time and just kept beating on it.

comment:38 Changed 3 years ago by cehoyos

  • Analyzed by developer unset
  • Reproduced by developer unset
  • Resolution changed from fixed to invalid

Sorry but this ticket always costs me time when scanning through old reports.

It should be set to "fixed" if either:

  • The commit fixing the issue is known

or

  • A way to reproduce the issue with old versions of FFmpeg is explained

or

comment:39 Changed 3 years ago by DonMoir

The mailing list discussion starts here with comments from Michael, Hendrik, Reimar, myself, and possibly others.

http://ffmpeg.org/pipermail/ffmpeg-devel/2012-October/133043.html

Toward the end of that discussion, it points to this ticket. The problem was understood by the people involved.

Last edited 3 years ago by DonMoir (previous) (diff)

comment:40 Changed 3 years ago by DonMoir

o - The person who made the commit fixing the issue should have posted that here. This ticket was created so it could be made official and documented

o - A way to reproduce the issue was stated above in the initial comment, but your tools are too weak.

o - The source of the problem was pointed to in the initial comment.

o - A backtrace was useless and stated many times.

Note: See TracTickets for help on using tickets.