Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#8452 closed defect (needs_more_info)

slow png decode starting with ffmpeg-3.3.1

Reported by: DonMoir
Owned by:
Priority: normal
Component: undetermined
Version: unspecified
Keywords: regression
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

It had been a long time since I last used the latest ffmpeg; I had been using an old version. Recently I modified my code to use the latest ffmpeg 4.2.x. I ran a few tests with some mp4 files and they seemed OK. Then I ran a test with a 1920x1080 mov video that used AV_CODEC_ID_PNG. I noticed the CPU usage seemed high, so I cross-checked with an older ffmpeg version, and decoding was faster. I narrowed it down: the last known good version was ffmpeg-3.2.9.

Using ffmpeg 3.2.9 and ffmpeg 3.3.1 I came up with some stats. I used a 1920x1080 RGBA 30 FPS video. Windows 7, i7 CPU, 1060 GPU. I checked the time to decode 30 frames of video. This is just the decoding time and nothing else.

decoding 30 frames of png video

               threads   decode time    CPU%
ffmpeg 3.2.9      1        745ms        8-10%
ffmpeg 3.2.9      2         35ms        8-10%

ffmpeg 3.3.1      1       1500ms       10-12%
ffmpeg 3.3.1      2        540ms       18-20%
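The table can be reduced to two figures per row that are easier to compare: milliseconds per frame, and the slowdown between versions at the same thread count. A minimal sketch in Python, using only the numbers reported above; the real-time budget (1000/30 ms per frame) is derived from the stated 30 FPS, and the helper names are made up for illustration.

```python
# Decode times for 30 frames, copied from the table above (milliseconds).
FRAMES = 30
FPS = 30

timings_ms = {
    ("3.2.9", 1): 745,
    ("3.2.9", 2): 35,
    ("3.3.1", 1): 1500,
    ("3.3.1", 2): 540,
}


def per_frame_ms(total_ms, frames=FRAMES):
    """Average decode time per frame in milliseconds."""
    return total_ms / frames


def slowdown(threads, old="3.2.9", new="3.3.1"):
    """How many times slower the new version is at a given thread count."""
    return timings_ms[(new, threads)] / timings_ms[(old, threads)]


# Time available per frame if decoding must keep up with playback.
budget_ms = 1000 / FPS

for (version, threads), total in sorted(timings_ms.items()):
    frame_ms = per_frame_ms(total)
    verdict = "within" if frame_ms <= budget_ms else "over"
    print(f"ffmpeg {version}, {threads} thread(s): "
          f"{frame_ms:.2f} ms/frame ({verdict} the {budget_ms:.1f} ms budget)")

for threads in (1, 2):
    print(f"{threads} thread(s): 3.3.1 is {slowdown(threads):.2f}x slower than 3.2.9")
```

With the reported numbers, single-threaded 3.3.1 works out to 50 ms per frame, which is above the roughly 33.3 ms available per frame at 30 FPS, and about 2.0x the 3.2.9 time.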

I am assuming the problem comes from the use of the atomic functions added to pthread_frame.c in 3.3.1. I am not sure, but that seems the most likely cause.

Here is the commitdiff where that was added:
https://git.ffmpeg.org/gitweb/ffmpeg.git/commitdiff/64a31b2854c589e4f27cd68ebe3bcceb915704e5?hp=db2733256db323e4b88a34b135320f33274148e2

The ffmpeg source was downloaded from here:
https://ffmpeg.org/releases/?C=N;O=D

The best way to check this is to time the decode only, as I did, using an appropriate sample file, which I will put up. Just looking at the playback is probably not enough to verify.
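Timing "the decode only" means the clock runs only while the decoder is working, not while packets are read or frames are displayed. The sketch below shows that pattern in Python with a stand-in decoder; `decode` and `packets` are hypothetical placeholders, not part of any ffmpeg API, and in a real test `decode` would wrap the actual libavcodec calls.

```python
import time


def time_decode_only(packets, decode, frames_wanted=30):
    """Accumulate wall-clock time spent inside decode() only.

    packets       -- any iterable of packets (reading them is NOT timed)
    decode        -- hypothetical callable: takes a packet, returns the
                     number of frames it produced
    frames_wanted -- stop once this many frames have been decoded
    """
    decode_seconds = 0.0
    frames = 0
    for packet in packets:
        start = time.perf_counter()
        produced = decode(packet)
        decode_seconds += time.perf_counter() - start
        frames += produced
        if frames >= frames_wanted:
            break
    return frames, decode_seconds


def fake_decode(packet):
    """Stand-in decoder: pretends each packet yields exactly one frame."""
    return 1


frames, seconds = time_decode_only(range(100), fake_decode)
print(f"decoded {frames} frames in {seconds * 1000:.3f} ms of decode time")
```

Because the timer wraps only the decode call, demuxing and rendering overhead stay out of the measurement, which is what makes numbers from two ffmpeg versions comparable.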

The slow_png_decode_cut.mov sample is not as complex and decodes a bit faster, but it is still slow.

Change History (11)

comment:1 by DonMoir, 4 years ago

Even when cut, the files are large if they are to remain useful, so I had to upload the slow_png_decode_cut.mov sample here:
http://www.tellyvisuals.com/slow_png_decode_cut.zip

comment:2 by gdgsdg123, 4 years ago

Unsure if it's related... I seem to have observed similar behavior with x264rgb.

Tremendously slower decoding speed compared to the same clip encoded with the same parameters but different pixel format (YUV 4:4:4 8-bit).

comment:3 by Carl Eugen Hoyos, 4 years ago

Keywords: regression added
Resolution: worksforme
Status: new → closed

comment:4 by DonMoir, 4 years ago

Resolution: worksforme
Status: closed → reopened

Carl's response has got to be in the top five worst responses.

Carl says worksforme. We have to guess what that means. I assume he means he used ffplay and could not see any difference between versions. I mentioned above: "Just looking at the playback is probably not enough to verify." Using ffplay I see about the same CPU usage in either version, which is about 20 percent. If you look at the table above, that would be expected. Using a single thread the CPU usage is about the same. But it takes 1.5 seconds to decode the 30 frames. Older versions decode the 30 frames in 0.745 seconds.

I mentioned that the best way to check is to time the decode and cross-check my numbers. I assumed this would have been done by someone who knows what he is doing. This is a simple thing to do if you know what you are doing. Show your stats and OS. Is it a Windows thing? It could be some flag or something I am not doing right for the new versions, but I tried several variations. I tried avcodec_decode_video2 and avcodec_send_packet and other things. All the time was in avcodec_send_packet.

worksforme basically says there is no problem and no need to look at it further.

Before I saw the increase in CPU for PNG, it appeared to me there was a slight increase in other formats as well, maybe 2 percent. That did not concern me too much at the time. gdgsdg123 mentions he is seeing slower decoding with x264rgb. I am thinking the problem is widespread but not very noticeable for most formats, which is probably the reason it has gone unnoticed for years. PNG has a slow decode in the first place. It also appears to do more locking, which again may point to the atomic functions. The increase I see of about 10 percent could be an increase of 20 to 40 percent on slower machines. In the new versions the CPU usage increases proportionately to the complexity of the PNG. Sure, the usage increases likewise in the old versions, but in the new versions it climbs at a higher rate. It does not appear to be related directly to the PNG decoding code. Since PNG shows the problem best, it should be used for testing no matter the scope of the problem.

When using any new version of ffmpeg I test it extensively before I use it publicly. Since I was using an old API and I assumed there would be problems using the new ffmpeg, I made my code ffmpeg API independent so I can easily switch back. I ran into this problem early on in testing and ran into roadblocks when I reported it. What else is new? You benefit from my testing, but I might as well stop now if this is the kind of BS I have to deal with. I wonder how many thousands of times issues have been closed incorrectly. During my first go-around I wasted far more time reporting a problem than it would take to fix the problem. I don't have that kind of time now. If the issue is closed again without any resolution I am out of here and will fix the problems myself without reporting them.


comment:5 by Carl Eugen Hoyos, 4 years ago

Resolution: worksforme
Status: reopened → closed

comment:6 by DonMoir, 4 years ago

Why does anyone give someone who does not know a damn thing (Carl) the authority to make any decisions here? Maybe someone who knows what they are doing will do something, and maybe not.

If you're arrogant and know what you are doing, then that is fine and you have earned the right. If you're arrogant and don't know what you are doing, you're an idiot. Take that as constructive; you should know it already and I should not have to tell you that.

I guess it all means you just don't give a damn.


comment:7 by pdr0, 4 years ago

I can replicate the issue with an 8-bit PNG sequence on Windows x64.

I tested 3.2.4 vs. current. The old version is roughly 2x faster, similar to DonMoir's observations.

My observations:

  • RGB24 (suggests problem isn't alpha channel related)
  • image sequence (suggests it's not MOV or other container related)
  • 1000 frames (so test isn't too short)
  • multiple runs, repeatable (not some "outlier" or random event)
  • SSD (not I/O limited)
  • forcing -threads "x" doesn't help (where "x" is some number low or high)
  • relationship holds whether decoded as rawvideo (-c:v rawvideo), or just passed through as null (wrapped AV frame)
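A comparison like the one above can be scripted so that each build runs the same command and its complete console output is kept. The following is a rough sketch in Python, not the exact commands used in this ticket: the binary paths and the input pattern are hypothetical, and wall-clock time around the whole process includes startup and file reading, so it is coarser than timing the decode calls directly. The options used (`-hide_banner`, `-nostdin`, `-threads`, `-i`, `-f null -`) are standard ffmpeg CLI options.

```python
import shlex
import subprocess
import time


def build_null_decode_cmd(ffmpeg_path, input_path, threads=None):
    """Build a command that decodes the input and discards the frames."""
    cmd = [ffmpeg_path, "-hide_banner", "-nostdin"]
    if threads is not None:
        # Placed before -i so it applies to the input (decoder) side.
        cmd += ["-threads", str(threads)]
    cmd += ["-i", input_path, "-f", "null", "-"]
    return cmd


def time_run(cmd):
    """Run cmd; return (elapsed seconds, exit code, full console output)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode, result.stdout


# Hypothetical locations of the two builds and of the PNG sequence.
builds = ["./ffmpeg-3.2.4/ffmpeg", "./ffmpeg-git/ffmpeg"]
sequence = "frames/%04d.png"

for build in builds:
    for threads in (1, 2):
        cmd = build_null_decode_cmd(build, sequence, threads)
        # Only printed here; pass cmd to time_run() once the paths exist.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Running each printed command through `time_run` several times and comparing the elapsed times per build would give a repeatable cross-check, with the uncut console output available for a report.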

comment:8 by pdr0, 4 years ago

Resolution: worksforme
Status: closed → reopened

comment:9 by gdgsdg123, 4 years ago (in reply to comment:4)

Replying to DonMoir:

"worksforme basically says there is no problem and no need to look at it further."

"worksforme"... does not necessarily work for others.

"closed" doesn't really mean it's closed either.


Maybe some rework should be done on the descriptors... Just nobody bothered to.

comment:10 by Carl Eugen Hoyos, 4 years ago

Resolution: needs_more_info
Status: reopened → closed

Please reopen this ticket if you can reproduce it with current FFmpeg git head and if you provide the command line you tested together with the complete, uncut console output.

comment:11 by DonMoir, 4 years ago

The first thing I did was run the current git head, as I mentioned in the initial post, and that is how I became aware of the problem. Then I looked back to find the version where it became a problem. I mentioned that using ffplay and observing is not really the best way to test, at least for me. There is extra overhead in playback and it becomes something of a guessing game. It is really easy to time the decode, and then you know without a doubt. It also gives you the ability to fine-tune and get better stats.

I realize worksforme does not mean it works for others. But I have had to fight to get bugs fixed that were closed and tagged incorrectly. A few I had to mention to Michael, and they were fixed in minutes. That is much less time than the hours, days, and weeks spent arguing with Carl. So in this case worksforme and closed, with no comments and no information on how it was tested (if it was tested at all), was enough to convince me it was dead.

If I were set up for it I could probably tell you exactly where the problem is quickly, as it appears somewhat straightforward (maybe). I will do that soon, but I am busy for a few weeks. I found another problem with png which is unrelated to this thread, and some problems with hardware decoding. This was after just a few hours of testing. I don't know what all I will run into with more complete testing. One thing for sure is I am not going to waste time arguing common sense things with Carl.

PS: It does not appear to be a problem with alpha, and not directly a png problem. It appears to be a general problem. But the png case, with its inherently slower decoding and extra locking, shows the problem best.
