Opened 7 years ago
Closed 4 years ago
#3651 closed enhancement (wontfix)
UT Video Codec is inefficient compared to libutvideo
Reported by: | Zerowalker | Owned by: | |
---|---|---|---|
Priority: | wish | Component: | avcodec |
Version: | git-master | Keywords: | utvideo |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | yes |
Analyzed by developer: | no | | |
Description
Summary of the bug:
I'm not really sure whether I am supposed to post this here, as it's not really a bug, but here goes.
LAV Filter uses FFmpeg for decoding, hence I am reporting it here.
The decoding performance for Lagarith and UT Video Codec is extremely bad; most of the time it's over 100% slower.
I originally thought it was faster; either I was mistaken or something has changed.
However, it is worth noting that the original Lagarith decoder is limited to 2 threads. Comparing the performance makes this insignificant, though, as FFmpeg uses more threads and still doesn't keep pace.
How to reproduce:
Pretty sure you can just use:
ffmpeg -i "lagarith.avi" -c:v rawvideo "Raw.avi"
i.e. just decode a Lagarith file to raw video and you will see the performance, then compare it to using the original decoder.
Change History (37)
comment:1 Changed 7 years ago by cehoyos
- Priority changed from normal to wish
comment:2 Changed 7 years ago by Zerowalker
What command should i run?
Can't really do "rawvideo" as the HDD will be the bottleneck then.
comment:3 Changed 7 years ago by cehoyos
- Keywords utvideo added
You reported that the FFmpeg decoders are slower than the original decoders: How do you know and how much slower are they?
comment:4 Changed 7 years ago by Zerowalker
I don't know the precise numbers, but I have used LAV Filter, which uses FFmpeg to decode, so I could compare playback performance simply by watching CPU usage.
comment:5 Changed 7 years ago by cehoyos
- Reproduced by developer set
- Status changed from new to open
- Summary changed from Lagarith and UT Video Codec has Bad Performance to UT Video Codec is inefficient compared to libutvideo
- Version changed from unspecified to git-master
While the native utvideo decoder is significantly faster here than libutvideo (because it uses eight threads but libutvideo only four afaict), I can confirm that the libutvideo decoder is much more efficient (it takes fewer CPU cycles for the same input).
comment:6 Changed 7 years ago by Zerowalker
Hmm, this is very weird.
For example, if I play back a UT Video Codec video of a certain caliber, it takes about 50% CPU usage for me (I have 4 cores, so 2 x 100%), while the original decoder only uses about 1 core.
The same goes for everything I tried: the original decoder won by quite a large margin, and the same goes for Lagarith.
To me that should mean it's faster with the same number of threads, in pure efficiency, right?
I can't even use 8 threads optimally, as I only have 4 cores without HT.
Thanks
comment:7 Changed 7 years ago by cehoyos
As said, please post numbers.
If you don't want to post any numbers, please accept my tests.
comment:8 Changed 7 years ago by Zerowalker
But I don't know what numbers you want; what commands do you want me to run?
If possible, I would like to try LAV Filter for decoding, and also the original decoder, and compare the CPU usage.
Thanks
comment:9 Changed 7 years ago by cehoyos
I tested with the following file:
$ ffmpeg -f lavfi -i testsrc=s=hd1080 -vcodec utvideo -t 60 out.avi
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
built on May 24 2014 11:12:43 with gcc 4.7 (SUSE Linux)
configuration: --cc='gcc -m32'
libavutil 52. 86.100 / 52. 86.100
libavcodec 55. 64.100 / 55. 64.100
libavformat 55. 40.100 / 55. 40.100
libavdevice 55. 13.101 / 55. 13.101
libavfilter 4. 5.100 / 4. 5.100
libswscale 2. 6.100 / 2. 6.100
libswresample 0. 19.100 / 0. 19.100
Input #0, avi, from 'out.avi':
Metadata:
encoder : Lavf55.40.100
Duration: 00:01:00.00, start: 0.000000, bitrate: 171716 kb/s
Stream #0:0: Video: utvideo (ULRG / 0x47524C55), rgb24, 1920x1080, 171824 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf55.40.100
Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
Metadata:
encoder : Lavc55.64.100 rawvideo
Stream mapping:
Stream #0:0 -> #0:0 (utvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0xb1fca40] Encoder did not produce proper pts, making some up.
frame= 1500 fps=162 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=69.881s
bench: maxrss=71984kB
real 0m9.276s
user 1m9.883s
sys 0m0.707s
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
built on May 24 2014 11:17:02 with gcc 4.7 (SUSE Linux)
configuration: --cc='gcc -m32' cxx='gcc -m32' --enable-libutvideo --disable-decoder=utvideo --enable-gpl
libavutil 52. 86.100 / 52. 86.100
libavcodec 55. 64.100 / 55. 64.100
libavformat 55. 40.100 / 55. 40.100
libavdevice 55. 13.101 / 55. 13.101
libavfilter 4. 5.100 / 4. 5.100
libswscale 2. 6.100 / 2. 6.100
libswresample 0. 19.100 / 0. 19.100
libpostproc 52. 3.100 / 52. 3.100
Input #0, avi, from 'out.avi':
Metadata:
encoder : Lavf55.40.100
Duration: 00:01:00.00, start: 0.000000, bitrate: 171716 kb/s
Stream #0:0: Video: utvideo (ULRG / 0x47524C55), bgr24, 1920x1080, 171824 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf55.40.100
Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
Metadata:
encoder : Lavc55.64.100 rawvideo
Stream mapping:
Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0xac54c00] Encoder did not produce proper pts, making some up.
frame= 1500 fps=133 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=40.733s
bench: maxrss=24320kB
real 0m11.317s
user 0m40.739s
sys 0m2.059s
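These logs mix the output of ffmpeg's -benchmark option with the shell's time builtin. A small helper that pulls out the figures compared in this ticket could look like the sketch below. It is hypothetical and not part of FFmpeg; the regular expressions assume exactly the line formats shown in these logs.

```python
import re
from typing import Dict

# Patterns for the fields printed by `ffmpeg -benchmark` and by the shell's
# `time` builtin, in the formats shown in the logs above.
_FPS = re.compile(r"\bfps=\s*([\d.]+)")
_UTIME = re.compile(r"bench: utime=([\d.]+)s")
_TIME = re.compile(r"\b(real|user|sys)\s+(\d+)m([\d.]+)s")


def parse_benchmark(log: str) -> Dict[str, float]:
    """Extract fps, bench utime and real/user/sys times (in seconds) from a log."""
    result: Dict[str, float] = {}
    m = _FPS.search(log)
    if m:
        result["fps"] = float(m.group(1))
    m = _UTIME.search(log)
    if m:
        result["utime"] = float(m.group(1))
    # `time` prints e.g. "user 1m9.883s": convert minutes + seconds to seconds.
    for name, minutes, seconds in _TIME.findall(log):
        result[name] = int(minutes) * 60 + float(seconds)
    return result


if __name__ == "__main__":
    sample = ("frame= 1500 fps=162 q=0.0 Lsize=N/A bench: utime=69.881s "
              "bench: maxrss=71984kB real 0m9.276s user 1m9.883s sys 0m0.707s")
    print(parse_benchmark(sample))
```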
comment:10 follow-up: ↓ 11 Changed 7 years ago by Zerowalker
Is it possible to make a benchmark using the same file?
You see, I have a file that's UT Video RGB.
With LAV Filter it uses a bit above 50% at its 30 fps speed.
With UT Video Codec it uses 25% all the time.
LAV Filter is possibly faster, but at twice the cost for me.
(25% = 1 core)
If both use the same number of threads, won't LAV lose by quite a margin for you as well?
comment:11 in reply to: ↑ 10 Changed 7 years ago by cehoyos
Replying to Zerowalker:
Is it possible to make a benchmark using the same file?
Isn't this what I have done?
Or do I misunderstand?
You see, I have a file that's UT Video RGB.
That is what I tested or am I missing something?
Do you agree that in my comment:5 and my comment:9 I confirm your original post or is there still something unconfirmed about your utvideo issue?
As said the lagarith issue is different and while performance improvements are always possible, the (non-portable and non-future-proof) original decoder will probably always be faster than our (portable and future-proof) lagarith decoder.
comment:12 follow-up: ↓ 13 Changed 7 years ago by Zerowalker
Oh, wait, so you are saying:
Your UT Video Codec is faster because of more threads, but slower with the same number, meaning it has less efficiency overall (if I am not mistaken?)
Lagarith is slower because of some issue, which I don't understand; sadly that is quite bad, as it's a lot slower, but since you understand what I mean and still see the issue, I can only agree that it's not possible for it to be better in its current state.
If this is correct, then that indeed confirms my issue.
The only thing left would be a suggestion to improve the Lagarith decoder further, if possible, to bring it closer to the original, but that is probably something you will do anyway if you find the time and think it worthwhile :)
Thanks
comment:13 in reply to: ↑ 12 ; follow-up: ↓ 17 Changed 7 years ago by cehoyos
Replying to Zerowalker:
Your UT Video Codec is faster because of more threads, but slower with the same number, meaning it has less efficiency overall (if I am not mistaken?)
Isn't that what I wrote in comment:5?
Lagarith is slower because of some issue, which I don't understand
The original codec implementation uses floating point arithmetic which will fail on processors != x86 and might fail if you use another compiler. FFmpeg's implementation is fixed-point (which is what you expect for a lossless codec) meaning you can compile it for any hardware with any (non-broken) C compiler and you will always get correct (bitexact) output. Fixed-point arithmetic is often slower than floating-point on typical hardware.
See also (for example):
https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2009-September/079680.html
http://mod16.org/hurfdurf/?p=142
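The determinism argument can be seen in a few lines. This is a generic illustration, not Lagarith's range coder: a floating-point result can depend on evaluation order and intermediate precision, both of which may vary with the compiler, optimization flags such as -ffast-math, or the FPU (x87 vs. SSE). Fixed-point values are plain integers, so every platform produces the same bits. The choice of 16 fractional bits is arbitrary.

```python
FRAC_BITS = 16          # 16 fractional bits, chosen only for this illustration
ONE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Round a real number to the nearest fixed-point integer."""
    return round(x * ONE)


def fixed_mul(x: int, y: int) -> int:
    """Multiply two fixed-point values; the wide product is shifted back (floor)."""
    return (x * y) >> FRAC_BITS


def float_sums(a: float, b: float, c: float):
    """The same three-term sum evaluated in two orders with floats."""
    return (a + b) + c, a + (b + c)


def fixed_sums(a: float, b: float, c: float):
    """The same sum in fixed point: integer addition is exact and associative."""
    fa, fb, fc = to_fixed(a), to_fixed(b), to_fixed(c)
    return (fa + fb) + fc, fa + (fb + fc)


if __name__ == "__main__":
    print(float_sums(0.1, 0.2, 0.3))  # two different doubles
    print(fixed_sums(0.1, 0.2, 0.3))  # two identical integers
```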
comment:14 follow-up: ↓ 15 Changed 7 years ago by Zerowalker
Actually, in comment:5 you say the opposite of what I mean.
I am saying: doesn't the Native decoder use fewer threads and still run much faster than FFmpeg?
I actually heard, or read, about this: people pointed out that it may not always decode correctly, but I could never get a real answer about it, so it ended up being ("a bug that probably has been solved").
But now that you confirm it, this makes things a bit scary, as I use Lagarith a bit.
I knew floating point was super fast but lacking in precision in some cases (not a programmer here ;P), but I didn't know the difference in speed could be this big.
But just to confirm: my CPU is an Intel i5 760, x64.
Is this non-x86 or not? (I know it's x64, but I also think it belongs to the x86 family, so could you please clarify.)
Because if I use the Native decoder, will I always get the expected results with this CPU, and does the problem only occur with other builds and other systems (ARM etc., I am guessing)?
If so, that is "okay" for me, though indeed not all right for a lossless codec, but it's hard to give up on the codec: in some cases it yields unparalleled results (pixelated game capture), but pretty much only there.
Sorry for the many extra questions, but you certainly explain it well; I hope I am not wasting too much of your time!
comment:15 in reply to: ↑ 14 Changed 7 years ago by cehoyos
Replying to Zerowalker:
Actually, in comment:5 you say the opposite of what I mean.
I am saying: doesn't the Native decoder use fewer threads and still run much faster than FFmpeg?
On the FFmpeg bug tracker the native utvideo decoder is of course the libavcodec utvideo decoder (as opposed to libutvideo).
comment:16 Changed 7 years ago by Zerowalker
Oh, so I got it backwards, Native = FFmpeg?
If so, then we are in agreement :)
comment:17 in reply to: ↑ 13 ; follow-up: ↓ 19 Changed 7 years ago by michael
Replying to cehoyos:
Replying to Zerowalker:
[...]
Lagarith is slower because of some issue, which I don't understand
The original codec implementation uses floating point arithmetic which will fail on processors != x86 and might fail if you use another compiler. FFmpeg's implementation is fixed-point (which is what you expect for a lossless codec) meaning you can compile it for any hardware with any
That's not an excuse to be slower on x86, though; also, making softfloat_mul() 30% faster by using floats has no measurable effect on the overall speed (with the file I tested).
Did someone compare the decoders part by part, speed-wise?
comment:18 follow-up: ↓ 21 Changed 7 years ago by Zerowalker
The only comparison I have done is on playback, looking at CPU usage etc.
I also have a file that's barely playable; with the Lagarith decoder I think it's playable, but it may drop frames, I'm not truly sure.
But with LAV Filter, it's so slow in some parts that I would call it unwatchable.
That's where I noticed the huge difference, but I have no numbers on it.
(The slow parts are noisy content at 1920x1080, so it should be quite possible to make a test bench with lots of noise.)
comment:19 in reply to: ↑ 17 Changed 7 years ago by cehoyos
Replying to michael:
Did someone compare the decoders part by part, speed-wise?
I don't know how to test the original Lagarith decoder / how to do a performance comparison.
comment:20 Changed 7 years ago by cehoyos
Lagarith performance can be tested with MPlayer and -vo gl (-vo null crashes currently):
The dll in http://samples.ffmpeg.org/V-codecs/lagarith/ is 10% faster for lagarith.avi (RGB) but much slower than FFmpeg for lagarith422.avi (even if I do the conversion from yuv422p to rgb that lagarith seems to do internally). It is possible that the dll is outdated, I don't know how to find out.
For a better test, a longer and larger RGB sample would be needed.
comment:21 in reply to: ↑ 18 ; follow-up: ↓ 23 Changed 7 years ago by michael
Replying to Zerowalker:
The only comparison I have done is on playback, looking at CPU usage etc.
I also have a file that's barely playable; with the Lagarith decoder I think it's playable, but it may drop frames, I'm not truly sure.
But with LAV Filter, it's so slow in some parts that I would call it unwatchable.
Where can I find this file?
comment:22 Changed 7 years ago by cehoyos
Lagarith version 1323 also works with MPlayer and is >20% faster than FFmpeg's decoder. The performance disadvantage of FFmpeg is apparently not as bad for Lagarith as it is for utvideo.
comment:23 in reply to: ↑ 21 Changed 7 years ago by Zerowalker
Replying to michael:
Where can I find this file?
It's just a video I made myself in After Effects, nothing special or publicly available.
I am sure any file will show the same results: just make a video with lots of noise and movement at 1080p or more, then compare Lagarith with FFmpeg.
cehoyos, so utvideo is worse off than the Lagarith decoder; do you mean efficiency in a single thread, or overall?
comment:24 Changed 4 years ago by richardpl
- Resolution set to fixed
- Status changed from open to closed
libutvideo is no more, and I am going to push a patch that makes the utvideo decoder faster.
comment:25 Changed 4 years ago by cehoyos
- Resolution fixed deleted
- Status changed from closed to reopened
Implemented for median prediction in ea93052db3594f93f2d10be085a770184da0513d and 68e5598e22b6b51cd796b55c4111ccd1638474d9
comment:26 Changed 4 years ago by richardpl
- Resolution set to wontfix
- Status changed from reopened to closed
There is no way to do the same for left prediction. Left prediction is not SIMDable. And there are no other predictions available.
Please reopen only if you can provide real benchmark numbers proving that the decoder is slower than the reference one, along with steps to reproduce.
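Why left prediction resists vectorization can be seen from what decoding it involves. The sketch below works on one plane of bytes and assumes the first pixel is predicted from 0x80; it illustrates the technique only and is not FFmpeg's utvideodec code. Encoding computes each residual from input bytes alone, so all residuals are independent. Decoding is a running sum modulo 256, where every output byte needs the byte decoded just before it. That loop-carried dependency is what makes a straightforward SIMD version hard.

```python
from typing import List

START = 0x80  # assumed predictor for the first pixel of a plane/slice


def left_encode(pixels: List[int], start: int = START) -> List[int]:
    """Left-prediction residuals: each byte minus its left neighbour, mod 256.
    Every residual depends only on input bytes, so they are independent."""
    prev = [start] + pixels[:-1]
    return [(p - q) & 0xFF for p, q in zip(pixels, prev)]


def left_decode(residuals: List[int], start: int = START) -> List[int]:
    """Undo left prediction with a running sum mod 256. Each output byte needs
    the byte decoded just before it: a loop-carried dependency."""
    out = []
    prev = start
    for r in residuals:
        prev = (prev + r) & 0xFF
        out.append(prev)
    return out


if __name__ == "__main__":
    row = [128, 130, 129, 255, 0, 7]
    res = left_encode(row)
    print(res)
    assert left_decode(res) == row
```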
comment:27 Changed 4 years ago by cehoyos
- Resolution wontfix deleted
- Status changed from closed to reopened
$ ffmpeg -f lavfi -i testsrc=s=hd1080 -vcodec utvideo -t 60 out.avi
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-83022-g0006384 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 4.7 (SUSE Linux)
configuration: --cc='gcc -m32'
libavutil 55. 43.100 / 55. 43.100
libavcodec 57. 71.100 / 57. 71.100
libavformat 57. 62.100 / 57. 62.100
libavdevice 57. 2.100 / 57. 2.100
libavfilter 6. 68.100 / 6. 68.100
libswscale 4. 3.101 / 4. 3.101
libswresample 2. 4.100 / 2. 4.100
Input #0, avi, from 'out.avi':
Metadata:
encoder : Lavf55.48.100
Duration: 00:01:00.00, start: 0.000000, bitrate: 171715 kb/s
Stream #0:0: Video: utvideo (ULRG / 0x47524C55), rgb24, 1920x1080, 171823 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf57.62.100
Stream #0:0: Video: wrapped_avframe, rgb24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
Metadata:
encoder : Lavc57.71.100 wrapped_avframe
Stream mapping:
Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
frame= 1500 fps=156 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A speed=6.22x
video:545kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=69.007s
bench: maxrss=80832kB
real 0m9.651s
user 1m9.011s
sys 0m0.670s
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
built on Jan 7 2017 16:23:03 with gcc 4.7 (SUSE Linux)
configuration: --cc='gcc -m32' cxx='gcc -m32' --enable-libutvideo --disable-decoder=utvideo --enable-gpl
libavutil 52. 86.100 / 52. 86.100
libavcodec 55. 64.100 / 55. 64.100
libavformat 55. 40.100 / 55. 40.100
libavdevice 55. 13.101 / 55. 13.101
libavfilter 4. 5.100 / 4. 5.100
libswscale 2. 6.100 / 2. 6.100
libswresample 0. 19.100 / 0. 19.100
libpostproc 52. 3.100 / 52. 3.100
Input #0, avi, from 'out.avi':
Metadata:
encoder : Lavf55.48.100
Duration: 00:01:00.00, start: 0.000000, bitrate: 171715 kb/s
Stream #0:0: Video: utvideo (ULRG / 0x47524C55), bgr24, 1920x1080, 171823 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf55.40.100
Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
Metadata:
encoder : Lavc55.64.100 rawvideo
Stream mapping:
Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0x9cd2c00] Encoder did not produce proper pts, making some up.
frame= 1500 fps=132 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=40.979s
bench: maxrss=22828kB
real 0m11.342s
user 0m40.982s
sys 0m2.014s
comment:28 Changed 4 years ago by richardpl
- Resolution set to worksforme
- Status changed from reopened to closed
You are comparing decoding speed with a crystal ball?
Compare single threaded decoding.
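A single-threaded run of the same benchmark only needs -threads 1 placed before -i, where it acts as an input (decoder) option. The helper below just assembles that command line. Its name and parameters are hypothetical, and it does not run anything. An external decoder library may also manage its own threads regardless of this option, so that should be checked separately.

```python
import shlex
from typing import List, Optional


def benchmark_cmd(path: str, threads: Optional[int] = None,
                  ffmpeg: str = "ffmpeg") -> List[str]:
    """Build an `ffmpeg -benchmark` command that decodes `path` to the null muxer.

    `-threads` is emitted before `-i`, so it is an input (decoder) option;
    threads=1 asks for single-threaded decoding.
    """
    cmd = [ffmpeg, "-benchmark"]
    if threads is not None:
        cmd += ["-threads", str(threads)]
    cmd += ["-i", path, "-f", "null", "-"]
    return cmd


if __name__ == "__main__":
    # Prints: ffmpeg -benchmark -threads 1 -i out.avi -f null -
    print(shlex.join(benchmark_cmd("out.avi", threads=1)))
```

The list can be passed to subprocess.run() as-is. Prefixing the shell's time, as in the logs above, adds the real/user/sys split.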
comment:29 follow-up: ↓ 31 Changed 4 years ago by Cigaes
I read in the report:
Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native)) bench: utime=69.007s bench: maxrss=80832kB real 0m9.651s user 1m9.011s sys 0m0.670s
and
Stream #0:0 -> #0:0 (libutvideo -> rawvideo) bench: utime=40.979s bench: maxrss=22828kB real 0m11.342s user 0m40.982s sys 0m2.014s
I find that rather convincing: the older ffmpeg with libutvideo decoding is 68% faster than the current one with native decoding. Is there something wrong?
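Both readings of the comment:27 logs follow from simple arithmetic. User time measures CPU cost (efficiency), while fps measures wall-clock speed. The sketch below only plugs in the numbers quoted in those logs; the helper name is made up for illustration.

```python
def percent_more(a: float, b: float) -> float:
    """How many percent larger `a` is than `b`."""
    return (a / b - 1.0) * 100.0


# Figures copied from the comment:27 logs.
native_user, lib_user = 69.011, 40.982  # CPU ("user") seconds for 1500 frames
native_fps, lib_fps = 156, 132          # frames per second (wall clock)

if __name__ == "__main__":
    # The native decoder spends about 68% more CPU time on the same frames ...
    print(f"extra CPU time (native): {percent_more(native_user, lib_user):.0f}%")
    # ... yet finishes sooner, because it spreads the work over more threads.
    print(f"extra throughput (native): {percent_more(native_fps, lib_fps):.0f}%")
```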
comment:30 Changed 4 years ago by cehoyos
- Resolution worksforme deleted
- Status changed from closed to reopened
comment:31 in reply to: ↑ 29 Changed 4 years ago by richardpl
Replying to Cigaes:
I read in the report:
Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native)) bench: utime=69.007s bench: maxrss=80832kB real 0m9.651s user 1m9.011s sys 0m0.670s
and
Stream #0:0 -> #0:0 (libutvideo -> rawvideo) bench: utime=40.979s bench: maxrss=22828kB real 0m11.342s user 0m40.982s sys 0m2.014s
I find that rather convincing: the older ffmpeg with libutvideo decoding is 68% faster than the current one with native decoding. Is there something wrong?
One gives more FPS than other.
comment:32 Changed 4 years ago by Cigaes
So I guess you are looking at the "real" time instead of the "user" time. The "user" time is usually more relevant; differences in "real" time may mean other processes getting scheduled or slow input from disk.
Still, let us assume it is not the case here. We can compute the efficiency. For libutvideo: 40.982 user for 11.342 real means 361% CPU use; for the native decoder, 69.011 user for 9.651 real means 715% CPU use.
361% versus 715% looks like ~90% of respectively 4 and 8 threads. Would this be a quad-core hyper-threaded system?
Anyway, the native decoder is still not on par with the library, this bug cannot be closed.
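The utilization figures above are CPU seconds divided by wall-clock seconds. Dividing by an assumed per-thread busy fraction (the ~90% guess) gives the implied thread count. The sketch below reproduces that calculation. Like the comment, it ignores sys time, and the function names are hypothetical.

```python
def cpu_utilization(user_s: float, real_s: float) -> float:
    """Average CPU use in percent: CPU ("user") seconds per wall-clock second."""
    return user_s / real_s * 100.0


def implied_threads(user_s: float, real_s: float, efficiency: float = 0.9) -> float:
    """Number of threads that would produce this load if each one were busy
    `efficiency` of the time (0.9 is the ~90% guess made in comment:32)."""
    return user_s / real_s / efficiency


if __name__ == "__main__":
    for name, user_s, real_s in [("libutvideo", 40.982, 11.342),
                                 ("native", 69.011, 9.651)]:
        print(f"{name}: {cpu_utilization(user_s, real_s):.0f}% CPU, "
              f"~{implied_threads(user_s, real_s):.1f} threads")
```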
comment:33 Changed 4 years ago by richardpl
I will close it until he provides numbers for single thread decoding.
comment:34 Changed 4 years ago by Cigaes
Please do not. The quoted report already shows more than 60% in user time with default options, which is significant.
comment:35 Changed 4 years ago by richardpl
Also he is only testing rgb input.
comment:36 Changed 4 years ago by Cigaes
... and the library performs better. Your point?
comment:37 Changed 4 years ago by richardpl
- Resolution set to wontfix
- Status changed from reopened to closed
Sounds like two completely independent issues to me, or am I missing something?
The original Lagarith decoder is not portable and cannot be compared to FFmpeg's decoder afaik.
Could you add some numbers and post FFmpeg console output to make this a complete ticket?