Opened 10 years ago

Closed 7 years ago

#3651 closed enhancement (wontfix)

UT Video Codec is inefficient compared to libutvideo

Reported by: Zerowalker Owned by:
Priority: wish Component: avcodec
Version: git-master Keywords: utvideo
Cc: Blocked By:
Blocking: Reproduced by developer: yes
Analyzed by developer: no

Description

Summary of the bug:
I'm not really sure whether I am supposed to report this here, as it's not really a bug, but here goes.

LAV Filter uses FFmpeg for decoding, hence I am reporting it here.

Decoding performance for Lagarith and UT Video Codec is extremely bad; most of the time it is over 100% slower than the original decoders.

Originally I thought it was faster; either I was mistaken or something has changed.

It is worth noting that the original Lagarith decoder is limited to 2 threads. However, the size of the performance gap makes this insignificant, as FFmpeg uses more threads and still does not keep pace.

How to reproduce:

Pretty sure you can just use:

ffmpeg -i "lagarith.avi" -vcodec rawvideo "Raw.avi"

So just decode a Lagarith file to raw video and note the performance, then compare it with the original decoder.

Change History (37)

comment:1 by Carl Eugen Hoyos, 10 years ago

Priority: normal → wish

Sounds like two completely independent issues to me, or am I missing something?
The original Lagarith decoder is not portable and cannot be compared to FFmpeg's decoder afaik.
Could you add some numbers and post FFmpeg console output to make this a complete ticket?

comment:2 by Zerowalker, 10 years ago

What command should I run?
I can't really output "rawvideo", as the HDD would then be the bottleneck.

comment:3 by Carl Eugen Hoyos, 10 years ago

Keywords: utvideo added

You reported that the FFmpeg decoders are slower than the original decoders: How do you know and how much slower are they?

comment:4 by Zerowalker, 10 years ago

I don't know the precise numbers, but I have used LAV Filter, which uses FFmpeg to decode, so I could compare playback performance simply by watching CPU usage.

http://forum.doom9.org/showthread.php?t=156191

comment:5 by Carl Eugen Hoyos, 10 years ago

Reproduced by developer: set
Status: new → open
Summary: Lagarith and UT Video Codec has Bad Performance → UT Video Codec is inefficient compared to libutvideo
Version: unspecified → git-master

While the native utvideo decoder is significantly faster here than libutvideo (because it uses eight threads but libutvideo only four afaict), I can confirm that the libutvideo decoder is much more efficient (it takes fewer CPU cycles for the same input).

comment:6 by Zerowalker, 10 years ago

Hmm, this is very weird.

For example, if I play back a UT Video Codec video of a certain caliber, it takes about 50% CPU for me (I have 4 cores, so 2 x 100%), while the original decoder only uses about 1 core.

The same goes for everything I tried: the original decoder won by quite a large margin. The same goes for Lagarith.

To me that should mean it's faster at the same thread count, in pure efficiency, right?
After all, I can't even use 8 threads optimally, as I only have 4 cores without HT.

Thanks

comment:7 by Carl Eugen Hoyos, 10 years ago

As said, please post numbers.

If you don't want to post any numbers, please accept my tests.

comment:8 by Zerowalker, 10 years ago

But I don't know what numbers you want; what commands do you want me to run?
If possible, I would like to decode with LAV Filter and with the original decoder and compare the CPU usage.

Thanks

comment:9 by Carl Eugen Hoyos, 10 years ago

I tested with the following file:

$ ffmpeg -f lavfi -i testsrc=s=hd1080 -vcodec utvideo -t 60 out.avi
$  time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
  built on May 24 2014 11:12:43 with gcc 4.7 (SUSE Linux)
  configuration: --cc='gcc -m32'
  libavutil      52. 86.100 / 52. 86.100
  libavcodec     55. 64.100 / 55. 64.100
  libavformat    55. 40.100 / 55. 40.100
  libavdevice    55. 13.101 / 55. 13.101
  libavfilter     4.  5.100 /  4.  5.100
  libswscale      2.  6.100 /  2.  6.100
  libswresample   0. 19.100 /  0. 19.100
Input #0, avi, from 'out.avi':
  Metadata:
    encoder         : Lavf55.40.100
  Duration: 00:01:00.00, start: 0.000000, bitrate: 171716 kb/s
    Stream #0:0: Video: utvideo (ULRG / 0x47524C55), rgb24, 1920x1080, 171824 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf55.40.100
    Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
    Metadata:
      encoder         : Lavc55.64.100 rawvideo
Stream mapping:
  Stream #0:0 -> #0:0 (utvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0xb1fca40] Encoder did not produce proper pts, making some up.
frame= 1500 fps=162 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=69.881s
bench: maxrss=71984kB

real    0m9.276s
user    1m9.883s
sys     0m0.707s
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
  built on May 24 2014 11:17:02 with gcc 4.7 (SUSE Linux)
  configuration: --cc='gcc -m32' cxx='gcc -m32' --enable-libutvideo --disable-decoder=utvideo --enable-gpl
  libavutil      52. 86.100 / 52. 86.100
  libavcodec     55. 64.100 / 55. 64.100
  libavformat    55. 40.100 / 55. 40.100
  libavdevice    55. 13.101 / 55. 13.101
  libavfilter     4.  5.100 /  4.  5.100
  libswscale      2.  6.100 /  2.  6.100
  libswresample   0. 19.100 /  0. 19.100
  libpostproc    52.  3.100 / 52.  3.100
Input #0, avi, from 'out.avi':
  Metadata:
    encoder         : Lavf55.40.100
  Duration: 00:01:00.00, start: 0.000000, bitrate: 171716 kb/s
    Stream #0:0: Video: utvideo (ULRG / 0x47524C55), bgr24, 1920x1080, 171824 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf55.40.100
    Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
    Metadata:
      encoder         : Lavc55.64.100 rawvideo
Stream mapping:
  Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0xac54c00] Encoder did not produce proper pts, making some up.
frame= 1500 fps=133 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=40.733s
bench: maxrss=24320kB

real    0m11.317s
user    0m40.739s
sys     0m2.059s
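A minimal Python sketch of how the two runs above can be compared. It separates speed (frames per wall-clock second) from efficiency (CPU time per frame), which is the distinction drawn in comment:5. The regular expressions assume the console format shown in this ticket; `parse_log`, `wall_fps` and `cpu_ms_per_frame` are hypothetical helper names, not part of FFmpeg.

```python
import re

# Hypothetical helper (not part of FFmpeg): pulls the figures used in this
# ticket out of a console log produced by "time ./ffmpeg -benchmark ...".
FRAMES = re.compile(r"frame=\s*(\d+)")
BENCH_UTIME = re.compile(r"bench: utime=([\d.]+)s")
TIME_LINE = re.compile(r"^(real|user|sys)\s+(\d+)m([\d.]+)s$", re.MULTILINE)


def parse_log(log: str) -> dict:
    """Return decoded frame count, bench utime and real/user/sys seconds."""
    stats = {
        "frames": int(FRAMES.search(log).group(1)),
        "utime": float(BENCH_UTIME.search(log).group(1)),
    }
    for name, minutes, seconds in TIME_LINE.findall(log):
        stats[name] = int(minutes) * 60 + float(seconds)
    return stats


def wall_fps(stats: dict) -> float:
    """Frames per wall-clock second: higher means faster."""
    return stats["frames"] / stats["real"]


def cpu_ms_per_frame(stats: dict) -> float:
    """CPU milliseconds spent per decoded frame: lower means more efficient."""
    return 1000.0 * stats["utime"] / stats["frames"]


# Excerpts of the two runs above.
NATIVE_LOG = """frame= 1500 fps=162 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
bench: utime=69.881s
real    0m9.276s
user    1m9.883s
sys     0m0.707s"""

LIB_LOG = """frame= 1500 fps=133 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
bench: utime=40.733s
real    0m11.317s
user    0m40.739s
sys     0m2.059s"""

for label, log in (("native utvideo", NATIVE_LOG), ("libutvideo", LIB_LOG)):
    s = parse_log(log)
    print(f"{label}: {wall_fps(s):.1f} fps, {cpu_ms_per_frame(s):.1f} ms CPU/frame")
```

On these excerpts it reports about 161.7 fps at 46.6 ms of CPU per frame for the native decoder versus about 132.5 fps at 27.2 ms for libutvideo: the native decoder wins on wall-clock speed and loses on efficiency.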

comment:10 by Zerowalker, 10 years ago

Is it possible to make a benchmark using the same file?

You see, I have a file that's UT Video RGB.

With LAV Filter it uses a bit above 50% CPU at its 30 fps playback speed.
With the original UT Video Codec decoder it uses 25% all the time.

LAV Filter is possibly faster, but at twice the cost for me.

(25% = 1 core)

If both use the same number of threads, won't LAV lose by quite a margin for you as well?

in reply to:  10 comment:11 by Carl Eugen Hoyos, 10 years ago

Replying to Zerowalker:

Is it possible to make a benchmark using the same file?

Isn't this what I have done?
Or do I misunderstand?

You see, I have a file that's UT Video RGB.

That is what I tested, or am I missing something?

Do you agree that in my comment:5 and my comment:9 I confirm your original post or is there still something unconfirmed about your utvideo issue?
As said the lagarith issue is different and while performance improvements are always possible, the (non-portable and non-future-proof) original decoder will probably always be faster than our (portable and future-proof) lagarith decoder.

comment:12 by Zerowalker, 10 years ago

Oh, wait, so you are saying:

Your UT Video Codec decoder is faster because of more threads, but slower at the same thread count, meaning it is less efficient overall (if I am not mistaken)?

Lagarith is slower because of some issue which I don't understand. Sadly that is quite bad, as it's a lot slower, but since you understand what I mean and still see the issue, I can only agree that it cannot be better in its current state.

If this is correct, then that indeed confirms my issue.
The only thing left would be a suggestion to improve the Lagarith decoder further, if possible, to bring it closer to the original, but that is probably something you will do anyway if you find the time and it's worth it :)

Thanks

in reply to:  12 ; comment:13 by Carl Eugen Hoyos, 10 years ago

Replying to Zerowalker:

Your UT Video Codec decoder is faster because of more threads, but slower at the same thread count, meaning it is less efficient overall (if I am not mistaken)?

Isn't that what I wrote in comment:5?

Lagarith is slower because of some issue which I don't understand

The original codec implementation uses floating point arithmetic which will fail on processors != x86 and might fail if you use another compiler. FFmpeg's implementation is fixed-point (which is what you expect for a lossless codec) meaning you can compile it for any hardware with any (non-broken) C compiler and you will always get correct (bitexact) output. Fixed-point arithmetic is often slower than floating-point on typical hardware.
See also (for example):
https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2009-September/079680.html
http://mod16.org/hurfdurf/?p=142
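To illustrate the portability argument, here is a hedged Python sketch of one common fixed-point technique: replacing a floating-point division by a precomputed integer reciprocal followed by a multiply and a shift. It is not FFmpeg's Lagarith code; the 32 fraction bits, the 16-bit operand limits and the helper names `fixed_reciprocal` and `fixed_div` are assumptions, chosen so that the result provably equals exact integer division.

```python
# Illustrative only: NOT FFmpeg's lagarith implementation, just the general
# technique of replacing a floating-point division with integer arithmetic.

SHIFT = 32  # fraction bits of the fixed-point reciprocal (assumed precision)


def fixed_reciprocal(denom: int) -> int:
    """ceil(2**SHIFT / denom), i.e. 1/denom as an unsigned 0.32 fixed-point value."""
    if not 1 <= denom < 1 << 16:
        raise ValueError("denominator must be in [1, 65535]")
    return -(-(1 << SHIFT) // denom)


def fixed_div(x: int, recip: int) -> int:
    """floor(x / denom) using one integer multiply and one shift.

    With x and denom below 2**16, the rounding error of the reciprocal is
    smaller than the gap to the next integer, so the result is exact.
    """
    if not 0 <= x < 1 << 16:
        raise ValueError("numerator must be in [0, 65535]")
    return (x * recip) >> SHIFT


r = fixed_reciprocal(3)
print(fixed_div(65535, r))  # 21845, on every CPU and with every compiler
```

Because every step is an integer operation, the output is bit-identical on x86, ARM or any other target; the cost is an extra wide multiply where the reference decoder can simply use the FPU.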

comment:14 by Zerowalker, 10 years ago

Actually, in comment:5 you say the opposite of what I mean.
What I am saying is: doesn't the native decoder use fewer threads and still run much faster than FFmpeg?

I actually heard, or read, about this: people pointed out that it may not always decode correctly, but I could never get a real answer, so it ended up being considered "a bug that has probably been solved".

But now that you confirm it, things look a bit scary, as I use Lagarith quite a bit.

I knew floating point was super fast but lacking in precision in some cases (not a programmer here ;P), but I didn't know the difference in speed could be this big.

But just to confirm: my CPU is an Intel i5 760 with x64.
Is this non-x86 or not? (I know it's x64, but I think it also belongs to the x86 family, so please clarify.)

Because if I use the native decoder with this CPU, will I always get the expected results, with the problem only occurring on other builds and other systems (ARM etc., I am guessing)?

If so, that is "okay" for me, though indeed not acceptable for a lossless codec. It's hard to give up on the codec: in some cases (pixelated game capture) it yields unparalleled results, but pretty much only there.

Sorry for the many extra questions, but you certainly explain it well; I hope I am not wasting too much of your time!

in reply to:  14 comment:15 by Carl Eugen Hoyos, 10 years ago

Replying to Zerowalker:

Actually, in comment:5 you say the opposite of what I mean.
What I am saying is: doesn't the native decoder use fewer threads and still run much faster than FFmpeg?

On the FFmpeg bug tracker the native utvideo decoder is of course the libavcodec utvideo decoder (as opposed to libutvideo).

comment:16 by Zerowalker, 10 years ago

Oh, so I got it backwards: native = FFmpeg?
If so, then we are in agreement :)

in reply to:  13 ; comment:17 by Michael Niedermayer, 10 years ago

Replying to cehoyos:

Replying to Zerowalker:

[...]

Lagarith is slower because of some issue which I don't understand

The original codec implementation uses floating point arithmetic which will fail on processors != x86 and might fail if you use another compiler. FFmpeg's implementation is fixed-point (which is what you expect for a lossless codec) meaning you can compile it for any hardware with any

That's not an excuse to be slower on x86, though. Also, making softfloat_mul() 30% faster by using floats has no measurable effect on the overall speed (with the file I tested).
Did someone compare the speed of the decoders part by part?

comment:18 by Zerowalker, 10 years ago

The only comparison I have done is during playback, looking at CPU usage etc.

I also have a file that's barely playable: with the Lagarith decoder I think it's playable but may drop frames, I'm not really sure.

But with LAV Filter it's so slow in some parts that I would call it unwatchable.

That's where I noticed the huge difference, but I have no numbers on it.

(The slow parts are noisy content at 1920x1080, so it is probably possible to build a test case with just a lot of noise going on.)

in reply to:  17 comment:19 by Carl Eugen Hoyos, 10 years ago

Replying to michael:

Did someone compare the speed of the decoders part by part?

I don't know how to test the original Lagarith decoder / how to do a performance comparison.

comment:20 by Carl Eugen Hoyos, 10 years ago

Lagarith performance can be tested with MPlayer and -vo gl (-vo null crashes currently):
The dll in http://samples.ffmpeg.org/V-codecs/lagarith/ is 10% faster for lagarith.avi (RGB) but much slower than FFmpeg for lagarith422.avi (even if I do the conversion from yuv422p to rgb that lagarith seems to do internally). It is possible that the dll is outdated; I don't know how to find out.
For a better test, a longer and larger RGB sample would be needed.

in reply to:  18 ; comment:21 by Michael Niedermayer, 10 years ago

Replying to Zerowalker:

The only comparison I have done is during playback, looking at CPU usage etc.

I also have a file that's barely playable: with the Lagarith decoder I think it's playable but may drop frames, I'm not really sure.

But with LAV Filter it's so slow in some parts that I would call it unwatchable.

Where can I find this file?

comment:22 by Carl Eugen Hoyos, 10 years ago

Lagarith version 1323 also works with MPlayer and is >20% faster than FFmpeg's decoder. The performance disadvantage with FFmpeg is apparently not as bad for Lagarith as it is for utvideo.

in reply to:  21 comment:23 by Zerowalker, 10 years ago

Replying to michael:

Where can I find this file?

It's just a video I made myself in After Effects, nothing special and not publicly available.
I am sure any such file will show the same results: just make a video with lots of noise and movement at 1080p or more, then compare Lagarith with FFmpeg.

cehoyos, so utvideo is worse off than Lagarith? Do you mean efficiency in a single thread, or overall?

comment:24 by Elon Musk, 7 years ago

Resolution: fixed
Status: open → closed

libutvideo is no more, and I am going to push a patch that makes the utvideo decoder faster.

comment:25 by Carl Eugen Hoyos, 7 years ago

Resolution: fixed
Status: closed → reopened

comment:26 by Elon Musk, 7 years ago

Resolution: wontfix
Status: reopened → closed

There is no way to do the same for left prediction: left prediction is not SIMDable. And there are no other predictions available.

Please reopen only if you can provide real benchmark numbers, with steps to reproduce, proving that the decoder is slower than the reference one.
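A minimal Python sketch of why left prediction resists vectorization, assuming the usual scheme in which each sample is stored as its difference (mod 256) from the previous sample. The start value of 128 and the function names are assumptions for illustration, not taken from the Ut Video format description or from FFmpeg's decoder.

```python
from itertools import accumulate


def left_predict(samples: list[int], start: int = 128) -> list[int]:
    """Encoder side: residual = sample - previous sample (mod 256).

    `start` stands in for the (assumed) predictor of the very first sample.
    """
    previous = [start] + samples[:-1]
    return [(s - p) & 0xFF for s, p in zip(samples, previous)]


def left_restore(residuals: list[int], start: int = 128) -> list[int]:
    """Decoder side: every output needs the output just before it."""
    out = []
    prev = start
    for r in residuals:
        prev = (prev + r) & 0xFF  # serial dependency on the previous result
        out.append(prev)
    return out


def left_restore_scan(residuals: list[int], start: int = 128) -> list[int]:
    """The same decode written as a prefix sum (running total mod 256)."""
    return [v & 0xFF for v in accumulate(residuals, initial=start)][1:]


row = [10, 20, 15, 255, 0]
res = left_predict(row)
print(res)                # [138, 10, 251, 240, 1]
print(left_restore(res))  # [10, 20, 15, 255, 0]
```

Decoding is a running sum: sample i cannot be produced until sample i-1 is known, so the loop does not split into independent per-pixel lanes the way a purely element-wise operation would.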

comment:27 by Carl Eugen Hoyos, 7 years ago

Resolution: wontfix
Status: closed → reopened

$ ffmpeg -f lavfi -i testsrc=s=hd1080 -vcodec utvideo -t 60 out.avi
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-83022-g0006384 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 4.7 (SUSE Linux)
  configuration: --cc='gcc -m32'
  libavutil      55. 43.100 / 55. 43.100
  libavcodec     57. 71.100 / 57. 71.100
  libavformat    57. 62.100 / 57. 62.100
  libavdevice    57.  2.100 / 57.  2.100
  libavfilter     6. 68.100 /  6. 68.100
  libswscale      4.  3.101 /  4.  3.101
  libswresample   2.  4.100 /  2.  4.100
Input #0, avi, from 'out.avi':
  Metadata:
    encoder         : Lavf55.48.100
  Duration: 00:01:00.00, start: 0.000000, bitrate: 171715 kb/s
    Stream #0:0: Video: utvideo (ULRG / 0x47524C55), rgb24, 1920x1080, 171823 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf57.62.100
    Stream #0:0: Video: wrapped_avframe, rgb24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
    Metadata:
      encoder         : Lavc57.71.100 wrapped_avframe
Stream mapping:
  Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
frame= 1500 fps=156 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A speed=6.22x
video:545kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=69.007s
bench: maxrss=80832kB

real    0m9.651s
user    1m9.011s
sys     0m0.670s
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
  built on Jan  7 2017 16:23:03 with gcc 4.7 (SUSE Linux)
  configuration: --cc='gcc -m32' cxx='gcc -m32' --enable-libutvideo --disable-decoder=utvideo --enable-gpl
  libavutil      52. 86.100 / 52. 86.100
  libavcodec     55. 64.100 / 55. 64.100
  libavformat    55. 40.100 / 55. 40.100
  libavdevice    55. 13.101 / 55. 13.101
  libavfilter     4.  5.100 /  4.  5.100
  libswscale      2.  6.100 /  2.  6.100
  libswresample   0. 19.100 /  0. 19.100
  libpostproc    52.  3.100 / 52.  3.100
Input #0, avi, from 'out.avi':
  Metadata:
    encoder         : Lavf55.48.100
  Duration: 00:01:00.00, start: 0.000000, bitrate: 171715 kb/s
    Stream #0:0: Video: utvideo (ULRG / 0x47524C55), bgr24, 1920x1080, 171823 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf55.40.100
    Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
    Metadata:
      encoder         : Lavc55.64.100 rawvideo
Stream mapping:
  Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0x9cd2c00] Encoder did not produce proper pts, making some up.
frame= 1500 fps=132 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=40.979s
bench: maxrss=22828kB

real    0m11.342s
user    0m40.982s
sys     0m2.014s

comment:28 by Elon Musk, 7 years ago

Resolution: worksforme
Status: reopened → closed

Are you comparing decoding speed with a crystal ball?
Compare single-threaded decoding.

Last edited 7 years ago by Elon Musk
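For the single-threaded comparison requested here, a hedged Python sketch that builds the benchmark command line. Placing `-threads 1` before `-i` makes it an input (decoder) option; whether a particular decoder, in particular the old libutvideo wrapper, honours it is an assumption worth checking. `benchmark_argv` and `run_benchmark` are hypothetical helper names.

```python
import shlex
import subprocess


def benchmark_argv(infile: str, threads: int = 1) -> list[str]:
    """Build an ffmpeg command that decodes `infile` and discards the frames.

    "-threads" placed before "-i" applies to the input, i.e. the decoder;
    "-f null -" throws the decoded frames away so disk I/O is not measured.
    """
    return ["ffmpeg", "-threads", str(threads), "-benchmark",
            "-i", infile, "-f", "null", "-"]


def run_benchmark(argv: list[str]) -> list[str]:
    """Run ffmpeg and return only its "bench:" lines (logged on stderr)."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return [line for line in done.stderr.splitlines()
            if line.startswith("bench:")]


if __name__ == "__main__":
    # Only prints the command; call run_benchmark() with ffmpeg installed.
    print(shlex.join(benchmark_argv("out.avi")))
```

Running `run_benchmark(benchmark_argv("out.avi"))` once per build (native decoder and libutvideo) gives `utime` figures that are directly comparable at one thread.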

comment:29 by Cigaes, 7 years ago

I read in the report:

  Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native))
bench: utime=69.007s
bench: maxrss=80832kB

real    0m9.651s
user    1m9.011s
sys     0m0.670s

and

  Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
bench: utime=40.979s
bench: maxrss=22828kB

real    0m11.342s
user    0m40.982s
sys     0m2.014s

I find that rather convincing: the older ffmpeg with libutvideo decoding is 68% faster than the current one with native decoding. Is there something wrong?
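The "68%" figure can be read two ways, so a small Python sketch of the arithmetic on the two "user" times quoted above may help: the native decoder needs about 68% more CPU time, which is the same as libutvideo needing about 41% less. `cpu_overhead` is a hypothetical helper name.

```python
def cpu_overhead(native_user: float, lib_user: float) -> tuple[float, float]:
    """Return (extra CPU % of native vs lib, CPU % saved by lib vs native)."""
    extra = (native_user / lib_user - 1.0) * 100.0
    saved = (1.0 - lib_user / native_user) * 100.0
    return extra, saved


extra, saved = cpu_overhead(69.011, 40.982)  # "user" times from comment:27
print(f"native needs {extra:.0f}% more CPU time")     # 68%
print(f"libutvideo needs {saved:.0f}% less CPU time")  # 41%
```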

comment:30 by Carl Eugen Hoyos, 7 years ago

Resolution: worksforme
Status: closed → reopened

in reply to:  29 comment:31 by Elon Musk, 7 years ago

Replying to Cigaes:

I read in the report:

  Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native))
bench: utime=69.007s
bench: maxrss=80832kB

real    0m9.651s
user    1m9.011s
sys     0m0.670s

and

  Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
bench: utime=40.979s
bench: maxrss=22828kB

real    0m11.342s
user    0m40.982s
sys     0m2.014s

I find that rather convincing: the older ffmpeg with libutvideo decoding is 68% faster than the current one with native decoding. Is there something wrong?

One gives more FPS than the other.

comment:32 by Cigaes, 7 years ago

So I guess you are looking at the "real" time instead of the "user" time. The "user" time is usually more relevant; differences in "real" time may mean other processes getting scheduled or slow input from disk.

Still, let us assume it is not the case here. We can compute the efficiency. For libutvideo: 40.982 user for 11.342 real means 361% CPU use; for the native decoder, 69.011 user for 9.651 real means 715% CPU use.

361% versus 715% looks like ~90% of respectively 4 and 8 threads. Would this be a quad-core hyper-threaded system?

Anyway, the native decoder is still not on par with the library, this bug cannot be closed.
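The CPU-use figures in comment:32 follow from dividing "user" time by "real" time. A short Python sketch, where the 90% per-thread efficiency is the comment's own guess rather than a measured value and the helper names are hypothetical:

```python
def cpu_utilization(user: float, real: float) -> float:
    """Average CPU use in percent (100% = one fully busy core)."""
    return 100.0 * user / real


def busy_threads(user: float, real: float, efficiency: float = 0.9) -> float:
    """Estimated thread count, assuming each thread is busy `efficiency` of the time."""
    return user / real / efficiency


for label, user, real in (("libutvideo", 40.982, 11.342),
                          ("native", 69.011, 9.651)):
    print(f"{label}: {cpu_utilization(user, real):.0f}% CPU, "
          f"~{busy_threads(user, real):.0f} threads")
# libutvideo: 361% CPU, ~4 threads
# native: 715% CPU, ~8 threads
```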

comment:33 by Elon Musk, 7 years ago

I will close it until he provides numbers for single-threaded decoding.

comment:34 by Cigaes, 7 years ago

Please do not. The quoted report already shows a difference of more than 60% in user time with default options, which is significant.

comment:35 by Elon Musk, 7 years ago

Also he is only testing rgb input.

comment:36 by Cigaes, 7 years ago

... and the library performs better. Your point?

comment:37 by Elon Musk, 7 years ago

Resolution: wontfix
Status: reopened → closed