Opened 4 years ago

Closed 15 months ago

#3651 closed enhancement (wontfix)

UT Video Codec is inefficient compared to libutvideo

Reported by: Zerowalker
Owned by:
Priority: wish
Component: avcodec
Version: git-master
Keywords: utvideo
Cc:
Blocked By:
Blocking:
Reproduced by developer: yes
Analyzed by developer: no

Description

Summary of the bug:
Not really sure if i am supposed to write these things here, as it's not really a bug, but here goes.

LAV Filter uses ffmpeg for decoding, hence i am reporting this here.

The decoding performance for Lagarith and UT Video Codec is extremely bad; most of the time it's over 100% slower.

Originally i thought it was faster, either i have been mistaken or something has happened.

However, it is worth noting that Lagarith is limited to 2 threads in its original decoder; comparing the performance makes this insignificant, though, as ffmpeg uses more threads and still does not keep pace.

How to reproduce:

Pretty sure you can just use:

ffmpeg -i "lagarith.avi" -c:v rawvideo "Raw.avi"

So just decode a lagarith file to raw and you will see the performance, then compare it to using the original decoder.

Change History (37)

comment:1 Changed 4 years ago by cehoyos

  • Priority changed from normal to wish

Sounds like two completely independent issues to me or do I miss something?
The original Lagarith decoder is not portable and cannot be compared to FFmpeg's decoder afaik.
Could you add some numbers and post FFmpeg console output to make this a complete ticket?

comment:2 Changed 4 years ago by Zerowalker

What command should i run?
Can't really do "rawvideo" as the HDD will be the bottleneck then.

comment:3 Changed 4 years ago by cehoyos

  • Keywords utvideo added

You reported that the FFmpeg decoders are slower than the original decoders: How do you know and how much slower are they?

comment:4 Changed 4 years ago by Zerowalker

I don't know the precise number, but i have used LAV Filter which uses ffmpeg to Decode, hence how i could compare the playback performance simply watching CPU Usage.

http://forum.doom9.org/showthread.php?t=156191

comment:5 Changed 4 years ago by cehoyos

  • Reproduced by developer set
  • Status changed from new to open
  • Summary changed from Lagarith and UT Video Codec has Bad Performance to UT Video Codec is inefficient compared to libutvideo
  • Version changed from unspecified to git-master

While the native utvideo decoder is significantly faster here than libutvideo (because it uses eight threads but libutvideo only four afaict) I can confirm that the libutvideo decoder is much more efficient (takes less CPU cycles for the same input).

comment:6 Changed 4 years ago by Zerowalker

Hmm, this is very weird.

For example, if i play back a UT Video Codec video of a certain caliber, it will take about 50% CPU usage for me (i have 4 cores, so 2 x 100%), while the original decoder only uses about 1 core.

The same goes for everything i tried, the original decoder won by quite a large difference, same goes for Lagarith.

For me that should mean it's faster at the same threads, in pure efficiency right?
As i can't even use 8 threads in an optimal way as i only got 4 cores with No-HT.

Thanks

comment:7 Changed 4 years ago by cehoyos

As said, please post numbers.

If you don't want to post any numbers, please accept my tests.

comment:8 Changed 4 years ago by Zerowalker

But i don't know what numbers you want, what commands do you want me to run?
If possible, i would like to try LAV Filter to decode, and also the original decoder, and see the CPU Usage.

Thanks

comment:9 Changed 4 years ago by cehoyos

I tested with the following file:

$ ffmpeg -f lavfi -i testsrc=s=hd1080 -vcodec utvideo -t 60 out.avi
$  time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
  built on May 24 2014 11:12:43 with gcc 4.7 (SUSE Linux)
  configuration: --cc='gcc -m32'
  libavutil      52. 86.100 / 52. 86.100
  libavcodec     55. 64.100 / 55. 64.100
  libavformat    55. 40.100 / 55. 40.100
  libavdevice    55. 13.101 / 55. 13.101
  libavfilter     4.  5.100 /  4.  5.100
  libswscale      2.  6.100 /  2.  6.100
  libswresample   0. 19.100 /  0. 19.100
Input #0, avi, from 'out.avi':
  Metadata:
    encoder         : Lavf55.40.100
  Duration: 00:01:00.00, start: 0.000000, bitrate: 171716 kb/s
    Stream #0:0: Video: utvideo (ULRG / 0x47524C55), rgb24, 1920x1080, 171824 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf55.40.100
    Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
    Metadata:
      encoder         : Lavc55.64.100 rawvideo
Stream mapping:
  Stream #0:0 -> #0:0 (utvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0xb1fca40] Encoder did not produce proper pts, making some up.
frame= 1500 fps=162 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=69.881s
bench: maxrss=71984kB

real    0m9.276s
user    1m9.883s
sys     0m0.707s
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
  built on May 24 2014 11:17:02 with gcc 4.7 (SUSE Linux)
  configuration: --cc='gcc -m32' cxx='gcc -m32' --enable-libutvideo --disable-decoder=utvideo --enable-gpl
  libavutil      52. 86.100 / 52. 86.100
  libavcodec     55. 64.100 / 55. 64.100
  libavformat    55. 40.100 / 55. 40.100
  libavdevice    55. 13.101 / 55. 13.101
  libavfilter     4.  5.100 /  4.  5.100
  libswscale      2.  6.100 /  2.  6.100
  libswresample   0. 19.100 /  0. 19.100
  libpostproc    52.  3.100 / 52.  3.100
Input #0, avi, from 'out.avi':
  Metadata:
    encoder         : Lavf55.40.100
  Duration: 00:01:00.00, start: 0.000000, bitrate: 171716 kb/s
    Stream #0:0: Video: utvideo (ULRG / 0x47524C55), bgr24, 1920x1080, 171824 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf55.40.100
    Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
    Metadata:
      encoder         : Lavc55.64.100 rawvideo
Stream mapping:
  Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0xac54c00] Encoder did not produce proper pts, making some up.
frame= 1500 fps=133 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=40.733s
bench: maxrss=24320kB

real    0m11.317s
user    0m40.739s
sys     0m2.059s

comment:10 follow-up: Changed 4 years ago by Zerowalker

Is it possible to make a benchmark using the same file?

You see i got a file that's UT Video RGB.

and with LAV Filter it uses a bit above 50% for its 30fps speed.
With UT Video Codec it uses 25% all the time.

LAV Filter is possibly faster, but at twice the cost for me.

(25% = 1 core)

If both use the same amount of threads, won't LAV lose by quite the amount for you as well?

comment:11 in reply to: ↑ 10 Changed 4 years ago by cehoyos

Replying to Zerowalker:

Is it possible to make a benchmark using the same file?

Isn't this what I have done?
Or do I misunderstand?

You see i got a file that's UT Video RGB.

That is what I tested or am I missing something?

Do you agree that in my comment:5 and my comment:9 I confirm your original post or is there still something unconfirmed about your utvideo issue?
As said the lagarith issue is different and while performance improvements are always possible, the (non-portable and non-future-proof) original decoder will probably always be faster than our (portable and future-proof) lagarith decoder.

comment:12 follow-up: Changed 4 years ago by Zerowalker

Oh, wait so you are saying.

Your UT Video Codec is faster cause of more threads, but slower on the same, meaning it has less efficiency overall (if i am not mistaken?)

Lagarith is slower because of some issue which i don't understand, which sadly is quite bad as it's a lot slower, but as you understand what i mean and still see the issue, i can only agree that it can't be better in its current state.

If this is correct, then that indeed confirms my issue;
the only thing left would be a suggestion to improve the Lagarith decoder further, if possible, to bring it closer to the original, but that is probably something you will do anyway if you find the time and it's worth it :)

Thanks

comment:13 in reply to: ↑ 12 ; follow-up: Changed 4 years ago by cehoyos

Replying to Zerowalker:

Your UT Video Codec is faster cause of more threads, but slower on the same, meaning it has less efficiency overall (if i am not mistaken?)

Isn't that what I wrote in comment:5?

Lagarith is slower cause of some issue, which i don't understand

The original codec implementation uses floating point arithmetic which will fail on processors != x86 and might fail if you use another compiler. FFmpeg's implementation is fixed-point (which is what you expect for a lossless codec) meaning you can compile it for any hardware with any (non-broken) C compiler and you will always get correct (bitexact) output. The price is that emulating the floating-point operations with integer code can be slower than using the hardware FPU.
See also (for example):
https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2009-September/079680.html
http://mod16.org/hurfdurf/?p=142
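
To illustrate the portability point, here is a minimal sketch. It is not code from either Lagarith decoder, and the 16.16 fixed-point format is an arbitrary choice for the example. Floating-point addition is not associative, so a compiler or CPU that evaluates the same expression in a different order can produce different bits, whereas integer arithmetic gives the same result regardless of grouping.

```python
# Floating-point addition is not associative: regrouping the same three
# operands changes the last bit of the result.
float_left = (0.1 + 0.2) + 0.3
float_right = 0.1 + (0.2 + 0.3)
print(float_left == float_right)   # False

# Hypothetical 16.16 fixed-point format, chosen only for this example.
FRAC_BITS = 16

def to_fixed(x: float) -> int:
    """Convert a float to a 16.16 fixed-point integer."""
    return round(x * (1 << FRAC_BITS))

a, b, c = to_fixed(0.1), to_fixed(0.2), to_fixed(0.3)
fixed_left = (a + b) + c
fixed_right = a + (b + c)
print(fixed_left == fixed_right)   # True
```

For a lossless codec a single differing bit in an intermediate value can change the decoded output, which is why the integer variant is the safer choice across compilers and CPUs.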

comment:14 follow-up: Changed 4 years ago by Zerowalker

Actually at comment:5, you say the opposite from what i mean.
I am saying, doesn't the Native decoder use less threads and is much faster, compared to ffmpeg?

Actually heard, or read about this, ppl pointed out the issues that it may not always be correct when decoding, but i could never get a real answer from it, so it ended up being ("a bug that probably has been solved").

But now that you confirm it, this makes things a bit, scary even as i use Lagarith a bit.

I knew Floating Point was superfast but lacking in precision in cases (not a programmer here;P), but i didn't know the difference in speed could be this big.

But just to confirm this now, my CPU is an i5 760, Intel with x64.
Is this a non-x86 or not (I know it's x64, but i also think it's in the x86 system, so if you could please clarify).

Because if i use the Native decoder, will i always get the expected results with this CPU, with the problem only occurring on other builds and other systems (i am guessing ARM etc.)?

If so, that is "okay" for me, but indeed not alright for a lossless codec, but it's hard to give up on the Codec, in some cases it yields unparalleled results (Pixelated Game Capture), but it's pretty much only there.

Sorry for the many extra questions, but you certainly explain it in a good way, hope i am not wasting too much of your time!

comment:15 in reply to: ↑ 14 Changed 4 years ago by cehoyos

Replying to Zerowalker:

Actually at comment:5, you say the opposite from what i mean.
I am saying, doesn't the Native decoder use less threads and is much faster, compared to ffmpeg?

On the FFmpeg bug tracker the native utvideo decoder is of course the libavcodec utvideo decoder (as opposed to libutvideo).

comment:16 Changed 4 years ago by Zerowalker

Oh, so i got it backwards, Native = ffmpeg?
If so, then we are in agreement :)

comment:17 in reply to: ↑ 13 ; follow-up: Changed 4 years ago by michael

Replying to cehoyos:

Replying to Zerowalker:

[...]

Lagarith is slower cause of some issue, which i don't understand

The original codec implementation uses floating point arithmetic which will fail on processors != x86 and might fail if you use another compiler. FFmpeg's implementation is fixed-point (which is what you expect for a lossless codec) meaning you can compile it for any hardware with any

That's not an excuse to be slower on x86 though; also, making softfloat_mul() 30% faster by using floats has no measurable effect on the overall speed (with the file i tested).
Did someone compare the decoders part by part speed-wise?

comment:18 follow-up: Changed 4 years ago by Zerowalker

The only comparison i have done is on Playback, looking at CPU Usage etc.

And i also have a file that's barely playable, with Lagarith Decoder i think it's playable but it may drop frames, not truly sure.

But on LAV Filter, it's so slow on some parts that i would call it unwatchable.

That's where i noticed that there was a huge difference, but i have no numbers on it.

(The slow parts are Noisy stuff at 1920x1080p, so is probably quite possible to make a testbench with just much noise going on).

comment:19 in reply to: ↑ 17 Changed 4 years ago by cehoyos

Replying to michael:

did someone compare the decoders part by part speedwise ?

I don't know how to test the original Lagarith decoder / how to do a performance comparison.

comment:20 Changed 4 years ago by cehoyos

Lagarith performance can be tested with MPlayer and -vo gl (-vo null crashes currently):
The dll in http://samples.ffmpeg.org/V-codecs/lagarith/ is 10% faster for lagarith.avi (RGB) but much slower than FFmpeg for lagarith422.avi (even if I do the conversion from yuv422p to rgb that lagarith seems to do internally). It is possible that the dll is outdated, I don't know how to find out.
For a better test, a longer and larger RGB sample would be needed.

comment:21 in reply to: ↑ 18 ; follow-up: Changed 4 years ago by michael

Replying to Zerowalker:

The only comparison i have done is on Playback, looking at CPU Usage etc.

And i also have a file that's barely playable, with Lagarith Decoder i think it's playable but it may drop frames, not truly sure.

But on LAV Filter, it's so slow on some parts that i would call it unwatchable.

where can i find this file ?

comment:22 Changed 4 years ago by cehoyos

Lagarith version 1323 also works with MPlayer and is >20% faster than FFmpeg's decoder. The performance disadvantage of FFmpeg is apparently not as bad for Lagarith as it is for utvideo.

comment:23 in reply to: ↑ 21 Changed 4 years ago by Zerowalker

Replying to michael:

where can i find this file ?

It's just a video i got, made myself in After Effects, nothing special or something that's available.
I am sure any file will show the same results: just make a video with much noise and movement at 1080p or more, then compare Lagarith with ffmpeg.

cehoyos, so utvideo is worse off than the Lagarith decoder; do you include the efficiency in a single thread then, or do you mean overall?

comment:24 Changed 21 months ago by richardpl

  • Resolution set to fixed
  • Status changed from open to closed

libutvideo is no more, and I am going to push a patch that makes the utvideo decoder faster.

comment:25 Changed 21 months ago by cehoyos

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:26 Changed 21 months ago by richardpl

  • Resolution set to wontfix
  • Status changed from reopened to closed

There is no way to do the same for left prediction. Left prediction is not SIMDable. And there are no other predictions available.
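
As a rough illustration of why left prediction resists vectorisation, here is a minimal sketch of generic left prediction on one row of 8-bit samples. The function names and the start value 0x80 are assumptions for the example; this is not the decoder's actual code. Each decoded sample needs the previously decoded sample, so the loop forms a serial dependency chain.

```python
def left_predict_encode(row, start=0x80):
    """Replace each sample by its difference to the left neighbour (mod 256)."""
    prev = start  # assumed initial predictor, for illustration only
    residuals = []
    for sample in row:
        residuals.append((sample - prev) & 0xFF)
        prev = sample
    return residuals

def left_predict_decode(residuals, start=0x80):
    """Undo left prediction: a running sum of the residuals (mod 256)."""
    prev = start
    row = []
    for residual in residuals:
        # This step cannot start before the previous sample is known.
        prev = (prev + residual) & 0xFF
        row.append(prev)
    return row

row = [10, 12, 12, 200, 3]
residuals = left_predict_encode(row)
print(residuals)                        # [138, 2, 0, 188, 59]
print(left_predict_decode(residuals))   # [10, 12, 12, 200, 3]
```

The encoder side has no such chain, since every residual depends only on original samples that are all available up front.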

Please reopen only if you can provide real benchmark numbers proving that the decoder is slower than the reference one, along with steps to reproduce.

comment:27 Changed 21 months ago by cehoyos

  • Resolution wontfix deleted
  • Status changed from closed to reopened

$ ffmpeg -f lavfi -i testsrc=s=hd1080 -vcodec utvideo -t 60 out.avi
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-83022-g0006384 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 4.7 (SUSE Linux)
  configuration: --cc='gcc -m32'
  libavutil      55. 43.100 / 55. 43.100
  libavcodec     57. 71.100 / 57. 71.100
  libavformat    57. 62.100 / 57. 62.100
  libavdevice    57.  2.100 / 57.  2.100
  libavfilter     6. 68.100 /  6. 68.100
  libswscale      4.  3.101 /  4.  3.101
  libswresample   2.  4.100 /  2.  4.100
Input #0, avi, from 'out.avi':
  Metadata:
    encoder         : Lavf55.48.100
  Duration: 00:01:00.00, start: 0.000000, bitrate: 171715 kb/s
    Stream #0:0: Video: utvideo (ULRG / 0x47524C55), rgb24, 1920x1080, 171823 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf57.62.100
    Stream #0:0: Video: wrapped_avframe, rgb24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
    Metadata:
      encoder         : Lavc57.71.100 wrapped_avframe
Stream mapping:
  Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
frame= 1500 fps=156 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A speed=6.22x
video:545kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=69.007s
bench: maxrss=80832kB

real    0m9.651s
user    1m9.011s
sys     0m0.670s
$ time ./ffmpeg -benchmark -i out.avi -f null -
ffmpeg version N-63402-g308188b Copyright (c) 2000-2014 the FFmpeg developers
  built on Jan  7 2017 16:23:03 with gcc 4.7 (SUSE Linux)
  configuration: --cc='gcc -m32' cxx='gcc -m32' --enable-libutvideo --disable-decoder=utvideo --enable-gpl
  libavutil      52. 86.100 / 52. 86.100
  libavcodec     55. 64.100 / 55. 64.100
  libavformat    55. 40.100 / 55. 40.100
  libavdevice    55. 13.101 / 55. 13.101
  libavfilter     4.  5.100 /  4.  5.100
  libswscale      2.  6.100 /  2.  6.100
  libswresample   0. 19.100 /  0. 19.100
  libpostproc    52.  3.100 / 52.  3.100
Input #0, avi, from 'out.avi':
  Metadata:
    encoder         : Lavf55.48.100
  Duration: 00:01:00.00, start: 0.000000, bitrate: 171715 kb/s
    Stream #0:0: Video: utvideo (ULRG / 0x47524C55), bgr24, 1920x1080, 171823 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 25 tbn, 25 tbc
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf55.40.100
    Stream #0:0: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 90k tbn, 25 tbc
    Metadata:
      encoder         : Lavc55.64.100 rawvideo
Stream mapping:
  Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
Press [q] to stop, [?] for help
[null @ 0x9cd2c00] Encoder did not produce proper pts, making some up.
frame= 1500 fps=132 q=0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
video:94kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 19215359412273152.000000%
bench: utime=40.979s
bench: maxrss=22828kB

real    0m11.342s
user    0m40.982s
sys     0m2.014s

comment:28 Changed 21 months ago by richardpl

  • Resolution set to worksforme
  • Status changed from reopened to closed

You are comparing decoding speed with a crystal ball?
Compare single-threaded decoding.
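
One way such a single-threaded run could be set up is sketched below. It assumes the same benchmark command used earlier in this ticket, with `-threads 1` placed before `-i` so that it applies to the decoder. The helper name is hypothetical, and the function only builds the command line without running it.

```python
import shlex

def single_thread_benchmark_cmd(input_path, ffmpeg="ffmpeg"):
    """Build an ffmpeg decode benchmark command limited to one decoder thread."""
    return [
        ffmpeg,
        "-threads", "1",     # input option: applies to the decoder of input_path
        "-benchmark",        # print the "bench: utime=..." lines at the end
        "-i", input_path,
        "-f", "null", "-",   # decode only, discard the output
    ]

print(shlex.join(single_thread_benchmark_cmd("out.avi")))
# ffmpeg -threads 1 -benchmark -i out.avi -f null -
```

The returned list can be passed to `subprocess.run` once for each build, so that both decoders are measured with the same thread count.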

Last edited 21 months ago by richardpl

comment:29 follow-up: Changed 21 months ago by Cigaes

I read in the report:

  Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native))
bench: utime=69.007s
bench: maxrss=80832kB

real    0m9.651s
user    1m9.011s
sys     0m0.670s

and

  Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
bench: utime=40.979s
bench: maxrss=22828kB

real    0m11.342s
user    0m40.982s
sys     0m2.014s

I find that rather convincing: the older ffmpeg with libutvideo decoding is 68% faster than the current one with native decoding. Is there something wrong?
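
The 68% figure can be reproduced from the `bench:` lines quoted above. The following is a small sketch: the helper name and the regular expression are assumptions, and the pattern is written only for the log format shown in this ticket.

```python
import re

# Matches lines such as "bench: utime=69.007s" and "bench: maxrss=80832kB".
BENCH_RE = re.compile(r"bench: (utime|maxrss)=([0-9.]+)(s|kB)")

def parse_bench(log):
    """Extract utime (seconds) and maxrss (kB) from ffmpeg -benchmark output."""
    return {key: float(value) for key, value, _unit in BENCH_RE.findall(log)}

native = parse_bench("bench: utime=69.007s\nbench: maxrss=80832kB\n")
library = parse_bench("bench: utime=40.979s\nbench: maxrss=22828kB\n")

extra_cpu = native["utime"] / library["utime"] - 1.0
print(f"native decoder needs {extra_cpu:.0%} more user time")
# native decoder needs 68% more user time
```

Read the other way round, the libutvideo run needs about 41% less user time than the native run for the same input.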

comment:30 Changed 21 months ago by cehoyos

  • Resolution worksforme deleted
  • Status changed from closed to reopened

comment:31 in reply to: ↑ 29 Changed 21 months ago by richardpl

Replying to Cigaes:

I read in the report:

  Stream #0:0 -> #0:0 (utvideo (native) -> wrapped_avframe (native))
bench: utime=69.007s
bench: maxrss=80832kB

real    0m9.651s
user    1m9.011s
sys     0m0.670s

and

  Stream #0:0 -> #0:0 (libutvideo -> rawvideo)
bench: utime=40.979s
bench: maxrss=22828kB

real    0m11.342s
user    0m40.982s
sys     0m2.014s

I find that rather convincing: the older ffmpeg with libutvideo decoding is 68% faster than the current one with native decoding. Is there something wrong?

One gives more FPS than the other.

comment:32 Changed 21 months ago by Cigaes

So I guess you are looking at the "real" time instead of the "user" time. The "user" time is usually more relevant; differences in "real" time may mean other processes getting scheduled or slow input from disk.

Still, let us assume it is not the case here. We can compute the efficiency. For libutvideo: 40.982 user for 11.342 real means 361% CPU use; for the native decoder, 69.011 user for 9.651 real means 715% CPU use.

361% versus 715% looks like ~90% of 4 and 8 threads, respectively. Would this be a quad-core hyper-threaded system?
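
The arithmetic can be checked with a short sketch. The thread counts of 4 and 8 are the guesses made in this ticket, not measured values, and the function name is hypothetical.

```python
def cpu_utilization(user_s, real_s):
    """Average number of busy cores: CPU seconds used per wall-clock second."""
    return user_s / real_s

# (user seconds, real seconds, assumed thread count) from the quoted runs.
runs = {
    "libutvideo": (40.982, 11.342, 4),
    "native": (69.011, 9.651, 8),
}

for name, (user_s, real_s, threads) in runs.items():
    busy = cpu_utilization(user_s, real_s)
    print(f"{name}: {busy:.0%} CPU, {busy / threads:.0%} of {threads} threads")
# libutvideo: 361% CPU, 90% of 4 threads
# native: 715% CPU, 89% of 8 threads
```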

Anyway, the native decoder is still not on par with the library, this bug cannot be closed.

comment:33 Changed 21 months ago by richardpl

I will close it until he provides numbers for single thread decoding.

comment:34 Changed 21 months ago by Cigaes

Please do not. The quoted report already shows more than 60% more user time with default options, which is significant.

comment:35 Changed 21 months ago by richardpl

Also he is only testing rgb input.

comment:36 Changed 21 months ago by Cigaes

... and the library performs better. Your point?

comment:37 Changed 15 months ago by richardpl

  • Resolution set to wontfix
  • Status changed from reopened to closed