Opened 5 years ago

Closed 5 years ago

#2540 closed defect (fixed)

-threads with libx264rgb does not work

Reported by: hirschhornsalz
Owned by:
Priority: normal
Component: avcodec
Version: git-master
Keywords: libx264
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

The -threads option is ignored when used with -c:v libx264rgb.

Details:

I use ffmpeg to grab video from a video game (World of Warcraft). The minimum requirements for a successful grab are 1920x1080 (HD) at 30 fps, with sound. The video data needs to be compressed somewhat, because the raw video stream of about 240 MByte/s isn't easily manageable. I use libx264 with -preset ultrafast, which works reasonably well because it runs on multiple cores (I use -threads 4), and those cores rarely exceed 30% usage.

After upgrading from 0.11.7 to 1.2, I was no longer able to grab at the required frame rate. The bottleneck appears to be the RGB-to-YUV conversion, which seems to be much slower in newer versions.

So I tried -c:v libx264rgb to avoid the costly RGB-to-YUV conversion, only to discover that it was even slower. The culprit is that threading is disabled for libx264rgb. After enabling threading in the source code, libx264rgb runs reasonably fast, and it outperforms the YUV version as expected.

How to reproduce:

% ffmpeg -f x11grab -r 100 -s 1920x1080 -c:v libx264rgb -preset ultrafast -threads 4 -crf 20 test.avi

frame=  208 fps= 27 q=14.0 Lsize=    1524kB time=00:00:09.09 bitrate=1373.8kbits/s 


After changing line 746 in libavcodec/libx264.c from

    .capabilities   = CODEC_CAP_DELAY,

to

    .capabilities   = CODEC_CAP_DELAY | CODEC_CAP_AUTO_THREADS,


% ffmpeg -f x11grab -r 100 -s 1920x1080 -c:v libx264rgb -preset ultrafast -threads 4 -crf 20 test.avi

frame=  757 fps= 94 q=-1.0 Lsize=    5238kB time=00:00:09.38 bitrate=4574.7kbits/s 

Change History (5)

comment:2 Changed 5 years ago by cehoyos

  • Keywords libx264 added

Please send a patch fixing the threading issue to ffmpeg-devel.

comment:3 follow-up: Changed 5 years ago by Cigaes

Regarding your original problem, maybe look at the pixel format. I believe the default format negotiation changed at some point before 1.2: you were probably encoding to yuv420p before and are now encoding to yuv444p, which has better quality but is slower; -pix_fmt yuv420p should fix it.

I do not know which will be faster: on one hand, yuv420p requires the colorspace conversion; on the other hand, RGB is not subsampled. A benchmark is needed. Also, please remember that H.264 RGB is not standard.

And of course, you should submit your patch to ffmpeg-devel as cehoyos suggested.

comment:4 in reply to: ↑ 3 Changed 5 years ago by hirschhornsalz

Replying to Cigaes:

> Regarding your original problem, maybe look at the pixel format. I believe the default format negotiation changed at some point before 1.2: you were probably encoding to yuv420p before and are now encoding to yuv444p, which has better quality but is slower; -pix_fmt yuv420p should fix it.
>
> I do not know which will be faster: on one hand, yuv420p requires the colorspace conversion; on the other hand, RGB is not subsampled. A benchmark is needed.

Thank you for the suggestion, very good idea. I did a short test with 1.2 and yuv420p, and it is indeed faster than yuv444p, but still not as fast as 0.11.3.

Maximum frame rate for HD video capture using x11grab and x264:

Version   Pixel format                          Max. fps
1.2       yuv444p                               27
1.2       yuv420p                               37
1.2       rgb (libx264rgb with threading fix)   94
0.11.3    yuv420p                               74

The oprofile sample for 1.2 with yuv420p is interesting:

Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
1137688  42.1727  libswscale.so.2.2.100    hScale16To15_c
244454    9.0616  libswscale.so.2.2.100    bgr32ToUV_half_c
227943    8.4496  libswscale.so.2.2.100    bgr32ToY_c
160825    5.9616  libswscale.so.2.2.100    yuv2plane1_8_c
130781    4.8479  libc-2.17.so             __memcpy_ssse3_back
105577    3.9136  libx264.so.125           x264_prefetch_ref_mmx2
......... more libx264 stuff....

The hScale16To15_c function doesn't even show up in the 0.11.3 profile, and with about 42% of the samples it appears to be the bottleneck.

> And of course, you should submit your patch to ffmpeg-devel as cehoyos suggested.

Will do.

comment:5 Changed 5 years ago by cehoyos

Please open a new ticket for the performance regression; mixing different problems in one ticket makes the tickets impossible to follow. Please don't forget to post the command lines and console output of 0.11 and 1.2 so that we can reproduce the issue.

comment:6 Changed 5 years ago by cehoyos

  • Resolution set to fixed
  • Status changed from new to closed

Fixed by you.
If there is a performance regression, please report it in a new ticket!
