Opened 2 months ago

Last modified 4 days ago

#7797 new defect

AVC->MPEG-2 transcoding with VA-API 2-3x slower than with QSV

Reported by: eero-t
Owned by:
Priority: normal
Component: undetermined
Version: git-master
Keywords:
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Setup:

  • Ubuntu 18.04 with drm-tip 5.x kernel from Git
  • iHD media driver, MediaSDK and FFmpeg built from Git

Summary of the bug:

  • Poor VA-API transcoding performance: AVC -> MPEG-2 transcoding with QSV is 2-3x faster than with VA-API

How to reproduce:

$ export LIBVA_DRIVER_NAME=iHD
$ ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i 720x480p_30.00_4mb_h264_cabac_180s.264 -c:v mpeg2_vaapi -b:v 2000K -compression_level 4 -y output.mpg
ffmpeg version N-93330-g7ff89574c7 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.3.0-27ubuntu1~18.04)
...
Input #0, h264, from 'input/720x480p_30.00_4mb_h264_cabac_180s.264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (High), 1 reference frame, yuv420p(tv, smpte170m, progressive, left), 720x480 [SAR 10:11 DAR 15:11], 30 fps, 30 tbr, 1200k tbn, 60 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> mpeg2video (mpeg2_vaapi))
Press [q] to stop, [?] for help
[h264 @ 0x55a62883c380] Reinit context to 720x480, pix_fmt: vaapi_vld
[graph 0 input from stream 0:0 @ 0x55a628872f40] w:720 h:480 pixfmt:vaapi_vld tb:1/1200000 fr:30/1 sar:10/11 sws_param:flags=2
[mpeg2_vaapi @ 0x55a62883eb80] Input surface format is nv12.
[mpeg2_vaapi @ 0x55a62883eb80] Using VAAPI profile VAProfileMPEG2Main (1).
[mpeg2_vaapi @ 0x55a62883eb80] Using VAAPI entrypoint VAEntrypointEncSlice (6).
[mpeg2_vaapi @ 0x55a62883eb80] Using VAAPI render target format YUV420 (0x1).
[mpeg2_vaapi @ 0x55a62883eb80] RC mode: VBR.
[mpeg2_vaapi @ 0x55a62883eb80] RC target: 50% of 4000000 bps over 500 ms.
[mpeg2_vaapi @ 0x55a62883eb80] RC buffer: 2000000 bits, initial fullness 1500000 bits.
[mpeg2_vaapi @ 0x55a62883eb80] RC framerate: 30/1 (30.00 fps).
[mpeg2_vaapi @ 0x55a62883eb80] Using intra, P- and B-frames (supported references: 1 / 1).
[mpeg2_vaapi @ 0x55a62883eb80] Driver does not support some wanted packed headers (wanted 0x3, found 0x10).
[mpeg2_vaapi @ 0x55a62883eb80] Sample aspect ratio 10:11 is not representable, signalling square pixels instead.
[mpeg @ 0x55a62883a580] VBV buffer size not set, using default size of 230KB
If you want the mpeg file to be compliant to some specification
Like DVD, VCD or others, make sure you set the correct buffer size
Output #0, mpeg, to 'output/0039_SD03MP2_1.0.mpg':
  Metadata:
    encoder         : Lavf58.26.101
    Stream #0:0: Video: mpeg2video (mpeg2_vaapi) (Main), vaapi_vld, 720x480 [SAR 10:11 DAR 15:11], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc
    Metadata:
      encoder         : Lavc58.47.103 mpeg2_vaapi
...

And QSV:

$ ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 -c:v h264_qsv -i 720x480p_30.00_4mb_h264_cabac_180s.264 -c:v mpeg2_qsv -b:v 2000K -compression_level 4 -y output.mpg
...
[AVHWDeviceContext @ 0x565392bd4280] Initialize MFX session: API version is 1.28, implementation version is 1.28
[AVHWDeviceContext @ 0x565392bd4280] MFX compile/runtime API: 1.28/1.28
[AVHWDeviceContext @ 0x565392bf2f00] VAAPI driver: Intel iHD driver - 1.0.0.
[AVHWDeviceContext @ 0x565392bf2f00] Driver not found in known nonstandard list, using standard behaviour.
[graph 0 input from stream 0:0 @ 0x565392d785c0] w:720 h:480 pixfmt:qsv tb:1/1200000 fr:30/1 sar:10/11 sws_param:flags=2
[mpeg2_qsv @ 0x565392bd1f40] Using the variable bitrate (VBR) ratecontrol method
[AVHWDeviceContext @ 0x565392cfc340] VAAPI driver: Intel iHD driver - 1.0.0.
[AVHWDeviceContext @ 0x565392cfc340] Driver not found in known nonstandard list, using standard behaviour.
[mpeg2_qsv @ 0x565392bd1f40] profile: main; level: 8
[mpeg2_qsv @ 0x565392bd1f40] GopPicSize: 250; GopRefDist: 4; GopOptFlag: closed ; IdrInterval: 0
[mpeg2_qsv @ 0x565392bd1f40] TargetUsage: 4; RateControlMethod: VBR
[mpeg2_qsv @ 0x565392bd1f40] BufferSizeInKB: 500; InitialDelayInKB: 500; TargetKbps: 2000; MaxKbps: 2000; BRCParamMultiplier: 1
[mpeg2_qsv @ 0x565392bd1f40] NumSlice: 30; NumRefFrame: 0
[mpeg2_qsv @ 0x565392bd1f40] RateDistortionOpt: unknown
[mpeg2_qsv @ 0x565392bd1f40] RecoveryPointSEI: unknown IntRefType: 0; IntRefCycleSize: 0; IntRefQPDelta: 0
[mpeg2_qsv @ 0x565392bd1f40] MaxFrameSize: 0; MaxSliceSize: 0; 
[mpeg2_qsv @ 0x565392bd1f40] BitrateLimit: unknown; MBBRC: unknown; ExtBRC: unknown
[mpeg2_qsv @ 0x565392bd1f40] Trellis: auto
[mpeg2_qsv @ 0x565392bd1f40] VDENC: OFF
[mpeg2_qsv @ 0x565392bd1f40] RepeatPPS: unknown; NumMbPerSlice: 0; LookAheadDS: unknown
[mpeg2_qsv @ 0x565392bd1f40] AdaptiveI: unknown; AdaptiveB: unknown; BRefType: auto
[mpeg2_qsv @ 0x565392bd1f40] MinQPI: 0; MaxQPI: 0; MinQPP: 0; MaxQPP: 0; MinQPB: 0; MaxQPB: 0
[mpeg2_qsv @ 0x565392bd1f40] FrameRateExtD: 1; FrameRateExtN: 30 
[mpeg @ 0x565392bd1500] VBV buffer size not set, using default size of 230KB
If you want the mpeg file to be compliant to some specification
Like DVD, VCD or others, make sure you set the correct buffer size
Output #0, mpeg, to 'output/0039_SD03MP2_1.0.mpg':
    Metadata:
      encoder         : Lavc58.47.103 mpeg2_qsv
    Side data:
      cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 0 vbv_delay: -1
...
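
One way to put a number on the gap between the two runs above is to time each command and divide the wall-clock times. The following is only a sketch and not part of the original report: `elapsed_seconds` and `speed_ratio` are hypothetical helper names, and it assumes GNU `date` (for `%N`) and `awk` are available, as they are on a stock Ubuntu 18.04.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two transcode runs (not from the report).

# elapsed_seconds CMD [ARGS...]
# Runs CMD with its output discarded and prints the wall-clock time in seconds.
# Assumption: GNU date, which supports %N (nanoseconds).
elapsed_seconds() {
    es_start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    es_end=$(date +%s.%N)
    awk -v s="$es_start" -v e="$es_end" 'BEGIN { printf "%.3f\n", e - s }'
}

# speed_ratio SLOW_SECONDS FAST_SECONDS
# Prints how many times longer the slow run took than the fast one,
# or "n/a" if the fast time is not positive.
speed_ratio() {
    awk -v slow="$1" -v fast="$2" 'BEGIN {
        if (fast <= 0) { print "n/a"; exit 1 }
        printf "%.2f\n", slow / fast
    }'
}
```

Each of the two ffmpeg command lines can be passed to `elapsed_seconds` unchanged, and the two results handed to `speed_ratio`; a value of 2.00 to 3.00 would correspond to the gap described in this ticket.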

GPU is running at full speed in both cases, so this isn't related to ticket #7690. It could be related to regression #7706, but I can't test it because ticket #7650 ("invalid RC mode") was fixed only after that regression.

Looking at CPU utilization and power usage, QSV utilizes more CPU but also has more iowait, and it correspondingly uses more CPU and GPU power than VA-API. Maybe VA-API isn't running asynchronously enough?

There are also single-stream (AVC) transcode cases where VA-API is slower, but the gap is much smaller, and when multiple processes run in parallel, VA-API is actually slightly faster. In this case, VA-API is slower even with multiple parallel transcode processes.
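
The parallel case can be reproduced with a small launcher along these lines. This is a sketch under stated assumptions, not the script used for the numbers in this ticket: `run_parallel` is a hypothetical helper, and the per-process output name `output_$1.mpg` in the commented example is invented so that the processes do not overwrite each other's files.

```shell
#!/bin/sh
# run_parallel N JOB
# Starts N copies of the shell function or command JOB in the background,
# passing each its index (0..N-1) as the only argument, then waits for all.
run_parallel() {
    rp_count=$1
    rp_job=$2
    rp_i=0
    while [ "$rp_i" -lt "$rp_count" ]; do
        "$rp_job" "$rp_i" &
        rp_i=$((rp_i + 1))
    done
    wait
}

# Example job (commented out so that sourcing this file does not start ffmpeg).
# The output file name is an assumption; the other options are the VA-API
# options from this ticket.
#
# transcode_vaapi() {
#     ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 \
#         -hwaccel_output_format vaapi \
#         -i 720x480p_30.00_4mb_h264_cabac_180s.264 \
#         -c:v mpeg2_vaapi -b:v 2000K -compression_level 4 \
#         -y "output_$1.mpg" >/dev/null 2>&1
# }
# time run_parallel 4 transcode_vaapi
```

Timing the whole `run_parallel` call for each API gives the aggregate throughput, which is the figure that matters when comparing the multi-process cases.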

I'm seeing a similar performance gap on all the Core devices [1] currently supported by iHD: BDW, SKL, KBL and CFL, on both GT2 and GT3e devices, i.e. the issue isn't platform-specific.

[1] This test-case doesn't work on the only GEN9+ non-core device I have (BXT/APL).

Extra info:

  • With a larger 1280x720p_29.97_10mb_h264_cabac input, the performance gap was still about the same (>2x).
  • With an even larger 1920x1080i_29.97_20mb_mpeg2_high input, the gap decreased to ~25%, but performance with both APIs also dropped to a fraction of what it was with the smaller inputs.

Change History (1)

comment:1 Changed 4 days ago by Brainiarc7

Same issue here.
QSV, with the same source, is literally 2x faster than VAAPI.
Tested on CFL, with current ffmpeg git head.
