Changes between Initial Version and Version 1 of Hardware/VAAPI

Jul 19, 2017, 10:51:14 PM



== Platform Support ==
=== Intel / i965 ===
See QuickSync.
=== AMD / Mesa ===
The Mesa VAAPI driver uses the UVD (Unified Video Decoder) and VCE (Video Coding Engine) hardware found in all recent AMD graphics cards and APUs.
H.264, MPEG-2, MPEG-4 part 2 and VC-1 decode are supported by all GCN GPUs (since Southern Islands).  H.265 support was added with GCN 3 (Volcanic Islands) and H.265 10-bit with GCN 4 (Arctic Islands).  Older GPUs may or may not be supported.
MPEG-4 part 2 is disabled by default due to VAAPI limitations (the main Intel driver never implemented it, so it did not get much testing).  Set the environment variable `VAAPI_MPEG4_ENABLED` to 1 to try to use it anyway.
H.264 encode works on GCN GPUs, but is still incomplete.  Mesa does not yet support encoding any other codec.
Encoding and interlacing support in Mesa are incompatible because of the data layout in GPU memory.  By default, frames are separated into fields, so interlaced video is supported but encoding is not.  Set the environment variable `VAAPI_DISABLE_INTERLACE` to 1 to be able to use the encoder (at the cost of losing interlaced video support).
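A minimal sketch of applying the two Mesa variables above from a shell. It assumes libva is loading the Mesa driver for the device in use. Exporting the variables affects every later command in the session:

```shell
#!/bin/sh
# Allow Mesa to expose MPEG-4 part 2 decode (disabled by default).
export VAAPI_MPEG4_ENABLED=1
# Give up interlaced-video support so that the Mesa encoder can be used.
export VAAPI_DISABLE_INTERLACE=1

# List the VAAPI_* variables now present in the environment.
env | grep '^VAAPI_' | sort
```

To affect a single command only, give the assignment as a prefix instead, for example `VAAPI_DISABLE_INTERLACE=1 ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=nv12,hwupload' -c:v h264_vaapi output.mp4`.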
== Device Selection ==
The libva driver needs to be attached to a DRM device to work.  The connection can be made either directly or via a running X server.  When working standalone, it is generally best to use a DRM render node (`/dev/dri/renderD*`).  Only use a connection via X if you actually want to deal with surfaces inside X (with DRI2, for example).
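On a machine with several GPUs it is not always obvious which render node to pass. The following is a minimal sketch of a helper that prints the first render node in a directory. The function name `first_render_node` is hypothetical and is not part of ffmpeg or libva:

```shell
#!/bin/sh
# Hypothetical helper: print the first DRM render node (renderD*) found in
# the given directory, defaulting to /dev/dri. Returns non-zero if none exist.
first_render_node() {
    dir="${1:-/dev/dri}"
    for node in "$dir"/renderD*; do
        # If nothing matched, the pattern stays unexpanded and -e fails.
        if [ -e "$node" ]; then
            printf '%s\n' "$node"
            return 0
        fi
    done
    return 1
}
```

Usage: `ffmpeg -init_hw_device vaapi=foo:"$(first_render_node)" -hwaccel vaapi -hwaccel_device foo -i ...`. Render nodes are usually numbered from `renderD128`, but which node belongs to which GPU depends on the system, so check this before relying on the ordering.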
In ffmpeg, a named global device can be created using the `-init_hw_device` option:
ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD128
With the decode hwaccel, each input stream can then be given a previously initialised device with the `-hwaccel_device` option:
ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD128 -hwaccel vaapi -hwaccel_device foo -i ...
If only one stream is being used, `-hwaccel_device` can also accept a device path directly:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -i ...
Where filters require a device (for example, the `hwupload` filter), the device used in a filter graph can be specified with the `-filter_hw_device` option:
ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD128 -i ... -filter_hw_device foo -filter_complex ...hwupload... ...
If you have multiple usable devices in the same machine (for example, an Intel integrated GPU and an AMD discrete graphics card), they can be used simultaneously to decode different streams:
ffmpeg -init_hw_device vaapi=intel:/dev/dri/renderD128 -init_hw_device vaapi=amd:/dev/dri/renderD129 -hwaccel vaapi -hwaccel_device intel -i ... -hwaccel vaapi -hwaccel_device amd -i ...
(See <> for more detail about these options.)
Finally, the `-vaapi_device` option may be more convenient in single-device cases with filters.  The option
ffmpeg -vaapi_device /dev/dri/renderD128
is equivalent to:
ffmpeg -init_hw_device vaapi=vaapi0:/dev/dri/renderD128 -filter_hw_device vaapi0
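The equivalence above can be expressed as a small rewriting function. This may help when a script later needs to add a second device. The helper `expand_vaapi_device` is hypothetical and assumes the default device name `vaapi0` shown above:

```shell
#!/bin/sh
# Hypothetical helper: print the -init_hw_device / -filter_hw_device options
# that "-vaapi_device PATH" is equivalent to, using the device name vaapi0.
expand_vaapi_device() {
    path="$1"
    printf '%s\n' "-init_hw_device vaapi=vaapi0:$path -filter_hw_device vaapi0"
}
```

Usage: `ffmpeg $(expand_vaapi_device /dev/dri/renderD128) -i input.mp4 -vf 'format=nv12,hwupload' -c:v h264_vaapi output.mp4`. The substitution is left unquoted so the shell splits it into separate arguments, which assumes the device path contains no spaces.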
== Surface Formats ==
The hardware codecs used by VAAPI are not able to access frame data in arbitrary memory.  Therefore, all frame data needs to be uploaded to hardware surfaces connected to the appropriate device before being used.  All VAAPI hardware surfaces in ffmpeg are represented by the `vaapi` pixfmt (the internal layout is not visible here, though).
The hwaccel decoders normally output frames in the associated hardware format, but by default the ffmpeg utility downloads the output frames to normal memory before passing them to the next component.  This means the hwaccel decoder can be used on its own, without any additional options, to make decoding faster:
ffmpeg -hwaccel vaapi ... -i input.mp4 -c:v libx264 ... output.mp4
For other outputs, the option `-hwaccel_output_format` can be used to specify the format to be used.  This can be a software format (which formats are usable depends on the driver), or it can be the `vaapi` hardware format to indicate that the surface should not be downloaded.
For example, to decode only and do nothing with the result:
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi ... -i input.mp4 -f null -
This can be used to test the speed / CPU use of the decoder only (the download operation typically adds a large amount of additional overhead).
When decoder output is in hardware surfaces, the frames will be given to following filters or encoders in that form.  The `scale_vaapi` and `deinterlace_vaapi` filters act on `vaapi` format frames to scale and deinterlace them respectively.  There are also some generic filters (`hwdownload`, `hwupload` and `hwmap`) which support all hardware formats, including VAAPI (see <>).
For example, to take an interlaced input, decode, deinterlace, scale to 720p, download to normal memory and encode with libx264:
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi ... -i interlaced_input.mp4 -vf 'deinterlace_vaapi,scale_vaapi=w=1280:h=720,hwdownload,format=nv12' -c:v libx264 ... progressive_output.mp4
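When the target size changes between runs, a script can assemble the filter chain above. The following is a minimal sketch using a hypothetical helper name, `vaapi_deint_scale_download`. The filter names and options are exactly those in the command above:

```shell
#!/bin/sh
# Hypothetical helper: build the -vf chain used above for a given output size.
# The VAAPI filters run on hardware surfaces. hwdownload comes after them,
# followed by a format filter naming the software layout to download into.
vaapi_deint_scale_download() {
    width="$1"
    height="$2"
    printf '%s\n' "deinterlace_vaapi,scale_vaapi=w=${width}:h=${height},hwdownload,format=nv12"
}
```

Usage: `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i interlaced_input.mp4 -vf "$(vaapi_deint_scale_download 1280 720)" -c:v libx264 progressive_output.mp4`.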
== Encoding ==
The encoders only accept input as VAAPI surfaces.  If the input is in normal memory, it will need to be uploaded before the frames are given to the encoder.  In the ffmpeg utility, the `hwupload` filter can be used for this.  It will upload to a surface with the same layout as the software frame, so it may be necessary to add a `format` filter immediately before it to get the input into the right format (hardware generally wants the `nv12` layout, but most software functions use the `yuv420p` layout).  The `hwupload` filter also requires a device to upload to, which needs to be defined before the filter graph is created.
So, to use the default decoder for some input, then upload frames to VAAPI and encode with H.264 and default settings:
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=nv12,hwupload' -c:v h264_vaapi output.mp4
If the input is known to be hardware-decodable, then we can use the hwaccel:
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device /dev/dri/renderD128 -i input.mp4 -c:v h264_vaapi output.mp4
Finally, when the input may or may not be hardware-decodable, we can do:
ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD128 -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device foo -i input.mp4 -filter_hw_device foo -vf 'format=nv12|vaapi,hwupload' -c:v h264_vaapi output.mp4
This works because the decoder will output either `vaapi` surfaces (if the hwaccel is usable) or software frames (if it isn't).  In the first case, the frames match the `vaapi` format and `hwupload` does nothing (it passes hardware frames through unchanged).  In the second case, they match the `nv12` format: whatever the input is gets converted to that layout and then uploaded.  Performance will likely vary by a large amount depending on which path is chosen, though.
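An alternative to the `format=nv12|vaapi` fallback is to make the decision in a script before starting ffmpeg. The script queries the input's codec with ffprobe and only requests the hwaccel for codecs the device is expected to decode. The two ffmpeg commands it chooses between are the ones given above. This is a sketch under stated assumptions: the helper names are hypothetical, and the codec list in `hw_decodable` is illustrative only and must be adjusted for the actual hardware (for example, from what `vainfo` from libva-utils reports):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the codec name (as reported by ffprobe)
# is in a hand-maintained list of codecs the VAAPI device can decode.
# ASSUMPTION: this list is illustrative; check it against your device.
hw_decodable() {
    case "$1" in
        h264|hevc|mpeg2video|vc1|vp9) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the codec name of the first video stream in a file (needs ffprobe).
video_codec() {
    ffprobe -v error -select_streams v:0 -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Encode with h264_vaapi, using the hwaccel only when the check passes.
transcode_h264() {
    src="$1"
    dst="$2"
    if hw_decodable "$(video_codec "$src")"; then
        ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi \
            -hwaccel_device /dev/dri/renderD128 -i "$src" -c:v h264_vaapi "$dst"
    else
        ffmpeg -vaapi_device /dev/dri/renderD128 -i "$src" \
            -vf 'format=nv12,hwupload' -c:v h264_vaapi "$dst"
    fi
}
```

Compared with the single-command fallback, this costs an extra ffprobe run, but it makes explicit which path was taken.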
The supported encoders are:
||= Codec =||= Encoder =||
|| H.262 / MPEG-2 part 2 || mpeg2_vaapi ||
|| H.264 / MPEG-4 part 10 (AVC) || h264_vaapi ||
|| H.265 / MPEG-H part 2 (HEVC) || hevc_vaapi ||
|| MJPEG / JPEG || mjpeg_vaapi ||
|| VP8 || vp8_vaapi ||
|| VP9 || vp9_vaapi ||
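Scripts that take a codec name as a parameter can map it to the encoder names in the table above. The following sketch uses a hypothetical function name. The keys are ffmpeg's own codec names (`mpeg2video`, `h264`, `hevc`, `mjpeg`, `vp8`, `vp9`). Whether each encoder actually works still depends on the driver and device:

```shell
#!/bin/sh
# Hypothetical helper: map an ffmpeg codec name to its VAAPI encoder,
# following the table above. Returns non-zero for codecs with no entry.
vaapi_encoder_for() {
    case "$1" in
        mpeg2video) echo mpeg2_vaapi ;;
        h264)       echo h264_vaapi ;;
        hevc)       echo hevc_vaapi ;;
        mjpeg)      echo mjpeg_vaapi ;;
        vp8)        echo vp8_vaapi ;;
        vp9)        echo vp9_vaapi ;;
        *)          return 1 ;;
    esac
}
```

Usage: `ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=nv12,hwupload' -c:v "$(vaapi_encoder_for vp9)" output.webm`. To see which of these encoders a particular ffmpeg build includes, run `ffmpeg -hide_banner -encoders | grep vaapi`.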
For an explanation of codec options, see <>.
=== Mapping options from libx264 ===
No CRF-like mode is currently supported.  The only constant-quality mode is CQP (constant quantisation parameter), which has no adaptivity to scene content.  It does, however, allow different quality settings for different frame types, to improve compression by spending fewer bits on unreferenced B-frames; see the `(i|b)_q(factor|offset)` options.  CQP mode cannot be combined with a maximum bitrate or buffer size.
CBR and VBR modes are supported, though their output varies significantly by driver and device (the default is VBR; set `-maxrate` equal to `-b:v` for CBR).  HRD buffering options (`rc_max_rate`, `rc_buffer_size`) are functional, and the encoder will generate buffering_period and pic_timing SEI when appropriate.
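The three rate-control modes described above differ only in which options are passed. As a sketch, the following hypothetical helper prints the options for each mode. The option names (`-qp`, `-b:v`, `-maxrate`) are the ones used throughout this page:

```shell
#!/bin/sh
# Hypothetical helper: print encoder rate-control options for a mode.
#   cqp VALUE  constant QP (must not be combined with a maximum bitrate)
#   vbr RATE   variable bitrate with a target average (the default mode)
#   cbr RATE   constant bitrate: -maxrate set equal to -b:v
vaapi_rc_args() {
    mode="$1"
    value="$2"
    case "$mode" in
        cqp) printf '%s\n' "-qp $value" ;;
        vbr) printf '%s\n' "-b:v $value" ;;
        cbr) printf '%s\n' "-b:v $value -maxrate $value" ;;
        *)   echo "unknown rate-control mode: $mode" >&2; return 1 ;;
    esac
}
```

Usage: `ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=nv12,hwupload' -c:v h264_vaapi $(vaapi_rc_args cbr 5M) output.mp4`. The substitution is left unquoted so that it splits into separate arguments.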
There is no complete analogue of the `-preset` option.  The `-quality` option controls local encode quality (that is, the amount of effort expended on trying to get the best results from local choices like motion estimation and mode decision), using a nebulous per-device scale.  The argument is a small integer, from 1 up to some limit dependent on the device (not more than 8).  Higher values are faster and give lower stream quality.  Some hardware (Intel gen9) additionally supports a low-power mode with more restricted features, which is accessible via the `-low_power` option.
Neither two-pass encoding nor lookahead is supported at all; only local rate control is possible.  VBR mode should do a reasonably good job of getting close to an overall bitrate target, but quality will vary significantly through a stream if the complexity varies.
== Full Examples ==
All of these examples assume the input and output files will contain one video stream (audio will need to be considered separately).  It is assumed that VAAPI is usable via the DRM device node `/dev/dri/renderD128`.
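Since every example below depends on that device node, a script wrapping them might check it first. The following is a minimal sketch with a hypothetical function name. It only confirms that the path is a character device the current user can read and write. It does not confirm that a working VAAPI driver is behind the node; `vainfo` (from libva-utils) can be used to check that:

```shell
#!/bin/sh
# Hypothetical helper: check that a DRM node exists, is a character device,
# and is readable and writable by the current user.
check_render_node() {
    node="${1:-/dev/dri/renderD128}"
    if [ ! -c "$node" ]; then
        echo "$node: not a character device" >&2
        return 1
    fi
    if [ ! -r "$node" ] || [ ! -w "$node" ]; then
        echo "$node: no read/write permission" >&2
        return 1
    fi
    return 0
}
```

Usage: `check_render_node /dev/dri/renderD128 && ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=nv12,hwupload' -c:v h264_vaapi output.mp4`.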
=== Decode-only ===
Decode an input with hardware if possible, output in normal memory to encode with libx264:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -i input.mp4 -c:v libx264 -crf 20 output.mp4
Decode an input with hardware, deinterlace it if it was interlaced, downscale, then download to normal memory to encode with libx264 (will fail if the input is not supported by the hardware decoder):
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mp4 -vf 'deinterlace_vaapi=rate=field:auto=1,scale_vaapi=w=640:h=360,hwdownload,format=nv12' -c:v libx264 -crf 20 output.mp4
Decode an input and discard the output (this can be used as a crude benchmark of the decoder):
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mp4 -f null -
=== Encode-only ===
Encode an input with H.264 at 5Mbps VBR:
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=nv12,hwupload' -c:v h264_vaapi -b:v 5M output.mp4
As above, but use only the constrained baseline profile, for compatibility with old devices:
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=nv12,hwupload' -c:v h264_vaapi -b:v 5M -profile 578 -bf 0 output.mp4
Encode with H.264 at good constant quality:
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=nv12,hwupload' -c:v h264_vaapi -qp 18 output.mp4
Encode with 10-bit H.265 at 15Mbps VBR (recent hardware required: Kaby Lake or later Intel):
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=p010,hwupload' -c:v hevc_vaapi -b:v 15M -profile 2 output.mp4
Scale to 720p and encode with H.264 at 5Mbps CBR:
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'hwupload,scale_vaapi=w=1280:h=720:format=nv12' -c:v h264_vaapi -b:v 5M -maxrate 5M output.mp4
Encode with VP9 at 5Mbps VBR:
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=nv12,hwupload' -c:v vp9_vaapi -b:v 5M output.webm
Encode with VP9 at good constant quality, using pseudo-B-frames to improve compression:
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=nv12,hwupload' -c:v vp9_vaapi -global_quality 50 -bf 1 -bsf:v vp9_raw_reorder,vp9_superframe output.webm
Capture the screen from X and encode with H.264 at reasonable constant quality:
ffmpeg -vaapi_device /dev/dri/renderD128 -f x11grab -video_size 1920x1080 -i :0 -vf 'hwupload,scale_vaapi=format=nv12' -c:v h264_vaapi -qp 24 output.mp4
Note that it is also possible to do the format conversion (RGB to YUV) on the CPU.  This is slower, but might be desirable if other filters are going to be applied:
ffmpeg -vaapi_device /dev/dri/renderD128 -f x11grab -video_size 1920x1080 -i :0 -vf 'format=nv12,hwupload' -c:v h264_vaapi -qp 24 output.mp4
=== Transcode ===
Hardware-only transcode to H.264 at 2Mbps CBR:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mp4 -c:v h264_vaapi -b:v 2M -maxrate 2M output.mp4
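To apply the hardware-only transcode above to many files, it can be wrapped in a loop. This sketch rests on several assumptions:

* The function name and the `DRY_RUN` variable are hypothetical. With `DRY_RUN=1` the commands are printed instead of run.
* Outputs are written next to the inputs with a `-h264.mp4` suffix.
* Every input is hardware-decodable.

```shell
#!/bin/sh
# Hypothetical helper: hardware-only transcode of every .mp4 in a directory
# to H.264 at 2Mbps CBR, writing NAME-h264.mp4 next to each input.
# Set DRY_RUN=1 to print the commands instead of running them.
transcode_dir() {
    dir="$1"
    for src in "$dir"/*.mp4; do
        [ -e "$src" ] || continue                     # no .mp4 files at all
        case "$src" in *-h264.mp4) continue ;; esac   # skip earlier outputs
        dst="${src%.mp4}-h264.mp4"
        set -- ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
            -hwaccel_output_format vaapi -i "$src" \
            -c:v h264_vaapi -b:v 2M -maxrate 2M "$dst"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$@"
        else
            "$@" || return 1
        fi
    done
}
```

Usage: `DRY_RUN=1 transcode_dir ./videos` to review the commands, then `transcode_dir ./videos` to run them.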
Decode, deinterlace if interlaced, scale to 720p, encode with H.265 at 5Mbps VBR:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mp4 -vf 'deinterlace_vaapi=rate=field:auto=1,scale_vaapi=w=1280:h=720' -c:v hevc_vaapi -b:v 5M output.mp4
Transcode to 10-bit H.265 at 15Mbps VBR (the input can be 10-bit, but need not be):
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mp4 -vf 'scale_vaapi=format=p010' -c:v hevc_vaapi -profile 2 -b:v 15M output.mp4
Transcode to H.264 in constrained baseline profile at level 3 and 1Mbps CBR for compatibility with old devices:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mp4 -vf 'fps=30,scale_vaapi=w=640:h=-2:format=nv12' -c:v h264_vaapi -profile 578 -level 30 -bf 0 -b:v 1M -maxrate 1M output.mp4
Decode the input, then pick a frame from it every 10 seconds to make a sequence of JPEG screenshots at high quality:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mp4 -r 1/10 -c:v mjpeg_vaapi -global_quality 90 -f image2 output%03d.jpeg
Burn subtitles into the video while transcoding:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mp4 -vf 'scale_vaapi,hwmap=mode=read+write+direct,format=nv12,ass=subtitles.ass,hwmap' -c:v h264_vaapi -b:v 2M -maxrate 2M output.mp4
(Note that the `scale_vaapi` filter is required here to copy the frames.  Without it, the subtitles would be drawn directly on the reference frames being used by the decoder at the same time.)
Transcode to two different outputs (one at constant quality and one at constant bitrate) from the same input:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mp4 -filter_complex 'split[cq][cb]' -map '[cq]' -c:v h264_vaapi -qp 18 output-cq.mp4 -map '[cb]' -c:v h264_vaapi -b:v 5M -maxrate 5M output-cb.mp4
Transcode for multiple streaming formats (one H.264 and one VP9, with the same parameters) from the same input:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i input.mp4 -filter_complex 'split[h264][vp9]' -map '[h264]' -c:v h264_vaapi -b:v 5M output-h264.mp4 -map '[vp9]' -c:v vp9_vaapi -b:v 5M output-vp9.webm
Decode on one device, download, upload to a second device, and encode:
ffmpeg -init_hw_device vaapi=decdev:/dev/dri/renderD128 -init_hw_device vaapi=encdev:/dev/dri/renderD129 -hwaccel vaapi -hwaccel_device decdev -hwaccel_output_format vaapi -i input.mp4 -filter_hw_device encdev -vf 'hwdownload,format=nv12,hwupload' -c:v h264_vaapi -b:v 5M output.mp4
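On a machine with exactly two GPUs, the two device definitions above can be derived from the render nodes that exist rather than hard-coded. This sketch assumes, hypothetically, that the first render node should decode and the second should encode. The node-to-GPU ordering depends on the system, so check it before relying on it:

```shell
#!/bin/sh
# Hypothetical helper: print -init_hw_device options naming the first two
# render nodes in a directory "decdev" and "encdev". Fails if fewer than two.
two_device_args() {
    dir="${1:-/dev/dri}"
    # With no match the unexpanded pattern is the only argument, so $# is 1.
    set -- "$dir"/renderD*
    if [ "$#" -lt 2 ]; then
        echo "need at least two render nodes in $dir" >&2
        return 1
    fi
    printf '%s\n' "-init_hw_device vaapi=decdev:$1 -init_hw_device vaapi=encdev:$2"
}
```

Usage: `ffmpeg $(two_device_args) -hwaccel vaapi -hwaccel_device decdev -hwaccel_output_format vaapi -i input.mp4 -filter_hw_device encdev -vf 'hwdownload,format=nv12,hwupload' -c:v h264_vaapi -b:v 5M output.mp4`.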
=== Other ===
Use the VAAPI deinterlacer standalone to attempt to make a software transcode run faster (this may actually make things slower, because the additional copying to the GPU and back is quite a large overhead):
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 -vf 'format=nv12,hwupload,deinterlace_vaapi=rate=field,hwdownload,format=nv12' -c:v libx264 -crf 24 output.mp4