Opened 6 years ago
Last modified 3 years ago
#7456 new defect
aomdec decodes video faster than libaom-av1 decoder in ffmpeg
Reported by: | kagami | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avcodec |
Version: | git-master | Keywords: | libaom |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
I'm using libaom@af3e5cc compiled with -DCONFIG_LOWBITDEPTH=1
and aomdec seems to decode the sample video faster than the ffmpeg wrapper for it:
Get sample
$ youtube-dl -f 399 -o dua.mp4 k2qgadSvNyU
$ ffmpeg -i dua.mp4 -c copy -frames:v 250 dua.ivf
Compare
$ time aomdec -o /dev/null --threads=8 dua.ivf
246% cpu 2.987 total
$ time ffmpeg -i dua.ivf -nostats -f null -
ffmpeg version git-2018-09-22-59256de Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 8.2.0 (Gentoo 8.2.0-r2 p1.2)
configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --docdir=/usr/share/doc/ffmpeg-9999/html --mandir=/usr/share/man --enable-shared --cc=x86_64-pc-linux-gnu-gcc --cxx=x86_64-pc-linux-gnu-g++ --ar=x86_64-pc-linux-gnu-ar --optflags='-O2 -pipe -march=native' --extra-cflags=-I/opt/cuda/include --nvccflags='-O2 -v --compiler-bindir "/usr/x86_64-pc-linux-gnu/gcc-bin/7.3.0" --compiler-options "-O2 -pipe -march=native" --linker-options "-O1,--as-needed"' --enable-cuda-sdk --disable-static --enable-avfilter --enable-avresample --disable-stripping --disable-libcelt --enable-nonfree --disable-indev=v4l2 --disable-outdev=v4l2 --disable-indev=oss --disable-indev=jack --disable-outdev=oss --enable-bzlib --disable-runtime-cpudetect --disable-debug --disable-gcrypt --disable-gnutls --disable-gmp --enable-gpl --enable-hardcoded-tables --enable-iconv --disable-libtls --disable-lzma --enable-network --enable-opencl --enable-openssl --enable-postproc --disable-libsmbclient --enable-ffplay --enable-sdl2 --disable-vaapi --enable-vdpau --enable-xlib --enable-libxcb --enable-libxcb-shm --enable-libxcb-xfixes --enable-zlib --disable-libcdio --disable-libiec61883 --disable-libdc1394 --disable-libcaca --disable-openal --enable-opengl --disable-libv4l2 --disable-libpulse --disable-libdrm --disable-libopencore-amrwb --disable-libopencore-amrnb --disable-libcodec2 --disable-libfdk-aac --disable-libopenjpeg --disable-libbluray --disable-libgme --disable-libgsm --disable-mmal --disable-libmodplug --enable-libopus --disable-libilbc --disable-librtmp --disable-libssh --disable-libspeex --enable-librsvg --enable-ffnvcodec --enable-libvorbis --enable-libvpx --disable-libzvbi --disable-appkit --disable-libbs2b --disable-chromaprint --disable-libflite --disable-frei0r --disable-libfribidi --enable-fontconfig --disable-ladspa --enable-libass --disable-lv2 --enable-libfreetype --disable-librubberband --disable-libzmq --enable-libzimg --disable-libsoxr --enable-pthreads --disable-libvo-amrwbenc --enable-libmp3lame --disable-libkvazaar --enable-libaom --disable-libopenh264 --disable-libsnappy --disable-libtheora --disable-libtwolame --disable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --disable-libxvid --disable-gnutls --disable-armv5te --disable-armv6 --disable-armv6t2 --disable-neon --disable-vfp --disable-vfpv3 --disable-armv8 --disable-mipsdsp --disable-mipsdspr2 --disable-mipsfpu --disable-altivec --disable-amd3dnow --disable-amd3dnowext --disable-avx2 --disable-fma3 --disable-fma4 --disable-xop --cpu=host --disable-doc --disable-htmlpages --enable-manpages
libavutil 56. 19.101 / 56. 19.101
libavcodec 58. 30.100 / 58. 30.100
libavformat 58. 18.102 / 58. 18.102
libavdevice 58. 4.103 / 58. 4.103
libavfilter 7. 32.100 / 7. 32.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 2.100 / 5. 2.100
libswresample 3. 2.100 / 3. 2.100
libpostproc 55. 2.100 / 55. 2.100
[libaom-av1 @ 0x564868d0ee20] v1.0.0
Input #0, ivf, from 'dua.ivf':
Duration: 00:00:10.00, start: 0.000000, bitrate: 940 kb/s
Stream #0:0: Video: av1 (Main) (av01 / 0x31307661), yuv420p(tv, bt709), 1920x1080, 25 fps, 25 tbr, 12800 tbn, 12800 tbc
[libaom-av1 @ 0x564868d0f5a0] v1.0.0
Stream mapping:
Stream #0:0 -> #0:0 (av1 (libaom-av1) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf58.18.102
Stream #0:0: Video: wrapped_avframe, yuv420p, 1920x1080, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
Metadata:
encoder : Lavc58.30.100 wrapped_avframe
frame= 250 fps= 64 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=2.55x
video:131kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
249% cpu 3.974 total
With -DCONFIG_LOWBITDEPTH=0
decoding speed is about the same (slower for aomdec).
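To put a number on the gap reported above, here is a minimal sketch that computes how much longer one run took than another. The helper name `pct_longer` is made up for illustration; the inputs are the two "total" wall-clock times from the transcript.

```shell
# pct_longer A B: print how much longer A took than B, in percent.
# Hypothetical helper, not part of ffmpeg or libaom.
pct_longer() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# ffmpeg wall time vs. aomdec wall time, taken from the report above
pct_longer 3.974 2.987
```

With these two timings the ffmpeg run comes out roughly a third longer than the aomdec run, at nearly the same CPU utilisation (249% vs. 246%).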
Attachments (1)
Change History (10)
by , 6 years ago
comment:1 by , 6 years ago
comment:2 by , 6 years ago
Yes, timing is about the same:
$ time ./ffmpeg -i dua.ivf -nostats -f null -
ffmpeg version N-92054-ga7429d853d Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 8.2.0 (Gentoo 8.2.0-r2 p1.2)
configuration: --enable-libaom
libavutil 56. 19.101 / 56. 19.101
libavcodec 58. 31.100 / 58. 31.100
libavformat 58. 18.102 / 58. 18.102
libavdevice 58. 4.104 / 58. 4.104
libavfilter 7. 32.100 / 7. 32.100
libswscale 5. 2.100 / 5. 2.100
libswresample 3. 2.100 / 3. 2.100
[libaom-av1 @ 0x559d38735fc0] v1.0.0
Input #0, ivf, from 'dua.ivf':
Duration: 00:00:10.00, start: 0.000000, bitrate: 940 kb/s
Stream #0:0: Video: av1 (Main) (av01 / 0x31307661), yuv420p(tv, bt709), 1920x1080, 25 fps, 25 tbr, 12800 tbn, 12800 tbc
[libaom-av1 @ 0x559d38739a80] v1.0.0
Stream mapping:
Stream #0:0 -> #0:0 (av1 (libaom-av1) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf58.18.102
Stream #0:0: Video: wrapped_avframe, yuv420p, 1920x1080, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
Metadata:
encoder : Lavc58.31.100 wrapped_avframe
frame= 250 fps= 64 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=2.58x
video:131kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
253% cpu 3.925 total
comment:4 by , 6 years ago
For comparison, perf output for aomdec:
Children Self Command Shared Object Symbol
+ 62.70% 0.00% aom tile worker libpthread-2.27.so [.] start_thread
+ 62.70% 0.01% aom tile worker aomdec [.] thread_loop
+ 62.67% 0.00% aom tile worker libc-2.27.so [.] __GI___clone (inlined)
+ 41.88% 0.06% aom tile worker aomdec [.] row_mt_worker_hook
+ 40.55% 0.76% aom tile worker aomdec [.] decode_partition
+ 31.72% 0.69% aom tile worker aomdec [.] decode_token_recon_block
+ 25.32% 0.32% aom tile worker aomdec [.] predict_inter_block
+ 24.63% 1.04% aom tile worker aomdec [.] dec_build_inter_predictors
+ 22.37% 0.22% aom tile worker aomdec [.] av1_make_inter_predictor
+ 20.27% 0.00% aomdec aomdec [.] main_loop
+ 17.83% 0.00% aomdec libc-2.27.so [.] __libc_start_main
+ 17.83% 0.00% aomdec aomdec [.] main
+ 17.79% 0.00% aomdec aomdec [.] _start
+ 14.45% 0.03% aom tile worker aomdec [.] loop_filter_row_worker
+ 14.37% 0.00% aomdec aomdec [.] decoder_decode
+ 14.36% 0.00% aomdec aomdec [.] execute
+ 14.35% 0.00% aomdec aomdec [.] aom_decode_frame_from_obus
+ 14.35% 0.01% aomdec aomdec [.] av1_receive_compressed_data
+ 14.34% 0.00% aomdec aomdec [.] frame_worker_hook
+ 14.30% 0.00% aomdec aomdec [.] aom_codec_decode
+ 13.42% 13.38% aom tile worker aomdec [.] av1_highbd_jnt_convolve_2d_avx2
+ 11.74% 0.02% aomdec aomdec [.] av1_decode_tg_tiles_and_wrapup
+ 10.97% 2.65% aom tile worker aomdec [.] parse_decode_block
+ 10.57% 0.00% aomdec [unknown] [.] 0xffffffffffffffff
+ 8.91% 3.78% aom tile worker aomdec [.] av1_filter_block_plane_vert
+ 7.61% 0.67% aomdec aomdec [.] av1_cdef_frame
+ 7.37% 0.01% aomdec aomdec [.] row_mt_worker_hook
+ 7.19% 0.13% aomdec aomdec [.] decode_partition
+ 6.19% 0.01% aom tile worker aomdec [.] loop_restoration_row_worker
+ 5.89% 5.87% aom tile worker libc-2.27.so [.] __memmove_avx_unaligned_erms
+ 5.74% 0.40% aomdec aomdec [.] cdef_filter_fb
+ 5.47% 2.61% aom tile worker aomdec [.] av1_filter_block_plane_horz
+ 4.91% 0.13% aomdec aomdec [.] decode_token_recon_block
+ 4.69% 0.26% aom tile worker aomdec [.] av1_read_mode_info
+ 4.53% 0.00% aom tile worker aomdec [.] av1_foreach_rest_unit_in_row
+ 4.40% 0.02% aom tile worker aomdec [.] av1_loop_restoration_filter_unit
+ 4.39% 0.00% aom tile worker aomdec [.] filter_frame_on_unit
+ 4.11% 4.08% aomdec aomdec [.] aom_img_downshift
+ 4.08% 0.05% aomdec aomdec [.] predict_inter_block
+ 3.98% 0.19% aomdec aomdec [.] dec_build_inter_predictors
+ 3.91% 0.71% aom tile worker aomdec [.] read_inter_block_mode_info
vs. ffmpeg
+ 49.85% 0.00% ffmpeg ffmpeg [.] decode_receive_frame_internal
+ 49.85% 0.00% ffmpeg ffmpeg [.] decode_simple_receive_frame (inlined)
+ 49.85% 0.00% ffmpeg ffmpeg [.] decode_simple_internal (inlined)
+ 49.85% 8.42% ffmpeg ffmpeg [.] aom_decode
+ 49.84% 0.00% ffmpeg ffmpeg [.] avcodec_send_packet
+ 49.75% 0.01% ffmpeg ffmpeg [.] transcode
+ 49.69% 0.00% ffmpeg ffmpeg [.] transcode_step (inlined)
+ 49.66% 0.00% ffmpeg ffmpeg [.] process_input (inlined)
+ 49.62% 0.00% ffmpeg ffmpeg [.] process_input_packet
+ 49.62% 0.01% ffmpeg ffmpeg [.] decode_video
+ 49.60% 0.00% ffmpeg ffmpeg [.] decode (inlined)
+ 41.75% 0.00% ffmpeg ffmpeg [.] main
+ 41.66% 0.00% ffmpeg libc-2.27.so [.] __libc_start_main
+ 41.37% 0.00% ffmpeg ffmpeg [.] aom_codec_decode
+ 41.37% 0.00% ffmpeg ffmpeg [.] decoder_decode
+ 41.36% 0.00% ffmpeg ffmpeg [.] decode_one (inlined)
+ 41.35% 0.00% ffmpeg ffmpeg [.] execute
+ 41.34% 0.00% ffmpeg ffmpeg [.] frame_worker_hook
+ 41.34% 0.01% ffmpeg ffmpeg [.] av1_receive_compressed_data
+ 41.32% 0.01% ffmpeg ffmpeg [.] aom_decode_frame_from_obus
+ 40.94% 0.00% ffmpeg ffmpeg [.] _start
+ 38.08% 0.00% ffmpeg ffmpeg [.] read_one_tile_group_obu (inlined)
+ 38.08% 0.02% ffmpeg ffmpeg [.] av1_decode_tg_tiles_and_wrapup
+ 25.70% 1.13% ffmpeg ffmpeg [.] dec_build_inter_predictors
+ 24.06% 0.33% ffmpeg ffmpeg [.] av1_make_inter_predictor
+ 23.14% 0.00% ffmpeg ffmpeg [.] highbd_inter_predictor (inlined)
+ 21.21% 0.00% ffmpeg ffmpeg [.] decode_tiles (inlined)
+ 21.21% 0.47% ffmpeg ffmpeg [.] decode_partition
+ 21.17% 0.00% ffmpeg ffmpeg [.] decode_tile (inlined)
+ 20.62% 0.73% ffmpeg ffmpeg [.] parse_decode_block
+ 19.63% 0.00% ffmpeg [unknown] [k] 0xffffffffffffffff
+ 13.60% 13.56% ffmpeg ffmpeg [.] av1_highbd_jnt_convolve_2d_avx2
+ 13.29% 0.41% ffmpeg ffmpeg [.] decode_token_recon_block
+ 13.11% 0.01% ffmpeg ffmpeg [.] av1_loop_filter_frame
+ 13.11% 0.00% ffmpeg ffmpeg [.] loop_filter_rows (inlined)
+ 12.43% 0.00% ffmpeg ffmpeg [.] inv_txfm2d_add_facade (inlined)
+ 12.37% 0.00% ffmpeg ffmpeg [.] inv_txfm2d_add_c (inlined)
+ 9.72% 0.00% ffmpeg ffmpeg [.] _mm256_loadu_si256 (inlined)
+ 8.70% 3.67% ffmpeg ffmpeg [.] av1_filter_block_plane_vert
+ 8.45% 0.00% ffmpeg ffmpeg [.] image_copy_16_to_8 (inlined)
+ 8.38% 0.45% ffmpeg ffmpeg [.] av1_cdef_frame
+ 7.53% 0.00% ffmpeg ffmpeg [.] inverse_transform_block (inlined)
+ 7.42% 0.00% ffmpeg ffmpeg [.] decode_reconstruct_tx (inlined)
+ 7.38% 0.02% ffmpeg ffmpeg [.] av1_inverse_transform_block
+ 6.83% 0.00% ffmpeg ffmpeg [.] decode_mbmi_block (inlined)
+ 6.57% 0.58% ffmpeg ffmpeg [.] cdef_filter_fb
+ 6.18% 0.26% ffmpeg ffmpeg [.] av1_read_mode_info
+ 5.77% 0.00% ffmpeg ffmpeg [.] set_lpf_parameters (inlined)
+ 5.70% 0.00% ffmpeg ffmpeg [.] read_inter_frame_mode_info (inlined)
+ 5.24% 0.83% ffmpeg ffmpeg [.] read_inter_block_mode_info
comment:5 by , 6 years ago
comment:6 by , 6 years ago
row-mt is enabled by default in recent libaom versions. I don't think the patch is needed.
comment:9 by , 3 years ago
I just tested this with the latest git (3.1.2-702-g3adb660d for aom). libaom-av1 via FFmpeg seems to be about 20% slower than aomdec by itself.
Also, -DCONFIG_LOWBITDEPTH=1 is no longer a valid build flag, and a message at the start of the build says to use -DFORCE_HIGHBITDEPTH_DECODING=0 instead. However, it's unclear whether this actually does anything, since performance seems identical with and without this option.
My test on a system with 4 logical cores:
$ youtube-dl -f 399 -o av1.mp4 umyglbDr4IE
$ ffmpeg -i av1.mp4 -c:v copy -frames:v 1800 av1.ivf
$ /usr/bin/time -f "\ntime\t%E\nCPU\t%P\nRAM\t%Mk" ffmpeg -c:v libaom-av1 -threads 4 -i av1.ivf -nostats -f null /dev/null
$ /usr/bin/time -f "\ntime\t%E\nCPU\t%P\nRAM\t%Mk" aomdec -o /dev/null --threads=4 av1.ivf
dav1d is the default AV1 decoder nowadays in builds that include it, so libaom-av1 has to be selected explicitly as the decoder (hence -c:v libaom-av1 before -i).
FFmpeg libaom-av1 performance:
ffmpeg time 0:33.81 CPU 282% RAM 151252k
aomdec performance:
time 0:27.08 CPU 283% RAM 68280k
The results stayed the same across multiple runs with very little variance.
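As a rough cross-check of the "about 20% slower" figure, the elapsed times can be turned into decode throughput. This is only a sketch: `to_seconds` and `fps` are hypothetical helper names, and it assumes the clip really contains the 1800 frames requested with -frames:v 1800.

```shell
# to_seconds [H:]M:SS.ss: convert the %E format of /usr/bin/time to seconds.
# Hypothetical helper for illustration only.
to_seconds() {
  LC_ALL=C awk -v t="$1" 'BEGIN {
    n = split(t, p, ":"); s = 0
    for (i = 1; i <= n; i++) s = s * 60 + p[i]
    printf "%.2f\n", s
  }'
}

# fps FRAMES ELAPSED: decoded frames per second of wall-clock time.
fps() {
  LC_ALL=C awk -v f="$1" -v t="$(to_seconds "$2")" 'BEGIN { printf "%.1f\n", f / t }'
}

# Assumption: 1800 frames were decoded in both runs.
fps 1800 0:33.81   # ffmpeg with libaom-av1
fps 1800 0:27.08   # aomdec
```

Under that assumption the ffmpeg run reaches about 80% of aomdec's frame rate, which matches the estimate given in this comment; measured as wall time instead, the ffmpeg run takes about 25% longer.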
Please test whether you see the same performance loss if you build FFmpeg with
./configure --enable-libaom && make ffmpeg
Gentoo builds are known to have performance issues.