Opened 6 years ago

Last modified 3 years ago

#7456 new defect

aomdec decodes video faster than libaom-av1 decoder in ffmpeg

Reported by: kagami
Owned by:
Priority: normal
Component: avcodec
Version: git-master
Keywords: libaom
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

I'm using libaom@af3e5cc compiled with -DCONFIG_LOWBITDEPTH=1, and aomdec seems to decode the sample video faster than the ffmpeg wrapper for it:

Get sample

$ youtube-dl -f 399 -o dua.mp4 k2qgadSvNyU
$ ffmpeg -i dua.mp4 -c copy -frames:v 250 dua.ivf

Compare

$ time aomdec -o /dev/null --threads=8 dua.ivf           
246% cpu 2.987 total
$ time ffmpeg -i dua.ivf -nostats -f null -
ffmpeg version git-2018-09-22-59256de Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 8.2.0 (Gentoo 8.2.0-r2 p1.2)
  configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --docdir=/usr/share/doc/ffmpeg-9999/html --mandir=/usr/share/man --enable-shared --cc=x86_64-pc-linux-gnu-gcc --cxx=x86_64-pc-linux-gnu-g++ --ar=x86_64-pc-linux-gnu-ar --optflags='-O2 -pipe -march=native' --extra-cflags=-I/opt/cuda/include --nvccflags='-O2 -v --compiler-bindir "/usr/x86_64-pc-linux-gnu/gcc-bin/7.3.0" --compiler-options "-O2 -pipe -march=native" --linker-options "-O1,--as-needed"' --enable-cuda-sdk --disable-static --enable-avfilter --enable-avresample --disable-stripping --disable-libcelt --enable-nonfree --disable-indev=v4l2 --disable-outdev=v4l2 --disable-indev=oss --disable-indev=jack --disable-outdev=oss --enable-bzlib --disable-runtime-cpudetect --disable-debug --disable-gcrypt --disable-gnutls --disable-gmp --enable-gpl --enable-hardcoded-tables --enable-iconv --disable-libtls --disable-lzma --enable-network --enable-opencl --enable-openssl --enable-postproc --disable-libsmbclient --enable-ffplay --enable-sdl2 --disable-vaapi --enable-vdpau --enable-xlib --enable-libxcb --enable-libxcb-shm --enable-libxcb-xfixes --enable-zlib --disable-libcdio --disable-libiec61883 --disable-libdc1394 --disable-libcaca --disable-openal --enable-opengl --disable-libv4l2 --disable-libpulse --disable-libdrm --disable-libopencore-amrwb --disable-libopencore-amrnb --disable-libcodec2 --disable-libfdk-aac --disable-libopenjpeg --disable-libbluray --disable-libgme --disable-libgsm --disable-mmal --disable-libmodplug --enable-libopus --disable-libilbc --disable-librtmp --disable-libssh --disable-libspeex --enable-librsvg --enable-ffnvcodec --enable-libvorbis --enable-libvpx --disable-libzvbi --disable-appkit --disable-libbs2b --disable-chromaprint --disable-libflite --disable-frei0r --disable-libfribidi --enable-fontconfig --disable-ladspa --enable-libass --disable-lv2 --enable-libfreetype --disable-librubberband --disable-libzmq --enable-libzimg --disable-libsoxr --enable-pthreads 
--disable-libvo-amrwbenc --enable-libmp3lame --disable-libkvazaar --enable-libaom --disable-libopenh264 --disable-libsnappy --disable-libtheora --disable-libtwolame --disable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --disable-libxvid --disable-gnutls --disable-armv5te --disable-armv6 --disable-armv6t2 --disable-neon --disable-vfp --disable-vfpv3 --disable-armv8 --disable-mipsdsp --disable-mipsdspr2 --disable-mipsfpu --disable-altivec --disable-amd3dnow --disable-amd3dnowext --disable-avx2 --disable-fma3 --disable-fma4 --disable-xop --cpu=host --disable-doc --disable-htmlpages --enable-manpages
  libavutil      56. 19.101 / 56. 19.101
  libavcodec     58. 30.100 / 58. 30.100
  libavformat    58. 18.102 / 58. 18.102
  libavdevice    58.  4.103 / 58.  4.103
  libavfilter     7. 32.100 /  7. 32.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  2.100 /  5.  2.100
  libswresample   3.  2.100 /  3.  2.100
  libpostproc    55.  2.100 / 55.  2.100
[libaom-av1 @ 0x564868d0ee20] v1.0.0
Input #0, ivf, from 'dua.ivf':
  Duration: 00:00:10.00, start: 0.000000, bitrate: 940 kb/s
    Stream #0:0: Video: av1 (Main) (av01 / 0x31307661), yuv420p(tv, bt709), 1920x1080, 25 fps, 25 tbr, 12800 tbn, 12800 tbc
[libaom-av1 @ 0x564868d0f5a0] v1.0.0
Stream mapping:
  Stream #0:0 -> #0:0 (av1 (libaom-av1) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf58.18.102
    Stream #0:0: Video: wrapped_avframe, yuv420p, 1920x1080, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
    Metadata:
      encoder         : Lavc58.30.100 wrapped_avframe
frame=  250 fps= 64 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=2.55x    
video:131kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
249% cpu 3.974 total

With -DCONFIG_LOWBITDEPTH=0 decoding speed is about the same (slower for aomdec).
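
For reference, the gap implied by the two `time` results above can be put into numbers. This is only arithmetic on the reported wall-clock totals (2.987 s for aomdec, 3.974 s for ffmpeg, 250 frames); the helper names are made up for illustration. The totals cover the whole process run, so the rates come out a little lower than the `fps= 64` that ffmpeg prints for itself.

```python
def extra_time_percent(baseline_s: float, other_s: float) -> float:
    """How much longer `other_s` is than `baseline_s`, in percent."""
    return (other_s / baseline_s - 1.0) * 100.0


def frames_per_second(frames: int, wall_s: float) -> float:
    """Average decode rate over the whole run, process startup included."""
    return frames / wall_s


# Wall-clock totals copied from the two `time` results in the description.
AOMDEC_S = 2.987
FFMPEG_S = 3.974
FRAMES = 250  # from `-frames:v 250` when the sample was cut

print(f"ffmpeg needs {extra_time_percent(AOMDEC_S, FFMPEG_S):.1f}% more wall time")
print(f"aomdec: {frames_per_second(FRAMES, AOMDEC_S):.1f} fps")
print(f"ffmpeg: {frames_per_second(FRAMES, FFMPEG_S):.1f} fps")
```

With these inputs ffmpeg needs roughly a third more wall time than aomdec at nearly the same CPU utilisation (249% vs 246%).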

Attachments (1)

dua.ivf (1.1 MB ) - added by kagami 6 years ago.


Change History (10)

by kagami, 6 years ago

Attachment: dua.ivf added

comment:1 by Carl Eugen Hoyos, 6 years ago

Please test whether you see the same performance loss when you build FFmpeg with ./configure --enable-libaom && make ffmpeg. Gentoo builds are known to have performance issues.

comment:2 by kagami, 6 years ago

Yes, timing is about the same:

$ time ./ffmpeg -i dua.ivf -nostats -f null -
ffmpeg version N-92054-ga7429d853d Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 8.2.0 (Gentoo 8.2.0-r2 p1.2)
  configuration: --enable-libaom
  libavutil      56. 19.101 / 56. 19.101
  libavcodec     58. 31.100 / 58. 31.100
  libavformat    58. 18.102 / 58. 18.102
  libavdevice    58.  4.104 / 58.  4.104
  libavfilter     7. 32.100 /  7. 32.100
  libswscale      5.  2.100 /  5.  2.100
  libswresample   3.  2.100 /  3.  2.100
[libaom-av1 @ 0x559d38735fc0] v1.0.0
Input #0, ivf, from 'dua.ivf':
  Duration: 00:00:10.00, start: 0.000000, bitrate: 940 kb/s
    Stream #0:0: Video: av1 (Main) (av01 / 0x31307661), yuv420p(tv, bt709), 1920x1080, 25 fps, 25 tbr, 12800 tbn, 12800 tbc
[libaom-av1 @ 0x559d38739a80] v1.0.0
Stream mapping:
  Stream #0:0 -> #0:0 (av1 (libaom-av1) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf58.18.102
    Stream #0:0: Video: wrapped_avframe, yuv420p, 1920x1080, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
    Metadata:
      encoder         : Lavc58.31.100 wrapped_avframe
frame=  250 fps= 64 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=2.58x    
video:131kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
253% cpu 3.925 total

comment:3 by Elon Musk, 6 years ago

The wrapper uses memcpy to copy video frames; not our bug.
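
To put the frame copy into perspective, here is a back-of-the-envelope estimate of how much data one copy of every decoded frame moves for this sample. It assumes the 1920x1080 yuv420p 8-bit output reported by ffmpeg above, exactly one copy per frame, and no stride padding. Nothing here is measured, and the function name is made up for illustration.

```python
def yuv420p_frame_bytes(width: int, height: int, bytes_per_sample: int = 1) -> int:
    """Size of one planar 4:2:0 frame without any row padding."""
    luma = width * height
    # Each chroma plane is subsampled by 2 in both directions (rounded up).
    chroma = ((width + 1) // 2) * ((height + 1) // 2)
    return (luma + 2 * chroma) * bytes_per_sample


WIDTH, HEIGHT = 1920, 1080  # from the stream info printed by ffmpeg
FRAMES = 250                # length of the test clip

per_frame = yuv420p_frame_bytes(WIDTH, HEIGHT)
total = per_frame * FRAMES

print(f"{per_frame} bytes per frame, {total / 1e6:.1f} MB for {FRAMES} frames")
```

Passing `bytes_per_sample=2` gives the size of the same frame in a 16-bit layout. Whether one pass over this much data accounts for the observed difference is not something this estimate can show.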

comment:4 by Tristan Matthews, 6 years ago

For comparison, perf output for aomdec:

  Children      Self  Command          Shared Object        Symbol
+   62.70%     0.00%  aom tile worker  libpthread-2.27.so   [.] start_thread
+   62.70%     0.01%  aom tile worker  aomdec               [.] thread_loop
+   62.67%     0.00%  aom tile worker  libc-2.27.so         [.] __GI___clone (inlined)
+   41.88%     0.06%  aom tile worker  aomdec               [.] row_mt_worker_hook
+   40.55%     0.76%  aom tile worker  aomdec               [.] decode_partition
+   31.72%     0.69%  aom tile worker  aomdec               [.] decode_token_recon_block
+   25.32%     0.32%  aom tile worker  aomdec               [.] predict_inter_block
+   24.63%     1.04%  aom tile worker  aomdec               [.] dec_build_inter_predictors
+   22.37%     0.22%  aom tile worker  aomdec               [.] av1_make_inter_predictor
+   20.27%     0.00%  aomdec           aomdec               [.] main_loop
+   17.83%     0.00%  aomdec           libc-2.27.so         [.] __libc_start_main
+   17.83%     0.00%  aomdec           aomdec               [.] main
+   17.79%     0.00%  aomdec           aomdec               [.] _start
+   14.45%     0.03%  aom tile worker  aomdec               [.] loop_filter_row_worker
+   14.37%     0.00%  aomdec           aomdec               [.] decoder_decode
+   14.36%     0.00%  aomdec           aomdec               [.] execute
+   14.35%     0.00%  aomdec           aomdec               [.] aom_decode_frame_from_obus
+   14.35%     0.01%  aomdec           aomdec               [.] av1_receive_compressed_data
+   14.34%     0.00%  aomdec           aomdec               [.] frame_worker_hook
+   14.30%     0.00%  aomdec           aomdec               [.] aom_codec_decode
+   13.42%    13.38%  aom tile worker  aomdec               [.] av1_highbd_jnt_convolve_2d_avx2
+   11.74%     0.02%  aomdec           aomdec               [.] av1_decode_tg_tiles_and_wrapup
+   10.97%     2.65%  aom tile worker  aomdec               [.] parse_decode_block
+   10.57%     0.00%  aomdec           [unknown]            [.] 0xffffffffffffffff
+    8.91%     3.78%  aom tile worker  aomdec               [.] av1_filter_block_plane_vert
+    7.61%     0.67%  aomdec           aomdec               [.] av1_cdef_frame
+    7.37%     0.01%  aomdec           aomdec               [.] row_mt_worker_hook
+    7.19%     0.13%  aomdec           aomdec               [.] decode_partition
+    6.19%     0.01%  aom tile worker  aomdec               [.] loop_restoration_row_worker
+    5.89%     5.87%  aom tile worker  libc-2.27.so         [.] __memmove_avx_unaligned_erms
+    5.74%     0.40%  aomdec           aomdec               [.] cdef_filter_fb
+    5.47%     2.61%  aom tile worker  aomdec               [.] av1_filter_block_plane_horz
+    4.91%     0.13%  aomdec           aomdec               [.] decode_token_recon_block
+    4.69%     0.26%  aom tile worker  aomdec               [.] av1_read_mode_info
+    4.53%     0.00%  aom tile worker  aomdec               [.] av1_foreach_rest_unit_in_row
+    4.40%     0.02%  aom tile worker  aomdec               [.] av1_loop_restoration_filter_unit
+    4.39%     0.00%  aom tile worker  aomdec               [.] filter_frame_on_unit
+    4.11%     4.08%  aomdec           aomdec               [.] aom_img_downshift
+    4.08%     0.05%  aomdec           aomdec               [.] predict_inter_block
+    3.98%     0.19%  aomdec           aomdec               [.] dec_build_inter_predictors
+    3.91%     0.71%  aom tile worker  aomdec               [.] read_inter_block_mode_info

vs. ffmpeg

+   49.85%     0.00%  ffmpeg   ffmpeg                      [.] decode_receive_frame_internal                                                                                                               
+   49.85%     0.00%  ffmpeg   ffmpeg                      [.] decode_simple_receive_frame (inlined)                                                                                                       
+   49.85%     0.00%  ffmpeg   ffmpeg                      [.] decode_simple_internal (inlined)                                                                                                            
+   49.85%     8.42%  ffmpeg   ffmpeg                      [.] aom_decode                                                                                                                                  
+   49.84%     0.00%  ffmpeg   ffmpeg                      [.] avcodec_send_packet                                                                                                                         
+   49.75%     0.01%  ffmpeg   ffmpeg                      [.] transcode                                                                                                                                   
+   49.69%     0.00%  ffmpeg   ffmpeg                      [.] transcode_step (inlined)                                                                                                                    
+   49.66%     0.00%  ffmpeg   ffmpeg                      [.] process_input (inlined)                                                                                                                     
+   49.62%     0.00%  ffmpeg   ffmpeg                      [.] process_input_packet                                                                                                                        
+   49.62%     0.01%  ffmpeg   ffmpeg                      [.] decode_video                                                                                                                                
+   49.60%     0.00%  ffmpeg   ffmpeg                      [.] decode (inlined)                                                                                                                            
+   41.75%     0.00%  ffmpeg   ffmpeg                      [.] main                                                                                                                                        
+   41.66%     0.00%  ffmpeg   libc-2.27.so                [.] __libc_start_main                                                                                                                           
+   41.37%     0.00%  ffmpeg   ffmpeg                      [.] aom_codec_decode                                                                                                                            
+   41.37%     0.00%  ffmpeg   ffmpeg                      [.] decoder_decode                                                                                                                              
+   41.36%     0.00%  ffmpeg   ffmpeg                      [.] decode_one (inlined)                                                                                                                        
+   41.35%     0.00%  ffmpeg   ffmpeg                      [.] execute                                                                                                                                     
+   41.34%     0.00%  ffmpeg   ffmpeg                      [.] frame_worker_hook                                                                                                                           
+   41.34%     0.01%  ffmpeg   ffmpeg                      [.] av1_receive_compressed_data                                                                                                                 
+   41.32%     0.01%  ffmpeg   ffmpeg                      [.] aom_decode_frame_from_obus                                                                                                                  
+   40.94%     0.00%  ffmpeg   ffmpeg                      [.] _start                                                                                                                                      
+   38.08%     0.00%  ffmpeg   ffmpeg                      [.] read_one_tile_group_obu (inlined)                                                                                                           
+   38.08%     0.02%  ffmpeg   ffmpeg                      [.] av1_decode_tg_tiles_and_wrapup                                                                                                              
+   25.70%     1.13%  ffmpeg   ffmpeg                      [.] dec_build_inter_predictors                                                                                                                  
+   24.06%     0.33%  ffmpeg   ffmpeg                      [.] av1_make_inter_predictor                                                                                                                    
+   23.14%     0.00%  ffmpeg   ffmpeg                      [.] highbd_inter_predictor (inlined)                                                                                                            
+   21.21%     0.00%  ffmpeg   ffmpeg                      [.] decode_tiles (inlined)                                                                                                                      
+   21.21%     0.47%  ffmpeg   ffmpeg                      [.] decode_partition                                                                                                                            
+   21.17%     0.00%  ffmpeg   ffmpeg                      [.] decode_tile (inlined)                                                                                                                       
+   20.62%     0.73%  ffmpeg   ffmpeg                      [.] parse_decode_block                                                                                                                          
+   19.63%     0.00%  ffmpeg   [unknown]                   [k] 0xffffffffffffffff                                                                                                                          
+   13.60%    13.56%  ffmpeg   ffmpeg                      [.] av1_highbd_jnt_convolve_2d_avx2                                                                                                             
+   13.29%     0.41%  ffmpeg   ffmpeg                      [.] decode_token_recon_block                                                                                                                    
+   13.11%     0.01%  ffmpeg   ffmpeg                      [.] av1_loop_filter_frame                                                                                                                       
+   13.11%     0.00%  ffmpeg   ffmpeg                      [.] loop_filter_rows (inlined)                                                                                                                  
+   12.43%     0.00%  ffmpeg   ffmpeg                      [.] inv_txfm2d_add_facade (inlined)                                                                                                             
+   12.37%     0.00%  ffmpeg   ffmpeg                      [.] inv_txfm2d_add_c (inlined)                                                                                                                  
+    9.72%     0.00%  ffmpeg   ffmpeg                      [.] _mm256_loadu_si256 (inlined)                                                                                                                
+    8.70%     3.67%  ffmpeg   ffmpeg                      [.] av1_filter_block_plane_vert                                                                                                                 
+    8.45%     0.00%  ffmpeg   ffmpeg                      [.] image_copy_16_to_8 (inlined)                                                                                                                
+    8.38%     0.45%  ffmpeg   ffmpeg                      [.] av1_cdef_frame                                                                                                                              
+    7.53%     0.00%  ffmpeg   ffmpeg                      [.] inverse_transform_block (inlined)                                                                                                           
+    7.42%     0.00%  ffmpeg   ffmpeg                      [.] decode_reconstruct_tx (inlined)                                                                                                             
+    7.38%     0.02%  ffmpeg   ffmpeg                      [.] av1_inverse_transform_block                                                                                                                 
+    6.83%     0.00%  ffmpeg   ffmpeg                      [.] decode_mbmi_block (inlined)                                                                                                                 
+    6.57%     0.58%  ffmpeg   ffmpeg                      [.] cdef_filter_fb                                                                                                                              
+    6.18%     0.26%  ffmpeg   ffmpeg                      [.] av1_read_mode_info                                                                                                                          
+    5.77%     0.00%  ffmpeg   ffmpeg                      [.] set_lpf_parameters (inlined)                                                                                                                
+    5.70%     0.00%  ffmpeg   ffmpeg                      [.] read_inter_frame_mode_info (inlined)                                                                                                        
+    5.24%     0.83%  ffmpeg   ffmpeg                      [.] read_inter_block_mode_info
Last edited 6 years ago by Tristan Matthews
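
The two listings are easier to compare once they are parsed. Below is a minimal sketch that reads lines in the Children/Self layout pasted above and sorts them by self cost. The regular expression is an assumption based only on these excerpts, not on a documented perf format, and the sample lines are copied from this comment.

```python
import re

# Layout seen in the excerpts above:
#   [+] Children%  Self%  Command  Shared Object  [type] Symbol
# The command may contain single spaces ("aom tile worker"), so the
# command/object boundary is taken to be a run of two or more spaces.
PERF_LINE = re.compile(
    r"^\+?\s*(?P<children>[\d.]+)%\s+(?P<self>[\d.]+)%\s+"
    r"(?P<command>.+?)\s{2,}(?P<object>\S+)\s+"
    r"\[(?P<kind>.)\]\s+(?P<symbol>.+?)\s*$"
)


def parse_perf_report(text):
    """Return one dict per line that matches the layout; skip the rest."""
    rows = []
    for line in text.splitlines():
        match = PERF_LINE.match(line.strip())
        if match:
            row = match.groupdict()
            row["children"] = float(row["children"])
            row["self"] = float(row["self"])
            rows.append(row)
    return rows


def top_self(rows, count=3):
    """Rows with the highest self cost first."""
    return sorted(rows, key=lambda row: row["self"], reverse=True)[:count]


AOMDEC_SAMPLE = """\
+   13.42%    13.38%  aom tile worker  aomdec               [.] av1_highbd_jnt_convolve_2d_avx2
+    5.89%     5.87%  aom tile worker  libc-2.27.so         [.] __memmove_avx_unaligned_erms
+    4.11%     4.08%  aomdec           aomdec               [.] aom_img_downshift
"""

FFMPEG_SAMPLE = """\
+   49.85%     8.42%  ffmpeg   ffmpeg                      [.] aom_decode
+   13.60%    13.56%  ffmpeg   ffmpeg                      [.] av1_highbd_jnt_convolve_2d_avx2
+    8.45%     0.00%  ffmpeg   ffmpeg                      [.] image_copy_16_to_8 (inlined)
"""

for name, sample in (("aomdec", AOMDEC_SAMPLE), ("ffmpeg", FFMPEG_SAMPLE)):
    for row in top_self(parse_perf_report(sample)):
        print(f"{name:7s} {row['self']:6.2f}%  {row['symbol']}")
```

In the ffmpeg listing, `aom_decode` has 8.42% self time and the inlined `image_copy_16_to_8` shows 8.45% under it, which looks like time spent on the wrapper side rather than inside libaom. Both listings are truncated, so a symbol missing from one of them does not prove it was never called.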

comment:6 by Hendrik, 6 years ago

row-mt is enabled by default in recent libaom versions. I don't think the patch is needed.

comment:7 by Tristan Matthews, 6 years ago

Even with row-mt enabled, ffmpeg is roughly 13% slower here.

comment:8 by Balling, 3 years ago

Is this still the case? Please retest.

comment:9 by veikk0, 3 years ago

I just tested this with the latest git (3.1.2-702-g3adb660d for aom). libaom-av1 via FFmpeg seems to be about 20% slower than aomdec by itself.

Also, -DCONFIG_LOWBITDEPTH=1 is no longer a valid build flag, and a message at the start of the build says to use -DFORCE_HIGHBITDEPTH_DECODING=0 instead. However, it's unclear whether this actually does anything, since performance seems identical with and without this option.

My test on a system with 4 logical cores:

$ youtube-dl -f 399 -o av1.mp4 umyglbDr4IE
$ ffmpeg -i av1.mp4 -c:v copy -frames:v 1800 av1.ivf
$ /usr/bin/time -f "\ntime\t%E\nCPU\t%P\nRAM\t%Mk" ffmpeg -c:v libaom-av1 -threads 4 -i av1.ivf -nostats -f null /dev/null
$ /usr/bin/time -f "\ntime\t%E\nCPU\t%P\nRAM\t%Mk" aomdec -o /dev/null --threads=4 av1.ivf

dav1d is the default AV1 decoder nowadays in builds that include it, so libaom-av1 has to be selected explicitly as the decoder.

FFmpeg libaom-av1 performance:

ffmpeg

time	0:33.81
CPU	282%
RAM	151252k

aomdec performance:

time	0:27.08
CPU	283%
RAM	68280k
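
The two result blocks can be turned into ratios with a few lines. This sketch parses the output of the `/usr/bin/time -f` format string used above, where `%E` is elapsed time as [hours:]minutes:seconds and `%M` is the maximum resident set size in kilobytes. The function names are made up for illustration.

```python
def parse_elapsed(text: str) -> float:
    """Convert a %E value such as '0:33.81' or '1:02:03' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_report(report: str) -> dict:
    """Parse the three 'label<TAB>value' lines produced by the format string."""
    fields = dict(line.split() for line in report.strip().splitlines())
    return {
        "seconds": parse_elapsed(fields["time"]),
        "cpu_percent": float(fields["CPU"].rstrip("%")),
        "max_rss_kb": int(fields["RAM"].rstrip("k")),
    }


# Values copied from the two result blocks above.
ffmpeg = parse_time_report("time\t0:33.81\nCPU\t282%\nRAM\t151252k")
aomdec = parse_time_report("time\t0:27.08\nCPU\t283%\nRAM\t68280k")

extra_time = (ffmpeg["seconds"] / aomdec["seconds"] - 1) * 100
lower_rate = (1 - aomdec["seconds"] / ffmpeg["seconds"]) * 100
rss_ratio = ffmpeg["max_rss_kb"] / aomdec["max_rss_kb"]

print(f"ffmpeg takes {extra_time:.1f}% more wall time")
print(f"ffmpeg decodes at a {lower_rate:.1f}% lower rate")
print(f"ffmpeg peak RSS is {rss_ratio:.2f}x that of aomdec")
```

The "about 20% slower" figure matches the decode-rate reading. Expressed as extra wall time the same numbers come to roughly 25%.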

The results stayed the same across multiple runs with very little variance.
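
For anyone retesting, repeated runs can be scripted. This is a minimal sketch, not the procedure used for the numbers above: it times each command with a wall clock from Python, discards the tool output, and reports median and spread. The two command lines are the ones from this comment; the function names and the run count of 5 are arbitrary choices.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run `cmd` `runs` times and return the wall-clock seconds of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,  # stop if the decoder fails
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Median and max-min spread of a list of timings."""
    return {
        "median": statistics.median(samples),
        "spread": max(samples) - min(samples),
    }


# Command lines as used in this comment (expects av1.ivf in the current directory).
FFMPEG_CMD = ["ffmpeg", "-c:v", "libaom-av1", "-threads", "4",
              "-i", "av1.ivf", "-nostats", "-f", "null", "/dev/null"]
AOMDEC_CMD = ["aomdec", "-o", "/dev/null", "--threads=4", "av1.ivf"]


def compare(runs=5):
    """Time both decoders and print median, spread, and the median ratio."""
    results = {
        "ffmpeg": summarize(time_command(FFMPEG_CMD, runs)),
        "aomdec": summarize(time_command(AOMDEC_CMD, runs)),
    }
    for name, stats in results.items():
        print(f"{name}: median {stats['median']:.2f}s, spread {stats['spread']:.2f}s")
    ratio = results["ffmpeg"]["median"] / results["aomdec"]["median"]
    print(f"ffmpeg/aomdec median ratio: {ratio:.2f}")
    return results
```

Call `compare()` from a directory that contains `av1.ivf`. Unlike `/usr/bin/time`, this does not capture CPU utilisation or peak memory.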
