Opened 3 years ago

Last modified 3 years ago

#3921 new enhancement

Faster RDFT

Reported by: ubitux Owned by:
Priority: wish Component: avcodec
Version: git-master Keywords: fft rdft
Cc: alexander@kojevnikov.com Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Our RDFT code is too slow, at least on x86. See http://kojevnikov.com/faster-fast-fourier-transform.html

We probably want to add a ff_rdft_calc_sse/avx (and/or maybe consider another algorithm?).

Change History (5)

comment:1 Changed 3 years ago by cehoyos

  • Priority changed from normal to wish

comment:2 Changed 3 years ago by alexk

  • Cc alexander@kojevnikov.com added

comment:3 follow-up: Changed 3 years ago by kurosu

Would ne nice to have a (short) sample program or a FFmpeg command line, ie anything needed to just measure the test case. I haven't looked but isn't the reason for this slowness the use of atypical lengths. IIRC audio codecs may have specifical short windows (max 2048 samples?).

comment:4 Changed 3 years ago by ubitux

The blog post references https://github.com/alexkay/fft-bench which works with a few adjustments.

The benchmark was using an i7 920 (no AVX). On an AVX capable CPU (i7-3520M) I obtained this:

lavc/i  9       56
fftw/i  9       53
fftw/o  9       51
djbf/i  9       88
lavc/i  10      59
fftw/i  10      52
fftw/o  10      52
djbf/i  10      98
lavc/i  11      60
fftw/i  11      50
fftw/o  11      53
djbf/i  11      106
lavc/i  12      66
fftw/i  12      71
fftw/o  12      56
djbf/i  12      115
lavc/i  13      81
fftw/i  13      71
fftw/o  13      65
djbf/i  13      126

I started having a look but no ASM is yet written, only the C boilerplate to enable the potential optimized code.

comment:5 in reply to: ↑ 3 Changed 3 years ago by ubitux

Replying to kurosu:

Would ne nice to have a (short) sample program or a FFmpeg command line, ie anything needed to just measure the test case. I haven't looked but isn't the reason for this slowness the use of atypical lengths. IIRC audio codecs may have specifical short windows (max 2048 samples?).

It tests for nbits from 9 to 13, so from 512 to 8192. It sounds pretty appropriate.

Note: we use the RDFT code in area similar to what spek does; see showspectrum filter.

Note: See TracTickets for help on using tickets.