Opened 7 years ago

Last modified 7 years ago

#3921 new enhancement

Faster RDFT

Reported by: Clément Bœsch Owned by:
Priority: wish Component: avcodec
Version: git-master Keywords: fft rdft
Cc: alexander@kojevnikov.com Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Our RDFT code is too slow, at least on x86. See http://kojevnikov.com/faster-fast-fourier-transform.html

We probably want to add a ff_rdft_calc_sse/avx (and/or maybe consider another algorithm?).

Change History (5)

comment:1 by Carl Eugen Hoyos, 7 years ago

Priority: normalwish

comment:2 by Alexander Kojevnikov, 7 years ago

Cc: alexander@kojevnikov.com added

comment:3 by Christophe, 7 years ago

Would ne nice to have a (short) sample program or a FFmpeg command line, ie anything needed to just measure the test case. I haven't looked but isn't the reason for this slowness the use of atypical lengths. IIRC audio codecs may have specifical short windows (max 2048 samples?).

comment:4 by Clément Bœsch, 7 years ago

The blog post references https://github.com/alexkay/fft-bench which works with a few adjustments.

The benchmark was using an i7 920 (no AVX). On an AVX capable CPU (i7-3520M) I obtained this:

lavc/i  9       56
fftw/i  9       53
fftw/o  9       51
djbf/i  9       88
lavc/i  10      59
fftw/i  10      52
fftw/o  10      52
djbf/i  10      98
lavc/i  11      60
fftw/i  11      50
fftw/o  11      53
djbf/i  11      106
lavc/i  12      66
fftw/i  12      71
fftw/o  12      56
djbf/i  12      115
lavc/i  13      81
fftw/i  13      71
fftw/o  13      65
djbf/i  13      126

I started having a look but no ASM is yet written, only the C boilerplate to enable the potential optimized code.

in reply to:  3 comment:5 by Clément Bœsch, 7 years ago

Replying to kurosu:

Would ne nice to have a (short) sample program or a FFmpeg command line, ie anything needed to just measure the test case. I haven't looked but isn't the reason for this slowness the use of atypical lengths. IIRC audio codecs may have specifical short windows (max 2048 samples?).

It tests for nbits from 9 to 13, so from 512 to 8192. It sounds pretty appropriate.

Note: we use the RDFT code in area similar to what spek does; see showspectrum filter.

Note: See TracTickets for help on using tickets.