Opened 10 years ago
Closed 2 years ago
#3921 closed enhancement (fixed)
Faster RDFT
Reported by: | Clément Bœsch | Owned by: | |
---|---|---|---|
Priority: | wish | Component: | avcodec |
Version: | git-master | Keywords: | fft rdft |
Cc: | alexander@kojevnikov.com | Blocked By: | |
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description
Our RDFT code is too slow, at least on x86. See http://kojevnikov.com/faster-fast-fourier-transform.html
We probably want to add a ff_rdft_calc_sse/avx
(and/or maybe consider another algorithm?).
Change History (6)
comment:1 by , 10 years ago
Priority: | normal → wish |
---|
comment:2 by , 10 years ago
Cc: | added |
---|
follow-up: 5 comment:3 by , 10 years ago
comment:4 by , 10 years ago
The blog post references https://github.com/alexkay/fft-bench which works with a few adjustments.
The benchmark was using an i7 920 (no AVX). On an AVX capable CPU (i7-3520M) I obtained this:
lavc/i 9 56 fftw/i 9 53 fftw/o 9 51 djbf/i 9 88 lavc/i 10 59 fftw/i 10 52 fftw/o 10 52 djbf/i 10 98 lavc/i 11 60 fftw/i 11 50 fftw/o 11 53 djbf/i 11 106 lavc/i 12 66 fftw/i 12 71 fftw/o 12 56 djbf/i 12 115 lavc/i 13 81 fftw/i 13 71 fftw/o 13 65 djbf/i 13 126
I started having a look but no ASM is yet written, only the C boilerplate to enable the potential optimized code.
comment:5 by , 10 years ago
Replying to kurosu:
Would ne nice to have a (short) sample program or a FFmpeg command line, ie anything needed to just measure the test case. I haven't looked but isn't the reason for this slowness the use of atypical lengths. IIRC audio codecs may have specifical short windows (max 2048 samples?).
It tests for nbits from 9 to 13, so from 512 to 8192. It sounds pretty appropriate.
Note: we use the RDFT code in area similar to what spek does; see showspectrum filter.
Would ne nice to have a (short) sample program or a FFmpeg command line, ie anything needed to just measure the test case. I haven't looked but isn't the reason for this slowness the use of atypical lengths. IIRC audio codecs may have specifical short windows (max 2048 samples?).