Opened 7 years ago
Last modified 7 years ago
#3921 new enhancement
Faster RDFT
| Reported by: | ubitux | Owned by: | |
|---|---|---|---|
| Priority: | wish | Component: | avcodec |
| Version: | git-master | Keywords: | fft rdft |
| Cc: | alexander@kojevnikov.com | Blocked By: | |
| Blocking: | | Reproduced by developer: | no |
| Analyzed by developer: | no | | |
Description
Our RDFT code is too slow, at least on x86. See http://kojevnikov.com/faster-fast-fourier-transform.html
We probably want to add a ff_rdft_calc_sse/avx (and/or maybe consider another algorithm?).
Change History (5)
comment:1 Changed 7 years ago by cehoyos
- Priority changed from normal to wish
comment:2 Changed 7 years ago by alexk
- Cc alexander@kojevnikov.com added
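comment:3 follow-up: ↓ 5 Changed 7 years ago by kurosu
Would be nice to have a (short) sample program or an FFmpeg command line, i.e. anything needed to just measure the test case. I haven't looked, but isn't the reason for this slowness the use of atypical lengths? IIRC audio codecs may have specific short windows (max 2048 samples?).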
comment:4 Changed 7 years ago by ubitux
The blog post references https://github.com/alexkay/fft-bench which works with a few adjustments.
The benchmark in the blog post was run on an i7 920 (no AVX). On an AVX-capable CPU (i7-3520M) I obtained this:
| nbits | lavc/i | fftw/i | fftw/o | djbf/i |
|---|---|---|---|---|
| 9 | 56 | 53 | 51 | 88 |
| 10 | 59 | 52 | 52 | 98 |
| 11 | 60 | 50 | 53 | 106 |
| 12 | 66 | 71 | 56 | 115 |
| 13 | 81 | 71 | 65 | 126 |
I started having a look, but no ASM has been written yet, only the C boilerplate needed to hook in the potential optimized code.
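For anyone wanting to compare the implementations, the figures from comment:4 can be grouped per nbits and expressed relative to one baseline. This is only a sketch: the helper names `parse` and `ratio_to` are made up for illustration, and since the comment does not state the unit of the third column, the code computes ratios without interpreting them.

```python
# Figures copied verbatim from comment:4 (implementation, nbits, value).
# The unit of the value column is not stated in the ticket.
RAW = """
lavc/i 9 56 fftw/i 9 53 fftw/o 9 51 djbf/i 9 88
lavc/i 10 59 fftw/i 10 52 fftw/o 10 52 djbf/i 10 98
lavc/i 11 60 fftw/i 11 50 fftw/o 11 53 djbf/i 11 106
lavc/i 12 66 fftw/i 12 71 fftw/o 12 56 djbf/i 12 115
lavc/i 13 81 fftw/i 13 71 fftw/o 13 65 djbf/i 13 126
"""


def parse(raw):
    """Turn the whitespace-separated triples into {nbits: {impl: value}}."""
    tokens = raw.split()
    rows = {}
    for i in range(0, len(tokens), 3):
        impl = tokens[i]
        nbits = int(tokens[i + 1])
        value = int(tokens[i + 2])
        rows.setdefault(nbits, {})[impl] = value
    return rows


def ratio_to(rows, impl, baseline):
    """Ratio of `impl` to `baseline` for each nbits (hypothetical helper)."""
    return {nbits: r[impl] / r[baseline] for nbits, r in sorted(rows.items())}


if __name__ == "__main__":
    rows = parse(RAW)
    for nbits, ratio in ratio_to(rows, "lavc/i", "fftw/o").items():
        print(f"nbits={nbits} lavc/i : fftw/o = {ratio:.2f}")
```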
comment:5 in reply to: ↑ 3 Changed 7 years ago by ubitux
Replying to kurosu:
Would be nice to have a (short) sample program or an FFmpeg command line, i.e. anything needed to just measure the test case. I haven't looked, but isn't the reason for this slowness the use of atypical lengths? IIRC audio codecs may have specific short windows (max 2048 samples?).
It tests for nbits from 9 to 13, so from 512 to 8192. It sounds pretty appropriate.
Note: we use the RDFT code in an area similar to what spek does; see the showspectrum filter.
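To make the nbits range concrete: a transform of nbits covers 1 << nbits samples, so 9 to 13 gives 512 to 8192, and the 2048-sample window mentioned in the quoted comment is nbits = 11, inside that range. Below is a minimal sketch of a timing loop over the same sizes. It is not fft-bench and does not call libavcodec; `fft`, `rdft` and `bench` are illustrative names and the transform is a plain textbook radix-2 FFT in pure Python, so the timings say nothing about lavc itself.

```python
import cmath
import time


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def rdft(samples):
    """Real-input DFT: keep only the n/2 + 1 non-redundant bins."""
    n = len(samples)
    return fft([complex(s) for s in samples])[: n // 2 + 1]


def bench(nbits, repeats=3):
    """Return (transform size, best wall-clock seconds over `repeats` runs)."""
    n = 1 << nbits
    data = [float((i * 7919) % 251) for i in range(n)]  # arbitrary test signal
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        rdft(data)
        best = min(best, time.perf_counter() - start)
    return n, best


if __name__ == "__main__":
    for nbits in range(9, 14):  # same range as the benchmark: 512 to 8192
        n, seconds = bench(nbits)
        print(f"nbits={nbits} n={n} best={seconds * 1e6:.0f} us")
```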


