TODO before FFTW-$2\pi$:

* figure out how to autodetect NEON at runtime

* figure out the arm cycle counter business

* Wisdom: make it clear that it is specific to the exact fftw version
  and configuration.  Report error codes when reading wisdom.  Maybe
  have multiple system wisdom files, one per version?

* DCT/DST codelets?  which kinds?

* investigate the addition-chain trig computation

* I can't believe that there isn't a closed form for the omega
  array in Rader.

* convolution problem type(s)

* Explore the idea of having n < 0 in tensors, possibly to mean
  inverse DFT.

* better estimator: possibly, let "other" cost be coef * n, where
  coef is a per-solver constant determined via some big numerical
  optimization/fit.

* vector radix, multidimensional codelets

* it may be a good idea to unify all those little loops that do
  copying, (X[i], X[n-i]) <- (X[i] + X[n-i], X[i] - X[n-i]),
  and multiplication of vectors by twiddle factors.

* Pruned FFTs (basically, a vecloop that skips zeros).

* Try FFTPACK-style back-and-forth (Stockham) FFT.  (We tried this a
  few years ago and it was slower, but perhaps matters have changed.)

* Generate assembly directly for more processors, or maybe fork gcc.  =)

* ensure that threaded solvers generate (block_size % 4 == 0)
  to allow SIMD to be used.

* memoize triggen.
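
Sketch for two of the items above (the sum/difference loop and the
block_size constraint).  This is illustrative Python, not FFTW code: the
names sum_diff_inplace and simd_block_size are hypothetical, and leaving
X[0] and the self-paired middle element untouched is an assumption, since
the note does not say how they should be handled.

```python
def sum_diff_inplace(x):
    """Apply (X[i], X[n-i]) <- (X[i] + X[n-i], X[i] - X[n-i]) in place.

    Assumption: only pairs with 0 < i < n - i are touched, so X[0] and,
    for even n, X[n/2] are left as they are.
    """
    n = len(x)
    for i in range(1, (n + 1) // 2):
        a, b = x[i], x[n - i]
        x[i] = a + b
        x[n - i] = a - b
    return x


def simd_block_size(n, nthreads, multiple=4):
    """Per-thread block size, rounded up so that block_size % multiple == 0.

    Hypothetical helper: split n iterations over nthreads, then round the
    block up to the next multiple (4 by default, as in the note above).
    """
    block = -(-n // nthreads)                # ceil(n / nthreads)
    return -(-block // multiple) * multiple  # round up to a multiple


if __name__ == "__main__":
    print(sum_diff_inplace(list(range(8))))
    print(simd_block_size(100, 3))
```

A single helper of the first kind is what would replace the scattered
copies of that loop; the second only shows the rounding needed so that
every block except possibly the last starts on a multiple of 4.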