On the Pentium and subsequent x86 processors, there is a substantial performance penalty if double-precision variables are not stored 8-byte aligned; a factor of two or more is not unusual. Unfortunately, the stack (where local variables and subroutine arguments live) is not guaranteed by the Intel ABI to be 8-byte aligned.

Recent versions of gcc (as well as most other compilers, we are told, such as Intel's, Metrowerks', and Microsoft's) are able to keep the stack 8-byte aligned; gcc does this by default (see -mpreferred-stack-boundary in the gcc documentation). If you are not certain whether your compiler maintains stack alignment by default, it is a good idea to make sure.

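One rough way to "make sure" is to look at the address of a local double at run time. The following is a minimal sketch, not part of FFTW; the helper names is_aligned and local_double_misalignment are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if p is a multiple of `alignment` bytes.
   `alignment` must be a nonzero power of two. */
static int is_aligned(const volatile void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}

/* Hypothetical helper: return the address of a local double modulo 8.
   A result of 0 means this particular local was 8-byte aligned. */
static unsigned local_double_misalignment(void)
{
    /* volatile discourages the compiler from keeping x only in a register */
    volatile double x = 0.0;
    return (unsigned)((uintptr_t)(const volatile void *)&x & 7u);
}
```

Calling local_double_misalignment() from several call depths and printing the result gives a quick indication only: a nonzero value shows that the local was not 8-byte aligned in that frame, while a zero value does not prove that every frame is aligned.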
Unfortunately, gcc only preserves an existing stack alignment; it does not establish one. As a result, if the stack starts off misaligned, it will always be misaligned, with a disastrous effect on performance (in double precision). To prevent this, FFTW includes hacks to align its own stack if necessary, so it should perform well even if you call it from a program with a misaligned stack. Currently, our hacks support gcc and the Intel C compiler; if you use another compiler you are on your own. Fortunately, recent versions of glibc (on GNU/Linux) provide a properly-aligned starting stack, but this was not the case with a number of older versions, and we are not certain of the situation on other operating systems. Hopefully, as time goes by this will become less of a concern.
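
For code of your own that may be entered with a misaligned stack, GCC-compatible compilers on x86 offer the force_align_arg_pointer function attribute, which asks the compiler to realign the stack in that function's prologue if necessary. The sketch below illustrates that attribute only; it is not the mechanism FFTW uses internally, and the macro REALIGN_STACK and the function sum_of_squares are hypothetical names.

```c
#include <assert.h>

/* Assumption: a GCC-compatible compiler targeting x86.
   On other compilers or targets the macro expands to nothing. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#  define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#  define REALIGN_STACK
#endif

/* Hypothetical entry point that callers with a possibly misaligned
   stack may invoke; locals in this frame and in the functions it
   calls then start from a realigned stack pointer. */
REALIGN_STACK
double sum_of_squares(const double *in, int n)
{
    double acc = 0.0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; ++i)
        acc += in[i] * in[i];
    return acc;
}
```

The attribute only needs to be placed on the outermost functions that foreign code calls; everything below them then benefits from the compiler's usual alignment-preserving behavior.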