sv-dependency-builds: src/fftw-3.3.3/doc/other.texi annotate

annotate src/fftw-3.3.3/doc/other.texi @ 54:5f67a29f0fc7

Rebuild MAD with 64-bit FPM

author	Chris Cannam <cannam@all-day-breakfast.com>
date	Wed, 30 Nov 2016 20:59:17 +0000
parents	37bf6b4a2645
children

rev	line source
Chris@10	1 @node Other Important Topics, FFTW Reference, Tutorial, Top
Chris@10	2 @chapter Other Important Topics
Chris@10	3 @menu
Chris@10	4 * SIMD alignment and fftw_malloc::
Chris@10	5 * Multi-dimensional Array Format::
Chris@10	6 * Words of Wisdom-Saving Plans::
Chris@10	7 * Caveats in Using Wisdom::
Chris@10	8 @end menu
Chris@10	9
Chris@10	10 @c ------------------------------------------------------------
Chris@10	11 @node SIMD alignment and fftw_malloc, Multi-dimensional Array Format, Other Important Topics, Other Important Topics
Chris@10	12 @section SIMD alignment and fftw_malloc
Chris@10	13
Chris@10	14 SIMD, which stands for ``Single Instruction Multiple Data,'' is a set of
Chris@10	15 special operations supported by some processors to perform a single
Chris@10	16 operation on several numbers (usually 2 or 4) simultaneously. SIMD
Chris@10	17 floating-point instructions are available on several popular CPUs:
Chris@10	18 SSE/SSE2/AVX on recent x86/x86-64 processors, AltiVec (single precision)
Chris@10	19 on some PowerPCs (Apple G4 and higher), NEON on some ARM models, and MIPS Paired Single
Chris@10	20 (currently only in FFTW 3.2.x). FFTW can be compiled to support the
Chris@10	21 SIMD instructions on any of these systems.
Chris@10	22 @cindex SIMD
Chris@10	23 @cindex SSE
Chris@10	24 @cindex SSE2
Chris@10	25 @cindex AVX
Chris@10	26 @cindex AltiVec
Chris@10	27 @cindex MIPS PS
Chris@10	28 @cindex precision
Chris@10	29
Chris@10	30
Chris@10	31 A program linking to an FFTW library compiled with SIMD support can
Chris@10	32 obtain a nonnegligible speedup for most complex and r2c/c2r
Chris@10	33 transforms. In order to obtain this speedup, however, the arrays of
Chris@10	34 complex (or real) data passed to FFTW must be specially aligned in
Chris@10	35 memory (typically 16-byte aligned), and often this alignment is more
Chris@10	36 stringent than that provided by the usual @code{malloc} (etc.)
Chris@10	37 allocation routines.
Chris@10	38
Chris@10	39 @cindex portability
Chris@10	40 In order to guarantee proper alignment for SIMD, therefore, in case
Chris@10	41 your program is ever linked against a SIMD-using FFTW, we recommend
Chris@10	42 allocating your transform data with @code{fftw_malloc} and
Chris@10	43 de-allocating it with @code{fftw_free}.
Chris@10	44 @findex fftw_malloc
Chris@10	45 @findex fftw_free
Chris@10	46 These have exactly the same interface and behavior as
Chris@10	47 @code{malloc}/@code{free}, except that for a SIMD FFTW they ensure
Chris@10	48 that the returned pointer has the necessary alignment (by calling
Chris@10	49 @code{memalign} or its equivalent on your OS).
Chris@10	50
Chris@10	51 You are not @emph{required} to use @code{fftw_malloc}. You can
Chris@10	52 allocate your data in any way that you like, from @code{malloc} to
Chris@10	53 @code{new} (in C++) to a fixed-size array declaration. If the array
Chris@10	54 happens not to be properly aligned, FFTW will not use the SIMD
Chris@10	55 extensions.
Chris@10	56 @cindex C++
Chris@10	57
Chris@10	58 @findex fftw_alloc_real
Chris@10	59 @findex fftw_alloc_complex
Chris@10	60 Since @code{fftw_malloc} only ever needs to be used for real and
Chris@10	61 complex arrays, we provide two convenient wrapper routines
Chris@10	62 @code{fftw_alloc_real(N)} and @code{fftw_alloc_complex(N)} that are
Chris@10	63 equivalent to @code{(double)fftw_malloc(sizeof(double) N)} and
Chris@10	64 @code{(fftw_complex)fftw_malloc(sizeof(fftw_complex) N)},
Chris@10	65 respectively (or their equivalents in other precisions).
Chris@10	66
Chris@10	67 @c ------------------------------------------------------------
Chris@10	68 @node Multi-dimensional Array Format, Words of Wisdom-Saving Plans, SIMD alignment and fftw_malloc, Other Important Topics
Chris@10	69 @section Multi-dimensional Array Format
Chris@10	70
Chris@10	71 This section describes the format in which multi-dimensional arrays
Chris@10	72 are stored in FFTW. We felt that a detailed discussion of this topic
Chris@10	73 was necessary. Since several different formats are common, this topic
Chris@10	74 is often a source of confusion.
Chris@10	75
Chris@10	76 @menu
Chris@10	77 * Row-major Format::
Chris@10	78 * Column-major Format::
Chris@10	79 * Fixed-size Arrays in C::
Chris@10	80 * Dynamic Arrays in C::
Chris@10	81 * Dynamic Arrays in C-The Wrong Way::
Chris@10	82 @end menu
Chris@10	83
Chris@10	84 @c =========>
Chris@10	85 @node Row-major Format, Column-major Format, Multi-dimensional Array Format, Multi-dimensional Array Format
Chris@10	86 @subsection Row-major Format
Chris@10	87 @cindex row-major
Chris@10	88
Chris@10	89 The multi-dimensional arrays passed to @code{fftw_plan_dft} etcetera
Chris@10	90 are expected to be stored as a single contiguous block in
Chris@10	91 @dfn{row-major} order (sometimes called ``C order''). Basically, this
Chris@10	92 means that as you step through adjacent memory locations, the first
Chris@10	93 dimension's index varies most slowly and the last dimension's index
Chris@10	94 varies most quickly.
Chris@10	95
Chris@10	96 To be more explicit, let us consider an array of rank @math{d} whose
Chris@10	97 dimensions are @ndims{}. Now, we specify a location in the array by a
Chris@10	98 sequence of @math{d} (zero-based) indices, one for each dimension:
Chris@10	99 @tex
Chris@10	100 $(i_0, i_1, i_2, \ldots, i_{d-1})$.
Chris@10	101 @end tex
Chris@10	102 @ifinfo
Chris@10	103 (i[0], i[1], ..., i[d-1]).
Chris@10	104 @end ifinfo
Chris@10	105 @html
Chris@10	106 (i<sub>0</sub>, i<sub>1</sub>, i<sub>2</sub>,..., i<sub>d-1</sub>).
Chris@10	107 @end html
Chris@10	108 If the array is stored in row-major
Chris@10	109 order, then this element is located at the position
Chris@10	110 @tex
Chris@10	111 $i_{d-1} + n_{d-1} (i_{d-2} + n_{d-2} (\ldots + n_1 i_0))$.
Chris@10	112 @end tex
Chris@10	113 @ifinfo
Chris@10	114 i[d-1] + n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])).
Chris@10	115 @end ifinfo
Chris@10	116 @html
Chris@10	117 i<sub>d-1</sub> + n<sub>d-1</sub> * (i<sub>d-2</sub> + n<sub>d-2</sub> * (... + n<sub>1</sub> * i<sub>0</sub>)).
Chris@10	118 @end html
Chris@10	119
Chris@10	120 Note that, for the ordinary complex DFT, each element of the array
Chris@10	121 must be of type @code{fftw_complex}; i.e. a (real, imaginary) pair of
Chris@10	122 (double-precision) numbers.
Chris@10	123
Chris@10	124 In the advanced FFTW interface, the physical dimensions @math{n} from
Chris@10	125 which the indices are computed can be different from (larger than)
Chris@10	126 the logical dimensions of the transform to be computed, in order to
Chris@10	127 transform a subset of a larger array.
Chris@10	128 @cindex advanced interface
Chris@10	129 Note also that, in the advanced interface, the expression above is
Chris@10	130 multiplied by a @dfn{stride} to get the actual array index---this is
Chris@10	131 useful in situations where each element of the multi-dimensional array
Chris@10	132 is actually a data structure (or another array), and you just want to
Chris@10	133 transform a single field. In the basic interface, however, the stride
Chris@10	134 is 1.
Chris@10	135 @cindex stride
Chris@10	136
Chris@10	137 @c =========>
Chris@10	138 @node Column-major Format, Fixed-size Arrays in C, Row-major Format, Multi-dimensional Array Format
Chris@10	139 @subsection Column-major Format
Chris@10	140 @cindex column-major
Chris@10	141
Chris@10	142 Readers from the Fortran world are used to arrays stored in
Chris@10	143 @dfn{column-major} order (sometimes called ``Fortran order''). This is
Chris@10	144 essentially the exact opposite of row-major order in that, here, the
Chris@10	145 @emph{first} dimension's index varies most quickly.
Chris@10	146
Chris@10	147 If you have an array stored in column-major order and wish to
Chris@10	148 transform it using FFTW, it is quite easy to do. When creating the
Chris@10	149 plan, simply pass the dimensions of the array to the planner in
Chris@10	150 @emph{reverse order}. For example, if your array is a rank three
Chris@10	151 @code{N x M x L} matrix in column-major order, you should pass the
Chris@10	152 dimensions of the array as if it were an @code{L x M x N} matrix
Chris@10	153 (which it is, from the perspective of FFTW). This is done for you
Chris@10	154 @emph{automatically} by the FFTW legacy-Fortran interface
Chris@10	155 (@pxref{Calling FFTW from Legacy Fortran}), but you must do it
Chris@10	156 manually with the modern Fortran interface (@pxref{Reversing array
Chris@10	157 dimensions}).
Chris@10	158 @cindex Fortran interface
Chris@10	159
Chris@10	160 @c =========>
Chris@10	161 @node Fixed-size Arrays in C, Dynamic Arrays in C, Column-major Format, Multi-dimensional Array Format
Chris@10	162 @subsection Fixed-size Arrays in C
Chris@10	163 @cindex C multi-dimensional arrays
Chris@10	164
Chris@10	165 A multi-dimensional array whose size is declared at compile time in C
Chris@10	166 is @emph{already} in row-major order. You don't have to do anything
Chris@10	167 special to transform it. For example:
Chris@10	168
Chris@10	169 @example
Chris@10	170 @{
Chris@10	171 fftw_complex data[N0][N1][N2];
Chris@10	172 fftw_plan plan;
Chris@10	173 ...
Chris@10	174 plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0],
Chris@10	175 FFTW_FORWARD, FFTW_ESTIMATE);
Chris@10	176 ...
Chris@10	177 @}
Chris@10	178 @end example
Chris@10	179
Chris@10	180 This will plan a 3d in-place transform of size @code{N0 x N1 x N2}.
Chris@10	181 Notice how we took the address of the zero-th element to pass to the
Chris@10	182 planner (we could also have used a typecast).
Chris@10	183
Chris@10	184 However, we tend to @emph{discourage} users from declaring their
Chris@10	185 arrays in this way, for two reasons. First, this allocates the array
Chris@10	186 on the stack (``automatic'' storage), which has a very limited size on
Chris@10	187 most operating systems (declaring an array with more than a few
Chris@10	188 thousand elements will often cause a crash). (You can get around this
Chris@10	189 limitation on many systems by declaring the array as
Chris@10	190 @code{static} and/or global, but that has its own drawbacks.)
Chris@10	191 Second, it may not optimally align the array for use with a SIMD
Chris@10	192 FFTW (@pxref{SIMD alignment and fftw_malloc}). Instead, we recommend
Chris@10	193 using @code{fftw_malloc}, as described below.
Chris@10	194
Chris@10	195 @c =========>
Chris@10	196 @node Dynamic Arrays in C, Dynamic Arrays in C-The Wrong Way, Fixed-size Arrays in C, Multi-dimensional Array Format
Chris@10	197 @subsection Dynamic Arrays in C
Chris@10	198
Chris@10	199 We recommend allocating most arrays dynamically, with
Chris@10	200 @code{fftw_malloc}. This isn't too hard to do, although it is not as
Chris@10	201 straightforward for multi-dimensional arrays as it is for
Chris@10	202 one-dimensional arrays.
Chris@10	203
Chris@10	204 Creating the array is simple: using a dynamic-allocation routine like
Chris@10	205 @code{fftw_malloc}, allocate an array big enough to store N
Chris@10	206 @code{fftw_complex} values (for a complex DFT), where N is the product
Chris@10	207 of the sizes of the array dimensions (i.e. the total number of complex
Chris@10	208 values in the array). For example, here is code to allocate a
Chris@10	209 @threedims{5,12,27} rank-3 array:
Chris@10	210 @findex fftw_malloc
Chris@10	211
Chris@10	212 @example
Chris@10	213 fftw_complex *an_array;
Chris@10	214 an_array = (fftw_complex) fftw_malloc(51227 sizeof(fftw_complex));
Chris@10	215 @end example
Chris@10	216
Chris@10	217 Accessing the array elements, however, is more tricky---you can't
Chris@10	218 simply use multiple applications of the @samp{[]} operator like you
Chris@10	219 could for fixed-size arrays. Instead, you have to explicitly compute
Chris@10	220 the offset into the array using the formula given earlier for
Chris@10	221 row-major arrays. For example, to reference the @math{(i,j,k)}-th
Chris@10	222 element of the array allocated above, you would use the expression
Chris@10	223 @code{an_array[k + 27 * (j + 12 * i)]}.
Chris@10	224
Chris@10	225 This pain can be alleviated somewhat by defining appropriate macros,
Chris@10	226 or, in C++, creating a class and overloading the @samp{()} operator.
Chris@10	227 The recent C99 standard provides a way to reinterpret the dynamic
Chris@10	228 array as a ``variable-length'' multi-dimensional array amenable to
Chris@10	229 @samp{[]}, but this feature is not yet widely supported by compilers.
Chris@10	230 @cindex C99
Chris@10	231 @cindex C++
Chris@10	232
Chris@10	233 @c =========>
Chris@10	234 @node Dynamic Arrays in C-The Wrong Way, , Dynamic Arrays in C, Multi-dimensional Array Format
Chris@10	235 @subsection Dynamic Arrays in C---The Wrong Way
Chris@10	236
Chris@10	237 A different method for allocating multi-dimensional arrays in C is
Chris@10	238 often suggested that is incompatible with FFTW: @emph{using it will
Chris@10	239 cause FFTW to die a painful death}. We discuss the technique here,
Chris@10	240 however, because it is so commonly known and used. This method is to
Chris@10	241 create arrays of pointers of arrays of pointers of @dots{}etcetera.
Chris@10	242 For example, the analogue in this method to the example above is:
Chris@10	243
Chris@10	244 @example
Chris@10	245 int i,j;
Chris@10	246 fftw_complex **a_bad_array; / @r{another way to make a 5x12x27 array} */
Chris@10	247
Chris@10	248 a_bad_array = (fftw_complex **) malloc(5 sizeof(fftw_complex **));
Chris@10	249 for (i = 0; i < 5; ++i) @{
Chris@10	250 a_bad_array[i] =
Chris@10	251 (fftw_complex *) malloc(12 sizeof(fftw_complex *));
Chris@10	252 for (j = 0; j < 12; ++j)
Chris@10	253 a_bad_array[i][j] =
Chris@10	254 (fftw_complex ) malloc(27 sizeof(fftw_complex));
Chris@10	255 @}
Chris@10	256 @end example
Chris@10	257
Chris@10	258 As you can see, this sort of array is inconvenient to allocate (and
Chris@10	259 deallocate). On the other hand, it has the advantage that the
Chris@10	260 @math{(i,j,k)}-th element can be referenced simply by
Chris@10	261 @code{a_bad_array[i][j][k]}.
Chris@10	262
Chris@10	263 If you like this technique and want to maximize convenience in accessing
Chris@10	264 the array, but still want to pass the array to FFTW, you can use a
Chris@10	265 hybrid method. Allocate the array as one contiguous block, but also
Chris@10	266 declare an array of arrays of pointers that point to appropriate places
Chris@10	267 in the block. That sort of trick is beyond the scope of this
Chris@10	268 documentation; for more information on multi-dimensional arrays in C,
Chris@10	269 see the @code{comp.lang.c}
Chris@10	270 @uref{http://c-faq.com/aryptr/dynmuldimary.html, FAQ}.
Chris@10	271
Chris@10	272 @c ------------------------------------------------------------
Chris@10	273 @node Words of Wisdom-Saving Plans, Caveats in Using Wisdom, Multi-dimensional Array Format, Other Important Topics
Chris@10	274 @section Words of Wisdom---Saving Plans
Chris@10	275 @cindex wisdom
Chris@10	276 @cindex saving plans to disk
Chris@10	277
Chris@10	278 FFTW implements a method for saving plans to disk and restoring them.
Chris@10	279 In fact, what FFTW does is more general than just saving and loading
Chris@10	280 plans. The mechanism is called @dfn{wisdom}. Here, we describe
Chris@10	281 this feature at a high level. @xref{FFTW Reference}, for a less casual
Chris@10	282 but more complete discussion of how to use wisdom in FFTW.
Chris@10	283
Chris@10	284 Plans created with the @code{FFTW_MEASURE}, @code{FFTW_PATIENT}, or
Chris@10	285 @code{FFTW_EXHAUSTIVE} options produce near-optimal FFT performance,
Chris@10	286 but may require a long time to compute because FFTW must measure the
Chris@10	287 runtime of many possible plans and select the best one. This setup is
Chris@10	288 designed for the situations where so many transforms of the same size
Chris@10	289 must be computed that the start-up time is irrelevant. For short
Chris@10	290 initialization times, but slower transforms, we have provided
Chris@10	291 @code{FFTW_ESTIMATE}. The @code{wisdom} mechanism is a way to get the
Chris@10	292 best of both worlds: you compute a good plan once, save it to
Chris@10	293 disk, and later reload it as many times as necessary. The wisdom
Chris@10	294 mechanism can actually save and reload many plans at once, not just
Chris@10	295 one.
Chris@10	296 @ctindex FFTW_MEASURE
Chris@10	297 @ctindex FFTW_PATIENT
Chris@10	298 @ctindex FFTW_EXHAUSTIVE
Chris@10	299 @ctindex FFTW_ESTIMATE
Chris@10	300
Chris@10	301
Chris@10	302 Whenever you create a plan, the FFTW planner accumulates wisdom, which
Chris@10	303 is information sufficient to reconstruct the plan. After planning,
Chris@10	304 you can save this information to disk by means of the function:
Chris@10	305 @example
Chris@10	306 int fftw_export_wisdom_to_filename(const char *filename);
Chris@10	307 @end example
Chris@10	308 @findex fftw_export_wisdom_to_filename
Chris@10	309 (This function returns non-zero on success.)
Chris@10	310
Chris@10	311 The next time you run the program, you can restore the wisdom with
Chris@10	312 @code{fftw_import_wisdom_from_filename} (which also returns non-zero on success),
Chris@10	313 and then recreate the plan using the same flags as before.
Chris@10	314 @example
Chris@10	315 int fftw_import_wisdom_from_filename(const char *filename);
Chris@10	316 @end example
Chris@10	317 @findex fftw_import_wisdom_from_filename
Chris@10	318
Chris@10	319 Wisdom is automatically used for any size to which it is applicable, as
Chris@10	320 long as the planner flags are not more ``patient'' than those with which
Chris@10	321 the wisdom was created. For example, wisdom created with
Chris@10	322 @code{FFTW_MEASURE} can be used if you later plan with
Chris@10	323 @code{FFTW_ESTIMATE} or @code{FFTW_MEASURE}, but not with
Chris@10	324 @code{FFTW_PATIENT}.
Chris@10	325
Chris@10	326 The @code{wisdom} is cumulative, and is stored in a global, private
Chris@10	327 data structure managed internally by FFTW. The storage space required
Chris@10	328 is minimal, proportional to the logarithm of the sizes the wisdom was
Chris@10	329 generated from. If memory usage is a concern, however, the wisdom can
Chris@10	330 be forgotten and its associated memory freed by calling:
Chris@10	331 @example
Chris@10	332 void fftw_forget_wisdom(void);
Chris@10	333 @end example
Chris@10	334 @findex fftw_forget_wisdom
Chris@10	335
Chris@10	336 Wisdom can be exported to a file, a string, or any other medium.
Chris@10	337 For details, see @ref{Wisdom}.
Chris@10	338
Chris@10	339 @node Caveats in Using Wisdom, , Words of Wisdom-Saving Plans, Other Important Topics
Chris@10	340 @section Caveats in Using Wisdom
Chris@10	341 @cindex wisdom, problems with
Chris@10	342
Chris@10	343 @quotation
Chris@10	344 @html
Chris@10	345 <i>
Chris@10	346 @end html
Chris@10	347 For in much wisdom is much grief, and he that increaseth knowledge
Chris@10	348 increaseth sorrow.
Chris@10	349 @html
Chris@10	350 </i>
Chris@10	351 @end html
Chris@10	352 [Ecclesiastes 1:18]
Chris@10	353 @cindex Ecclesiastes
Chris@10	354 @end quotation
Chris@10	355 @iftex
Chris@10	356 @medskip
Chris@10	357 @end iftex
Chris@10	358
Chris@10	359 @cindex portability
Chris@10	360 There are pitfalls to using wisdom, in that it can negate FFTW's
Chris@10	361 ability to adapt to changing hardware and other conditions. For
Chris@10	362 example, it would be perfectly possible to export wisdom from a
Chris@10	363 program running on one processor and import it into a program running
Chris@10	364 on another processor. Doing so, however, would mean that the second
Chris@10	365 program would use plans optimized for the first processor, instead of
Chris@10	366 the one it is running on.
Chris@10	367
Chris@10	368 It should be safe to reuse wisdom as long as the hardware and program
Chris@10	369 binaries remain unchanged. (Actually, the optimal plan may change even
Chris@10	370 between runs of the same binary on identical hardware, due to
Chris@10	371 differences in the virtual memory environment, etcetera. Users
Chris@10	372 seriously interested in performance should worry about this problem,
Chris@10	373 too.) It is likely that, if the same wisdom is used for two
Chris@10	374 different program binaries, even running on the same machine, the
Chris@10	375 plans may be sub-optimal because of differing code alignments. It is
Chris@10	376 therefore wise to recreate wisdom every time an application is
Chris@10	377 recompiled. The more the underlying hardware and software changes
Chris@10	378 between the creation of wisdom and its use, the greater grows
Chris@10	379 the risk of sub-optimal plans.
Chris@10	380
Chris@10	381 Nevertheless, if the choice is between using @code{FFTW_ESTIMATE} or
Chris@10	382 using possibly-suboptimal wisdom (created on the same machine, but for a
Chris@10	383 different binary), the wisdom is likely to be better. For this reason,
Chris@10	384 we provide a function to import wisdom from a standard system-wide
Chris@10	385 location (@code{/etc/fftw/wisdom} on Unix):
Chris@10	386 @cindex wisdom, system-wide
Chris@10	387
Chris@10	388 @example
Chris@10	389 int fftw_import_system_wisdom(void);
Chris@10	390 @end example
Chris@10	391 @findex fftw_import_system_wisdom
Chris@10	392
Chris@10	393 FFTW also provides a standalone program, @code{fftw-wisdom} (described
Chris@10	394 by its own @code{man} page on Unix) with which users can create wisdom,
Chris@10	395 e.g. for a canonical set of sizes to store in the system wisdom file.
Chris@10	396 @xref{Wisdom Utilities}.
Chris@10	397 @cindex fftw-wisdom utility
Chris@10	398

Mercurial > hg > sv-dependency-builds

annotate src/fftw-3.3.3/doc/other.texi @ 54:5f67a29f0fc7