comparison man/man1/genwav.1 @ 0:5242703e91d3 tip

Initial checkin for AIM92 aimR8.2 (last updated May 1997).
author tomwalters
date Fri, 20 May 2011 15:19:45 +0100
1 .TH GENWAV 1 "11 May 1995"
2 .LP
3 .SH NAME
4 .LP
5 genwav \- display the wave in filename.
6 .LP
7 .SH SYNOPSIS
8 .LP
9 genwav [ option=value | -option ] [ filename ]
10 .LP
11 .SH DESCRIPTION
12 .LP
13
14 Genwav sets up an Xwindow and displays a segment of the input wave
15 in the window. The size of the window and the size of the wave are
16 determined by options, as are a number of other input/output
17 functions. These options have no direct bearing on the auditory
18 processing performed by AIM. For convenience, these Non-Auditory
19 options are associated with the instruction genwav (the one
20 non-auditory instruction), and they are listed at the top of the
21 options tables prior to the auditory options.
22
23 .LP
24 There are three classes of Non-Auditory options:
25 .LP
26 I) DISPLAY OPTIONS that determine the format of the auditory representations
27 of sound on the screen, or on paper when printed.
28 .LP
29 II) OUTPUT OPTIONS that determine the format and content of files used
30 to store the auditory representations of sounds.
31 .LP
32 III) INPUT OPTIONS that determine how the wave in the input file should
33 be interpreted.
34 .LP
35 The output options are presented before the input options so that the
36 input options will be adjacent to the filterbank options in the
37 options tables produced by genbmm and subsequent instructions.
38
39 .SS
40 I. DISPLAY OPTIONS
41 .LP
42
43 The AIM modules produce output in the form of a set of functions, one
44 for each channel of the auditory filterbank. For example, the output
45 of genbmm is a set of functions that simulate basilar membrane motion
46 produced in response to the input wave. By default, the AIM software
47 puts an Xwindow up on the computer screen and displays the output in
48 the window. This section describes the options that control these
49 displays.
50
51 .LP
52 The display options are: title, display, x0_win, y0_win, width_win,
53 height_win, view, top, bottom, overlap, headroom,
54 magnification, pensize, hiddenline.
55 .LP
56 A. The Display Window Title, Position, and Size
57 .RS 3
58
59 .LP
60 title Title of output display.
61 .RS 5
62 Character string. Default: input file name.
63 .RE
64 .LP
65 The title of the output being displayed. If no title is given, the
66 display bears the name of the file of the input wave.
67
68 .LP
69 display Display output on screen
70 .RS 5
71 Switch. Default: on.
72 .RE
73 .LP
74
75 Normally this switch is on and a bitmap of the output is displayed in
76 a graphical window on the computer screen. The switch is provided
77 because the time taken to create the displays is considerable, and it
78 is useful to turn the display off when using AIM as a preprocessor for
79 speech recognition.
80
81 .LP
82 x0_win Left edge of window
83 .RS 5
84 Unit: pixels. Default: centre.
85 .RE
86 .LP
87 The left edge of the window into which the display will be drawn,
88 relative to the left edge of the screen (i.e. the x-coordinate of the
89 window within the screen). A value of centre will cause centring in
90 the horizontal dimension (provided the window manager does not
91 override).
92 .LP
93 y0_win Lower edge of window
94 .RS 5
95 Unit: pixels. Default: centre.
96 .RE
97 .LP
98 The lower edge of the window into which the display will be drawn,
99 relative to the lower edge of the screen (i.e. the y-coordinate of the
100 window within the screen). A value of centre will cause centring in
101 the vertical dimension (provided the window manager does not
102 override).
103 .LP
104 Taken as a pair, x0_win and y0_win determine the origin of the window,
105 relative to the screen origin which is assumed to be the lower left
106 corner of the screen.
107 .LP
108 width_win Window width
109 .RS 5
110 Unit: pixels. Default: 640.
111 .RE
112 .LP
113 The width of the window into which the display will be drawn.
114 .LP
115 height_win Window height
116 .RS 6
117 Unit: pixels. Default: 480.
118 .RE
119 .LP
120 The height of the window into which the display will be drawn.
121 .RE
122
123
124 .LP
125 B. Display Controls
126 .RS 3
127 .LP
128 top The largest postive value visible in the display
129 .LP
130 Scalar. Default value: 1024 (for genwav)
131 .LP
132 Each of the functions in the multi-channel output of a module is
133 displayed in a transparent window. Provided the channel density is not
134 too low, the functions are related and the set of functions produces a
135 display that looks like a complex landscape. Top determines the
136 largest positive value that will appear in the transparent windows of
137 the individual functions, so top must be as large as the largest value
138 in the full set of functions. Increasing top has the effect of moving
139 the viewer farther up above the landscape.
140 .LP
141 bottom The largest negative value visible in the
142 .RS 5
143 display
144 .RE
145 .RS 5
146 Scalar. Default value: -1024 (for genwav)
147 .RE
148 .LP
149 Bottom determines the largest negative value that will appear in the
150 transparent windows of the individual functions, so bottom must be as
151 large in the negative direction as the largest negative value in the
152 full set of functions. Increasing bottom in the negative direction has
153 the effect of deepening the valleys in the landscape.
154 .LP
155 overlap The overlap of transparent windows of the
156 .RS 5
157 individual functions
158 .RE
159 .RS 5
160 Scalar: percentage. Default value: 50%
161 .RE
162 .LP
163 The fact that the output functions are related means that they
164 fit up under each other in the display in a way that concentrates the
165 lines on the landscape and improves the display.
166 .LP
167 headroom Display with headroom for the uppermost channel
168 .RS 5
169 Scalar: percentage. Default value: 0%
170 .RE
171 .LP
172 Because of the overlap of the transparent windows, part of the
173 uppermost transparent window is hidden by the upper edge of the
174 display window. This can cause truncation of the waves in the upper
175 channels. To avoid truncation, headroom enables the user to specify
176 that the highest channel ought to be centred below the upper edge of
177 the window. The value specified is taken to be the percentage of the
178 window between the zero line of the upper channel and the upper edge
179 of the window.
180 .LP
181 magnification Display magnification
182 .RS 9
183 Scalar. Default: 1.0.
184 .RE
185 .LP
186 The degree to which the amplitude of the functions in the display
187 should be magnified before being displayed. This parameter is merely
188 for adjusting the visual contrast of the display. The magnification
189 option is a multiplier, so a value of 1 implies drawing to scale,
190 while a value of 10 implies ten times (10x) the size of values in the
191 module output and 0.1 implies one tenth of the output size.
192 Magnification is related to, but separate from, the gain options which
193 affect the values of the output functions and the values stored in any
194 output files. Magnification is an alternative means of controlling the
195 size of the functions in the display -- alternative to top and bottom.
196 .LP
197 pensize The size of the lines in the displays and the
198 .RS 5
199 dots on the spiral
200 .RE
201 .RS 5
202 Unit: pixels. Default: 1.
203 .RE
204 .LP
205 This option allows the user to specify the thickness of the lines in
206 the display and the size of the dots on spiral auditory images. It
207 also affects the lines and dots in postscript plots. It is provided
208 primarily for use with printers which have much more resolution than
209 computer screens. On laser printers a value of 3-5 gives reasonable
210 line thickness. On the screen, a pensize greater than 1 produces
211 slow drawing, and a jagged, blurred display.
212 .LP
213 hiddenline Draw with overlapping parts of functions
214 .RS 5
215 hidden
216 .RE
217 .RS 5
218 Switch. Default: on.
219 .RE
220 .LP
221 This switch specifies whether or not a 'hidden line' algorithm should
222 be used when drawing the display. It also affects printed displays.
223 In almost all cases, hiddenline results in more attractive displays of
224 waveforms, and it often makes complex displays easier to understand,
225 so the default is 'on'. Note: hiddenline almost doubles the drawing
226 time so it is sometimes useful to switch it off on slower machines.
227 .LP
228
229 .SS
230 II. OUTPUT OPTIONS
231 .RS 3
232 .LP
233 The output options are listed and described before the input options
234 so that the input options will be adjacent to the filterbank options
235 in the listings produced by genbmm and subsequent modules. The output
236 options are downchannel, erase_ctn, animate_ctn, bitmap_ctn,
237 postscript, output, and header.
238 .LP
239 downchannel Average adjacent channels of multichannel
240 .RS 7
241 representations
242 .RE
243 .RS 7
244 Units: Number of averagings.
245 .RE
246 .RS 7
247 Default value: 0.
248 .RE
249 .LP
250
251 There is interaction between channels in the transmission-line
252 filterbank of the physiological version of AIM, and in the neural
253 encoding of the functional version of AIM. The minimum channel
254 density for these processes to operate properly is four channels per
255 ERB and 2 channels per ERB, respectively. For broadband signals like
256 speech this means that the minimum number of channels is on the order
257 of 128 and 64, respectively. This channel density can produce
258 cluttered displays, and more importantly, it is far too many channels
259 for current speech recognition systems which typically use 12-24
260 channels. This is not just a computer power problem; the recognition
261 systems actually perform less well with extra channels. Accordingly,
262 the option 'downchannel' provides a means of reducing the channel
263 density at output, so that AIM can operate with the appropriate
264 channel density and still provide output that is compatible with
265 displays and speech recognition systems.
266
267 .LP
268 Downchannel averages pairs of adjacent channels and the option value
269 specifies how many times it should execute the averaging process. Each
270 averaging reduces the number of channels by a factor of 2, so for
271 proper transmission-line filtering and an output file with 16
272 channels, set channels_afb=128 and downchannel=3 (three successive
273 halvings of the number of channels).
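
The halving described above can be sketched in Python. This is an illustration only, not AIM's code: the function name downchannel_sketch and the list-of-lists layout are assumptions, and the man page does not say how AIM treats an odd number of channels (the sketch simply drops the unpaired one).

```python
def downchannel_sketch(channels, times):
    """Average adjacent channel pairs `times` times (hypothetical helper).

    `channels` is a list of equal-length sample lists, one per channel.
    An unpaired final channel is dropped; that choice is an assumption.
    """
    for _ in range(times):
        channels = [
            [(a + b) / 2.0 for a, b in zip(low, high)]
            for low, high in zip(channels[0::2], channels[1::2])
        ]
    return channels


# 128 channels, each holding four samples equal to the channel index.
bank = [[float(index)] * 4 for index in range(128)]
reduced = downchannel_sketch(bank, 3)
print(len(reduced))  # 16 channels after three halvings
```

With channels_afb=128 and downchannel=3 the sketch ends with 16 channels, matching the example in the text.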
274
275
276 .LP
277 A. Animated Cartoons
278 .LP
279 .RS 3
280 Four of the AIM instructions produce output in the form of sequences
281 of spectral frames (gensgm, gencgm, genasa and genepn). Bitmap
282 versions of the displays of the frames can be stored by AIM and
283 replayed by review and xreview. When the sequence of frames is played
284 rapidly, it appears as an animated cartoon that shows the dynamic
285 behaviour of the spectrum of the sound.
286 .LP
287 Similarly, the AIM instructions for auditory images (gensai and
288 genspl) produce sequences of landscape frames, and bitmap versions of
289 the landscape displays can also be stored by AIM and replayed by
290 review and xreview. Indeed, it was the desire to produce auditory
291 image cartoons that led to the development of much of the AIM software
292 package. The animated cartoons of auditory images show the dynamic
293 behaviour of features in the images, like the motion of formants in
294 diphthongs and the motion of notes in a melody.
295 .LP
296 This section describes the options that control the construction and
297 storage of sequences of bitmaps; there is a separate manual entry for
298 the xreview routine that replays the bitmaps ('manaim xreview').
299
300
301 .LP
302 erase_ctn Erase the current frame before presenting
303 .RS 7
304 the next frame
305 .RE
306 .RS 7
307 Switch. Default value: on.
308 .RE
309 .LP
310
311 Normally, when presenting a sequence of frames as an animated cartoon,
312 one wants to erase the current frame before presenting the next. When
313 the frames are spectra, however, the set of frames can together form a
314 meaningful display; for example, the set of rising spectra produced at
315 the onset of a sound produces a contour map of the onset. Switching
316 erase_ctn off enables the user to observe the full set of spectra
317 simultaneously. (See aimdemo_gtf_spectra or aimdemo_tlf_spectra.)
318
319 .LP
320 animate_ctn Store frames in memory and replay all of
321 .RS 7
322 them as a cartoon
323 .RE
324 .RS 7
325 Switch. Default value: off.
326 .RE
327 .LP
328 When this option is on, AIM stores the bitmaps of the frames it
329 produces in the memory of the machine and replays them rapidly when
330 the instruction is complete. Type RETURN to animate the cartoon again;
331 type 'q RETURN' to exit the instruction. (This option was important
332 when machines were slower and before the availability of review and
333 xreview. It is now largely obsolete.)
334 .LP
335 bitmap_ctn Store bitmaps of frames in a file for
336 .RS 7
337 replay as a cartoon
338 .RE
339 .RS 7
340 Switch. Default value: off.
341 .RE
342 .LP
343 When this option is on, bitmaps of the frames produced for the input
344 in file_name will be stored in file_name.ctn. The sequence of frames can later be replayed using either
345 .LP
346 > review file_name or
347 .LP
348 > xreview file_name
349 .LP
350 Both of these programs enable the user to vary the rate of animation,
351 the section of the sequence to be viewed, etc. The xreview version has a
352 window interface with useful information and is the preferred version
353 in most cases.
354 .RE
355
356 .RS 3
357 B. Output Files for Printing and Postprocessing
358
359 .LP
360 postscript Produce printer-ready output
361 .RS 7
362 Switch. Default value: off.
363 .RE
364 .LP
365 This switch causes AIM to produce a printer-ready version of the
366 displays it presents on the computer screen. For example, the NAP of
367 a 32-ms section of cegc can be printed using
368 .LP
369 > gennap length=32 postscript=on cegc | lpr -Plw
370 .LP
371 where 'lpr' is the Unix printer-driver and the 'lw' of -Plw specifies
372 the destination printer. You may need to check the name of your
373 system's printer driver and laser printer.
374 .LP
375 Alternatively, the postscript version of the display may be directed to a
376 file using an instruction like
377 .LP
378 > gennap length=32 postscript=on cegc > cegc_nap.ps
379 .LP
380 and printed later at the user's convenience. In this example, the file
381 name cegc_nap.ps is not generated by AIM; the '_nap.ps' suffix is
382 added by the user following standard conventions to indicate that the file
383 contains a NAP in postscript form.
384
385 .RS 3
386 .LP
387 THREE POSTSCRIPT CAUTIONS:
388 .LP
389 Postscript files of landscape displays from AIM are very large. As a
390 result, we recommend
391 .LP
392 a) that you NOT switch postscript on without redirecting the output to
393 a file, as it will cause the output to be displayed on the screen in a
394 seemingly endless display,
395 .LP
396 b) that you be careful NOT to print postscript files on a printer
397 which does not understand the Postscript language, as it can cause the
398 printer to put out an extremely long file, one column per page!
399 .LP
400 c) that you NOT set postscript=on in an options file as it will
401 generate large files in the directory without your noticing.
402 .RE
403
404 .LP
405 output Generate an output file
406 .RS 3
407 Switch. Default value: off.
408 .RE
409 .LP
410 This switch causes the array of functions that defines AIM's
411 simulation of basilar membrane motion, or a neural activity pattern,
412 or an auditory image, to be stored in a file for subsequent processing
413 by the aimtools or other, user defined, operators. By convention, the
414 file is given the same name as the input file, but with a suffix
415 reflecting the entry point, to distinguish it from the input file on
416 the one hand and from other output files on the other hand. The naming
417 system enables the user to construct and store a set of output files
418 for one input file without the need to specify a sequence of file
419 names. The suffixes are those used to identify the modules in the
420 listing produced by 'gen -help'. So, for example, the following
421 command line:
422 .LP
423 > gennap output=on length=32 cegc
424 .LP
425 will produce an output file named cegc.nap containing a multiplexed
426 version of the functions that define the NAP of the first 32 ms of
427 cegc.
428 .LP
429 The spectrographic representations produced by gensgm and gencgm can
430 be stored in the same way, as can the sequences of spectra produced by
431 genasa and genepn. It is the output files of genasa and gencgm that
432 are used to interface AIM with speech recognition systems (Robinson et
433 al., 1990; Patterson et al., 1995; Giguere and Woodland, 1994a).
434 Details of the file formats are presented in docs/aimFileFormat.
435 .LP
436 header Put a header on the output file
437 .RS 3
438 Flag. Default value: on.
439 .RE
440 .LP
441 By default, a header is prepended to each output file so that
442 subsequent processors have access to the history of the file. Details
443 of the header structure are presented in docs/aimFileFormat.
444 .LP
445 .RE
446
447 .SS
448 III. INPUT OPTIONS
449 .LP
450 The input options enable the user to process a subsection of the input
451 wave, and to specify characteristics of the wave.
452 .LP
453 The input options are: input_wave, start_wave, length_wave,
454 samplerate, swap_wave, bits_wave, dB_wave.
455 .LP
456 input_wave Default input wave name
457 .RS 13
458 Filename. Default value: none.
459 .RE
460 .LP
461 The name of the wave file to process. This option permits simple
462 repetitive processing of the same input file without repetitive typing. It
463 also enables one to circumvent the Unix convention of having the filename
464 last on the command line. This option is overridden if the user supplies a
465 wave file name at the end of the command line.
466 .LP
467 start_wave Start point in wave
468 .RS 13
469 Default unit: ms. Default value: 0.
470 .RE
471 .LP
472 The point in the input wave at which processing should begin. The
473 start_wave option is expressed in milliseconds and its default value is the
474 beginning of the file (i.e. 0 ms into the file).
475 .LP
476 length_wave Length of wave
477 .RS 13
478 Default unit: ms. Default value: remainder.
479 .RE
480 .LP
481 The number of milliseconds of the wave that ought to be processed,
482 beyond the start point. The special value 'remainder' indicates that
483 the entire length of the wave from the start point to the end of the
484 file should be processed.
485 .LP
486 samplerate Input wave sample rate
487 .RS 13
488 Default unit: Hertz. Default value: 20,000 Hz.
489 .RE
490 .LP
491 The rate at which the input wave was sampled.
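
Because start_wave and length_wave are given in milliseconds, the sample rate determines which samples are actually processed. The following Python sketch shows the conversion; the helper name ms_to_samples and the rounding rule are assumptions for illustration, not taken from the AIM source.

```python
def ms_to_samples(milliseconds, samplerate=20000):
    """Convert a duration in ms to a sample count (hypothetical helper).

    The default of 20000 Hz is the documented samplerate default;
    rounding to the nearest sample is an assumption.
    """
    return int(round(milliseconds * samplerate / 1000.0))


start_index = ms_to_samples(0)    # start_wave=0
sample_count = ms_to_samples(32)  # length_wave=32
print(start_index, sample_count)  # 0 640
```

At the default rate, a 32-ms segment therefore spans 640 samples.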
492 .LP
493 swap_wave Swap the bytes in each binary pair of the
494 .RS 13
495 input file
496 .RE
497 .RS 13
498 Switch. Default: off.
499 .RE
500 .LP
501 The order of the bytes in short integers varies between manufacturers.
502 Specifically the order for Sun and HP is opposite that for DEC, SGI and
503 IBM. The default setting (off) is for the latter byte order.
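
Swapping the bytes of each 16-bit pair can be illustrated with a short Python sketch. It shows the operation on raw file bytes only; the function name swap_byte_pairs is hypothetical and this is not how AIM itself is implemented.

```python
def swap_byte_pairs(raw):
    """Return `raw` with the two bytes of every 16-bit word exchanged."""
    if len(raw) % 2 != 0:
        raise ValueError("expected an even number of bytes (16-bit words)")
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]  # second byte of each pair moves first
    swapped[1::2] = raw[0::2]  # first byte of each pair moves second
    return bytes(swapped)


print(swap_byte_pairs(b"\x01\x02\x03\x04"))  # b'\x02\x01\x04\x03'
```

Applying the swap twice returns the original bytes, so a wave read with the wrong setting is not damaged, only misinterpreted.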
504 .LP
505 bits_wave Bits in the input wave
506 .RS 13
507 Unit: bits. Default: 12. (Only alternate: 16.)
508 .RE
509 .LP
510 The number of significant bits in each (16-bit) word of the input
511 wave. Note that gain_gtf or gain_tlf should be changed to 0.0625 when
512 the number of bits is set to 16 to avoid overflow.
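
The value 0.0625 is 1/16, i.e. 2 to the power -4, which matches the four extra bits of a 16-bit wave relative to the 12-bit default. The Python sketch below makes that arithmetic explicit. Treating the gain as a power of two for other bit depths is an assumption; the man page only gives the 16-bit value, and the helper name gain_for_bits is hypothetical.

```python
def gain_for_bits(bits_wave, reference_bits=12):
    """Gain that scales a bits_wave-bit signal down to the reference depth.

    Assumes one factor of 2 per extra bit; only the 16-bit case (0.0625)
    is stated in the documentation.
    """
    return 2.0 ** (reference_bits - bits_wave)


print(gain_for_bits(16))  # 0.0625
print(gain_for_bits(12))  # 1.0
```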
513 .LP
514 dB_wave Scaling of the input wave
515 .RS 13
516 (for physiological route only)
517 .RE
518 .RS 13
519 Units: dB. Default: 60 dB
520 .RE
521 .LP
522 This option enables the user to specify the relative level of
523 the input wave in decibels. It is particularly useful for
524 investigating the level-dependent properties of the
525 physiological version of AIM.
526 .LP
527 The functional route is level-independent and dB_wave is
528 ignored no matter what its value.
529 .LP
530 dB_wave can also be used to scale the input wave in absolute
531 units, i.e. sound-pressure level (dB SPL), using the following
532 equation:
533 .LP
534 dB_wave = dBSPL - 20*log10(RMS/200)
535 .LP
536 where RMS is the root-mean-square amplitude of the input wave,
537 or the portion of the wave of interest, and dBSPL is the
538 desired sound-pressure level scaling (in dB). For
539 example, to scale to 60 dB SPL a wave with an RMS amplitude
540 of 467.3, dB_wave should be set to 52.6.
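
The worked example above can be checked with a few lines of Python. The function names are hypothetical and the logarithm is taken to be base 10, which is the reading that reproduces the quoted value of 52.6.

```python
import math


def rms_amplitude(samples):
    """Root-mean-square amplitude of a sequence of sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def db_wave_for_spl(target_db_spl, rms):
    """dB_wave setting for a desired level, per the equation above."""
    return target_db_spl - 20.0 * math.log10(rms / 200.0)


print(round(db_wave_for_spl(60.0, 467.3), 1))  # 52.6
```

A wave whose RMS amplitude is exactly 200 needs no correction, so dB_wave equals the desired dB SPL in that case.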
541 .LP
542 Note: The RMS value of a stored input wave can be calculated using
543 the tools provided with the AIM software.
544
545
546 .LP
547 .RE
548
549 .SH FILES
550 .LP
551 .genwavrc The options file for genwav.
552 .SH SEE ALSO
553 .LP
554 genbmm
555 .SH BUGS
556 .LP
557 .SH COPYRIGHT
558 .LP
559 Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
560 .LP
561 Permission to use, copy, modify, and distribute this software without fee
562 is hereby granted for research purposes, provided that this copyright
563 notice appears in all copies and in all supporting documentation, and that
564 the software is not redistributed for any fee (except for a nominal
565 shipping charge). Anyone wanting to incorporate all or part of this
566 software in a commercial product must obtain a license from the Medical
567 Research Council.
568 .LP
569 The MRC makes no representations about the suitability of this
570 software for any purpose. It is provided "as is" without express or
571 implied warranty.
572 .LP
573 THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
574 ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL
575 THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES
576 OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
577 WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
578 ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
579 SOFTWARE.
580 .LP
581 .SH ACKNOWLEDGEMENTS
582 .LP
583 The AIM software was developed for Unix workstations by John
584 Holdsworth and Mike Allerhand of the MRC APU, under the direction of
585 Roy Patterson. The physiological version of AIM was developed by
586 Christian Giguere. The options handler is by Paul Manson. The revised
587 SAI module is by Jay Datta. Michael Akeroyd extended the postscript
588 facilities and developed the xreview routine for auditory image
589 cartoons.
590 .LP
591 The project was supported by the MRC and grants from the U.K. Defence
592 Research Agency, Farnborough (Research Contract 2239); the EEC Esprit
593 BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.
594