comparison man/man1/genwav.1 @ 0:5242703e91d3 tip
Initial checkin for AIM92 aimR8.2 (last updated May 1997).
author | tomwalters |
---|---|
date | Fri, 20 May 2011 15:19:45 +0100 |
parents | |
children |
-1:000000000000 | 0:5242703e91d3 |
---|---|
1 .TH GENWAV 1 "11 May 1995" | |
2 .LP | |
3 .SH NAME | |
4 .LP | |
5 genwav \- display the wave in filename. | |
6 .LP | |
7 .SH SYNOPSIS | |
8 .LP | |
9 genwav [ option=value | -option ] [ filename ] | |
10 .LP | |
11 .SH DESCRIPTION | |
12 .LP | |
13 | |
14 Genwav sets up an Xwindow and displays a segment of the input wave | |
15 in the window. The size of the window and the size of the wave are | |
16 determined by options, as are a number of other input/output | |
17 functions. These options have no direct bearing on the auditory | |
18 processing performed by AIM. For convenience, these Non-Auditory | |
19 options are associated with the instruction genwav (the one | |
20 non-auditory instruction), and they are listed at the top of the | |
21 options tables prior to the auditory options. | |
22 | |
23 .LP | |
24 There are three classes of Non-Auditory options: | |
25 .LP | |
26 I) DISPLAY OPTIONS that determine the format of the auditory representations | |
27 of sound on the screen, or on paper when printed. | |
28 .LP | |
29 II) OUTPUT OPTIONS that determine the format and content of files used | |
30 to store the auditory representations of sounds. | |
31 .LP | |
32 III) INPUT OPTIONS that determine how the wave in the input file should | |
33 be interpreted. | |
34 .LP | |
35 The output options are presented before the input options so that the | |
36 input options will be adjacent to the filterbank options in the | |
37 options tables produced by genbmm and subsequent instructions. | |
38 | |
39 .SS | |
40 I. DISPLAY OPTIONS | |
41 .LP | |
42 | |
43 The AIM modules produce output in the form of a set of functions, one | |
44 for each channel of the auditory filterbank. For example, the output | |
45 of genbmm is a set of functions that simulate basilar membrane motion | |
46 produced in response to the input wave. By default, the AIM software | |
47 puts an Xwindow up on the computer screen and displays the output in | |
48 the window. This section describes the options that control these | |
49 displays. | |
50 | |
51 .LP | |
52 The display options are: title, display, x0_win, y0_win, width_win, | |
53 height_win, view, top, bottom, overlap, headroom, | |
54 magnification, pensize, hiddenline. | |
55 .LP | |
56 A. The Display Window Title, Position, and Size | |
57 .RS 3 | |
58 | |
59 .LP | |
60 title Title of output display. | |
61 .RS 5 | |
62 Character string. Default: input file name. | |
63 .RE | |
64 .LP | |
65 The title of the output being displayed. If no title is given, the | |
66 display bears the name of the file of the input wave. | |
67 | |
68 .LP | |
69 display Display output on screen | |
70 .RS 5 | |
71 Switch. Default: on. | |
72 .RE | |
73 .LP | |
74 | |
75 Normally this switch is on and a bitmap of the output is displayed in | |
76 a graphical window on the computer screen. The switch is provided | |
77 because the time taken to create the displays is considerable, and it | |
78 is useful to turn the display off when using AIM as a preprocessor for | |
79 speech recognition. | |
80 | |
81 .LP | |
82 x0_win Left edge of window | |
83 .RS 5 | |
84 Unit: pixels. Default: centre. | |
85 .RE | |
86 .LP | |
87 The left edge of the window into which the display will be drawn, | |
88 relative to the left edge of the screen (i.e. the x-coordinate of the | |
89 window within the screen). A value of centre will cause centring in | |
90 the horizontal dimension (provided the window manager does not | |
91 override). | |
92 .LP | |
93 y0_win Lower edge of window | |
94 .RS 5 | |
95 Unit: pixels. Default: centre. | |
96 .RE | |
97 .LP | |
98 The lower edge of the window into which the display will be drawn, | |
99 relative to the lower edge of the screen (i.e. the y-coordinate of the | |
100 window within the screen). A value of centre will cause centring in | |
101 the vertical dimension (provided the window manager does not | |
102 override). | |
103 .LP | |
104 Taken as a pair, x0_win and y0_win determine the origin of the window, | |
105 relative to the screen origin which is assumed to be the lower left | |
106 corner of the screen. | |
107 .LP | |
108 width_win Window width | |
109 .RS 5 | |
110 Unit: pixels. Default: 640. | |
111 .RE | |
112 .LP | |
113 The width of the window into which the display will be drawn. | |
114 .LP | |
115 height_win Window height | |
116 .RS 6 | |
117 Unit: pixels. Default: 480. | |
118 .RE | |
119 .LP | |
120 The height of the window into which the display will be drawn. | |
121 .RE | |
122 | |
123 | |
124 .LP | |
125 B. Display Controls | |
126 .RS 3 | |
127 .LP | |
128 top The largest positive value visible in the display | |
129 .LP | |
130 Scalar. Default value: 1024 (for genwav) | |
131 .LP | |
132 Each of the functions in the multi-channel output of a module is | |
133 displayed in a transparent window. Provided the channel density is not | |
134 too low, the functions are related and the set of functions produces a | |
135 display that looks like a complex landscape. Top determines the | |
136 largest positive value that will appear in the transparent windows of | |
137 the individual functions, so top must be as large as the largest value | |
138 in the full set of functions. Increasing top has the effect of moving | |
139 the viewer farther up above the landscape. | |
140 .LP | |
141 bottom The largest negative value visible in the | |
142 .RS 5 | |
143 display | |
144 .RE | |
145 .RS 5 | |
146 Scalar. Default value: -1024 (for genwav) | |
147 .RE | |
148 .LP | |
149 Bottom determines the largest negative value that will appear in the | |
150 transparent windows of the individual functions, so bottom must be as | |
151 large in the negative direction as the largest negative value in the | |
152 full set of functions. Increasing bottom in the negative direction has | |
153 the effect of deepening the valleys in the landscape. | |
154 .LP | |
155 overlap The overlap of transparent windows of the | |
156 .RS 5 | |
157 individual functions | |
158 .RE | |
159 .RS 5 | |
160 Scalar: percentage. Default value: 50% | |
161 .RE | |
162 .LP | |
163 The fact that the output functions are related means that they | |
164 fit up under each other in the display in a way that concentrates the | |
165 lines on the landscape and improves the display. | |
166 .LP | |
167 headroom Display with headroom for the uppermost channel | |
168 .RS 5 | |
169 Scalar: percentage. Default value: 0% | |
170 .RE | |
171 .LP | |
172 Because of the overlap of the transparent windows, part of the | |
173 uppermost transparent window is hidden by the upper edge of the | |
174 display window. This can cause truncation of the waves in the upper | |
175 channels. To avoid truncation, headroom enables the user to specify | |
176 that the highest channel ought to be centred below the upper edge of | |
177 the window. The value specified is taken to be the percentage of the | |
178 window between the zero line of the upper channel and the upper edge | |
179 of the window. | |
180 .LP | |
181 magnification Display magnification | |
182 .RS 9 | |
183 Scalar. Default: 1.0. | |
184 .RE | |
185 .LP | |
186 The degree to which the amplitude of the functions in the display | |
187 should be magnified before being displayed. This parameter is merely | |
188 for adjusting the visual contrast of the display. The magnification | |
189 option is a multiplier, so a value of 1 implies drawing to scale, | |
190 while a value of 10 implies ten times (10x) the size of values in the | |
191 module output and 0.1 implies one tenth of the output size. | |
192 Magnification is related to, but separate from, the gain options which | |
193 affect the values of the output functions and the values stored in any | |
194 output files. Magnification is an alternative means of controlling the | |
195 size of the functions in the display -- alternative to top and bottom. | |
196 .LP | |
197 pensize The size of the lines in the displays and the | |
198 .RS 5 | |
199 dots on the spiral | |
200 .RE | |
201 .RS 5 | |
202 Unit: pixels. Default: 1. | |
203 .RE | |
204 .LP | |
205 This option allows the user to specify the thickness of the lines in | |
206 the display and the size of the dots on spiral auditory images. It | |
207 also affects the lines and dots in postscript plots. It is provided | |
208 primarily for use with printers which have much more resolution than | |
209 computer screens. On laser printers a value of 3-5 gives reasonable | |
210 line thickness. On the screen, a linewidth greater than 1 produces | |
211 slow drawing, and a jagged, blurred display. | |
212 .LP | |
213 hiddenline Draw with overlapping parts of functions | |
214 .RS 5 | |
215 hidden | |
216 .RE | |
217 .RS 5 | |
218 Switch. Default: on. | |
219 .RE | |
220 .LP | |
221 This switch specifies whether or not a 'hidden line' algorithm should | |
222 be used when drawing the display. It also affects printed displays. | |
223 In almost all cases, hiddenline results in more attractive displays of | |
224 waveforms, and it often makes complex displays easier to understand, | |
225 so the default is 'on'. Note: hiddenline almost doubles the drawing | |
226 time so it is sometimes useful to switch it off on slower machines. | |
227 .LP | |
228 | |
229 .SS | |
230 II. OUTPUT OPTIONS | |
231 .RS 3 | |
232 .LP | |
233 The output options are listed and described before the input options | |
234 so that the input options will be adjacent to the filterbank options | |
235 in the listings produced by genbmm and subsequent modules. The output | |
236 options are downchannel, erase_ctn, animate_ctn, bitmap_ctn, | |
237 postscript, output, and header. | |
238 .LP | |
239 downchannel Average adjacent channels of multichannel | |
240 .RS 7 | |
241 representations | |
242 .RE | |
243 .RS 7 | |
244 Units: Number of averagings. | |
245 .RE | |
246 .RS 7 | |
247 Default value: 0. | |
248 .RE | |
249 .LP | |
250 | |
251 There is interaction between channels in the transmission-line | |
252 filterbank of the physiological version of AIM, and in the neural | |
253 encoding of the functional version of AIM. The minimum channel | |
254 density for these processes to operate properly is four channels per | |
255 ERB and 2 channels per ERB, respectively. For broadband signals like | |
256 speech this means that the minimum number of channels is on the order | |
257 of 128 and 64, respectively. This channel density can produce | |
258 cluttered displays, and more importantly, it is far too many channels | |
259 for current speech recognition systems which typically use 12-24 | |
260 channels. This is not just a computer power problem; the recognition | |
261 systems actually perform less well with extra channels. Accordingly, | |
262 the option 'downchannel' provides the option of reducing the channel | |
263 density at output, so that AIM can operate with the appropriate | |
264 channel density and still provide output that is compatible with | |
265 displays and speech recognition systems. | |
266 | |
267 .LP | |
268 Downchannel averages pairs of adjacent channels and the option value | |
269 specifies how many times it should execute the averaging process. Each | |
270 averaging reduces the number of channels by a factor of 2, so for | |
271 proper transmission-line filtering and an output file with 16 | |
272 channels, set channels_afb=128 and downchannel=3 (three successive | |
273 halvings of the number of channels). | |
274 | |
275 | |
276 .LP | |
277 A. Animated Cartoons | |
278 .LP | |
279 .RS 3 | |
280 Four of the AIM instructions produce output in the form of sequences | |
281 of spectral frames (gensgm, gencgm, genasa and genepn). Bitmap | |
282 versions of the displays of the frames can be stored by AIM and | |
283 replayed by review and xreview. When the sequence of frames is played | |
284 rapidly, it appears as an animated cartoon that shows the dynamic | |
285 behaviour of the spectrum of the sound. | |
286 .LP | |
287 Similarly, the AIM instructions for auditory images (gensai and | |
288 genspl) produce sequences of landscape frames, and bitmap versions of | |
289 the landscape displays can also be stored by AIM and replayed by | |
290 review and xreview. Indeed, it was the desire to produce auditory | |
291 image cartoons that led to the development of much of the AIM software | |
292 package. The animated cartoons of auditory images show the dynamic | |
293 behaviour of features in the images, like the motion of formants in | |
294 diphthongs and the motion of notes in a melody. | |
295 .LP | |
296 This section describes the options that control the construction and | |
297 storage of sequences of bitmaps; there is a separate manual entry for | |
298 the xreview routine that replays the bitmaps ('manaim xreview'). | |
299 | |
300 | |
301 .LP | |
302 erase_ctn Erase the current frame before presenting | |
303 .RS 7 | |
304 the next frame | |
305 .RE | |
306 .RS 7 | |
307 Switch. Default value: on. | |
308 .RE | |
309 .LP | |
310 | |
311 Normally, when presenting a sequence of frames as an animated cartoon, | |
312 one wants to erase the current frame before presenting the next. When | |
313 the frames are spectra, however, the set of frames can together form a | |
314 meaningful display; for example, the set of rising spectra produced at | |
315 the onset of a sound produces a contour map of the onset. The option | |
316 erase_ctn enables the user to observe the full set of spectra | |
317 simultaneously. (See aimdemo_gtf_spectra or aimdemo_tlf_spectra ). | |
318 | |
319 .LP | |
320 animate_ctn Store frames in memory and replay all of | |
321 .RS 7 | |
322 them as a cartoon | |
323 .RE | |
324 .RS 7 | |
325 Switch. Default value: off. | |
326 .RE | |
327 .LP | |
328 When this option is on, AIM stores the bitmaps of the frames it | |
329 produces in the memory of the machine and replays them rapidly when | |
330 the instruction is complete. Type RETURN to animate the cartoon again; | |
331 type 'q RETURN' to exit the instruction. (This option was important | |
332 when machines were slower and before the availability of review and | |
333 xreview. It is now largely obsolete.) | |
334 .LP | |
335 bitmap_ctn Store bitmaps of frames in a file for | |
336 .RS 7 | |
337 replay as a cartoon | |
338 .RE | |
339 .RS 7 | |
340 Switch. Default value: off. | |
341 .RE | |
342 .LP | |
343 When this option is on, bitmaps of the frames produced for the input | |
344 in file_name will be stored in file_name.ctn. The sequence of frames can later be replayed using either | |
345 .LP | |
346 > review file_name or | |
347 .LP | |
348 > xreview file_name | |
349 .LP | |
350 Both of these programs enable the user to vary the rate of animation, | |
351 the section of the sequence to be viewed, etc. The xreview version has a | |
352 window interface with useful information and is the preferred version | |
353 in most cases. | |
354 .RE | |
355 | |
356 .RS 3 | |
357 B. Output Files for Printing and Postprocessing | |
358 | |
359 .LP | |
360 postscript Produce printer-ready output | |
361 .RS 7 | |
362 Switch. Default value: off. | |
363 .RE | |
364 .LP | |
365 This switch causes AIM to produce a printer-ready version of the | |
366 displays it presents on the computer screen. For example, the NAP of | |
367 a 32-ms section of cegc can be printed using | |
368 .LP | |
369 > gennap length=32 postscript=on cegc | lpr -Plw | |
370 .LP | |
371 where 'lpr' is the Unix printer-driver and the 'lw' of -Plw specifies | |
372 the destination printer. You may need to check the name of your | |
373 system's printer driver and laser printer. | |
374 .LP | |
375 Alternately the postscript version of the display may be directed to a | |
376 file using an instruction like | |
377 .LP | |
378 > gennap length=32 postscript=on cegc > cegc_nap.ps | |
379 .LP | |
380 and printed later at the user's convenience. In this example, the file | |
381 name cegc_nap.ps is not generated by AIM; the '_nap.ps' suffix is | |
382 added by the user following standard conventions to indicate that the file | |
383 contains a NAP in postscript form. | |
384 | |
385 .RS 3 | |
386 .LP | |
387 THREE POSTSCRIPT CAUTIONS: | |
388 .LP | |
389 Postscript files of landscape displays from AIM are very large. As a | |
390 result, we recommend | |
391 .LP | |
392 a) that you NOT switch postscript on without redirecting the output to | |
393 a file, as it will cause the output to be displayed on the screen in a | |
394 seemingly endless display, | |
395 .LP | |
396 b) that you be careful NOT to print postscript files on a printer | |
397 which does not understand the Postscript language, as it can cause the | |
398 printer to put out an extremely long file, one column per page! | |
399 .LP | |
400 c) that you NOT set postscript=on in an options file as it will | |
401 generate large files in the directory without your noticing. | |
402 .RE | |
403 | |
404 .LP | |
405 output Generate an output file | |
406 .RS 3 | |
407 Switch. Default value: off. | |
408 .RE | |
409 .LP | |
410 This switch causes the array of functions that defines AIM's | |
411 simulation of basilar membrane motion, or a neural activity pattern, | |
412 or an auditory image, to be stored in a file for subsequent processing | |
413 by the aimtools or other, user defined, operators. By convention, the | |
414 file is given the same name as the input file, but with a suffix | |
415 reflecting the entry point, to distinguish it from the input file on | |
416 the one hand and from other output files on the other hand. The naming | |
417 system enables the user to construct and store a set of output files | |
418 for one input file without the need to specify a sequence of file | |
419 names. The suffixes are those used to identify the modules in the | |
420 listing produced by 'gen -help'. So, for example, the following | |
421 command line: | |
422 .LP | |
423 > gennap output=on length=32 cegc | |
424 .LP | |
425 will produce an output file named cegc.nap containing a multiplexed | |
426 version of the functions that define the NAP of the first 32 ms of | |
427 cegc. | |
428 .LP | |
429 The spectrographic representations produced by gensgm and gencgm can | |
430 be stored in the same way, as can the sequences of spectra produced by | |
431 genasa and genepn. It is the output files of genasa and gencgm that | |
432 are used to interface AIM with speech recognition systems (Robinson et | |
433 al., 1990; Patterson et al., 1995; Giguere and Woodland, 1994a). | |
434 Details of the file formats are presented in docs/aimFileFormat. | |
435 .LP | |
436 header Put a header on the output file | |
437 .RS 3 | |
438 Flag. Default value: on. | |
439 .RE | |
440 .LP | |
441 By default, a header is prepended to each output file so that | |
442 subsequent processors have access to the history of the file. Details | |
443 of the header structure are presented in docs/aimFileFormat. | |
444 .LP | |
445 .RE | |
446 | |
447 .SS | |
448 III. INPUT OPTIONS | |
449 .LP | |
450 The input options enable the user to process a subsection of the input | |
451 wave, and to specify characteristics of the wave. | |
452 .LP | |
453 The input options are: input_wave, start_wave, length_wave, | |
454 samplerate, swap_wave, bits_wave, dB_wave. | |
455 .LP | |
456 input_wave Default input wave name | |
457 .RS 13 | |
458 Filename. Default value: none. | |
459 .RE | |
460 .LP | |
461 The name of the wave file to process. This option permits simple | |
462 repetitive processing of the same input file without repetitive typing. It | |
463 also enables one to circumvent the Unix convention of having the filename | |
464 last on the command line. This option is overridden if the user supplies a | |
465 wave file name at the end of the command line. | |
466 .LP | |
467 start_wave Start point in wave | |
468 .RS 13 | |
469 Default unit: ms. Default value: 0. | |
470 .RE | |
471 .LP | |
472 The point in the input wave at which processing should begin. The | |
473 start_wave option is expressed in milliseconds and its default value is the | |
474 beginning of the file (i.e. 0 ms into the file). | |
475 .LP | |
476 length_wave Length of wave | |
477 .RS 13 | |
478 Default unit: ms. Default value: remainder. | |
479 .RE | |
480 .LP | |
481 The number of milliseconds of the wave that ought to be processed, | |
482 beyond the start point. The special value 'remainder' indicates that | |
483 the entire length of the wave from the start point to the end of the | |
484 file should be processed. | |
485 .LP | |
486 samplerate Input wave sample rate | |
487 .RS 13 | |
488 Default unit: Hertz. Default value: 20,000 Hz. | |
489 .RE | |
490 .LP | |
491 The rate at which the input wave was sampled. | |
492 .LP | |
493 swap_wave Swap the bytes in each binary pair of the | |
494 .RS 13 | |
495 input file | |
496 .RE | |
497 .RS 13 | |
498 Switch. Default: off. | |
499 .RE | |
500 .LP | |
501 The order of the bytes in short integers varies between manufacturers. | |
502 Specifically, the order for Sun and HP is opposite that for DEC, SGI and | |
503 IBM. The default setting (off) is for the latter byte order. | |
504 .LP | |
505 bits_wave Bits in the input wave | |
506 .RS 13 | |
507 Unit: bits. Default: 12. (Only alternate: 16.) | |
508 .RE | |
509 .LP | |
510 The number of significant bits in each (16-bit) word of the input | |
511 wave. Note that gain_gtf or gain_tlf should be changed to 0.0625 when | |
512 the number of bits is set to 16 to avoid overflow. | |
513 .LP | |
514 dB_wave Scaling of the input wave | |
515 .RS 13 | |
516 (for physiological route only) | |
517 .RE | |
518 .RS 13 | |
519 Units: dB. Default: 60 dB | |
520 .RE | |
521 .LP | |
522 This option enables the user to specify the relative level of | |
523 the input wave in decibels. It is particularly useful for | |
524 investigating the level-dependent properties of the | |
525 physiological version of AIM. | |
526 .LP | |
527 The functional route is level-independent and dB_wave is | |
528 ignored no matter what its value. | |
529 .LP | |
530 dB_wave can also be used to scale the input wave in absolute | |
531 units, i.e. sound-pressure level (dB SPL), using the following | |
532 equation: | |
533 .LP | |
534 dB_wave = dBSPL - 20log(RMS/200) | |
535 .LP | |
536 where RMS is the root-mean-square amplitude of the input wave, | |
537 or the portion of the wave of interest, and dBSPL is the | |
538 desired sound-pressure level scaling (in dB). For | |
539 example, to scale to 60 dB SPL a wave with an RMS amplitude | |
540 of 467.3, dB_wave should be set to 52.6. | |
541 .LP | |
542 Note: The RMS value of a stored input wave can be calculated using | |
543 the tools provided with the AIM software. | |
544 | |
545 | |
546 .LP | |
547 .RE | |
548 | |
549 .SH FILES | |
550 .LP | |
551 .genwavrc The options file for genwav. | |
552 .SH SEE ALSO | |
553 .LP | |
554 genbmm | |
555 .SH BUGS | |
556 .LP | |
557 .SH COPYRIGHT | |
558 .LP | |
559 Copyright (c) Applied Psychology Unit, Medical Research Council, 1995 | |
560 .LP | |
561 Permission to use, copy, modify, and distribute this software without fee | |
562 is hereby granted for research purposes, provided that this copyright | |
563 notice appears in all copies and in all supporting documentation, and that | |
564 the software is not redistributed for any fee (except for a nominal | |
565 shipping charge). Anyone wanting to incorporate all or part of this | |
566 software in a commercial product must obtain a license from the Medical | |
567 Research Council. | |
568 .LP | |
569 The MRC makes no representations about the suitability of this | |
570 software for any purpose. It is provided "as is" without express or | |
571 implied warranty. | |
572 .LP | |
573 THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING | |
574 ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL | |
575 THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES | |
576 OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, | |
577 WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, | |
578 ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS | |
579 SOFTWARE. | |
580 .LP | |
581 .SH ACKNOWLEDGEMENTS | |
582 .LP | |
583 The AIM software was developed for Unix workstations by John | |
584 Holdsworth and Mike Allerhand of the MRC APU, under the direction of | |
585 Roy Patterson. The physiological version of AIM was developed by | |
586 Christian Giguere. The options handler is by Paul Manson. The revised | |
587 SAI module is by Jay Datta. Michael Akeroyd extended the postscript | |
588 facilities and developed the xreview routine for auditory image | |
589 cartoons. | |
590 .LP | |
591 The project was supported by the MRC and grants from the U.K. Defense | |
592 Research Agency, Farnborough (Research Contract 2239); the EEC Esprit | |
593 BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust. | |
594 |
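
The dB_wave entry in the INPUT OPTIONS section gives the scaling equation dB_wave = dBSPL - 20log(RMS/200) with one worked value. The following Python sketch reproduces that arithmetic so other values can be checked. It assumes the logarithm is base 10, as is usual for decibels; the function names are illustrative and are not part of AIM.

```python
import math


def rms_amplitude(samples):
    """Root-mean-square amplitude of a sequence of sample values."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def db_wave_for_spl(target_db_spl, rms, reference_rms=200.0):
    """dB_wave = dBSPL - 20*log10(RMS/200), as in the man page equation.

    Assumption: 'log' in the man page means log base 10.
    """
    if rms <= 0:
        raise ValueError("rms must be positive")
    return target_db_spl - 20.0 * math.log10(rms / reference_rms)


# Worked example from the man page: RMS amplitude 467.3, target 60 dB SPL.
print(round(db_wave_for_spl(60.0, 467.3), 1))  # 52.6
```

The result would then be passed on the command line as dB_wave=52.6. A wave whose RMS amplitude is exactly 200 needs no correction, so dB_wave equals the desired dB SPL.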
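
The downchannel entry in the OUTPUT OPTIONS section says that each averaging pass combines pairs of adjacent channels and halves the channel count, so channels_afb=128 with downchannel=3 yields 16 channels. The Python sketch below illustrates that bookkeeping. It assumes non-overlapping pairs (channels 0 and 1, 2 and 3, and so on) and an even channel count at every pass; how AIM itself pairs channels or treats an odd count is not stated in the source, and the function names are hypothetical.

```python
def average_adjacent_channels(channels):
    """One averaging pass: replace each non-overlapping pair of adjacent
    channels with their sample-by-sample mean (assumed pairing scheme)."""
    if len(channels) % 2 != 0:
        raise ValueError("this sketch requires an even number of channels")
    return [
        [(a + b) / 2.0 for a, b in zip(lower, upper)]
        for lower, upper in zip(channels[0::2], channels[1::2])
    ]


def downchannel(channels, passes):
    """Apply the averaging pass 'passes' times (the downchannel value)."""
    for _ in range(passes):
        channels = average_adjacent_channels(channels)
    return channels


# 128 channels of 4 samples each; channel i holds the constant value i.
filterbank_output = [[float(i)] * 4 for i in range(128)]
reduced = downchannel(filterbank_output, 3)
print(len(reduced))  # 16
```

Under this pairing, three passes make each output channel the mean of eight original channels, so the first reduced channel in the example holds 3.5, the mean of 0 through 7.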
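
The swap_wave and bits_wave entries in the INPUT OPTIONS section describe two properties of the 16-bit input words: byte order and number of significant bits. The Python sketch below shows what swapping the bytes in each pair does to raw data. It also notes that the recommended gain of 0.0625 equals 2 to the power (12 - 16); reading that gain as compensation for the four extra bits is an interpretation, not something the source states. All names here are illustrative.

```python
import struct


def swap_byte_pairs(raw):
    """Swap the two bytes of every 16-bit word in a bytes object."""
    if len(raw) % 2 != 0:
        raise ValueError("expected a whole number of 16-bit words")
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]  # second byte of each pair moves first
    swapped[1::2] = raw[0::2]  # first byte of each pair moves second
    return bytes(swapped)


def gain_for_bits(bits_wave, default_bits=12):
    """Scale factor 2**(default_bits - bits_wave); assumed interpretation
    of why 0.0625 is recommended when bits_wave=16."""
    return 2.0 ** (default_bits - bits_wave)


# A sample written big-endian reads correctly as little-endian after swapping.
big_endian = struct.pack(">h", 1000)
print(struct.unpack("<h", swap_byte_pairs(big_endian))[0])  # 1000
print(gain_for_bits(16))  # 0.0625
```

Swapping twice returns the original bytes, so applying swap_wave to a file that is already in the expected order would corrupt the samples rather than leave them unchanged.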