view man/man1/genwav.1 @ 0:5242703e91d3 tip

Initial checkin for AIM92 aimR8.2 (last updated May 1997).
author tomwalters
date Fri, 20 May 2011 15:19:45 +0100
.TH GENWAV 1 "11 May 1995"
.LP
.SH NAME
.LP
genwav \- display the wave in filename.
.LP
.SH SYNOPSIS
.LP
genwav [ option=value | -option ] [ filename ]
.LP
.SH DESCRIPTION
.LP 

Genwav sets up an Xwindow and displays a segment of the input wave
in the window. The size of the window and the size of the wave are
determined by options, as are a number of other input/output
functions. These options have no direct bearing on the auditory
processing performed by AIM. For convenience, these Non-Auditory
options are associated with the instruction genwav (the one
non-auditory instruction), and they are listed at the top of the
options tables prior to the auditory options.

.LP
There are three classes of Non-Auditory options: 
.LP
I)   DISPLAY OPTIONS that determine the format of the auditory representations 
of sound on the screen, or on paper when printed.
.LP  
II)  OUTPUT OPTIONS that determine the format and content of files used
to store the auditory representations of sounds.
.LP 
III) INPUT OPTIONS that determine how the wave in the input file should
be interpreted.
.LP
The output options are presented before the input options so that the
input options will be adjacent to the filterbank options in the
options tables produced by genbmm and subsequent instructions.

.SS 
I. DISPLAY OPTIONS
.LP

The AIM modules produce output in the form of a set of functions, one
for each channel of the auditory filterbank.  For example, the output
of genbmm is a set of functions that simulate basilar membrane motion
produced in response to the input wave.  By default, the AIM software
puts an Xwindow up on the computer screen and displays the output in
the window. This section describes the options that control these
displays.

.LP
The display options are: title, display, x0_win, y0_win, width_win,
height_win, view, top, bottom, overlap, headroom,
magnification, pensize, hiddenline.
.LP
A. The Display Window Title, Position, and Size
.RS 3

.LP
title	Title of output display.
.RS 5
	Character string. Default: input file name.
.RE
.LP
The title of the output being displayed.  If no title is given, the
display bears the name of the file of the input wave.

.LP
display	Display output on screen
.RS 5
	Switch.  Default: on.
.RE
.LP

Normally this switch is on and a bitmap of the output is displayed in
a graphical window on the computer screen.  The switch is provided
because the time taken to create the displays is considerable, and it
is useful to turn the display off when using AIM as a preprocessor for
speech recognition.

.LP
x0_win	Left edge of window
.RS 5
	Unit: pixels.  Default: centre.
.RE
.LP
The left edge of the window into which the display will be drawn,
relative to the left edge of the screen (i.e. the x-coordinate of the
window within the screen).  A value of centre will cause centring in
the horizontal dimension (provided the window manager does not
override).
.LP
y0_win    Lower edge of window
.RS 5
	Unit: pixels.  Default: centre.
.RE
.LP
The lower edge of the window into which the display will be drawn,
relative to the lower edge of the screen (i.e. the y-coordinate of the
window within the screen).  A value of centre will cause centring in
the vertical dimension (provided the window manager does not
override).
.LP
Taken as a pair, x0_win and y0_win determine the origin of the window
relative to the screen origin, which is assumed to be the lower left
corner of the screen.
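.LP
As an illustration only, the Python sketch below resolves a window origin given in this lower-left convention into the top-left-origin coordinates used by the X Window System. It assumes that centre means placing the window midway across that dimension of the screen; the function place_window and the 1152 x 900 screen size are hypothetical and are not part of AIM.
.nf
```python
# Hypothetical sketch, not AIM source code: convert a window origin
# measured from the LOWER-left corner of the screen into the
# top-left-origin coordinates that X11 uses when placing a window.
def place_window(screen_w, screen_h, width_win=640, height_win=480,
                 x0_win="centre", y0_win="centre"):
    # 'centre' puts the window midway along that dimension
    x0 = (screen_w - width_win) // 2 if x0_win == "centre" else x0_win
    y0 = (screen_h - height_win) // 2 if y0_win == "centre" else y0_win
    # x is measured from the left edge in both conventions
    x_top_left = x0
    # "lower edge of window above lower edge of screen" becomes
    # "upper edge of window below upper edge of screen"
    y_top_left = screen_h - y0 - height_win
    return x_top_left, y_top_left


print(place_window(1152, 900))                      # -> (256, 210)
print(place_window(1152, 900, x0_win=0, y0_win=0))  # -> (0, 420)
```
.fi
.LP
A window placed at x0_win=0, y0_win=0 sits in the lower left corner of the screen, which X11 describes as 420 pixels down from the top for a 480-pixel window on a 900-pixel screen.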
.LP
width_win Window width
.RS 5
	Unit: pixels.  Default: 640.
.RE
.LP
The width of the window into which the display will be drawn.
.LP
height_win Window height
.RS 6
	Unit: pixels.  Default: 480.
.RE
.LP
The height of the window into which the display will be drawn.
.RE


.LP
B. Display Controls
.RS 3
.LP
top	The largest positive value visible in the display
.RS 5
	Scalar. Default value: 1024 (for genwav) 
.RE
.LP
Each of the functions in the multi-channel output of a module is
displayed in a transparent window. Provided the channel density is not
too low, the functions are related and the set of functions produces a
display that looks like a complex landscape. Top determines the
largest positive value that will appear in the transparent windows of
the individual functions, so top must be as large as the largest value
in the full set of functions. Increasing top has the effect of moving
the viewer farther up above the landscape.
.LP
bottom	The largest negative value visible in the
.RS 5 
	display 
.RE
.RS 5
	Scalar. Default value: -1024 (for genwav) 
.RE
.LP
Bottom determines the largest negative value that will appear in the
transparent windows of the individual functions, so bottom must be as
large in the negative direction as the largest negative value in the
full set of functions. Increasing bottom in the negative direction has
the effect of deepening the valleys in the landscape.
.LP
overlap   The overlap of transparent windows of the 
.RS 5
	individual functions
.RE
.RS 5
	Scalar: percentage. Default value: 50%
.RE
.LP 
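Because the output functions of adjacent channels are related, their
transparent windows can overlap so that each function fits up under the
one above it; this concentrates the lines on the landscape and improves
the display.  The value of overlap is the percentage by which the
transparent windows of adjacent channels overlap.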
.LP
headroom  Display with headroom for the uppermost channel 
.RS 5
	Scalar: percentage. Default value: 0%
.RE
.LP
Because of the overlap of the transparent windows, part of the
uppermost transparent window is hidden by the upper edge of the
display window. This can cause truncation of the waves in the upper
channels.  To avoid truncation, headroom enables the user to position
the highest channel below the upper edge of the window.  The value
specified is the percentage of the window height between the zero line
of the uppermost channel and the upper edge of the window.
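.LP
The interaction of overlap and headroom can be pictured with the Python sketch below. It rests on a hypothetical layout model, not on the AIM drawing code: the zero line of the lowest channel sits on the bottom edge of the window, the zero line of the uppermost channel sits headroom percent of the window height below the top edge, the channels in between are evenly spaced, and overlap is the percentage of each transparent window shared with its neighbour. The name channel_layout is also hypothetical.
.nf
```python
# Hypothetical layout model for illustration; not AIM's drawing code.
def channel_layout(n_channels, height_win=480, overlap=50.0, headroom=0.0):
    if n_channels < 2:
        raise ValueError("this sketch needs at least two channels")
    if not 0.0 <= overlap < 100.0:
        raise ValueError("this sketch needs 0 <= overlap < 100")
    # zero line of the uppermost channel, in pixels above the bottom edge
    top_zero = height_win * (1.0 - headroom / 100.0)
    # even spacing between the zero lines of adjacent channels
    spacing = top_zero / (n_channels - 1)
    # adjacent transparent windows share 'overlap' percent of their height,
    # so each window is taller than the spacing between zero lines
    window_h = spacing / (1.0 - overlap / 100.0)
    zero_lines = [i * spacing for i in range(n_channels)]
    return zero_lines, window_h


zeros, h = channel_layout(5, height_win=400, overlap=50.0, headroom=20.0)
print(zeros)  # -> [0.0, 80.0, 160.0, 240.0, 320.0]
print(h)      # -> 160.0
```
.fi
.LP
Under this model, headroom=0 puts the zero line of the uppermost channel on the top edge of the window, so the upper part of its transparent window is cut off; raising headroom moves it down into view.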
.LP
magnification Display magnification
.RS 9
	Scalar.  Default: 1.0.
.RE
.LP
The degree to which the amplitude of the functions in the display
should be magnified before being displayed.  This parameter is merely
for adjusting the visual contrast of the display.  The magnification
option is a multiplier, so a value of 1 implies drawing to scale,
while a value of 10 implies ten times (10x) the size of values in the
module output and 0.1 implies one tenth of the output size.
Magnification is related to, but separate from, the gain options which
affect the values of the output functions and the values stored in any
output files. Magnification thus provides an alternative to top and
bottom as a means of controlling the size of the functions in the display.
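.LP
One way to see why magnification is an alternative to top and bottom is the Python sketch below. It assumes, purely for illustration, that a value is multiplied by magnification and then mapped linearly onto the height of its transparent window, with bottom at the lower edge and top at the upper edge, and that values beyond top or bottom are clipped; value_to_pixel is a hypothetical name.
.nf
```python
# Hypothetical sketch: place one channel value inside its transparent
# window.  The linear mapping and the clipping are assumptions.
def value_to_pixel(value, window_h, top=1024.0, bottom=-1024.0,
                   magnification=1.0):
    scaled = value * magnification        # visual scaling only
    # fraction of the way from bottom (0.0) to top (1.0)
    frac = (scaled - bottom) / (top - bottom)
    frac = min(max(frac, 0.0), 1.0)       # values beyond top/bottom are cut off
    return frac * window_h


print(value_to_pixel(0, 100))                       # -> 50.0 (zero line)
print(value_to_pixel(512, 100))                     # -> 75.0
print(value_to_pixel(512, 100, magnification=4.0))  # -> 100.0 (clipped)
```
.fi
.LP
Under this model, doubling top and doubling the magnitude of bottom has the same visual effect as setting magnification to 0.5.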
.LP
pensize	The size of the lines in the displays and the 
.RS 5
	dots on the spiral
.RE
.RS 5
	Unit: pixels.  Default: 1. 
.RE
.LP
This option allows the user to specify the thickness of the lines in
the display and the size of the dots on spiral auditory images.  It
also affects the lines and dots in postscript plots.  It is provided
primarily for use with printers which have much more resolution than
computer screens.  On laser printers a value of 3-5 gives reasonable
line thickness.  On the screen, a pensize greater than 1 produces
slow drawing and a jagged, blurred display.
.LP
hiddenline  Draw with overlapping parts of functions 
.RS 5 
	  hidden 
.RE
.RS 5
	  Switch.  Default: on.
.RE
.LP
This switch specifies whether or not a 'hidden line' algorithm should
be used when drawing the display.  It also affects printed displays.
In almost all cases, hiddenline results in more attractive displays of
waveforms, and it often makes complex displays easier to understand,
so the default is 'on'.  Note: hiddenline almost doubles the drawing
time so it is sometimes useful to switch it off on slower machines.
.LP

.SS 
II. OUTPUT OPTIONS
.RS 3
.LP
The output options are listed and described before the input options
so that the input options will be adjacent to the filterbank options
in the listings produced by genbmm and subsequent modules.  The output
options are downchannel, erase_ctn, animate_ctn, bitmap_ctn,
postscript, output, and header.
.LP
downchannel Average adjacent channels of multichannel 
.RS 7
	representations 
.RE
.RS 7
	Units: Number of averagings. 
.RE
.RS 7
     Default value: 0.
.RE
.LP

There is interaction between channels in the transmission-line
filterbank of the physiological version of AIM, and in the neural
encoding of the functional version of AIM.  The minimum channel
density for these processes to operate properly is four channels per
ERB and two channels per ERB, respectively.  For broadband signals like
speech this means that the minimum number of channels is on the order
of 128 and 64, respectively.  This channel density can produce
cluttered displays, and more importantly, it is far too many channels
for current speech recognition systems which typically use 12-24
channels.  This is not just a computer power problem; the recognition
systems actually perform less well with extra channels.  Accordingly,
the option 'downchannel' provides the option of reducing the channel
density at output, so that AIM can operate with the appropriate
channel density and still provide output that is compatible with
displays and speech recognition systems.

.LP
Downchannel averages pairs of adjacent channels and the option value
specifies how many times it should execute the averaging process. Each
averaging reduces the number of channels by a factor of 2, so for
proper transmission-line filtering and an output file with 16
channels, set channels_afb=128 and downchannel=3 (three successive
halvings of the number of channels).
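.LP
The halving arithmetic can be checked with the short Python sketch below. It assumes that each pass replaces channels 0 and 1, 2 and 3, and so on by their sample-by-sample mean, and it simply rejects an odd channel count; how AIM itself treats an odd count is not described here. The function name downchannel mirrors the option but is otherwise hypothetical.
.nf
```python
# Hypothetical sketch of repeated pairwise channel averaging.
# 'channels' is a list of channels, each a list of samples.
def downchannel(channels, times):
    for _ in range(times):
        if len(channels) % 2:
            # assumption: this sketch only handles even channel counts
            raise ValueError("channel count must be even at every halving")
        channels = [
            [(a + b) / 2.0 for a, b in zip(channels[i], channels[i + 1])]
            for i in range(0, len(channels), 2)
        ]
    return channels


# channels_afb=128 with downchannel=3 leaves 128 / 2**3 = 16 channels
bmm = [[float(c)] * 4 for c in range(128)]   # 128 channels, 4 samples each
out = downchannel(bmm, 3)
print(len(out))   # -> 16
print(out[0])     # mean of channels 0..7 -> [3.5, 3.5, 3.5, 3.5]
```
.fi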


.LP
A. Animated Cartoons
.LP
.RS 3
Four of the AIM instructions produce output in the form of sequences
of spectral frames (gensgm, gencgm, genasa and genepn).  Bitmap
versions of the displays of the frames can be stored by AIM and
replayed by review and xreview.  When the sequence of frames is played
rapidly, it appears as an animated cartoon that shows the dynamic
behaviour of the spectrum of the sound.
.LP
Similarly, the AIM instructions for auditory images (gensai and
genspl) produce sequences of landscape frames, and bitmap versions of
the landscape displays can also be stored by AIM and replayed by
review and xreview.  Indeed, it was the desire to produce auditory
image cartoons that led to the development of much of the AIM software
package. The animated cartoons of auditory images show the dynamic
behaviour of features in the images, like the motion of formants in
diphthongs and the motion of notes in a melody.
.LP
This section describes the options that control the construction and
storage of sequences of bitmaps; there is a separate manual entry for
the xreview routine that replays the bitmaps ('manaim xreview').


.LP
erase_ctn   Erase the current frame before presenting  
.RS 7 
	the next frame
.RE
.RS 7
	Switch. Default value: on.
.RE
.LP

Normally, when presenting a sequence of frames as an animated cartoon,
one wants to erase the current frame before presenting the next. When
the frames are spectra, however, the set of frames can together form a
meaningful display; for example, the set of rising spectra produced at
the onset of a sound produces a contour map of the onset. Setting
erase_ctn=off enables the user to observe the full set of spectra
simultaneously (see aimdemo_gtf_spectra or aimdemo_tlf_spectra).

.LP
animate_ctn Store frames in memory and replay all of 
.RS 7
	them as a cartoon
.RE
.RS 7
	Switch. Default value: off.
.RE
.LP
When this option is on, AIM stores the bitmaps of the frames it
produces in the memory of the machine and replays them rapidly when
the instruction is complete. Type RETURN to animate the cartoon again;
type 'q RETURN' to exit the instruction.  (This option was important
when machines were slower and before the availability of review and
xreview. It is now largely obsolete.)
.LP
bitmap_ctn  Store bitmaps of frames in a file for  
.RS 7 
	replay as a cartoon 
.RE
.RS 7
	Switch. Default value: off.
.RE
.LP
When this option is on, bitmaps of the frames produced for the input
in file_name will be stored in file_name.ctn.  The sequence of frames can later be replayed using either 
.LP
> review file_name  or
.LP
> xreview file_name	
.LP
Both of these programs enable the user to vary the rate of animation,
the section of the sequence to be viewed, etc. The xreview version has a
window interface with useful information and is the preferred version
in most cases.
.RE

.RS 3
B. Output Files for Printing and Postprocessing

.LP
postscript  Produce printer-ready output
.RS 7
	Switch.  Default value:  off.
.RE
.LP
This switch causes AIM to produce a printer-ready version of the
displays it presents on the computer screen.  For example, the NAP of
a 32-ms section of cegc can be printed using
.LP
> gennap length=32 postscript=on cegc | lpr -Plw
.LP
where 'lpr' is the Unix printer-driver and the 'lw' of -Plw specifies
the destination printer.  You may need to check the name of your
system's printer driver and laser printer.
.LP
Alternatively, the postscript version of the display may be directed to a
file using an instruction like
.LP
> gennap length=32 postscript=on cegc > cegc_nap.ps
.LP
and printed later at the user's convenience. In this example, the file
name cegc_nap.ps is not generated by AIM; the '_nap.ps' suffix is
added by the user following standard conventions to indicate that the file
contains a NAP in postscript form.

.RS 3
.LP
THREE POSTSCRIPT CAUTIONS: 
.LP
Postscript files of landscape displays from AIM are very large. As a
result, we recommend
.LP
a) that you NOT switch postscript on without redirecting the output to
a file, as it will cause the postscript to be displayed on the screen
as a seemingly endless stream of text,
.LP
b) that you be careful NOT to print postscript files on a printer
which does not understand the Postscript language, as it can cause the
printer to put out an extremely long file, one column per page!
.LP
c) that you NOT set postscript=on in an options file as it will
generate large files in the directory without your noticing.
.RE

.LP
output  Generate an output file
.RS 3
	Switch.  Default value:  off.
.RE
.LP
This switch causes the array of functions that defines AIM's
simulation of basilar membrane motion, or a neural activity pattern,
or an auditory image, to be stored in a file for subsequent processing
by the aimtools or other, user defined, operators. By convention, the
file is given the same name as the input file, but with a suffix
reflecting the entry point, to distinguish it from the input file on
the one hand and from other output files on the other hand. The naming
system enables the user to construct and store a set of output files
for one input file without the need to specify a sequence of file
names.  The suffixes are those used to identify the modules in the
listing produced by 'gen -help'.  So, for example, the following
command line:
.LP
> gennap output=on length=32 cegc
.LP
will produce an output file named cegc.nap containing a multiplexed
version of the functions that define the NAP of the first 32 ms of
cegc.  
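.LP
For user-defined postprocessing, the Python sketch below shows one way to split a multiplexed sequence back into per-channel functions. It assumes that the values are interleaved one time sample at a time (every channel for the first sample, then every channel for the next) and it ignores the header and the on-disk number format, both of which are specified in docs/aimFileFormat; the function demultiplex is hypothetical.
.nf
```python
# Hypothetical sketch: de-interleave a multiplexed sequence into one
# list per channel.  Assumes channel-interleaved order; no header or
# binary decoding is handled here.
def demultiplex(values, n_channels):
    if len(values) % n_channels:
        raise ValueError("length is not a whole number of time samples")
    # channel c holds every n_channels-th value, starting at offset c
    return [values[c::n_channels] for c in range(n_channels)]


frames = [10, 20, 30, 11, 21, 31]   # 3 channels, 2 time samples
print(demultiplex(frames, 3))       # -> [[10, 11], [20, 21], [30, 31]]
```
.fi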
.LP
The spectrographic representations produced by gensgm and gencgm can
be stored in the same way, as can the sequences of spectra produced by
genasa and genepn. It is the output files of genasa and gencgm that
are used to interface AIM with speech recognition systems (Robinson et
al., 1990; Patterson et al., 1995; Giguere and Woodland, 1994a).
Details of the file formats are presented in docs/aimFileFormat.
.LP
header  Put a header on the output file
.RS 3
	Flag.  Default value:  on.
.RE
.LP
By default, a header is prepended to each output file so that
subsequent processors have access to the history of the file.  Details
of the header structure are presented in docs/aimFileFormat.
.LP
.RE 

.SS 
III. INPUT OPTIONS
.LP
The input options enable the user to process a subsection of the input
wave, and to specify characteristics of the wave.
.LP
The input options are: input_wave, start_wave, length_wave,
samplerate, swap_wave, bits_wave, dB_wave.
.LP
input_wave   Default input wave name
.RS 13
Filename.  Default value: none.
.RE
.LP
The name of the wave file to process.  This option permits simple
repetitive processing of the same input file without repetitive typing.  It
also enables one to circumvent the Unix convention of having the filename
last on the command line.  This option is overridden if the user supplies a
wave file name at the end of the command line.
.LP
start_wave   Start point in wave
.RS 13
Default unit: ms.  Default value: 0.
.RE
.LP
The point in the input wave at which processing should begin.  The
start_wave option is expressed in milliseconds and its default value is the
beginning of the file (i.e. 0 ms into the file).
.LP
length_wave  Length of wave
.RS 13
Default unit: ms.  Default value: remainder.
.RE
.LP
The number of milliseconds of the wave that ought to be processed,
beyond the start point.  The special value 'remainder' indicates that
the entire length of the wave from the start point to the end of the
file should be processed.
.LP
samplerate   Input wave sample rate
.RS 13
Default unit:  Hertz.  Default value:  20,000 Hz.
.RE
.LP
The rate at which the input wave was sampled.
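.LP
Because start_wave and length_wave are given in milliseconds, the number of samples they cover depends on samplerate. The Python sketch below shows the conversion, assuming rounding to the nearest sample; the function wave_segment and its (start, stop) sample range are hypothetical.
.nf
```python
# Hypothetical sketch: convert start_wave and length_wave (ms) into a
# half-open range of sample indices [start, stop) at samplerate (Hz).
def wave_segment(n_samples, samplerate=20000.0, start_wave=0.0,
                 length_wave="remainder"):
    start = min(n_samples, int(round(start_wave * samplerate / 1000.0)))
    if length_wave == "remainder":
        # process everything from the start point to the end of the file
        return start, n_samples
    n = int(round(length_wave * samplerate / 1000.0))
    return start, min(n_samples, start + n)


print(wave_segment(100000, length_wave=32))                  # -> (0, 640)
print(wave_segment(100000, start_wave=100, length_wave=32))  # -> (2000, 2640)
print(wave_segment(100000, start_wave=100))                  # -> (2000, 100000)
```
.fi
.LP
At the default rate of 20,000 Hz, each millisecond is 20 samples, so length_wave=32 covers 640 samples.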
.LP
swap_wave    Swap the bytes in each binary pair of the 
.RS 13 
input file
.RE
.RS 13
Switch. Default: off.
.RE
.LP
The order of the bytes in short integers varies between manufacturers.
Specifically, the order for Sun and HP is the opposite of that for DEC,
SGI and IBM.  The default setting (off) is for the latter byte order.
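.LP
Conceptually, swap_wave=on exchanges the two bytes of every 16-bit sample before the wave is interpreted. The Python sketch below does the same with the standard array module; it assumes 2-byte signed samples, and swap_bytes is a hypothetical name.
.nf
```python
import array

# Hypothetical sketch: reverse the byte order of every 16-bit sample.
def swap_bytes(raw):
    samples = array.array("h")   # signed short, 2 bytes on common platforms
    samples.frombytes(raw)
    samples.byteswap()           # exchange the two bytes of each sample
    return samples.tobytes()


raw = bytes([0x01, 0x02, 0xFF, 0x7F])
print(swap_bytes(raw).hex())     # -> 02017fff
```
.fi
.LP
Swapping twice restores the original bytes, so the switch only needs to be on when the file's byte order differs from the one the software expects.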
.LP
bits_wave    Bits in the input wave
.RS 13
Unit:  bits.  Default:  12. (Only alternate: 16.)
.RE
.LP
The number of significant bits in each (16-bit) word of the input
wave.  Note that gain_gtf or gain_tlf should be changed to 0.0625 when
the number of bits is set to 16 to avoid overflow.
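.LP
The value 0.0625 follows from the difference in word length: assuming the gain options were set for the default 12-bit input, a 16-bit wave can be up to 2^(16 - 12) = 16 times larger, and 1/16 = 0.0625. The Python sketch below just performs that arithmetic; gain_for_bits is a hypothetical helper, not an AIM option.
.nf
```python
# Hypothetical helper: the factor that brings a wave with bits_wave
# significant bits back to the range of a reference_bits wave.
def gain_for_bits(bits_wave, reference_bits=12):
    return 2.0 ** (reference_bits - bits_wave)


print(gain_for_bits(16))          # -> 0.0625
print(32767 * gain_for_bits(16))  # largest 16-bit sample -> 2047.9375
print(gain_for_bits(12))          # -> 1.0
```
.fi
.LP
The rescaled maximum of about 2048 matches the largest magnitude a signed 12-bit sample can take.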
.LP
dB_wave      Scaling of the input wave 
.RS 13
(for physiological route only)
.RE
.RS 13
Units: dB. Default: 60 dB
.RE
.LP
This option enables the user to specify the relative level of
the input wave in decibels. It is particularly useful for 
investigating the level-dependent properties of the 
physiological version of AIM.
.LP
The functional route is level-independent and dB_wave is 
ignored no matter what its value.
.LP
dB_wave can also be used to scale the input wave in absolute 
units, i.e. sound-pressure level (dB SPL), using the following 
equation:
.LP
dB_wave = dBSPL \- 20 log10(RMS/200)
.LP
where RMS is the root-mean-square amplitude of the input wave, 
or of the portion of the wave of interest, and dBSPL is the 
desired sound-pressure level (in dB). For 
example, to scale to 60 dB SPL a wave with an RMS amplitude
of 467.3, dB_wave should be set to 52.6.
.LP 
Note: The RMS value of a stored input wave can be calculated using 
the tools provided with the AIM software. 
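.LP
The dB_wave formula and its worked example can be reproduced with the Python sketch below, which also computes the RMS of a list of samples. The function names rms and db_wave_for_spl are hypothetical, and the logarithm is taken to base 10, which is what the example value of 52.6 requires.
.nf
```python
import math

# Hypothetical helpers for the dB_wave scaling formula.
def rms(samples):
    # root-mean-square amplitude of the samples
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def db_wave_for_spl(db_spl, rms_value):
    # dB_wave = dBSPL - 20 log10(RMS / 200)
    return db_spl - 20.0 * math.log10(rms_value / 200.0)


print(round(db_wave_for_spl(60.0, 467.3), 1))  # -> 52.6
print(rms([300, -300, 300, -300]))             # -> 300.0
```
.fi
.LP
A wave whose RMS amplitude is exactly 200 needs no correction: dB_wave then equals the desired level in dB SPL.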


.LP
.RE

.SH FILES 
.LP
 .genwavrc  The options file for genwav.
.SH SEE ALSO 
.LP
genbmm
.SH BUGS 
.LP
.SH COPYRIGHT
.LP
Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
.LP
Permission to use, copy, modify, and distribute this software without fee 
is hereby granted for research purposes, provided that this copyright 
notice appears in all copies and in all supporting documentation, and that 
the software is not redistributed for any fee (except for a nominal 
shipping charge). Anyone wanting to incorporate all or part of this 
software in a commercial product must obtain a license from the Medical 
Research Council.
.LP
The MRC makes no representations about the suitability of this 
software for any purpose.  It is provided "as is" without express or 
implied warranty.
.LP
THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING 
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL 
THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES 
OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, 
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, 
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS 
SOFTWARE.
.LP
.SH ACKNOWLEDGEMENTS
.LP
The AIM software was developed for Unix workstations by John
Holdsworth and Mike Allerhand of the MRC APU, under the direction of
Roy Patterson. The physiological version of AIM was developed by
Christian Giguere. The options handler is by Paul Manson. The revised
SAI module is by Jay Datta. Michael Akeroyd extended the postscript
facilities and developed the xreview routine for auditory image
cartoons.
.LP
The project was supported by the MRC and grants from the U.K. Defence
Research Agency, Farnborough (Research Contract 2239); the EEC Esprit
BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.