view man/man1/gensgm.1 @ 0:5242703e91d3 tip
Initial checkin for AIM92 aimR8.2 (last updated May 1997).
author | tomwalters |
---|---|
date | Fri, 20 May 2011 15:19:45 +0100 |
.TH GENSGM 1 "11 May 1995"
.LP
.SH NAME
.LP
gensgm \- generate auditory spectrogram
.LP
.SH SYNOPSIS
.LP
gensgm [ option=value | -option ] [ filename ]
.LP
.SH DESCRIPTION
.LP
The gensgm module of the AIM software performs a time-domain spectral
analysis using a bank of auditory filters, and summarises the information
in an auditory spectrogram, that is, a spectrogram with auditory frequency
and temporal resolution, rather than the fixed frequency and temporal
resolution of traditional speech preprocessors. The spectral analysis
converts the input wave into an array of filtered waves, one for each
channel of a gammatone auditory filterbank. The surface of the array of
filtered waves is AIM's representation of basilar membrane motion (BMM)
as a function of time. The auditory spectrogram is a plot of a sequence
of spectral slices extracted from the envelope of the BMM every
\&'frstep_epn' ms. The envelope is calculated continuously, by rectifying,
compressing, and lowpass filtering the individual BMM waves as they flow
from the filterbank.
.LP
The frequency resolution of the analysis varies with the centre frequency
of the channel as in the auditory system, and the distribution of channels
across frequency is chosen to match that in the auditory system (Patterson
and Moore, 1986). Thus, the auditory spectrogram is a greyscale plot of
the activity in each channel (shades of black) as a function of time (the
abscissa) and the centre frequency of the auditory filter (the ordinate)
in ERBs. The representation is referred to as an auditory spectrogram
(SGM) to distinguish it from more traditional spectrograms based on
Fourier, LPC or cepstral analysis. In AIM, the suffix 'sgm' is used to
distinguish this spectral representation from the other spectral
representations provided by the software ('asa' auditory spectral
analysis, 'cgm' cochleogram, and 'epn' excitation pattern).
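.LP
The envelope extraction and frame sampling described above can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions, not the AIM implementation: the function name envelope_frames,
the log(1 + x) compressor and the one-pole form of each lowpass stage are
hypothetical choices made for the example. The default arguments (2 stages,
8 ms time constant, 10 ms frame step) are the values documented for
stages_idt, tup_idt and frstep_epn.
.nf
```python
import math


def envelope_frames(bmm, sample_rate_hz, stages=2, tau_ms=8.0, frame_step_ms=10.0):
    """Sketch: rectify, compress and lowpass one BMM channel, then sample frames."""
    # Coefficient of a one-pole lowpass with time constant tau_ms (assumed form).
    alpha = 1.0 - math.exp(-1000.0 / (sample_rate_hz * tau_ms))
    # Number of input samples between successive spectral frames.
    step = max(1, round(sample_rate_hz * frame_step_ms / 1000.0))
    state = [0.0] * stages
    frames = []
    for n, x in enumerate(bmm):
        # Screen out negative values (half-wave rectification), then compress.
        # log(1 + x) is an assumed stand-in for the AIM logarithmic compressor.
        y = math.log1p(x) if x > 0.0 else 0.0
        # Cascade of identical lowpass stages (leaky integration).
        for k in range(stages):
            state[k] += alpha * (y - state[k])
            y = state[k]
        # Keep one envelope value every `step` samples.
        if (n + 1) % step == 0:
            frames.append(y)
    return frames
```
.fi
.LP
Applying such a function to every filterbank channel and stacking the
results would give one spectral slice per frame, which at a 10 ms frame
step is 100 slices per second.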
.LP
The spectral analysis performed by gensgm is the same as that performed by
genbmm (see manaim genbmm). The primary differences are in the display
defaults and the inclusion of the Compression and Leaky Integration
modules used to produce the spectral slices that form the spectrogram. As
a result, this manual entry is restricted to describing the option values
that differ from those in genbmm and the additional options required to
control the Compression and Leaky Integration.
.LP
.SH DISPLAY DEFAULTS
.LP
The default values for three of the display options are reset to produce a
spectrographic format rather than a landscape. Specifically,
display=greyscale, bottom=0 and top=2500. The number of channels is set to
128 for compatibility with the auditory spectrum modules, genasa and
genepn. When using AIM as a preprocessor for speech recognition, the
number of channels would typically be reduced to between 24 and 32. Use
option 'downsample' if it is necessary to reduce the output to fewer than
24 channels across the speech range.
.LP
.SH COMPRESSION AND LEAKY INTEGRATION
.LP
Compression and lowpass filtering are activated, and the neural encoding
stage that comes between them is turned off:
.LP
.SS "Compression"
.PP
Auditory spectra are usually produced via the functional route in AIM. In
this case, compress is set on.
.LP
.TP 13
compress
Logarithmic compressor switch
.RS
Switch. Default: on.
.RE
.RS
.LP
Note: The compressor in the functional route of AIM is logarithmic and it
screens out negative BMM values before compression. This rectifies the
wave during the compression process and so the separate rectify option is
left off.
.RE
.LP
.RS
.LP
Note: The compressor in the physiological route of AIM is an integral part
of the tlf module, so when using this route to produce auditory spectra,
turn off the logarithmic compressor (i.e. compress=off). The compressor in
tlf does not screen out negative values so it is also important to set
rectify=on.
.RE
.RS
.LP
Full wave rectification is produced if rectify is set to 2. This option
value leads to smoother spectrograms. It is also useful when calculating
envelopes with genasa.
.RE
.LP
.SS "Transduction"
.PP
.LP
.TP 13
transduction
Neural transduction switch (at, meddis, off)
.RS
Switch. Default: off.
.RE
.LP
.SS "Leaky Integration"
.PP
.LP
.TP 13
stages_idt
Number of stages of lowpass filtering
.RS
Default unit: scalar. Default value: 2
.RE
.TP 13
tup_idt
The time constant for each filter stage
.RS
Default unit: ms. Default value: 8 ms.
.RE
.LP
The Equivalent Rectangular Duration (ERD) of a two-stage lowpass filter is
about 1.6 times the time constant of each stage, or 12.8 ms in the current
case.
.TP 13
frstep_epn
The time between successive spectral frames
.RS
Default unit: ms. Default value: 10 ms.
.RE
.LP
With a frstep_epn of 10 ms, gensgm will produce spectral frames at a rate
of 100 per second.
.LP
.TP 13
downsample
The time between successive spectral frames.
.RS
Default unit: ms. Default value: 10 ms.
.RE
.LP
Downsample is simply another name for frstep_epn, provided to facilitate a
different mode of thinking about time-series data.
.LP
.SH FILES
.LP
.TP 13
\&.gensgmrc
The options file for gensgm.
.LP
.SH SEE ALSO
.LP
genasa, genbmm, genepn, gencgm
.LP
.SH BUGS
.LP
None currently known.
.SH COPYRIGHT
.LP
Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
.LP
Permission to use, copy, modify, and distribute this software without fee
is hereby granted for research purposes, provided that this copyright
notice appears in all copies and in all supporting documentation, and that
the software is not redistributed for any fee (except for a nominal
shipping charge). Anyone wanting to incorporate all or part of this
software in a commercial product must obtain a license from the Medical
Research Council.
.LP
The MRC makes no representations about the suitability of this software
for any purpose.
It is provided "as is" without express or implied warranty.
.LP
THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL
THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
SOFTWARE.
.LP
.SH ACKNOWLEDGEMENTS
.LP
The AIM software was developed for Unix workstations by John Holdsworth
and Mike Allerhand of the MRC APU, under the direction of Roy Patterson.
The physiological version of AIM was developed by Christian Giguere. The
options handler is by Paul Manson. The revised SAI module is by Jay Datta.
Michael Akeroyd extended the PostScript facilities and developed the
xreview routine for auditory image cartoons.
.LP
The project was supported by the MRC and grants from the U.K. Defence
Research Agency, Farnborough (Research Contract 2239); the EEC Esprit BR
Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.