.TH GENSGM 1 "11 May 1995"
.LP
.SH NAME
.LP
gensgm \- generate auditory spectrogram
.LP
.SH SYNOPSIS
.LP
gensgm [ option=value | -option ] [ filename ]
.LP
.SH DESCRIPTION
.LP
The gensgm module of the AIM software performs a time-domain spectral
analysis using a bank of auditory filters and summarises the
information in an auditory spectrogram, that is, a spectrogram with
auditory frequency and temporal resolution rather than the fixed
frequency and temporal resolution of traditional speech
preprocessors. The spectral analysis converts the input wave into an
array of filtered waves, one for each channel of a gammatone auditory
filterbank. The surface of the array of filtered waves is AIM's
representation of basilar membrane motion (BMM) as a function of
time. The auditory spectrogram is a plot of a sequence of spectral
slices extracted from the envelope of the BMM every 'frstep_epn'
ms. The envelope is calculated continuously by rectifying,
compressing, and lowpass filtering the individual BMM waves as they
flow from the filterbank.
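.LP
The per-channel envelope stage can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not
AIM's implementation. The function name sgm_envelope_frames is
hypothetical. The compressor is assumed to be log(1 + x) applied after
negative values are discarded, and each lowpass stage is assumed to be
a first-order leaky integrator with time constant tup_idt. One envelope
value is kept at the end of every frstep_epn milliseconds.
.nf
```python
import math


def sgm_envelope_frames(bmm, fs_hz, tup_ms=8.0, stages=2, frstep_ms=10.0):
    """Return one envelope value per frame for a single BMM channel.

    Hypothetical helper for illustration only; not part of AIM.
    """
    # ASSUMPTION: each stage is y[n] = a*y[n-1] + (1-a)*x[n], a = exp(-T/tau).
    a = math.exp(-1000.0 / (fs_hz * tup_ms))
    state = [0.0] * stages                            # one memory per stage
    step = max(1, round(fs_hz * frstep_ms / 1000.0))  # samples per frame
    frames = []
    for n, x in enumerate(bmm):
        # ASSUMPTION: log(1 + x) compressor that screens out negative values,
        # so rectification happens as part of compression.
        y = math.log1p(max(x, 0.0))
        for k in range(stages):                       # leaky integration
            state[k] = a * state[k] + (1.0 - a) * y
            y = state[k]
        if (n + 1) % step == 0:                       # end of a frame
            frames.append(y)
    return frames
```
.fi
.LP
Calling such a function once per channel and collecting the results
frame by frame would give a channels-by-frames array of the kind that
is displayed as the greyscale spectrogram.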
.LP
The frequency resolution of the analysis varies with the centre
frequency of the channel, as in the auditory system, and the
distribution of channels across frequency is chosen to match that in
the auditory system (Patterson and Moore, 1986). Thus, the auditory
spectrogram is a greyscale plot of the activity in each channel
(shades of black) as a function of time (the abscissa) and the centre
frequency of the auditory filter (the ordinate, in ERBs). The
representation is referred to as an auditory spectrogram (SGM) to
distinguish it from more traditional spectrograms based on Fourier,
LPC or cepstral analysis. In AIM, the suffix 'sgm' is used to
distinguish this spectral representation from the other spectral
representations provided by the software ('asa' auditory spectral
analysis, 'cgm' cochleogram, and 'epn' excitation pattern).
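.LP
This manual entry does not give the formula used to place the
channels. For illustration only, the sketch below assumes the ERB-rate
scale of Glasberg and Moore (1990), E(f) = 21.4 log10(4.37f/1000 + 1),
and spaces n centre frequencies equally on that scale. The function
names and the 100 to 6000 Hz range in the usage note are hypothetical
and are not AIM defaults.
.nf
```python
import math


def erb_rate(f_hz):
    """Number of ERBs below f_hz (ASSUMED Glasberg and Moore, 1990 form)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(e):
    """Inverse of erb_rate(): frequency in Hz for a given ERB-rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def erb_space(f_lo, f_hi, n):
    """Return n centre frequencies (Hz), equally spaced in ERB-rate
    from f_lo to f_hi inclusive (n must be at least 2)."""
    e_lo, e_hi = erb_rate(f_lo), erb_rate(f_hi)
    return [erb_rate_to_hz(e_lo + (e_hi - e_lo) * i / (n - 1))
            for i in range(n)]
```
.fi
.LP
For example, erb_space(100.0, 6000.0, 128) gives 128 channels, like the
gensgm default, and erb_space(100.0, 6000.0, 32) gives the coarser
spacing typical of a speech recognition front end. Equal spacing in
ERB-rate packs channels densely at low frequencies and sparsely at high
frequencies, following the growth of auditory filter bandwidth with
centre frequency.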
.LP
The spectral analysis performed by gensgm is the same as that
performed by genbmm (manaim genbmm). The primary differences are in
the display defaults and the inclusion of the Compression and Leaky
Integration modules used to produce the spectral slices that form the
spectrogram. As a result, this manual entry is restricted to
describing the option values that differ from those in genbmm and the
additional options required to control the Compression and Leaky
Integration.
.LP
.SH DISPLAY DEFAULTS
.LP
The default values for three of the display options are reset to
produce a spectrographic format rather than a landscape. Specifically,
display=greyscale, bottom=0 and top=2500. The number of channels is
set to 128 for compatibility with the auditory spectrum modules,
genasa and genepn. When using AIM as a preprocessor for speech
recognition, the number of channels would typically be reduced to
between 24 and 32. Use option 'downsample' if it is necessary to
reduce the output to fewer than 24 channels across the speech range.
.LP
.SH COMPRESSION AND LEAKY INTEGRATION
.LP
Compression and lowpass filtering are activated, and the neural
encoding stage that comes between them is turned off:
.LP
.SS "Compression"
.PP
Auditory spectra are usually produced via the functional route in
AIM. In this case, compress is set to on.
.LP
.TP 13
compress
Logarithmic compressor switch
.RS
Switch. Default: on.
.RE
.RS
.LP
Note: The compressor in the functional route of AIM is logarithmic,
and it screens out negative BMM values before compression. This
rectifies the wave during the compression process, so the separate
rectify option is left off.
.RE
.LP
.RS
.LP
Note: The compressor in the physiological route of AIM is an integral
part of the tlf module, so when using this route to produce auditory
spectra, turn off the logarithmic compressor (i.e. compress=off). The
compressor in tlf does not screen out negative values, so it is also
important to set rectify=on.
.RE
.RS
.LP
Full-wave rectification is produced if rectify is set to 2. This
option value leads to smoother spectrograms. It is also useful when
calculating envelopes with genasa.
.RE
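.LP
The difference between the rectifier settings, and the reason the
functional-route compressor needs no separate rectifier, can be
illustrated with a short Python sketch. Mode 1 stands for rectify=on
and is assumed to be half-wave rectification. The function names, the
use of mode 0 for off, and the log(1 + x) form of the compressor are
assumptions made for illustration.
.nf
```python
import math


def rectify(x, mode):
    """Rectifier sketch: 0 = off, 1 = half-wave, 2 = full-wave."""
    if mode == 0:
        return x               # no rectification
    if mode == 1:
        return max(x, 0.0)     # half-wave: negative values set to zero
    if mode == 2:
        return abs(x)          # full-wave: negative half-cycles flipped up
    raise ValueError("mode must be 0, 1 or 2")


def log_compress(x):
    """Log compressor that screens out negative values before compressing,
    so it also half-wave rectifies (ASSUMED form: log(1 + x))."""
    return math.log1p(max(x, 0.0))
```
.fi
.LP
Over one cycle of a sine wave, full-wave rectification keeps both
half-cycles, so the signal reaching the lowpass filter has no silent
gaps to smooth over. This is consistent with the smoother spectrograms
obtained with rectify=2.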
.LP
.SS "Transduction"
.PP
.LP
.TP 13
transduction
Neural transduction switch (at, meddis, off)
.RS
Switch. Default: off.
.RE
.LP
.SS "Leaky Integration"
.PP
.LP
.TP 13
stages_idt
Number of stages of lowpass filtering
.RS
Default unit: scalar. Default value: 2.
.RE
.TP 13
tup_idt
The time constant for each filter stage
.RS
Default unit: ms. Default value: 8 ms.
.RE
.LP
The Equivalent Rectangular Duration (ERD) of a two-stage lowpass
filter is about 1.6 times the time constant of each stage, or
12.8 ms with the default tup_idt of 8 ms.
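.LP
A cascade of stages_idt identical first-order lowpass filters, each with
time constant tup_idt, can be sketched as follows. The discrete update
y[n] = a*y[n-1] + (1 - a)*x[n] with a = exp(-T/tau) is an assumed
discretisation chosen for illustration, and leaky_integrate is a
hypothetical name. AIM's own filter may be implemented differently.
.nf
```python
import math


def leaky_integrate(signal, fs_hz, tup_ms=8.0, stages=2):
    """Apply `stages` first-order leaky integrators in series.

    Each stage uses y[n] = a*y[n-1] + (1 - a)*x[n] with a = exp(-T/tau),
    where T is the sample period and tau = tup_ms (an assumed form).
    The DC gain of every stage is 1, so a steady input passes unchanged.
    """
    a = math.exp(-1000.0 / (fs_hz * tup_ms))  # per-sample decay factor
    out = list(signal)
    for _ in range(stages):
        y = 0.0
        for i, x in enumerate(out):
            y = a * y + (1.0 - a) * x
            out[i] = y
    return out
```
.fi
.LP
With the defaults, the impulse response of this two-stage cascade rises
to a peak close to tup_idt (about 8 ms) after the impulse and then
decays. Its area is 1, so a steady input is passed at unit gain.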
.TP 13
frstep_epn
The time between successive spectral frames
.RS
Default unit: ms. Default value: 10 ms.
.RE
.LP
With a frstep_epn of 10 ms, gensgm will produce spectral frames at a
rate of 100 per second.
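.LP
The frame-rate arithmetic can also be written out as code. The sketch
below converts frstep_epn (or its alias, downsample) into frames per
second and into input samples per frame. The function names and the
20 kHz sample rate in the usage note are illustrative assumptions.
Counting only complete frames is also an assumption about how a
partial frame at the end of the signal is treated.
.nf
```python
def frame_rate_hz(frstep_ms):
    """Spectral frames per second for a frame step in milliseconds."""
    return 1000.0 / frstep_ms


def samples_per_frame(fs_hz, frstep_ms):
    """Input samples between successive spectral frames."""
    return round(fs_hz * frstep_ms / 1000.0)


def n_frames(duration_ms, frstep_ms):
    """Complete frames in a signal (ASSUMPTION: partial frames dropped)."""
    return int(duration_ms // frstep_ms)
```
.fi
.LP
For example, frame_rate_hz(10.0) is 100.0, which matches the 100 frames
per second stated for the default step. At an assumed 20 kHz sample
rate, samples_per_frame(20000.0, 10.0) is 200.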
.LP
.TP 13
downsample
The time between successive spectral frames.
.RS
Default unit: ms. Default value: 10 ms.
.RE
.LP
Downsample is simply another name for frstep_epn, provided to
facilitate a different mode of thinking about time-series data.
.LP
.SH FILES
.LP
.TP 13
\&.gensgmrc
The options file for gensgm.
.LP
.SH SEE ALSO
.LP
genasa, genbmm, genepn, gencgm
.LP
.SH BUGS
.LP
None currently known.
.SH COPYRIGHT
.LP
Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
.LP
Permission to use, copy, modify, and distribute this software without fee
is hereby granted for research purposes, provided that this copyright
notice appears in all copies and in all supporting documentation, and that
the software is not redistributed for any fee (except for a nominal
shipping charge). Anyone wanting to incorporate all or part of this
software in a commercial product must obtain a license from the Medical
Research Council.
.LP
The MRC makes no representations about the suitability of this
software for any purpose. It is provided "as is" without express or
implied warranty.
.LP
THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL
THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES
OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
SOFTWARE.
.LP
.SH ACKNOWLEDGEMENTS
.LP
The AIM software was developed for Unix workstations by John
Holdsworth and Mike Allerhand of the MRC APU, under the direction of
Roy Patterson. The physiological version of AIM was developed by
Christian Giguere. The options handler is by Paul Manson. The revised
SAI module is by Jay Datta. Michael Akeroyd extended the PostScript
facilities and developed the xreview routine for auditory image
cartoons.
.LP
The project was supported by the MRC and grants from the U.K. Defence
Research Agency, Farnborough (Research Contract 2239); the EEC Esprit
BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.