Revised for JASA, 3 April 95

Time-domain modelling of peripheral auditory processing:
A modular architecture and a software platform*

Roy D. Patterson and Mike H. Allerhand
MRC Applied Psychology Unit, 15 Chaucer Road, Cambridge CB2 2EF, UK

Christian Giguère
Laboratory of Experimental Audiology, University Hospital Utrecht, 3508 GA Utrecht, The Netherlands

(Received December 1994; revised 31 March 1995)

A software package with a modular architecture has been developed to support perceptual modelling of the fine-grain spectro-temporal information observed in the auditory nerve. The package contains both functional and physiological modules to simulate auditory spectral analysis, neural encoding and temporal integration, including new forms of periodicity-sensitive temporal integration that generate stabilized auditory images. Combinations of the modules enable the user to approximate a wide variety of existing, time-domain, auditory models. Sequences of auditory images can be replayed to produce cartoons of auditory perceptions that illustrate the dynamic response of the auditory system to everyday sounds.

PACS numbers: 43.64.Bt, 43.66.Ba, 43.71.An

Running head: Auditory Image Model Software

INTRODUCTION

Several years ago, we developed a functional model of the cochlea to simulate the phase-locked activity that complex sounds produce in the auditory nerve. The purpose was to investigate the role of fine-grain timing information in auditory perception generally (Patterson et al., 1992a; Patterson and Akeroyd, 1995), and in speech perception in particular (Patterson, Holdsworth and Allerhand, 1992b). The architecture of the resulting Auditory Image Model (AIM) is shown in the left-hand column of Fig. 1. The responses of the three modules to the vowel in 'hat' are shown in the three panels of Fig. 2. Briefly, the spectral analysis stage converts the sound wave into the model's representation of basilar membrane motion (BMM). For the vowel in 'hat', each glottal cycle generates a version of the basic vowel structure in the BMM (top panel). The neural encoding stage stabilizes the BMM in level and sharpens features like vowel formants, to produce a simulation of the neural activity pattern (NAP) produced by the sound in the auditory nerve (middle panel). The temporal integration stage stabilizes the repeating structure in the NAP and produces a simulation of our perception of the vowel (bottom panel), referred to as the auditory image. Sequences of simulated images can be generated at regular intervals and replayed as an animated cartoon to show the dynamic behaviour of the auditory images produced by everyday sounds.

An earlier version of the AIM software was made available to collaborators via the Internet. From there it spread to the speech and music communities, indicating a more general interest in auditory models than we had originally anticipated. This has prompted us to prepare documentation and a formal release of the software (AIM R7).

A number of users wanted to compare the outputs from the functional model, which is almost level independent, with those from physiological models of the cochlea, which are fundamentally level dependent. Others wanted to compare the auditory images produced by strobed temporal integration with correlograms. As a result, we have installed alternative modules for each of the three main stages as shown in the right-hand column of Fig. 1. The alternative spectral analysis module is a non-linear, transmission-line filterbank based on Giguère and Woodland (1994a). The neural encoding module is based on the inner haircell model of Meddis (1988). The temporal integration module generates correlograms like those of Slaney and Lyon (1990) or Meddis and Hewitt (1991), using the algorithm proposed by Allerhand and Patterson (1992). The responses of the three modules to the vowel in 'hat' are shown in Fig. 3 for the case where the level of the vowel is 60 dB SPL. The patterns are broadly similar to those of the functional modules but the details differ, particularly at the output of the third stage. The differences grow more pronounced when the level of the vowel is reduced to 30 dB SPL or increased to 90 dB SPL. Figures 2 and 3 together illustrate how the software can be used to compare and contrast different auditory models. The new modules also open the way to time-domain simulation of hearing impairment and distortion products of cochlear origin.

Switches were installed to enable the user to shift from the functional to the physiological version of AIM at the output of each stage of the model. This architecture enables the system to implement other popular auditory models, such as the gammatone-filterbank, Meddis-haircell, correlogram models proposed by Assmann and Summerfield (1990), Meddis and Hewitt (1991), and Brown and Cooke (1994). The remainder of this letter describes the integrated software package with emphasis on the functional and physiological routes, and on practical aspects of obtaining the software package.*

I. THE AUDITORY IMAGE MODEL

A. The spectral analysis stage

Spectral analysis is performed by a bank of auditory filters which converts a digitized wave into an array of filtered waves like those shown in the top panels of Figs. 2 and 3. The set of waves is AIM's representation of basilar membrane motion. The software distributes the filters linearly along a frequency scale measured in Equivalent Rectangular Bandwidths (ERBs). The ERB scale was proposed by Glasberg and Moore (1990) based on physiological research summarized in Greenwood (1990) and psychoacoustic research summarized in Patterson and Moore (1986). The constants of the ERB function can also be set to produce a reasonable approximation to the Bark scale. Options enable the user to specify the number of channels in the filterbank and the minimum and maximum filter center frequencies.
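
For reference, the ERB function of Glasberg and Moore (1990) is ERB(f) = 24.7(4.37f/1000 + 1) Hz, and the corresponding ERB-rate scale is E(f) = 21.4 log10(4.37f/1000 + 1). The sketch below shows how filter center frequencies might be spaced uniformly on this scale; it follows the published formulas, but the function names are ours, not those of the AIM source.

    #include <math.h>

    /* ERB-rate (in ERBs) corresponding to frequency f in Hz
       (Glasberg and Moore, 1990). */
    double hz_to_erb(double f)
    {
        return 21.4 * log10(4.37 * f / 1000.0 + 1.0);
    }

    /* Inverse mapping: frequency in Hz at a given ERB-rate value. */
    double erb_to_hz(double e)
    {
        return (pow(10.0, e / 21.4) - 1.0) * 1000.0 / 4.37;
    }

    /* Fill cf[0..nchan-1] with center frequencies spaced uniformly
       in ERBs between fmin and fmax (e.g. 75 channels, 100-6000 Hz). */
    void erb_space(double *cf, int nchan, double fmin, double fmax)
    {
        double e_lo = hz_to_erb(fmin), e_hi = hz_to_erb(fmax);
        for (int i = 0; i < nchan; i++)
            cf[i] = erb_to_hz(e_lo + (e_hi - e_lo) * i / (nchan - 1));
    }

With these formulas, the 75-channel, 100 to 6000 Hz configuration used for the figures corresponds to erb_space(cf, 75, 100.0, 6000.0).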

AIM provides both a functional auditory filter and a physiological auditory filter for generating the BMM: the former is a linear, gammatone filter (Patterson et al., 1992a); the latter is a non-linear, transmission-line filter (Giguère and Woodland, 1994a). The impulse response of the gammatone filter provides an excellent fit to the impulse response of primary auditory neurons in cats, and its amplitude characteristic is very similar to that of the 'roex' filter commonly used to represent the human auditory filter. The motivation for the gammatone filterbank and the available implementations are summarized in Patterson (1994a). The input wave is passed through an optional middle-ear filter adapted from Lutman and Martin (1979).
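
The gammatone filter has the standard impulse response g(t) = a t^(n-1) exp(-2πbt) cos(2πf_ct + φ), with order n = 4 and the bandwidth parameter b commonly set to 1.019 ERB(f_c). A direct-evaluation sketch follows (illustrative only; practical filterbanks, including AIM's, use efficient recursive implementations instead):

    #include <math.h>

    #define PI 3.14159265358979323846

    /* Sample the gammatone impulse response
       g(t) = t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)
       at rate fs, with order n = 4 and b = 1.019*ERB(fc).
       The gain constant a is omitted (no normalization). */
    void gammatone_ir(double *g, int nsamp, double fc, double fs)
    {
        double erb = 24.7 * (4.37 * fc / 1000.0 + 1.0);
        double b = 1.019 * erb;
        for (int i = 0; i < nsamp; i++) {
            double t = i / fs;
            g[i] = pow(t, 3.0) * exp(-2.0 * PI * b * t)
                              * cos(2.0 * PI * fc * t);
        }
    }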

In the physiological version, a 'wave digital filter' is used to implement the classical, one-dimensional, transmission-line approximation to cochlear hydrodynamics. A feedback circuit representing the fast motile response of the outer haircells generates level-dependent basilar membrane motion (Giguère and Woodland, 1994a). The filterbank generates combination tones of the type f1-n(f2-f1), which propagate to the appropriate channel, and it has the potential to generate cochlear echoes; for example, primaries at f1 = 1000 Hz and f2 = 1200 Hz give rise to combination tones at 800, 600 and 400 Hz, the first of which (n = 1) is the familiar cubic difference tone 2f1-f2. Options enable the user to customize the transmission-line filter by specifying the feedback gain and saturation level of the outer haircell circuit. The middle-ear filter forms an integral part of the simulation in this case. Together, it and the transmission-line filterbank provide a bi-directional model of auditory spectral analysis.

The upper panels of Figs. 2 and 3 show the responses of the two filterbanks to the vowel in 'hat'. They have 75 channels covering the frequency range 100 to 6000 Hz (3.3 to 30.6 ERBs). In the high-frequency channels, the filters are broad and the glottal pulses generate impulse responses which decay relatively quickly. In the low-frequency channels, the filters are narrow and so they resolve individual, continuous harmonics. The rightward skew in the low-frequency channels is the 'phase lag,' or 'propagation delay,' of the cochlea, which arises because the narrower low-frequency filters respond more slowly to input. The transmission-line filterbank shows more ringing in the valleys than the gammatone filterbank because of its dynamic signal compression; as amplitude decreases, the damping of the basilar membrane is reduced to increase sensitivity and frequency resolution.

B. The neural encoding stage

The second stage of AIM simulates the mechanical/neural transduction process performed by the inner haircells. It converts the BMM into a neural activity pattern (NAP), which is AIM's representation of the afferent activity in the auditory nerve. Two alternative simulations are provided for generating the NAP: a bank of two-dimensional adaptive-thresholding units (Holdsworth and Patterson, 1993), or a bank of inner haircell simulators (Meddis, 1988).

The adaptive thresholding mechanism is a functional representation of neural encoding. It begins by rectifying and compressing the BMM; then it applies adaptation in time and suppression across frequency. The adaptation and suppression are coupled, and they jointly sharpen features like vowel formants in the compressed BMM representation. Briefly, an adaptive threshold value is maintained for each channel and updated at the sampling rate. The new value is the largest of a) the previous value reduced by a fast-acting temporal decay factor, b) the previous value reduced by a longer-term temporal decay factor, c) the adapted level in the channel immediately above, reduced by a frequency spread factor, or d) the adapted level in the channel immediately below, reduced by the same frequency spread factor. The mechanism produces output whenever the input exceeds the adaptive threshold, and the output level is the difference between the input and the adaptive threshold. The parameters that control the spread of activity in time and frequency are options in AIM.
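
A per-sample sketch of the four-candidate update just described. The decay and spread constants are placeholders (in AIM their forms and rates are user options), and the rule that the threshold is raised to the input when the input exceeds it is our reading of adaptive thresholding, not a statement of the released code:

    /* One per-sample update of the adaptive-thresholding unit in
       channel ch.  thr[] holds the adaptive threshold of every
       channel; x is the rectified, compressed BMM sample for this
       channel.  The constants are illustrative placeholders. */
    double adapt_step(double *thr, int ch, int nchan, double x)
    {
        const double fast = 0.990;   /* a) fast-acting temporal decay, per sample */
        const double slow = 0.999;   /* b) longer-term temporal decay, per sample */
        const double spread = 0.90;  /* c), d) cross-channel spread factor        */

        double t = thr[ch] * fast;                        /* candidate a) */
        if (thr[ch] * slow > t) t = thr[ch] * slow;       /* candidate b) */
        if (ch + 1 < nchan && thr[ch + 1] * spread > t)   /* candidate c) */
            t = thr[ch + 1] * spread;
        if (ch > 0 && thr[ch - 1] * spread > t)           /* candidate d) */
            t = thr[ch - 1] * spread;

        double out = (x > t) ? (x - t) : 0.0;  /* output only above threshold */
        thr[ch] = (x > t) ? x : t;             /* threshold tracks the input  */
        return out;
    }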

The Meddis (1988) module simulates the operation of an individual inner haircell; specifically, it simulates the flow of neurotransmitter across three reservoirs that are postulated to exist in and around the haircell. The module reproduces important properties of single afferent fibres, such as two-component time adaptation and phase-locking. The transmitter flow equations are solved using the wave-digital-filter algorithm described in Giguère and Woodland (1994a). There is one haircell simulator for each channel of the filterbank. Options allow the user to shift the entire rate-intensity function to a higher or lower level, and to specify the type of fibre (medium or high spontaneous rate).
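
For orientation, a forward-Euler sketch of the three-reservoir transmitter flow. The reservoir structure follows Meddis (1988); the constants are representative values from that literature rather than AIM's defaults, and AIM itself solves the equations with the wave-digital-filter algorithm, not Euler steps:

    /* Forward-Euler sketch of the Meddis (1988) transmitter flow for
       one haircell.  q = free transmitter in the cell (maximum
       normalized to 1), c = contents of the synaptic cleft, w =
       reprocessing store; s is the instantaneous BMM amplitude
       driving the cell.  Initialize with q near 1 and c = w = 0. */
    typedef struct { double q, c, w; } haircell;

    double meddis_step(haircell *hc, double s, double dt)
    {
        const double A = 5.0, B = 300.0, g = 2000.0; /* membrane permeability */
        const double y = 5.05;                       /* factory replenishment */
        const double l = 2500.0, r = 6580.0;         /* cleft loss, re-uptake */
        const double x = 66.31;                      /* reprocessing rate     */
        const double h = 50000.0;                    /* firing-rate scalar    */

        double k = (s + A > 0.0) ? g * (s + A) / (s + A + B) : 0.0;

        hc->q += dt * (y * (1.0 - hc->q) + x * hc->w - k * hc->q);
        hc->c += dt * (k * hc->q - (l + r) * hc->c);
        hc->w += dt * (r * hc->c - x * hc->w);

        return h * hc->c * dt;  /* probability of an afferent spike in dt */
    }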

The middle panels in Figs. 2 and 3 show the NAPs obtained with adaptive thresholding and the Meddis module in response to the BMMs from the gammatone and transmission-line filterbanks of Figs. 2 and 3, respectively. The phase lag of the BMM is preserved in the NAP. The positive half-cycles of the BMM waves have been sharpened in time, an effect which is more obvious in the adaptive thresholding NAP. Sharpening is also evident in the frequency dimension of the adaptive thresholding NAP. The individual 'haircells' are not coupled across channels in the Meddis module, and thus there is no frequency sharpening in this case. The physiological NAP reveals that the activity between glottal pulses in the high-frequency channels is due to the strong sixth harmonic in the first formant of the vowel.

C. The temporal integration stage

Periodic sounds give rise to static, rather than oscillating, perceptions, indicating that temporal integration is applied to the NAP in the production of our initial perception of a sound -- our auditory image. Traditionally, auditory temporal integration is represented by a simple leaky integration process, and AIM provides a bank of lowpass filters to enable the user to generate auditory spectra (Patterson, 1994a) and auditory spectrograms (Patterson et al., 1992b). However, the leaky integrator removes the phase-locked fine structure observed in the NAP, and this conflicts with perceptual data indicating that the fine structure plays an important role in determining sound quality and source identification (Patterson, 1994b; Patterson and Akeroyd, 1995). As a result, AIM includes two modules which preserve much of the time-interval information in the NAP during temporal integration, and which produce a better representation of our auditory images. In the functional version of AIM, this is accomplished with strobed temporal integration (Patterson et al., 1992a,b); in the physiological version, it is accomplished with a bank of autocorrelators (Slaney and Lyon, 1990; Meddis and Hewitt, 1991).
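
For reference, a leaky integrator of the traditional kind is simply a one-pole lowpass filter applied to each NAP channel; a minimal sketch with an illustrative 10-ms time constant:

    #include <math.h>

    /* Traditional leaky temporal integration of one NAP channel: a
       one-pole lowpass filter y[n] = a*y[n-1] + (1 - a)*x[n].  It
       smooths away the phase-locked fine structure, which is what
       motivates the two alternatives described below.  The 10-ms
       time constant is an illustrative placeholder. */
    void leaky_integrate(const double *x, double *y, int n, double fs)
    {
        double a = exp(-1.0 / (0.010 * fs));  /* tau = 10 ms */
        double state = 0.0;
        for (int i = 0; i < n; i++) {
            state = a * state + (1.0 - a) * x[i];
            y[i] = state;
        }
    }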

In the case of strobed temporal integration (STI), a bank of delay lines is used to form a buffer store for the NAP, one delay line per channel, and as the NAP proceeds along the buffer it decays linearly with time, at about 2.5 %/ms. Each channel of the buffer is assigned a strobe unit which monitors activity in that channel, looking for local maxima in the stream of NAP pulses. When one is found, the unit initiates temporal integration in that channel; that is, it transfers a copy of the NAP at that instant to the corresponding channel of an image buffer and adds it point-for-point with whatever is already there. The local maximum itself is mapped to the 0-ms point in the image buffer. The multi-channel version of this STI process produces AIM's representation of our auditory image of a sound. Periodic and quasi-periodic sounds cause regular strobing which leads to simulated auditory images that are static, or nearly static, and which have the same temporal resolution as the NAP. Dynamic sounds are represented as a sequence of auditory image frames. If the rate of change in a sound is not too rapid, as in diphthongs, features are seen to move smoothly as the sound proceeds, much as characters move smoothly in animated cartoons.
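
A single-channel sketch of the strobe-and-add operation just described. The three-point local-maximum test is a deliberate simplification of AIM's strobe criterion; only the 2.5 %/ms buffer decay comes from the text:

    /* Strobed temporal integration for one channel.  nap[] holds the
       channel's NAP at sampling rate fs; image[] is the corresponding
       channel of the image buffer, indexed by the time interval (in
       samples) back from the strobe point.  At each local maximum, a
       decayed copy of the preceding NAP is added into the image with
       the maximum mapped to interval 0. */
    void sti_channel(const double *nap, int n, double *image, int img_len,
                     double fs)
    {
        double decay = 0.025 * (1000.0 / fs);  /* 2.5 % per ms, per sample */
        for (int t = 1; t + 1 < n; t++) {
            if (nap[t] > nap[t - 1] && nap[t] >= nap[t + 1]) { /* strobe */
                for (int lag = 0; lag < img_len && lag <= t; lag++) {
                    double w = 1.0 - decay * lag;  /* linear buffer decay */
                    if (w > 0.0)
                        image[lag] += w * nap[t - lag];
                }
            }
        }
    }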

An alternative form of temporal integration is provided by the correlogram (Slaney and Lyon, 1990; Meddis and Hewitt, 1991). It extracts periodicity information and preserves intra-period fine structure by autocorrelating each channel of the NAP. The correlogram is the multi-channel version of this process. It was originally introduced as a model of pitch perception (Licklider, 1951), with a neural wiring diagram to illustrate that it was physiologically plausible. To date, however, there is no physiological evidence for autocorrelation in the auditory system, and the installation of the module in the physiological route was a matter of convenience. The current implementation is a recursive, or running, autocorrelation. A functionally equivalent FFT-based method is also provided (Allerhand and Patterson, 1992). A comparison of the correlogram in the bottom panel of Fig. 3 with the auditory image in the bottom panel of Fig. 2 shows that the vowel structure is more symmetric in the correlogram and there are larger level contrasts in the correlogram. It is not yet known whether one of the representations is more realistic or more useful. The present purpose is to note that the software package can be used to compare auditory representations in a way not previously possible.
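
A sketch of one channel of a recursive, running autocorrelation: each lag bin keeps a leaky average of the product of the NAP with a delayed copy of itself. The 10-ms integration constant is an illustrative placeholder:

    #include <math.h>

    /* Running autocorrelation of one NAP channel.  acf[lag] holds a
       leaky average of nap[t]*nap[t-lag], so periodicity appears as
       peaks at multiples of the period while intra-period fine
       structure is preserved. */
    void running_acf(const double *nap, int n, double *acf, int nlags,
                     double fs)
    {
        double a = exp(-1.0 / (0.010 * fs));  /* tau = 10 ms, illustrative */
        for (int lag = 0; lag < nlags; lag++)
            acf[lag] = 0.0;
        for (int t = 0; t < n; t++)
            for (int lag = 0; lag < nlags && lag <= t; lag++)
                acf[lag] = a * acf[lag] + (1.0 - a) * nap[t] * nap[t - lag];
    }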

II. THE SOFTWARE/HARDWARE PLATFORM

i. The software package: The code is distributed as a compressed archive (in unix tar format), and can be obtained via ftp from the address: ftp.mrc-apu.cam.ac.uk (Name=anonymous; Password=<your email address>). All the software is contained in a single archive: pub/aim/aim.tar.Z. The associated text file pub/aim/ReadMe contains instructions for installing and compiling the software. The AIM package consists of a makefile and several sub-directories. Five of these (filter, glib, model, stitch and wdf) contain the C code for AIM. An aim/tools directory contains C code for ancillary software tools, which are provided for pre/post-processing of model input/output; the functions offered include stimulus generation, signal processing, and data manipulation. An aim/man directory contains on-line manual pages describing AIM and the software tools. An aim/scripts directory contains demonstration scripts for a guided tour through the model. Sounds used to test and demonstrate the model are provided in the aim/waves directory. These sounds were sampled at 20 kHz, and each sample is a 2-byte number in little-endian byte order; a tool is provided to swap byte order when necessary.
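
On a big-endian host, the 16-bit samples must be byte-swapped before use. The operation is the standard one; this sketch is ours, not the distributed tool:

    #include <stddef.h>
    #include <stdint.h>

    /* Swap the byte order of 16-bit samples in place, as required when
       reading the little-endian aim/waves files on a big-endian host. */
    void swab16(int16_t *buf, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            uint16_t u = (uint16_t)buf[i];
            buf[i] = (int16_t)(uint16_t)((u << 8) | (u >> 8));
        }
    }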

ii. System requirements: The software is written in C. The code generated by the native C compilers included with Ultrix (version 4.3a and above) and SunOS (version 4.1.3 and above) has been extensively tested. The code from the GNU C compiler (version 2.5.7 and above) is also reliable. The total disc usage of the AIM source code is about 700 kbytes. The package also includes 500 kbytes of sources for ancillary software tools, and 200 kbytes of documentation. The executable programs occupy about 1000 kbytes, and the executable programs for the ancillary tools occupy 7000 kbytes. About 800 kbytes of temporary space are required for object files during compilation. The graphical interface uses X11 (R4 and above) with either the OpenWindows or Motif user interface. The programs can be compiled using the base Xlib library (libX11.a), and will run on both 1-bit (mono) and multi-plane (colour or greyscale) displays.

iii. Compilation and operation: The makefile includes targets to compile the source code for AIM and the associated tools on a range of machines (DEC, SUN, SGI, HP); the targets differ only in the pathnames for the local X11 base library (libX11.a) and header files (X11/X.h and X11/Xlib.h). AIM can be compiled without the display code if the graphics interface is not required or if X11 is not available (make noplot). The executable for AIM is called gen. Compilation also generates symbolic links to gen, such as genbmm, gennap and gensai, which are used to select the desired output (BMM, NAP or SAI). The links and the executables for the aim/tools are installed in the aim/bin directory after compilation. Options are specified as name=value pairs on the command line; unspecified options are assigned default values. The model output takes the form of binary data routed by default to the model's graphical displays. Output can also be routed to plotting hardware or other post-processing software.

III. APPLICATIONS AND SUMMARY

In hearing research, the functional version of AIM has been used to model phase perception (Patterson, 1987), octave perception (Patterson et al., 1993), and timbre perception (Patterson, 1994b). The physiological version has been used to simulate cochlear hearing loss (Giguère, Woodland, and Robinson, 1993; Giguère and Woodland, 1994b), and combination tones of cochlear origin (Giguère, Kunov, and Smoorenburg, 1995). In speech research, the functional version has been used to explain syllabic stress (Allerhand et al., 1992), and both versions have been used as preprocessors for speech recognition systems (e.g. Patterson, Anderson, and Allerhand, 1994; Giguère et al., 1993). In summary, the AIM software package provides a modular architecture for time-domain computational studies of peripheral auditory processing.

* Instructions for acquiring the software package electronically are presented in Section II. This document refers to AIM R7, which is the first official release.

ACKNOWLEDGEMENTS

The gammatone filterbank, adaptive thresholding, and much of the software platform were written by John Holdsworth; the options handler is by Paul Manson, and the revised STI module by Jay Datta. Michael Akeroyd extended the PostScript facilities and developed the xreview routine for auditory image cartoons. The software development was supported by grants from DRA Farnborough (U.K.), Esprit BR 3207 (EEC), and the Hearing Research Trust (U.K.). We thank Malcolm Slaney and Michael Akeroyd for helpful comments on an earlier version of the paper.

Allerhand, M., and Patterson, R.D. (1992). "Correlograms and auditory images," Proc. Inst. Acoust. 14, 281-288.

Allerhand, M., Butterfield, S., Cutler, A., and Patterson, R.D. (1992). "Assessing syllable strength via an auditory model," Proc. Inst. Acoust. 14, 297-304.

Assmann, P.F., and Summerfield, Q. (1990). "Modelling the perception of concurrent vowels: Vowels with different fundamental frequencies," J. Acoust. Soc. Am. 88, 680-697.

Brown, G.J., and Cooke, M. (1994). "Computational auditory scene analysis," Computer Speech and Language 8, 297-336.

Giguère, C., Woodland, P.C., and Robinson, A.J. (1993). "Application of an auditory model to the computer simulation of hearing impairment: Preliminary results," Can. Acoust. 21, 135-136.

Giguère, C., and Woodland, P.C. (1994a). "A computational model of the auditory periphery for speech and hearing research. I. Ascending path," J. Acoust. Soc. Am. 95, 331-342.

Giguère, C., and Woodland, P.C. (1994b). "A computational model of the auditory periphery for speech and hearing research. II. Descending paths," J. Acoust. Soc. Am. 95, 343-349.

Giguère, C., Kunov, H., and Smoorenburg, G.F. (1995). "Computational modelling of psycho-acoustic combination tones and distortion-product otoacoustic emissions," 15th Int. Cong. on Acoustics, Trondheim (Norway), 26-30 June.

Glasberg, B.R., and Moore, B.C.J. (1990). "Derivation of auditory filter shapes from notched-noise data," Hear. Res. 47, 103-138.

Greenwood, D.D. (1990). "A cochlear frequency-position function for several species - 29 years later," J. Acoust. Soc. Am. 87, 2592-2605.

Holdsworth, J.W., and Patterson, R.D. (1991). "Analysis of waveforms," UK Patent No. GB 2-234-078-A (23.1.91). London: UK Patent Office.

Licklider, J.C.R. (1951). "A duplex theory of pitch perception," Experientia 7, 128-133.

Lutman, M.E., and Martin, A.M. (1979). "Development of an electroacoustic analogue model of the middle ear and acoustic reflex," J. Sound Vib. 64, 133-157.

Meddis, R. (1988). "Simulation of auditory-neural transduction: Further studies," J. Acoust. Soc. Am. 83, 1056-1063.

Meddis, R., and Hewitt, M.J. (1991). "Modelling the perception of concurrent vowels with different fundamental frequencies," J. Acoust. Soc. Am. 91, 233-245.

Patterson, R.D. (1987). "A pulse ribbon model of monaural phase perception," J. Acoust. Soc. Am. 82, 1560-1586.

Patterson, R.D. (1994a). "The sound of a sinusoid: Spectral models," J. Acoust. Soc. Am. 96, 1409-1418.

Patterson, R.D. (1994b). "The sound of a sinusoid: Time-interval models," J. Acoust. Soc. Am. 96, 1419-1428.

Patterson, R.D., and Akeroyd, M.A. (1995). "Time-interval patterns and sound quality," in Advances in Hearing Research: Proceedings of the 10th International Symposium on Hearing, edited by G. Manley, G. Klump, C. Köppl, H. Fastl, and H. Oeckinghaus (World Scientific, Singapore), in press.

Patterson, R.D., Anderson, T., and Allerhand, M. (1994). "The auditory image model as a preprocessor for spoken language," in Proc. Third ICSLP, Yokohama, Japan, 1395-1398.

Patterson, R.D., Milroy, R., and Allerhand, M. (1993). "What is the octave of a harmonically rich note?" in Proc. 2nd Int. Conf. on Music and the Cognitive Sciences, edited by I. Cross and I. Deliège (Harwood, Switzerland), 69-81.

Patterson, R.D., and Moore, B.C.J. (1986). "Auditory filters and excitation patterns as representations of frequency resolution," in Frequency Selectivity in Hearing, edited by B.C.J. Moore (Academic, London), pp. 123-177.

Patterson, R.D., Holdsworth, J., and Allerhand, M. (1992b). "Auditory models as preprocessors for speech recognition," in The Auditory Processing of Speech: From the Auditory Periphery to Words, edited by M.E.H. Schouten (Mouton de Gruyter, Berlin), 67-83.

Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C., and Allerhand, M. (1992a). "Complex sounds and auditory images," in Auditory Physiology and Perception, edited by Y. Cazals, L. Demany, and K. Horner (Pergamon, Oxford), 429-446.

Slaney, M., and Lyon, R.F. (1990). "A perceptual pitch detector," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Albuquerque, New Mexico, April 1990.

Figure 1. The three-stage structure of the AIM software package. Left-hand column: functional route; right-hand column: physiological route. For each module, the figure shows the function (bold type), the implementation (in the rectangle), and the simulation it produces (italics).

Figure 2. Responses of the model to the vowel in 'hat' processed through the functional route: (top) basilar membrane motion, (middle) neural activity pattern, and (bottom) auditory image.

Figure 3. Responses of the model to the vowel in 'hat' processed through the physiological route: (top) basilar membrane motion, (middle) neural activity pattern, and (bottom) autocorrelogram image.