Revised for JASA, 3 April 95


Time-domain modelling of peripheral auditory processing:
A modular architecture and a software platform*

Roy D. Patterson and Mike H. Allerhand
MRC Applied Psychology Unit, 15 Chaucer Road, Cambridge CB2 2EF, UK

Christian Giguère
Laboratory of Experimental Audiology, University Hospital Utrecht,
3508 GA Utrecht, The Netherlands

(Received December, 1994) (Revised 31 March 1995)

A software package with a modular architecture has been developed to
support perceptual modelling of the fine-grain spectro-temporal
information observed in the auditory nerve. The package contains both
functional and physiological modules to simulate auditory spectral
analysis, neural encoding and temporal integration, including new
forms of periodicity-sensitive temporal integration that generate
stabilized auditory images. Combinations of the modules enable the
user to approximate a wide variety of existing, time-domain, auditory
models. Sequences of auditory images can be replayed to produce
cartoons of auditory perceptions that illustrate the dynamic response
of the auditory system to everyday sounds.

PACS numbers: 43.64.Bt, 43.66.Ba, 43.71.An

Running head: Auditory Image Model Software


INTRODUCTION

Several years ago, we developed a functional model of the cochlea to
simulate the phase-locked activity that complex sounds produce in the
auditory nerve. The purpose was to investigate the role of the
fine-grain timing information in auditory perception generally
(Patterson et al., 1992a; Patterson and Akeroyd, 1995), and in speech
perception in particular (Patterson, Holdsworth and Allerhand, 1992b).
The architecture of the resulting Auditory Image Model (AIM) is shown
in the left-hand column of Fig. 1. The responses of the three modules
to the vowel in 'hat' are shown in the three panels of Fig. 2.
Briefly, the spectral analysis stage converts the sound wave into the
model's representation of basilar membrane motion (BMM). For the vowel
in 'hat', each glottal cycle generates a version of the basic vowel
structure in the BMM (top panel). The neural encoding stage
stabilizes the BMM in level and sharpens features like vowel formants,
to produce a simulation of the neural activity pattern (NAP) produced
by the sound in the auditory nerve (middle panel). The temporal
integration stage stabilizes the repeating structure in the NAP and
produces a simulation of our perception of the vowel (bottom panel),
referred to as the auditory image. Sequences of simulated images can
be generated at regular intervals and replayed as an animated cartoon
to show the dynamic behaviour of the auditory images produced by
everyday sounds.

An earlier version of the AIM software was made available to
collaborators via the Internet. From there it spread to the speech and
music communities, indicating a more general interest in auditory
models than we had originally anticipated. This has prompted us to
prepare documentation and a formal release of the software (AIM R7).

A number of users wanted to compare the outputs from the functional
model, which is almost level independent, with those from
physiological models of the cochlea, which are fundamentally level
dependent. Others wanted to compare the auditory images produced by
strobed temporal integration with correlograms. As a result, we have
installed alternative modules for each of the three main stages as
shown in the right-hand column of Fig. 1. The alternative spectral
analysis module is a non-linear, transmission-line filterbank based on
Giguère and Woodland (1994a). The neural encoding module is based on
the inner haircell model of Meddis (1988). The temporal integration
module generates correlograms like those of Slaney and Lyon (1990) or
Meddis and Hewitt (1991), using the algorithm proposed by Allerhand
and Patterson (1992). The responses of the three modules to the vowel
in 'hat' are shown in Fig. 3 for the case where the level of the vowel
is 60 dB SPL. The patterns are broadly similar to those of the
functional modules but the details differ, particularly at the output
of the third stage. The differences grow more pronounced when the
level of the vowel is reduced to 30 dB SPL or increased to 90 dB SPL.
Figures 2 and 3 together illustrate how the software can be used to
compare and contrast different auditory models. The new modules also
open the way to time-domain simulation of hearing impairment and
distortion products of cochlear origin.

Switches were installed to enable the user to shift from the
functional to the physiological version of AIM at the output of each
stage of the model. This architecture enables the system to implement
other popular auditory models such as the gammatone-filterbank,
Meddis-haircell, correlogram models proposed by Assmann and
Summerfield (1990), Meddis and Hewitt (1991), and Brown and Cooke
(1994). The remainder of this letter describes the integrated software
package with emphasis on the functional and physiological routes, and
on practical aspects of obtaining the software package.*


I. THE AUDITORY IMAGE MODEL

A. The spectral analysis stage

Spectral analysis is performed by a bank of auditory filters which
converts a digitized wave into an array of filtered waves like those
shown in the top panels of Figs. 2 and 3. The set of waves is AIM's
representation of basilar membrane motion. The software distributes
the filters linearly along a frequency scale measured in Equivalent
Rectangular Bandwidths (ERBs). The ERB scale was proposed by Glasberg
and Moore (1990) based on physiological research summarized in
Greenwood (1990) and psychoacoustic research summarized in Patterson
and Moore (1986). The constants of the ERB function can also be set to
produce a reasonable approximation to the Bark scale. Options enable
the user to specify the number of channels in the filterbank and the
minimum and maximum filter center frequencies.
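
For concreteness, the Glasberg and Moore (1990) ERB function is
ERB(f) = 24.7(4.37f/1000 + 1) Hz, and the corresponding ERB-rate (the
number of ERBs below f) is E(f) = 21.4 log10(4.37f/1000 + 1). The
following minimal sketch shows how centre frequencies might be spaced
uniformly on this scale; the function names are ours, not AIM's:

    #include <math.h>

    /* Glasberg and Moore (1990): number of ERBs below frequency f (Hz). */
    static double erb_rate(double f)
    {
        return 21.4 * log10(4.37 * f / 1000.0 + 1.0);
    }

    /* Inverse of erb_rate: the frequency (Hz) at ERB-rate value e. */
    static double erb_rate_inv(double e)
    {
        return (pow(10.0, e / 21.4) - 1.0) * 1000.0 / 4.37;
    }

    /* Distribute n filter centre frequencies linearly on the ERB scale
       between fmin and fmax (e.g. n = 75 channels from 100 to 6000 Hz
       gives the 3.3 to 30.6 ERB range used for Figs. 2 and 3). */
    static void erb_space(double *cf, int n, double fmin, double fmax)
    {
        double lo = erb_rate(fmin), step = (erb_rate(fmax) - lo) / (n - 1);
        int i;
        for (i = 0; i < n; i++)
            cf[i] = erb_rate_inv(lo + i * step);
    }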

AIM provides both a functional auditory filter and a physiological
auditory filter for generating the BMM: the former is a linear,
gammatone filter (Patterson et al., 1992a); the latter is a
non-linear, transmission-line filter (Giguère and Woodland, 1994a).
The impulse response of the gammatone filter provides an excellent fit
to the impulse response of primary auditory neurons in cats, and its
amplitude characteristic is very similar to that of the 'roex' filter
commonly used to represent the human auditory filter. The motivation
for the gammatone filterbank and the available implementations are
summarized in Patterson (1994a). The input wave is passed through an
optional middle-ear filter adapted from Lutman and Martin (1979).
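
A brief reminder of the form of that impulse response may be useful
here. The gammatone has the standard form

    gt(t) = a\,t^{n-1} e^{-2\pi b\,\mathrm{ERB}(f_c)\,t} \cos(2\pi f_c t + \phi), \qquad t > 0,

where f_c is the filter centre frequency, ERB(f_c) its equivalent
rectangular bandwidth, a the amplitude and phi the starting phase; in
the human implementations the order n is commonly taken as 4 and the
bandwidth factor b as about 1.019.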

In the physiological version, a 'wave digital filter' is used to
implement the classical, one-dimensional, transmission-line
approximation to cochlear hydrodynamics. A feedback circuit
representing the fast motile response of the outer haircells generates
level-dependent basilar membrane motion (Giguère and Woodland,
1994a). The filterbank generates combination tones of the type
f1-n(f2-f1), which propagate to the appropriate channel (for example,
with f1 = 1.0 kHz, f2 = 1.2 kHz and n = 1, this is the cubic
difference tone 2f1-f2 = 0.8 kHz), and it has the potential to
generate cochlear echoes. Options enable the user to customize the
transmission-line filter by specifying the feedback gain and
saturation level of the outer haircell circuit. The middle-ear filter
forms an integral part of the simulation in this case. Together, it
and the transmission-line filterbank provide a bi-directional model of
auditory spectral analysis.

The upper panels of Figs. 2 and 3 show the responses of the two
filterbanks to the vowel in 'hat'. They have 75 channels covering the
frequency range 100 to 6000 Hz (3.3 to 30.6 ERBs). In the
high-frequency channels, the filters are broad and the glottal pulses
generate impulse responses which decay relatively quickly. In the
low-frequency channels, the filters are narrow and so they resolve
individual continuous harmonics. The rightward skew in the
low-frequency channels is the 'phase lag,' or 'propagation delay,' of
the cochlea, which arises because the narrower low-frequency filters
respond more slowly to input. The transmission-line filterbank shows
more ringing in the valleys than the gammatone filterbank because of
its dynamic signal compression; as amplitude decreases, the damping of
the basilar membrane is reduced to increase sensitivity and frequency
resolution.


B. The neural encoding stage

The second stage of AIM simulates the mechanical/neural transduction
process performed by the inner haircells. It converts the BMM into a
neural activity pattern (NAP), which is AIM's representation of the
afferent activity in the auditory nerve. Two alternative simulations
are provided for generating the NAP: a bank of two-dimensional
adaptive-thresholding units (Holdsworth and Patterson, 1993), or a
bank of inner haircell simulators (Meddis, 1988).

The adaptive thresholding mechanism is a functional representation of
neural encoding. It begins by rectifying and compressing the BMM; then
it applies adaptation in time and suppression across frequency. The
adaptation and suppression are coupled and they jointly sharpen
features like vowel formants in the compressed BMM representation.
Briefly, an adaptive threshold value is maintained for each channel
and updated at the sampling rate. The new value is the largest of a)
the previous value reduced by a fast-acting temporal decay factor, b)
the previous value reduced by a longer-term temporal decay factor, c)
the adapted level in the channel immediately above, reduced by a
frequency spread factor, or d) the adapted level in the channel
immediately below, reduced by the same frequency spread factor. The
mechanism produces output whenever the input exceeds the adaptive
threshold, and the output level is the difference between the input
and the adaptive threshold. The parameters that control the spread of
activity in time and frequency are options in AIM.
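
A minimal sketch of one update step of this rule, written directly
from the description above; the names, and the assumption that the
threshold tracks the input whenever it is exceeded, are ours rather
than AIM's:

    /* One time-step of adaptive thresholding across nch channels.
       prev[] holds the previous adaptive thresholds, in[] the
       rectified, compressed BMM samples, out[] the NAP samples, and
       next[] receives the updated thresholds.  fast and slow are
       per-sample temporal decay factors (< 1); spread is the
       across-frequency reduction factor (< 1). */
    static void adapt_step(const double *prev, double *next,
                           const double *in, double *out, int nch,
                           double fast, double slow, double spread)
    {
        int c;
        for (c = 0; c < nch; c++) {
            double t = prev[c] * fast;                     /* a) fast decay    */
            if (prev[c] * slow > t) t = prev[c] * slow;    /* b) slow decay    */
            if (c + 1 < nch && prev[c + 1] * spread > t)   /* c) channel above */
                t = prev[c + 1] * spread;
            if (c > 0 && prev[c - 1] * spread > t)         /* d) channel below */
                t = prev[c - 1] * spread;
            if (in[c] > t) {        /* input exceeds threshold: emit the */
                out[c] = in[c] - t; /* difference, and track the input   */
                t = in[c];
            } else {
                out[c] = 0.0;
            }
            next[c] = t;
        }
    }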

The Meddis (1988) module simulates the operation of an individual
inner haircell; specifically, it simulates the flow of
neurotransmitter across three reservoirs that are postulated to exist
in and around the haircell. The module reproduces important properties
of single afferent fibres such as two-component time adaptation and
phase-locking. The transmitter flow equations are solved using the
wave-digital-filter algorithm described in Giguère and Woodland
(1994a). There is one haircell simulator for each channel of the
filterbank. Options allow the user to shift the entire rate-intensity
function to a higher or lower level, and to specify the type of fibre
(medium or high spontaneous-rate).
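
For reference, the transmitter-flow equations have the general form
introduced by Meddis (1988), with q(t) the free transmitter pool, c(t)
the cleft contents, w(t) the reprocessing store, s(t) the
instantaneous stimulus drive, and g, y, l, x, r, M, A, B constants
(consult that paper for the parameter values):

    \begin{aligned}
    k(t) &= \begin{cases} g\,\dfrac{s(t)+A}{s(t)+A+B}, & s(t)+A > 0,\\ 0, & \text{otherwise,}\end{cases}\\
    \dot{q}(t) &= y\,\bigl(M - q(t)\bigr) + x\,w(t) - k(t)\,q(t),\\
    \dot{c}(t) &= k(t)\,q(t) - (l + r)\,c(t),\\
    \dot{w}(t) &= r\,c(t) - x\,w(t),
    \end{aligned}

and the probability of an afferent spike in a time step is taken to be
proportional to c(t).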

The middle panels in Figures 2 and 3 show the NAPs obtained with
adaptive thresholding and the Meddis module in response to BMMs from
the gammatone and transmission-line filterbanks of Figs. 2 and 3,
respectively. The phase lag of the BMM is preserved in the NAP. The
positive half-cycles of the BMM waves have been sharpened in time, an
effect which is more obvious in the adaptive thresholding NAP.
Sharpening is also evident in the frequency dimension of the adaptive
thresholding NAP. The individual 'haircells' are not coupled across
channels in the Meddis module, and thus there is no frequency
sharpening in this case. The physiological NAP reveals that the
activity between glottal pulses in the high-frequency channels is due
to the strong sixth harmonic in the first formant of the vowel.


C. The temporal integration stage

Periodic sounds give rise to static, rather than oscillating,
perceptions, indicating that temporal integration is applied to the
NAP in the production of our initial perception of a sound -- our
auditory image. Traditionally, auditory temporal integration is
represented by a simple leaky integration process, and AIM provides a
bank of lowpass filters to enable the user to generate auditory
spectra (Patterson, 1994a) and auditory spectrograms (Patterson et
al., 1992b). However, the leaky integrator removes the phase-locked
fine structure observed in the NAP, and this conflicts with perceptual
data indicating that the fine structure plays an important role in
determining sound quality and source identification (Patterson, 1994b;
Patterson and Akeroyd, 1995). As a result, AIM includes two modules
which preserve much of the time-interval information in the NAP during
temporal integration, and which produce a better representation of our
auditory images. In the functional version of AIM, this is
accomplished with strobed temporal integration (Patterson et al.,
1992a,b); in the physiological version, it is accomplished with a bank
of autocorrelators (Slaney and Lyon, 1990; Meddis and Hewitt, 1991).
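
Before turning to those two modules, note that the traditional leaky
integrator is simply a one-pole lowpass filter applied to each NAP
channel. A minimal sketch, with an illustrative time constant rather
than AIM's:

    #include <math.h>

    /* Leaky integration of one NAP channel: y[n] = a*y[n-1] + (1-a)*x[n],
       where a = exp(-1/(tau*fs)) for time constant tau (s) at sample
       rate fs (Hz).  Applied to every channel this yields an auditory
       spectrogram; averaged over time it yields an auditory spectrum. */
    static void leaky_integrate(const double *nap, double *out, int n,
                                double tau, double fs)
    {
        double a = exp(-1.0 / (tau * fs));
        double y = 0.0;
        int i;
        for (i = 0; i < n; i++) {
            y = a * y + (1.0 - a) * nap[i];
            out[i] = y;
        }
    }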

In the case of strobed temporal integration (STI), a bank of delay
lines is used to form a buffer store for the NAP, one delay line per
channel, and as the NAP proceeds along the buffer it decays linearly
with time, at about 2.5 %/ms. Each channel of the buffer is assigned a
strobe unit which monitors activity in that channel, looking for local
maxima in the stream of NAP pulses. When one is found, the unit
initiates temporal integration in that channel; that is, it transfers
a copy of the NAP at that instant to the corresponding channel of an
image buffer and adds it point-for-point with whatever is already
there. The local maximum itself is mapped to the 0-ms point in the
image buffer. The multi-channel version of this STI process produces
AIM's representation of our auditory image of a sound. Periodic and
quasi-periodic sounds cause regular strobing which leads to simulated
auditory images that are static, or nearly static, and which have the
same temporal resolution as the NAP. Dynamic sounds are represented
as a sequence of auditory image frames. If the rate of change in a
sound is not too rapid, as in diphthongs, features are seen to move
smoothly as the sound proceeds, much as characters move smoothly in
animated cartoons.
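
A minimal sketch of this process for a single channel, following the
description above; the strobe criterion, buffer length and decay
handling are simplifications of ours, not AIM's:

    /* Strobed temporal integration of one NAP channel.  nap[] is the
       input stream of n samples; image[] is the channel's auditory
       image buffer of ilen samples, where image[i] holds activity i
       samples before the strobe point (0 ms).  decay is the linear
       buffer decay per sample (about 2.5 %/ms, i.e. 0.00125 per
       sample at 20 kHz). */
    static void sti_channel(const double *nap, int n,
                            double *image, int ilen, double decay)
    {
        int t, i;
        for (t = 1; t + 1 < n; t++) {
            if (nap[t] > nap[t - 1] && nap[t] >= nap[t + 1]) {
                /* Local maximum found: strobe, and add the decayed
                   buffer contents point-for-point into the image. */
                for (i = 0; i < ilen && i <= t; i++) {
                    double w = 1.0 - decay * i;
                    if (w > 0.0)
                        image[i] += w * nap[t - i];
                }
            }
        }
    }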

An alternative form of temporal integration is provided by the
correlogram (Slaney and Lyon, 1990; Meddis and Hewitt, 1991). It
extracts periodicity information and preserves intra-period fine
structure by autocorrelating each channel of the NAP. The correlogram
is the multi-channel version of this process. It was originally
introduced as a model of pitch perception (Licklider, 1951) with a
neural wiring diagram to illustrate that it was physiologically
plausible. To date, however, there is no physiological evidence for
autocorrelation in the auditory system, and the installation of the
module in the physiological route was a matter of convenience. The
current implementation is a recursive, or running, autocorrelation. A
functionally equivalent FFT-based method is also provided (Allerhand
and Patterson, 1992). A comparison of the correlogram in the bottom
panel of Fig. 3 with the auditory image in the bottom panel of Fig. 2
shows that the vowel structure is more symmetric in the correlogram,
and there are larger level contrasts in the correlogram. It is not
yet known whether one of the representations is more realistic or more
useful. The present purpose is to note that the software package can
be used to compare auditory representations in a way not previously
possible.
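
A recursive, or running, autocorrelation can be computed with one
leaky integrator per lag. A minimal sketch for one channel; the lag
range and smoothing coefficient are illustrative:

    /* Running autocorrelation of one NAP channel.  acf[lag]
       accumulates nap[t]*nap[t-lag] through a leaky integrator with
       coefficient a (0 < a < 1), so the correlogram is dominated by
       recent activity.  Lags are in samples; maxlag sets the longest
       period represented. */
    static void running_acf(const double *nap, int n,
                            double *acf, int maxlag, double a)
    {
        int t, lag;
        for (lag = 0; lag < maxlag; lag++)
            acf[lag] = 0.0;
        for (t = 0; t < n; t++)
            for (lag = 0; lag < maxlag && lag <= t; lag++)
                acf[lag] = a * acf[lag] + (1.0 - a) * nap[t] * nap[t - lag];
    }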


II. THE SOFTWARE/HARDWARE PLATFORM

i. The software package: The code is distributed as a compressed
archive (in unix tar format), and can be obtained via ftp from the
address: ftp.mrc-apu.cam.ac.uk (Name=anonymous; Password=<your email
address>). All the software is contained in a single archive:
pub/aim/aim.tar.Z. The associated text file pub/aim/ReadMe contains
instructions for installing and compiling the software. The AIM
package consists of a makefile and several sub-directories. Five of
these (filter, glib, model, stitch and wdf) contain the C code for
AIM. An aim/tools directory contains C code for ancillary software
tools. These software tools are provided for pre/post-processing of
model input/output. A variety of functions are offered, including
stimulus generation, signal processing, and data manipulation. An
aim/man directory contains on-line manual pages describing AIM and the
software tools. An aim/scripts directory contains demonstration
scripts for a guided tour through the model. Sounds used to test and
demonstrate the model are provided in the aim/waves directory. These
sounds were sampled at 20 kHz, and each sample is a 2-byte number in
little-endian byte order; a tool is provided to swap byte order when
necessary.
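
On a big-endian host, each 2-byte sample must therefore be
byte-swapped before use. A minimal sketch of the operation (not the
distributed tool itself), assuming the wave files are raw, headerless
sample data:

    /* Swap the byte order of 16-bit samples in place, converting
       little-endian wave data for a big-endian host (or vice versa).
       nbytes is the size of the buffer in bytes. */
    static void swap16(unsigned char *buf, unsigned long nbytes)
    {
        unsigned long i;
        for (i = 0; i + 1 < nbytes; i += 2) {
            unsigned char tmp = buf[i];
            buf[i] = buf[i + 1];
            buf[i + 1] = tmp;
        }
    }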

ii. System requirements: The software is written in C. The code
generated by the native C compilers included with Ultrix (version 4.3a
and above) and SunOS (version 4.1.3 and above) has been extensively
tested. The code from the GNU C compiler (version 2.5.7 and above) is
also reliable. The total disc usage of the AIM source code is about
700 kbytes. The package also includes 500 kbytes of sources for
ancillary software tools, and 200 kbytes of documentation. The
executable programs occupy about 1000 kbytes, and executable programs
for ancillary tools occupy 7000 kbytes. About 800 kbytes of temporary
space are required for object files during compilation. The graphical
interface uses X11 (R4 and above) with either the OpenWindows or Motif
user interface. The programs can be compiled using the base Xlib
library (libX11.a), and will run on both 1-bit (mono) and multi-plane
(colour or greyscale) displays.

iii. Compilation and operation: The makefile includes targets to
compile the source code for AIM and the associated tools on a range of
machines (DEC, SUN, SGI, HP); the targets differ only in the pathnames
for the local X11 base library (libX11.a) and header files (X11/X.h
and X11/Xlib.h). AIM can be compiled without the display code if the
graphics interface is not required or if X11 is not available (make
noplot). The executable for AIM is called gen. Compilation also
generates symbolic links to gen, such as genbmm, gennap and gensai,
which are used to select the desired output (BMM, NAP or SAI). The
links and the executables for the aim/tools are installed in the
aim/bin directory after compilation. Options are specified as
name=value on the command line; unspecified options are assigned
default values. The model output takes the form of binary data routed
by default to the model's graphical displays. Output can also be
routed to plotting hardware, or other post-processing software.


III. APPLICATIONS AND SUMMARY

In hearing research, the functional version of AIM has been used to
model phase perception (Patterson, 1987), octave perception
(Patterson et al., 1993), and timbre perception (Patterson, 1994b).
The physiological version has been used to simulate cochlear hearing
loss (Giguère, Woodland, and Robinson, 1993; Giguère and Woodland,
1994b), and combination tones of cochlear origin (Giguère, Kunov, and
Smoorenburg, 1995). In speech research, the functional version has
been used to explain syllabic stress (Allerhand et al., 1992), and
both versions have been used as preprocessors for speech recognition
systems (e.g. Patterson, Anderson, and Allerhand, 1994; Giguère et
al., 1993). In summary, the AIM software package provides a modular
architecture for time-domain computational studies of peripheral
auditory processing.


* Instructions for acquiring the software package electronically are
presented in Section II. This document refers to AIM R7, which is the
first official release.


ACKNOWLEDGEMENTS

The gammatone filterbank, adaptive thresholding, and much of the
software platform were written by John Holdsworth; the options handler
is by Paul Manson, and the revised STI module by Jay Datta. Michael
Akeroyd extended the PostScript facilities and developed the xreview
routine for auditory image cartoons. The software development was
supported by grants from DRA Farnborough (U.K.), Esprit BR 3207 (EEC),
and the Hearing Research Trust (U.K.). We thank Malcolm Slaney and
Michael Akeroyd for helpful comments on an earlier version of the
paper.


Allerhand, M., and Patterson, R.D. (1992). "Correlograms and auditory
images," Proc. Inst. Acoust. 14, 281-288.

Allerhand, M., Butterfield, S., Cutler, A., and Patterson, R.D.
(1992). "Assessing syllable strength via an auditory model," Proc.
Inst. Acoust. 14, 297-304.

Assmann, P.F., and Summerfield, Q. (1990). "Modelling the perception
of concurrent vowels: Vowels with different fundamental frequencies,"
J. Acoust. Soc. Am. 88, 680-697.

Brown, G.J., and Cooke, M. (1994). "Computational auditory scene
analysis," Computer Speech and Language 8, 297-336.

Giguère, C., Woodland, P.C., and Robinson, A.J. (1993). "Application
of an auditory model to the computer simulation of hearing impairment:
Preliminary results," Can. Acoust. 21, 135-136.

Giguère, C., and Woodland, P.C. (1994a). "A computational model of
the auditory periphery for speech and hearing research. I. Ascending
path," J. Acoust. Soc. Am. 95, 331-342.

Giguère, C., and Woodland, P.C. (1994b). "A computational model of
the auditory periphery for speech and hearing research. II. Descending
paths," J. Acoust. Soc. Am. 95, 343-349.

Giguère, C., Kunov, H., and Smoorenburg, G.F. (1995). "Computational
modelling of psycho-acoustic combination tones and distortion-product
otoacoustic emissions," 15th Int. Cong. on Acoustics, Trondheim
(Norway), 26-30 June.

Glasberg, B.R., and Moore, B.C.J. (1990). "Derivation of auditory
filter shapes from notched-noise data," Hear. Res. 47, 103-138.

Greenwood, D.D. (1990). "A cochlear frequency-position function for
several species - 29 years later," J. Acoust. Soc. Am. 87, 2592-2605.

Holdsworth, J.W., and Patterson, R.D. (1991). "Analysis of
waveforms," UK Patent No. GB 2-234-078-A (23.1.91). London: UK
Patent Office.

Licklider, J.C.R. (1951). "A duplex theory of pitch perception,"
Experientia 7, 128-133.

Lutman, M.E., and Martin, A.M. (1979). "Development of an
electroacoustic analogue model of the middle ear and acoustic reflex,"
J. Sound Vib. 64, 133-157.

Meddis, R. (1988). "Simulation of auditory-neural transduction:
Further studies," J. Acoust. Soc. Am. 83, 1056-1063.

Meddis, R., and Hewitt, M.J. (1991). "Modelling the perception of
concurrent vowels with different fundamental frequencies," J. Acoust.
Soc. Am. 91, 233-245.

Patterson, R.D. (1987). "A pulse ribbon model of monaural phase
perception," J. Acoust. Soc. Am. 82, 1560-1586.

Patterson, R.D. (1994a). "The sound of a sinusoid: Spectral models,"
J. Acoust. Soc. Am. 96, 1409-1418.

Patterson, R.D. (1994b). "The sound of a sinusoid: Time-interval
models," J. Acoust. Soc. Am. 96, 1419-1428.

Patterson, R.D., and Akeroyd, M.A. (1995). "Time-interval patterns and
sound quality," in Advances in Hearing Research: Proceedings of the
10th International Symposium on Hearing, edited by G. Manley, G.
Klump, C. Koppl, H. Fastl, and H. Oeckinghaus (World Scientific,
Singapore), in press.

Patterson, R.D., Anderson, T., and Allerhand, M. (1994). "The auditory
image model as a preprocessor for spoken language," in Proc. Third
ICSLP, Yokohama, Japan, 1395-1398.

Patterson, R.D., Milroy, R., and Allerhand, M. (1993). "What is the
octave of a harmonically rich note?" in Proc. 2nd Int. Conf. on Music
and the Cognitive Sciences, edited by I. Cross and I. Deliege
(Harwood, Switzerland), 69-81.

Patterson, R.D., and Moore, B.C.J. (1986). "Auditory filters and
excitation patterns as representations of frequency resolution," in
Frequency Selectivity in Hearing, edited by B.C.J. Moore (Academic,
London), pp. 123-177.

Patterson, R.D., Holdsworth, J., and Allerhand, M. (1992b). "Auditory
models as preprocessors for speech recognition," in The Auditory
Processing of Speech: From the Auditory Periphery to Words, edited by
M.E.H. Schouten (Mouton de Gruyter, Berlin), 67-83.

Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C.,
and Allerhand, M. (1992a). "Complex sounds and auditory images," in
Auditory Physiology and Perception, edited by Y. Cazals, L. Demany,
and K. Horner (Pergamon, Oxford), 429-446.

Slaney, M., and Lyon, R.F. (1990). "A perceptual pitch detector," in
Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing,
Albuquerque, New Mexico, April 1990.


Figure 1. The three-stage structure of the AIM software package.
Left-hand column: functional route; right-hand column: physiological
route. For each module, the figure shows the function (bold type), the
implementation (in the rectangle), and the simulation it produces
(italics).

Figure 2. Responses of the model to the vowel in 'hat' processed
through the functional route: (top) basilar membrane motion, (middle)
neural activity pattern, and (bottom) auditory image.

Figure 3. Responses of the model to the vowel in 'hat' processed
through the physiological route: (top) basilar membrane motion,
(middle) neural activity pattern, and (bottom) autocorrelogram image.