annotate docs/PAG95.doc @ 0:5242703e91d3 tip

Initial checkin for AIM92 aimR8.2 (last updated May 1997).
author tomwalters
date Fri, 20 May 2011 15:19:45 +0100
parents
children
Revised for JASA, 3 April 1995


Time-domain modelling of peripheral auditory processing:
A modular architecture and a software platform*

Roy D. Patterson and Mike H. Allerhand
MRC Applied Psychology Unit, 15 Chaucer Road, Cambridge CB2 2EF, UK

Christian Giguère
Laboratory of Experimental Audiology, University Hospital Utrecht,
3508 GA Utrecht, The Netherlands

(Received December 1994) (Revised 31 March 1995)

A software package with a modular architecture has been developed to
support perceptual modelling of the fine-grain spectro-temporal
information observed in the auditory nerve. The package contains both
functional and physiological modules to simulate auditory spectral
analysis, neural encoding and temporal integration, including new
forms of periodicity-sensitive temporal integration that generate
stabilized auditory images. Combinations of the modules enable the
user to approximate a wide variety of existing, time-domain, auditory
models. Sequences of auditory images can be replayed to produce
cartoons of auditory perceptions that illustrate the dynamic response
of the auditory system to everyday sounds.

PACS numbers: 43.64.Bt, 43.66.Ba, 43.71.An

Running head: Auditory Image Model Software


INTRODUCTION

Several years ago, we developed a functional model of the cochlea to
simulate the phase-locked activity that complex sounds produce in the
auditory nerve. The purpose was to investigate the role of the
fine-grain timing information in auditory perception generally
(Patterson et al., 1992a; Patterson and Akeroyd, 1995), and in speech
perception in particular (Patterson, Holdsworth and Allerhand, 1992b).
The architecture of the resulting Auditory Image Model (AIM) is shown
in the left-hand column of Fig. 1. The responses of the three modules
to the vowel in 'hat' are shown in the three panels of Fig. 2.
Briefly, the spectral analysis stage converts the sound wave into the
model's representation of basilar membrane motion (BMM). For the vowel
in 'hat', each glottal cycle generates a version of the basic vowel
structure in the BMM (top panel). The neural encoding stage
stabilizes the BMM in level and sharpens features like vowel formants,
to produce a simulation of the neural activity pattern (NAP) produced
by the sound in the auditory nerve (middle panel). The temporal
integration stage stabilizes the repeating structure in the NAP and
produces a simulation of our perception of the vowel (bottom panel),
referred to as the auditory image. Sequences of simulated images can
be generated at regular intervals and replayed as an animated cartoon
to show the dynamic behaviour of the auditory images produced by
everyday sounds.

An earlier version of the AIM software was made available to
collaborators via the Internet. From there it spread to the speech and
music communities, indicating a more general interest in auditory
models than we had originally anticipated. This has prompted us to
prepare documentation and a formal release of the software (AIM R7).

A number of users wanted to compare the outputs from the functional
model, which is almost level independent, with those from
physiological models of the cochlea, which are fundamentally level
dependent. Others wanted to compare the auditory images produced by
strobed temporal integration with correlograms. As a result, we have
installed alternative modules for each of the three main stages as
shown in the right-hand column of Fig. 1. The alternative spectral
analysis module is a non-linear, transmission line filterbank based on
Giguère and Woodland (1994a). The neural encoding module is based on
the inner haircell model of Meddis (1988). The temporal integration
module generates correlograms like those of Slaney and Lyon (1990) or
Meddis and Hewitt (1991), using the algorithm proposed by Allerhand
and Patterson (1992). The responses of the three modules to the vowel
in 'hat' are shown in Fig. 3 for the case where the level of the vowel
is 60 dB SPL. The patterns are broadly similar to those of the
functional modules but the details differ, particularly at the output
of the third stage. The differences grow more pronounced when the
level of the vowel is reduced to 30 dB SPL or increased to 90 dB SPL.
Figures 2 and 3 together illustrate how the software can be used to
compare and contrast different auditory models. The new modules also
open the way to time-domain simulation of hearing impairment and
distortion products of cochlear origin.

Switches were installed to enable the user to shift from the
functional to the physiological version of AIM at the output of each
stage of the model. This architecture enables the system to implement
other popular auditory models such as the gammatone-filterbank,
Meddis-haircell, correlogram models proposed by Assmann and
Summerfield (1990), Meddis and Hewitt (1991), and Brown and Cooke
(1994). The remainder of this letter describes the integrated software
package with emphasis on the functional and physiological routes, and
on practical aspects of obtaining the software package.*



I. THE AUDITORY IMAGE MODEL

A. The spectral analysis stage

Spectral analysis is performed by a bank of auditory filters which
converts a digitized wave into an array of filtered waves like those
shown in the top panels of Figs 2 and 3. The set of waves is AIM's
representation of basilar membrane motion. The software distributes
the filters linearly along a frequency scale measured in Equivalent
Rectangular Bandwidths (ERB's). The ERB scale was proposed by Glasberg
and Moore (1990) based on physiological research summarized in
Greenwood (1990) and psychoacoustic research summarized in Patterson
and Moore (1986). The constants of the ERB function can also be set to
produce a reasonable approximation to the Bark scale. Options enable
the user to specify the number of channels in the filterbank and the
minimum and maximum filter center frequencies.
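The linear-in-ERB spacing of the filter center frequencies can be
illustrated with a short sketch. The ERB-rate equation is that of
Glasberg and Moore (1990); the function names, and the default channel
count and frequency range (75 channels, 100 to 6000 Hz, the values
used for Figs 2 and 3), are illustrative and not taken from the AIM
source itself:

```python
import numpy as np

def erb_rate(f):
    """ERB-rate scale of Glasberg and Moore (1990); f in Hz."""
    return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    """Inverse of erb_rate: ERB number back to frequency in Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def centre_frequencies(n_channels=75, f_min=100.0, f_max=6000.0):
    """Distribute n_channels filters linearly on the ERB-rate scale."""
    e = np.linspace(erb_rate(f_min), erb_rate(f_max), n_channels)
    return erb_rate_to_hz(e)
```

On this scale 100 Hz and 6000 Hz fall close to the 3.3 and 30.6 ERB's
quoted for the filterbanks below.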

AIM provides both a functional auditory filter and a physiological
auditory filter for generating the BMM: the former is a linear,
gammatone filter (Patterson et al., 1992a); the latter is a
non-linear, transmission-line filter (Giguère and Woodland, 1994a).
The impulse response of the gammatone filter provides an excellent fit
to the impulse response of primary auditory neurons in cats, and its
amplitude characteristic is very similar to that of the 'roex' filter
commonly used to represent the human auditory filter. The motivation
for the gammatone filterbank and the available implementations are
summarized in Patterson (1994a). The input wave is passed through an
optional middle-ear filter adapted from Lutman and Martin (1979).
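The gammatone impulse response itself is not listed in this letter,
but the standard fourth-order form from the literature can be sketched
as below. The order n = 4 and bandwidth scalar b = 1.019 are the
conventional published values, and the 20-kHz sampling rate matches
the waves distributed with the package; treat the details as
illustrative rather than the AIM implementation:

```python
import numpy as np

def gammatone_ir(fc, fs=20000, n=4, b=1.019, duration=0.025):
    """Impulse response of a gammatone filter centred on fc (Hz):
    t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # Glasberg and Moore (1990)
    g = (t ** (n - 1)
         * np.exp(-2.0 * np.pi * b * erb * t)
         * np.cos(2.0 * np.pi * fc * t))
    return g / np.max(np.abs(g))              # normalize peak to 1

# Filtering one channel is then a convolution of the input wave
# with gammatone_ir(fc) for that channel's centre frequency.
```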

In the physiological version, a 'wave digital filter' is used to
implement the classical, one-dimensional, transmission-line
approximation to cochlear hydrodynamics. A feedback circuit
representing the fast motile response of the outer haircells generates
level-dependent basilar membrane motion (Giguère and Woodland,
1994a). The filterbank generates combination tones of the type
f1-n(f2-f1) which propagate to the appropriate channel, and it has the
potential to generate cochlear echoes. Options enable the user to
customize the transmission line filter by specifying the feedback gain
and saturation level of the outer haircell circuit. The middle ear
filter forms an integral part of the simulation in this case.
Together, it and the transmission line filterbank provide a
bi-directional model of auditory spectral analysis.

The upper panels of Figs 2 and 3 show the responses of the two
filterbanks to the vowel in 'hat'. They have 75 channels covering the
frequency range 100 to 6000 Hz (3.3 to 30.6 ERB's). In the
high-frequency channels, the filters are broad and the glottal pulses
generate impulse responses which decay relatively quickly. In the
low-frequency channels, the filters are narrow and so they resolve
individual continuous harmonics. The rightward skew in the
low-frequency channels is the 'phase lag,' or 'propagation delay,' of
the cochlea, which arises because the narrower low-frequency filters
respond more slowly to input. The transmission line filterbank shows
more ringing in the valleys than the gammatone filterbank because of
its dynamic signal compression; as amplitude decreases, the damping of
the basilar membrane is reduced to increase sensitivity and frequency
resolution.


B. The neural encoding stage

The second stage of AIM simulates the mechanical/neural transduction
process performed by the inner haircells. It converts the BMM into a
neural activity pattern (NAP), which is AIM's representation of the
afferent activity in the auditory nerve. Two alternative simulations
are provided for generating the NAP: a bank of two-dimensional
adaptive-thresholding units (Holdsworth and Patterson, 1993), or a
bank of inner haircell simulators (Meddis, 1988).

The adaptive thresholding mechanism is a functional representation of
neural encoding. It begins by rectifying and compressing the BMM; then
it applies adaptation in time and suppression across frequency. The
adaptation and suppression are coupled and they jointly sharpen
features like vowel formants in the compressed BMM representation.
Briefly, an adaptive threshold value is maintained for each channel
and updated at the sampling rate. The new value is the largest of a)
the previous value reduced by a fast-acting temporal decay factor, b)
the previous value reduced by a longer-term temporal decay factor, c)
the adapted level in the channel immediately above, reduced by a
frequency spread factor, or d) the adapted level in the channel
immediately below, reduced by the same frequency spread factor. The
mechanism produces output whenever the input exceeds the adaptive
threshold, and the output level is the difference between the input
and the adaptive threshold. The parameters that control the spread of
activity in time and frequency are options in AIM.
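The update rule a)-d) transcribes almost literally into code. In the
sketch below, the decay and spread constants, and the resetting of the
threshold to the input once an output is produced, are illustrative
assumptions rather than the values used in AIM:

```python
import numpy as np

def adaptive_threshold_nap(bmm, fast=0.995, slow=0.999, spread=0.9):
    """Sketch of two-dimensional adaptive thresholding.
    bmm: array (channels, samples) of rectified, compressed BMM.
    At each sample the threshold is the largest of the decayed previous
    value and the spread-attenuated levels of the adjacent channels."""
    n_ch, n_samp = bmm.shape
    thresh = np.zeros(n_ch)
    nap = np.zeros_like(bmm)
    for t in range(n_samp):
        prev = thresh.copy()
        for ch in range(n_ch):
            candidates = [prev[ch] * fast,        # a) fast decay
                          prev[ch] * slow]        # b) longer-term decay
            if ch + 1 < n_ch:
                candidates.append(prev[ch + 1] * spread)  # c) channel above
            if ch > 0:
                candidates.append(prev[ch - 1] * spread)  # d) channel below
            thresh[ch] = max(candidates)
            x = bmm[ch, t]
            if x > thresh[ch]:
                nap[ch, t] = x - thresh[ch]   # output = input - threshold
                thresh[ch] = x                # threshold rises to the input
    return nap
```

A first pulse passes in full; an identical pulse arriving immediately
afterwards is almost entirely adapted out.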

The Meddis (1988) module simulates the operation of an individual
inner haircell; specifically, it simulates the flow of
neurotransmitter across three reservoirs that are postulated to exist
in and around the haircell. The module reproduces important properties
of single afferent fibres such as two-component time adaptation and
phase-locking. The transmitter flow equations are solved using the
wave-digital-filter algorithm described in Giguère and Woodland
(1994a). There is one haircell simulator for each channel of the
filterbank. Options allow the user to shift the entire rate-intensity
function to a higher or lower level, and to specify the type of fibre
(medium or high spontaneous-rate).
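A simplified sketch of the module is given below, using plain Euler
integration of the three-reservoir transmitter-flow equations rather
than the wave-digital-filter solution used in AIM. The parameter
values follow commonly published implementations of Meddis (1988) and
should be treated as assumptions here:

```python
import numpy as np

def meddis_haircell(stim, fs=20000):
    """Simplified Meddis (1988) inner haircell: transmitter flows
    between the free pool (q), the synaptic cleft (c) and the
    reprocessing store (w); firing probability is proportional to c."""
    A, B, g = 5.0, 300.0, 2000.0          # permeability constants
    y, l, r, x, M = 5.05, 2500.0, 6580.0, 66.31, 1.0
    h = 50000.0                            # firing-rate scalar
    dt = 1.0 / fs
    q, c, w = M, 0.0, 0.0                  # start with a full pool
    prob = np.zeros(len(stim))
    for i, s in enumerate(stim):
        k = g * (s + A) / (s + A + B) if s + A > 0 else 0.0
        dq = (y * (M - q) + x * w - k * q) * dt
        dc = (k * q - (l + r) * c) * dt
        dw = (r * c - x * w) * dt
        q += dq; c += dc; w += dw
        prob[i] = h * c * dt               # instantaneous firing probability
    return prob
```

The sketch reproduces the qualitative behaviour cited in the text:
spontaneous activity in silence, and an onset peak that adapts down to
a lower sustained level when a steady stimulus is applied.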

The middle panels in Figures 2 and 3 show the NAPs obtained with
adaptive thresholding and the Meddis module in response to the BMMs
from the gammatone and transmission line filterbanks shown in the top
panels of Figs 2 and 3, respectively. The phase lag of the BMM is
preserved in the NAP. The positive half-cycles of the BMM waves have
been sharpened in time, an effect which is more obvious in the
adaptive thresholding NAP. Sharpening is also evident in the frequency
dimension of the adaptive thresholding NAP. The individual 'haircells'
are not coupled across channels in the Meddis module, and thus there
is no frequency sharpening in this case. The physiological NAP reveals
that the activity between glottal pulses in the high-frequency
channels is due to the strong sixth harmonic in the first formant of
the vowel.


C. The temporal integration stage

Periodic sounds give rise to static, rather than oscillating,
perceptions indicating that temporal integration is applied to the NAP
in the production of our initial perception of a sound -- our auditory
image. Traditionally, auditory temporal integration is represented by
a simple leaky integration process and AIM provides a bank of lowpass
filters to enable the user to generate auditory spectra (Patterson,
1994a) and auditory spectrograms (Patterson et al., 1992b). However,
the leaky integrator removes the phase-locked fine structure observed
in the NAP, and this conflicts with perceptual data indicating that
the fine structure plays an important role in determining sound
quality and source identification (Patterson, 1994b; Patterson and
Akeroyd, 1995). As a result, AIM includes two modules which preserve
much of the time-interval information in the NAP during temporal
integration, and which produce a better representation of our auditory
images. In the functional version of AIM, this is accomplished with
strobed temporal integration (Patterson et al., 1992a,b); in the
physiological version, it is accomplished with a bank of
autocorrelators (Slaney and Lyon, 1990; Meddis and Hewitt, 1991).

In the case of strobed temporal integration (STI), a bank of delay
lines is used to form a buffer store for the NAP, one delay line per
channel, and as the NAP proceeds along the buffer it decays linearly
with time, at about 2.5 %/ms. Each channel of the buffer is assigned a
strobe unit which monitors activity in that channel, looking for local
maxima in the stream of NAP pulses. When one is found, the unit
initiates temporal integration in that channel; that is, it transfers
a copy of the NAP at that instant to the corresponding channel of an
image buffer and adds it point-for-point with whatever is already
there. The local maximum itself is mapped to the 0-ms point in the
image buffer. The multi-channel version of this STI process produces
AIM's representation of our auditory image of a sound. Periodic and
quasi-periodic sounds cause regular strobing which leads to simulated
auditory images that are static, or nearly static, and which have the
same temporal resolution as the NAP. Dynamic sounds are represented
as a sequence of auditory image frames. If the rate of change in a
sound is not too rapid, as in diphthongs, features are seen to move
smoothly as the sound proceeds, much as characters move smoothly in
animated cartoons.
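The STI process described above can be sketched as follows. The strobe
criterion (any local maximum of the NAP) and the image width are
simplifications assumed for illustration; only the 2.5 %/ms decay is
taken from the text:

```python
import numpy as np

def strobed_temporal_integration(nap, fs=20000, image_ms=35.0,
                                 decay_per_ms=0.025):
    """Sketch of STI: per channel, strobe on local maxima of the NAP
    and add the preceding buffer contents into the image, with the
    strobe point mapped to the 0-ms bin and a linear decay along the
    time-interval axis."""
    n_ch, n_samp = nap.shape
    width = int(image_ms * fs / 1000.0)
    # linear decay of about decay_per_ms per millisecond of interval
    weights = np.maximum(0.0,
                         1.0 - decay_per_ms * np.arange(width) * 1000.0 / fs)
    image = np.zeros((n_ch, width))
    for ch in range(n_ch):
        x = nap[ch]
        for t in range(1, n_samp - 1):
            if x[t] > 0 and x[t] >= x[t - 1] and x[t] > x[t + 1]:  # strobe
                n = min(width, t + 1)
                image[ch, :n] += x[t::-1][:n] * weights[:n]
    return image
```

A periodic pulse train strobes once per cycle, so the image builds a
ridge at the 0-ms point and at multiples of the period, which is why
periodic sounds yield static images.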

An alternative form of temporal integration is provided by the
correlogram (Slaney and Lyon, 1990; Meddis and Hewitt, 1991). It
extracts periodicity information and preserves intra-period fine
structure by autocorrelating each channel of the NAP. The correlogram
is the multi-channel version of this process. It was originally
introduced as a model of pitch perception (Licklider, 1951) with a
neural wiring diagram to illustrate that it was physiologically
plausible. To date, however, there is no physiological evidence for
autocorrelation in the auditory system, and the installation of the
module in the physiological route was a matter of convenience. The
current implementation is a recursive, or running, autocorrelation. A
functionally equivalent FFT-based method is also provided (Allerhand
and Patterson, 1992). A comparison of the correlogram in the bottom
panel of Fig. 3 with the auditory image in the bottom panel of Fig. 2
shows that the vowel structure is more symmetric, and the level
contrasts larger, in the correlogram. It is not yet known whether
either representation is the more realistic or the more useful. The
present purpose is simply to note that the software package can be
used to compare auditory representations in a way not previously
possible.
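A frame-based version of the FFT method can be sketched with the
standard Wiener-Khinchin identity (power spectrum in, autocorrelation
out); this illustrates the principle rather than reproducing the
recursive implementation in AIM:

```python
import numpy as np

def correlogram_frame(nap, max_lag):
    """Autocorrelate each NAP channel over a frame via the FFT.
    nap: array (channels, samples); returns lags 0..max_lag-1."""
    n_ch, n_samp = nap.shape
    nfft = 1
    while nfft < 2 * n_samp:      # zero-pad to avoid circular wrap-around
        nfft *= 2
    spec = np.fft.rfft(nap, nfft, axis=1)
    ac = np.fft.irfft(spec * np.conj(spec), nfft, axis=1)
    return ac[:, :max_lag]
```

For a periodic channel, the autocorrelation is maximal at zero lag and
shows a secondary peak at the period, which is the ridge structure
seen in the bottom panel of Fig. 3.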



II. THE SOFTWARE/HARDWARE PLATFORM

i. The software package: The code is distributed as a compressed
archive (in unix tar format), and can be obtained via ftp from the
address: ftp.mrc-apu.cam.ac.uk (Name=anonymous; Password=<your email
address>). All the software is contained in a single archive:
pub/aim/aim.tar.Z. The associated text file pub/aim/ReadMe contains
instructions for installing and compiling the software. The AIM
package consists of a makefile and several sub-directories. Five of
these (filter, glib, model, stitch and wdf) contain the C code for
AIM. An aim/tools directory contains C code for ancillary software
tools. These software tools are provided for pre/post-processing of
model input/output. A variety of functions are offered, including:
stimulus generation, signal processing, and data manipulation. An
aim/man directory contains on-line manual pages describing AIM and the
software tools. An aim/scripts directory contains demonstration
scripts for a guided tour through the model. Sounds used to test and
demonstrate the model are provided in the aim/waves directory. These
sounds were sampled at 20 kHz, and each sample is a 2-byte number in
little-endian byte order; a tool is provided to swap byte order when
necessary.
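For readers on big-endian machines, the byte swap can also be done
outside the package; a minimal sketch using NumPy (the dtype strings
'<i2' and '>i2' denote little- and big-endian 16-bit integers, and the
function name is illustrative, not the tool shipped with AIM):

```python
import numpy as np

def swap16(raw):
    """Byte-swap a buffer of 16-bit samples, e.g. to read the
    little-endian aim/waves files on a big-endian machine."""
    data = np.frombuffer(raw, dtype='<i2')   # interpret as little-endian
    return data.astype('>i2').tobytes()      # rewrite big-endian
```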

ii. System requirements: The software is written in C. The code
generated by the native C compilers included with Ultrix (version 4.3a
and above) and SunOS (version 4.1.3 and above) has been extensively
tested. The code from the GNU C compiler (version 2.5.7 and above) is
also reliable. The total disc usage of the AIM source code is about
700 kbytes. The package also includes 500 kbytes of sources for
ancillary software tools, and 200 kbytes of documentation. The
executable programs occupy about 1000 kbytes, and executable programs
for ancillary tools occupy 7000 kbytes. About 800 kbytes of temporary
space are required for object files during compilation. The graphical
interface uses X11 (R4 and above) with either the OpenWindows or Motif
user interface. The programs can be compiled using the base Xlib
library (libX11.a), and will run on both 1-bit (mono) and multi-plane
(colour or greyscale) displays.

iii. Compilation and operation: The makefile includes targets to
compile the source code for AIM and the associated tools on a range of
machines (DEC, SUN, SGI, HP); the targets differ only in the pathnames
for the local X11 base library (libX11.a) and header files (X11/X.h
and X11/Xlib.h). AIM can be compiled without the display code if the
graphics interface is not required or if X11 is not available (make
noplot). The executable for AIM is called gen. Compilation also
generates symbolic links to gen, such as genbmm, gennap and gensai,
which are used to select the desired output (BMM, NAP or SAI). The
links and the executables for the aim/tools are installed in the
aim/bin directory after compilation. Options are specified as
name=value on the command line; unspecified options are assigned
default values. The model output takes the form of binary data routed
by default to the model's graphical displays. Output can also be
routed to plotting hardware, or other post-processing software.



III. APPLICATIONS AND SUMMARY

In hearing research, the functional version of AIM has been used to
model phase perception (Patterson, 1987), octave perception
(Patterson et al., 1993), and timbre perception (Patterson, 1994b).
The physiological version has been used to simulate cochlear hearing
loss (Giguère, Woodland, and Robinson, 1993; Giguère and Woodland,
1994b), and combination tones of cochlear origin (Giguère, Kunov, and
Smoorenburg, 1995). In speech research, the functional version has
been used to explain syllabic stress (Allerhand et al., 1992), and
both versions have been used as preprocessors for speech recognition
systems (e.g. Patterson, Anderson, and Allerhand, 1994; Giguère et
al., 1993). In summary, the AIM software package provides a modular
architecture for time-domain computational studies of peripheral
auditory processing.


* Instructions for acquiring the software package electronically are
presented in Section II. This document refers to AIM R7, which is the
first official release.


ACKNOWLEDGEMENTS

The gammatone filterbank, adaptive thresholding, and much of the
software platform were written by John Holdsworth; the options handler
is by Paul Manson, and the revised STI module by Jay Datta. Michael
Akeroyd extended the postscript facilities and developed the xreview
routine for auditory image cartoons. The software development was
supported by grants from DRA Farnborough (U.K.), Esprit BR 3207 (EEC),
and the Hearing Research Trust (U.K.). We thank Malcolm Slaney and
Michael Akeroyd for helpful comments on an earlier version of the
paper.


Allerhand, M., and Patterson, R.D. (1992). "Correlograms and auditory
images," Proc. Inst. Acoust. 14, 281-288.

Allerhand, M., Butterfield, S., Cutler, A., and Patterson, R.D.
(1992). "Assessing syllable strength via an auditory model," Proc.
Inst. Acoust. 14, 297-304.

Assmann, P.F., and Summerfield, Q. (1990). "Modelling the perception
of concurrent vowels: Vowels with different fundamental frequencies,"
J. Acoust. Soc. Am. 88, 680-697.

Brown, G.J., and Cooke, M. (1994). "Computational auditory scene
analysis," Computer Speech and Language 8, 297-336.

Giguère, C., Woodland, P.C., and Robinson, A.J. (1993). "Application
of an auditory model to the computer simulation of hearing impairment:
Preliminary results," Can. Acoust. 21, 135-136.

Giguère, C., and Woodland, P.C. (1994a). "A computational model of
the auditory periphery for speech and hearing research. I. Ascending
path," J. Acoust. Soc. Am. 95, 331-342.

Giguère, C., and Woodland, P.C. (1994b). "A computational model of
the auditory periphery for speech and hearing research. II. Descending
paths," J. Acoust. Soc. Am. 95, 343-349.

Giguère, C., Kunov, H., and Smoorenburg, G.F. (1995). "Computational
modelling of psycho-acoustic combination tones and distortion-product
otoacoustic emissions," 15th Int. Cong. on Acoustics, Trondheim
(Norway), 26-30 June.

Glasberg, B.R., and Moore, B.C.J. (1990). "Derivation of auditory
filter shapes from notched-noise data," Hear. Res. 47, 103-138.

Greenwood, D.D. (1990). "A cochlear frequency-position function for
several species - 29 years later," J. Acoust. Soc. Am. 87, 2592-2605.

Holdsworth, J.W., and Patterson, R.D. (1991). "Analysis of
waveforms," UK Patent No. GB 2-234-078-A (23.1.91). London: UK
Patent Office.

Licklider, J.C.R. (1951). "A duplex theory of pitch perception,"
Experientia 7, 128-133.

Lutman, M.E., and Martin, A.M. (1979). "Development of an
electroacoustic analogue model of the middle ear and acoustic reflex,"
J. Sound Vib. 64, 133-157.

Meddis, R. (1988). "Simulation of auditory-neural transduction:
Further studies," J. Acoust. Soc. Am. 83, 1056-1063.

Meddis, R., and Hewitt, M.J. (1991). "Modelling the perception of
concurrent vowels with different fundamental frequencies," J. Acoust.
Soc. Am. 91, 233-245.

Patterson, R.D. (1987). "A pulse ribbon model of monaural phase
perception," J. Acoust. Soc. Am. 82, 1560-1586.

Patterson, R.D. (1994a). "The sound of a sinusoid: Spectral models,"
J. Acoust. Soc. Am. 96, 1409-1418.

Patterson, R.D. (1994b). "The sound of a sinusoid: Time-interval
models," J. Acoust. Soc. Am. 96, 1419-1428.

Patterson, R.D., and Akeroyd, M.A. (1995). "Time-interval patterns and
sound quality," in Advances in Hearing Research: Proceedings of the
10th International Symposium on Hearing, edited by G. Manley, G.
Klump, C. Koppl, H. Fastl, and H. Oeckinghaus (World Scientific,
Singapore) (in press).

Patterson, R.D., Anderson, T., and Allerhand, M. (1994). "The auditory
image model as a preprocessor for spoken language," in Proc. Third
ICSLP, Yokohama, Japan, 1395-1398.

Patterson, R.D., Milroy, R., and Allerhand, M. (1993). "What is the
octave of a harmonically rich note?" in Proc. 2nd Int. Conf. on Music
and the Cognitive Sciences, edited by I. Cross and I. Deliege
(Harwood, Switzerland) 69-81.

Patterson, R.D., and Moore, B.C.J. (1986). "Auditory filters and
excitation patterns as representations of frequency resolution," in
Frequency Selectivity in Hearing, edited by B.C.J. Moore (Academic,
London) pp. 123-177.

Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C.,
and Allerhand, M. (1992a). "Complex sounds and auditory images," in
Auditory Physiology and Perception, edited by Y. Cazals, L. Demany,
and K. Horner (Pergamon, Oxford) 429-446.

Patterson, R.D., Holdsworth, J., and Allerhand, M. (1992b). "Auditory
models as preprocessors for speech recognition," in The Auditory
Processing of Speech: From the Auditory Periphery to Words, edited by
M.E.H. Schouten (Mouton de Gruyter, Berlin) 67-83.

Slaney, M., and Lyon, R.F. (1990). "A perceptual pitch detector," in
Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing,
Albuquerque, New Mexico, April 1990.


Figure 1. The three-stage structure of the AIM software package.
Left-hand column: functional route; right-hand column: physiological
route. For each module, the figure shows the function (bold type), the
implementation (in the rectangle), and the simulation it produces
(italics).

Figure 2. Responses of the model to the vowel in 'hat' processed
through the functional route: (top) basilar membrane motion, (middle)
neural activity pattern, and (bottom) auditory image.

Figure 3. Responses of the model to the vowel in 'hat' processed
through the physiological route: (top) basilar membrane motion,
(middle) neural activity pattern, and (bottom) autocorrelogram image.
