Revised for JASA, 3 April 95

Time-domain modelling of peripheral auditory processing:
A modular architecture and a software platform*

Roy D. Patterson and Mike H. Allerhand
MRC Applied Psychology Unit, 15 Chaucer Road, Cambridge CB2 2EF, UK

Christian Giguère
Laboratory of Experimental Audiology, University Hospital Utrecht, 3508 GA Utrecht, The Netherlands

(Received December 1994; revised 31 March 1995)

A software package with a modular architecture has been developed to support perceptual modelling of the fine-grain spectro-temporal information observed in the auditory nerve. The package contains both functional and physiological modules to simulate auditory spectral analysis, neural encoding and temporal integration, including new forms of periodicity-sensitive temporal integration that generate stabilized auditory images. Combinations of the modules enable the user to approximate a wide variety of existing, time-domain, auditory models. Sequences of auditory images can be replayed to produce cartoons of auditory perceptions that illustrate the dynamic response of the auditory system to everyday sounds.

PACS numbers: 43.64.Bt, 43.66.Ba, 43.71.An

Running head: Auditory Image Model Software

INTRODUCTION

Several years ago, we developed a functional model of the cochlea to simulate the phase-locked activity that complex sounds produce in the auditory nerve. The purpose was to investigate the role of fine-grain timing information in auditory perception generally (Patterson et al., 1992a; Patterson and Akeroyd, 1995), and in speech perception in particular (Patterson, Holdsworth and Allerhand, 1992b). The architecture of the resulting Auditory Image Model (AIM) is shown in the left-hand column of Fig. 1. The responses of the three modules to the vowel in 'hat' are shown in the three panels of Fig. 2. Briefly, the spectral analysis stage converts the sound wave into the model's representation of basilar membrane motion (BMM). For the vowel in 'hat', each glottal cycle generates a version of the basic vowel structure in the BMM (top panel). The neural encoding stage stabilizes the BMM in level and sharpens features like vowel formants, to produce a simulation of the neural activity pattern (NAP) produced by the sound in the auditory nerve (middle panel). The temporal integration stage stabilizes the repeating structure in the NAP and produces a simulation of our perception of the vowel (bottom panel), referred to as the auditory image. Sequences of simulated images can be generated at regular intervals and replayed as an animated cartoon to show the dynamic behaviour of the auditory images produced by everyday sounds.

An earlier version of the AIM software was made available to collaborators via the Internet. From there it spread to the speech and music communities, indicating a more general interest in auditory models than we had originally anticipated. This has prompted us to prepare documentation and a formal release of the software (AIM R7).

A number of users wanted to compare the outputs from the functional model, which is almost level independent, with those from physiological models of the cochlea, which are fundamentally level dependent. Others wanted to compare the auditory images produced by strobed temporal integration with correlograms. As a result, we have installed alternative modules for each of the three main stages as shown in the right-hand column of Fig. 1. The alternative spectral analysis module is a non-linear, transmission-line filterbank based on Giguère and Woodland (1994a). The neural encoding module is based on the inner haircell model of Meddis (1988). The temporal integration module generates correlograms like those of Slaney and Lyon (1990) or Meddis and Hewitt (1991), using the algorithm proposed by Allerhand and Patterson (1992). The responses of the three modules to the vowel in 'hat' are shown in Fig. 3 for the case where the level of the vowel is 60 dB SPL. The patterns are broadly similar to those of the functional modules but the details differ, particularly at the output of the third stage. The differences grow more pronounced when the level of the vowel is reduced to 30 dB SPL or increased to 90 dB SPL. Figures 2 and 3 together illustrate how the software can be used to compare and contrast different auditory models. The new modules also open the way to time-domain simulation of hearing impairment and distortion products of cochlear origin.

Switches were installed to enable the user to shift from the functional to the physiological version of AIM at the output of each stage of the model. This architecture enables the system to implement other popular auditory models, such as the gammatone-filterbank, Meddis-haircell, correlogram models proposed by Assmann and Summerfield (1990), Meddis and Hewitt (1991), and Brown and Cooke (1994). The remainder of this letter describes the integrated software package with emphasis on the functional and physiological routes, and on practical aspects of obtaining the software package.*

I. THE AUDITORY IMAGE MODEL

A. The spectral analysis stage

Spectral analysis is performed by a bank of auditory filters which converts a digitized wave into an array of filtered waves like those shown in the top panels of Figs. 2 and 3. The set of waves is AIM's representation of basilar membrane motion. The software distributes the filters linearly along a frequency scale measured in Equivalent Rectangular Bandwidths (ERBs). The ERB scale was proposed by Glasberg and Moore (1990) based on physiological research summarized in Greenwood (1990) and psychoacoustic research summarized in Patterson and Moore (1986). The constants of the ERB function can also be set to produce a reasonable approximation to the Bark scale. Options enable the user to specify the number of channels in the filterbank and the minimum and maximum filter center frequencies.
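
For reference, the ERB function of Glasberg and Moore (1990) is ERB(f) = 24.7(4.37f/1000 + 1) Hz, and the corresponding ERB-rate scale is E(f) = 21.4 log10(4.37f/1000 + 1). The sketch below shows how filter center frequencies might be spaced uniformly on this scale; it follows the published formulas, but the function names are ours, not those of the AIM source.

    #include <math.h>

    /* ERB-rate (in ERBs) corresponding to frequency f in Hz
       (Glasberg and Moore, 1990). */
    double hz_to_erb(double f)
    {
        return 21.4 * log10(4.37 * f / 1000.0 + 1.0);
    }

    /* Inverse mapping: frequency in Hz at a given ERB-rate value. */
    double erb_to_hz(double e)
    {
        return (pow(10.0, e / 21.4) - 1.0) * 1000.0 / 4.37;
    }

    /* Fill cf[0..nchan-1] with center frequencies spaced uniformly
       in ERBs between fmin and fmax (e.g. 75 channels, 100-6000 Hz). */
    void erb_space(double *cf, int nchan, double fmin, double fmax)
    {
        double e_lo = hz_to_erb(fmin), e_hi = hz_to_erb(fmax);
        for (int i = 0; i < nchan; i++)
            cf[i] = erb_to_hz(e_lo + (e_hi - e_lo) * i / (nchan - 1));
    }

With these formulas, the 75-channel, 100 to 6000 Hz configuration used for the figures corresponds to erb_space(cf, 75, 100.0, 6000.0).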

AIM provides both a functional auditory filter and a physiological auditory filter for generating the BMM: the former is a linear, gammatone filter (Patterson et al., 1992a); the latter is a non-linear, transmission-line filter (Giguère and Woodland, 1994a). The impulse response of the gammatone filter provides an excellent fit to the impulse response of primary auditory neurons in cats, and its amplitude characteristic is very similar to that of the 'roex' filter commonly used to represent the human auditory filter. The motivation for the gammatone filterbank and the available implementations are summarized in Patterson (1994a). The input wave is passed through an optional middle-ear filter adapted from Lutman and Martin (1979).
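
The gammatone filter has the standard impulse response g(t) = a t^(n-1) exp(-2πbt) cos(2πf_ct + φ), with order n = 4 and the bandwidth parameter b commonly set to 1.019 ERB(f_c). A direct-evaluation sketch follows (illustrative only; practical filterbanks, including AIM's, use efficient recursive implementations instead):

    #include <math.h>

    #define PI 3.14159265358979323846

    /* Sample the gammatone impulse response
       g(t) = t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)
       at rate fs, with order n = 4 and b = 1.019*ERB(fc).
       The gain constant a is omitted (no normalization). */
    void gammatone_ir(double *g, int nsamp, double fc, double fs)
    {
        double erb = 24.7 * (4.37 * fc / 1000.0 + 1.0);
        double b = 1.019 * erb;
        for (int i = 0; i < nsamp; i++) {
            double t = i / fs;
            g[i] = pow(t, 3.0) * exp(-2.0 * PI * b * t)
                              * cos(2.0 * PI * fc * t);
        }
    }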

In the physiological version, a 'wave digital filter' is used to implement the classical, one-dimensional, transmission-line approximation to cochlear hydrodynamics. A feedback circuit representing the fast motile response of the outer haircells generates level-dependent basilar membrane motion (Giguère and Woodland, 1994a). The filterbank generates combination tones of the type f1-n(f2-f1), which propagate to the appropriate channel, and it has the potential to generate cochlear echoes; for example, primaries at f1 = 1000 Hz and f2 = 1200 Hz give rise to combination tones at 800, 600 and 400 Hz, the first of which (n = 1) is the familiar cubic difference tone 2f1-f2. Options enable the user to customize the transmission-line filter by specifying the feedback gain and saturation level of the outer haircell circuit. The middle-ear filter forms an integral part of the simulation in this case. Together, it and the transmission-line filterbank provide a bi-directional model of auditory spectral analysis.

The upper panels of Figs. 2 and 3 show the responses of the two filterbanks to the vowel in 'hat'. They have 75 channels covering the frequency range 100 to 6000 Hz (3.3 to 30.6 ERBs). In the high-frequency channels, the filters are broad and the glottal pulses generate impulse responses which decay relatively quickly. In the low-frequency channels, the filters are narrow and so they resolve individual, continuous harmonics. The rightward skew in the low-frequency channels is the 'phase lag,' or 'propagation delay,' of the cochlea, which arises because the narrower low-frequency filters respond more slowly to input. The transmission-line filterbank shows more ringing in the valleys than the gammatone filterbank because of its dynamic signal compression; as amplitude decreases, the damping of the basilar membrane is reduced to increase sensitivity and frequency resolution.

B. The neural encoding stage

The second stage of AIM simulates the mechanical/neural transduction process performed by the inner haircells. It converts the BMM into a neural activity pattern (NAP), which is AIM's representation of the afferent activity in the auditory nerve. Two alternative simulations are provided for generating the NAP: a bank of two-dimensional adaptive-thresholding units (Holdsworth and Patterson, 1993), or a bank of inner haircell simulators (Meddis, 1988).

The adaptive thresholding mechanism is a functional representation of neural encoding. It begins by rectifying and compressing the BMM; then it applies adaptation in time and suppression across frequency. The adaptation and suppression are coupled, and they jointly sharpen features like vowel formants in the compressed BMM representation. Briefly, an adaptive threshold value is maintained for each channel and updated at the sampling rate. The new value is the largest of a) the previous value reduced by a fast-acting temporal decay factor, b) the previous value reduced by a longer-term temporal decay factor, c) the adapted level in the channel immediately above, reduced by a frequency spread factor, or d) the adapted level in the channel immediately below, reduced by the same frequency spread factor. The mechanism produces output whenever the input exceeds the adaptive threshold, and the output level is the difference between the input and the adaptive threshold. The parameters that control the spread of activity in time and frequency are options in AIM.
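
A per-sample sketch of the four-candidate update just described. The decay and spread constants are placeholders (in AIM their forms and rates are user options), and the rule that the threshold is raised to the input when the input exceeds it is our reading of adaptive thresholding, not a statement of the released code:

    /* One per-sample update of the adaptive-thresholding unit in
       channel ch.  thr[] holds the adaptive threshold of every
       channel; x is the rectified, compressed BMM sample for this
       channel.  The constants are illustrative placeholders. */
    double adapt_step(double *thr, int ch, int nchan, double x)
    {
        const double fast = 0.990;   /* a) fast-acting temporal decay, per sample */
        const double slow = 0.999;   /* b) longer-term temporal decay, per sample */
        const double spread = 0.90;  /* c), d) cross-channel spread factor        */

        double t = thr[ch] * fast;                        /* candidate a) */
        if (thr[ch] * slow > t) t = thr[ch] * slow;       /* candidate b) */
        if (ch + 1 < nchan && thr[ch + 1] * spread > t)   /* candidate c) */
            t = thr[ch + 1] * spread;
        if (ch > 0 && thr[ch - 1] * spread > t)           /* candidate d) */
            t = thr[ch - 1] * spread;

        double out = (x > t) ? (x - t) : 0.0;  /* output only above threshold */
        thr[ch] = (x > t) ? x : t;             /* threshold tracks the input  */
        return out;
    }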

The Meddis (1988) module simulates the operation of an individual inner haircell; specifically, it simulates the flow of neurotransmitter across three reservoirs that are postulated to exist in and around the haircell. The module reproduces important properties of single afferent fibres, such as two-component time adaptation and phase-locking. The transmitter flow equations are solved using the wave-digital-filter algorithm described in Giguère and Woodland (1994a). There is one haircell simulator for each channel of the filterbank. Options allow the user to shift the entire rate-intensity function to a higher or lower level, and to specify the type of fibre (medium or high spontaneous rate).
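
For orientation, a forward-Euler sketch of the three-reservoir transmitter flow. The reservoir structure follows Meddis (1988); the constants are representative values from that literature rather than AIM's defaults, and AIM itself solves the equations with the wave-digital-filter algorithm, not Euler steps:

    /* Forward-Euler sketch of the Meddis (1988) transmitter flow for
       one haircell.  q = free transmitter in the cell (maximum
       normalized to 1), c = contents of the synaptic cleft, w =
       reprocessing store; s is the instantaneous BMM amplitude
       driving the cell.  Initialize with q near 1 and c = w = 0. */
    typedef struct { double q, c, w; } haircell;

    double meddis_step(haircell *hc, double s, double dt)
    {
        const double A = 5.0, B = 300.0, g = 2000.0; /* membrane permeability */
        const double y = 5.05;                       /* factory replenishment */
        const double l = 2500.0, r = 6580.0;         /* cleft loss, re-uptake */
        const double x = 66.31;                      /* reprocessing rate     */
        const double h = 50000.0;                    /* firing-rate scalar    */

        double k = (s + A > 0.0) ? g * (s + A) / (s + A + B) : 0.0;

        hc->q += dt * (y * (1.0 - hc->q) + x * hc->w - k * hc->q);
        hc->c += dt * (k * hc->q - (l + r) * hc->c);
        hc->w += dt * (r * hc->c - x * hc->w);

        return h * hc->c * dt;  /* probability of an afferent spike in dt */
    }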

The middle panels in Figs. 2 and 3 show the NAPs obtained with adaptive thresholding and the Meddis module in response to the BMMs from the gammatone and transmission-line filterbanks of Figs. 2 and 3, respectively. The phase lag of the BMM is preserved in the NAP. The positive half-cycles of the BMM waves have been sharpened in time, an effect which is more obvious in the adaptive thresholding NAP. Sharpening is also evident in the frequency dimension of the adaptive thresholding NAP. The individual 'haircells' are not coupled across channels in the Meddis module, and thus there is no frequency sharpening in this case. The physiological NAP reveals that the activity between glottal pulses in the high-frequency channels is due to the strong sixth harmonic in the first formant of the vowel.

C. The temporal integration stage

Periodic sounds give rise to static, rather than oscillating, perceptions, indicating that temporal integration is applied to the NAP in the production of our initial perception of a sound -- our auditory image. Traditionally, auditory temporal integration is represented by a simple leaky integration process, and AIM provides a bank of lowpass filters to enable the user to generate auditory spectra (Patterson, 1994a) and auditory spectrograms (Patterson et al., 1992b). However, the leaky integrator removes the phase-locked fine structure observed in the NAP, and this conflicts with perceptual data indicating that the fine structure plays an important role in determining sound quality and source identification (Patterson, 1994b; Patterson and Akeroyd, 1995). As a result, AIM includes two modules which preserve much of the time-interval information in the NAP during temporal integration, and which produce a better representation of our auditory images. In the functional version of AIM, this is accomplished with strobed temporal integration (Patterson et al., 1992a,b); in the physiological version, it is accomplished with a bank of autocorrelators (Slaney and Lyon, 1990; Meddis and Hewitt, 1991).
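
For reference, a leaky integrator of the traditional kind is simply a one-pole lowpass filter applied to each NAP channel; a minimal sketch with an illustrative 10-ms time constant:

    #include <math.h>

    /* Traditional leaky temporal integration of one NAP channel: a
       one-pole lowpass filter y[n] = a*y[n-1] + (1 - a)*x[n].  It
       smooths away the phase-locked fine structure, which is what
       motivates the two alternatives described below.  The 10-ms
       time constant is an illustrative placeholder. */
    void leaky_integrate(const double *x, double *y, int n, double fs)
    {
        double a = exp(-1.0 / (0.010 * fs));  /* tau = 10 ms */
        double state = 0.0;
        for (int i = 0; i < n; i++) {
            state = a * state + (1.0 - a) * x[i];
            y[i] = state;
        }
    }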

In the case of strobed temporal integration (STI), a bank of delay lines is used to form a buffer store for the NAP, one delay line per channel, and as the NAP proceeds along the buffer it decays linearly with time, at about 2.5 %/ms. Each channel of the buffer is assigned a strobe unit which monitors activity in that channel, looking for local maxima in the stream of NAP pulses. When one is found, the unit initiates temporal integration in that channel; that is, it transfers a copy of the NAP at that instant to the corresponding channel of an image buffer and adds it point-for-point with whatever is already there. The local maximum itself is mapped to the 0-ms point in the image buffer. The multi-channel version of this STI process produces AIM's representation of our auditory image of a sound. Periodic and quasi-periodic sounds cause regular strobing which leads to simulated auditory images that are static, or nearly static, and which have the same temporal resolution as the NAP. Dynamic sounds are represented as a sequence of auditory image frames. If the rate of change in a sound is not too rapid, as in diphthongs, features are seen to move smoothly as the sound proceeds, much as characters move smoothly in animated cartoons.
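
A single-channel sketch of the strobe-and-add operation just described. The three-point local-maximum test is a deliberate simplification of AIM's strobe criterion; only the 2.5 %/ms buffer decay comes from the text:

    /* Strobed temporal integration for one channel.  nap[] holds the
       channel's NAP at sampling rate fs; image[] is the corresponding
       channel of the image buffer, indexed by the time interval (in
       samples) back from the strobe point.  At each local maximum, a
       decayed copy of the preceding NAP is added into the image with
       the maximum mapped to interval 0. */
    void sti_channel(const double *nap, int n, double *image, int img_len,
                     double fs)
    {
        double decay = 0.025 * (1000.0 / fs);  /* 2.5 % per ms, per sample */
        for (int t = 1; t + 1 < n; t++) {
            if (nap[t] > nap[t - 1] && nap[t] >= nap[t + 1]) { /* strobe */
                for (int lag = 0; lag < img_len && lag <= t; lag++) {
                    double w = 1.0 - decay * lag;  /* linear buffer decay */
                    if (w > 0.0)
                        image[lag] += w * nap[t - lag];
                }
            }
        }
    }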

An alternative form of temporal integration is provided by the correlogram (Slaney and Lyon, 1990; Meddis and Hewitt, 1991). It extracts periodicity information and preserves intra-period fine structure by autocorrelating each channel of the NAP. The correlogram is the multi-channel version of this process. It was originally introduced as a model of pitch perception (Licklider, 1951), with a neural wiring diagram to illustrate that it was physiologically plausible. To date, however, there is no physiological evidence for autocorrelation in the auditory system, and the installation of the module in the physiological route was a matter of convenience. The current implementation is a recursive, or running, autocorrelation. A functionally equivalent FFT-based method is also provided (Allerhand and Patterson, 1992). A comparison of the correlogram in the bottom panel of Fig. 3 with the auditory image in the bottom panel of Fig. 2 shows that the vowel structure is more symmetric in the correlogram and there are larger level contrasts in the correlogram. It is not yet known whether one of the representations is more realistic or more useful. The present purpose is to note that the software package can be used to compare auditory representations in a way not previously possible.
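
A sketch of one channel of a recursive, running autocorrelation: each lag bin keeps a leaky average of the product of the NAP with a delayed copy of itself. The 10-ms integration constant is an illustrative placeholder:

    #include <math.h>

    /* Running autocorrelation of one NAP channel.  acf[lag] holds a
       leaky average of nap[t]*nap[t-lag], so periodicity appears as
       peaks at multiples of the period while intra-period fine
       structure is preserved. */
    void running_acf(const double *nap, int n, double *acf, int nlags,
                     double fs)
    {
        double a = exp(-1.0 / (0.010 * fs));  /* tau = 10 ms, illustrative */
        for (int lag = 0; lag < nlags; lag++)
            acf[lag] = 0.0;
        for (int t = 0; t < n; t++)
            for (int lag = 0; lag < nlags && lag <= t; lag++)
                acf[lag] = a * acf[lag] + (1.0 - a) * nap[t] * nap[t - lag];
    }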

II. THE SOFTWARE/HARDWARE PLATFORM

i. The software package: The code is distributed as a compressed archive (in unix tar format), and can be obtained via ftp from the address: ftp.mrc-apu.cam.ac.uk (Name=anonymous; Password=<your email address>). All the software is contained in a single archive: pub/aim/aim.tar.Z. The associated text file pub/aim/ReadMe contains instructions for installing and compiling the software. The AIM package consists of a makefile and several sub-directories. Five of these (filter, glib, model, stitch and wdf) contain the C code for AIM. An aim/tools directory contains C code for ancillary software tools, which are provided for pre/post-processing of model input/output; the functions offered include stimulus generation, signal processing, and data manipulation. An aim/man directory contains on-line manual pages describing AIM and the software tools. An aim/scripts directory contains demonstration scripts for a guided tour through the model. Sounds used to test and demonstrate the model are provided in the aim/waves directory. These sounds were sampled at 20 kHz, and each sample is a 2-byte number in little-endian byte order; a tool is provided to swap byte order when necessary.
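
On a big-endian host, the 16-bit samples must be byte-swapped before use. The operation is the standard one; this sketch is ours, not the distributed tool:

    #include <stddef.h>
    #include <stdint.h>

    /* Swap the byte order of 16-bit samples in place, as required when
       reading the little-endian aim/waves files on a big-endian host. */
    void swab16(int16_t *buf, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            uint16_t u = (uint16_t)buf[i];
            buf[i] = (int16_t)(uint16_t)((u << 8) | (u >> 8));
        }
    }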

ii. System requirements: The software is written in C. The code generated by the native C compilers included with Ultrix (version 4.3a and above) and SunOS (version 4.1.3 and above) has been extensively tested. The code from the GNU C compiler (version 2.5.7 and above) is also reliable. The total disc usage of the AIM source code is about 700 kbytes. The package also includes 500 kbytes of sources for ancillary software tools, and 200 kbytes of documentation. The executable programs occupy about 1000 kbytes, and the executable programs for the ancillary tools occupy 7000 kbytes. About 800 kbytes of temporary space are required for object files during compilation. The graphical interface uses X11 (R4 and above) with either the OpenWindows or Motif user interface. The programs can be compiled using the base Xlib library (libX11.a), and will run on both 1-bit (mono) and multi-plane (colour or greyscale) displays.

iii. Compilation and operation: The makefile includes targets to compile the source code for AIM and the associated tools on a range of machines (DEC, SUN, SGI, HP); the targets differ only in the pathnames for the local X11 base library (libX11.a) and header files (X11/X.h and X11/Xlib.h). AIM can be compiled without the display code if the graphics interface is not required or if X11 is not available (make noplot). The executable for AIM is called gen. Compilation also generates symbolic links to gen, such as genbmm, gennap and gensai, which are used to select the desired output (BMM, NAP or SAI). The links and the executables for the aim/tools are installed in the aim/bin directory after compilation. Options are specified as name=value pairs on the command line; unspecified options are assigned default values. The model output takes the form of binary data routed by default to the model's graphical displays. Output can also be routed to plotting hardware or other post-processing software.

III. APPLICATIONS AND SUMMARY

In hearing research, the functional version of AIM has been used to model phase perception (Patterson, 1987), octave perception (Patterson et al., 1993), and timbre perception (Patterson, 1994b). The physiological version has been used to simulate cochlear hearing loss (Giguère, Woodland, and Robinson, 1993; Giguère and Woodland, 1994b), and combination tones of cochlear origin (Giguère, Kunov, and Smoorenburg, 1995). In speech research, the functional version has been used to explain syllabic stress (Allerhand et al., 1992), and both versions have been used as preprocessors for speech recognition systems (e.g. Patterson, Anderson, and Allerhand, 1994; Giguère et al., 1993). In summary, the AIM software package provides a modular architecture for time-domain computational studies of peripheral auditory processing.

* Instructions for acquiring the software package electronically are presented in Section II. This document refers to AIM R7, which is the first official release.

ACKNOWLEDGEMENTS

The gammatone filterbank, adaptive thresholding, and much of the software platform were written by John Holdsworth; the options handler is by Paul Manson, and the revised STI module by Jay Datta. Michael Akeroyd extended the PostScript facilities and developed the xreview routine for auditory image cartoons. The software development was supported by grants from DRA Farnborough (U.K.), Esprit BR 3207 (EEC), and the Hearing Research Trust (U.K.). We thank Malcolm Slaney and Michael Akeroyd for helpful comments on an earlier version of the paper.

Allerhand, M., and Patterson, R.D. (1992). "Correlograms and auditory images," Proc. Inst. Acoust. 14, 281-288.

Allerhand, M., Butterfield, S., Cutler, A., and Patterson, R.D. (1992). "Assessing syllable strength via an auditory model," Proc. Inst. Acoust. 14, 297-304.

Assmann, P.F., and Summerfield, Q. (1990). "Modelling the perception of concurrent vowels: Vowels with different fundamental frequencies," J. Acoust. Soc. Am. 88, 680-697.

Brown, G.J., and Cooke, M. (1994). "Computational auditory scene analysis," Computer Speech and Language 8, 297-336.

Giguère, C., Woodland, P.C., and Robinson, A.J. (1993). "Application of an auditory model to the computer simulation of hearing impairment: Preliminary results," Can. Acoust. 21, 135-136.

Giguère, C., and Woodland, P.C. (1994a). "A computational model of the auditory periphery for speech and hearing research. I. Ascending path," J. Acoust. Soc. Am. 95, 331-342.

Giguère, C., and Woodland, P.C. (1994b). "A computational model of the auditory periphery for speech and hearing research. II. Descending paths," J. Acoust. Soc. Am. 95, 343-349.

Giguère, C., Kunov, H., and Smoorenburg, G.F. (1995). "Computational modelling of psycho-acoustic combination tones and distortion-product otoacoustic emissions," 15th Int. Cong. on Acoustics, Trondheim (Norway), 26-30 June.

Glasberg, B.R., and Moore, B.C.J. (1990). "Derivation of auditory filter shapes from notched-noise data," Hear. Res. 47, 103-138.

Greenwood, D.D. (1990). "A cochlear frequency-position function for several species - 29 years later," J. Acoust. Soc. Am. 87, 2592-2605.

Holdsworth, J.W., and Patterson, R.D. (1991). "Analysis of waveforms," UK Patent No. GB 2-234-078-A (23.1.91). London: UK Patent Office.

Licklider, J.C.R. (1951). "A duplex theory of pitch perception," Experientia 7, 128-133.

Lutman, M.E., and Martin, A.M. (1979). "Development of an electroacoustic analogue model of the middle ear and acoustic reflex," J. Sound Vib. 64, 133-157.

Meddis, R. (1988). "Simulation of auditory-neural transduction: Further studies," J. Acoust. Soc. Am. 83, 1056-1063.

Meddis, R., and Hewitt, M.J. (1991). "Modelling the perception of concurrent vowels with different fundamental frequencies," J. Acoust. Soc. Am. 91, 233-245.

Patterson, R.D. (1987). "A pulse ribbon model of monaural phase perception," J. Acoust. Soc. Am. 82, 1560-1586.

Patterson, R.D. (1994a). "The sound of a sinusoid: Spectral models," J. Acoust. Soc. Am. 96, 1409-1418.

Patterson, R.D. (1994b). "The sound of a sinusoid: Time-interval models," J. Acoust. Soc. Am. 96, 1419-1428.

Patterson, R.D., and Akeroyd, M.A. (1995). "Time-interval patterns and sound quality," in Advances in Hearing Research: Proceedings of the 10th International Symposium on Hearing, edited by G. Manley, G. Klump, C. Köppl, H. Fastl, and H. Oeckinghaus (World Scientific, Singapore), in press.

Patterson, R.D., Anderson, T., and Allerhand, M. (1994). "The auditory image model as a preprocessor for spoken language," in Proc. Third ICSLP, Yokohama, Japan, 1395-1398.

Patterson, R.D., Milroy, R., and Allerhand, M. (1993). "What is the octave of a harmonically rich note?" in Proc. 2nd Int. Conf. on Music and the Cognitive Sciences, edited by I. Cross and I. Deliège (Harwood, Switzerland), 69-81.

Patterson, R.D., and Moore, B.C.J. (1986). "Auditory filters and excitation patterns as representations of frequency resolution," in Frequency Selectivity in Hearing, edited by B.C.J. Moore (Academic, London), pp. 123-177.

Patterson, R.D., Holdsworth, J., and Allerhand, M. (1992b). "Auditory models as preprocessors for speech recognition," in The Auditory Processing of Speech: From the Auditory Periphery to Words, edited by M.E.H. Schouten (Mouton de Gruyter, Berlin), 67-83.

Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C., and Allerhand, M. (1992a). "Complex sounds and auditory images," in Auditory Physiology and Perception, edited by Y. Cazals, L. Demany, and K. Horner (Pergamon, Oxford), 429-446.

Slaney, M., and Lyon, R.F. (1990). "A perceptual pitch detector," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Albuquerque, New Mexico, April 1990.

Figure 1. The three-stage structure of the AIM software package. Left-hand column: functional route; right-hand column: physiological route. For each module, the figure shows the function (bold type), the implementation (in the rectangle), and the simulation it produces (italics).

Figure 2. Responses of the model to the vowel in 'hat' processed through the functional route: (top) basilar membrane motion, (middle) neural activity pattern, and (bottom) auditory image.

Figure 3. Responses of the model to the vowel in 'hat' processed through the physiological route: (top) basilar membrane motion, (middle) neural activity pattern, and (bottom) autocorrelogram image.