comparison docs/aimStrobeCriterion @ 0:5242703e91d3 tip

Initial checkin for AIM92 aimR8.2 (last updated May 1997).
author tomwalters
date Fri, 20 May 2011 15:19:45 +0100
1 docs/aimStrobeCriterion (text)
2 scripts/aimStrobeCriterion (figures)
3
4
5 STROBED TEMPORAL INTEGRATION AND THE STABILISED AUDITORY IMAGE
6
7 Roy D. Patterson, Jay Datta and Mike Allerhand
8 MRC Applied Psychology Unit
9 15 Chaucer Road, Cambridge, CB2 2EF UK
10
11 email: roy.patterson, jay.datta or mike.allerhand @mrc-apu.cam.ac.uk
12
13 2 August 1995
14
15
16 ABSTRACT
17
18 This document describes the Strobed Temporal Integration
19 mechanism used to convert neural activity patterns into stabilised
20 auditory images. The specific version of the Auditory Image Model is
21 AIM R7, as described in Patterson, Allerhand, and Giguere (1995).
22
23
24
25 INTRODUCTION
26
27 When a periodic sound occurs with a pitch in the musical
28 range, the cochlea produces a detailed, multi-channel, time-interval
29 pattern that repeats once per cycle of the wave. The auditory images
30 that we hear in response to periodic sounds are perfectly stable.
31 That is, despite the fact that the level of activity in the neural
32 activity pattern is fluctuating over a large range within the course
33 of each cycle, the loudness of the sound is fixed. This indicates
34 that some form of temporal integration is applied to the NAP prior to
35 our initial perception of the sound. The auditory images of periodic
36 sounds can have a very rich timbre, or sound quality, that can reveal
37 a great deal about the sound source such as the quality of the musical
38 instrument or the finesse of the musician. This suggests that much of
39 the detailed time-interval information produced by the cochlea is
40 preserved in the stabilised auditory image.
41
42 The fact that we hear stable auditory images with rich sound
43 quality presents auditory theorists with a problem. The temporal
44 integration mechanism in traditional auditory models is a low-pass
45 filter that removes the fine-grain time-interval information from the
46 internal representation of the sound -- time interval information that
47 appears to be required for timbre perception. Strobed temporal
48 integration was introduced to solve this problem. At one and the same
49 time, it performs the temporal integration necessary to produce stable
50 auditory images and it preserves the majority of the time-interval
51 information observed in the neural activity pattern (NAP) produced by
52 the cochlea.
53
54 It is not a difficult problem to produce a high-resolution,
55 stabilised version of the NAP provided you know the moment in time at
56 which the pattern in the NAP will repeat. For example, consider the
57 NAP of the first note of the wave CEGC in Figure 0.1 from Patterson et
58 al. (1992). The wave is a train of clicks separated by 8-ms gaps; the
59 upper channels of the NAP show that the response is a sequence of
60 filter impulse responses spaced at 8 ms intervals. A stabilised
61 representation of the NAP can be produced by setting up an image
62 buffer that has the same number of channels as the NAP, and simply
63 transferring a copy of the pattern in each channel of the NAP to the
64 corresponding channel of the image buffer once every 8 ms. In the
65 NAP, the pattern flows from right to left as time progresses, and
66 since the cycles are continually entering the NAP from the right hand
67 side and exiting the NAP from the left hand side, the pattern after
68 every 8 ms is identical to the pattern 8 ms ago. So if the transfer
69 from the NAP to the auditory image is performed every 8 ms exactly,
70 successive contributions from the NAP to the image are all identical.
71
72 In the image buffer, activity does not move from right to
73 left, it simply decays into the floor exponentially over time with a
74 half life of about 30 ms. When a new contribution arrives from the
75 NAP, it is added point for point with whatever is currently in the
76 corresponding channel of the image buffer. In the current example,
77 after a copy of the NAP arrives in the auditory image, and during the
78 30 ms over which it would decay to half its original value, three more
79 copies of the NAP pattern arrive and are added into the auditory
80 image. Thus, for typical musical notes and typical vowels, the rate
81 of temporal integration from the NAP into the auditory image is high
82 and there is little time between successive integration events for the
83 image itself to decay. This is the source of the stability of the
84 auditory image.
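
The arithmetic behind this stability can be sketched in a few lines of
Python. This is only an illustration, not AIM code: it assumes that each
transfer adds a contribution of unit height at a given point in the image,
and that the image decays exponentially with the 30-ms half life quoted
above. The function names are hypothetical.

```python
HALF_LIFE_MS = 30.0  # image half life quoted in the text
PERIOD_MS = 8.0      # period of the click train in the example


def decay_over(interval_ms, half_life_ms=HALF_LIFE_MS):
    """Fraction of image activity that survives interval_ms of decay."""
    return 0.5 ** (interval_ms / half_life_ms)


def image_level_after(n_strobes, period_ms=PERIOD_MS):
    """Level at one image point just after the n-th unit contribution."""
    level = 0.0
    for _ in range(n_strobes):
        # decay since the previous transfer, then add the new copy
        level = level * decay_over(period_ms) + 1.0
    return level


def steady_state_level(period_ms=PERIOD_MS):
    """Limit of image_level_after as the number of strobes grows."""
    return 1.0 / (1.0 - decay_over(period_ms))


def copies_within_half_life(period_ms=PERIOD_MS):
    """Number of further copies that arrive before a copy has halved."""
    return int(HALF_LIFE_MS // period_ms)
```

Under these assumptions the level at a peak settles at a little under six
times the height of a single contribution, and between strobes it falls by
only about 17 percent, which is why the image appears steady.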
85
86 Provided the integration is performed once per cycle of the
87 sound, the majority of the time-interval information in the NAP will
88 be preserved in the auditory image, thereby providing a solution to
89 the problem of how to produce stable images without removing the
90 fine-grain time-interval information associated with sound quality.
91 The auditory image produced by this process is shown in Figure 0.2
92 from Patterson et al. (1992). The transfer is performed on each
93 channel of the NAP separately and it is performed at the point in the
94 cycle where the activity in the NAP is a maximum. The maximum of the
95 most recent cycle to arrive in the NAP is added into the auditory
96 image at the 0-ms point, and as a result, the NAP peaks are aligned
97 vertically in the auditory image. This passive alignment process
98 explains the loss of global phase information observed empirically
99 (see Patterson, 1987, for a review).
100
101 Thus it would appear that the problem of converting the
102 oscillating NAP into a stabilised, high-resolution image reduces to
103 the problem of finding the pitch of the sound and performing temporal
104 integration at multiples of the pitch period. There are now a number
105 of computational auditory models with a proven ability to extract the
106 pitch of complex sounds (see Brown and Cooke, 1994, for a review) and
107 they could be used to direct strobed temporal integration. However,
108 experiments with vowels (McKeown and Patterson, 1995; Robinson and
109 Patterson, 1995a) and musical notes (Robinson and Patterson, 1995b)
110 indicate that 4 to 8 cycles of the sound are required to produce an
111 accurate estimate of the pitch, whereas the sound quality information
112 necessary to identify a vowel or a musical instrument can be extracted
113 from one cycle of the wave. This suggests that, if the auditory
114 system does use strobed temporal integration to produce a stable, high
115 resolution auditory image, it does it with a mechanism that operates
116 more locally in time than pitch extraction mechanisms. This is the
117 background that led to the development of the strobed temporal
118 integration mechanism in the auditory image model.
119
120 In Sections 1 and 2 of this document, following Allerhand and
121 Patterson (1992), we describe two simple criteria for selecting strobe
122 points in the NAP and show that they produce auditory images that are
123 very similar to the correlograms produced by Assmann and Summerfield
124 (1990), Slaney and Lyon (1990), or Meddis and Hewitt (1991a, 1991b).
125 The structures that arise in this form of auditory image are much more
126 symmetric than the corresponding structures in the NAP (Allerhand and
127 Patterson, 1992). There is mounting evidence, however, that the
128 auditory system is highly sensitive to temporal asymmetry (Patterson,
129 1994a, 1994b; Akeroyd and Patterson, 1995; Irino and Patterson, 1996),
130 and so the loss of asymmetry associated with the simple strobe
131 criterion seems likely to limit the value of this representation of
132 our perceptions. In the remaining Sections, an ordered sequence of
133 restrictions is added to the simple criteria for initiating temporal
134 integration, to restore asymmetry to the structures that arise in the
135 auditory image.
136
137
138 1. Strobe on Every Non-Zero Point in the NAP.
139
140 The initial criterion is very simple; temporal integration is
141 initiated on each and every non-zero point in the NAP. In AIM
142 software, the option that determines which strobe criterion will be
143 used is 'stcrit_ai' and it is set equal to one for this simplest
144 strobe criterion. Allerhand and Patterson (1992) showed that when
145 temporal integration from the NAP to the auditory image is initiated
146 on each and every non-zero point in the NAP function, the result is
147 very similar to a correlogram -- a representation that is commonly
148 used in time-domain models of hearing to extract the pitch of complex
149 periodic sounds (see Brown and Cooke, 1994, for a review). For
150 example, compare the auditory image with stcrit_ai=1 (Figure 1.1) and
151 the correlogram (Figure 1.2) of the first note of the sound CEGC.
152 Both figures show stabilised representations of the time-interval
153 pattern that the sound produces in the NAP, and in both cases, the
154 individual channels have been aligned vertically on the largest peak
155 in the NAP function. The patterns in the auditory image and the
156 correlogram both differ from the pattern in the NAP in one important
157 way; there is a reflection of the NAP pulses associated with the
158 ringing of the auditory filters, on the side opposite to where they
159 originally appear. That is, autocorrelation and STI with stcrit_ai=1
160 reduce the temporal asymmetry observed in the NAP. The asymmetry
161 information is not entirely removed but it is largely removed.
162 Experiments with sounds that have asymmetric temporal modulation show
163 that listeners are sensitive to temporal asymmetry (Patterson, 1994a,
164 1994b; Akeroyd and Patterson, 1995; Irino and Patterson, 1996), and so
165 the removal of asymmetry information seems likely to prove a
166 disadvantage when attempting to explain auditory perception.
167
168 The autocorrelation process is symmetric in time by its very
169 nature. Mechanical processes that produce sound in the world are
170 typically asymmetric in time because they usually have some inertia.
171 Resonators struck impulsively ring after the pulse and not before.
172 This principle also applies to the processes that analyse the sound in
173 the auditory system. The impulse response of the auditory filter
174 rises faster than it falls; the adaptation process in the inner
175 hair cell adapts up faster at the onset of a sound than it adapts down
176 after the sound passes. So asymmetry is the norm in the world and it
177 is not surprising that the auditory system is sensitive to it.
178
179
180 2. Strobe on the Peak of Each NAP pulse.
181
182 When temporal integration is initiated on every non-zero NAP
183 point, the successive NAP functions that are transferred to the
184 auditory image are highly correlated. This suggests that we could
185 attain essentially the same auditory image for vastly less computation
186 by restricting temporal integration to the larger points on the
187 individual NAP pulses. This leads, in turn, to the suggestion that
188 temporal integration be limited to the peak of the individual NAP
189 pulses. The result of this restriction is illustrated in Figure 2.1
190 which shows the auditory image of the first note of CEGC with this
191 more restricted strobe criterion. Since the peak restriction greatly
192 reduces the rate of temporal integration, the absolute levels of
193 structures in this form of auditory image are considerably lower than
194 those in the previous form of image. The pattern of time intervals,
195 however, is very similar in the two forms of auditory image. They
196 both preserve a detailed representation of the time-interval pattern
197 in the NAP, and they both lose much of the asymmetry in the NAP.
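
The difference between the first two criteria can be illustrated with a
short Python sketch. It is not the AIM implementation; it assumes that one
channel of the NAP is available as a list of non-negative samples, and the
function names are hypothetical.

```python
def strobe_every_nonzero(nap):
    """Criterion 1: every non-zero NAP sample is a strobe point."""
    return [i for i, value in enumerate(nap) if value > 0]


def strobe_on_peaks(nap):
    """Criterion 2: only the peak of each NAP pulse is a strobe point.

    A peak is taken to be a positive sample that is larger than the
    sample before it and at least as large as the sample after it, so
    a flat-topped pulse yields a single strobe.
    """
    peaks = []
    for i in range(1, len(nap) - 1):
        if nap[i] > 0 and nap[i] > nap[i - 1] and nap[i] >= nap[i + 1]:
            peaks.append(i)
    return peaks


# Two NAP pulses: six non-zero samples but only two peaks.
example_nap = [0, 2, 5, 3, 0, 0, 1, 2, 1, 0]
print(strobe_every_nonzero(example_nap))
print(strobe_on_peaks(example_nap))
```

In this toy example the peak criterion cuts the number of integration
events from six to two while retaining the time of each pulse, which is
the saving in computation described above.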
198
199
200 3. Avoid Strobing in the Temporal Shadow after a large NAP Pulse.
201
202 The loss of asymmetry in the click-train structure of the
203 auditory image arises when temporal integration is initiated on the
204 smaller NAP pulses associated with the ringing of the auditory filters
205 after each click in the train. This can be demonstrated by
206 introducing a fixed strobe threshold below which NAP peaks do not
207 initiate temporal integration, and progressively raising this strobe
208 threshold to exclude more and more of the lower level NAP pulses. (In
209 AIM, a fixed threshold is set with option stthresh_ai and
210 stcrit_ai=1.) The auditory image becomes less and less symmetric and
211 more and more like the original NAP pattern for the click train as the
212 strobe threshold is increased. Fixed thresholds of this sort are not
213 realistic for simulating the operation of the auditory system, firstly
214 because the strobe threshold eventually exceeds the largest NAP pulse
215 and temporal integration ceases entirely, and secondly because, in the
216 natural environment, the levels of sounds are constantly changing.
217 Nevertheless, the example illustrates how NAP asymmetry is lost with
218 simple strobe criteria. The problem with autocorrelation is similar;
219 the correlation values at lags associated with the smaller NAP pulses
220 introduce symmetric reflections into the structures that appear in the
221 correlogram.
222
223 An alternative means of restricting temporal integration to
224 the larger pulses in the NAP of the click train is to use an adaptive
225 strobe threshold which is temporally asymmetric. In the simplest
226 case, when the strobe unit monitoring a NAP channel encounters a
227 pulse, strobe threshold is set to the full height of the NAP pulse.
228 But following the peak, threshold does not fall as fast as the NAP
229 function, rather it is restricted to decaying at a fixed percentage of
230 the peak height per ms. In AIM, the rate of decay is set to 5% per
231 ms, so the threshold decays faster after larger peaks, and in the
232 absence of further NAP peaks, returns to 0 in 20 ms. The NAP function
233 for the 1.0-kHz channel of the NAP is presented in Figure 3.1 along
234 with the adaptive threshold function. Together they illustrate what is
235 referred to as the "temporal shadow criterion" for strobed temporal
236 integration.
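
A minimal Python sketch of the temporal shadow criterion follows. It is
an illustration under stated assumptions rather than the AIM code: the NAP
channel is a list of samples taken every dt_ms milliseconds, peaks are
found with a simple three-point test, and the threshold falls linearly at
5% of the last strobe peak per ms, as described above. The function name
and argument names are hypothetical.

```python
def temporal_shadow_strobes(nap, dt_ms=1.0, decay_fraction_per_ms=0.05):
    """Return (strobe indices, threshold trace) for one NAP channel."""
    threshold = 0.0
    slope = 0.0          # threshold drop per ms, fixed by the last strobe
    strobes = []
    trace = []
    for i, value in enumerate(nap):
        is_peak = (0 < i < len(nap) - 1
                   and value > nap[i - 1]
                   and value >= nap[i + 1])
        if is_peak and value > threshold:
            strobes.append(i)
            threshold = value                     # jump to the peak height
            slope = decay_fraction_per_ms * value
        trace.append(threshold)
        # linear decay: reaches zero 20 ms after the peak at 5% per ms
        threshold = max(0.0, threshold - slope * dt_ms)
    return strobes, trace


# Click-train-like channel: a large pulse every 8 ms followed by two
# smaller "ringing" pulses that fall inside the temporal shadow.
nap = [0, 10, 0, 6, 0, 3, 0, 0] * 3
strobe_points, _ = temporal_shadow_strobes(nap)
print(strobe_points)
```

With this input only the large pulse in each 8-ms cycle initiates
integration, so the strobes are locked to the period of the sound without
any explicit estimate of its pitch.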
237
238 In the figure, the vertical lines below the abscissa of the
239 NAP function mark the NAP pulses that initiate temporal integration.
240 They show that the first NAP pulse strobes temporal integration and
241 strobe threshold is set to the peak height. It immediately begins to
242 decay, but then it encounters another NAP pulse that exceeds strobe
243 threshold and so the process of strobing temporal integration and
244 raising strobe threshold is promptly repeated. At this point,
245 however, strobe threshold is high relative to the NAP pulses, and
246 strobe threshold is falling more slowly than the NAP pulses, so the
247 algorithm proceeds through the rest of the cycle without encountering
248 another NAP pulse from the ringing part of the NAP function. In this
249 way, the strobe mechanism is synchronised to the period of the sound
250 even though no explicit information about the pitch of the sound is
251 provided to the strobe mechanism. It is the auditory image with the
252 temporal shadow criterion that was presented originally in Figure
253 0.2 (stcrit_ai=3).
254
255 The 'temporal shadow criterion' produces stable auditory
256 images with accurate asymmetry for a wide variety of naturally
257 occurring sounds like vowels and musical notes. The reason is that
258 the NAPs of these sounds have a restricted range of periods and within
259 those periods the asymmetry is typically characterised by the
260 rapid-rise/slow-fall form. There are, however, periodic sounds with
261 very low pitch and NAP functions that rise slowly over the course of
262 the period and fall rapidly at the end of the period, and the
263 perceptions produced by these sounds indicate that the auditory strobe
264 mechanism is somewhat more sophisticated than the temporal shadow
265 strobe mechanism. These "ramped" sounds are the subject of the next
266 section.
267
268
269 4. Avoid Temporal Integration on NAP Peaks Followed by Larger NAP Peaks.
270
271 A pair of sounds that illustrate the limitations of the
272 temporal shadow criterion are presented in Figures 4.1a and 4.2a; the
273 former is an exponentially damped sinusoid that repeats every 25 ms,
274 the latter is an exponentially ramped sinusoid with the same envelope
275 period. The carrier frequency in this case is 800 Hz and the half
276 life of the exponential is 4 ms. The half life is on the same order
277 as the exponential decay of the impulse response of a gammatone
278 auditory filter with a centre frequency in the region of 800 Hz. The
279 example is taken from Patterson (1994a).
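
For readers who wish to reproduce stimuli of this kind, the following
Python sketch generates a damped sinusoid with the parameters given above
(800-Hz carrier, 25-ms envelope period, 4-ms half life). It is not taken
from the AIM scripts. The 20-kHz sampling rate is an assumption, as is the
construction of the ramped sound as a time-reversed copy of the damped
sound, and the function names are hypothetical.

```python
import math


def damped_envelope(t_ms, period_ms=25.0, half_life_ms=4.0):
    """Exponential envelope that restarts at 1.0 every period_ms."""
    return 0.5 ** ((t_ms % period_ms) / half_life_ms)


def damped_sinusoid(duration_ms, fs_hz=20000.0, carrier_hz=800.0,
                    period_ms=25.0, half_life_ms=4.0):
    """Samples of an exponentially damped sinusoid (assumed fs_hz)."""
    n_samples = int(round(duration_ms * fs_hz / 1000.0))
    samples = []
    for i in range(n_samples):
        t_ms = i * 1000.0 / fs_hz
        t_in_cycle_ms = t_ms % period_ms   # carrier restarts each cycle
        carrier = math.sin(2.0 * math.pi * carrier_hz * t_in_cycle_ms / 1000.0)
        samples.append(damped_envelope(t_ms, period_ms, half_life_ms) * carrier)
    return samples


def ramped_sinusoid(duration_ms, **kwargs):
    """Ramped sinusoid, built here as the damped sound reversed in time."""
    return damped_sinusoid(duration_ms, **kwargs)[::-1]
```

Because 800 Hz gives exactly 20 carrier cycles in each 25-ms envelope
period, the carrier phase is the same at the start of every cycle, and the
two sounds share the same long-term energy spectrum while differing in
temporal asymmetry.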
280
281 The neural activity patterns produced by the damped and ramped
282 sinusoids are shown in Figures 4.1b and 4.2b, respectively. The
283 frequency range of the filterbank is from an octave below the carrier
284 frequency to an octave above the carrier frequency. The highest and
285 lowest channels in Figure 4.1b show the transient response of the
286 filterbank to the onset of the damped sinusoid, and similarly the
287 high- and low-frequency channels in Figure 4.2b show the transient
288 response of the filterbank to the offset of the ramped sinusoid. In
289 the high-frequency channels, the onset response of the damped sinusoid
290 and the offset response of the ramped sinusoid are composed of impulse
291 responses from the individual auditory filters. The centre section of
292 each figure shows the response to the carrier. Here we see that the
293 asymmetry in the waveform is preserved in the NAP: in Figure 4.1b, the
294 carrier component is at its highest level just as the transient
295 response ends and the carrier component decays away over the course of
296 the period; in Figure 4.2b, the carrier activity rises over the course
297 of the ramped cycle and ends at its peak level in the transient
298 response.
299
300 Auditory images of these damped and ramped sinusoids are
301 presented in Figures 4.3 and 4.4, respectively. The upper rows show
302 the images obtained when the strobe initiates temporal integration on
303 every peak in the NAP; the middle rows show the images obtained with
304 the temporal shadow criterion. The images in the upper row illustrate
305 the problem of preserving NAP asymmetry during temporal integration.
306 When the mechanism strobes on every peak, the temporal asymmetry
307 observed in the NAP of the damped sinusoid is actually reversed in the
308 auditory image of the damped sinusoid (Figure 4.3a). In the case of
309 the ramped sinusoid, the asymmetry observed in the NAP is largely lost
310 in the image of the ramped sinusoid (Figure 4.4a); there is activity
311 at all time intervals in the central channels, whereas there is a gap
312 in activity in the NAP of the ramped sinusoid, once per cycle, just
313 after the abrupt reduction in amplitude. It is also the case that
314 there are irregular fringes along the edges of the main structure in
315 the auditory image of the ramped sinusoid (Figure 4.4a). This
316 provides further evidence that the time interval pattern in the NAP is
317 being disrupted by the temporal integration process in the
318 construction of the auditory image.
319
320 The introduction of the temporal shadow criterion for
321 initiating temporal integration produces a dramatic improvement in the
322 auditory image of the damped sinusoid (Figure 4.3b). The structure in
323 the image is highly asymmetric and, once the alignment process is
324 taken into account, the structure in the image is seen to be a very
325 faithful reproduction of that in the NAP. The imposition of the
326 temporal shadow criterion improves the auditory image of the ramped
327 sound (Figure 4.4b) inasmuch as it eliminates the fringes seen in
328 Figure 4.4a. But it does not solve the asymmetry problem. The
329 structure in the auditory image of Figure 4.4b is still more symmetric
330 than it is asymmetric, whereas the structure in the corresponding NAP
331 is highly asymmetric.
332
333 The source of the problem is illustrated in Figures 4.5a and
334 4.6a which show the NAPs and adaptive thresholds for 80-ms segments of
335 the damped and ramped sinusoids, respectively. The vertical markers
336 below the abscissa in Figure 4.5a show that after the first cycle, the
337 strobe mechanism is synchronised to the period of the wave and
338 initiates temporal integration once per cycle on the largest NAP peak.
339 So this criterion preserves the asymmetry of the damped sound in its
340 auditory image. In contrast, Figure 4.6a shows that on the way up the
341 ramped portion of each cycle, the rising NAP pulses repeatedly exceed
342 the adaptive threshold resulting in repeated initiation of temporal
343 integration. Since, in this region of the cycle, the mechanism
344 initiates temporal integration on every carrier cycle, the auditory image does
345 not preserve the asymmetry observed in the corresponding NAP. The
346 irregular fringe is reduced because the mechanism reliably skips the
347 portion of the cycle where the level of activity in the NAP is
348 changing most rapidly.
349
350 The high rate of strobing revealed in Figure 4.6a means that
351 the level of activity in the ramped auditory image of Figure 4.4b is
352 considerably greater than that in the damped image (Figure 4.3b). It
353 does not show in those Figures because they have been normalised for
354 display purposes. In terms of the auditory model, however, the
355 greater overall level in the image of the ramped sound would lead to
356 the prediction that ramped sounds are considerably louder than damped
357 sounds, and this is not the case; they have roughly equal loudness.
358 All of these observations taken together suggest that the strobe rate
359 should be limited and that the limitation should favour larger NAP
360 peaks, closer to the local maximum.
361
362 The solution in this case is to delay temporal integration a
363 few milliseconds after each suprathreshold NAP pulse, to determine
364 whether another, larger, NAP pulse is about to occur. Specifically,
365 when a NAP peak is identified, it is labeled as a potential strobe
366 point, but the initiation of temporal integration is delayed for
367 several milliseconds. In AIM, the value is set with option
368 'stlag_ai'. If, during this time, no new larger NAP pulses are
369 encountered, the candidate strobe point is used to initiate temporal
370 integration. If a larger NAP pulse is encountered, it becomes the new
371 strobe candidate and replaces the previous strobe candidate, the
372 strobe lag is reset to stlag_ai ms and the process begins again. The
373 auditory images of damped and ramped sinusoids produced with this
374 'local-max' strobe criterion are shown in Figures 4.3c and 4.4c,
375 respectively. The strobe lag restriction has virtually no effect on
376 the auditory image of the damped sinusoid, but it improves the image
377 of the ramped sinusoid markedly. The asymmetry observed in the NAP of
378 the ramped sinusoid is now preserved in its auditory image.
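
The local-max rule can be sketched in Python as follows. This is an
illustration, not the AIM code. It assumes that the suprathreshold NAP
peaks of one channel have already been identified and are supplied as
(time in ms, height) pairs in time order; the 5-ms lag is a placeholder
value, since the text does not give the default for stlag_ai, and the
function name is hypothetical.

```python
def local_max_strobes(peaks, stlag_ms=5.0):
    """Return the times of the peaks that initiate temporal integration.

    peaks: list of (time_ms, height) for suprathreshold NAP peaks.
    A peak becomes a strobe candidate and is only used if no larger
    peak arrives within stlag_ms; a larger peak replaces the candidate
    and restarts the lag.
    """
    strobes = []
    candidate = None
    for time_ms, height in peaks:
        if candidate is not None and time_ms - candidate[0] > stlag_ms:
            strobes.append(candidate[0])   # lag expired: candidate is used
            candidate = None
        if candidate is None or height > candidate[1]:
            candidate = (time_ms, height)  # new candidate, lag restarts
    if candidate is not None:
        strobes.append(candidate[0])       # flush the last candidate
    return strobes


# Ramped channel: eight rising peaks at the 1.25-ms carrier period,
# repeated every 25 ms. Only the last peak of each ramp is used.
ramped_peaks = [(c * 25.0 + k * 1.25, k + 1) for c in range(2) for k in range(8)]
print(local_max_strobes(ramped_peaks))
```

The sketch returns the time of the strobe point itself; in the mechanism
described above, the integration event is actually initiated stlag_ai ms
later, once the lag has expired.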
379
380 The NAP functions and the adaptive thresholds for the damped
381 and ramped sinusoids are shown in Figures 4.5b and 4.6b, respectively.
382 A comparison of the strobe points for the damped sinusoid under the
383 temporal shadow criterion (Figure 4.5a) and the local max criterion
384 (Figure 4.5b) shows that there is one small difference; the very first
385 strobe point under the temporal shadow criterion is omitted under the
386 local max criterion because a larger NAP pulse follows it within
387 stlag_ai ms. So the second NAP pulse replaces the first as the strobe
388 candidate. In the case of the ramped sinusoid, shifting to the local
389 max criterion has a dramatic effect. The NAP functions and adaptive
390 thresholds in Figures 4.6a and 4.6b are identical, but most of the
391 strobe points identified under the temporal shadow criterion (Figure
392 4.6a) are immediately followed by larger NAP pulses as we proceed up
393 the ramp. As a result the majority of the candidate pulses are
394 repressed in favour of the one that occurs at the offset of the ramp.
395 So, with the exception of the onset of the sound, the mechanism
396 synchronises to the period of the sound and there is one strobe per
397 cycle of the sound. The local max criterion also leads to damped and
398 ramped auditory images with roughly the same level of activity in the
399 auditory image, and so it is also a better predictor of the loudness
400 of these sounds. Finally, note that the strobe lag restricts the
401 maximum strobe rate of the mechanism. This is important because,
402 without it, the level of a sinusoid would increase with its frequency
403 in the auditory image.
404
405
406 5. Limiting the Lag of the Local Max Criterion.
407
408 In the second experiment with damped and ramped sinusoids
409 (Patterson, 1994b), the longest envelope period was 100-ms, and in
410 that condition, the distinction between damped and ramped sinusoids is
411 audible for half lives as long as 64 ms. In channels near the carrier
412 frequency, the NAP function produced by the ramped sinusoid is a long,
413 slowly rising, sequence of peaks. The local-max strobe criterion
414 delays temporal integration to the end of the ramp and initiates
415 temporal integration once per cycle, as previously, with the 25-ms
416 envelope stimuli. The example, however, raises the question of what
417 would happen in the case of a very long duration slowly rising tone,
418 say a tone that rises from absolute threshold to 80 dB SPL over the
419 course of 5 seconds. A listener would undoubtedly hear the sound
420 shortly after it comes on, and hear its loudness increase
421 progressively over the course of the 5-second rise. The local-max
422 strobe mechanism would initiate temporal integration once, shortly
423 after the onset of the sound, because of overshoot in the neural
424 encoding stage of AIM. But thereafter, it would suppress temporal
425 integration throughout the rise of the NAP function and strobe once at
426 the end of the rise. Thus the auditory image would be empty at a time
427 when we know the listener would hear the tone. To solve this problem,
428 the strobe lag of the local max mechanism is limited to twice the
429 stlag_ai value; that is, after a NAP pulse becomes a strobe candidate,
430 either that NAP pulse or a larger one must initiate temporal
431 integration within the next 2*stlag_ai ms. So the strobe lag restricts
432 not only the maximum strobe rate for static sinusoids, but also the
433 minimum strobe rate for slowly increasing sinusoids.
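
The limit on the strobe lag can be added to the local-max rule as in the
Python sketch below. It is one possible reading of the description above,
not the AIM code: the deadline of 2*stlag_ai ms is measured from the first
candidate in a run and is not extended when a larger peak replaces the
candidate. As before, the input is a time-ordered list of (time in ms,
height) pairs for suprathreshold peaks, the 5-ms lag is a placeholder
value, and the function name is hypothetical.

```python
def limited_local_max_strobes(peaks, stlag_ms=5.0):
    """Local-max strobing with the total lag limited to 2 * stlag_ms."""
    strobes = []
    candidate = None
    fire_at = None    # time at which the current lag expires
    deadline = None   # first candidate time + 2 * stlag_ms
    for time_ms, height in peaks:
        if candidate is not None and time_ms > min(fire_at, deadline):
            strobes.append(candidate[0])     # candidate initiates integration
            candidate = None
        if candidate is None:
            candidate = (time_ms, height)
            fire_at = time_ms + stlag_ms
            deadline = time_ms + 2.0 * stlag_ms
        elif height > candidate[1]:
            candidate = (time_ms, height)    # larger peak takes over
            fire_at = time_ms + stlag_ms     # lag restarts, deadline does not
    if candidate is not None:
        strobes.append(candidate[0])
    return strobes


# Slowly rising tone: 40 peaks at a 1.25-ms period, each larger than
# the last. Without the limit there would be a single strobe at the end.
rising_peaks = [(k * 1.25, k + 1) for k in range(40)]
print(limited_local_max_strobes(rising_peaks))
```

With a 5-ms lag the rising tone now produces a strobe roughly every 10
ms, so the auditory image is refreshed throughout the rise instead of
remaining empty until the end.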
434
435
436 6. Aperiodic Strobing and Irregularity in the Auditory Image.
437
438 To this point, the discussion of strobe criteria has focussed
439 on activity in the carrier channel of the NAP and auditory image, and
440 the relationship between strobe criteria and the preservation of NAP
441 asymmetry through temporal integration. It was noted in passing,
442 that, away from the carrier channel, auditory images of ramped sounds
443 have fringes of irregular activity, for all strobe criteria prior to
444 the local max criterion. We might expect such fringes to impart a
445 roughness or noisy quality to the perception of ramped sounds, but
446 typically they are static and clear. In this final Section, the
447 activity produced by a ramped sinusoid in the 640 Hz channel of the
448 NAP and auditory image is examined, to illustrate the relationship
449 between strobe restrictions and the fringe of irregularity in the
450 auditory image.
451
452 The NAP produced in the 640 Hz channel of the filterbank by a
453 ramped sinusoid with an 800-Hz carrier, a 25-ms envelope period, and a
454 4-ms half life is shown in Figure 6.1. The level of the ramped
455 sinusoid rises rapidly, relative to the decay rate of the impulse
456 response of the auditory filter and, as a result, the activity in the
457 rising part of the NAP is dominated by carrier-period time intervals
458 (Patterson, 1994a). When the amplitude of the ramped sinusoid drops
459 abruptly, the energy stored in the filter decays away in a wave with
460 periods appropriate to the centre frequency of the channel. Now
461 consider the activity produced by this NAP in the 640-Hz channel of
462 the auditory image for strobe criteria 2, 3 and 4, the 'every peak',
463 'temporal shadow,' and 'local max' criteria, respectively.
464
465 Figure 6.2a shows the case where there is no adaptive
466 threshold and the mechanism strobes on the peak of every NAP pulse.
467 This is the version of STI most similar to autocorrelation. Strobing
468 on every peak causes carrier periods from the ramp to be mixed with
469 centre-frequency periods after the offset of the ramp. This is the
470 source of the irregularity in Fig. 6.2a, and the source of the
471 irregular fringe in the full auditory image (Fig. 4.4a) (Allerhand and
472 Patterson, 1992).
473
474 The activity produced with the temporal shadow criterion is
475 shown in Figure 6.2b. The adaptive threshold function and the
476 strobe points shown with the NAP in Fig. 6.1 were generated with the
477 temporal shadow criterion. In this case, the mechanism initiates
478 temporal integration on each peak in the ramped portion of the NAP,
479 but it skips the peaks associated with the ringing of the filter after
480 the ramp terminates. Strobing occurs in synchrony with the carrier
481 periods in the ramped portion of the NAP and this removes the
482 irregularity from the ramped portion of the auditory image between 0
483 ms and about 10 ms. There is still irregularity in the region from 0
484 to -10 ms, and in the region from 25 to 15 ms, because strobing in
485 synchrony with the carrier period mixes carrier periods and centre
486 frequency periods in this region of the image.
487
488 A further improvement occurs when the local max criterion is
489 introduced and strobing on successive carrier periods of the ramped
490 section of the NAP is suppressed. The activity in the 640-Hz channel
491 of the image is shown in Figure 6.2c. The irregular activity has been
492 removed; the image shows carrier periods to the left of the 0-ms point
493 and centre frequency periods to the right of the 0-ms point. Thus,
494 strobing on local maxima synchronises temporal integration to the
495 period of the wave and preserves not only the basic asymmetry of the
496 NAP, but also the contrasting time interval patterns associated with
497 different sections of the NAP cycle.
498
499
500
501 REFERENCES
502
503 Akeroyd, M.A. and Patterson, R.D. (1995). "Discrimination of wideband
504 noises modulated by a temporally asymmetric function,"
505 J. Acoust. Soc. Am. (in press).
506
507 Assmann, P. F. and Q. Summerfield (1990). "Modelling the perception of
508 concurrent vowels: Vowels with different fundamental frequencies,"
509 J. Acoust. Soc. Am. 88, 680-697.
510
511 Brown, G.J. and Cooke, M. (1994). "Computational auditory scene
512 analysis," Computer Speech and Language 8, 297-336.
513
514 Irino, T. and Patterson, R.D. (1996). "Temporal asymmetry in the
515 auditory system," J. Acoust. Soc. Am. (revision submitted
516 August 95).
517
518 McKeown, D. and Patterson, R.D. (1995). "The time course of auditory
519 segregation: concurrent vowels that vary in duration,"
520 J. Acoust. Soc. Am. (in press).
521
522 Meddis, R. and M. J. Hewitt (1991a). "Virtual pitch and phase
523 sensitivity of a computer model of the auditory periphery: I
524 pitch identification," J. Acoust. Soc. Am. 89, 2866-82.
525
526 Meddis, R. and M. J. Hewitt (1991b). "Virtual pitch and phase
527 sensitivity of a computer model of the auditory periphery: II
528 phase sensitivity," J. Acoust. Soc. Am. 89, 2883-94.
529
530 Patterson, R.D. (1987b). "A pulse ribbon model of monaural
531 phase perception," J. Acoust. Soc. Am. 82, 1560-1586.
532
533 Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang,
534 C. and Allerhand, M. (1992). "Complex sounds and auditory images,"
535 in: Auditory physiology and perception, Y. Cazals, L. Demany,
536 K. Horner (Eds), Pergamon, Oxford, 429-446.
537
538 Patterson, R.D. (1994a). "The sound of a sinusoid: Spectral models,"
539 J. Acoust. Soc. Am. 96, 1409-1418.
540
541 Patterson, R.D. (1994b). "The sound of a sinusoid: Time-interval
542 models," J. Acoust. Soc. Am. 96, 1419-1428.
543
544 Patterson, R.D. and Akeroyd, M. A. (1995). "Time-interval patterns and
545 sound quality," in: Advances in Hearing Research: Proceedings of
546 the 10th International Symposium on Hearing, G. Manley, G. Klump,
547 C. Koppl, H. Fastl, and H. Oeckinghaus (Eds), World Scientific,
548 Singapore, (in press).
549
550 Patterson, R.D., Allerhand, M. and Giguere, C. (1995). "Time-domain
551 modelling of peripheral auditory processing: A modular architecture
552 and a software platform," J. Acoust. Soc. Am. 98, (in press).
553
554 Robinson, K.L. and Patterson, R.D. (1995a). "The duration required to
555 identify the instrument, the octave, or the pitch-chroma of a
556 musical note," Music Perception (in press).
557
558 Robinson, K.L. and Patterson, R.D. (1995b). "The stimulus duration required to
559 identify vowels, their octave, and their pitch-chroma," J. Acoust. Soc.
560 Am. 98, (in press).
561
562 Slaney, M. and Lyon, R.F. (1990). "A perceptual pitch detector," in
563 Proc. IEEE Int. Conf. Acoust. Speech Signal Processing,
564 Albuquerque, New Mexico.
565
566
567
568
569 ===========================================================================
570 #!/bin/sh
571
572 # scripts/aimStrobeCriterion
573 # Annotated script for generating the figures in docs/aimStrobeCriterion
574
575 echo "FIGURES FOR SECTION 0"
576
577 mv .gennaprc .oldgennaprc # a safety precaution
578 mv .gensairc .oldgensairc # a safety precaution
579 echo | gennap powc=off -update # make sure that powc is off
580 echo | gensai powc=off -update # make sure that powc is off
581
582 echo
583 echo "FIGURES FOR SECTION 0"
584 echo "Figure 0.1: Neural Activity Pattern (NAP) of cegc"
585 gennap input=cegc_br top=3000 swap=off bits=12 gain_gtf=4 # all default values
586
587 echo "Figure 0.2: Stabilised Auditory Image (SAI) of cegc"
588 gensai stcrit=3 input=cegc_br length=100ms frstep_aid=96ms top=2500
589
590 echo
591 echo "FIGURES FOR SECTION 1"
592
593 echo "Figure 1.1: SAI of cegc strobing on every non-zero point in the NAP"
594 echo " (stcrit_ai=1). This one is slow to calculate."
595 gensai stcrit_ai=1 top=17000 input=cegc_br length=100ms frstep_aid=96ms
596
597 # Top has to be raised because this strobe criterion causes constant
598 # temporal integration.
599
600
601 echo "Figure 1.2: SAI via autocorrelation -- a correlogram"
602 echo | gennap input=cegc_br display=off length=125ms top=3000 output=stdout > cegc_br_gtf.nap
603 #gennap -use start=48 display=on cegc_br_gtf # optional display of the NAP
604 # After making a NAP with display=off, gennap -use requires you to set display=on.
605
606 acgram start=50 wid=70ms lag=35ms frames=1 scale=.02 cegc_br_gtf.nap > cegc_gtf.sai
607 gensai -use top=5000 input=cegc_gtf
608
609 rm cegc_br_gtf.nap cegc_gtf.sai
610
611 echo
612 echo "FIGURES FOR SECTION 2"
613
614 echo "Figure 2.1: SAI of cegc strobing on the peak of every NAP pulse"
615 echo " (stcrit_ai=2)"
616 gensai stcrit_ai=2 top=10000 input=cegc_br length=100ms frstep_aid=96ms
617
618 echo
619 echo "FIGURES FOR SECTION 3"
620
621 echo "Demonstration of preservation of asymmetry when stthresh is elevated"
622 # Note stthresh only operates when stcrit_ai=1.
623 gensai stcrit_ai=1 top=5000 input=cegc_br length=68ms frstep_aid=66ms stthresh_ai=5000
624
625 echo "Figure 3.1: NAP of cegc with temporal shadow criterion (stcrit_ai=3)"
626 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
627 StrobeCriterionDisplay cegc_br 1000 100 3 2.5 17000 2000
628
629 # Type 'StrobeCriterionDisplay -help' for a listing of the options and
630 # their order.
631 # Control of Xplots:
632 # Click mouse button 1 to display coordinates of points.
633 # Click mouse button 2 to redraw.
634 # Click mouse button 3 to remove the display (i.e. quit).
635
636 echo
637 echo "FIGURES FOR SECTION 4"
638
639 echo "Figure 4.1a: Waveform of Damped Sinusoid (4 cycles)"
640 genwav top=14000 bottom=-14000 length=100ms input=dr_f8_t4_d swap=on
641
642 echo "Figure 4.2a: Waveform of Ramped Sinusoid (4 cycles)"
643 genwav top=14000 bottom=-14000 length=100ms input=dr_f8_t4_r swap=on
644
645 echo "Figure 4.1b: NAP of the Damped Sinusoid (2 cycles)"
646 gennap input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=110ms output=stdout display=off > damped.nap
647 gennap -use start=50 leng=50 display=on damped
648
649 echo "Figure 4.2b: NAP of the Ramped Sinusoid (2 cycles)"
650 gennap input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=110ms output=stdout display=off > ramped.nap
651 gennap -use start=60 leng=50 display=on ramped
652
653 rm damped.nap ramped.nap
654
655 echo "Figure 4.3a: SAI of the Damped Sinusoid strobing on every NAP peak"
656 echo " (stcrit_ai=2)"
657 gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=7000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
658
659 echo "Figure 4.4a: SAI of the Ramped Sinusoid strobing on every NAP peak"
660 echo " (stcrit_ai=2)"
661 gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=7000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
662
663 echo "Figure 4.3b: SAI of the Damped Sinusoid with temporal shadow criterion"
664 echo " (stcrit_ai=3)"
665 gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=1000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=3 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
666
667 echo "Figure 4.4b: SAI of the Ramped Sinusoid with temporal shadow criterion"
668 echo " (stcrit_ai=3)"
669 gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=3 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
670
671 echo "Figure 4.3c: SAI of the Damped Sinusoid with the local max criterion"
672 echo " (stcrit_ai=4)"
673 gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=800 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
674
675 echo "Figure 4.4c: SAI of the Ramped Sinusoid with the local max criterion"
676 echo " (stcrit_ai=4)"
677 gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=800 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
678
679 echo | gennap swap=on bits=16 gain_gtf=0.0625 -update
680 echo | gensai swap=on bits=16 gain_gtf=0.0625 -update
681
682
683 echo "Figure 4.5a: NAP of Damped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
684 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
685 StrobeCriterionDisplay dr_f8_t4_d 800 120 3 2.5 14000 2400
686
687 echo "Figure 4.5b: NAP of Damped Sinusoid, local max criterion (stcrit_ai=4)"
688 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
689 StrobeCriterionDisplay dr_f8_t4_d 800 120 4 2.5 14000 2400
690
691 echo "Figure 4.6a: NAP of Ramped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
692 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
693 StrobeCriterionDisplay dr_f8_t4_r 800 120 3 2.5 7500 2400
694
695 echo "Figure 4.6b: NAP of Ramped Sinusoid, local max criterion (stcrit_ai=4)"
696 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
697 StrobeCriterionDisplay dr_f8_t4_r 800 120 4 2.5 7500 2400
698
699 echo
700 echo "FIGURES FOR SECTION 5"
701
702 echo
703 echo "FIGURES FOR SECTION 6"
704
705 echo "Figure 6.1: NAP of Ramped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
706 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
707 StrobeCriterionDisplay dr_f8_t4_r 640 120 3 2.5 7000 2000
708
709 echo "Figure 6.2a: SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=2)"
710 gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=32000 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
711
712 echo "Figure 6.2b: SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=3)"
713 gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=10000 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=3 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
714 echo "Figure 6.2c: SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=4)"
715 gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=1200 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
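
The script above moves .gennaprc and .gensairc aside as a safety
precaution but does not move them back, and the unguarded mv reports
an error when an option file does not exist. A minimal sketch of a
guarded save and restore is given below. The helper names save_rc and
restore_rc are hypothetical and are not part of AIM; the file naming
(.gennaprc, .oldgennaprc) follows the script.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AIM) for the option files used above.
# save_rc NAME    moves .NAMErc to .oldNAMErc if it exists.
# restore_rc NAME moves .oldNAMErc back to .NAMErc if it exists,
#                 overwriting any option file written in the meantime.
save_rc() {
    if [ -f ".${1}rc" ]; then
        mv ".${1}rc" ".old${1}rc"
    fi
}

restore_rc() {
    if [ -f ".old${1}rc" ]; then
        mv ".old${1}rc" ".${1}rc"
    fi
}

# Possible use at the top of a figure script (left as comments here so
# that loading this sketch has no side effects):
#   save_rc gennap; save_rc gensai
#   trap 'restore_rc gennap; restore_rc gensai' EXIT
```

Restoring from an EXIT trap is one design choice among several; it
puts the original option files back even if the script stops early.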
716
717