comparison docs/aimStrobeCriterion @ 0:5242703e91d3 tip
Initial checkin for AIM92 aimR8.2 (last updated May 1997).
author | tomwalters |
---|---|
date | Fri, 20 May 2011 15:19:45 +0100 |
parents | |
children |
-1:000000000000 | 0:5242703e91d3 |
---|---|
1 docs/aimStrobeCriterion (text) | |
2 scripts/aimStrobeCriterion (figures) | |
3 | |
4 | |
5 STROBED TEMPORAL INTEGRATION AND THE STABILISED AUDITORY IMAGE | |
6 | |
7 Roy D. Patterson, Jay Datta and Mike Allerhand | |
8 MRC Applied Psychology Unit | |
9 15 Chaucer Road, Cambridge, CB2 2EF UK | |
10 | |
11 email: roy.patterson, jay.datta or mike.allerhand @mrc-apu.cam.ac.uk | |
12 | |
13 2 August 1995 | |
14 | |
15 | |
16 ABSTRACT | |
17 | |
18 This document describes the Strobed Temporal Integration | |
19 mechanism used to convert neural activity patterns into stabilised | |
20 auditory images. The specific version of the Auditory Image Model is | |
21 AIM R7, as described in Patterson, Allerhand, and Giguere (1995). | |
22 | |
23 | |
24 | |
25 INTRODUCTION | |
26 | |
27 When a periodic sound occurs with a pitch in the musical | |
28 range, the cochlea produces a detailed, multi-channel, time-interval | |
29 pattern that repeats once per cycle of the wave. The auditory images | |
30 that we hear in response to periodic sounds are perfectly stable. | |
31 That is, despite the fact that the level of activity in the neural | |
32 activity pattern is fluctuating over a large range within the course | |
33 of each cycle, the loudness of the sound is fixed. This indicates | |
34 that some form of temporal integration is applied to the NAP prior to | |
35 our initial perception of the sound. The auditory images of periodic | |
36 sounds can have a very rich timbre, or sound quality, that can reveal | |
37 a great deal about the sound source such as the quality of the musical | |
38 instrument or the finesse of the musician. This suggests that much of | |
39 the detailed time-interval information produced by the cochlea is | |
40 preserved in the stabilised auditory image. | |
41 | |
42 The fact that we hear stable auditory images with rich sound | |
43 quality presents auditory theorists with a problem. The temporal | |
44 integration mechanism in traditional auditory models is a low-pass | |
45 filter that removes the fine-grain time-interval information from the | |
46 internal representation of the sound -- time interval information that | |
47 appears to be required for timbre perception. Strobed temporal | |
48 integration was introduced to solve this problem. At one and the same | |
49 time, it performs the temporal integration necessary to produce stable | |
50 auditory images and it preserves the majority of the time-interval | |
51 information observed in the neural activity pattern (NAP) produced by | |
52 the cochlea. | |
53 | |
54 It is not a difficult problem to produce a high-resolution, | |
55 stabilised version of the NAP provided you know the moment in time at | |
56 which the pattern in the NAP will repeat. For example, consider the | |
57 NAP of the first note of the wave CEGC in Figure 0.1 from Patterson et | |
58 al. (1992). The wave is a train of clicks separated by 8-ms gaps; the | |
59 upper channels of the NAP show that the response is a sequence of | |
60 filter impulse responses spaced at 8 ms intervals. A stabilised | |
61 representation of the NAP can be produced by setting up an image | |
62 buffer that has the same number of channels as the NAP, and simply | |
63 transferring a copy of the pattern in each channel of the NAP to the | |
64 corresponding channel of the image buffer once every 8 ms. In the | |
65 NAP, the pattern flows from right to left as time progresses, and | |
66 since the cycles are continually entering the NAP from the right hand | |
67 side and exiting the NAP from the left hand side, the pattern after | |
68 every 8 ms is identical to the pattern 8 ms ago. So if the transfer | |
69 from the NAP to the auditory image is performed every 8 ms exactly, | |
70 successive contributions from the NAP to the image are all identical. | |
71 | |
72 In the image buffer, activity does not move from right to | |
73 left, it simply decays into the floor exponentially over time with a | |
74 half life of about 30 ms. When a new contribution arrives from the | |
75 NAP, it is added point for point with whatever is currently in the | |
76 corresponding channel of the image buffer. In the current example, | |
77 after a copy of the NAP arrives in the auditory image, and during the | |
78 30 ms over which it would decay to half its original value, three more | |
79 copies of the NAP pattern arrive and are added into the auditory | |
80 image. Thus, for typical musical notes and typical vowels, the rate | |
81 of temporal integration from the NAP into the auditory image is high | |
82 and there is little time between successive integration events for the | |
83 image itself to decay. This is the source of the stability of the | |
84 auditory image. | |
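
The interplay between the decay of the image and the arrival of new contributions can be sketched numerically. The following is a minimal illustration, not the AIM implementation: it tracks a single point of the image buffer that receives a unit contribution once per 8-ms period and decays with a 30-ms half life. The function name, the 1-ms time step and the unit contribution are assumptions made for the example.

```python
def simulate_image_buffer(period_ms=8.0, half_life_ms=30.0,
                          duration_ms=400.0, step_ms=1.0):
    """Level of one image-buffer point over time (hypothetical helper).

    The point decays exponentially with the given half life and receives
    a unit contribution from the NAP once per period.
    """
    decay_per_step = 0.5 ** (step_ms / half_life_ms)
    n_steps = int(round(duration_ms / step_ms))
    steps_per_period = int(round(period_ms / step_ms))
    level = 0.0
    trace = []
    for n in range(n_steps):
        level *= decay_per_step          # exponential decay of the image
        if n % steps_per_period == 0:
            level += 1.0                 # new copy of the NAP is added in
        trace.append(level)
    return trace


trace = simulate_image_buffer()
tail = trace[-8:]                        # the last complete 8-ms period
ripple = (max(tail) - min(tail)) / max(tail)
```

Under these assumptions the level settles to a value several times larger than a single contribution, and the fluctuation within one period (`ripple`) is a small fraction of that level, which is the sense in which the image is stable.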
85 | |
86 Provided the integration is performed once per cycle of the | |
87 sound, the majority of the time-interval information in the NAP will | |
88 be preserved in the auditory image, thereby providing a solution to | |
89 the problem of how to produce stable images without removing the | |
90 fine-grain time-interval information associated with sound quality. | |
91 The auditory image produced by this process is shown in Figure 0.2 | |
92 from Patterson et al. (1992). The transfer is performed on each | |
93 channel of the NAP separately and it is performed at the point in the | |
94 cycle where the activity in the NAP is a maximum. The maximum of the | |
95 most recent cycle to arrive in the NAP is added into the auditory | |
96 image at the 0-ms point, and as a result, the NAP peaks are aligned | |
97 vertically in the auditory image. This passive alignment process | |
98 explains the loss of global phase information observed empirically | |
99 (see Patterson, 1987, for a review). | |
100 | |
101 Thus it would appear that the problem of converting the | |
102 oscillating NAP into a stabilised, high-resolution image reduces to | |
103 the problem of finding the pitch of the sound and performing temporal | |
104 integration at multiples of the pitch period. There are now a number | |
105 of computational auditory models with a proven ability to extract the | |
106 pitch of complex sounds (see Brown and Cooke, 1994, for a review) and | |
107 they could be used to direct strobed temporal integration. However, | |
108 experiments with vowels (McKeown and Patterson, 1995; Robinson and | |
109 Patterson, 1995a) and musical notes (Robinson and Patterson, 1995b) | |
110 indicate that 4 to 8 cycles of the sound are required to produce an | |
111 accurate estimate of the pitch, whereas the sound quality information | |
112 necessary to identify a vowel or a musical instrument can be extracted | |
113 from one cycle of the wave. This suggests that, if the auditory | |
114 system does use strobed temporal integration to produce a stable, | |
115 high-resolution auditory image, it does so with a mechanism that operates | |
116 more locally in time than pitch extraction mechanisms. This is the | |
117 background that led to the development of the strobed temporal | |
118 integration mechanism in the auditory image model. | |
119 | |
120 In Sections 1 and 2 of this document, following Allerhand and | |
121 Patterson (1992), we describe two simple criteria for selecting strobe | |
122 points in the NAP and show that they produce auditory images that are | |
123 very similar to the correlograms produced by Assmann and Summerfield | |
124 (1990), Slaney and Lyon (1990), or Meddis and Hewitt (1991a, 1991b). | |
125 The structures that arise in this form of auditory image are much more | |
126 symmetric than the corresponding structures in the NAP (Allerhand and | |
127 Patterson, 1992). There is mounting evidence, however, that the | |
128 auditory system is highly sensitive to temporal asymmetry (Patterson, | |
129 1994a, 1994b; Akeroyd and Patterson, 1995; Irino and Patterson, 1996), | |
130 and so the loss of asymmetry associated with the simple strobe | |
131 criterion seems likely to limit the value of this representation of | |
132 our perceptions. In the remaining Sections, an ordered sequence of | |
133 restrictions is added to the simple criteria for initiating temporal | |
134 integration, to restore asymmetry to the structures that arise in the | |
135 auditory image. | |
136 | |
137 | |
138 1. Strobe on Every Non-Zero Point in the NAP. | |
139 | |
140 The initial criterion is very simple; temporal integration is | |
141 initiated on each and every non-zero point in the NAP. In AIM | |
142 software, the option that determines which strobe criterion will be | |
143 used is 'stcrit_ai' and it is set equal to one for this simplest | |
144 strobe criterion. Allerhand and Patterson (1992) showed that when | |
145 temporal integration from the NAP to the auditory image is initiated | |
146 on each and every non-zero point in the NAP function, the result is | |
147 very similar to a correlogram -- a representation that is commonly | |
148 used in time-domain models of hearing to extract the pitch of complex | |
149 periodic sounds (see Brown and Cooke, 1994, for a review). For | |
150 example, compare the auditory image with stcrit_ai=1 (Figure 1.1) and | |
151 the correlogram (Figure 1.2) of the first note of the sound CEGC. | |
152 Both figures show stabilised representations of the time-interval | |
153 pattern that the sound produces in the NAP, and in both cases, the | |
154 individual channels have been aligned vertically on the largest peak | |
155 in the NAP function. The patterns in the auditory image and the | |
156 correlogram both differ from the pattern in the NAP in one important | |
157 way; there is a reflection of the NAP pulses associated with the | |
158 ringing of the auditory filters, on the side opposite to where they | |
159 originally appear. That is, autocorrelation and STI with stcrit_ai=1 | |
160 reduce the temporal asymmetry observed in the NAP. The asymmetry | |
161 information is not entirely removed but it is largely removed. | |
162 Experiments with sounds that have asymmetric temporal modulation show | |
163 that listeners are sensitive to temporal asymmetry (Patterson, 1994a, | |
164 1994b; Akeroyd and Patterson, 1995; Irino and Patterson, 1996), and so | |
165 the removal of asymmetry information seems likely to prove a | |
166 disadvantage when attempting to explain auditory perception. | |
167 | |
168 The autocorrelation process is symmetric in time by its very | |
169 nature. Mechanical processes that produce sound in the world are | |
170 typically asymmetric in time because they usually have some inertia. | |
171 Resonators struck impulsively ring after the pulse and not before. | |
172 This principle also applies to the processes that analyse the sound in | |
173 the auditory system. The impulse response of the auditory filter | |
174 rises faster than it falls; the adaptation process in the inner | |
175 haircell adapts up faster at the onset of a sound than it adapts down | |
176 after the sound passes. So asymmetry is the norm in the world and it | |
177 is not surprising that the auditory system is sensitive to it. | |
178 | |
179 | |
180 2. Strobe on the Peak of Each NAP pulse. | |
181 | |
182 When temporal integration is initiated on every non-zero NAP | |
183 point, the successive NAP functions that are transferred to the | |
184 auditory image are highly correlated. This suggests that we could | |
185 attain essentially the same auditory image for vastly less computation | |
186 by restricting temporal integration to the larger points on the | |
187 individual NAP pulses. This leads, in turn, to the suggestion that | |
188 temporal integration be limited to the peak of the individual NAP | |
189 pulses. The result of this restriction is illustrated in Figure 2.1 | |
190 which shows the auditory image of the first note of CEGC with this | |
191 more restricted strobe criterion. Since the peak restriction greatly | |
192 reduces the rate of temporal integration, the absolute levels of | |
193 structures in this form of auditory image are considerably lower than | |
194 those in the previous form of image. The pattern of time intervals, | |
195 however, is very similar in the two forms of auditory image. They | |
196 both preserve a detailed representation of the time-interval pattern | |
197 in the NAP, and they both lose much of the asymmetry in the NAP. | |
198 | |
199 | |
200 3. Avoid Strobing in the Temporal Shadow after a large NAP Pulse. | |
201 | |
202 The loss of asymmetry in the click-train structure of the | |
203 auditory image arises when temporal integration is initiated on the | |
204 smaller NAP pulses associated with the ringing of the auditory filters | |
205 after each click in the train. This can be demonstrated by | |
206 introducing a fixed strobe threshold below which NAP peaks do not | |
207 initiate temporal integration, and progressively raising this strobe | |
208 threshold to exclude more and more of the lower level NAP pulses. (In | |
209 AIM, a fixed threshold is set with option stthresh_ai and | |
210 stcrit_ai=1.) The auditory image becomes less and less symmetric and | |
211 more and more like the original NAP pattern for the click train as the | |
212 strobe threshold is increased. Fixed thresholds of this sort are not | |
213 realistic for simulating the operation of the auditory system, firstly | |
214 because the strobe threshold eventually exceeds the largest NAP pulse | |
215 and temporal integration ceases entirely, and secondly because, in the | |
216 natural environment, the levels of sounds are constantly changing. | |
217 Nevertheless, the example illustrates how NAP asymmetry is lost with | |
218 simple strobe criteria. The problem with autocorrelation is similar; | |
219 the correlation values at lags associated with the smaller NAP pulses | |
220 introduce symmetric reflections into the structures that appear in the | |
221 correlogram. | |
222 | |
223 An alternative means of restricting temporal integration to | |
224 the larger pulses in the NAP of the click train is to use an adaptive | |
225 strobe threshold which is temporally asymmetric. In the simplest | |
226 case, when the strobe unit monitoring a NAP channel encounters a | |
227 pulse, strobe threshold is set to the full height of the NAP pulse. | |
228 But following the peak, threshold does not fall as fast as the NAP | |
229 function, rather it is restricted to decaying at a fixed percentage of | |
230 the peak height per ms. In AIM, the rate of decay is set to 5% per | |
231 ms, so the threshold decays faster after larger peaks, and in the | |
232 absence of further NAP peaks, returns to 0 in 20 ms. The NAP function | |
233 for the 1.0-kHz channel of the NAP is presented in Figure 3.1 along | |
234 with the adaptive threshold function. Together they illustrate what is | |
235 referred to as the "temporal shadow criterion" for strobed temporal | |
236 integration. | |
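
The temporal shadow criterion can be sketched for a single sampled NAP channel as follows. This is an illustrative reading of the description above, not the AIM source code: the function names, the peak-picking rule and the toy input are assumptions. The threshold is set to the height of each strobing peak and then falls linearly at 5% of that height per ms.

```python
def find_peaks(nap):
    """Indices of local maxima (NAP pulse peaks) in a sampled channel."""
    return [i for i in range(1, len(nap) - 1)
            if nap[i] > 0 and nap[i - 1] < nap[i] >= nap[i + 1]]


def temporal_shadow_strobes(nap, sample_rate_hz, decay_pct_per_ms=5.0):
    """Strobe points under an adaptive, linearly decaying threshold.

    Hypothetical sketch: a peak strobes only if it exceeds the current
    threshold; the threshold is then reset to the peak height and decays
    at a fixed percentage of that height per millisecond.
    """
    samples_per_ms = sample_rate_hz / 1000.0
    peaks = set(find_peaks(nap))
    threshold = 0.0
    slope = 0.0                          # threshold drop per sample
    strobes = []
    for i, value in enumerate(nap):
        threshold = max(0.0, threshold - slope)
        if i in peaks and value > threshold:
            strobes.append(i)
            threshold = value
            slope = value * (decay_pct_per_ms / 100.0) / samples_per_ms
    return strobes


# Toy channel sampled at 4 kHz: every 8 ms a large pulse is followed by
# "ringing" pulses at 1-ms intervals that halve in height each time.
nap = [0.0] * 96
for p in range(3):
    for k in range(8):
        nap[p * 32 + k * 4 + 1] = 0.5 ** k

strobes = temporal_shadow_strobes(nap, sample_rate_hz=4000.0)
```

In this toy case the ringing pulses fall in the shadow of the large pulse, so the sketch strobes once per 8-ms period without being told the period.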
237 | |
238 In the figure, the vertical lines below the abscissa of the | |
239 NAP function mark the NAP pulses that initiate temporal integration. | |
240 They show that the first NAP pulse strobes temporal integration and | |
241 strobe threshold is set to the peak height. It immediately begins to | |
242 decay, but then it encounters another NAP pulse that exceeds strobe | |
243 threshold and so the process of strobing temporal integration and | |
244 raising strobe threshold is promptly repeated. At this point, | |
245 however, strobe threshold is high relative to the NAP pulses, and | |
246 strobe threshold is falling more slowly than the NAP pulses, so the | |
247 algorithm proceeds through the rest of the cycle without encountering | |
248 another NAP pulse from the ringing part of the NAP function. In this | |
249 way, the strobe mechanism is synchronised to the period of the sound | |
250 even though no explicit information about the pitch of the sound is | |
251 provided to the strobe mechanism. It is the auditory image with the | |
252 temporal shadow criterion that was presented originally in Figure | |
253 0.2 (stcrit_ai=3). | |
254 | |
255 The 'temporal shadow criterion' produces stable auditory | |
256 images with accurate asymmetry for a wide variety of naturally | |
257 occurring sounds like vowels and musical notes. The reason is that | |
258 the NAPs of these sounds have a restricted range of periods and within | |
259 those periods the asymmetry is typically characterised by the | |
260 rapid-rise/slow-fall form. There are, however, periodic sounds with | |
261 very low pitch and NAP functions that rise slowly over the course of | |
262 the period and fall rapidly at the end of the period, and the | |
263 perceptions produced by these sounds indicate that the auditory strobe | |
264 mechanism is somewhat more sophisticated than the temporal shadow | |
265 strobe mechanism. These "ramped" sounds are the subject of the next | |
266 section. | |
267 | |
268 | |
269 4. Avoid Temporal Integration on NAP Peaks Followed by Larger NAP Peaks. | |
270 | |
271 A pair of sounds that illustrate the limitations of the | |
272 temporal shadow criterion are presented in Figures 4.1a and 4.2a; the | |
273 former is an exponentially damped sinusoid that repeats every 25 ms, | |
274 the latter is an exponentially ramped sinusoid with the same envelope | |
275 period. The carrier frequency in this case is 800 Hz and the half | |
276 life of the exponential is 4 ms. The half life is on the same order | |
277 as the exponential decay of the impulse response of a gammatone | |
278 auditory filter with a centre frequency in the region of 800 Hz. The | |
279 example is taken from Patterson (1994a). | |
280 | |
281 The neural activity patterns produced by the damped and ramped | |
282 sinusoids are shown in Figures 4.1b and 4.2b, respectively. The | |
283 frequency range of the filterbank is from an octave below the carrier | |
284 frequency to an octave above the carrier frequency. The highest and | |
285 lowest channels in Figure 4.1b show the transient response of the | |
286 filterbank to the onset of the damped sinusoid, and similarly the | |
287 high- and low-frequency channels in Figure 4.2b show the transient | |
288 response of the filterbank to the offset of the ramped sinusoid. In | |
289 the high-frequency channels, the onset response of the damped sinusoid | |
290 and the offset response of the ramped sinusoid are composed of impulse | |
291 responses from the individual auditory filters. The centre section of | |
292 each figure shows the response to the carrier. Here we see that the | |
293 asymmetry in the waveform is preserved in the NAP: in Figure 4.1b, the | |
294 carrier component is at its highest level just as the transient | |
295 response ends and the carrier component decays away over the course of | |
296 the period; in Figure 4.2b, the carrier activity rises over the course | |
297 of the ramped cycle and ends at its peak level in the transient | |
298 response. | |
299 | |
300 Auditory images of these damped and ramped sinusoids are | |
301 presented in Figures 4.3 and 4.4, respectively. The upper rows show | |
302 the images obtained when the strobe initiates temporal integration on | |
303 every peak in the NAP; the middle rows show the images obtained with | |
304 the temporal shadow criterion. The images in the upper row illustrate | |
305 the problem of preserving NAP asymmetry during temporal integration. | |
306 When the mechanism strobes on every peak, the temporal asymmetry | |
307 observed in the NAP of the damped sinusoid is actually reversed in the | |
308 auditory image of the damped sinusoid (Figure 4.3a). In the case of | |
309 the ramped sinusoid, the asymmetry observed in the NAP is largely lost | |
310 in the image of the ramped sinusoid (Figure 4.4a); there is activity | |
311 at all time intervals in the central channels, whereas there is a gap | |
312 in activity in the NAP of the ramped sinusoid, once per cycle, just | |
313 after the abrupt reduction in amplitude. It is also the case that | |
314 there are irregular fringes along the edges of the main structure in | |
315 the auditory image of the ramped sinusoid (Figure 4.4a). This | |
316 provides further evidence that the time interval pattern in the NAP is | |
317 being disrupted by the temporal integration process in the | |
318 construction of the auditory image. | |
319 | |
320 The introduction of the temporal shadow criterion for | |
321 initiating temporal integration produces a dramatic improvement in the | |
322 auditory image of the damped sinusoid (Figure 4.3b). The structure in | |
323 the image is highly asymmetric and, once the alignment process is | |
324 taken into account, the structure in the image is seen to be a very | |
325 faithful reproduction of that in the NAP. The imposition of the | |
326 temporal shadow criterion improves the auditory image of the ramped | |
327 sound (Figure 4.4b) inasmuch as it eliminates the fringes seen in | |
328 Figure 4.4a. But it does not solve the asymmetry problem. The | |
329 structure in the auditory image of Figure 4.4b is still more symmetric | |
330 than it is asymmetric, whereas the structure in the corresponding NAP | |
331 is highly asymmetric. | |
332 | |
333 The source of the problem is illustrated in Figures 4.5a and | |
334 4.6a which show the NAPs and adaptive thresholds for 80-ms segments of | |
335 the damped and ramped sinusoids, respectively. The vertical markers | |
336 below the abscissa in Figure 4.5a show that after the first cycle, the | |
337 strobe mechanism is synchronised to the period of the wave and | |
338 initiates temporal integration once per cycle on the largest NAP peak. | |
339 So this criterion preserves the asymmetry of the damped sound in its | |
340 auditory image. In contrast, Figure 4.6a shows that on the way up the | |
341 ramped portion of each cycle, the rising NAP pulses repeatedly exceed | |
342 the adaptive threshold resulting in repeated initiation of temporal | |
343 integration. Since, in this region of the cycle, the mechanism | |
344 initiates temporal integration on every carrier cycle, the auditory image does | |
345 not preserve the asymmetry observed in the corresponding NAP. The | |
346 irregular fringe is reduced because the mechanism reliably skips the | |
347 portion of the cycle where the level of activity in the NAP is | |
348 changing most rapidly. | |
349 | |
350 The high rate of strobing revealed in Figure 4.6a means that | |
351 the level of activity in the ramped auditory image of Figure 4.4b is | |
352 considerably greater than that in the damped image (Figure 4.3b). It | |
353 does not show in those Figures because they have been normalised for | |
354 display purposes. In terms of the auditory model, however, the | |
355 greater overall level in the image of the ramped sound would lead to | |
356 the prediction that ramped sounds are considerably louder than damped | |
357 sounds, and this is not the case; they have roughly equal loudness. | |
358 All of these observations taken together suggest that the strobe rate | |
359 should be limited and that the limitation should favour larger NAP | |
360 peaks, closer to the local maximum. | |
361 | |
362 The solution in this case is to delay temporal integration a | |
363 few milliseconds after each suprathreshold NAP pulse, to determine | |
364 whether another, larger, NAP pulse is about to occur. Specifically, | |
365 when a NAP peak is identified, it is labeled as a potential strobe | |
366 point, but the initiation of temporal integration is delayed for | |
367 several milliseconds. In AIM, the value is set with option | |
368 'stlag_ai'. If, during this time, no new larger NAP pulses are | |
369 encountered, the candidate strobe point is used to initiate temporal | |
370 integration. If a larger NAP pulse is encountered, it becomes the new | |
371 strobe candidate and replaces the previous strobe candidate, the | |
372 strobe lag is reset to stlag_ai ms and the process begins again. The | |
373 auditory images of damped and ramped sinusoids produced with this | |
374 'local-max' strobe criterion are shown in Figures 4.3c and 4.4c, | |
375 respectively. The strobe lag restriction has virtually no effect on | |
376 the auditory image of the damped sinusoid, but it improves the image | |
377 of the ramped sinusoid markedly. The asymmetry observed in the NAP of | |
378 the ramped sinusoid is now preserved in its auditory image. | |
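
The local-max criterion can be sketched by adding a strobe lag to an adaptive-threshold strobe unit. The code below is a hedged illustration rather than the AIM implementation. The function name, the 5-ms lag used in the example, the rule that a suprathreshold peak no larger than the pending candidate is ignored, and the choice to drop a candidate still pending when the signal ends are all assumptions.

```python
def local_max_strobes(nap, sample_rate_hz, stlag_ms, decay_pct_per_ms=5.0):
    """Strobe points under a 'local-max' rule (hypothetical sketch).

    A suprathreshold peak becomes a strobe candidate. It initiates a
    strobe only if no larger peak arrives within stlag_ms; a larger peak
    replaces the candidate and restarts the lag.
    """
    samples_per_ms = sample_rate_hz / 1000.0
    lag_samples = int(round(stlag_ms * samples_per_ms))
    threshold, slope = 0.0, 0.0
    candidate, deadline = None, None
    strobes = []
    for i in range(len(nap)):
        threshold = max(0.0, threshold - slope)
        # The pending candidate strobes once its lag has expired.
        if candidate is not None and i >= deadline:
            strobes.append(candidate)
            candidate = None
        is_peak = (0 < i < len(nap) - 1 and nap[i] > 0
                   and nap[i - 1] < nap[i] >= nap[i + 1])
        if is_peak and nap[i] > threshold:
            # Adaptive threshold behaves as in the temporal shadow rule.
            threshold = nap[i]
            slope = nap[i] * (decay_pct_per_ms / 100.0) / samples_per_ms
            if candidate is None or nap[i] > nap[candidate]:
                candidate = i
                deadline = i + lag_samples
    return strobes   # a candidate still pending at the end is dropped


# Toy ramped channel at 4 kHz: pulses every 1.25 ms (800-Hz carrier)
# rising with a 4-ms half life over each 25-ms period, then silence.
period, spacing = 100, 5
nap = [0.0] * (3 * period + 40)
for p in range(3):
    for k in range(20):
        nap[p * period + k * spacing + 1] = 2.0 ** ((k - 19) * 1.25 / 4.0)

strobes = local_max_strobes(nap, sample_rate_hz=4000.0, stlag_ms=5.0)
```

With this toy ramp, each rising pulse displaces the previous candidate, so only the largest pulse at the end of each ramp initiates temporal integration.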
379 | |
380 The NAP functions and the adaptive thresholds for the damped | |
381 and ramped sinusoids are shown in Figures 4.5b and 4.6b, respectively. | |
382 A comparison of the strobe points for the damped sinusoid under the | |
383 temporal shadow criterion (Figure 4.5a) and the local max criterion | |
384 (Figure 4.5b) shows that there is one small difference; the very first | |
385 strobe point under the temporal shadow criterion is omitted under the | |
386 local max criterion because a larger NAP pulse follows it within | |
387 stlag_ai ms. So the second NAP pulse replaces the first as the strobe | |
388 candidate. In the case of the ramped sinusoid, shifting to the local | |
389 max criterion has a dramatic effect. The NAP functions and adaptive | |
390 thresholds in Figures 4.6a and 4.6b are identical, but most of the | |
391 strobe points identified under the temporal shadow criterion (Figure | |
392 4.6a) are immediately followed by larger NAP pulses as we proceed up | |
393 the ramp. As a result the majority of the candidate pulses are | |
394 repressed in favour of the one that occurs at the offset of the ramp. | |
395 So, with the exception of the onset of the sound, the mechanism | |
396 synchronises to the period of the sound and there is one strobe per | |
397 cycle of the sound. The local max criterion also leads to damped and | |
398 ramped auditory images with roughly the same level of activity in the | |
399 auditory image, and so it is also a better predictor of the loudness | |
400 of these sounds. Finally, note that the strobe lag restricts the | |
401 maximum strobe rate of the mechanism. This is important because, | |
402 without it, the level of a sinusoid would increase with its frequency | |
403 in the auditory image. | |
404 | |
405 | |
406 5. Limiting the Lag of the Local Max Criterion. | |
407 | |
408 In the second experiment with damped and ramped sinusoids | |
409 (Patterson, 1994b), the longest envelope period was 100 ms, and in | |
410 that condition, the distinction between damped and ramped sinusoids is | |
411 audible for half lives as long as 64 ms. In channels near the carrier | |
412 frequency, the NAP function produced by the ramped sinusoid is a long, | |
413 slowly rising, sequence of peaks. The local-max strobe criterion | |
414 delays temporal integration to the end of the ramp and initiates | |
415 temporal integration once per cycle, as previously, with the 25-ms | |
416 envelope stimuli. The example, however, raises the question of what | |
417 would happen in the case of a very long duration slowly rising tone, | |
418 say a tone that rises from absolute threshold to 80 dB SPL over the | |
419 course of 5 seconds. A listener would undoubtedly hear the sound | |
420 shortly after it comes on, and hear its loudness increase | |
421 progressively over the course of the 5-second rise. The local-max | |
422 strobe mechanism would initiate temporal integration once, shortly | |
423 after the onset of the sound, because of overshoot in the neural | |
424 encoding stage of AIM. But thereafter, it would suppress temporal | |
425 integration throughout the rise of the NAP function and strobe once at | |
426 the end of the rise. Thus the auditory image would be empty at a time | |
427 when we know the listener would hear the tone. To solve this problem, | |
428 the strobe lag of the local max mechanism is limited to twice the | |
429 stlag_ai value; that is, after a NAP pulse becomes a strobe candidate, | |
430 either that NAP pulse or a larger one must initiate temporal | |
431 integration within the next 2*stlag_ai ms. So the strobe lag restricts | |
432 not only the maximum strobe rate for static sinusoids, but also the | |
433 minimum strobe rate for slowly increasing sinusoids. | |
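
The effect of limiting the lag can be sketched on a list of suprathreshold NAP peaks, each given as a (time in ms, height) pair. This is an illustration under stated assumptions, not the AIM code: the function name and input format are hypothetical, and a candidate still pending after the last peak is assumed to strobe once its deadline passes.

```python
def strobes_with_lag_limit(pulses, stlag_ms):
    """Strobe times for time-ordered (time_ms, height) peaks.

    Hypothetical sketch: a larger peak replaces the candidate and
    restarts the lag, but a strobe must be issued no later than
    2 * stlag_ms after the first candidate of the current sequence.
    """
    strobes = []
    candidate = None        # (time_ms, height) of the pending candidate
    deadline = None         # time at which the candidate will strobe
    hard_deadline = None    # first candidate time + 2 * stlag_ms
    for t, h in pulses:
        if candidate is not None and t >= deadline:
            strobes.append(candidate[0])
            candidate = None
        if candidate is None:
            candidate = (t, h)
            deadline = t + stlag_ms
            hard_deadline = t + 2 * stlag_ms
        elif h > candidate[1]:
            candidate = (t, h)
            deadline = min(t + stlag_ms, hard_deadline)
    if candidate is not None:
        strobes.append(candidate[0])   # deadline passes after the last peak
    return strobes


# Slowly rising tone: one peak per ms, each larger than the last.
rising = [(t, 1 + t) for t in range(50)]
strobes = strobes_with_lag_limit(rising, stlag_ms=5)
```

For this steadily rising sequence the sketch strobes once every 2*stlag_ms instead of waiting for the end of the rise, so the image is refreshed throughout the tone.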
434 | |
435 | |
436 6. Aperiodic Strobing and Irregularity in the Auditory Image. | |
437 | |
438 To this point, the discussion of strobe criteria has focussed | |
439 on activity in the carrier channel of the NAP and auditory image, and | |
440 the relationship between strobe criteria and the preservation of NAP | |
441 asymmetry through temporal integration. It was noted in passing | |
442 that, away from the carrier channel, auditory images of ramped sounds | |
443 have fringes of irregular activity, for all strobe criteria prior to | |
444 the local max criterion. We might expect such fringes to impart a | |
445 roughness or noisy quality to the perception of ramped sounds, but | |
446 typically they are static and clear. In this final Section, the | |
447 activity produced by a ramped sinusoid in the 640 Hz channel of the | |
448 NAP and auditory image is examined, to illustrate the relationship | |
449 between strobe restrictions and the fringe of irregularity in the | |
450 auditory image. | |
451 | |
452 The NAP produced in the 640 Hz channel of the filterbank by a | |
453 ramped sinusoid with an 800-Hz carrier, a 25-ms envelope period, and a | |
454 4-ms half life is shown in Figure 6.1. The level of the ramped | |
455 sinusoid rises rapidly, relative to the decay rate of the impulse | |
456 response of the auditory filter and, as a result, the activity in the | |
457 rising part of the NAP is dominated by carrier-period time intervals | |
458 (Patterson, 1994a). When the amplitude of the ramped sinusoid drops | |
459 abruptly, the energy stored in the filter decays away in a wave with | |
460 periods appropriate to the centre frequency of the channel. Now | |
461 consider the activity produced by this NAP in the 640-Hz channel of | |
462 the auditory image for strobe criteria 2, 3 and 4, the 'every peak', | |
463 'temporal shadow', and 'local max' criteria, respectively. | |
464 | |
465 Figure 6.2a shows the case where there is no adaptive | |
466 threshold and the mechanism strobes on the peak of every NAP pulse. | |
467 This is the version of STI most similar to autocorrelation. Strobing | |
468 on every peak causes carrier periods from the ramp to be mixed with | |
469 centre-frequency periods after the offset of the ramp. This is the | |
470 source of the irregularity in Fig. 6.2a, and the source of the | |
471 irregular fringe in the full auditory image (Fig. 4.4a) (Allerhand and | |
472 Patterson, 1992). | |
473 | |
474 The activity produced with the temporal shadow criterion is | |
475 shown in Figure 6.2b. The adaptive threshold function and the | |
476 strobe points shown with the NAP in Fig. 6.1 were generated with the | |
477 temporal shadow criterion. In this case, the mechanism initiates | |
478 temporal integration on each peak in the ramped portion of the NAP, | |
479 but it skips the peaks associated with the ringing of the filter after | |
480 the ramp terminates. Strobing occurs in synchrony with the carrier | |
481 periods in the ramped portion of the NAP and this removes the | |
482 irregularity from the ramped portion of the auditory image between 0 | |
483 ms and about 10 ms. There is still irregularity in the region from 0 | |
484 to -10 ms, and in the region from 25 to 15 ms, because strobing in | |
485 synchrony with the carrier period mixes carrier periods and | |
486 centre-frequency periods in this region of the image. | |
487 | |
488 A further improvement occurs when the local max criterion is | |
489 introduced and strobing on successive carrier periods of the ramped | |
490 section of the NAP is suppressed. The activity in the 640-Hz channel | |
491 of the image is shown in Figure 6.2c. The irregular activity has been | |
492 removed; the image shows carrier periods to the left of the 0-ms point | |
493 and centre-frequency periods to the right of the 0-ms point. Thus, | |
494 strobing on local maxima synchronises temporal integration to the | |
495 period of the wave and preserves not only the basic asymmetry of the | |
496 NAP, but also the contrasting time-interval patterns associated with | |
497 different sections of the NAP cycle. | |
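
The difference between the three criteria compared above can be illustrated with a toy sketch. This is not the AIM implementation: the function name strobe_points, the per-sample decay unit, and the handling of the lag are assumptions made purely for illustration, and the input is a hand-made sequence of four rising peaks (the ramp) followed by two smaller peaks (the ringing of the filter).

```shell
#!/bin/sh
# Toy strobe finder (illustrative only, not the AIM code).
# Reads one NAP sample per line on stdin and prints the 1-based sample
# indices at which temporal integration would be initiated.
# usage: strobe_points CRITERION DECAY LAG
#   CRITERION  2 = every peak, 3 = temporal shadow, 4 = local max
#   DECAY      threshold drop per sample, as a fraction of the last
#              strobed peak (hypothetical unit, not AIM's stdecay)
#   LAG        samples to wait for a larger peak (criterion 4 only)
strobe_points() {
    awk -v crit="$1" -v decay="$2" -v lag="$3" '
    { x[NR] = $1 }
    END {
        thresh = 0; step = 0; cand = 0; out = ""
        for (i = 2; i < NR; i++) {
            # linear decay of the adaptive threshold (criteria 3 and 4)
            thresh -= step
            if (thresh < 0) thresh = 0
            # criterion 4: issue the pending strobe once the lag expires
            if (cand && i - cand > lag) { out = out " " cand; cand = 0 }
            is_peak = (x[i] > x[i-1] && x[i] >= x[i+1])
            if (!is_peak) continue
            # criterion 2: no threshold, strobe on every peak
            if (crit == 2) { out = out " " i; continue }
            # criterion 4: while waiting, only a larger peak replaces
            # the candidate (and restarts the wait)
            if (crit == 4 && cand) {
                if (x[i] > x[cand]) { cand = i; thresh = x[i]; step = decay * x[i] }
                continue
            }
            # peaks below the threshold lie in the temporal shadow
            if (x[i] <= thresh) continue
            thresh = x[i]; step = decay * x[i]
            if (crit == 3) out = out " " i
            else cand = i
        }
        if (cand) out = out " " cand
        print substr(out, 2)
    }'
}

# Toy NAP: four rising peaks, then two ringing peaks after the offset.
nap="0 1 0 2 0 3 0 4 0 2 0 1 0"
for c in 2 3 4; do
    printf 'criterion %s: ' "$c"
    printf '%s\n' $nap | strobe_points "$c" 0.1 2
done
```

On this toy input, criterion 2 strobes on all six peaks, criterion 3 strobes on the four rising peaks and skips the two ringing peaks that fall under the decaying threshold, and criterion 4 strobes only once, on the largest peak of the cycle.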
498 | |
499 | |
500 | |
501 REFERENCES | |
502 | |
503 Akeroyd, M.A. and Patterson, R.D. (1995). "Discrimination of wideband | |
504 noises modulated by a temporally asymmetric function," | |
505 J. Acoust. Soc. Am. (in press). | |
506 | |
507 Assmann, P. F. and Q. Summerfield (1990). "Modelling the perception of | |
508 concurrent vowels: Vowels with different fundamental frequencies," | |
509 J. Acoust. Soc. Am. 88, 680-697. | |
510 | |
511 Brown, G.J. and Cooke, M. (1994). "Computational auditory scene | |
512 analysis," Computer Speech and Language 8, 297-336. | |
513 | |
514 Irino, T. and Patterson, R.D. (1996). "Temporal asymmetry in the | |
515 auditory system," J. Acoust. Soc. Am. (revision submitted | |
516 August 95). | |
517 | |
518 McKeown, D. and Patterson, R.D. (1995). "The time course of auditory | |
519 segregation: concurrent vowels that vary in duration," | |
520 J. Acoust. Soc. Am. (in press). | |
521 | |
522 Meddis, R. and M. J. Hewitt (1991a). "Virtual pitch and phase | |
523 sensitivity of a computer model of the auditory periphery: I | |
524 pitch identification," J. Acoust. Soc. Am. 89, 2866-82. | |
525 | |
526 Meddis, R. and M. J. Hewitt (1991b). "Virtual pitch and phase | |
527 sensitivity of a computer model of the auditory periphery: II | |
528 phase sensitivity," J. Acoust. Soc. Am. 89, 2883-94. | |
529 | |
530 Patterson, R.D. (1987b). "A pulse ribbon model of monaural | |
531 phase perception," J. Acoust. Soc. Am. 82, 1560-1586. | |
532 | |
533 Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, | |
534 C. and Allerhand, M. (1992). "Complex sounds and auditory images," | |
535 In: Auditory physiology and perception, Y. Cazals, L. Demany, | |
536 K. Horner (eds), Pergamon, Oxford, 429-446. | |
537 | |
538 Patterson, R.D. (1994a). "The sound of a sinusoid: Spectral models," | |
539 J. Acoust. Soc. Am. 96, 1409-1418. | |
540 | |
541 Patterson, R.D. (1994b). "The sound of a sinusoid: Time-interval | |
542 models," J. Acoust. Soc. Am. 96, 1419-1428. | |
543 | |
544 Patterson, R.D. and Akeroyd, M. A. (1995). "Time-interval patterns and | |
545 sound quality," in: Advances in Hearing Research: Proceedings of | |
546 the 10th International Symposium on Hearing, G. Manley, G. Klump, | |
547 C. Koppl, H. Fastl, & H. Oeckinghaus, (Eds). World Scientific, | |
548 Singapore, (in press). | |
549 | |
550 Patterson, R.D., Allerhand, M., and Giguere, C. (1995). "Time-domain | |
551 modelling of peripheral auditory processing: A modular architecture | |
552 and a software platform," J. Acoust. Soc. Am. 98, (in press). | |
553 | |
554 Robinson, K.L. & Patterson, R.D. (1995a) "The duration required to | |
555 identify the instrument, the octave, or the pitch-chroma of a | |
556 musical note," Music Perception (in press). | |
557 | |
558 Robinson, K.L. & Patterson, R.D. (1995b) "The stimulus duration required to | |
559 identify vowels, their octave, and their pitch-chroma," J. Acoust. Soc. | |
560 Am. 98, (in press). | |
561 | |
562 Slaney, M. and Lyon, R.F. (1990). "A perceptual pitch detector," in | |
563 Proc. IEEE Int. Conf. Acoust. Speech Signal Processing, | |
564 Albuquerque, New Mexico. | |
565 | |
566 | |
567 | |
568 | |
569 =========================================================================== | |
570 #!/bin/sh | |
571 | |
572 # script/aimStrobeCriterion | |
573 # Annotated script for generating the figures in docs/aimStrobeCriterion | |
574 | |
575 echo "FIGURES FOR SECTION 0" | |
576 | |
577 mv .gennaprc .oldgennaprc # a safety precaution | |
578 mv .gensairc .oldgensairc # a safety precaution | |
579 echo | gennap powc=off -update # make sure that powc is off | |
580 echo | gensai powc=off -update # make sure that powc is off | |
581 | |
582 echo | |
583 echo "FIGURES FOR SECTION 0" | |
584 echo "Figure 0.1: Neural Activity Pattern (NAP) of cegc" | |
585 gennap input=cegc_br top=3000 swap=off bits=12 gain_gtf=4 # all default values | |
586 | |
587 echo "Figure 0.2: Stabilised Auditory Image (SAI) of cegc" | |
588 gensai stcrit=3 input=cegc_br length=100ms frstep_aid=96ms top=2500 | |
589 | |
590 echo | |
591 echo "FIGURES FOR SECTION 1" | |
592 | |
593 echo "Figure 1.1 SAI of cegc strobing on every non-zero point in the NAP" | |
594 echo " (stcrit_ai=1). This one is slow to calculate." | |
595 gensai stcrit_ai=1 top=17000 input=cegc_br length=100ms frstep_aid=96ms | |
596 | |
597 # Top has to be raised because this strobe criterion causes constant | |
598 # temporal integration. | |
599 | |
600 | |
601 echo "Figure 1.2: SAI via autocorrelation -- a correlogram" | |
602 echo | gennap input=cegc_br display=off length=125ms top=3000 output=stdout > cegc_br_gtf.nap | |
603 #gennap -use start=48 display=on cegc_br_gtf # optional display of the NAP | |
604 # After making a NAP with display=off, gennap -use requires you to set display=on. | |
605 | |
606 acgram start=50 wid=70ms lag=35ms frames=1 scale=.02 cegc_br_gtf.nap > cegc_gtf.sai | |
607 gensai -use top=5000 input=cegc_gtf | |
608 | |
609 rm cegc_br_gtf.nap cegc_gtf.sai | |
610 | |
611 echo | |
612 echo "FIGURES FOR SECTION 2" | |
613 | |
614 echo "Figure 2.1: SAI of cegc strobing on the peak of every NAP pulse" | |
615 echo " (stcrit_ai=2)" | |
616 gensai stcrit_ai=2 top=10000 input=cegc_br length=100ms frstep_aid=96ms | |
617 | |
618 echo | |
619 echo "FIGURES FOR SECTION 3" | |
620 | |
621 echo "Demonstration of preservation of asymmetry when stthresh is elevated" | |
622 # Note stthresh only operates when stcrit_ai=1. | |
623 gensai stcrit_ai=1 top=5000 input=cegc_br length=68ms frstep_aid=66ms stthresh_ai=5000 | |
624 | |
625 echo "Figure 3.1: NAP of cegc with temporal shadow criterion (stcrit_ai=3)" | |
626 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP" | |
627 StrobeCriterionDisplay cegc_br 1000 100 3 2.5 17000 2000 | |
628 | |
629 # Type 'StrobeCriterionDisplay -help' for a listing of the options and | |
630 # their order. | |
631 # Control of Xplots: | |
632 # Click mouse button 1 to display coordinates of points. | |
633 # Click mouse button 2 to redraw. | |
634 # Click mouse button 3 to remove the display (i.e. quit). | |
635 | |
636 echo | |
637 echo "FIGURES FOR SECTION 4" | |
638 | |
639 echo "Figure 4.1a: Waveform of Damped Sinusoid (4 cycles)" | |
640 genwav top=14000 bottom=-14000 length=100ms input=dr_f8_t4_d swap=on | |
641 | |
642 echo "Figure 4.2a: Waveform of Ramped Sinusoid (4 cycles)" | |
643 genwav top=14000 bottom=-14000 length=100ms input=dr_f8_t4_r swap=on | |
644 | |
645 echo "Figure 4.1b: NAP of the Damped Sinusoid (2 cycles)" | |
646 gennap input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=110ms output=stdout display=off > damped.nap | |
647 gennap -use start=50 leng=50 display=on damped | |
648 | |
649 echo "Figure 4.2b: NAP of the Ramped Sinusoid (2 cycles)" | |
650 gennap input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=110ms output=stdout display=off > ramped.nap | |
651 gennap -use start=60 leng=50 display=on ramped | |
652 | |
653 rm damped.nap ramped.nap | |
654 | |
655 echo "Figure 4.3a: SAI of the Damped Sinusoid strobing on every NAP peak" | |
656 echo " (stcrit_ai=2)" | |
657 gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=7000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 | |
658 | |
659 echo "Figure 4.4a: SAI of the Ramped Sinusoid strobing on every NAP peak" | |
660 echo " (stcrit_ai=2)" | |
661 gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=7000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 | |
662 | |
663 echo "Figure 4.3b: SAI of the Damped Sinusoid with temporal shadow criterion" | |
664 echo " (stcrit_ai=3)" | |
665 gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=1000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=3 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 | |
666 | |
667 echo "Figure 4.4b: SAI of the Ramped Sinusoid with temporal shadow criterion" | |
668 echo " (stcrit_ai=3)" | |
669 gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=3 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 | |
670 | |
671 echo "Figure 4.3c: SAI of the Damped Sinusoid with the local max criterion" | |
672 echo " (stcrit_ai=4)" | |
673 gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=800 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 | |
674 | |
675 echo "Figure 4.4c: SAI of the Ramped Sinusoid with the local max criterion" | |
676 echo " (stcrit_ai=4)" | |
677 gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=800 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 | |
678 | |
679 echo | gennap swap=on bits=16 gain_gtf=0.0625 -update | |
680 echo | gensai swap=on bits=16 gain_gtf=0.0625 -update | |
681 | |
682 | |
683 echo "Figure 4.5a: NAP of Damped Sinusoid, temporal shadow criterion (stcrit_ai=3)" | |
684 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP" | |
685 StrobeCriterionDisplay dr_f8_t4_d 800 120 3 2.5 14000 2400 | |
686 | |
687 echo "Figure 4.5b: NAP of Damped Sinusoid, local max criterion (stcrit_ai=4)" | |
688 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP" | |
689 StrobeCriterionDisplay dr_f8_t4_d 800 120 4 2.5 14000 2400 | |
690 | |
691 echo "Figure 4.6a: NAP of Ramped Sinusoid, temporal shadow criterion (stcrit_ai=3)" | |
692 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP" | |
693 StrobeCriterionDisplay dr_f8_t4_r 800 120 3 2.5 7500 2400 | |
694 | |
695 echo "Figure 4.6b: NAP of Ramped Sinusoid, local max criterion (stcrit_ai=4)" | |
696 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP" | |
697 StrobeCriterionDisplay dr_f8_t4_r 800 120 4 2.5 7500 2400 | |
698 | |
699 echo | |
700 echo "FIGURES FOR SECTION 5" | |
701 | |
702 echo | |
703 echo "FIGURES FOR SECTION 6" | |
704 | |
705 echo "Figure 6.1: NAP of Ramped Sinusoid, temporal shadow criterion (stcrit_ai=3)" | |
706 echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP" | |
707 StrobeCriterionDisplay dr_f8_t4_r 640 120 3 2.5 7000 2000 | |
708 | |
709 echo "Figure 6.2a: SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=2)" | |
710 gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=32000 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 | |
711 | |
712 echo "Figure 6.2b: SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=3)" | |
713 gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=10000 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=3 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 | |
714 echo "Figure 6.2c: SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=4)" | |
715 gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=1200 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 | |
716 | |
717 |