docs/aimStrobeCriterion   (text)
scripts/aimStrobeCriterion   (figures)


STROBED TEMPORAL INTEGRATION AND THE STABILISED AUDITORY IMAGE

Roy D. Patterson, Jay Datta and Mike Allerhand
MRC Applied Psychology Unit
15 Chaucer Road, Cambridge, CB2 2EF UK

email: roy.patterson, jay.datta or mike.allerhand @mrc-apu.cam.ac.uk

2 August 1995


ABSTRACT

This document describes the Strobed Temporal Integration mechanism
used to convert neural activity patterns into stabilised auditory
images. The specific version of the Auditory Image Model is AIM R7,
as described in Patterson, Allerhand, and Giguere (1995).



INTRODUCTION

When a periodic sound occurs with a pitch in the musical range, the
cochlea produces a detailed, multi-channel, time-interval pattern that
repeats once per cycle of the wave. The auditory images that we hear
in response to periodic sounds are perfectly stable. That is, despite
the fact that the level of activity in the neural activity pattern
(NAP) fluctuates over a large range within the course of each cycle,
the loudness of the sound is fixed. This indicates that some form of
temporal integration is applied to the NAP prior to our initial
perception of the sound. The auditory images of periodic sounds can
have a very rich timbre, or sound quality, that can reveal a great
deal about the sound source, such as the quality of the musical
instrument or the finesse of the musician.
This suggests that much of the detailed time-interval information
produced by the cochlea is preserved in the stabilised auditory image.

The fact that we hear stable auditory images with rich sound quality
presents auditory theorists with a problem. The temporal integration
mechanism in traditional auditory models is a low-pass filter that
removes the fine-grain time-interval information from the internal
representation of the sound -- time-interval information that appears
to be required for timbre perception. Strobed temporal integration
was introduced to solve this problem. At one and the same time, it
performs the temporal integration necessary to produce stable auditory
images and it preserves the majority of the time-interval information
observed in the neural activity pattern (NAP) produced by the cochlea.

It is not a difficult problem to produce a high-resolution, stabilised
version of the NAP provided you know the moment in time at which the
pattern in the NAP will repeat. For example, consider the NAP of the
first note of the wave CEGC in Figure 0.1 from Patterson et al.
(1992). The wave is a train of clicks separated by 8-ms gaps; the
upper channels of the NAP show that the response is a sequence of
filter impulse responses spaced at 8-ms intervals. A stabilised
representation of the NAP can be produced by setting up an image
buffer that has the same number of channels as the NAP, and simply
transferring a copy of the pattern in each channel of the NAP to the
corresponding channel of the image buffer once every 8 ms.
In the NAP, the pattern flows from right to left as time progresses,
and since the cycles are continually entering the NAP from the right
hand side and exiting the NAP from the left hand side, the pattern at
any moment is identical to the pattern 8 ms earlier. So if the
transfer from the NAP to the auditory image is performed every 8 ms
exactly, successive contributions from the NAP to the image are all
identical.

In the image buffer, activity does not move from right to left; it
simply decays into the floor exponentially over time with a half life
of about 30 ms. When a new contribution arrives from the NAP, it is
added point for point to whatever is currently in the corresponding
channel of the image buffer. In the current example, after a copy of
the NAP arrives in the auditory image, and during the 30 ms over which
it would decay to half its original value, three more copies of the
NAP pattern arrive and are added into the auditory image. Thus, for
typical musical notes and typical vowels, the rate of temporal
integration from the NAP into the auditory image is high and there is
little time between successive integration events for the image itself
to decay. This is the source of the stability of the auditory image.

Provided the integration is performed once per cycle of the sound, the
majority of the time-interval information in the NAP will be preserved
in the auditory image, thereby providing a solution to the problem of
how to produce stable images without removing the fine-grain
time-interval information associated with sound quality.
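
The decay-and-add behaviour of the image buffer described above can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions, not AIM code: the function names and the
four-point NAP frame are made up, and the only values taken from the
text are the 30-ms half life and the 8-ms period.

```python
HALF_LIFE_MS = 30.0   # image half life quoted in the text
PERIOD_MS = 8.0       # period of the click train in the example


def decay_factor(dt_ms, half_life_ms=HALF_LIFE_MS):
    """Fraction of image activity that remains after dt_ms of decay."""
    return 0.5 ** (dt_ms / half_life_ms)


def integrate(image, nap_frame, dt_ms):
    """Decay one image channel for dt_ms, then add the NAP frame point for point."""
    k = decay_factor(dt_ms)
    return [k * old + new for old, new in zip(image, nap_frame)]


# Hypothetical NAP frame for one channel (arbitrary illustrative values).
nap_frame = [0.0, 1.0, 0.5, 0.25]
image = [0.0] * len(nap_frame)
for _ in range(50):                       # 50 strobes, one every 8 ms
    image = integrate(image, nap_frame, PERIOD_MS)

# Further copies arriving while the first decays to half (at 8, 16, 24 ms).
copies_within_half_life = int(HALF_LIFE_MS // PERIOD_MS)

# Steady-state level relative to a single copy (geometric series sum).
steady_gain = 1.0 / (1.0 - decay_factor(PERIOD_MS))
```

Between strobes the image falls only to decay_factor(PERIOD_MS) of its
peak value, which is a little over 0.8 for these numbers, so the
pattern in the buffer stays nearly constant from one cycle to the
next.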
The auditory image produced by this process is shown in Figure 0.2
from Patterson et al. (1992). The transfer is performed on each
channel of the NAP separately and it is performed at the point in the
cycle where the activity in the NAP is a maximum. The maximum of the
most recent cycle to arrive in the NAP is added into the auditory
image at the 0-ms point and, as a result, the NAP peaks are aligned
vertically in the auditory image. This passive alignment process
explains the loss of global phase information observed empirically
(see Patterson, 1987, for a review).

Thus it would appear that the problem of converting the oscillating
NAP into a stabilised, high-resolution image reduces to the problem of
finding the pitch of the sound and performing temporal integration at
multiples of the pitch period. There are now a number of
computational auditory models with a proven ability to extract the
pitch of complex sounds (see Brown and Cooke, 1994, for a review) and
they could be used to direct strobed temporal integration. However,
experiments with vowels (McKeown and Patterson, 1995; Robinson and
Patterson, 1995a) and musical notes (Robinson and Patterson, 1995b)
indicate that 4 to 8 cycles of the sound are required to produce an
accurate estimate of the pitch, whereas the sound quality information
necessary to identify a vowel or a musical instrument can be extracted
from one cycle of the wave.
This suggests that, if the auditory system does use strobed temporal
integration to produce a stable, high-resolution auditory image, it
does it with a mechanism that operates more locally in time than pitch
extraction mechanisms. This is the background that led to the
development of the strobed temporal integration mechanism in the
auditory image model.

In Sections 1 and 2 of this document, following Allerhand and
Patterson (1992), we describe two simple criteria for selecting strobe
points in the NAP and show that they produce auditory images that are
very similar to the correlograms produced by Assmann and Summerfield
(1990), Slaney and Lyon (1990), or Meddis and Hewitt (1991a, 1991b).
The structures that arise in this form of auditory image are much more
symmetric than the corresponding structures in the NAP (Allerhand and
Patterson, 1992). There is mounting evidence, however, that the
auditory system is highly sensitive to temporal asymmetry (Patterson,
1994a, 1994b; Akeroyd and Patterson, 1995; Irino and Patterson, 1996),
and so the loss of asymmetry associated with the simple strobe
criteria seems likely to limit the value of this representation of our
perceptions. In the remaining Sections, an ordered sequence of
restrictions is added to the simple criteria for initiating temporal
integration, to restore asymmetry to the structures that arise in the
auditory image.


1. Strobe on Every Non-Zero Point in the NAP.

The initial criterion is very simple; temporal integration is
initiated on each and every non-zero point in the NAP.
In AIM software, the option that determines which strobe criterion
will be used is 'stcrit_ai' and it is set equal to one for this
simplest strobe criterion. Allerhand and Patterson (1992) showed that
when temporal integration from the NAP to the auditory image is
initiated on each and every non-zero point in the NAP function, the
result is very similar to a correlogram -- a representation that is
commonly used in time-domain models of hearing to extract the pitch of
complex periodic sounds (see Brown and Cooke, 1994, for a review).
For example, compare the auditory image with stcrit_ai=1 (Figure 1.1)
and the correlogram (Figure 1.2) of the first note of the sound CEGC.
Both figures show stabilised representations of the time-interval
pattern that the sound produces in the NAP, and in both cases, the
individual channels have been aligned vertically on the largest peak
in the NAP function. The patterns in the auditory image and the
correlogram both differ from the pattern in the NAP in one important
way; there is a reflection of the NAP pulses associated with the
ringing of the auditory filters, on the side opposite to where they
originally appear. That is, autocorrelation and STI with stcrit_ai=1
reduce the temporal asymmetry observed in the NAP. The asymmetry
information is not entirely removed but it is largely removed.
Experiments with sounds that have asymmetric temporal modulation show
that listeners are sensitive to temporal asymmetry (Patterson, 1994a,
1994b; Akeroyd and Patterson, 1995; Irino and Patterson, 1996), and so
the removal of asymmetry information seems likely to prove a
disadvantage when attempting to explain auditory perception.

The autocorrelation process is symmetric in time by its very nature.
Mechanical processes that produce sound in the world are typically
asymmetric in time because they usually have some inertia. Resonators
struck impulsively ring after the pulse and not before. This
principle also applies to the processes that analyse the sound in the
auditory system. The impulse response of the auditory filter rises
faster than it falls; the adaptation process in the inner hair cell
adapts up faster at the onset of a sound than it adapts down after the
sound passes. So asymmetry is the norm in the world and it is not
surprising that the auditory system is sensitive to it.


2. Strobe on the Peak of Each NAP Pulse.

When temporal integration is initiated on every non-zero NAP point,
the successive NAP functions that are transferred to the auditory
image are highly correlated. This suggests that we could attain
essentially the same auditory image for vastly less computation by
restricting temporal integration to the larger points on the
individual NAP pulses. This leads, in turn, to the suggestion that
temporal integration be limited to the peak of the individual NAP
pulses.
The result of this restriction is illustrated in Figure 2.1, which
shows the auditory image of the first note of CEGC with this more
restricted strobe criterion. Since the peak restriction greatly
reduces the rate of temporal integration, the absolute levels of
structures in this form of auditory image are considerably lower than
those in the previous form of image. The pattern of time intervals,
however, is very similar in the two forms of auditory image. They
both preserve a detailed representation of the time-interval pattern
in the NAP, and they both lose much of the asymmetry in the NAP.


3. Avoid Strobing in the Temporal Shadow after a Large NAP Pulse.

The loss of asymmetry in the click-train structure of the auditory
image arises when temporal integration is initiated on the smaller NAP
pulses associated with the ringing of the auditory filters after each
click in the train. This can be demonstrated by introducing a fixed
strobe threshold below which NAP peaks do not initiate temporal
integration, and progressively raising this strobe threshold to
exclude more and more of the lower level NAP pulses. (In AIM, a fixed
threshold is set with option stthresh_ai and stcrit_ai=1.) The
auditory image becomes less and less symmetric and more and more like
the original NAP pattern for the click train as the strobe threshold
is increased.
Fixed thresholds of this sort are not realistic for simulating the
operation of the auditory system, firstly because the strobe threshold
eventually exceeds the largest NAP pulse and temporal integration
ceases entirely, and secondly because, in the natural environment, the
levels of sounds are constantly changing. Nevertheless, the example
illustrates how NAP asymmetry is lost with simple strobe criteria.
The problem with autocorrelation is similar; the correlation values at
lags associated with the smaller NAP pulses introduce symmetric
reflections into the structures that appear in the correlogram.

An alternative means of restricting temporal integration to the larger
pulses in the NAP of the click train is to use an adaptive strobe
threshold which is temporally asymmetric. In the simplest case, when
the strobe unit monitoring a NAP channel encounters a pulse, strobe
threshold is set to the full height of the NAP pulse. But following
the peak, threshold does not fall as fast as the NAP function; rather
it is restricted to decaying at a fixed percentage of the peak height
per ms. In AIM, the rate of decay is set to 5% per ms, so the
threshold decays faster in absolute terms after larger peaks and, in
the absence of further NAP peaks, returns to 0 in 20 ms. The NAP
function for the 1.0-kHz channel of the NAP is presented in Figure 3.1
along with the adaptive threshold function. Together they illustrate
what is referred to as the "temporal shadow criterion" for strobed
temporal integration.

In the figure, the vertical lines below the abscissa of the NAP
function mark the NAP pulses that initiate temporal integration.
They show that the first NAP pulse strobes temporal integration and
strobe threshold is set to the peak height. It immediately begins to
decay, but then it encounters another NAP pulse that exceeds strobe
threshold and so the process of strobing temporal integration and
raising strobe threshold is promptly repeated. At this point,
however, strobe threshold is high relative to the NAP pulses and
strobe threshold is falling more slowly than the NAP pulses, so the
algorithm proceeds through the rest of the cycle without encountering
another suprathreshold NAP pulse from the ringing part of the NAP
function. In this way, the strobe mechanism is synchronised to the
period of the sound even though no explicit information about the
pitch of the sound is provided to the strobe mechanism. It is the
auditory image with the temporal shadow criterion (stcrit_ai=3) that
was presented originally in Figure 0.2.

The 'temporal shadow criterion' produces stable auditory images with
accurate asymmetry for a wide variety of naturally occurring sounds
like vowels and musical notes. The reason is that the NAPs of these
sounds have a restricted range of periods and within those periods the
asymmetry is typically characterised by the rapid-rise/slow-fall form.
There are, however, periodic sounds with very low pitch and NAP
functions that rise slowly over the course of the period and fall
rapidly at the end of the period, and the perceptions produced by
these sounds indicate that the auditory strobe mechanism is somewhat
more sophisticated than the temporal shadow strobe mechanism.
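
The temporal shadow criterion described above can be sketched as
follows. This is a minimal illustration rather than the AIM
implementation: it assumes the NAP has already been reduced to a list
of (time, height) peaks, the function name is hypothetical, and the
pulse heights are invented to mimic a 1-kHz channel responding to an
8-ms click train. Only the 5% per ms decay rate comes from the text.

```python
def temporal_shadow_strobes(peaks, decay_per_ms=0.05):
    """Return the times of the NAP peaks that initiate temporal integration.

    peaks: list of (time_ms, height) pairs in time order.
    After each strobe the threshold is set to the peak height and then
    falls linearly by decay_per_ms of that height per millisecond.
    """
    strobes = []
    last_t, last_h = None, 0.0
    for t, h in peaks:
        if last_t is None:
            threshold = 0.0
        else:
            threshold = max(0.0, last_h * (1.0 - decay_per_ms * (t - last_t)))
        if h > threshold:
            strobes.append(t)
            last_t, last_h = t, h      # threshold jumps to the new peak height
    return strobes


# Invented pulse heights for one 8-ms cycle: a fast rise, then slow ringing.
cycle_heights = [0.6, 1.0, 0.7, 0.45, 0.3, 0.2, 0.12, 0.08]
peaks = [(cycle * 8 + i, h)
         for cycle in range(3)
         for i, h in enumerate(cycle_heights)]

strobes = temporal_shadow_strobes(peaks)
print(strobes)   # [0, 1, 9, 17]
```

With these made-up heights the first two pulses both strobe, as in the
description of Figure 3.1, and thereafter only the largest pulse of
each cycle exceeds the decaying threshold, so the strobes settle to
one per 8-ms period.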
These "ramped" sounds are the subject of the next section.


4. Avoid Temporal Integration on NAP Peaks Followed by Larger NAP Peaks.

A pair of sounds that illustrates the limitations of the temporal
shadow criterion is presented in Figures 4.1a and 4.2a; the former is
an exponentially damped sinusoid that repeats every 25 ms, the latter
is an exponentially ramped sinusoid with the same envelope period.
The carrier frequency in this case is 800 Hz and the half life of the
exponential is 4 ms. The half life is of the same order as the
exponential decay of the impulse response of a gammatone auditory
filter with a centre frequency in the region of 800 Hz. The example
is taken from Patterson (1994a).

The neural activity patterns produced by the damped and ramped
sinusoids are shown in Figures 4.1b and 4.2b, respectively. The
frequency range of the filterbank is from an octave below the carrier
frequency to an octave above the carrier frequency. The highest and
lowest channels in Figure 4.1b show the transient response of the
filterbank to the onset of the damped sinusoid, and similarly the
high- and low-frequency channels in Figure 4.2b show the transient
response of the filterbank to the offset of the ramped sinusoid. In
the high-frequency channels, the onset response of the damped sinusoid
and the offset response of the ramped sinusoid are composed of impulse
responses from the individual auditory filters. The centre section of
each figure shows the response to the carrier.
Here we see that the asymmetry in the waveform is preserved in the
NAP: in Figure 4.1b, the carrier component is at its highest level
just as the transient response ends and the carrier component decays
away over the course of the period; in Figure 4.2b, the carrier
activity rises over the course of the ramped cycle and ends at its
peak level in the transient response.

Auditory images of these damped and ramped sinusoids are presented in
Figures 4.3 and 4.4, respectively. The upper rows show the images
obtained when the strobe initiates temporal integration on every peak
in the NAP; the middle rows show the images obtained with the temporal
shadow criterion. The images in the upper row illustrate the problem
of preserving NAP asymmetry during temporal integration. When the
mechanism strobes on every peak, the temporal asymmetry observed in
the NAP of the damped sinusoid is actually reversed in the auditory
image of the damped sinusoid (Figure 4.3a). In the case of the ramped
sinusoid, the asymmetry observed in the NAP is largely lost in the
image of the ramped sinusoid (Figure 4.4a); there is activity at all
time intervals in the central channels, whereas there is a gap in
activity in the NAP of the ramped sinusoid, once per cycle, just after
the abrupt reduction in amplitude. It is also the case that there are
irregular fringes along the edges of the main structure in the
auditory image of the ramped sinusoid (Figure 4.4a). This provides
further evidence that the time-interval pattern in the NAP is being
disrupted by the temporal integration process in the construction of
the auditory image.
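
For readers who wish to experiment with stimuli of this kind, the
damped and ramped sinusoids of Figures 4.1a and 4.2a can be
approximated with the sketch below. The 800-Hz carrier, 25-ms
envelope period and 4-ms half life are taken from the text; the
20-kHz sampling rate, the function names, and the construction of the
ramped cycle as the time reverse of the damped cycle are assumptions
made for illustration only.

```python
import math


def damped_cycle(fs_hz=20000, carrier_hz=800.0, period_ms=25.0,
                 half_life_ms=4.0):
    """One envelope period of an exponentially damped sinusoid."""
    n = int(round(fs_hz * period_ms / 1000.0))
    samples = []
    for i in range(n):
        t_ms = 1000.0 * i / fs_hz
        envelope = 0.5 ** (t_ms / half_life_ms)   # halves every half_life_ms
        carrier = math.sin(2.0 * math.pi * carrier_hz * t_ms / 1000.0)
        samples.append(envelope * carrier)
    return samples


def ramped_cycle(**kwargs):
    """Assumed here to be the time reverse of the damped cycle."""
    return damped_cycle(**kwargs)[::-1]


def repeat(cycle, n_cycles):
    """Concatenate n_cycles copies of one envelope period."""
    return cycle * n_cycles


damped = repeat(damped_cycle(), 4)   # 100 ms of the damped sound
ramped = repeat(ramped_cycle(), 4)   # 100 ms of the ramped sound
```

Under these assumptions the two sounds have identical energy and
differ only in the direction of their temporal asymmetry.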

The introduction of the temporal shadow criterion for initiating
temporal integration produces a dramatic improvement in the auditory
image of the damped sinusoid (Figure 4.3b). The structure in the
image is highly asymmetric and, once the alignment process is taken
into account, the structure in the image is seen to be a very faithful
reproduction of that in the NAP. The imposition of the temporal
shadow criterion improves the auditory image of the ramped sound
(Figure 4.4b) inasmuch as it eliminates the fringes seen in Figure
4.4a. But it does not solve the asymmetry problem. The structure in
the auditory image of Figure 4.4b is still more symmetric than it is
asymmetric, whereas the structure in the corresponding NAP is highly
asymmetric.

The source of the problem is illustrated in Figures 4.5a and 4.6a,
which show the NAPs and adaptive thresholds for 80-ms segments of the
damped and ramped sinusoids, respectively. The vertical markers below
the abscissa in Figure 4.5a show that after the first cycle, the
strobe mechanism is synchronised to the period of the wave and
initiates temporal integration once per cycle on the largest NAP peak.
So this criterion preserves the asymmetry of the damped sound in its
auditory image. In contrast, Figure 4.6a shows that on the way up the
ramped portion of each cycle, the rising NAP pulses repeatedly exceed
the adaptive threshold, resulting in repeated initiation of temporal
integration.
Since, in this region of the cycle, the mechanism initiates temporal
integration on every cycle of the carrier, the auditory image does not
preserve the asymmetry observed in the corresponding NAP. The
irregular fringe is reduced because the mechanism reliably skips the
portion of the cycle where the level of activity in the NAP is
changing most rapidly.

The high rate of strobing revealed in Figure 4.6a means that the level
of activity in the ramped auditory image of Figure 4.4b is
considerably greater than that in the damped image (Figure 4.3b). It
does not show in those Figures because they have been normalised for
display purposes. In terms of the auditory model, however, the
greater overall level in the image of the ramped sound would lead to
the prediction that ramped sounds are considerably louder than damped
sounds, and this is not the case; they have roughly equal loudness.
All of these observations taken together suggest that the strobe rate
should be limited and that the limitation should favour larger NAP
peaks, closer to the local maximum.

The solution in this case is to delay temporal integration a few
milliseconds after each suprathreshold NAP pulse, to determine whether
another, larger, NAP pulse is about to occur. Specifically, when a
NAP peak is identified, it is labelled as a potential strobe point,
but the initiation of temporal integration is delayed for several
milliseconds. In AIM, the value is set with option 'stlag_ai'. If,
during this time, no new larger NAP pulses are encountered, the
candidate strobe point is used to initiate temporal integration.
If a larger NAP pulse is encountered, it becomes the new strobe
candidate and replaces the previous strobe candidate, the strobe lag
is reset to stlag_ai ms and the process begins again. The auditory
images of damped and ramped sinusoids produced with this 'local-max'
strobe criterion are shown in Figures 4.3c and 4.4c, respectively.
The strobe lag restriction has virtually no effect on the auditory
image of the damped sinusoid, but it improves the image of the ramped
sinusoid markedly. The asymmetry observed in the NAP of the ramped
sinusoid is now preserved in its auditory image.

The NAP functions and the adaptive thresholds for the damped and
ramped sinusoids are shown in Figures 4.5b and 4.6b, respectively. A
comparison of the strobe points for the damped sinusoid under the
temporal shadow criterion (Figure 4.5a) and the local max criterion
(Figure 4.5b) shows that there is one small difference; the very first
strobe point under the temporal shadow criterion is omitted under the
local max criterion because a larger NAP pulse follows it within
stlag_ai ms. So the second NAP pulse replaces the first as the strobe
candidate. In the case of the ramped sinusoid, shifting to the local
max criterion has a dramatic effect. The NAP functions and adaptive
thresholds in Figures 4.6a and 4.6b are identical, but most of the
strobe points identified under the temporal shadow criterion (Figure
4.6a) are immediately followed by larger NAP pulses as we proceed up
the ramp. As a result the majority of the candidate pulses are
repressed in favour of the one that occurs at the offset of the ramp.
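
The local-max criterion can be sketched by adding a pending strobe
candidate to a temporal shadow threshold. As before, this is an
illustrative sketch and not AIM code. It assumes the NAP is given as
a list of (time, height) peaks, reports the time of the peak that
eventually strobes, and ignores suprathreshold peaks that are not
larger than the pending candidate. The function name, the idealised
ramp of peak heights and the 5-ms value standing in for stlag_ai are
all assumptions.

```python
def local_max_strobes(peaks, stlag_ms=5.0, decay_per_ms=0.05):
    """Return the times of the peaks that initiate temporal integration."""
    strobes = []
    thr_t, thr_h = None, 0.0   # peak that last reset the adaptive threshold
    cand = None                # pending strobe candidate as (time, height)
    deadline = None            # time at which the candidate's lag expires
    for t, h in peaks:
        # The lag expired with no larger pulse: the candidate strobes.
        if cand is not None and t > deadline:
            strobes.append(cand[0])
            cand = None
        if thr_t is None:
            threshold = 0.0
        else:
            threshold = max(0.0, thr_h * (1.0 - decay_per_ms * (t - thr_t)))
        if h > threshold and (cand is None or h > cand[1]):
            cand = (t, h)              # new or larger candidate
            deadline = t + stlag_ms    # strobe lag is reset
            thr_t, thr_h = t, h
    if cand is not None:               # flush the final candidate
        strobes.append(cand[0])
    return strobes


# Idealised carrier-channel peaks of a ramped sinusoid: one peak per
# 800-Hz carrier cycle, rising with a 4-ms half life over each 25 ms.
spacing_ms = 1000.0 / 800.0
ramp_peaks = [(cycle * 25.0 + i * spacing_ms,
               0.5 ** ((19 - i) * spacing_ms / 4.0))
              for cycle in range(3)
              for i in range(20)]

one_per_cycle = local_max_strobes(ramp_peaks)
every_ramp_peak = local_max_strobes(ramp_peaks, stlag_ms=0.0)
print(one_per_cycle)   # [23.75, 48.75, 73.75]
```

With a lag of zero the sketch reduces to the temporal shadow
criterion and strobes repeatedly on the way up each ramp; with the
lag in place, each candidate is displaced by the larger peak that
follows it and only the peak at the offset of the ramp strobes.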
So, with the exception of the onset of the sound, the mechanism
synchronises to the period of the sound and there is one strobe per
cycle of the sound. The local max criterion also leads to damped and
ramped auditory images with roughly the same level of activity in the
auditory image, and so it is also a better predictor of the loudness
of these sounds. Finally, note that the strobe lag restricts the
maximum strobe rate of the mechanism. This is important because,
without it, the level of a sinusoid would increase with its frequency
in the auditory image.


5. Limiting the Lag of the Local Max Criterion.

In the second experiment with damped and ramped sinusoids (Patterson,
1994b), the longest envelope period was 100 ms, and in that condition,
the distinction between damped and ramped sinusoids is audible for
half lives as long as 64 ms. In channels near the carrier frequency,
the NAP function produced by the ramped sinusoid is a long, slowly
rising, sequence of peaks. The local-max strobe criterion delays
temporal integration to the end of the ramp and initiates temporal
integration once per cycle, as previously, with the 25-ms envelope
stimuli. The example, however, raises the question of what would
happen in the case of a very long duration slowly rising tone, say a
tone that rises from absolute threshold to 80 dB SPL over the course
of 5 seconds. A listener would undoubtedly hear the sound shortly
after it comes on, and hear its loudness increase progressively over
the course of the 5-second rise.
The local-max strobe mechanism would initiate temporal integration
once, shortly after the onset of the sound, because of overshoot in
the neural encoding stage of AIM. But thereafter, it would suppress
temporal integration throughout the rise of the NAP function and
strobe once at the end of the rise. Thus the auditory image would be
empty at a time when we know the listener would hear the tone. To
solve this problem, the strobe lag of the local max mechanism is
limited to twice the stlag_ai value; that is, after a NAP pulse
becomes a strobe candidate, either that NAP pulse or a larger one must
initiate temporal integration within the next 2*stlag_ai ms. So the
strobe lag restricts not only the maximum strobe rate for static
sinusoids, but also the minimum strobe rate for slowly increasing
sinusoids.


6. Aperiodic Strobing and Irregularity in the Auditory Image.

To this point, the discussion of strobe criteria has focussed on
activity in the carrier channel of the NAP and auditory image, and the
relationship between strobe criteria and the preservation of NAP
asymmetry through temporal integration. It was noted in passing that,
away from the carrier channel, auditory images of ramped sounds have
fringes of irregular activity, for all strobe criteria prior to the
local max criterion. We might expect such fringes to impart a
roughness or noisy quality to the perception of ramped sounds, but
typically they are static and clear.
In this final Section, the activity produced by a ramped sinusoid in
the 640-Hz channel of the NAP and auditory image is examined, to
illustrate the relationship between strobe restrictions and the fringe
of irregularity in the auditory image.

The NAP produced in the 640-Hz channel of the filterbank by a ramped
sinusoid with an 800-Hz carrier, a 25-ms envelope period, and a 4-ms
half life is shown in Figure 6.1. The level of the ramped sinusoid
rises rapidly, relative to the decay rate of the impulse response of
the auditory filter and, as a result, the activity in the rising part
of the NAP is dominated by carrier-period time intervals (Patterson,
1994a). When the amplitude of the ramped sinusoid drops abruptly, the
energy stored in the filter decays away in a wave with periods
appropriate to the centre frequency of the channel. Now consider the
activity produced by this NAP in the 640-Hz channel of the auditory
image for strobe criteria 2, 3 and 4, the 'every peak', 'temporal
shadow' and 'local max' criteria, respectively.

Figure 6.2a shows the case where there is no adaptive threshold and
the mechanism strobes on the peak of every NAP pulse. This is the
version of STI most similar to autocorrelation. Strobing on every
peak causes carrier periods from the ramp to be mixed with
centre-frequency periods after the offset of the ramp. This is the
source of the irregularity in Fig. 6.2a, and the source of the
irregular fringe in the full auditory image (Fig. 4.4a) (Allerhand and
Patterson, 1992).

The activity produced with the temporal shadow criterion is
shown in Figure 6.2b. The adaptive threshold function and the
strobe points shown with the NAP in Fig. 6.1 were generated with the
temporal shadow criterion. In this case, the mechanism initiates
temporal integration on each peak in the ramped portion of the NAP,
but it skips the peaks associated with the ringing of the filter after
the ramp terminates. Strobing occurs in synchrony with the carrier
periods in the ramped portion of the NAP and this removes the
irregularity from the ramped portion of the auditory image between 0
ms and about 10 ms. There is still irregularity in the region from 0
to -10 ms, and in the region from 25 to 15 ms, because strobing in
synchrony with the carrier period mixes carrier periods and
centre-frequency periods in these regions of the image.

A further improvement occurs when the local max criterion is
introduced and strobing on successive carrier periods of the ramped
section of the NAP is suppressed. The activity in the 640-Hz channel
of the image is shown in Figure 6.2c. The irregular activity has been
removed; the image shows carrier periods to the left of the 0-ms point
and centre-frequency periods to the right of the 0-ms point. Thus,
strobing on local maxima synchronises temporal integration to the
period of the wave and preserves not only the basic asymmetry of the
NAP, but also the contrasting time-interval patterns associated with
different sections of the NAP cycle.



REFERENCES

Akeroyd, M.A. and Patterson, R.D. (1995). "Discrimination of wideband
noises modulated by a temporally asymmetric function,"
J. Acoust. Soc. Am. (in press).

Assmann, P.F. and Summerfield, Q. (1990). "Modelling the perception of
concurrent vowels: Vowels with different fundamental frequencies,"
J. Acoust. Soc. Am. 88, 680-697.

Brown, G.J. and Cooke, M. (1994). "Computational auditory scene
analysis," Computer Speech and Language 8, 297-336.

Irino, T. and Patterson, R.D. (1996). "Temporal asymmetry in the
auditory system," J. Acoust. Soc. Am. (revision submitted
August 95).

McKeown, D. and Patterson, R.D. (1995). "The time course of auditory
segregation: concurrent vowels that vary in duration,"
J. Acoust. Soc. Am. (in press).

Meddis, R. and Hewitt, M.J. (1991a). "Virtual pitch and phase
sensitivity of a computer model of the auditory periphery: I
pitch identification," J. Acoust. Soc. Am. 89, 2866-82.

Meddis, R. and Hewitt, M.J. (1991b). "Virtual pitch and phase
sensitivity of a computer model of the auditory periphery: II
phase sensitivity," J. Acoust. Soc. Am. 89, 2883-94.

Patterson, R.D. (1987b). "A pulse ribbon model of monaural
phase perception," J. Acoust. Soc. Am. 82, 1560-1586.

Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang,
C. and Allerhand, M. (1992). "Complex sounds and auditory images,"
in: Auditory physiology and perception, Y. Cazals, L. Demany,
K. Horner (eds), Pergamon, Oxford, 429-446.

Patterson, R.D. (1994a). "The sound of a sinusoid: Spectral models,"
J. Acoust. Soc. Am.
96, 1409-1418.

Patterson, R.D. (1994b). "The sound of a sinusoid: Time-interval
models," J. Acoust. Soc. Am. 96, 1419-1428.

Patterson, R.D. and Akeroyd, M.A. (1995). "Time-interval patterns and
sound quality," in: Advances in Hearing Research: Proceedings of
the 10th International Symposium on Hearing, G. Manley, G. Klump,
C. Koppl, H. Fastl, and H. Oeckinghaus (eds), World Scientific,
Singapore (in press).

Patterson, R.D., Allerhand, M. and Giguere, C. (1995). "Time-domain
modelling of peripheral auditory processing: A modular architecture
and a software platform," J. Acoust. Soc. Am. 98 (in press).

Robinson, K.L. and Patterson, R.D. (1995a). "The duration required to
identify the instrument, the octave, or the pitch-chroma of a
musical note," Music Perception (in press).

Robinson, K.L. and Patterson, R.D. (1995b). "The stimulus duration required to
identify vowels, their octave, and their pitch-chroma," J. Acoust. Soc.
Am. 98 (in press).

Slaney, M. and Lyon, R.F. (1990). "A perceptual pitch detector," in
Proc. IEEE Int. Conf. Acoust. Speech Signal Processing,
Albuquerque, New Mexico.


===========================================================================
#!/bin/sh

# scripts/aimStrobeCriterion
# Annotated script for generating the figures in docs/aimStrobeCriterion

echo "FIGURES FOR SECTION 0"

mv .gennaprc .oldgennaprc  # a safety precaution
mv .gensairc .oldgensairc  # a safety precaution
echo | gennap powc=off -update  # make sure that powc is off
echo | gensai powc=off -update  # make sure that powc is off

echo
echo "Figure 0.1: Neural Activity Pattern (NAP) of cegc"
gennap input=cegc_br top=3000 swap=off bits=12 gain_gtf=4  # all default values

echo "Figure 0.2: Stabilised Auditory Image (SAI) of cegc"
gensai stcrit=3 input=cegc_br length=100ms frstep_aid=96ms top=2500

echo
echo "FIGURES FOR SECTION 1"

echo "Figure 1.1: SAI of cegc strobing on every non-zero point in the NAP"
echo " (stcrit_ai=1). This one is slow to calculate."
gensai stcrit_ai=1 top=17000 input=cegc_br length=100ms frstep_aid=96ms

# Top has to be raised because this strobe criterion causes constant
# temporal integration.


echo "Figure 1.2: SAI via autocorrelation -- a correlogram"
echo | gennap input=cegc_br display=off length=125ms top=3000 output=stdout > cegc_br_gtf.nap
#gennap -use start=48 display=on cegc_br_gtf  # optional display of the NAP
# After making a NAP with display=off, gennap -use requires you to set display=on.

acgram start=50 wid=70ms lag=35ms frames=1 scale=.02 cegc_br_gtf.nap > cegc_gtf.sai
gensai -use top=5000 input=cegc_gtf

rm cegc_br_gtf.nap cegc_gtf.sai

echo
echo "FIGURES FOR SECTION 2"

echo "Figure 2.1: SAI of cegc strobing on the peak of every NAP pulse"
echo " (stcrit_ai=2)"
gensai stcrit_ai=2 top=10000 input=cegc_br length=100ms frstep_aid=96ms

echo
echo "FIGURES FOR SECTION 3"

echo "Demonstration of preservation of asymmetry when stthresh is elevated"
# Note stthresh only operates when stcrit_ai=1.
gensai stcrit_ai=1 top=5000 input=cegc_br length=68ms frstep_aid=66ms stthresh_ai=5000

echo "Figure 3.1: NAP of cegc with temporal shadow criterion (stcrit_ai=3)"
echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay cegc_br 1000 100 3 2.5 17000 2000

# Type 'StrobeCriterionDisplay -help' for a listing of the options and
# their order.
# Control of Xplots:
#   Click mouse button 1 to display coordinates of points.
#   Click mouse button 2 to redraw.
#   Click mouse button 3 to remove the display (i.e. quit).

echo
echo "FIGURES FOR SECTION 4"

echo "Figure 4.1a: Waveform of Damped Sinusoid (4 cycles)"
genwav top=14000 bottom=-14000 length=100ms input=dr_f8_t4_d swap=on

echo "Figure 4.2a: Waveform of Ramped Sinusoid (4 cycles)"
genwav top=14000 bottom=-14000 length=100ms input=dr_f8_t4_r swap=on

echo "Figure 4.1b: NAP of the Damped Sinusoid (2 cycles)"
gennap input=dr_f8_t4_d gain_gtf=0.0626 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=110ms output=stdout display=off > damped.nap
gennap -use start=50 leng=50 display=on damped

echo "Figure 4.2b: NAP of the Ramped Sinusoid (2 cycles)"
gennap input=dr_f8_t4_r gain_gtf=0.0626 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=110ms output=stdout display=off > ramped.nap
gennap -use start=60 leng=50 display=on ramped

rm damped.nap ramped.nap

echo "Figure 4.3a: SAI of the Damped Sinusoid strobing on every NAP peak"
echo " (stcrit_ai=2)"
gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=7000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo "Figure 4.4a: SAI of the Ramped Sinusoid strobing on every NAP peak"
echo " (stcrit_ai=2)"
gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=7000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo "Figure 4.3b: SAI of the Damped Sinusoid with temporal shadow criterion"
echo " (stcrit_ai=3)"
gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=1000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=3 \
    pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo "Figure 4.4b: SAI of the Ramped Sinusoid with temporal shadow criterion"
echo " (stcrit_ai=3)"
gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=3 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo "Figure 4.3c: SAI of the Damped Sinusoid with the local max criterion"
echo " (stcrit_ai=4)"
gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=800 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo "Figure 4.4c: SAI of the Ramped Sinusoid with the local max criterion"
echo " (stcrit_ai=4)"
gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=800 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo | gennap swap=on bits=16 gain_gtf=0.0625 -update
echo | gensai swap=on bits=16 gain_gtf=0.0625 -update


echo "Figure 4.5a: NAP of Damped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay dr_f8_t4_d 800 120 3 2.5 14000 2400

echo "Figure 4.5b: NAP of Damped Sinusoid, local max criterion (stcrit_ai=4)"
echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay dr_f8_t4_d 800 120 4 2.5 14000 2400

echo "Figure 4.6a: NAP of Ramped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay dr_f8_t4_r 800 120 3 2.5 7500 \
    2400

echo "Figure 4.6b: NAP of Ramped Sinusoid, local max criterion (stcrit_ai=4)"
echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay dr_f8_t4_r 800 120 4 2.5 7500 2400

echo
echo "FIGURES FOR SECTION 5"

echo
echo "FIGURES FOR SECTION 6"

echo "Figure 6.1: NAP of Ramped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
echo " Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay dr_f8_t4_r 640 120 3 2.5 7000 2000

echo "Figure 6.2a: SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=2)"
gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=32000 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo "Figure 6.2b: SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=3)"
gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=10000 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=3 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo "Figure 6.2c: SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=4)"
gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=1200 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5