aim92: docs/aimStrobeCriterion comparison

comparison docs/aimStrobeCriterion @ 0:5242703e91d3 tip

Initial checkin for AIM92 aimR8.2 (last updated May 1997).

author	tomwalters
date	Fri, 20 May 2011 15:19:45 +0100
parents
children

comparison


--1:000000000000
+:5242703e91d3
+docs/aimStrobeCriterion (text)
+scripts/aimStrobeCriterion (figures)
+STROBED TEMPORAL INTEGRATION AND THE STABILISED AUDITORY IMAGE
+Roy D. Patterson, Jay Datta and Mike Allerhand
+MRC Applied Psychology Unit
+15 Chaucer Road, Cambridge, CB2 2EF UK
+email:  roy.patterson, jay.datta or mike.allerhand  @mrc-apu.cam.ac.uk
+2 August 1995
+ABSTRACT
+	This document describes the Strobed Temporal Integration
+mechanism used to convert neural activity patterns into stabilised
+auditory images. The specific version of the Auditory Image Model is
+AIM R7, as described in Patterson, Allerhand, and Giguere (1995).
+INTRODUCTION
+	When a periodic sound occurs with a pitch in the musical
+range, the cochlea produces a detailed, multi-channel, time-interval
+pattern that repeats once per cycle of the wave.  The auditory images
+that we hear in response to periodic sounds are perfectly stable.
+That is, despite the fact that the level of activity in the neural
+activity pattern is fluctuating over a large range within the course
+of each cycle, the loudness of the sound is fixed.  This indicates
+that some form of temporal integration is applied to the NAP prior to
+our initial perception of the sound.  The auditory images of periodic
+sounds can have a very rich timbre, or sound quality, that can reveal
+a great deal about the sound source such as the quality of the musical
+instrument or the finesse of the musician.  This suggests that much of
+the detailed time-interval information produced by the cochlea is
+preserved in the stabilised auditory image.
+	The fact that we hear stable auditory images with rich sound
+quality presents auditory theorists with a problem.  The temporal
+integration mechanism in traditional auditory models is a low-pass
+filter that removes the fine-grain time-interval information from the
+internal representation of the sound -- time interval information that
+appears to be required for timbre perception.  Strobed temporal
+integration was introduced to solve this problem.  It performs the
+temporal integration necessary to produce stable auditory images and,
+at the same time, preserves the majority of the time-interval
+information observed in the neural activity pattern (NAP) produced by
+the cochlea.
+	It is not a difficult problem to produce a high-resolution,
+stabilised version of the NAP provided you know the moment in time at
+which the pattern in the NAP will repeat.  For example, consider the
+NAP of the first note of the wave CEGC in Figure 0.1 from Patterson et
+al. (1992).  The wave is a train of clicks separated by 8-ms gaps; the
+upper channels of the NAP show that the response is a sequence of
+filter impulse responses spaced at 8 ms intervals.  A stabilised
+representation of the NAP can be produced by setting up an image
+buffer that has the same number of channels as the NAP, and simply
+transferring a copy of the pattern in each channel of the NAP to the
+corresponding channel of the image buffer once every 8 ms.  In the
+NAP, the pattern flows from right to left as time progresses, and
+since the cycles are continually entering the NAP from the right hand
+side and exiting the NAP from the left hand side, the pattern after
+every 8 ms is identical to the pattern 8 ms ago.  So if the transfer
+from the NAP to the auditory image is performed every 8 ms exactly,
+successive contributions from the NAP to the image are all identical.
+	In the image buffer, activity does not move from right to
+left, it simply decays into the floor exponentially over time with a
+half life of about 30 ms.  When a new contribution arrives from the
+NAP, it is added point for point with whatever is currently in the
+corresponding channel of the image buffer.  In the current example,
+after a copy of the NAP arrives in the auditory image, and during the
+30 ms over which it would decay to half its original value, three more
+copies of the NAP pattern arrive and are added into the auditory
+image.  Thus, for typical musical notes and typical vowels, the rate
+of temporal integration from the NAP into the auditory image is high
+and there is little time between successive integration events for the
+image itself to decay.  This is the source of the stability of the
+auditory image.
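The build-up described above can be checked with a few lines of arithmetic.  The following is a minimal sketch, not AIM code: it follows a single image point that receives a unit-height contribution every 8 ms and decays with a 30-ms half life between contributions.  The function names and the unit height are illustrative assumptions.

```python
HALF_LIFE_MS = 30.0   # image decay half life quoted in the text
PERIOD_MS = 8.0       # click-train period from the cegc example


def decay_factor(dt_ms, half_life_ms=HALF_LIFE_MS):
    """Fraction of image activity that survives dt_ms of exponential decay."""
    return 0.5 ** (dt_ms / half_life_ms)


def image_level_after(n_transfers, period_ms=PERIOD_MS):
    """Level of one image point just after the n-th unit-height transfer."""
    level = 0.0
    for _ in range(n_transfers):
        # Decay over one period, then add the new copy point for point.
        level = level * decay_factor(period_ms) + 1.0
    return level


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(n, round(image_level_after(n), 3))
    # The geometric series converges to 1 / (1 - decay_factor(PERIOD_MS)).
    print("limit", round(1.0 / (1.0 - decay_factor(PERIOD_MS)), 3))
```

Because each contribution arrives well before the previous ones have decayed, the level settles at a steady value several times the height of a single contribution, which is the stability argument made in the text.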
+	Provided the integration is performed once per cycle of the
+sound, the majority of the time-interval information in the NAP will
+be preserved in the auditory image, thereby providing a solution to
+the problem of how to produce stable images without removing the
+fine-grain time-interval information associated with sound quality.
+The auditory image produced by this process is shown in Figure 0.2
+from Patterson et al. (1992).  The transfer is performed on each
+channel of the NAP separately and it is performed at the point in the
+cycle where the activity in the NAP is a maximum.  The maximum of the
+most recent cycle to arrive in the NAP is added into the auditory
+image at the 0-ms point, and as a result, the NAP peaks are aligned
+vertically in the auditory image.  This passive alignment process
+explains the loss of global phase information observed empirically
+(see Patterson, 1987, for a review).
+	Thus it would appear that the problem of converting the
+oscillating NAP into a stabilised, high-resolution image reduces to
+the problem of finding the pitch of the sound and performing temporal
+integration at multiples of the pitch period.  There are now a number
+of computational auditory models with a proven ability to extract the
+pitch of complex sounds (see Brown and Cooke, 1994, for a review) and
+they could be used to direct strobed temporal integration.  However,
+experiments with vowels (McKeown and Patterson, 1995; Robinson and
+Patterson, 1995b) and musical notes (Robinson and Patterson, 1995a)
+indicate that 4 to 8 cycles of the sound are required to produce an
+accurate estimate of the pitch, whereas the sound quality information
+necessary to identify a vowel or a musical instrument can be extracted
+from one cycle of the wave.  This suggests that, if the auditory
+system does use strobed temporal integration to produce a stable,
+high-resolution auditory image, it does so with a mechanism that operates
+more locally in time than pitch extraction mechanisms.  This is the
+background that led to the development of the strobed temporal
+integration mechanism in the auditory image model.
+	In Sections 1 and 2 of this document, following Allerhand and
+Patterson (1992), we describe two simple criteria for selecting strobe
+points in the NAP and show that they produce auditory images that are
+very similar to the correlograms produced by Assmann and Summerfield
+(1990), Slaney and Lyon (1990), or Meddis and Hewitt (1991a, 1991b).
+The structures that arise in this form of auditory image are much more
+symmetric than the corresponding structures in the NAP (Allerhand and
+Patterson, 1992).  There is mounting evidence, however, that the
+auditory system is highly sensitive to temporal asymmetry (Patterson,
+1994a, 1994b; Akeroyd and Patterson, 1995; Irino and Patterson, 1996),
+and so the loss of asymmetry associated with the simple strobe
+criterion seems likely to limit the value of this representation of
+our perceptions.  In the remaining Sections, an ordered sequence of
+restrictions is added to the simple criteria for initiating temporal
+integration, to restore asymmetry to the structures that arise in the
+auditory image.
+1.  Strobe on Every Non-Zero Point in the NAP.
+	The initial criterion is very simple; temporal integration is
+initiated on each and every non-zero point in the NAP.  In AIM
+software, the option that determines which strobe criterion will be
+used is 'stcrit_ai' and it is set equal to one for this simplest
+strobe criterion. Allerhand and Patterson (1992) showed that when
+temporal integration from the NAP to the auditory image is initiated
+on each and every non-zero point in the NAP function, the result is
+very similar to a correlogram -- a representation that is commonly
+used in time-domain models of hearing to extract the pitch of complex
+periodic sounds (see Brown and Cooke, 1994, for a review).  For
+example, compare the auditory image with stcrit_ai=1 (Figure 1.1) and
+the correlogram (Figure 1.2) of the first note of the sound cegc.
+Both figures show stabilised representations of the time-interval
+pattern that the sound produces in the NAP, and in both cases, the
+individual channels have been aligned vertically on the largest peak
+in the NAP function.  The patterns in the auditory image and the
+correlogram both differ from the pattern in the NAP in one important
+way; there is a reflection of the NAP pulses associated with the
+ringing of the auditory filters, on the side opposite to where they
+originally appear.  That is, autocorrelation and STI with stcrit_ai=1
+reduce the temporal asymmetry observed in the NAP.  The asymmetry
+information is not entirely removed but it is largely removed.
+Experiments with sounds that have asymmetric temporal modulation show
+that listeners are sensitive to temporal asymmetry (Patterson, 1994a,
+1994b; Akeroyd and Patterson, 1995; Irino and Patterson, 1996), and so
+the removal of asymmetry information seems likely to prove a
+disadvantage when attempting to explain auditory perception.
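The resemblance to autocorrelation can be illustrated on a single channel.  The sketch below is an assumption-laden simplification rather than the AIM implementation: it ignores image decay, works on a plain list of non-negative samples, and adds unweighted copies of the NAP history at each strobe.  Under those assumptions the two functions agree exactly for a NAP whose samples are all 0 or 1, and they place a peak at the same lag for a decaying pulse train.

```python
def sti_every_nonzero(nap, max_lag):
    """Add the NAP history into the image at every non-zero NAP sample."""
    image = [0.0] * (max_lag + 1)
    for t, value in enumerate(nap):
        if value > 0:                      # strobe on every non-zero point
            for lag in range(min(max_lag, t) + 1):
                image[lag] += nap[t - lag]
    return image


def autocorrelation(nap, max_lag):
    """Unnormalised autocorrelation over the same non-negative lags."""
    acf = [0.0] * (max_lag + 1)
    for t, value in enumerate(nap):
        for lag in range(min(max_lag, t) + 1):
            acf[lag] += value * nap[t - lag]
    return acf


if __name__ == "__main__":
    # Hypothetical channel: a pulse every 8 samples followed by ringing.
    nap = [4, 2, 1, 0, 0, 0, 0, 0] * 4
    print("STI:", sti_every_nonzero(nap, 12))
    print("ACF:", autocorrelation(nap, 12))
```

The only difference between the two loops is the weighting by the strobe-point height, which is why the resulting patterns are very similar but not identical.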
+	The autocorrelation process is symmetric in time by its very
+nature. Mechanical processes that produce sound in the world are
+typically asymmetric in time because they usually have some inertia.
+Resonators struck impulsively ring after the pulse and not before.
+This principle also applies to the processes that analyse the sound in
+the auditory system.  The impulse response of the auditory filter
+rises faster than it falls; the adaptation process in the inner
+haircell adapts up faster at the onset of a sound than it adapts down
+after the sound passes.  So asymmetry is the norm in the world and it
+is not surprising that the auditory system is sensitive to it.
+2.  Strobe on the Peak of Each NAP pulse.
+	When temporal integration is initiated on every non-zero NAP
+point, the successive NAP functions that are transferred to the
+auditory image are highly correlated.  This suggests that we could
+attain essentially the same auditory image for vastly less computation
+by restricting temporal integration to the larger points on the
+individual NAP pulses.  This leads, in turn, to the suggestion that
+temporal integration be limited to the peak of the individual NAP
+pulses.  The result of this restriction is illustrated in Figure 2.1
+which shows the auditory image of the first note of CEGC with this
+more restricted strobe criterion.  Since the peak restriction greatly
+reduces the rate of temporal integration, the absolute levels of
+structures in this form of auditory image are considerably lower than
+those in the previous form of image.  The pattern of time intervals,
+however, is very similar in the two forms of auditory image.  They
+both preserve a detailed representation of the time-interval pattern
+in the NAP, and they both lose much of the asymmetry in the NAP.
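The reduction in strobe rate is easy to see on a sampled channel.  This is a minimal sketch with hypothetical helper names: a peak is taken to be a sample that is larger than its predecessor and not smaller than its successor, so a flat-topped pulse counts once and the first and last samples are never treated as peaks.

```python
def peak_indices(nap):
    """Indices of local maxima of a sampled, non-negative NAP channel."""
    peaks = []
    for i in range(1, len(nap) - 1):
        # Rising into the sample and not rising out of it.
        if nap[i] > nap[i - 1] and nap[i] >= nap[i + 1]:
            peaks.append(i)
    return peaks


def nonzero_indices(nap):
    """Strobe points under the 'every non-zero point' criterion."""
    return [i for i, value in enumerate(nap) if value > 0]


if __name__ == "__main__":
    nap = [0, 1, 3, 1, 0, 0, 2, 2, 1, 0]   # two pulses, the second flat-topped
    print("non-zero strobes:", len(nonzero_indices(nap)))
    print("peak strobes:    ", len(peak_indices(nap)))
```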
+3.  Avoid Strobing in the Temporal Shadow after a large NAP Pulse.
+	The loss of asymmetry in the click-train structure of the
+auditory image arises when temporal integration is initiated on the
+smaller NAP pulses associated with the ringing of the auditory filters
+after each click in the train.  This can be demonstrated by
+introducing a fixed strobe threshold below which NAP peaks do not
+initiate temporal integration, and progressively raising this strobe
+threshold to exclude more and more of the lower level NAP pulses.  (In
+AIM, a fixed threshold is set with option stthresh_ai and
+stcrit_ai=1.)  The auditory image becomes less and less symmetric and
+more and more like the original NAP pattern for the click train as the
+strobe threshold is increased.  Fixed thresholds of this sort are not
+realistic for simulating the operation of the auditory system, firstly
+because the strobe threshold eventually exceeds the largest NAP pulse
+and temporal integration ceases entirely, and secondly because, in the
+natural environment, the levels of sounds are constantly changing.
+Nevertheless, the example illustrates how NAP asymmetry is lost with
+simple strobe criteria.  The problem with autocorrelation is similar;
+the correlation values at lags associated with the smaller NAP pulses
+introduce symmetric reflections into the structures that appear in the
+correlogram.
+	An alternative means of restricting temporal integration to
+the larger pulses in the NAP of the click train is to use an adaptive
+strobe threshold which is temporally asymmetric.  In the simplest
+case, when the strobe unit monitoring a NAP channel encounters a
+pulse, strobe threshold is set to the full height of the NAP pulse.
+But following the peak, threshold does not fall as fast as the NAP
+function; rather, it is restricted to decaying at a fixed percentage of
+the peak height per ms.  In AIM, the rate of decay is set to 5% per
+ms, so the threshold decays faster after larger peaks, and in the
+absence of further NAP peaks, returns to 0 in 20 ms.  The NAP function
+for the 1.0-kHz channel of the NAP is presented in Figure 3.1 along
+with the adaptive threshold function. Together they illustrate what is
+referred to as the "temporal shadow criterion" for strobed temporal
+integration.
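The adaptive threshold can be sketched for one channel as follows.  This is an illustration under stated assumptions, not the AIM source: the NAP is reduced to a list of (time in ms, height) peaks, the threshold falls linearly by 5% of the last strobe height per ms, and a peak strobes only if it is strictly above the threshold.  The function names and the example ringing pattern are hypothetical.

```python
DECAY_PER_MS = 0.05  # fraction of the last strobe height lost per ms (from the text)


def shadow_threshold(t_ms, strobe_t_ms, strobe_height, decay=DECAY_PER_MS):
    """Linearly decaying threshold set by the most recent strobe."""
    return max(0.0, strobe_height * (1.0 - decay * (t_ms - strobe_t_ms)))


def temporal_shadow_strobes(peaks, decay=DECAY_PER_MS):
    """Return the (time_ms, height) peaks that exceed the adaptive threshold."""
    strobes = []
    last_t, last_h = 0.0, 0.0  # no threshold before the first pulse
    for t_ms, height in peaks:
        if height > shadow_threshold(t_ms, last_t, last_h, decay):
            strobes.append((t_ms, height))
            last_t, last_h = t_ms, height  # threshold jumps to the peak height
    return strobes


if __name__ == "__main__":
    # Hypothetical 1-kHz channel: a click every 8 ms, ringing peaks 1 ms apart.
    peaks = [(8.0 * c + k, 100.0 * 0.6 ** k) for c in range(3) for k in range(5)]
    for strobe in temporal_shadow_strobes(peaks):
        print(strobe)
```

In this example the ringing peaks fall faster than the threshold and are skipped, while the first peak of the next cycle arrives after the threshold has decayed below it, so the strobes lock to the 8-ms period without any explicit pitch estimate.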
+	In the figure, the vertical lines below the abscissa of the
+NAP function mark the NAP pulses that initiate temporal integration.
+They show that the first NAP pulse strobes temporal integration and
+strobe threshold is set to the peak height.  It immediately begins to
+decay, but then it encounters another NAP pulse that exceeds strobe
+threshold and so the process of strobing temporal integration and
+raising strobe threshold is promptly repeated.  At this point,
+however, strobe threshold is high relative to the NAP pulses and
+strobe threshold is falling more slowly than the NAP pulses, so the
+algorithm proceeds through the rest of the cycle without encountering
+another NAP pulse from the ringing part of the NAP function.  In this
+way, the strobe mechanism is synchronised to the period of the sound
+even though no explicit information about the pitch of the sound is
+provided to the strobe mechanism.  It is the auditory image with the
+temporal shadow criterion that was presented originally in Figure
+0.2 (stcrit_ai=3).
+	The 'temporal shadow criterion' produces stable auditory
+images with accurate asymmetry for a wide variety of naturally
+occurring sounds like vowels and musical notes.  The reason is that
+the NAPs of these sounds have a restricted range of periods and within
+those periods the asymmetry is typically characterised by the
+rapid-rise/slow-fall form.  There are, however, periodic sounds with
+very low pitch and NAP functions that rise slowly over the course of
+the period and fall rapidly at the end of the period, and the
+perceptions produced by these sounds indicate that the auditory strobe
+mechanism is somewhat more sophisticated than the temporal shadow
+strobe mechanism.  These "ramped" sounds are the subject of the next
+section.
+4.  Avoid Temporal Integration on NAP Peaks Followed by Larger NAP Peaks.
+	A pair of sounds that illustrates the limitations of the
+temporal shadow criterion is presented in Figures 4.1a and 4.2a; the
+former is an exponentially damped sinusoid that repeats every 25 ms,
+the latter is an exponentially ramped sinusoid with the same envelope
+period.  The carrier frequency in this case is 800 Hz and the half
+life of the exponential is 4 ms.  The half life is on the same order
+as the exponential decay of the impulse response of a gammatone
+auditory filter with a centre frequency in the region of 800 Hz.  The
+example is taken from Patterson (1994a).
+	The neural activity patterns produced by the damped and ramped
+sinusoids are shown in Figures 4.1b and 4.2b, respectively.  The
+frequency range of the filterbank is from an octave below the carrier
+frequency to an octave above the carrier frequency.  The highest and
+lowest channels in Figure 4.1b show the transient response of the
+filterbank to the onset of the damped sinusoid, and similarly the
+high- and low-frequency channels in Figure 4.2b show the transient
+response of the filterbank to the offset of the ramped sinusoid.  In
+the high-frequency channels, the onset response of the damped sinusoid
+and the offset response of the ramped sinusoid are composed of impulse
+responses from the individual auditory filters.  The centre section of
+each figure shows the response to the carrier.  Here we see that the
+asymmetry in the waveform is preserved in the NAP: in Figure 4.1b, the
+carrier component is at its highest level just as the transient
+response ends and the carrier component decays away over the course of
+the period; in Figure 4.2b, the carrier activity rises over the course
+of the ramped cycle and ends at its peak level in the transient
+response.
+	Auditory images of these damped and ramped sinusoids are
+presented in Figures 4.3 and 4.4, respectively.  The upper rows show
+the images obtained when the strobe initiates temporal integration on
+every peak in the NAP; the middle rows show the images obtained with
+the temporal shadow criterion.  The images in the upper row illustrate
+the problem of preserving NAP asymmetry during temporal integration.
+When the mechanism strobes on every peak, the temporal asymmetry
+observed in the NAP of the damped sinusoid is actually reversed in the
+auditory image of the damped sinusoid (Figure 4.3a).  In the case of
+the ramped sinusoid, the asymmetry observed in the NAP is largely lost
+in the image of the ramped sinusoid (Figure 4.4a); there is activity
+at all time intervals in the central channels, whereas there is a gap
+in activity in the NAP of the ramped sinusoid, once per cycle, just
+after the abrupt reduction in amplitude.  It is also the case that
+there are irregular fringes along the edges of the main structure in
+the auditory image of the ramped sinusoid (Figure 4.4a).  This
+provides further evidence that the time interval pattern in the NAP is
+being disrupted by the temporal integration process in the
+construction of the auditory image.
+	The introduction of the temporal shadow criterion for
+initiating temporal integration produces a dramatic improvement in the
+auditory image of the damped sinusoid (Figure 4.3b).  The structure in
+the image is highly asymmetric and, once the alignment process is
+taken into account, the structure in the image is seen to be a very
+faithful reproduction of that in the NAP.  The imposition of the
+temporal shadow criterion improves the auditory image of the ramped
+sound (Figure 4.4b) inasmuch as it eliminates the fringes seen in
+Figure 4.4a.  But it does not solve the asymmetry problem.  The
+structure in the auditory image of Figure 4.4b is still more symmetric
+than it is asymmetric, whereas the structure in the corresponding NAP
+is highly asymmetric.
+	The source of the problem is illustrated in Figures 4.5a and
+4.6a which show the NAPs and adaptive thresholds for 80-ms segments of
+the damped and ramped sinusoids, respectively.  The vertical markers
+below the abscissa in Figure 4.5a show that after the first cycle, the
+strobe mechanism is synchronised to the period of the wave and
+initiates temporal integration once per cycle on the largest NAP peak.
+So this criterion preserves the asymmetry of the damped sound in its
+auditory image.  In contrast, Figure 4.6a shows that on the way up the
+ramped portion of each cycle, the rising NAP pulses repeatedly exceed
+the adaptive threshold resulting in repeated initiation of temporal
+integration.  Since, in this region of the cycle, the mechanism
+initiates temporal integration on every carrier cycle, the auditory
+image does not preserve the asymmetry observed in the corresponding
+NAP.  The
+irregular fringe is reduced because the mechanism reliably skips the
+portion of the cycle where the level of activity in the NAP is
+changing most rapidly.
+	The high rate of strobing revealed in Figure 4.6a means that
+the level of activity in the ramped auditory image of Figure 4.4b is
+considerably greater than that in the damped image (Figure 4.3b).  It
+does not show in those Figures because they have been normalised for
+display purposes.  In terms of the auditory model, however, the
+greater overall level in the image of the ramped sound would lead to
+the prediction that ramped sounds are considerably louder than damped
+sounds, and this is not the case; they have roughly equal loudness.
+All of these observations taken together suggest that the strobe rate
+should be limited and that the limitation should favour larger NAP
+peaks, closer to the local maximum.
+	The solution in this case is to delay temporal integration a
+few milliseconds after each suprathreshold NAP pulse, to determine
+whether another, larger, NAP pulse is about to occur.  Specifically,
+when a NAP peak is identified, it is labeled as a potential strobe
+point, but the initiation of temporal integration is delayed for
+several milliseconds.  In AIM, the value is set with option
+'stlag_ai'.  If, during this time, no new larger NAP pulses are
+encountered, the candidate strobe point is used to initiate temporal
+integration.  If a larger NAP pulse is encountered, it becomes the new
+strobe candidate and replaces the previous strobe candidate, the
+strobe lag is reset to stlag_ai ms and the process begins again.  The
+auditory images of damped and ramped sinusoids produced with this
+'local-max' strobe criterion are shown in Figures 4.3c and 4.4c,
+respectively.  The strobe lag restriction has virtually no effect on
+the auditory image of the damped sinusoid, but it improves the image
+of the ramped sinusoid markedly.  The asymmetry observed in the NAP of
+the ramped sinusoid is now preserved in its auditory image.
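The candidate-and-lag logic can be sketched on top of the adaptive threshold.  The following is an illustrative simplification, not the AIM source: peaks are (time in ms, height) pairs, the decision to issue a pending candidate is only evaluated when the next peak arrives, and the 5-ms default for stlag_ms is an assumed value, since the text says only "several milliseconds".

```python
def local_max_strobes(peaks, stlag_ms=5.0, decay=0.05):
    """Strobe on a suprathreshold peak only if no larger peak follows in time."""
    strobes = []
    candidate = None            # pending (time_ms, height), not yet issued
    ref_t, ref_h = 0.0, 0.0     # pulse that last set the adaptive threshold
    for t_ms, height in peaks:
        # Issue the pending candidate if its lag expired before this pulse.
        if candidate is not None and t_ms - candidate[0] > stlag_ms:
            strobes.append(candidate)
            candidate = None
        threshold = max(0.0, ref_h * (1.0 - decay * (t_ms - ref_t)))
        if candidate is not None:
            threshold = max(threshold, candidate[1])  # must beat the candidate
        if height > threshold:
            candidate = (t_ms, height)   # new or larger candidate; lag restarts
            ref_t, ref_h = t_ms, height
    if candidate is not None:
        strobes.append(candidate)        # flush at the end of the input
    return strobes


if __name__ == "__main__":
    # Hypothetical ramped channel: 25-ms envelope period, 4-ms half life,
    # one NAP peak per 1.25-ms carrier period (800 Hz).
    peaks = [(25.0 * c + 1.25 * k, 100.0 * 2.0 ** ((1.25 * k - 25.0) / 4.0))
             for c in range(3) for k in range(1, 21)]
    for strobe in local_max_strobes(peaks):
        print(strobe)
```

On this ramped example each rising peak is replaced by the next, larger one, so only the peak at the top of each ramp survives to initiate temporal integration.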
+	The NAP functions and the adaptive thresholds for the damped
+and ramped sinusoids are shown in Figures 4.5b and 4.6b, respectively.
+A comparison of the strobe points for the damped sinusoid under the
+temporal shadow criterion (Figure 4.5a) and the local max criterion
+(Figure 4.5b) shows that there is one small difference; the very first
+strobe point under the temporal shadow criterion is omitted under the
+local max criterion because a larger NAP pulse follows it within
+stlag_ai ms.  So the second NAP pulse replaces the first as the strobe
+candidate.  In the case of the ramped sinusoid, shifting to the local
+max criterion has a dramatic effect.  The NAP functions and adaptive
+thresholds in Figures 4.6a and 4.6b are identical, but most of the
+strobe points identified under the temporal shadow criterion (Figure
+4.6a) are immediately followed by larger NAP pulses as we proceed up
+the ramp.  As a result the majority of the candidate pulses are
+repressed in favour of the one that occurs at the offset of the ramp.
+So, with the exception of the onset of the sound, the mechanism
+synchronises to the period of the sound and there is one strobe per
+cycle of the sound.  The local max criterion also leads to damped and
+ramped auditory images with roughly the same level of activity in the
+auditory image, and so it is also a better predictor of the loudness
+of these sounds. Finally, note that the strobe lag restricts the
+maximum strobe rate of the mechanism. This is important because,
+without it, the level of a sinusoid would increase with its frequency
+in the auditory image.
+5.  Limiting the Lag of the Local Max Criterion.
+	In the second experiment with damped and ramped sinusoids
+(Patterson, 1994b), the longest envelope period was 100 ms, and in
+that condition, the distinction between damped and ramped sinusoids is
+audible for half lives as long as 64 ms.  In channels near the carrier
+frequency, the NAP function produced by the ramped sinusoid is a long,
+slowly rising, sequence of peaks.  The local-max strobe criterion
+delays temporal integration to the end of the ramp and initiates
+temporal integration once per cycle, as previously, with the 25-ms
+envelope stimuli.  The example, however, raises the question of what
+would happen in the case of a very long duration slowly rising tone,
+say a tone that rises from absolute threshold to 80 dB SPL over the
+course of 5 seconds.  A listener would undoubtedly hear the sound
+shortly after it comes on, and hear its loudness increase
+progressively over the course of the 5-second rise.  The local-max
+strobe mechanism would initiate temporal integration once, shortly
+after the onset of the sound, because of overshoot in the neural
+encoding stage of AIM. But thereafter, it would suppress temporal
+integration throughout the rise of the NAP function and strobe once at
+the end of the rise.  Thus the auditory image would be empty at a time
+when we know the listener would hear the tone.  To solve this problem,
+the strobe lag of the local max mechanism is limited to twice the
+stlag_ai value; that is, after a NAP pulse becomes a strobe candidate,
+either that NAP pulse or a larger one must initiate temporal
+integration within the next 2*stlag_ai ms. So the strobe lag restricts
+not only the maximum strobe rate for static sinusoids, but also the
+minimum strobe rate for slowly increasing sinusoids.
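The effect of limiting the lag can be sketched by adding one more condition to the local-max logic.  As before, this is an assumption-based illustration rather than AIM code: peaks are (time in ms, height) pairs, the limit is only checked when a new peak arrives (so a strobe can be issued slightly after the 2*stlag_ai deadline), the 5-ms stlag_ms default is an assumed value, and passing limit_factor=None is a hypothetical switch that disables the limit for comparison.

```python
def strobes_with_lag_limit(peaks, stlag_ms=5.0, limit_factor=2.0, decay=0.05):
    """Local-max strobing with the total delay capped at limit_factor * stlag_ms."""
    strobes = []
    candidate = None      # pending (time_ms, height)
    chain_start = None    # time at which the first candidate of this run appeared
    ref_t, ref_h = 0.0, 0.0
    for t_ms, height in peaks:
        if candidate is not None:
            lag_expired = t_ms - candidate[0] > stlag_ms
            limit_hit = (limit_factor is not None
                         and t_ms - chain_start > limit_factor * stlag_ms)
            if lag_expired or limit_hit:
                strobes.append(candidate)
                candidate = None
        threshold = max(0.0, ref_h * (1.0 - decay * (t_ms - ref_t)))
        if candidate is not None:
            threshold = max(threshold, candidate[1])
        if height > threshold:
            if candidate is None:
                chain_start = t_ms       # a new run of candidates begins
            candidate = (t_ms, height)
            ref_t, ref_h = t_ms, height
    if candidate is not None:
        strobes.append(candidate)
    return strobes


if __name__ == "__main__":
    # Hypothetical slowly rising tone: one peak per 1.25 ms, each a little larger.
    rising = [(1.25 * k, float(k)) for k in range(1, 81)]
    print("no limit:", len(strobes_with_lag_limit(rising, limit_factor=None)))
    print("limited: ", len(strobes_with_lag_limit(rising)))
```

Without the limit the rising tone produces a single strobe at the very end of the input; with it, strobes are issued at regular intervals throughout the rise, so the image is never left empty.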
+6. Aperiodic Strobing and Irregularity in the Auditory Image.
+	To this point, the discussion of strobe criteria has focussed
+on activity in the carrier channel of the NAP and auditory image, and
+the relationship between strobe criteria and the preservation of NAP
+asymmetry through temporal integration.  It was noted in passing,
+that, away from the carrier channel, auditory images of ramped sounds
+have fringes of irregular activity, for all strobe criteria prior to
+the local max criterion.  We might expect such fringes to impart a
+roughness or noisy quality to the perception of ramped sounds, but
+typically they are static and clear.  In this final Section, the
+activity produced by a ramped sinusoid in the 640 Hz channel of the
+NAP and auditory image is examined, to illustrate the relationship
+between strobe restrictions and the fringe of irregularity in the
+auditory image.
+	The NAP produced in the 640 Hz channel of the filterbank by a
+ramped sinusoid with an 800-Hz carrier, a 25-ms envelope period, and a
+4-ms half life is shown in Figure 6.1.  The level of the ramped
+sinusoid rises rapidly, relative to the decay rate of the impulse
+response of the auditory filter and, as a result, the activity in the
+rising part of the NAP is dominated by carrier-period time intervals
+(Patterson, 1994a).  When the amplitude of the ramped sinusoid drops
+abruptly, the energy stored in the filter decays away in a wave with
+periods appropriate to the centre frequency of the channel.  Now
+consider the activity produced by this NAP in the 640-Hz channel of
+the auditory image for strobe criteria 2, 3 and 4, the 'every peak',
+'temporal shadow,' and 'local max' criteria, respectively.
+	Figure 6.2a shows the case where there is no adaptive
+threshold and the mechanism strobes on the peak of every NAP pulse.
+This is the version of STI most similar to autocorrelation.  Strobing
+on every peak causes carrier periods from the ramp to be mixed with
+centre-frequency periods after the offset of the ramp.  This is the
+source of the irregularity in Fig. 6.2a, and the source of the
+irregular fringe in the full auditory image (Fig. 4.4a) (Allerhand and
+Patterson, 1992).
+	The activity produced with the temporal shadow criterion is
+shown in Figure 6.2b.  The adaptive threshold function and the
+strobe points shown with the NAP in Fig. 6.1 were generated with the
+temporal shadow criterion.  In this case, the mechanism initiates
+temporal integration on each peak in the ramped portion of the NAP,
+but it skips the peaks associated with the ringing of the filter after
+the ramp terminates.  Strobing occurs in synchrony with the carrier
+periods in the ramped portion of the NAP and this removes the
+irregularity from the ramped portion of the auditory image between 0
+ms and about 10 ms.  There is still irregularity in the region from 0
+to -10 ms, and in the region from 25 to 15 ms, because strobing in
+synchrony with the carrier period mixes carrier periods and centre
+frequency periods in this region of the image.
+	A further improvement occurs when the local max criterion is
+introduced and strobing on successive carrier periods of the ramped
+section of the NAP is suppressed.  The activity in the 640-Hz channel
+of the image is shown in Figure 6.2c.  The irregular activity has been
+removed; the image shows carrier periods to the left of the 0-ms point
+and centre frequency periods to the right of the 0-ms point.  Thus,
+strobing on local maxima synchronises temporal integration to the
+period of the wave and preserves not only the basic asymmetry of the
+NAP, but also the contrasting time interval patterns associated with
+different sections of the NAP cycle.
+REFERENCES
+Akeroyd, M.A. and Patterson, R.D. (1995). "Discrimination of wideband
+noises modulated by a temporally asymmetric function,"
+J. Acoust. Soc. Am. (in press).
+Assmann, P. F. and Q. Summerfield (1990). "Modelling the perception of
+concurrent vowels: Vowels with different fundamental frequencies,"
+J. Acoust. Soc. Am. 88, 680-697.
+Brown, G.J. and Cooke, M. (1994). "Computational auditory scene
+analysis," Computer Speech and Language 8, 297-336.
+Irino, T. and Patterson, R.D. (1996). "Temporal asymmetry in the
+auditory system," J. Acoust. Soc. Am. (revision submitted
+August 95).
+McKeown, D. and Patterson, R.D. (1995). "The time course of auditory
+segregation: concurrent vowels that vary in duration,"
+J. Acoust. Soc. Am. (in press).
+Meddis, R. and M. J. Hewitt (1991a). "Virtual pitch and phase
+sensitivity of a computer model of the auditory periphery: I
+pitch identification," J. Acoust. Soc. Am.  89, 2866-82.
+Meddis, R. and M. J. Hewitt (1991b). "Virtual pitch and phase
+sensitivity of a computer model of the auditory periphery: II
+phase sensitivity," J. Acoust. Soc. Am. 89, 2883-94.
+Patterson, R.D. (1987b). "A pulse ribbon model of monaural
+phase perception,"  J. Acoust. Soc. Am. 82, 1560-1586.
+Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang,
+C. and Allerhand, M. (1992). "Complex sounds and auditory images,"
+In: Auditory physiology and perception, Y Cazals, L. Demany,
+K. Horner (eds), Pergamon, Oxford, 429-446.
+Patterson, R.D. (1994a). "The sound of a sinusoid: Spectral models,"
+J. Acoust. Soc. Am.  96, 1409-1418.
+Patterson, R.D. (1994b). "The sound of a sinusoid: Time-interval
+models." J. Acoust. Soc. Am. 96, 1419-1428.
+Patterson, R.D. and Akeroyd, M. A. (1995). "Time-interval patterns and
+sound quality," in: Advances in Hearing Research: Proceedings of
+the 10th International Symposium on Hearing, G. Manley, G. Klump,
+C. Koppl, H. Fastl, & H. Oeckinghaus, (Eds). World Scientific,
+Singapore, (in press).
+Patterson, R.D., Allerhand, M., and Giguere, C., (1995). "Time-domain
+modelling of peripheral auditory processing: A modular architecture
+and a software platform," J. Acoust. Soc. Am. 98, (in press).
+Robinson, K.L. & Patterson, R.D. (1995a). "The duration required to
+identify the instrument, the octave, or the pitch-chroma of a
+musical note," Music Perception (in press).
+Robinson, K.L. & Patterson, R.D. (1995b). "The stimulus duration required to
+identify vowels, their octave, and their pitch-chroma,"  J. Acoust. Soc.
+Am. 98, (in press).
+Slaney, M. and Lyon, R.F. (1990). "A perceptual pitch detector," in
+Proc. IEEE Int. Conf. Acoust. Speech Signal Processing,
+Albuquerque, New Mexico.
+===========================================================================
+#!/bin/sh
+# scripts/aimStrobeCriterion
+# Annotated script for generating the figures in docs/aimStrobeCriterion
+echo "FIGURES FOR SECTION 0"
+mv .gennaprc .oldgennaprc # a safety precaution
+mv .gensairc .oldgensairc # a safety precaution
+echo | gennap powc=off -update # make sure that powc is off
+echo | gensai powc=off -update # make sure that powc is off
+echo
+echo "FIGURES FOR SECTION 0"
+echo "Figure 0.1:  Neural Activity Pattern (NAP) of cegc"
+gennap input=cegc_br top=3000 swap=off bits=12 gain_gtf=4 # all default values
+echo "Figure 0.2:  Stabilised Auditory Image (SAI) of cegc"
+gensai stcrit=3 input=cegc_br length=100ms frstep_aid=96ms top=2500
+echo
+echo "FIGURES FOR SECTION 1"
+echo "Figure 1.1  SAI of cegc strobing on every non-zero point in the NAP"
+echo " 			(stcrit_ai=1). This one is slow to calculate."
+gensai stcrit_ai=1 top=17000 input=cegc_br length=100ms frstep_aid=96ms
+# Top has to be raised because this strobe criterion causes constant
+# temporal integration.
+echo "Figure 1.2: SAI via autocorrelation -- a correlogram"
+echo | gennap input=cegc_br display=off length=125ms top=3000 output=stdout > cegc_br_gtf.nap
+#gennap -use start=48 display=on cegc_br_gtf # optional display of the NAP
+# After making a NAP with display=off, gennap -use requires you to set display=on.
+acgram start=50 wid=70ms lag=35ms frames=1 scale=.02 cegc_br_gtf.nap > cegc_gtf.sai
+gensai -use top=5000 input=cegc_gtf
+rm cegc_br_gtf.nap cegc_gtf.sai
+echo
+echo "FIGURES FOR SECTION 2"
+echo "Figure 2.1:  SAI of cegc strobing on the peak of every NAP pulse"
+echo "			(stcrit_ai=2)"
+gensai stcrit_ai=2 top=10000 input=cegc_br length=100ms frstep_aid=96ms
+echo
+echo "FIGURES FOR SECTION 3"
+echo "Demonstration of preservation of asymmetry when stthresh is elevated"
+# Note stthresh only operates when stcrit_ai=1.
+gensai stcrit_ai=1 top=5000 input=cegc_br length=68ms frstep_aid=66ms stthresh_ai=5000
+echo "Figure 3.1:  NAP of cegc with temporal shadow criterion (stcrit_ai=3)"
+echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
+StrobeCriterionDisplay cegc_br 1000 100 3 2.5 17000 2000
+# Type 'StrobeCriterionDisplay -help' for a listing of the options and
+# 	their order.
+# Control of Xplots:
+#	Click mouse button 1 to display coordinates of points.
+#	Click mouse button 2 to redraw.
+#	Click mouse button 3 to remove the display (i.e. quit).
+echo
+echo "FIGURES FOR SECTION 4"
+echo "Figure 4.1a:   Waveform of Damped Sinusoid (4 cycles)"
+genwav top=14000 bottom=-14000 length=100ms input=dr_f8_t4_d swap=on
+echo "Figure 4.2a:   Waveform of Ramped Sinusoid (4 cycles)"
+genwav top=14000 bottom=-14000 length=100ms input=dr_f8_t4_r swap=on
+echo "Figure 4.1b:   NAP of the Damped Sinusoid (2 cycles)"
+gennap input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=110ms output=stdout display=off > damped.nap
+gennap -use start=50 leng=50 display=on damped
+echo "Figure 4.2b:   NAP of the Ramped Sinusoid (2 cycles)"
+gennap input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=110ms output=stdout display=off > ramped.nap
+gennap -use start=60 leng=50 display=on ramped
+rm damped.nap ramped.nap
+echo "Figure 4.3a:   SAI of the Damped Sinusoid strobing on every NAP peak"
+echo "			(stcrit_ai=2)"
+gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=7000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=2  pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
+echo "Figure 4.4a:   SAI of the Ramped Sinusoid strobing on every NAP peak"
+echo "			(stcrit_ai=2)"
+gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=7000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=2  pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
+echo "Figure 4.3b:   SAI of the Damped Sinusoid with temporal shadow criterion"
+echo "			(stcrit_ai=3)"
+gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=1000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=3  pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
+echo "Figure 4.4b:   SAI of the Ramped Sinusoid with temporal shadow criterion"
+echo "			(stcrit_ai=3)"
+gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=3  pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
+echo "Figure 4.3c:   SAI of the Damped Sinusoid with the local max criterion"
+echo "			(stcrit_ai=4)"
+gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=800 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
+echo "Figure 4.4c:   SAI of the Ramped Sinusoid with the local max criterion"
+echo "			(stcrit_ai=4)"
+gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=800 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
+echo | gennap swap=on bits=16 gain_gtf=0.0625 -update
+echo | gensai swap=on bits=16 gain_gtf=0.0625 -update
+echo "Figure 4.5a:  NAP of Damped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
+echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
+StrobeCriterionDisplay dr_f8_t4_d 800 120 3 2.5 14000 2400
+echo "Figure 4.5b:  NAP of Damped Sinusoid, local max criterion (stcrit_ai=4)"
+echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
+StrobeCriterionDisplay dr_f8_t4_d 800 120 4 2.5 14000 2400
+echo "Figure 4.6a:  NAP of Ramped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
+echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
+StrobeCriterionDisplay dr_f8_t4_r 800 120 3 2.5 7500 2400
+echo "Figure 4.6b:  NAP of Ramped Sinusoid, local max criterion (stcrit_ai=4)"
+echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
+StrobeCriterionDisplay dr_f8_t4_r 800 120 4 2.5 7500 2400
+echo
+echo "FIGURES FOR SECTION 5"
+echo
+echo "FIGURES FOR SECTION 6"
+echo "Figure 6.1:  NAP of Ramped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
+echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
+StrobeCriterionDisplay dr_f8_t4_r 640 120 3 2.5 7000 2000
+echo "Figure 6.2a:  SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=2)"
+gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=32000 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
+echo "Figure 6.2b:  SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=3)"
+gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=10000 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=3 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
+echo "Figure 6.2c:  SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=4)"
+gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=1200 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
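
The script above moves .gennaprc and .gensairc aside as a safety precaution but does not move them back once the figures are done. A minimal sketch of one way to restore them follows. It assumes the option files sit in the directory from which the script is run; the helper names save_rc and restore_rc are hypothetical and are not part of the AIM distribution.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AIM): set the option files aside
# before generating figures, and put the originals back afterwards.

# save_rc DIR: rename DIR/.gennaprc and DIR/.gensairc to .old* copies,
# skipping any file that does not exist.
save_rc() {
    for rc in .gennaprc .gensairc; do
        if [ -f "$1/$rc" ]; then
            # ${rc#.} strips the leading dot: .gennaprc -> .oldgennaprc
            mv "$1/$rc" "$1/.old${rc#.}"
        fi
    done
}

# restore_rc DIR: move the .old* copies back over whatever option
# files were written in the meantime.
restore_rc() {
    for rc in .gennaprc .gensairc; do
        if [ -f "$1/.old${rc#.}" ]; then
            mv "$1/.old${rc#.}" "$1/$rc"
        fi
    done
}
```

In the script this could be used by calling `save_rc .` in place of the two mv commands, followed by `trap 'restore_rc .' EXIT`, so that the original option files return even if one of the figure commands fails.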
