Initial checkin for AIM92 aimR8.2 (last updated May 1997).
author tomwalters
date Fri, 20 May 2011 15:19:45 +0100
docs/aimStrobeCriterion (text)
scripts/aimStrobeCriterion (figures)


STROBED TEMPORAL INTEGRATION AND THE STABILISED AUDITORY IMAGE

Roy D. Patterson, Jay Datta and Mike Allerhand
MRC Applied Psychology Unit
15 Chaucer Road, Cambridge, CB2 2EF UK

email:  roy.patterson, jay.datta or mike.allerhand  @mrc-apu.cam.ac.uk

2 August 1995


ABSTRACT

	This document describes the Strobed Temporal Integration
mechanism used to convert neural activity patterns into stabilised
auditory images. The specific version of the Auditory Image Model is
AIM R7, as described in Patterson, Allerhand, and Giguere (1995).



INTRODUCTION

	When a periodic sound occurs with a pitch in the musical
range, the cochlea produces a detailed, multi-channel, time-interval
pattern that repeats once per cycle of the wave.  The auditory images
that we hear in response to periodic sounds are perfectly stable.
That is, although the level of activity in the neural activity
pattern (NAP) fluctuates over a large range within the course
of each cycle, the loudness of the sound is fixed.  This indicates
that some form of temporal integration is applied to the NAP prior to
our initial perception of the sound.  The auditory images of periodic
sounds can have a very rich timbre, or sound quality, that can reveal
a great deal about the sound source such as the quality of the musical
instrument or the finesse of the musician.  This suggests that much of
the detailed time-interval information produced by the cochlea is
preserved in the stabilised auditory image.

	The fact that we hear stable auditory images with rich sound
quality presents auditory theorists with a problem.  The temporal
integration mechanism in traditional auditory models is a low-pass
filter that removes the fine-grain time-interval information from the
internal representation of the sound -- time-interval information that
appears to be required for timbre perception.  Strobed temporal
integration was introduced to solve this problem. At one and the same
time, it performs the temporal integration necessary to produce stable
auditory images and it preserves the majority of the time-interval
information observed in the neural activity pattern (NAP) produced by
the cochlea.

	It is not a difficult problem to produce a high-resolution,
stabilised version of the NAP provided you know the moment in time at
which the pattern in the NAP will repeat.  For example, consider the
NAP of the first note of the wave CEGC in Figure 0.1 from Patterson et
al. (1992).  The wave is a train of clicks separated by 8-ms gaps; the
upper channels of the NAP show that the response is a sequence of
filter impulse responses spaced at 8 ms intervals.  A stabilised
representation of the NAP can be produced by setting up an image
buffer that has the same number of channels as the NAP, and simply
transferring a copy of the pattern in each channel of the NAP to the
corresponding channel of the image buffer once every 8 ms.  In the
NAP, the pattern flows from right to left as time progresses, and
since the cycles are continually entering the NAP from the right hand
side and exiting the NAP from the left hand side, the pattern after
every 8 ms is identical to the pattern 8 ms ago.  So if the transfer
from the NAP to the auditory image is performed every 8 ms exactly,
successive contributions from the NAP to the image are all identical.

	In the image buffer, activity does not move from right to
left, it simply decays into the floor exponentially over time with a
half life of about 30 ms.  When a new contribution arrives from the
NAP, it is added point for point with whatever is currently in the
corresponding channel of the image buffer.  In the current example,
after a copy of the NAP arrives in the auditory image, and during the
30 ms over which it would decay to half its original value, three more
copies of the NAP pattern arrive and are added into the auditory
image.  Thus, for typical musical notes and typical vowels, the rate
of temporal integration from the NAP into the auditory image is high
and there is little time between successive integration events for the
image itself to decay.  This is the source of the stability of the
auditory image.
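
	The arithmetic of this example can be checked with a small
sketch.  It is only an illustration, assuming 1-ms time steps and a
made-up three-point pattern in place of a real NAP channel; the
function name and its arguments are hypothetical and are not part of
AIM.

```python
def integrate_fixed_period(pattern, period_ms, half_life_ms, duration_ms):
    """Add `pattern` into a decaying image buffer once every `period_ms`.

    Time advances in 1-ms steps.  Returns the largest value in the buffer
    after each step, as a simple measure of the overall image level.
    """
    decay_per_ms = 0.5 ** (1.0 / half_life_ms)  # exponential decay factor
    image = [0.0] * len(pattern)
    level_history = []
    for t in range(duration_ms):
        # The image decays towards the floor between transfers.
        image = [value * decay_per_ms for value in image]
        if t % period_ms == 0:
            # A new copy of the NAP pattern is added point for point.
            image = [value + new for value, new in zip(image, pattern)]
        level_history.append(max(image))
    return level_history


# Hypothetical three-point pattern standing in for one channel of the NAP.
levels = integrate_fixed_period([1.0, 0.5, 0.25], period_ms=8,
                                half_life_ms=30, duration_ms=400)
last_cycle = levels[-8:]                      # one full 8-ms period
ripple = min(last_cycle) / max(last_cycle)    # lowest level relative to peak
copies_in_half_life = sum(1 for t in range(1, 31) if t % 8 == 0)
print(f"copies arriving within one half life: {copies_in_half_life}")
print(f"steady-state peak level: {max(last_cycle):.2f}")
print(f"lowest level within a cycle, relative to the peak: {ripple:.2f}")
```

	Under these assumptions the image level settles to a value
several times the height of a single copy, and the level between
transfers stays close to the peak level, which is the sense in which
the image is stable.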

	Provided the integration is performed once per cycle of the
sound, the majority of the time-interval information in the NAP will
be preserved in the auditory image, thereby providing a solution to
the problem of how to produce stable images without removing the
fine-grain time-interval information associated with sound quality.
The auditory image produced by this process is shown in Figure 0.2
from Patterson et al. (1992).  The transfer is performed on each
channel of the NAP separately and it is performed at the point in the
cycle where the activity in the NAP is a maximum.  The maximum of the
most recent cycle to arrive in the NAP is added into the auditory
image at the 0-ms point, and as a result, the NAP peaks are aligned
vertically in the auditory image.  This passive alignment process
explains the loss of global phase information observed empirically
(see Patterson, 1987b, for a review).

	Thus it would appear that the problem of converting the
oscillating NAP into a stabilised, high-resolution image reduces to
the problem of finding the pitch of the sound and performing temporal
integration at multiples of the pitch period.  There are now a number
of computational auditory models with a proven ability to extract the
pitch of complex sounds (see Brown and Cooke, 1994, for a review) and
they could be used to direct strobed temporal integration.  However,
experiments with vowels (McKeown and Patterson, 1995; Robinson and
Patterson, 1995b) and musical notes (Robinson and Patterson, 1995a)
indicate that 4 to 8 cycles of the sound are required to produce an
accurate estimate of the pitch, whereas the sound quality information
necessary to identify a vowel or a musical instrument can be extracted
from one cycle of the wave.  This suggests that, if the auditory
system does use strobed temporal integration to produce a stable,
high-resolution auditory image, it does so with a mechanism that operates
more locally in time than pitch extraction mechanisms.  This is the
background that led to the development of the strobed temporal
integration mechanism in the auditory image model.

	In Sections 1 and 2 of this document, following Allerhand and
Patterson (1992), we describe two simple criteria for selecting strobe
points in the NAP and show that they produce auditory images that are
very similar to the correlograms produced by Assmann and Summerfield
(1990), Slaney and Lyon (1990), or Meddis and Hewitt (1991a, 1991b).
The structures that arise in this form of auditory image are much more
symmetric than the corresponding structures in the NAP (Allerhand and
Patterson, 1992).  There is mounting evidence, however, that the
auditory system is highly sensitive to temporal asymmetry (Patterson,
1994a, 1994b; Akeroyd and Patterson, 1995; Irino and Patterson, 1996),
and so the loss of asymmetry associated with the simple strobe
criterion seems likely to limit the value of this representation of
our perceptions.  In the remaining Sections, an ordered sequence of
restrictions is added to the simple criteria for initiating temporal
integration, to restore asymmetry to the structures that arise in the
auditory image.


1.  Strobe on Every Non-Zero Point in the NAP.  

	The initial criterion is very simple; temporal integration is
initiated on each and every non-zero point in the NAP.  In AIM
software, the option that determines which strobe criterion will be
used is 'stcrit_ai' and it is set equal to one for this simplest
strobe criterion. Allerhand and Patterson (1992) showed that when
temporal integration from the NAP to the auditory image is initiated
on each and every non-zero point in the NAP function, the result is
very similar to a correlogram -- a representation that is commonly
used in time-domain models of hearing to extract the pitch of complex
periodic sounds (see Brown and Cooke, 1994, for a review).  For
example, compare the auditory image with stcrit_ai=1 (Figure 1.1) and
the correlogram (Figure 1.2) of the first note of the sound cegc.
Both figures show stabilised representations of the time-interval
pattern that the sound produces in the NAP, and in both cases, the
individual channels have been aligned vertically on the largest peak
in the NAP function.  The patterns in the auditory image and the
correlogram both differ from the pattern in the NAP in one important
way; there is a reflection of the NAP pulses associated with the
ringing of the auditory filters, on the side opposite to where they
originally appear.  That is, autocorrelation and STI with stcrit_ai=1
reduce the temporal asymmetry observed in the NAP.  The asymmetry
information is not entirely removed but it is largely removed.
Experiments with sounds that have asymmetric temporal modulation show
that listeners are sensitive to temporal asymmetry (Patterson, 1994a,
1994b; Akeroyd and Patterson, 1995; Irino and Patterson, 1996), and so
the removal of asymmetry information seems likely to prove a
disadvantage when attempting to explain auditory perception.
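
	The similarity between this criterion and autocorrelation can
be illustrated with a toy example.  The sketch below is not the AIM
implementation: it ignores the decay of the image, it assumes that
each strobe adds an unweighted copy of the preceding NAP history, and
the function names and the click-train values are hypothetical.

```python
def sti_every_nonzero(nap, max_lag):
    """Strobe on every non-zero NAP point; add the NAP history to the image.

    image[lag] accumulates the NAP value found `lag` samples before
    each strobe point.  Image decay is ignored for simplicity.
    """
    image = [0.0] * (max_lag + 1)
    for t, value in enumerate(nap):
        if value > 0:  # every non-zero NAP point is a strobe
            for lag in range(min(t, max_lag) + 1):
                image[lag] += nap[t - lag]
    return image


def autocorrelation(nap, max_lag):
    """Plain (unnormalised) autocorrelation for lags 0..max_lag."""
    return [
        float(sum(nap[t] * nap[t - lag] for t in range(lag, len(nap))))
        for lag in range(max_lag + 1)
    ]


# Toy click-train NAP: an asymmetric pulse (sharp onset, decaying ringing)
# that repeats every 8 samples.
click_nap = [4, 2, 1, 0, 0, 0, 0, 0] * 10
sti_image = sti_every_nonzero(click_nap, max_lag=12)
correlogram = autocorrelation(click_nap, max_lag=12)
print("STI, lags 7, 8, 9:          ", sti_image[7:10])
print("autocorrelation, lags 7-9:  ", correlogram[7:10])
```

	In the toy NAP there is no activity immediately before each
click, yet both functions show activity on either side of the peak at
the 8-sample period.  The autocorrelation is exactly symmetric about
the peak; the strobed image in this sketch is only partly so.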

	The autocorrelation process is symmetric in time by its very
nature. Mechanical processes that produce sound in the world are
typically asymmetric in time because they usually have some inertia.
Resonators struck impulsively ring after the pulse and not before.
This principle also applies to the processes that analyse the sound in
the auditory system.  The impulse response of the auditory filter
rises faster than it falls; the adaptation process in the inner
hair cell adapts up faster at the onset of a sound than it adapts down
after the sound passes.  So asymmetry is the norm in the world and it
is not surprising that the auditory system is sensitive to it.


2.  Strobe on the Peak of Each NAP pulse.

	When temporal integration is initiated on every non-zero NAP
point, the successive NAP functions that are transferred to the
auditory image are highly correlated.  This suggests that we could
attain essentially the same auditory image for vastly less computation
by restricting temporal integration to the larger points on the
individual NAP pulses.  This leads, in turn, to the suggestion that
temporal integration be limited to the peak of the individual NAP
pulses.  The result of this restriction is illustrated in Figure 2.1
which shows the auditory image of the first note of CEGC with this
more restricted strobe criterion.  Since the peak restriction greatly
reduces the rate of temporal integration, the absolute levels of
structures in this form of auditory image are considerably lower than
those in the previous form of image.  The pattern of time intervals,
however, is very similar in the two forms of auditory image.  They
both preserve a detailed representation of the time-interval pattern
in the NAP, and they both lose much of the asymmetry in the NAP.
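
	The reduction in the number of strobe points can be seen in a
short sketch.  It assumes a sampled NAP channel held in a list, and a
simple definition of a peak as a sample larger than its left-hand
neighbour and no smaller than its right-hand neighbour; the helper
name is hypothetical and the peak definition used in AIM may differ.

```python
def nap_peaks(nap):
    """Return the indices of the local maxima of a sampled NAP channel."""
    peaks = []
    for t in range(1, len(nap) - 1):
        # A peak rises from the previous sample and does not rise further.
        if nap[t] > 0 and nap[t - 1] < nap[t] >= nap[t + 1]:
            peaks.append(t)
    return peaks


# Toy NAP channel containing two pulses.
toy_nap = [0, 1, 3, 1, 0, 0, 2, 1, 0]
nonzero_points = [t for t, value in enumerate(toy_nap) if value > 0]
peak_points = nap_peaks(toy_nap)
print("strobes on every non-zero point:", len(nonzero_points))
print("strobes on pulse peaks only:    ", len(peak_points))
```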


3.  Avoid Strobing in the Temporal Shadow after a large NAP Pulse.

	The loss of asymmetry in the click-train structure of the
auditory image arises when temporal integration is initiated on the
smaller NAP pulses associated with the ringing of the auditory filters
after each click in the train.  This can be demonstrated by
introducing a fixed strobe threshold below which NAP peaks do not
initiate temporal integration, and progressively raising this strobe
threshold to exclude more and more of the lower level NAP pulses.  (In
AIM, a fixed threshold is set with option stthresh_ai and
stcrit_ai=1.)  The auditory image becomes less and less symmetric and
more and more like the original NAP pattern for the click train as the
strobe threshold is increased.  Fixed thresholds of this sort are not
realistic for simulating the operation of the auditory system, firstly
because the strobe threshold eventually exceeds the largest NAP pulse
and temporal integration ceases entirely, and secondly because, in the
natural environment, the levels of sounds are constantly changing.
Nevertheless, the example illustrates how NAP asymmetry is lost with
simple strobe criteria.  The problem with autocorrelation is similar;
the correlation values at lags associated with the smaller NAP pulses
introduce symmetric reflections into the structures that appear in the
correlogram.

	An alternative means of restricting temporal integration to
the larger pulses in the NAP of the click train is to use an adaptive
strobe threshold which is temporally asymmetric.  In the simplest
case, when the strobe unit monitoring a NAP channel encounters a
pulse, strobe threshold is set to the full height of the NAP pulse.
Following the peak, however, threshold does not fall as fast as the
NAP function; rather, it is restricted to decaying at a fixed
percentage of the peak height per ms.  In AIM, the rate of decay is set
to 5% of the peak height per ms, so the threshold falls faster in
absolute terms after larger peaks and, in the absence of further NAP
peaks, returns to 0 in 20 ms.  The NAP function
for the 1.0-kHz channel of the NAP is presented in Figure 3.1 along
with the adaptive threshold function. Together they illustrate what is
referred to as the "temporal shadow criterion" for strobed temporal
integration.
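
	The adaptive threshold described above can be sketched as
follows.  This is an illustration rather than the AIM code: it
assumes a sampled NAP channel in a list, a linear decay of 5% of the
peak height per ms, and a simple local-maximum test for peaks.  The
function name, its arguments and the toy click-train values are
hypothetical.

```python
def temporal_shadow_strobes(nap, samples_per_ms, decay_percent_per_ms=5.0):
    """Return the sample indices of the peaks that initiate integration.

    A peak strobes only if it exceeds the adaptive threshold.  After a
    strobe the threshold is set to the peak height and then falls
    linearly by `decay_percent_per_ms` of that height every ms.
    """
    threshold = 0.0
    decay_step = 0.0  # threshold drop per sample
    strobes = []
    for t in range(1, len(nap) - 1):
        threshold = max(0.0, threshold - decay_step)
        is_peak = nap[t] > 0 and nap[t - 1] < nap[t] >= nap[t + 1]
        if is_peak and nap[t] > threshold:
            strobes.append(t)
            threshold = float(nap[t])
            decay_step = nap[t] * decay_percent_per_ms / 100.0 / samples_per_ms
    return strobes


# Toy click-train channel at 2 samples per ms: a large pulse followed by
# progressively smaller ringing pulses, repeating every 8 ms (16 samples).
one_period = [0, 100, 0, 50, 0, 25, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0]
click_channel = one_period * 5
shadow_strobes = temporal_shadow_strobes(click_channel, samples_per_ms=2)
print("strobe points (samples):", shadow_strobes)
```

	With these values the ringing pulses fall in the shadow of the
large pulse and only the first pulse of each period strobes, so the
strobe points are spaced at the period of the click train although
the period is never given to the function.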

	In the figure, the vertical lines below the abscissa of the
NAP function mark the NAP pulses that initiate temporal integration.
They show that the first NAP pulse strobes temporal integration and
strobe threshold is set to the peak height.  It immediately begins to
decay, but then it encounters another NAP pulse that exceeds strobe
threshold and so the process of strobing temporal integration and
raising strobe threshold is promptly repeated.  At this point,
however, strobe threshold is high relative to the NAP pulses and
strobe threshold is falling more slowly than the NAP pulses, so the
algorithm proceeds through the rest of the cycle without encountering
another NAP pulse from the ringing part of the NAP function.  In this
way, the strobe mechanism is synchronised to the period of the sound
even though no explicit information about the pitch of the sound is
provided to the strobe mechanism.  It is the auditory image with the
temporal shadow criterion that was presented originally in Figure
0.2 (stcrit_ai=3).

	The 'temporal shadow criterion' produces stable auditory
images with accurate asymmetry for a wide variety of naturally
occurring sounds like vowels and musical notes.  The reason is that
the NAPs of these sounds have a restricted range of periods and within
those periods the asymmetry is typically characterised by the
rapid-rise/slow-fall form.  There are, however, periodic sounds with
very low pitch and NAP functions that rise slowly over the course of
the period and fall rapidly at the end of the period, and the
perceptions produced by these sounds indicate that the auditory strobe
mechanism is somewhat more sophisticated than the temporal shadow
strobe mechanism.  These "ramped" sounds are the subject of the next
section.


4.  Avoid Temporal Integration on NAP Peaks Followed by Larger NAP Peaks.

	A pair of sounds that illustrates the limitations of the
temporal shadow criterion is presented in Figures 4.1a and 4.2a; the
former is an exponentially damped sinusoid that repeats every 25 ms,
the latter is an exponentially ramped sinusoid with the same envelope
period.  The carrier frequency in this case is 800 Hz and the half
life of the exponential is 4 ms.  The half life is on the same order
as the exponential decay of the impulse response of a gammatone
auditory filter with a centre frequency in the region of 800 Hz.  The
example is taken from Patterson (1994a).

	The neural activity patterns produced by the damped and ramped
sinusoids are shown in Figures 4.1b and 4.2b, respectively.  The
frequency range of the filterbank is from an octave below the carrier
frequency to an octave above the carrier frequency.  The highest and
lowest channels in Figure 4.1b show the transient response of the
filterbank to the onset of the damped sinusoid, and similarly the
high- and low-frequency channels in Figure 4.2b show the transient
response of the filterbank to the offset of the ramped sinusoid.  In
the high-frequency channels, the onset response of the damped sinusoid
and the offset response of the ramped sinusoid are composed of impulse
responses from the individual auditory filters.  The centre section of
each figure shows the response to the carrier.  Here we see that the
asymmetry in the waveform is preserved in the NAP: in Figure 4.1b, the
carrier component is at its highest level just as the transient
response ends and the carrier component decays away over the course of
the period; in Figure 4.2b, the carrier activity rises over the course
of the ramped cycle and ends at its peak level in the transient
response.

	Auditory images of these damped and ramped sinusoids are
presented in Figures 4.3 and 4.4, respectively.  The upper rows show
the images obtained when the strobe initiates temporal integration on
every peak in the NAP; the middle rows show the images obtained with
the temporal shadow criterion.  The images in the upper row illustrate
the problem of preserving NAP asymmetry during temporal integration.
When the mechanism strobes on every peak, the temporal asymmetry
observed in the NAP of the damped sinusoid is actually reversed in the
auditory image of the damped sinusoid (Figure 4.3a).  In the case of
the ramped sinusoid, the asymmetry observed in the NAP is largely lost
in the image of the ramped sinusoid (Figure 4.4a); there is activity
at all time intervals in the central channels, whereas there is a gap
in activity in the NAP of the ramped sinusoid, once per cycle, just
after the abrupt reduction in amplitude.  It is also the case that
there are irregular fringes along the edges of the main structure in
the auditory image of the ramped sinusoid (Figure 4.4a).  This
provides further evidence that the time interval pattern in the NAP is
being disrupted by the temporal integration process in the
construction of the auditory image.

	The introduction of the temporal shadow criterion for
initiating temporal integration produces a dramatic improvement in the
auditory image of the damped sinusoid (Figure 4.3b).  The structure in
the image is highly asymmetric and, once the alignment process is
taken into account, the structure in the image is seen to be a very
faithful reproduction of that in the NAP.  The imposition of the
temporal shadow criterion improves the auditory image of the ramped
sound (Figure 4.4b) inasmuch as it eliminates the fringes seen in
Figure 4.4a.  But it does not solve the asymmetry problem.  The
structure in the auditory image of Figure 4.4b is still more symmetric
than it is asymmetric, whereas the structure in the corresponding NAP
is highly asymmetric.

	The source of the problem is illustrated in Figures 4.5a and
4.6a which show the NAPs and adaptive thresholds for 80-ms segments of
the damped and ramped sinusoids, respectively.  The vertical markers
below the abscissa in Figure 4.5a show that after the first cycle, the
strobe mechanism is synchronised to the period of the wave and
initiates temporal integration once per cycle on the largest NAP peak.
So this criterion preserves the asymmetry of the damped sound in its
auditory image.  In contrast, Figure 4.6a shows that on the way up the
ramped portion of each cycle, the rising NAP pulses repeatedly exceed
the adaptive threshold resulting in repeated initiation of temporal
integration.  Since, in this region of the cycle, the mechanism
initiates temporal integration on every carrier period, the auditory
image does not preserve the asymmetry observed in the corresponding
NAP.  The
irregular fringe is reduced because the mechanism reliably skips the
portion of the cycle where the level of activity in the NAP is
changing most rapidly.

	The high rate of strobing revealed in Figure 4.6a means that
the level of activity in the ramped auditory image of Figure 4.4b is
considerably greater than that in the damped image (Figure 4.3b).  It
does not show in those Figures because they have been normalised for
display purposes.  In terms of the auditory model, however, the
greater overall level in the image of the ramped sound would lead to
the prediction that ramped sounds are considerably louder than damped
sounds, and this is not the case; they have roughly equal loudness.
All of these observations taken together suggest that the strobe rate
should be limited and that the limitation should favour larger NAP
peaks, closer to the local maximum.

	The solution in this case is to delay temporal integration a
few milliseconds after each suprathreshold NAP pulse, to determine
whether another, larger, NAP pulse is about to occur.  Specifically,
when a NAP peak is identified, it is labeled as a potential strobe
point, but the initiation of temporal integration is delayed for
several milliseconds.  In AIM, the value is set with option
'stlag_ai'.  If, during this time, no new larger NAP pulses are
encountered, the candidate strobe point is used to initiate temporal
integration.  If a larger NAP pulse is encountered, it becomes the new
strobe candidate and replaces the previous strobe candidate, the
strobe lag is reset to stlag_ai ms and the process begins again.  The
auditory images of damped and ramped sinusoids produced with this
'local-max' strobe criterion are shown in Figures 4.3c and 4.4c,
respectively.  The strobe lag restriction has virtually no effect on
the auditory image of the damped sinusoid, but it improves the image
of the ramped sinusoid markedly.  The asymmetry observed in the NAP of
the ramped sinusoid is now preserved in its auditory image.
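
	The candidate-and-delay logic can be sketched as follows.
This is an illustration only: it assumes the adaptive threshold of
the previous section, a strobe lag given in ms, and that a
suprathreshold peak which is not larger than the current candidate is
simply ignored.  The function name, the 2-ms lag and the toy damped
and ramped channels are hypothetical, not AIM defaults.

```python
def local_max_strobes(nap, samples_per_ms, stlag_ms, decay_percent_per_ms=5.0):
    """Strobe on a candidate peak only if no larger peak follows in time.

    A suprathreshold peak becomes the strobe candidate.  If a larger peak
    arrives within `stlag_ms`, it replaces the candidate and the lag is
    restarted; otherwise the candidate initiates temporal integration.
    A candidate still pending at the end of the signal is dropped.
    """
    lag = int(round(stlag_ms * samples_per_ms))
    threshold = decay_step = 0.0
    candidate = deadline = None
    strobes = []
    for t in range(1, len(nap) - 1):
        threshold = max(0.0, threshold - decay_step)
        if candidate is not None and t >= deadline:
            strobes.append(candidate)  # the lag expired with no larger peak
            candidate = None
        is_peak = nap[t] > 0 and nap[t - 1] < nap[t] >= nap[t + 1]
        if is_peak and nap[t] > threshold:
            if candidate is None or nap[t] > nap[candidate]:
                candidate = t
                deadline = t + lag
                threshold = float(nap[t])
                decay_step = nap[t] * decay_percent_per_ms / 100.0 / samples_per_ms
    return strobes


# Toy channels at 2 samples per ms with a 25-ms (50-sample) period.
damped_period = [0, 100, 0, 50, 0, 25, 0, 12, 0, 6] + [0] * 40
ramped_period = [0] * 40 + [0, 6, 0, 12, 0, 25, 0, 50, 0, 100]
damped_strobes = local_max_strobes(damped_period * 3 + [0] * 10, 2, stlag_ms=2)
ramped_strobes = local_max_strobes(ramped_period * 3 + [0] * 10, 2, stlag_ms=2)
print("damped strobe points:", damped_strobes)
print("ramped strobe points:", ramped_strobes)
```

	In this toy example both sounds produce one strobe per period,
on the largest peak.  With the adaptive threshold alone, each rising
peak of the ramped channel would exceed the threshold and strobe.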

	The NAP functions and the adaptive thresholds for the damped
and ramped sinusoids are shown in Figures 4.5b and 4.6b, respectively.
A comparison of the strobe points for the damped sinusoid under the
temporal shadow criterion (Figure 4.5a) and the local max criterion
(Figure 4.5b) shows that there is one small difference; the very first
strobe point under the temporal shadow criterion is omitted under the
local max criterion because a larger NAP pulse follows it within
stlag_ai ms.  So the second NAP pulse replaces the first as the strobe
candidate.  In the case of the ramped sinusoid, shifting to the local
max criterion has a dramatic effect.  The NAP functions and adaptive
thresholds in Figures 4.6a and 4.6b are identical, but most of the
strobe points identified under the temporal shadow criterion (Figure
4.6a) are immediately followed by larger NAP pulses as we proceed up
the ramp.  As a result the majority of the candidate pulses are
repressed in favour of the one that occurs at the offset of the ramp.
So, with the exception of the onset of the sound, the mechanism
synchronises to the period of the sound and there is one strobe per
cycle of the sound.  The local max criterion also leads to damped and
ramped auditory images with roughly the same level of activity in the
auditory image, and so it is also a better predictor of the loudness
of these sounds. Finally, note that the strobe lag restricts the
maximum strobe rate of the mechanism. This is important because,
without it, the level of a sinusoid would increase with its frequency
in the auditory image.


5.  Limiting the Lag of the Local Max Criterion.

	In the second experiment with damped and ramped sinusoids
(Patterson, 1994b), the longest envelope period was 100 ms, and in
that condition, the distinction between damped and ramped sinusoids is
audible for half lives as long as 64 ms.  In channels near the carrier
frequency, the NAP function produced by the ramped sinusoid is a long,
slowly rising, sequence of peaks.  The local-max strobe criterion
delays temporal integration to the end of the ramp and initiates
temporal integration once per cycle, as previously, with the 25-ms
envelope stimuli.  The example, however, raises the question of what
would happen in the case of a very long duration slowly rising tone,
say a tone that rises from absolute threshold to 80 dB SPL over the
course of 5 seconds.  A listener would undoubtedly hear the sound
shortly after it comes on, and hear its loudness increase
progressively over the course of the 5-second rise.  The local-max
strobe mechanism would initiate temporal integration once, shortly
after the onset of the sound, because of overshoot in the neural
encoding stage of AIM. But thereafter, it would suppress temporal
integration throughout the rise of the NAP function and strobe once at
the end of the rise.  Thus the auditory image would be empty at a time
when we know the listener would hear the tone.  To solve this problem,
the strobe lag of the local max mechanism is limited to twice the
stlag_ai value; that is, after a NAP pulse becomes a strobe candidate,
either that NAP pulse or a larger one must initiate temporal
integration within the next 2*stlag_ai ms. So the strobe lag restricts
not only the maximum strobe rate for static sinusoids, but also the
minimum strobe rate for slowly increasing sinusoids.
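
	The effect of limiting the lag can be sketched by adding a
second deadline to the candidate logic.  This is an illustration
under the same assumptions as before (adaptive threshold, simple peak
test); the function name, the limit_lag switch and the toy rising
tone are hypothetical and are not AIM options.

```python
def capped_local_max_strobes(nap, samples_per_ms, stlag_ms,
                             decay_percent_per_ms=5.0, limit_lag=True):
    """Local-max strobing with the total delay limited to 2 * stlag_ms.

    The soft deadline restarts whenever a larger peak replaces the
    candidate.  The hard deadline is fixed when the first candidate of a
    sequence appears and forces a strobe 2 * stlag_ms later.
    """
    lag = int(round(stlag_ms * samples_per_ms))
    threshold = decay_step = 0.0
    candidate = soft_deadline = hard_deadline = None
    strobes = []
    for t in range(1, len(nap) - 1):
        threshold = max(0.0, threshold - decay_step)
        if candidate is not None:
            expired = t >= soft_deadline
            if limit_lag and t >= hard_deadline:
                expired = True
            if expired:
                strobes.append(candidate)
                candidate = None
        is_peak = nap[t] > 0 and nap[t - 1] < nap[t] >= nap[t + 1]
        if is_peak and nap[t] > threshold:
            if candidate is None:
                hard_deadline = t + 2 * lag  # fixed for this sequence
            elif nap[t] <= nap[candidate]:
                continue  # not larger than the current candidate
            candidate = t
            soft_deadline = t + lag
            threshold = float(nap[t])
            decay_step = nap[t] * decay_percent_per_ms / 100.0 / samples_per_ms
    return strobes


# Toy slowly rising tone at 2 samples per ms: every peak exceeds the last.
rising_tone = [t if t % 2 else 0 for t in range(60)]
limited = capped_local_max_strobes(rising_tone, 2, stlag_ms=2)
unlimited = capped_local_max_strobes(rising_tone, 2, stlag_ms=2, limit_lag=False)
print("strobes with the lag limited:", limited)
print("strobes without the limit:   ", unlimited)
```

	Without the limit, each new peak replaces the candidate before
the lag expires and no strobe is issued during the rise.  With the
limit, a strobe is forced every 2*stlag_ms, so the image is refreshed
throughout the rise.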


6. Aperiodic Strobing and Irregularity in the Auditory Image.

	To this point, the discussion of strobe criteria has focussed
on activity in the carrier channel of the NAP and auditory image, and
the relationship between strobe criteria and the preservation of NAP
asymmetry through temporal integration.  It was noted in passing,
that, away from the carrier channel, auditory images of ramped sounds
have fringes of irregular activity, for all strobe criteria prior to
the local max criterion.  We might expect such fringes to impart a
roughness or noisy quality to the perception of ramped sounds, but
typically they are static and clear.  In this final Section, the
activity produced by a ramped sinusoid in the 640 Hz channel of the
NAP and auditory image is examined, to illustrate the relationship
between strobe restrictions and the fringe of irregularity in the
auditory image.

	The NAP produced in the 640 Hz channel of the filterbank by a
ramped sinusoid with an 800-Hz carrier, a 25-ms envelope period, and a
4-ms half life is shown in Figure 6.1.  The level of the ramped
sinusoid rises rapidly, relative to the decay rate of the impulse
response of the auditory filter and, as a result, the activity in the
rising part of the NAP is dominated by carrier-period time intervals
(Patterson, 1994a).  When the amplitude of the ramped sinusoid drops
abruptly, the energy stored in the filter decays away in a wave with
periods appropriate to the centre frequency of the channel.  Now
consider the activity produced by this NAP in the 640-Hz channel of
the auditory image for strobe criteria 2, 3 and 4, the 'every peak',
'temporal shadow', and 'local max' criteria, respectively.

	Figure 6.2a shows the case where there is no adaptive
threshold and the mechanism strobes on the peak of every NAP pulse.
This is the version of STI most similar to autocorrelation.  Strobing
on every peak causes carrier periods from the ramp to be mixed with
centre-frequency periods after the offset of the ramp.  This is the
source of the irregularity in Fig. 6.2a, and the source of the
irregular fringe in the full auditory image (Fig. 4.4a) (Allerhand and
Patterson, 1992).

	The activity produced with the temporal shadow criterion is
shown in Figure 6.2b. The adaptive threshold function and the
strobe points shown with the NAP in Fig. 6.1 were generated with the
temporal shadow criterion.  In this case, the mechanism initiates
temporal integration on each peak in the ramped portion of the NAP,
but it skips the peaks associated with the ringing of the filter after
the ramp terminates.  Strobing occurs in synchrony with the carrier
periods in the ramped portion of the NAP and this removes the
irregularity from the ramped portion of the auditory image between 0
ms and about 10 ms.  There is still irregularity in the region from 0
to -10 ms, and in the region from 25 to 15 ms, because strobing in
synchrony with the carrier period mixes carrier periods and centre
frequency periods in this region of the image.

	A further improvement occurs when the local max criterion is
introduced and strobing on successive carrier periods of the ramped
section of the NAP is suppressed.  The activity in the 640-Hz channel
of the image is shown in Figure 6.2c.  The irregular activity has been
removed; the image shows carrier periods to the left of the 0-ms point
and centre frequency periods to the right of the 0-ms point.  Thus,
strobing on local maxima synchronises temporal integration to the
period of the wave and preserves not only the basic asymmetry of the
NAP, but also the contrasting time interval patterns associated with
different sections of the NAP cycle. 



REFERENCES

Akeroyd, M.A. and Patterson, R.D. (1995). "Discrimination of wideband
   noises modulated by a temporally asymmetric function,"
   J. Acoust. Soc. Am. (in press).

Assmann, P. F. and Q. Summerfield (1990). "Modelling the perception of
   concurrent vowels: Vowels with different fundamental frequencies,"
   J. Acoust. Soc. Am. 88, 680-697.

Brown, G.J. and Cooke, M. (1994). "Computational auditory scene
   analysis," Computer Speech and Language 8, 297-336.

Irino, T. and Patterson, R.D. (1996). "Temporal asymmetry in the
   auditory system," J. Acoust. Soc. Am. (revision submitted
   August 95).

McKeown, D. and Patterson, R.D. (1995). "The time course of auditory
   segregation: concurrent vowels that vary in duration,"
   J. Acoust. Soc. Am. (in press).

Meddis, R. and M. J. Hewitt (1991a). "Virtual pitch and phase
   sensitivity of a computer model of the auditory periphery: I
   pitch identification," J. Acoust. Soc. Am.  89, 2866-82.

Meddis, R. and M. J. Hewitt (1991b). "Virtual pitch and phase
   sensitivity of a computer model of the auditory periphery: II
   phase sensitivity," J. Acoust. Soc. Am. 89, 2883-94.

Patterson, R.D. (1987b). "A pulse ribbon model of monaural
   phase perception,"  J. Acoust. Soc. Am. 82, 1560-1586.

Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang,
   C. and Allerhand M. (1992) "Complex sounds and auditory images,"
   In: Auditory physiology and perception, Y Cazals, L. Demany,
   K. Horner (eds), Pergamon, Oxford, 429-446.

Patterson, R.D. (1994a). "The sound of a sinusoid: Spectral models,"
   J. Acoust. Soc. Am.  96, 1409-1418.

Patterson, R.D. (1994b). "The sound of a sinusoid: Time-interval
   models," J. Acoust. Soc. Am. 96, 1419-1428.

Patterson, R.D. and Akeroyd, M. A. (1995). "Time-interval patterns and
   sound quality," in: Advances in Hearing Research: Proceedings of
   the 10th International Symposium on Hearing, G. Manley, G. Klump,
   C. Koppl, H. Fastl, & H. Oeckinghaus, (Eds). World Scientific,
   Singapore, (in press).

Patterson, R.D., Allerhand, M., and Giguere, C., (1995). "Time-domain
   modelling of peripheral auditory processing: A modular architecture
   and a software platform," J. Acoust. Soc. Am. 98, (in press).

Robinson, K.L. & Patterson, R.D. (1995a) "The duration required to
   identify the instrument, the octave, or the pitch-chroma of a
   musical note," Music Perception (in press).

Robinson, K.L. & Patterson, R.D. (1995b) "The stimulus duration required to
   identify vowels, their octave, and their pitch-chroma,"  J. Acoust. Soc.
   Am. 98, (in press).

Slaney, M. and Lyon, R.F. (1990). "A perceptual pitch detector," in
   Proc. IEEE Int. Conf. Acoust. Speech Signal Processing,
   Albuquerque, New Mexico.




===========================================================================
#!/bin/sh

# scripts/aimStrobeCriterion
# Annotated script for generating the figures in docs/aimStrobeCriterion

echo "FIGURES FOR SECTION 0"

[ -f .gennaprc ] && mv .gennaprc .oldgennaprc # a safety precaution
[ -f .gensairc ] && mv .gensairc .oldgensairc # a safety precaution
echo | gennap powc=off -update # make sure that powc is off
echo | gensai powc=off -update # make sure that powc is off
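
# A possible refinement, sketched under assumptions: the two mv commands
# above leave the user's option files renamed after the script ends.  The
# helper below (restore_rc is a hypothetical name, not part of AIM) moves a
# saved file back, assuming the option files live in the current directory.
restore_rc() {
    # $1 is the option-file name without its leading dot, e.g. gennaprc
    if [ -f ".old$1" ]; then
        mv ".old$1" ".$1"
    fi
}
# To restore both files when the script exits, one could uncomment:
# trap 'restore_rc gennaprc; restore_rc gensairc' EXIT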

echo
echo "FIGURES FOR SECTION 0"
echo "Figure 0.1:  Neural Activity Pattern (NAP) of cegc"
gennap input=cegc_br top=3000 swap=off bits=12 gain_gtf=4 # all default values

echo "Figure 0.2:  Stabilised Auditory Image (SAI) of cegc"
gensai stcrit=3 input=cegc_br length=100ms frstep_aid=96ms top=2500

echo
echo "FIGURES FOR SECTION 1"

echo "Figure 1.1:  SAI of cegc strobing on every non-zero point in the NAP"
echo " 			(stcrit_ai=1). This one is slow to calculate."
gensai stcrit_ai=1 top=17000 input=cegc_br length=100ms frstep_aid=96ms 

# Top has to be raised because this strobe criterion causes constant 
# temporal integration.


echo "Figure 1.2: SAI via autocorrelation -- a correlogram"
echo | gennap input=cegc_br display=off length=125ms top=3000 output=stdout > cegc_br_gtf.nap
#gennap -use start=48 display=on cegc_br_gtf # optional display of the NAP
# After making a NAP with display=off, gennap -use requires you to set display=on.

acgram start=50 wid=70ms lag=35ms frames=1 scale=.02 cegc_br_gtf.nap > cegc_gtf.sai
gensai -use top=5000 input=cegc_gtf

rm cegc_br_gtf.nap cegc_gtf.sai

echo
echo "FIGURES FOR SECTION 2"

echo "Figure 2.1:  SAI of cegc strobing on the peak of every NAP pulse" 
echo "			(stcrit_ai=2)"
gensai stcrit_ai=2 top=10000 input=cegc_br length=100ms frstep_aid=96ms 

echo
echo "FIGURES FOR SECTION 3"

echo "Demonstration of preservation of asymmetry when stthresh is elevated"
# Note stthresh only operates when stcrit_ai=1.
gensai stcrit_ai=1 top=5000 input=cegc_br length=68ms frstep_aid=66ms stthresh_ai=5000
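
# The note above says that stthresh only operates when stcrit_ai=1.  A
# minimal sketch of a guard that checks an option list for that mistake
# before gensai is run.  check_stthresh is a hypothetical helper, not part
# of AIM, and it only recognises the full option names stcrit_ai= and
# stthresh_ai=, not abbreviations such as stcrit=.
check_stthresh() {
    crit=""
    thresh=""
    for arg in "$@"; do
        case "$arg" in
            stcrit_ai=*)   crit=${arg#stcrit_ai=} ;;
            stthresh_ai=*) thresh=${arg#stthresh_ai=} ;;
        esac
    done
    # Warn and fail only when a threshold is given with another criterion.
    if [ -n "$thresh" ] && [ "$crit" != "1" ]; then
        echo "warning: stthresh_ai=$thresh has no effect unless stcrit_ai=1" >&2
        return 1
    fi
    return 0
}
# Example: check_stthresh stcrit_ai=1 stthresh_ai=5000 && echo "options ok"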

echo "Figure 3.1:  NAP of cegc with temporal shadow criterion (stcrit_ai=3)"
echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay cegc_br 1000 100 3 2.5 17000 2000

# Type 'StrobeCriterionDisplay -help' for a listing of the options and
# 	their order.
# Control of Xplots:
#	Click mouse button 1 to display coordinates of points.
#	Click mouse button 2 to redraw.
#	Click mouse button 3 to remove the display (i.e. quit).

echo
echo "FIGURES FOR SECTION 4"

echo "Figure 4.1a:   Waveform of Damped Sinusoid (4 cycles)"
genwav top=14000 bottom=-14000 length=100ms input=dr_f8_t4_d swap=on

echo "Figure 4.2a:   Waveform of Ramped Sinusoid (4 cycles)"
genwav top=14000 bottom=-14000 length=100ms input=dr_f8_t4_r swap=on 

echo "Figure 4.1b:   NAP of the Damped Sinusoid (2 cycles)"
gennap input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=110ms output=stdout display=off > damped.nap
gennap -use start=50 leng=50 display=on damped

echo "Figure 4.2b:   NAP of the Ramped Sinusoid (2 cycles)"
gennap input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=110ms output=stdout display=off > ramped.nap
gennap -use start=60 leng=50 display=on ramped

rm damped.nap ramped.nap

echo "Figure 4.3a:   SAI of the Damped Sinusoid strobing on every NAP peak"
echo "			(stcrit_ai=2)"
gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=7000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=2  pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo "Figure 4.4a:   SAI of the Ramped Sinusoid strobing on every NAP peak"
echo "			(stcrit_ai=2)"
gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=7000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=2  pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 

echo "Figure 4.3b:   SAI of the Damped Sinusoid with temporal shadow criterion"
echo "			(stcrit_ai=3)"
gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=1000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=3  pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 

echo "Figure 4.4b:   SAI of the Ramped Sinusoid with temporal shadow criterion"
echo "			(stcrit_ai=3)"
gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=2000 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=3  pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5 

echo "Figure 4.3c:   SAI of the Damped Sinusoid with the local max criterion"
echo "			(stcrit_ai=4)"
gensai input=dr_f8_t4_d gain_gtf=0.0625 bits=16 top=800 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo "Figure 4.4c:   SAI of the Ramped Sinusoid with the local max criterion"
echo "			(stcrit_ai=4)"
gensai input=dr_f8_t4_r gain_gtf=0.0625 bits=16 top=800 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5
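
# The six gensai calls for Figures 4.3 and 4.4 differ only in the input
# file, the strobe criterion and the value of top.  A sketch of how the
# shared options could be factored out.  sai_cmd is a hypothetical helper,
# not an AIM command, and it only prints the command line, so nothing is
# recomputed here.
sai_cmd() {
    # $1 = input file, $2 = stcrit value, $3 = top
    echo "gensai input=$1 gain_gtf=0.0625 bits=16 top=$3 mincf=400 maxcf=1600 swap=on length=140ms frstep_aid=135ms stcrit=$2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5"
}
# Example: "sai_cmd dr_f8_t4_d 3 1000" prints the command for Figure 4.3b;
# piping that output to sh would run it.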

# Make swap=on, bits=16 and gain_gtf=0.0625 the stored defaults for the
# commands that follow.
echo | gennap swap=on bits=16 gain_gtf=0.0625 -update
echo | gensai swap=on bits=16 gain_gtf=0.0625 -update


echo "Figure 4.5a:  NAP of Damped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay dr_f8_t4_d 800 120 3 2.5 14000 2400

echo "Figure 4.5b:  NAP of Damped Sinusoid, local max criterion (stcrit_ai=4)"
echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay dr_f8_t4_d 800 120 4 2.5 14000 2400

echo "Figure 4.6a:  NAP of Ramped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay dr_f8_t4_r 800 120 3 2.5 7500 2400

echo "Figure 4.6b:  NAP of Ramped Sinusoid, local max criterion (stcrit_ai=4)"
echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay dr_f8_t4_r 800 120 4 2.5 7500 2400

echo
echo "FIGURES FOR SECTION 5"

echo
echo "FIGURES FOR SECTION 6"

echo "Figure 6.1:  NAP of Ramped Sinusoid, temporal shadow criterion (stcrit_ai=3)"
echo "	Single Channel NAP with Strobe Threshold and Strobe Points below NAP"
StrobeCriterionDisplay dr_f8_t4_r 640 120 3 2.5 7000 2000

echo "Figure 6.2a:  SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=2)" 
gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=32000 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=2 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo "Figure 6.2b:  SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=3)" 
gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=10000 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=3 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5

echo "Figure 6.2c:  SAI of Ramped Sinusoid in channel centred on 640Hz (stcrit_ai=4)"
gensai input=dr_f8_t4_r swap=on gain_gtf=0.0625 bits=16 top=1200 mincf=640Hz chan=1 start=10ms length=110ms frstep_aid=100ms stcrit=4 pwid=30ms nwid=-10ms stlag=10ms stdecay=2.5