comparison man/man1/gensai.1 @ 0:5242703e91d3 tip

Initial checkin for AIM92 aimR8.2 (last updated May 1997).
author tomwalters
date Fri, 20 May 2011 15:19:45 +0100
1 .TH GENSAI 1 "26 May 1995"
2 .LP
3 .SH NAME
4 .LP
5 gensai \- generate stabilised auditory image
6 .LP
7 .SH SYNOPSIS/SYNTAX
8 .LP
9 gensai [ option=value | -option ] filename
10 .LP
11 .SH DESCRIPTION
12 .LP
13
14 Periodic sounds give rise to static, rather than oscillating,
15 perceptions indicating that temporal integration is applied to the NAP
16 in the production of our initial perception of a sound -- our auditory
17 image. Traditionally, auditory temporal integration is represented by
18 a simple leaky integration process and AIM provides a bank of lowpass
19 filters to enable the user to generate auditory spectra (Patterson,
20 1994a) and auditory spectrograms (Patterson et al., 1992b). However,
21 the leaky integrator removes the phase-locked fine structure observed
22 in the NAP, and this conflicts with perceptual data indicating that
23 the fine structure plays an important role in determining sound
24 quality and source identification (Patterson, 1994b; Patterson and
25 Akeroyd, 1995). As a result, AIM includes two modules which preserve
26 much of the time-interval information in the NAP during temporal
27 integration, and which produce a better representation of our auditory
28 images. In the functional version of AIM, this is accomplished with
29 strobed temporal integration (Patterson et al., 1992a,b), and this is
30 the topic of this manual entry.
31
32 .LP
33
34 In the physiological version of AIM, the auditory image is constructed
35 with a bank of autocorrelators (Slaney and Lyon, 1990; Meddis and
36 Hewitt, 1991). The autocorrelation module is an aimTool rather than
37 an integral part of the main program 'gen'. The appropriate tool is
38 'acgram'. Type 'manaim acgram' for the documentation. The module
39 extracts periodicity information and preserves intra-period fine
40 structure by autocorrelating each channel of the NAP separately. The
41 correlogram is the multi-channel version of this process. It was
42 originally introduced as a model of pitch perception (Licklider,
43 1951). It is not yet known whether STI or autocorrelation is more
44 realistic, or more efficient, as a means of simulating our perceived
45 auditory images. At present, the purpose is to provide a software
46 package that can be used to compare these auditory representations in
47 a way not previously possible.
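.LP
As a rough illustration of the idea only (not the acgram source; see 'manaim
acgram' for the real tool), the following Python sketch autocorrelates each
channel of a NAP frame independently. The helper name correlogram_frame and
the array layout are assumptions made for the sketch.
.RS
.nf
import numpy as np

def correlogram_frame(nap_channels, max_lag):
    """Autocorrelate each NAP channel separately, up to max_lag samples."""
    frame = []
    for ch in nap_channels:                     # one row per filter channel
        acf = [float(np.dot(ch[:len(ch) - lag], ch[lag:]))
               for lag in range(max_lag)]
        frame.append(acf)
    return np.array(frame)                      # channels x lags: the correlogram
.fi
.RE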
48
49 .RE
50 .LP
51 .SH STROBED TEMPORAL INTEGRATION
52 .PP
53
54 In strobed temporal integration, a bank of delay lines is used to form
55 a buffer store for the NAP, one delay line per channel, and as the NAP
56 proceeds along the buffer it decays linearly with time, at about 2.5
57 %/ms. Each channel of the buffer is assigned a strobe unit which
58 monitors activity in that channel looking for local maxima in the
59 stream of NAP pulses. When one is found, the unit initiates temporal
60 integration in that channel; that is, it transfers a copy of the NAP
61 at that instant to the corresponding channel of an image buffer and
62 adds it point-for-point with whatever is already there. The local
63 maximum itself is mapped to the 0-ms point in the image buffer. The
64 multi-channel version of this STI process is AIM's representation of
65 our auditory image of a sound. Periodic and quasi-periodic sounds
66 cause regular strobing which leads to simulated auditory images that
67 are static, or nearly static, but with the same temporal resolution as
68 the NAP. Dynamic sounds are represented as a sequence of auditory
69 image frames. If the rate of change in a sound is not too rapid, as it
70 is in diphthongs, features are seen to move smoothly as the sound proceeds,
71 much as objects move smoothly in animated cartoons.
72
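.LP
The following Python sketch illustrates strobed temporal integration for a
single channel. It is an illustration only, not the gensai implementation:
the helper names sti_frame and strobe_points, the naive local-maximum strobe,
and the sample-domain arithmetic are all assumptions made for the sketch; the
widths follow the display defaults quoted below (35 ms of positive and 5 ms
of negative time interval).
.RS
.nf
import numpy as np

def strobe_points(nap):
    """Naive strobe: every local maximum of the NAP channel."""
    return [i for i in range(1, len(nap) - 1)
            if nap[i] > 0 and nap[i] >= nap[i - 1] and nap[i] > nap[i + 1]]

def sti_frame(nap, fs, pwidth_ms=35.0, nwidth_ms=5.0):
    """Add a copy of the NAP into the image buffer at each strobe,
    with the strobe mapped to the 0-ms point of the interval axis."""
    pwid = int(pwidth_ms * fs / 1000.0)  # positive intervals, plotted to the left
    nwid = int(nwidth_ms * fs / 1000.0)  # negative intervals, right of 0 ms
    image = np.zeros(pwid + nwid)
    for s in strobe_points(nap):
        for j in range(-nwid, pwid):     # time interval, in samples, before the strobe
            src = s - j
            if 0 <= src < len(nap):
                image[pwid - 1 - j] += nap[src]   # point-for-point addition
    return image                         # column pwid - 1 is the 0-ms point
.fi
.RE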
73 .LP
74 It is important to emphasise that the triggering is done on a
75 channel-by-channel basis and that triggering is asynchronous
76 across channels, inasmuch as the major peaks in one channel occur
77 at different times from the major peaks in other channels. It
78 is this aspect of the triggering process that causes the
79 alignment of the auditory image and which accounts for the loss
80 of phase information in the auditory system (Patterson, 1987).
81
82 .LP
83
84 The auditory image has the same vertical dimension as the neural
85 activity pattern (filter centre frequency). The continuous time
86 dimension of the neural activity pattern becomes a local,
87 time-interval dimension in the auditory image; specifically, it is
88 "the time interval between a given pulse and the succeeding strobe
89 pulse". In order to preserve the direction of asymmetry of features
90 that appear in the NAP, the time-interval origin is plotted towards
91 the right-hand edge of the image, with increasing, positive time
92 intervals proceeding towards the left.
93
94 .LP
95 .SH OPTIONS
96 .LP
97 .SS "Display options for the auditory image"
98 .PP
99
100 The options that control the positioning of the window in which the
101 auditory image appears are the same as those used to set up the
102 earlier windows, as are the options that control the level of the
103 image within the display. In addition, there are three new options
104 that are required to present this new auditory representation. The
105 options are frstep_aid, pwid_aid, and nwid_aid; the suffix "_aid"
106 means "auditory image display". These options are described here
107 before the options that control the image construction process itself,
108 as they occur first in the options list. There are also three extra
109 display options for presenting the auditory image in its spiral form;
110 these options have the suffix "_spd" for "spiral display"; they are
111 described in the manual entry for 'genspl'.
112
113 .LP
114 .TP 17
115 frstep_aid
116 The frame step interval, or the update interval for the auditory image display
117 .RS
118 Default units: ms. Default value: 16 ms.
119 .RE
120 .RS
121
122 Conceptually, the auditory image exists continuously in time. The
123 simulation of the image produced by AIM is not continuous; rather it
124 is like an animated cartoon. Frames of the cartoon are calculated at
125 discrete points in time, and then the sequence of frames is replayed
126 to reveal the dynamics of the sound, or the lack of dynamics in the
127 case of periodic sounds. When the sound is changing at a rate where
128 we hear smooth glides, the structures in the simulated auditory image
129 move much like objects in a cartoon. frstep_aid determines the time
130 interval between frames of the auditory image cartoon. Frames are
131 calculated at time zero and at integer multiples of the frame step interval.
132
133 .RE
134
135 The default value (16 ms) is reasonable for musical sounds and speech
136 sounds. For a detailed examination of the development of the image of
137 brief transient sounds, frstep_aid should be decreased to 4 or even 2
138 ms.
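.LP
As a minimal illustration of the frame timing under the defaults above (a
Python sketch; frame_times is a hypothetical helper, not a gensai call):
.RS
.nf
def frame_times(signal_length_ms, frstep_aid_ms=16.0):
    """Frames are computed at time zero and at multiples of the frame step."""
    times, t = [], 0.0
    while t < signal_length_ms:
        times.append(t)
        t += frstep_aid_ms
    return times

# frame_times(600.0) gives frames at 0, 16, 32, ... 592 ms;
# frame_times(600.0, 4.0) quadruples the frame rate for brief transients.
.fi
.RE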
139 .LP
140 .TP 16
141 pwidth_sai
142
143 The maximum positive time interval presented in the display of the
144 auditory image (to the left of 0 ms).
145
146 .RS
147 Default units: ms. Default value: 35 ms.
148 .RE
149 .LP
150 .TP 16
151 nwidth_sai
152
153 The maximum negative time interval presented in the display of the
154 auditory image (to the right of 0 ms).
155
156 .RS
157 Default units: ms. Default value: -5 ms.
158 .RE
159
160 .LP
161 .TP 12
162 animate
163 Present the frames of the simulated auditory image as a cartoon.
164 .RS
165 Switch. Default off.
166 .RE
167 .RS
168
169 With reasonable resolution and a reasonable frame rate, the auditory
170 cartoon for a second of sound will require on the order of 1 Mbyte of
171 storage. As a result, auditory cartoons are only stored at the
172 specific request of the user. When the animate flag is set to `on',
173 the bit maps that constitute the frames of the auditory cartoon are
174 stored in computer memory. They can then be replayed as an auditory
175 cartoon by pressing `carriage return'. To exit the replay, type
176 "q" for `quit' or "control c". The bit maps are discarded unless
177 option bitmap=on.
178
179 .RE
180 .LP
181 .SS "Storage options for the auditory image "
182 .PP
183
184 A record of the auditory image can be stored in two ways depending on
185 the purpose for which it is stored. The actual numerical values of
186 the auditory image can be stored as previously, by setting output=on.
187 In this case, a file with a .sai suffix will be created in accordance
188 with the conventions of the software. These values can be recalled
189 for further processing with the aimTools. In this regard the SAI
190 module is like any previous module.
191
192 .LP
193 It is also possible to store the bit maps which are displayed on
194 the screen for the auditory image cartoon. The bit maps require
195 less storage space and reload more quickly, so this is the
196 preferred mode of storage when one simply wants to review the
197 visual image.
198 .LP
199 .TP 10
200 bitmap
201 Produce a bit-map storage file
202 .RS
203 Switch. Default value: off.
204 .RE
205 .RS
206
207 When the bitmap option is set to `on', the bit maps are stored in a
208 file with the suffix .ctn. The bit maps are reloaded into memory using
209 the command review or xreview, followed by the file name without the
210 suffix .ctn. The auditory image can then be replayed, as with animate,
211 by typing `carriage return'. xreview is the newer and preferred
212 display routine. It enables the user to select subsets of the cartoon
213 and to change the rate of play via a convenient control window.
214
215
216
217 .LP
218 The strobe mechanism is relatively simple. A trigger threshold
219 value is maintained for each channel and when a NAP pulse exceeds
220 the threshold a trigger pulse is generated at the time associated
221 with the maximum of the peak. The threshold value is then reset
222 to a value somewhat above the height of the current NAP peak and
223 the threshold value decays exponentially with time thereafter.
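.LP
A minimal Python sketch of this trigger mechanism is given below. It
illustrates the description above rather than the gensai code: the function
name find_strobes, the reset factor of 1.1 (a value "somewhat above" the
peak) and the per-sample bookkeeping are assumptions made for the sketch;
the default decay rate follows the stdecay_ai option documented below.
.RS
.nf
def find_strobes(nap, fs, stdecay_ai=5.0, reset_factor=1.1):
    """Strobe at the local maximum of any NAP peak that exceeds an
    adaptive threshold; reset the threshold somewhat above that peak
    and let it decay exponentially (stdecay_ai per cent per ms)."""
    decay_per_sample = (1.0 - stdecay_ai / 100.0) ** (1000.0 / fs)
    threshold, strobes, i = 0.0, [], 0
    while i < len(nap):
        if nap[i] > threshold:
            while i + 1 < len(nap) and nap[i + 1] >= nap[i]:
                i += 1                         # climb to the top of this peak
            strobes.append(i)                  # trigger at the peak maximum
            threshold = reset_factor * nap[i]  # reset above the current peak
        else:
            threshold *= decay_per_sample      # exponential threshold decay
        i += 1
    return strobes
.fi
.RE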
224
225
226
227 There are six options with the suffix "_ai", short for
228 'auditory image'. Four of these control STI itself -- stdecay_ai,
229 stcrit_ai, stthresh_ai and decay_ai. The option stinfo_ai is a switch
230 that causes the software to produce information about the current STI
231 analysis for demonstration or diagnostic purposes. The final option,
232 napdecay_ai, controls the decay rate for the NAP while it flows down
233 the NAP buffer.
234
235 .LP
236 .TP 17
237 napdecay_ai
238 Decay rate for the neural activity pattern (NAP)
239 .RS
240 Default units: %/ms. Default value: 2.5 %/ms.
241 .RE
242 .RS
243
244 napdecay_ai determines the rate at which the information in the neural
245 activity pattern decays as it proceeds along the auditory buffer that
246 stores the NAP prior to temporal integration.
247 .RE
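.LP
As a small illustration of what the %/ms unit means here (a Python sketch
under the stated default, with a hypothetical helper name, not a gensai
routine):
.RS
.nf
def nap_buffer_weight(age_ms, napdecay_ai=2.5):
    """Linear decay of NAP activity with its age in the buffer."""
    return max(0.0, 1.0 - (napdecay_ai / 100.0) * age_ms)

# At the default 2.5 %/ms, activity 20 ms old is weighted 0.5, and
# activity older than 40 ms has decayed away entirely.
.fi
.RE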
248
249
250 .LP
251 .TP 16
252 stdecay_ai
253 Strobe threshold decay rate
254 .RS
255 Default units: %/ms. Default value: 5 %/ms.
256 .RE
257 .RS
258 stdecay_ai determines the rate at which the strobe threshold decays.
259 .RE
260 .LP
261 General purpose pitch mechanisms based on peak picking are
262 notoriously difficult to design, and the trigger mechanism just
263 described would not work well on an arbitrary acoustic waveform.
264 The reason that this simple trigger mechanism is sufficient for
265 the construction of the auditory image is that NAP functions are
266 highly constrained. The microstructure reveals a function that
267 rises from zero to a local maximum smoothly and returns smoothly
268 back to zero where it stays for more than half of a period of the
269 centre frequency of that channel. On the longer time scale, the
270 amplitude of successive peaks changes only relatively slowly with
271 respect to time. As a result, for periodic sounds there tends
272 to be one clear maximum per period in all but the lowest channels
273 where there is an integer number of maxima per period. The
274 simplicity of the NAP functions follows from the fact that the
275 acoustic waveform has passed through a narrow band filter and so
276 it has a limited number of degrees of freedom. In all but the
277 highest frequency channels, the output of the auditory filter
278 resembles a modulated sine wave whose frequency is near the
279 centre frequency of the filter. Thus the neural activity pattern
280 is largely restricted to a set of peaks which are modified
281 versions of the positive halves of a sine wave, and the remaining
282 degrees of freedom appear as relatively slow changes in peak
283 amplitude and relatively small changes in peak time (or phase).
284 .LP
285 .LP
286 When the acoustic input terminates, the auditory image must
287 decay. In the ASP model the form of the decay is exponential and
288 the decay rate is determined by decay_ai.
289 .LP
290 .TP 18
291 decay_ai
292 SAI decay time constant
293 .RS
294 Default units: ms. Default value: 30 ms.
295 .RE
296 .RS
297 decay_ai determines the rate at which the auditory image decays.
298 .RE
299 .RS
300
301 In addition, decay_ai determines the rate at which the strength of the
302 auditory image increases and the level to which it asymptotes if the
303 sound continues indefinitely. In an exponential process, the asymptote
304 is reached when the increment provided by each new cycle of the sound
305 equals the amount that the image decays over the same period.
306
307 .RE
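.LP
A worked example of this asymptote, under the simplifying assumption that
each cycle of a periodic sound adds a fixed increment A to the image once
per period T and that the image decays exponentially with the decay_ai time
constant in between (a Python sketch, not the gensai code):
.RS
.nf
import math

def sai_asymptote(increment, period_ms, decay_ai_ms=30.0):
    retain = math.exp(-period_ms / decay_ai_ms)  # fraction surviving one period
    return increment / (1.0 - retain)            # S = S*retain + A at the asymptote

# For an 8-ms pulse train (pt_8ms) with unit increments, the image settles
# at roughly sai_asymptote(1.0, 8.0), about 4.3 times one cycle's contribution.
.fi
.RE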
308 .SH MOTIVATION
309 .LP
310 .SS "Auditory temporal integration: The problem "
311 .PP
312 Image stabilisation and temporal smearing.
313 .LP
314 When the input to the auditory system is a periodic sound like
315 pt_8ms or ae_8ms, the output of the cochlea is a rapidly flowing
316 neural activity pattern on which the information concerning the
317 source repeats every 8 ms. Consider the display problem that
318 would arise if one attempted to present a one second sample of
319 either pt_8ms or ae_8ms with the resolution and format of Figure
320 5.2. In that figure each 8 ms period of the sound occupies about
321 4 cm of width. There are 125 repetitions of the period in one
322 second and so a paper version of the complete NAP would be 5
323 metres in length. If the NAP were presented as a real-time flow
324 process, the paper would have to move past a typical window at
325 the rate of 5 metres per second! At this rate, the temporal
326 detail within the cycle would be lost. The image would be stable
327 but the information would be reduced to horizontal banding. The
328 fine-grain temporal information is lost because the integration
329 time of the visual system is long with respect to the rate of
330 flow of information when the record is moving at 5 metres a
331 second.
332 .LP
333 Traditional models of auditory temporal integration are similar
334 to visual models. They assume that we hear a stable auditory
335 image in response to a periodic sound because the neural activity
336 is passed through a temporal weighting function that integrates
337 over time. The output does not fluctuate if the integration time
338 is long enough. Unfortunately, the simple model of temporal
339 integration does not work for the auditory system. If the output
340 is to be stable, the integrator must integrate over 10 or more
341 cycles of the sound. We hear stable images for pitches as low
342 as, say, 50 cycles per second, which suggests that the integration
343 time of the auditory system would have to be 200 ms at the
344 minimum. Such an integrator would cause far more smearing of
345 auditory information than we know occurs. For example, phase
346 shifts that produce small changes half way through the period of
347 a pulse train are often audible (see Patterson, 1987, for a
348 review). Small changes of this sort would be obscured by lengthy
349 temporal integration.
350 .LP
351 Thus the problem in modelling auditory temporal integration is
352 to determine how the auditory system can integrate information
353 to form a stable auditory image without losing the fine-grain
354 temporal information within the individual cycles of periodic
355 sounds. In visual terms, the problem is how to present a neural
356 activity pattern at a rate of 5 metres per second while at the
357 same time enabling the viewer to see features within periods
358 greater than about 4 ms.
359 .LP
360 .SS "Periodic sounds and information packets. "
361 .PP
362 Now consider temporal integration from an information processing
363 perspective, and in particular, the problem of preserving formant
364 information in the auditory image. The shape of the neural
365 activity pattern within the period of a vowel sound provides
366 information about the resonances of the vocal tract (see Figure
367 3.6), and thus the identity of the vowel. The information about
368 the source arrives in packets whose duration is the period of the
369 source. Many of the sounds in speech and music have the property
370 that the source information changes relatively slowly when
371 compared with the repetition rate of the source wave (i.e. the
372 pitch). Thus, from an information processing point of view, one
373 would like to combine source information from neighbouring
374 packets, while at the same time taking care not to smear the
375 source information contained within the individual packets. In
376 short, one would like to perform quantised temporal integration,
377 integrating over cycles but not within cycles of the sound.
378 .LP
379 .SH EXAMPLES
380 .LP
381 This first pair of examples is intended to illustrate the
382 dominant forms of motion that appear in the auditory image, and
383 the fact that shapes can be tracked across the image provided the
384 rate of change is not excessive. The first example is a pitch
385 glide for a note with fixed timbre. The second example involves
386 formant motion (a form of timbre glide) in a monotone voice (i.e.
387 for a relatively fixed pitch).
388 .LP
389 .SS "A pitch glide in the auditory image "
390 .PP
391 Up to this point, we have focussed on the way that STI can
392 convert a fast flowing NAP pattern into a stabilised auditory
393 image. The mechanism is not, however, limited to continuous or
394 stationary sounds. The data file cegc contains pulse trains that
395 produce pitches near the musical notes C3, E3, G3, and C4, along
396 with glides from one note to the next. The notes are relatively
397 long and the pitch glides are relatively slow. As a result, each
398 note forms a stabilised auditory image and there is smooth motion
399 from one note image to the next. The stimulus file cegc is
400 intended to support several examples including ones involving the
401 spiral representation of the auditory image and its relationship
402 to musical consonance (see the manual entry for 'genspl'). For brevity, the
403 current example is limited to the transition from C to E near the
404 start of the file. The pitch of musical notes is determined by
405 the lower harmonics when they are present and so the command for
406 the example is:
407 .LP
408 gensai mag=16 min=100 max=2000 start=100 length=600
409 duration_sai=32 cegc
410 .LP
411 In point of fact, the pulse train associated with the first note
412 has a period of 8 ms like pt_8ms and so this "C" is actually a
413 little below the musical note C3. Since the initial C is the
414 same as pt_8ms, the onset of the first note is the same as in the
415 previous example; however, four cycles of the pulse train pattern
416 build up in the window because it has been set to show 32 ms of
417 'auditory image time'. During the transition, the period of the
418 stimulus decreases from 32/4 ms down to 32/5 ms, and so the image
419 stabilises with five cycles in the window. The period of E is
420 4/5 that of C.
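.LP
A small numeric check of these figures (an illustration in Python, not
gensai output):
.RS
.nf
period_C = 8.0              # ms, as for pt_8ms
period_E = 0.8 * period_C   # 6.4 ms; the period of E is 4/5 that of C
window   = 32.0             # ms of 'auditory image time' in the display
print(window / period_C, window / period_E)   # 4.0 cycles versus 5.0 cycles
.fi
.RE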
421 .LP
422 During the transition, in the lower channels associated with the
423 first and second harmonic, the individual SAI pulses march from
424 left to right in time and, at the same time, they move up in
425 frequency as the energy of these harmonics moves out of lower
426 filters and into higher filters. In these low channels the
427 motion is relatively smooth because the SAI pulse has a duration
428 which is a significant proportion of the period of the sound. As
429 the pitch rises and the periods get shorter, each new NAP cycle
430 contributes a NAP pulse which is shifted a little to the right
431 relative to the corresponding SAI pulse. This increases the
432 leading edge of the SAI pulse without contributing to the lagging
433 edge. As a result, the leading edge builds, the lagging edge
434 decays, and the SAI pulse moves to the right. The SAI pulses are
435 asymmetric during the motion, with the trailing edge more shallow
436 than the leading edge, and the effect is greater towards the left
437 edge of the image because the discrepancies over four cycles are
438 larger than the discrepancies over one cycle. The effects are
439 larger for the second harmonic than for the first harmonic
440 because the width of the pulses of the second harmonic is a
441 smaller proportion of the period. During the pitch glide the SAI
442 pulses have a reduced peak height because the activity is
443 distributed over more channels and over longer durations.
444 .LP
445 The SAI pulses associated with the higher harmonics are
446 relatively narrow with regard to the changes in period during the
447 pitch glide. As a result there is more blurring of the image
448 during the glide in the higher channels. Towards the right-hand
449 edge, for the column that shows correlations over one cycle, the
450 blurring is minimal. Towards the left-hand edge the details of
451 the pattern are blurred and we see mainly activity moving in
452 vertical bands from left to right. When the glide terminates the
453 fine structure reforms from right to left across the image and
454 the stationary image for the note E appears.
455 .LP
456 The details of the motion are more readily observed when the
457 image is played in slow motion. If the disc space is available
458 (about 1.3 Mbytes), it is useful to generate a cegc.ctn file
459 using the bitmap option. The auditory image can then be played
460 in slow motion using the review command and the slow down option
461 "-".
462 .LP
463 .LP
464 .SS "Formant motion in the auditory image "
465 .PP
466 The vowels of speech are quasi-periodic sounds and the period for
467 the average male speaker is on the order of 8 ms. As the
468 articulators change the shape of the vocal tract during speech,
469 formants appear in the auditory image and move about. The
470 position and motion of the formants represent the speech
471 information conveyed by the voiced parts of speech. When the
472 speaker uses a monotone voice, the pitch remains relatively
473 steady and the motion of the formants is essentially in the
474 vertical dimension. An example of monotone voiced speech is
475 provided in the file leo which is the acoustic waveform of the
476 word 'leo'. The auditory image of leo can be produced using the
477 command
478 .LP
479 gensai mag=12 segment=40 duration_sai=20 leo
480 .LP
481 The dominant impression on first observing the auditory image of
482 leo is the motion in the formation of the "e" sound, the
483 transition from "e" to "o", and the formation of the "o" sound.
484 .LP
485 The vocal cords come on at the start of the "l" sound but the
486 tip of the tongue is pressed against the roof of the mouth just
487 behind the teeth and so it restricts the air flow and the start
488 of the "l" does not contain much energy. As a result, in the
489 auditory image, the presence of the "l" is primarily observed in
490 the transition from the "l" to the "e". That is, as the three
491 formants in the auditory image of the "e" come on and grow
492 stronger, the second formant glides into its "e" position from
493 below, indicating that the second formant was recently at a lower
494 frequency for the previous sound.
495 .LP
496 In the "e", the first formant is low, centred on the third
497 harmonic at the bottom of the auditory image. The second formant
498 is high, up near the third formant. The lower portion of the
499 fourth formant shows along the upper edge of the image.
500 Recognition systems that ignore temporal fine structure often
501 have difficulty determining whether a high frequency
502 concentration of energy is a single broad formant or a pair of
503 narrower formants close together. This makes it more difficult
504 to distinguish "e". In the auditory image, information about the
505 pulsing of the vocal cords is maintained and the temporal
506 fluctuation of the formant shapes makes it easier to distinguish
507 that there are two overlapping formants rather than a single
508 large formant.
509 .LP
510 As the "e" changes into the "o", the second formant moves back
511 down onto the eighth harmonic and the first formant moves up to
512 a position between the third and fourth harmonics. The third and
513 fourth formants remain relatively fixed in frequency but they
514 become softer as the "o" takes over. During the transition, the
515 second formant becomes fuzzy and moves down a set of vertical
516 ridges at multiples of the period.
517 .LP
518 .LP
519 .SS "The vowel triangle: aiua "
520 .PP
521 In speech research, the vowels are specified by the centre
522 frequencies of their formants. The first two formants carry the
523 most information and it is common to see sets of vowels
524 represented on a graph whose axes are the centre frequencies of
525 the first and second formant. Not all combinations of these
526 formant frequencies occur in speech; rather, the vowels occupy a
527 triangular region within this vowel space and the points of the
528 triangle are represented by /a/ as in paw, /i/ as in beet, and /u/ as
529 in toot. The file aiua contains a synthetic speech wave that
530 provides a tour around the vowel triangle from /a/ to /i/ to /u/
531 and back to /a/, and there are smooth transitions from one vowel
532 to the next. The auditory image of aiua can be generated using
533 the command
534 .LP
535 gensai mag=12 segment=40 duration=20 aiua
536 .LP
537 The initial vowel /a/ has a high first formant centred on the
538 fifth harmonic and a low second formant centred between the
539 seventh and eighth harmonics (for these low formants the harmonic
540 number can be determined by counting the number of SAI peaks in
541 one period of the image). The third formant is at the top of the
542 image and it is reasonably strong, although relatively short in
543 duration. As the sound changes from /a/ to /i/, the first formant
544 moves successively down through the low harmonics and comes to
545 rest on the second harmonic. At the same time the second formant
546 moves all the way up to a position adjacent to the third formant,
547 similar to the "e" in leo. All three of the formants are
548 relatively strong. During the transition from the /i/ to the /
549 u/, the third formant becomes much weaker. The second formant
550 moves down onto the seventh harmonic and it remains relatively
551 weak. The first formant remains centred on the second harmonic
552 and it is relatively strong. Finally, the formants return to
553 their /a/ positions.
554 .LP
555 .LP
556 .SS "Speaker separation in the auditory image "
557 .PP
558 One of the more intriguing aspects of speech recognition is our
559 ability to hear out one voice in the presence of competing voices
560 -- the proverbial cocktail party phenomenon. It is assumed that
561 we use pitch differences to help separate the voices. In support
562 of this view, several researchers have presented listeners with
563 pairs of vowels and shown that they can discriminate the vowels
564 better when they have different pitches (Summerfield & Assmann,
565 1989). The final example involves a double vowel stimulus, /a/
566 with /i/, and it shows that stable images of the dominant
567 formants of both vowels appear in the image. The file dblvow
568 (double vowel) contains seven double-vowel pulses. The amplitude
569 of the /a/ is fixed at a moderate level; the amplitude of the /
570 i/ begins at a level 12 dB greater than that of the /a/ and it
571 decreases 4 dB with each successive pulse, and so they are equal
572 in level in the fourth pulse. Each pulse is 200 ms in duration
573 with 20 ms rise and fall times that are included within the 200
574 ms. There are 80 ms silent gaps between pulses and a gap of 80
575 ms at the start of the file. The auditory image can be generated
576 with the command
577 .LP
578 gensai mag=12 samplerate=10000 segment=40 duration=20 dblvow
579 .LP
580 The pitches of the /a/ and the /i/ are 100 and 125 Hz, respectively.
581 The image reveals a strong first formant centred on the second
582 harmonic of 125 Hz (8 ms), and strong third and fourth formants
583 with a period of 8 ms (125 Hz). These are the formants of the /
584 i/, which is the stronger of the two vowels at this point. In
585 between the first and second formants of the /i/ are the first
586 and second formants of the /a/ at a somewhat lower level. The
587 formants of the /a/ show their proper period, 10 ms. The
588 triggering mechanism can stabilise the formants of both vowels
589 at their proper periods because the triggering is done on a
590 channel by channel basis. The upper formants of the /a/ fall in
591 the same channels as the upper formants of the /i/ and since they
592 are much weaker, they are repressed by the /i/ formants.
593 .LP
594 As the example proceeds, the formants of the /i/ become
595 progressively weaker. In the image of the fifth burst of the
596 double vowel we see evidence of both the upper formants of the /
597 i/ and the upper formants of the /a/ in the same channel.
598 Finally, in the last burst the first formant of the /i/ has
599 disappeared from the lowest channels entirely. There is still
600 some evidence of /i/ in the region of the upper formants but it
601 is the formants of the /a/ that now dominate in the high frequency
602 region.
603 .LP
604 .SH SEE ALSO
605 .LP
genspl, acgram (documentation via 'manaim acgram').
606 .SH COPYRIGHT
607 .LP
608 Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
609 .LP
610 Permission to use, copy, modify, and distribute this software without fee
611 is hereby granted for research purposes, provided that this copyright
612 notice appears in all copies and in all supporting documentation, and that
613 the software is not redistributed for any fee (except for a nominal
614 shipping charge). Anyone wanting to incorporate all or part of this
615 software in a commercial product must obtain a license from the Medical
616 Research Council.
617 .LP
618 The MRC makes no representations about the suitability of this
619 software for any purpose. It is provided "as is" without express or
620 implied warranty.
621 .LP
622 THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
623 ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL
624 THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES
625 OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
626 WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
627 ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
628 SOFTWARE.
629 .LP
630 .SH ACKNOWLEDGEMENTS
631 .LP
632 The AIM software was developed for Unix workstations by John
633 Holdsworth and Mike Allerhand of the MRC APU, under the direction of
634 Roy Patterson. The physiological version of AIM was developed by
635 Christian Giguere. The options handler is by Paul Manson. The revised
636 SAI module is by Jay Datta. Michael Akeroyd extended the postscript
637 facilities and developed the xreview routine for auditory image
638 cartoons.
639 .LP
640 The project was supported by the MRC and grants from the U.K. Defence
641 Research Agency, Farnborough (Research Contract 2239); the EEC Esprit
642 BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.
643