view man/man1/genspl.1 @ 0:5242703e91d3 tip

Initial checkin for AIM92 aimR8.2 (last updated May 1997).
author tomwalters
date Fri, 20 May 2011 15:19:45 +0100
.TH GENSPL 1 "8 September 1993"
.LP
.SH NAME
.LP
genspl \- spiral auditory image of a pulse train
.LP
.SH SYNOPSIS/SYNTAX
.LP
genspl [ option=value  |  -option ]  filename
.LP
.SH DESCRIPTION
.LP
Since the spiral auditory image is just a different view of the 
auditory image, genspl accepts all of the options previously 
described for the gensai command.  In the ASP software, the 
spiral auditory image is presented in cartoon form, similar to 
the presentation of the linear auditory image. The spiral view 
of the auditory image is a global view of the sound that 
emphasises pitch and de-emphasises timbre.  It is a distant 
perspective taken in order to view the longer term correlations 
that arise in periodic sounds.  It is difficult to represent the 
functions of the SAI visually in a spiral form; the fine detail 
of the functions would be lost in the spiral perspective.  
Accordingly, in the spiral perspective each of the separate SAI 
pulses is replaced by a dot positioned at the time of the peak 
of the pulse.  Previously, this representation was referred to 
as a pulse ribbon (Patterson, 1987a).
.LP
Conceptually, the spiral auditory image is a set of concentric 
spirals, one for each channel of the auditory image.  The highest 
frequency channel is on the inside with the smallest radius; the 
lowest frequency channel is on the outside with the largest 
radius.  The spiral lines are omitted for clarity, leaving just 
the dots.  The presence of bars shows that the same period exists 
in a range of filter channels.  Note, however, that this 
information about correlation across channels appears on the same 
spoke as the information indicating that the pattern repeats on 
the auditory image in time.  Thus the multi-channel spiral maps 
both spectral and temporal information concerning the 
periodicity of the sound onto a single spatial vector -- a spoke 
of the spiral.  It is this property that enables the spiral 
representation to explain octave perception (Patterson, 1990).  
.LP
.SS "A pitch glide in the spiral auditory image "
.PP
The spiral auditory image, like its linear counterpart, is not 
limited to periodic sounds.  When the pitch of a sound glides 
smoothly from one note to another the pattern on the spiral 
auditory image rotates smoothly from one position to another, and 
when the pitch changes abruptly from one note to another, the 
spiral pattern dissolves at the end of the first note and forms 
again in a different orientation at the start of the second note.  
.LP
The spiral spokes grow from the centre outwards as the 
correlation across cycles grows.  For the note C3, four spokes 
form:  the vertical spoke contains information about periods 
separated by 1, 2, 4 and 8 cycles; the spoke at 25 minutes past 
the hour contains information about periods separated by 3 and 6 
cycles; the remaining two spokes at 40 and 10 minutes past the 
hour contain information about periods separated by 5 and 7 
cycles, respectively.  
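.LP
The clock positions quoted above follow from the base-2 form of the spiral: a delay of k cycles lies log2(k) revolutions further along the spiral than a delay of one cycle. The following Python sketch assumes that the main spoke is vertical and that image time winds counterclockwise; the function name spoke_minutes is illustrative and is not part of the AIM software.
.LP
.nf
```python
import math

def spoke_minutes(multiple, main_spoke_minutes=0.0):
    """Clock position (minutes past the hour) of the spoke for a period multiple.

    Assumes one revolution (60 clock minutes) per doubling of the delay,
    with increasing delay winding counterclockwise.
    """
    fraction = math.log2(multiple) % 1.0   # revolutions past the main spoke
    return (main_spoke_minutes - 60.0 * fraction) % 60.0

for k in range(1, 9):
    print(k, round(spoke_minutes(k), 1))
```
.fi
.LP
Under these assumptions the delays of 1, 2, 4 and 8 cycles all fall on the vertical spoke, and the delays of 3, 5 and 7 cycles fall at about 24.9, 40.7 and 11.6 minutes, which agree with the figures above to the nearest five minutes.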
.LP
As the pitch of the note changes from C3 to E3, the pattern 
rotates 20 minutes, and the spoke that was previously at 40 
minutes moves into the vertical position.  Then, as the pitch 
glides from E3 to G3, the spoke that was at 25 minutes in C3 
moves into the vertical position.  As the pitch glides on up from 
G3 to C4 the longest spoke of the pattern returns to the vertical 
position completing one revolution as the pitch rises an octave.  
Note, however, that each of the spokes has been extended by one 
circuit towards the centre of the spiral.  Thus, in the ASP model, 
octaves are perceived to be similar because they produce spoke 
patterns with the same orientation on the spiral auditory image 
and the notes of the major triad are those with a spoke that 
coincides with the main spoke of the tonic.  A theory of musical 
consonance based on the coincidence of spokes in spiral auditory 
images is presented in Patterson (1986).
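.LP
The rotations described above can be checked with a small Python sketch. It assumes equal-tempered tuning, so that E3, G3 and C4 lie 4, 7 and 12 semitones above C3 (the manual does not state the tuning), and that a rise of one octave turns the pattern through one full revolution of 60 clock minutes. The name rotation_minutes is illustrative only.
.LP
.nf
```python
def rotation_minutes(semitones):
    """Rotation of the spoke pattern, in clock minutes, for a rise in pitch.

    One octave (12 semitones) halves the period and so turns the
    pattern through one revolution, taken here as 60 clock minutes.
    """
    return 60.0 * semitones / 12.0

# semitone offsets above C3, assuming equal temperament
glide = {"C3": 0, "E3": 4, "G3": 7, "C4": 12}
for note, semitones in glide.items():
    print(note, rotation_minutes(semitones))
```
.fi
.LP
A rise of 4 semitones gives 20 minutes, which carries the spoke at 40 minutes to the vertical; a rise of 7 semitones gives 35 minutes, which does the same for the spoke at 25 minutes.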
.LP
.LP
.SH OPTIONS
.LP
.SS "Display options for the spiral auditory image "
.PP
The options that control the position of the spiral image window 
on the screen are the same as for all previous windows.  
Furthermore, since the spiral auditory image is a cartoon just 
like the linear auditory image, it may be generated, stored, 
animated, and reviewed in the same way as the linear auditory 
image.  In addition, there are six new display options for the 
spiral view of the auditory image.
.LP
.TP 11
spiral
Switch to spiral auditory image 
.RS
Switch: Default, off. 
.RE
.RS
When spiral is set to "on" the time dimension of the auditory image is plotted as a spiral and the SAI function is replaced with dots positioned at the peaks of the pulses in the SAI function. 
.RE
.TP 13
form_spl
The form of the spiral time line 
.RS
Switch: Default, archimedian. 
.RE
.RS
The software offers two visual representations of the underlying logarithmic spiral, both of which have the base 2. 
.RE
Both representations gather doublings in time onto a specific 
spoke of the spiral, and so both have the general property that 
.LP
	q = log2(t/T) 	(6.1)
.LP
q is the angle between the horizontal axis and the radius drawn 
to a point on the spiral.  T is the sampling period (the reciprocal 
of the sampling rate) and t is "auditory image time", both in 
seconds.  Every time t doubles, q increases by 1, and so the 
integer part of q (the characteristic 
of the logarithm) specifies the circuit of the spiral.  The 
fractional part of the logarithm (the mantissa) specifies the 
angle within the circuit, and in this case, the angle is measured 
in revolutions, or circuits.
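.LP
As a worked illustration of equation 6.1, the following Python sketch splits q into its circuit and its angle within the circuit. The function name spiral_angle and the example values (an image time of 8 ms at a 20 kHz sampling rate) are chosen for illustration only.
.LP
.nf
```python
import math

def spiral_angle(t, T):
    """Return (q, circuit, fraction) for image time t and sampling period T, in seconds."""
    q = math.log2(t / T)           # equation 6.1, in revolutions
    circuit = math.floor(q)        # integer part: which circuit of the spiral
    fraction = q - circuit         # fractional part: angle within the circuit
    return q, circuit, fraction

T = 1.0 / 20000.0                  # sampling period for a 20 kHz sampling rate
q, circuit, fraction = spiral_angle(0.008, T)
print(round(q, 3), circuit, round(fraction, 3))

# doubling t advances q by one circuit and leaves the angle unchanged
q2, circuit2, fraction2 = spiral_angle(0.016, T)
print(round(q2 - q, 3), circuit2 - circuit)
```
.fi
.LP
For these values t/T is 160, so q is about 7.32: the point lies on circuit 7, about 0.32 of a revolution round from the horizontal axis.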
.LP
The archimedian spiral is like a coil of rope; that is, the radius 
increases by the thickness of the rope on each successive 
circuit.  The form of the archimedian spiral is 
.LP
	r = aq = a log2(t/T)	(6.2)
.LP
where r is the radius from the centre of the spiral to a point 
on the spiral.  The logarithmic spiral has the form 
.LP
	r = 2^q = 2^(log2(t/T)) = t/T	(6.3)
.LP
The logarithmic version of the spiral has the advantage that 
image time is linear along the path of the spiral.  However, it 
has the disadvantage that it expands rapidly, and so the current 
default is archimedian.  
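.LP
The difference in growth between the two forms can be seen in a short Python sketch, taking the archimedian radius as a log2(t/T) and the logarithmic radius as t/T. The scale factor a is set to 1 here as an assumption, since its value is not given above, and the function names are illustrative.
.LP
.nf
```python
import math

def radius_archimedian(t, T, a=1.0):
    """Radius grows by a fixed step a for each doubling of t."""
    return a * math.log2(t / T)

def radius_logarithmic(t, T):
    """Radius doubles for each doubling of t; equal to t / T."""
    return 2.0 ** math.log2(t / T)

T = 1.0 / 20000.0                  # sampling period at 20 kHz, for illustration
for t_ms in (1, 2, 4, 8, 16):
    t = t_ms / 1000.0
    print(t_ms, round(radius_archimedian(t, T), 2), round(radius_logarithmic(t, T), 1))
```
.fi
.LP
Each doubling of t adds one unit to the archimedian radius but doubles the logarithmic radius, which is why the logarithmic form expands so rapidly.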
.LP
.LP
.TP 16
dotsize_spl
The size of the dots on the spiral 
.RS
Default units, pixels: Default value, 2 pixels. 
.RE
.RS
The dots plotted on the spiral are actually small squares and the value dotsize_spl determines the number of pixels along the side of the square. 
.RE
.TP 13
axis_spl
Spiral axis, or time line 
.RS
Switch: Default, off 
.RE
.RS
When the axis_spl switch is set to "on", a spiral axis, or time line, is plotted. It is presented on the outside of the circuit, one channel below the lowest filter channel, just as in the linear image. The default value for axis_spl is "off" because the spiral axis contains a large number of points and is slow to calculate and plot. 
.RE
Note:  The length of spiral displayed in the window is determined 
by duration_sai.  This is the same duration_sai as for the linear 
image.  The size of the spiral display is scaled so that the 
radius associated with duration_sai fits inside the rectangle 
specified for the window.  The spiral does not have to be 
presented in a square window and in some instances rectangular 
windows are quite effective for giving a sense of depth.
.LP
.TP 13
zero_spl
Spiral start point and spiral orientation 
.RS
Default units: revolutions. Default value 4.072 revolutions. 
.RE
.RS
This parameter determines the minimum "auditory image time" that appears on the spiral, and thus it determines the zero point on the spiral. 
.RE
The parameter zero_spl has two primary uses:  Firstly, it enables 
the user to determine the orientation of the main spoke of the 
spiral for a given combination of sampling rate and stimulus 
period.  Without the parameter zero_spl, the orientation of the 
spiral would be fixed by the sampling rate and period of the 
sound.  Periods that are an exact power-of-2 times the base 
period, T, would appear on the spoke proceeding horizontally 
from the centre of the spiral towards the right.  By removing a 
portion of a circuit the orientation of the spiral can be set to 
suit the user.  A reduction in zero_spl of 0.25 will rotate the 
main spoke from horizontal to vertical. 
.LP
The second purpose of zero_spl is to enable the user to adjust 
the image to the period being displayed; that is, to focus on the 
octave of the current sound.  For example, when the sound has a 
long period, like 8 ms, the activity produced by the sound falls 
in the outer circuits of the spiral.  If zero_spl is set to a 
small value (<2) the centre of the display will be largely blank.  
The short circuits associated with higher octaves can be removed 
by setting zero_spl to a larger value, say 4, in which case a 
sound with an 8 ms period will fill the display.  
.LP
The one parameter zero_spl can be used to both scale and rotate 
the spiral simultaneously; integer changes in the parameter cause 
a scaling without rotation.  The default value, 4.072, assigns a 
vertical spoke to a period of 8 ms (and its base-2 relatives) 
when the sampling rate is 20 kHz (or a base-2 relative).
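.LP
The default can be checked numerically. The Python sketch below assumes that the plotted angle is the fractional part of q minus zero_spl, measured counterclockwise in revolutions from the horizontal axis pointing right. This is a reading of the description above, not a statement of how the AIM code is written, and clock_minutes is an illustrative name.
.LP
.nf
```python
import math

def clock_minutes(period_s, samplerate_hz, zero_spl=4.072):
    """Clock position (minutes past the hour) of the spoke for a given period."""
    q = math.log2(period_s * samplerate_hz)     # t/T with T = 1/samplerate
    fraction = (q - zero_spl) % 1.0             # angle within the circuit, in revolutions
    # 15 minutes is the horizontal axis to the right; counterclockwise lowers the minutes
    return (15.0 - 60.0 * fraction) % 60.0

print(round(clock_minutes(0.008, 20000), 1))          # default zero_spl
print(round(clock_minutes(0.008, 20000, 5.072), 1))   # integer change in zero_spl
print(round(clock_minutes(0.008, 20000, 4.322), 1))   # zero_spl larger by 0.25
```
.fi
.LP
Under these assumptions a period of 8 ms at 20 kHz falls on the vertical spoke with the default value, keeps that orientation when zero_spl is changed by an integer, and moves to the horizontal spoke when zero_spl is increased by 0.25.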
.LP
.TP 18
dotthresh_spl
Threshold value for the production of a spiral dot 
.RS
Unit: SAI strength. Default value, 50 SAI units. 
.RE
.RS
This threshold specifies the value that a pulse in the SAI must reach or exceed in order for it to be presented as a dot in the spiral image. 
.RE
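.LP
The effect of the threshold can be sketched in Python as follows. The peak-picking rule used here (a sample higher than its left neighbour and at least as high as its right neighbour) is an assumption made for illustration; the manual does not describe how the software locates the peaks, and the name spiral_dots and the sample data are hypothetical.
.LP
.nf
```python
def spiral_dots(sai_channel, dotthresh_spl=50):
    """Return indices of local peaks whose value reaches or exceeds the threshold."""
    dots = []
    for i in range(1, len(sai_channel) - 1):
        is_peak = sai_channel[i - 1] < sai_channel[i] >= sai_channel[i + 1]
        if is_peak and sai_channel[i] >= dotthresh_spl:
            dots.append(i)
    return dots

# hypothetical SAI strengths for one channel
channel = [0, 10, 60, 20, 0, 30, 49, 10, 0, 50, 50, 0]
print(spiral_dots(channel))
```
.fi
.LP
With the default threshold of 50, the peak of 49 is dropped, while the peaks of 60 and 50 are kept.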
.LP
.SH EXAMPLES
.LP
In order to understand the spiral mapping, look at the auditory 
image of C3 and imagine the pulse ribbon that would be formed by 
replacing each SAI pulse with a dot and extending the duration 
of the image to 70 ms so that it will accommodate eight cycles 
of the note.  The spiral view is produced by compressing the pulse 
ribbon vertically, stretching it horizontally, and then wrapping 
it counterclockwise into a spiral, with the right-hand edge at 
the centre of the spiral and the left-hand edge at the end of the 
outer circuit.  The dots from vertical columns of pulses in the 
linear auditory image, merge into short bars in the spiral view 
because of the vertical compression; the bars fall along  spokes 
radiating from the centre of the spiral.  The dots from the arches 
of pulses on either side of the vertical column in the linear 
auditory image appear in a stretched form like "wings" in the 
spiral auditory image.
.LP
In the case of C3, four of the bars are aligned on one spoke of 
the spiral (the vertical spoke); they represent the strong 
correlations that occur in the auditory image between cycles of 
the original sound that are 1, 2, 4, and 8 cycles apart.  In this way, 
much of the information that is distributed across the temporal 
dimension of the linear auditory image is gathered together into 
a single spatial vector.
.LP
.LP
The wave cegc provides an example of how the spiral auditory 
image follows pitch glides from one note to another.  One 
reasonable version of the spiral pitch glide is provided by the 
command
.LP
.LP
genspl width=600 height=550 duration_sai=70 zero_spl=5.072 cegc
.LP
.LP
.SS "The separation of pitch and timbre in the auditory image. "
.PP
The file vowgld contains a synthetic speech waveform that 
combines both formant motion and pitch motion; the formant motion 
is a rapid tour around the vowel triangle as in aiua, and the 
pitch motion is C3, E3, G3 and C4.  A linear auditory image of 
vowgld can be generated with the command 
.LP
.LP
gensai width=420 height=420 mag=12 segment=40 duration_sai=20 
spiral=off vowgld
.LP
.LP
The motion in the linear auditory image is similar to that 
observed with aiua in Chapter 5.  That is, the formants move 
vertically as the vowels change from one to the next.  In this 
example, however, there is pitch motion and the period decreases 
by a factor of 2 as the example proceeds.  The pitch change is 
observed primarily as horizontal motion that is largely 
independent of the formant motion.  In point of fact, the 
resolved harmonics in the lower half of the auditory image are 
rising in frequency as the example proceeds but this does not 
seem to interfere with the perception of either the vertical 
motion of the formants or the horizontal shrinking of the 
period.
.LP
Although the rise in pitch can be observed in the linear auditory 
image it is not the dominant perception; rather, it is the 
formant motion that dominates in this microscopic view of the 
auditory image.  A spiral auditory image of vowgld can be 
generated with the command
.LP
.LP
gensai width=420 height=420 segment=40 duration_sai=70 spiral=on 
zero_spl=5.072 vowgld
.LP
.LP
The motion in the spiral auditory image is dominated by the 
rotation of the spokes, that is, the pitch motion.  The motion 
of the formants is represented in the spiral image in the sense 
that there is more sparkle in the information that is not on the 
main spoke pattern.  This sparkle is caused by the formant energy 
changing channels as the formants move from channel to channel 
within one circuit of the spiral.  But the fact that the motion 
in successive circuits is coordinated is not apparent in this 
macro view of the auditory image.
.LP
A more dramatic example of the enhancement of pitch and the 
repression of timbre can be produced by generating a spiral 
auditory image for aiua, in which the pitch is fixed and the 
vowels range around the vowel triangle.  The formant information 
on the spokes changes as the vowel tour proceeds but the position 
of the spokes remains fixed.  The vowel information on the spokes 
rushes around in three discrete transitions but there is no 
particular pattern to the motion. 
.LP
Thus, in the ASP model, pitch and timbre are just two views of 
the same auditory image; pitch effects are observed when we stand 
back and take a macroscopic view of the auditory image; timbre 
details are observed when we move in close and take a microscopic 
view of the auditory image. 
.LP
.LP
The review program has the capacity to present two auditory 
images simultaneously.  If linear and spiral auditory images of 
vowgld are generated and stored using image=on, they can be 
replayed simultaneously and compared using the command 
.LP
review vowgld_l vowgld_s
.LP
Caution: this requires the user to produce separate image files 
(vowgld_l.img, vowgld_s.img) either by producing the images from 
copies of vowgld with different names, or by renaming the 
auditory images as they are produced.  If two different auditory 
images are produced from the same file, the second will overwrite 
the first even though one has a linear format and one a spiral 
format.  
.LP
.SS "Multiple pitches in the spiral auditory image "
.PP
It is generally assumed that when two people are speaking at the 
same time, the listener uses the differences in the pitches of 
the two voices to assist in separating the speakers.  The final 
example in this chapter shows that the pitches of the /a/ and the 
/i/ in dblvow appear separately in the spiral auditory image, and 
that it would be reasonable to use the spiral to separate the 
channels associated with the two vowels and thereby assist 
speaker tracking.  The spiral auditory image can be generated by 
the command
.LP
.LP
gensai width=600 height=550 samplerate=10000 spiral=on 
duration=90 dblvow
.LP
.LP
The main spokes of the /a/ and the /i/ appear at angles of 40 and 
0 minutes past the hour, respectively, corresponding to periods 
of 10 and 8 ms.  Over the course of the example, the main spoke 
of the /i/ fades considerably while the main spoke of the /a/ 
increases somewhat.
.LP
The second spokes of the /a/ and /i/ patterns appear at 5 and 25 
minutes, respectively, and their strengths change predictably 
as the example proceeds.  If either vowel were presented on its 
own there would be more than two spokes in the pattern of each 
vowel.  The presence of the second vowel represses spokes beyond 
the second in the patterns of both vowels.
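.LP
The spoke positions quoted for dblvow can be reproduced approximately with a short Python sketch. It assumes the default zero_spl of 4.072, an angle measured counterclockwise in revolutions from the horizontal axis pointing right, and that the second spoke of each vowel corresponds to a delay of three periods. None of these assumptions is confirmed by the software itself, and clock_minutes is an illustrative name.
.LP
.nf
```python
import math

def clock_minutes(delay_s, samplerate_hz=10000, zero_spl=4.072):
    """Clock position (minutes past the hour) of the spoke for a given delay."""
    fraction = (math.log2(delay_s * samplerate_hz) - zero_spl) % 1.0
    return (15.0 - 60.0 * fraction) % 60.0

# periods of 10 ms and 8 ms, as quoted for the two vowels
for vowel, period_s in (("/a/", 0.010), ("/i/", 0.008)):
    main = clock_minutes(period_s)          # delay of one period
    second = clock_minutes(3 * period_s)    # delay of three periods (assumed)
    print(vowel, round(main, 1), round(second, 1))
```
.fi
.LP
Under these assumptions the main spokes fall at about 40.7 and 0 minutes and the second spokes at about 5.6 and 24.9 minutes, close to the positions given above.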
.LP
.LP
.SH BUGS
.LP
Note:  the current version of the software (release 3, June 1990) 
incorrectly adds linear axes to hardcopy figures.  Apologies.
.LP
.SH COPYRIGHT
.LP
Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
.LP
Permission to use, copy, modify, and distribute this software without fee 
is hereby granted for research purposes, provided that this copyright 
notice appears in all copies and in all supporting documentation, and that 
the software is not redistributed for any fee (except for a nominal 
shipping charge). Anyone wanting to incorporate all or part of this 
software in a commercial product must obtain a license from the Medical 
Research Council.
.LP
The MRC makes no representations about the suitability of this 
software for any purpose.  It is provided "as is" without express or 
implied warranty.
.LP
THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING 
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL 
THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES 
OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, 
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, 
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS 
SOFTWARE.
.LP
.SH ACKNOWLEDGEMENTS
.LP
The AIM software was developed for Unix workstations by John
Holdsworth and Mike Allerhand of the MRC APU, under the direction of
Roy Patterson. The physiological version of AIM was developed by
Christian Giguere. The options handler is by Paul Manson. The revised
SAI module is by Jay Datta. Michael Akeroyd extended the postscript
facilities and developed the xreview routine for auditory image
cartoons.
.LP
The project was supported by the MRC and grants from the U.K. Defense
Research Agency, Farnborough (Research Contract 2239); the EEC Esprit
BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.