comparison man/man1/genspl.1 @ 0:5242703e91d3 tip

Initial checkin for AIM92 aimR8.2 (last updated May 1997).
author tomwalters
date Fri, 20 May 2011 15:19:45 +0100
1 .TH GENSPL 1 "8 September 1993"
2 .LP
3 .SH NAME
4 .LP
5 genspl \- spiral auditory image of a pulse train
6 .LP
7 .SH SYNOPSIS/SYNTAX
8 .LP
9 genspl [ option=value | -option ] filename
10 .LP
11 .SH DESCRIPTION
12 .LP
13 Since the spiral auditory image is just a different view of the
14 auditory image, it includes all of the flags associated
15 previously with the gensai command. In the ASP software, the
16 spiral auditory image is presented in cartoon form, similar to
17 the presentation of the linear auditory image. The spiral view
18 of the auditory image is a global view of the sound that
19 emphasises pitch and de-emphasises timbre. It is a distant
20 perspective taken in order to view the longer term correlations
21 that arise in periodic sounds. It is difficult to represent the
22 functions of the SAI visually in a spiral form; the fine detail
23 of the functions would be lost in the spiral perspective.
24 Accordingly, in the spiral perspective each of the separate SAI
25 pulses is replaced by a dot positioned at the time of the peak
26 of the pulse. Previously, this representation was referred to
27 as a pulse ribbon (Patterson, 1987a).
28 .LP
29 Conceptually, the spiral auditory image is a set of concentric
30 spirals, one for each channel of the auditory image. The highest
31 frequency channel is on the inside with the smallest radius; the
32 lowest frequency channel is on the outside with the largest
33 radius. The spiral lines are omitted for clarity, leaving just
34 the dots. The presence of bars shows that the same period exists
35 in a range of filter channels. Note, however, that this
36 information about correlation across channels appears on the same
37 spoke as the information indicating that the pattern repeats on
38 the auditory image in time. Thus the multi-channel spiral maps
39 both spectral and temporal information concerning the
40 periodicity of the sound onto a single spatial vector -- a spoke
41 of the spiral. It is this property that enables the spiral
42 representation to explain octave perception (Patterson, 1990).
43 .LP
44 .SS "A pitch glide in the spiral auditory image "
45 .PP
46 The spiral auditory image, like its linear counterpart, is not
47 limited to periodic sounds. When the pitch of a sound glides
48 smoothly from one note to another the pattern on the spiral
49 auditory image rotates smoothly from one position to another, and
50 when the pitch changes abruptly from one note to another, the
51 spiral pattern dissolves at the end of the first note and forms
52 again in a different orientation at the start of the second note.
53 .LP
54 The spiral spokes grow from the centre outwards as the
55 correlation across cycles grows. For the note C3, four spokes
56 form: the vertical spoke contains information about periods
57 separated by 1, 2, 4 and 8 cycles; the spoke at 25 minutes past
58 the hour contains information about periods separated by 3 and 6
59 cycles; the remaining two spokes at 40 and 10 minutes past the
60 hour contain information about periods separated by 5 and 7
61 cycles, respectively.
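The spoke positions quoted above follow from the base-2 logarithm of the cycle count. The sketch below is a minimal illustration, not code from the AIM software. It assumes that the main spoke is vertical (0 minutes), that the spiral winds counterclockwise, and that one revolution corresponds to 60 minutes on a clock face. The function name spoke_minutes is hypothetical.

```python
import math

def spoke_minutes(n_cycles):
    """Clock-face position (minutes past the hour) of the spoke for a
    delay of n_cycles periods, relative to a vertical main spoke.
    Assumption: counterclockwise winding, so a fraction f of a
    revolution moves the spoke f * 60 minutes backwards round the clock."""
    frac = math.log2(n_cycles) % 1.0      # angle within the circuit, in revolutions
    return (60.0 * (1.0 - frac)) % 60.0

for n in range(1, 9):
    print(n, round(spoke_minutes(n), 1))
```

Under these assumptions, delays of 1, 2, 4 and 8 cycles land at 0 minutes, 3 and 6 cycles near 25 minutes, 5 cycles near 41 minutes and 7 cycles near 12 minutes, which agrees with the rounded positions given in the text.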
62 .LP
63 As the pitch of the note changes from C3 to E3, the pattern
64 rotates 20 minutes, and the spoke that was previously at 40
65 minutes moves into the vertical position. Then, as the pitch
66 glides from E3 to G3, the spoke which was at 25 minutes in C3
67 moves into the vertical position. As the pitch glides on up from
68 G3 to C4 the longest spoke of the pattern returns to the vertical
69 position completing one revolution as the pitch rises an octave.
70 Note, however, that each of the spokes has been extended by one
71 circuit towards the centre of the spiral. Thus, in the ASP model,
72 octaves are perceived to be similar because they produce spoke
73 patterns with the same orientation on the spiral auditory image
74 and the notes of the major triad are those with a spoke that
75 coincides with the main spoke of the tonic. A theory of musical
76 consonance based on the coincidence of spokes in spiral auditory
77 images is presented in Patterson (1986).
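The rotations described in this section can be checked with a short calculation. The sketch below is illustrative only: it assumes that one octave corresponds to exactly one revolution (60 minutes) and uses just-intonation ratios for the major triad, which are not stated in the text. The function name rotation_minutes is hypothetical.

```python
import math

def rotation_minutes(freq_ratio):
    """Rotation of the spoke pattern, in clock-face minutes, when the
    pitch rises by freq_ratio. Assumption: one octave is one revolution."""
    return 60.0 * math.log2(freq_ratio)

# Assumed just-intonation ratios relative to the tonic C3.
for name, ratio in [("C3 to E3", 5 / 4), ("C3 to G3", 3 / 2), ("C3 to C4", 2.0)]:
    print(name, round(rotation_minutes(ratio), 1))
```

A ratio of 5/4 gives about 19.3 minutes, the roughly 20 minutes quoted in the text, which carries the spoke near 40 minutes to the vertical. A ratio of 3/2 gives about 35.1 minutes, which does the same for the spoke near 25 minutes.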
78 .LP
79 .LP
80 .SH OPTIONS
81 .LP
82 .SS "Display options for the spiral auditory image "
83 .PP
84 The options that control the position of the spiral image window
85 on the screen are the same as for all previous windows.
86 Furthermore, since the spiral auditory image is a cartoon just
87 like the linear auditory image, it may be generated, stored,
88 animated, and reviewed in the same way as the linear auditory
89 image. In addition, there are six new display options for the
90 spiral view of the auditory image.
91 .LP
92 .TP 11
93 spiral
94 Switch to spiral auditory image
95 .RS
96 Switch: Default, off.
97 .RE
98 .RS
99 When spiral is set to "on", the time dimension of the auditory image is plotted as a spiral and the SAI function is replaced with dots positioned at the peaks of the pulses in the SAI function.
100 .RE
101 .TP 13
102 form_spl
103 The form of the spiral time line
104 .RS
105 Switch: Default, archimedian.
106 .RE
107 .RS
108 The software offers two visual representations of the underlying logarithmic spiral, both of which have the base 2.
109 .RE
110 Both representations gather doublings in time onto a specific
111 spoke of the spiral, and so both have the general property that
112 .LP
113 q = log2(t/T) (6.1)
114 .LP
115 q is the angle between the horizontal axis and the radius drawn
116 to a point on the spiral. T is the sampling period (the reciprocal
117 of the sampling rate) and t is "auditory image time", both in
118 seconds. Every time t doubles, q increases by 1, and so the integer
119 part of q (the characteristic of the logarithm) specifies the
120 circuit of the spiral. The fractional part of the logarithm (the
121 mantissa) specifies the angle within the circuit, and in this
122 case, the angle is measured in revolutions, or circuits.
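As a worked illustration of equation 6.1, the following sketch splits q into its circuit number and its angle within the circuit. It is a minimal example rather than the AIM implementation. The function name spiral_angle and the 20 kHz sampling rate are chosen only for illustration.

```python
import math

def spiral_angle(t, T):
    """Return (circuit, fraction) for image time t and sampling period T,
    both in seconds, using q = log2(t / T)."""
    q = math.log2(t / T)
    circuit = math.floor(q)        # integer part: which circuit of the spiral
    fraction = q - circuit         # fractional part: angle in revolutions
    return circuit, fraction

T = 1.0 / 20000.0                  # sampling period at 20 kHz (example value)
for samples in (64, 96, 128, 256):
    print(samples, spiral_angle(samples * T, T))
```

Image times of 64, 128 and 256 samples share the same fractional angle and so fall on the same spoke, one circuit apart, while 96 samples falls a little over half a revolution further round.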
123 .LP
124 The Archimedean spiral is like a coil of rope; that is, the radius
125 increases by the thickness of the rope on each successive
126 circuit. The form of the Archimedean spiral is
127 .LP
128 r = aq = a log2(t/T) (6.2)
129 .LP
130 where r is the radius from the centre of the spiral to a point
131 on the spiral. The logarithmic spiral has the form
132 .LP
133 r = 2^q = 2^(log2(t/T)) = t/T (6.3)
134 .LP
135 The logarithmic version of the spiral has the advantage that
136 image time is linear along the path of the spiral. However, it
137 has the disadvantage that it expands rapidly, and so the current
138 default is archimedian.
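The two radius laws can be compared with a short sketch that converts an image time into a position on the spiral. This is an illustration under stated assumptions, not the AIM plotting code. The function name spiral_point, the scale factor a, and the string "logarithmic" as a form name are all hypothetical; only "archimedian" is given in the text as an option value.

```python
import math

def spiral_point(t, T, form="archimedian", a=1.0):
    """Cartesian position of image time t on the spiral.
    form selects the radius law; a is an assumed radius gain per
    circuit for the archimedian form."""
    q = math.log2(t / T)                    # angle in revolutions (6.1)
    if form == "archimedian":
        r = a * q                           # (6.2)
    elif form == "logarithmic":
        r = 2.0 ** q                        # (6.3), equal to t / T
    else:
        raise ValueError("unknown form: " + form)
    theta = 2.0 * math.pi * q               # revolutions to radians
    return r * math.cos(theta), r * math.sin(theta)

for form in ("archimedian", "logarithmic"):
    print(form, [round(spiral_point(2.0 ** k, 1.0, form)[0], 3) for k in range(1, 6)])
```

For image times of 2, 4, 8, 16 and 32 sampling periods the archimedian radius grows by a fixed step per circuit, while the logarithmic radius doubles, which shows why the logarithmic form expands so quickly.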
139 .LP
140 .LP
141 .TP 16
142 dotsize_spl
143 The size of the dots on the spiral
144 .RS
145 Default units, pixels: Default value, 2 pixels.
146 .RE
147 .RS
148 The dots plotted on the spiral are actually small squares and the value dotsize_spl determines the number of pixels along the side of the square.
149 .RE
150 .TP 13
151 axis_spl
152 Spiral axis, or time line
153 .RS
154 Switch: Default, off
155 .RE
156 .RS
157 When the axis_spl switch is set to "on", a spiral axis, or time line is plotted. It is presented on the outside of the circuit, one channel below the lowest filter channel, just as in the linear image. The default value for axis_spl is "off" because the spiral axis contains a large number of points and it is slow to calculate and plot.
158 .RE
159 Note: The length of spiral displayed in the window is determined
160 by duration_sai. This is the same duration_sai as for the linear
161 image. The size of the spiral display is scaled so that the
162 radius associated with duration_sai fits inside the rectangle
163 specified for the window. The spiral does not have to be
164 presented in a square window and in some instances rectangular
165 windows are quite effective for giving a sense of depth.
166 .LP
167 .TP 13
168 zero_spl
169 Spiral start point and spiral orientation
170 .RS
171 Default units: revolutions. Default value 4.072 revolutions.
172 .RE
173 .RS
174 This parameter determines the minimum "auditory image time" that appears on the spiral, and thus it determines the zero point on the spiral.
175 .RE
176 The parameter zero_spl has two primary uses: Firstly, it enables
177 the user to determine the orientation of the main spoke of the
178 spiral for a given combination of sampling rate and stimulus
179 period. Without the parameter zero_spl, the orientation of the
180 spiral would be fixed by the sampling rate and period of the
181 sound. Periods that are an exact power-of-2 times the base
182 period, T, would appear on the spoke proceeding horizontally
183 from the centre of the spiral towards the right. By removing a
184 portion of a circuit the orientation of the spiral can be set to
185 suit the user. A reduction in zero_spl of 0.25 will rotate the
186 main spoke from horizontal to vertical.
187 .LP
188 The second purpose of zero_spl is to enable the user to adjust
189 the image to the period being displayed; that is, to focus on the
190 octave of the current sound. For example, when the sound has a
191 long period, like 8 ms, the activity produced by the sound falls
192 in the outer circuits of the spiral. If zero_spl is set to a
193 small value (<2) the centre of the display will be largely blank.
194 The short circuits associated with higher octaves can be removed
195 by setting zero_spl to a larger value, say 4, in which case a
196 sound with an 8 ms period will fill the display.
197 .LP
198 The one parameter zero_spl can be used to both scale and rotate
199 the spiral simultaneously; integer changes in the parameter cause
200 a scaling without rotation. The default value, 4.072, assigns a
201 vertical spoke to a period of 8 ms (and its base-2 relatives)
202 when the sampling rate is 20 kHz (or a base-2 relative).
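The default value can be checked numerically. The sketch below assumes that zero_spl is simply subtracted from q = log2(period/T), that angle zero points horizontally to the right (15 minutes on a clock face), and that the spiral winds counterclockwise. These are readings of the description above rather than confirmed details of the software, and the function name main_spoke_minutes is hypothetical.

```python
import math

def main_spoke_minutes(period_s, samplerate_hz, zero_spl=4.072):
    """Clock-face position (minutes past the hour) of the main spoke.
    Assumptions: zero_spl is subtracted from q, angle zero points
    right (15 minutes), and the spiral winds counterclockwise."""
    q = math.log2(period_s * samplerate_hz) - zero_spl
    frac = q % 1.0                          # angle within the circuit
    return (15.0 - 60.0 * frac) % 60.0

print(main_spoke_minutes(0.008, 20000))          # close to 0: vertical
print(main_spoke_minutes(0.008, 20000, 4.322))   # close to 15: horizontal
```

With an 8 ms period at 20 kHz, q is log2(160), about 7.322. Subtracting 4.072 leaves a fractional part of 0.25 revolutions, which is a vertical spoke. Raising zero_spl by 0.25 to 4.322 returns the spoke to the horizontal, and doubling the sampling rate leaves the orientation unchanged.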
203 .LP
204 .TP 18
205 dotthresh_spl
206 Threshold value for the production of a spiral dot
207 .RS
208 Unit: SAI strength. Default value, 50 SAI units.
209 .RE
210 .RS
211 This threshold specifies the value that a pulse in the SAI must reach or exceed in order for it to be presented as a dot in the spiral image.
212 .RE
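The effect of the threshold can be sketched as a simple peak-picking step. The code below is an assumption about how dots might be selected, not the algorithm used by the software: it treats a pulse peak as a local maximum in one channel of the SAI and keeps it only if it reaches or exceeds the threshold. The function name spiral_dots and the sample data are hypothetical.

```python
def spiral_dots(sai_channel, dotthresh=50.0):
    """Return the sample indices of local peaks in one SAI channel
    whose height reaches or exceeds dotthresh (assumed peak rule)."""
    dots = []
    for i in range(1, len(sai_channel) - 1):
        left, here, right = sai_channel[i - 1], sai_channel[i], sai_channel[i + 1]
        # A peak rises from the left and does not rise further to the right.
        if here > left and here >= right and here >= dotthresh:
            dots.append(i)
    return dots

print(spiral_dots([0, 20, 60, 30, 0, 45, 49, 10, 0, 50, 0]))
```

In this made-up channel the peaks of height 60 and 50 produce dots, while the peak of height 49 falls just below the default threshold and is dropped.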
213 .LP
214 .SH EXAMPLES
215 .LP
216 In order to understand the spiral mapping, look at the auditory
217 image of C3 and imagine the pulse ribbon that would be formed by
218 replacing each SAI pulse with a dot and extending the duration
219 of the image to 70 ms so that it will accommodate eight cycles
220 of the note. The spiral view is produced by compressing the pulse
221 ribbon vertically, stretching it horizontally, and then wrapping
222 it counterclockwise into a spiral, with the right-hand edge at
223 the centre of the spiral and the left-hand edge at the end of the
224 outer circuit. The dots from vertical columns of pulses in the
225 linear auditory image merge into short bars in the spiral view
226 because of the vertical compression; the bars fall along spokes
227 radiating from the centre of the spiral. The dots from the arches
228 of pulses on either side of the vertical column in the linear
229 auditory image appear in a stretched form like "wings" in the
230 spiral auditory image.
231 .LP
232 In the case of C3, four of the bars are aligned on one spoke of
233 the spiral (the vertical spoke); they represent the strong
234 correlations that occur in the auditory image for cycles of the
235 original sound separated by 1, 2, 4, and 8 cycles. In this way,
236 much of the information that is distributed across the temporal
237 dimension of the linear auditory image is gathered together into
238 a single spatial vector.
239 .LP
240 .LP
241 The wave cegc provides an example of how the spiral auditory
242 image follows pitch glides from one note to another. One
243 reasonable version of the spiral pitch glide is provided by the
244 command
245 .LP
246 .LP
247 genspl width=600 height=550 duration_sai=70 zero_spl=5.072 cegc
248 .LP
249 .LP
250 .SS "The separation of pitch and timbre in the auditory image. "
251 .PP
252 The file vowgld contains a synthetic speech waveform that
253 combines both formant motion and pitch motion; the formant motion
254 is a rapid tour around the vowel triangle as in aiua, and the
255 pitch motion is C3, E3, G3 and C4. A linear auditory image of
256 vowgld can be generated with the command
257 .LP
258 .LP
259 gensai width=420 height=420 mag=12 segment=40 duration_sai=20
260 spiral=off vowgld
261 .LP
262 .LP
263 The motion in the linear auditory image is similar to that
264 observed with aiua in Chapter 5. That is, the formants move
265 vertically as the vowels change from one to the next. In this
266 example, however, there is pitch motion and the period decreases
267 by a factor of 2 as the example proceeds. The pitch change is
268 observed primarily as horizontal motion that is largely
269 independent of the formant motion. In point of fact, the
270 resolved harmonics in the lower half of the auditory image are
271 rising in frequency as the example proceeds but this does not
272 seem to interfere with the perception of either the vertical
273 motion of the formants or the horizontal shrinking of the
274 period.
275 .LP
276 Although the rise in pitch can be observed in the linear auditory
277 image it is not the dominant perception; rather, it is the
278 formant motion that dominates in this microscopic view of the
279 auditory image. A spiral auditory image of vowgld can be
280 generated with the command
281 .LP
282 .LP
283 gensai width=420 height=420 segment=40 duration_sai=70 spiral=on
284 zero_spl=5.072 vowgld
285 .LP
286 .LP
287 The motion in the spiral auditory image is dominated by the
288 rotation of the spokes, that is, the pitch motion. The motion
289 of the formants is represented in the spiral image in the sense
290 that there is more sparkle in the information that is not on the
291 main spoke pattern. This sparkle is caused by the formant energy
292 changing channels as the formants move from channel to channel
293 within one circuit of the spiral. But the fact that the motion
294 in successive circuits is coordinated is not apparent in this
295 macro view of the auditory image.
296 .LP
297 A more dramatic example of the enhancement of pitch and the
298 repression of timbre can be produced by generating a spiral
299 auditory image for aiua in which the pitch is fixed and the
300 vowels range around the vowel triangle. The formant information
301 on the spokes changes as the vowel tour proceeds but the position
302 of the spokes remains fixed. The vowel information on the spokes
303 rushes around in three discrete transitions but there is no
304 particular pattern to the motion.
305 .LP
306 Thus, in the ASP model, pitch and timbre are just two views of
307 the same auditory image; pitch effects are observed when we stand
308 back and take a macroscopic view of the auditory image; timbre
309 details are observed when we move in close and take a microscopic
310 view of the auditory image.
311 .LP
312 .LP
313 The review program has the capacity to present two auditory
314 images simultaneously. If linear and spiral auditory images of
315 vowgld are generated and stored using image=on, they can be
316 replayed simultaneously and compared using the command
317 .LP
318 review vowgld_l vowgld_s
319 .LP
320 Caution: this requires the user to produce separate image files
321 (vowgld_l.img, vowgld_s.img) either by producing the images from
322 copies of vowgld with different names, or by renaming the
323 auditory images as they are produced. If two different auditory
324 images are produced from the same file, the second will overwrite
325 the first even though one has a linear format and one a spiral
326 format.
327 .LP
328 .SS "Multiple pitches in the spiral auditory image "
329 .PP
330 It is generally assumed that when two people are speaking at the
331 same time, the listener uses the differences in the pitches of
332 the two voices to assist in separating the speakers. The final
333 example in this chapter shows that the pitches of the /a/ and the
334 /i/ in dblvow appear separately in the spiral auditory image, and
335 that it would be reasonable to use the spiral to separate the
336 channels associated with the two vowels and thereby assist
337 speaker tracking. The spiral auditory image can be generated by
338 the command
339 .LP
340 .LP
341 gensai width=600 height=550 samplerate=10000 spiral=on
342 duration=90 dblvow
343 .LP
344 .LP
345 The main spokes of the /a/ and the /i/ appear at angles of 40 and
346 0 minutes past the hour, respectively, corresponding to periods
347 of 10 and 8 ms. Over the course of the example, the main spoke
348 of the /i/ fades considerably while the main spoke of the /a/
349 increases somewhat.
350 .LP
351 The second spokes of the /a/ and /i/ patterns appear at 5 and 25
352 minutes, respectively, and their strength changes predictably
353 as the example proceeds. If either vowel were presented on its
354 own there would be more than two spokes in the pattern of each
355 vowel. The presence of the second vowel represses spokes beyond
356 the second in the patterns of both vowels.
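The spoke angles quoted for the two vowels can be reproduced from their periods. The sketch below assumes the default zero_spl of 4.072 applies to the command above, that zero_spl is subtracted from q, that angle zero points right (15 minutes) with counterclockwise winding, and that the second spoke of each vowel corresponds to a delay of three periods. None of these is stated explicitly in the text, and the function name clock_minutes is hypothetical.

```python
import math

SAMPLERATE_HZ = 10000.0     # samplerate=10000 from the command above
ZERO_SPL = 4.072            # documented default, assumed to apply here

def clock_minutes(delay_s):
    """Clock-face position of a dot at the given image-time delay.
    Assumptions: angle zero points right, counterclockwise winding."""
    q = math.log2(delay_s * SAMPLERATE_HZ) - ZERO_SPL
    return (15.0 - 60.0 * (q % 1.0)) % 60.0

for vowel, period_s in (("/a/", 0.010), ("/i/", 0.008)):
    main = clock_minutes(period_s)          # one period
    second = clock_minutes(3 * period_s)    # three periods (assumed)
    print(vowel, round(main, 1), round(second, 1))
```

Under these assumptions the main spokes fall near 41 and 0 minutes and the three-period spokes near 6 and 25 minutes, close to the rounded values given in the text.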
357 .LP
358 .LP
359 .SH BUGS
360 .LP
361 Note: the current version of the software (release 3, June 1990)
362 incorrectly adds linear axes to hardcopy figures. Apologies.
363 .LP
364 .SH COPYRIGHT
365 .LP
366 Copyright (c) Applied Psychology Unit, Medical Research Council, 1995
367 .LP
368 Permission to use, copy, modify, and distribute this software without fee
369 is hereby granted for research purposes, provided that this copyright
370 notice appears in all copies and in all supporting documentation, and that
371 the software is not redistributed for any fee (except for a nominal
372 shipping charge). Anyone wanting to incorporate all or part of this
373 software in a commercial product must obtain a license from the Medical
374 Research Council.
375 .LP
376 The MRC makes no representations about the suitability of this
377 software for any purpose. It is provided "as is" without express or
378 implied warranty.
379 .LP
380 THE MRC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
381 ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL
382 THE A.P.U. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES
383 OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
384 WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
385 ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
386 SOFTWARE.
387 .LP
388 .SH ACKNOWLEDGEMENTS
389 .LP
390 The AIM software was developed for Unix workstations by John
391 Holdsworth and Mike Allerhand of the MRC APU, under the direction of
392 Roy Patterson. The physiological version of AIM was developed by
393 Christian Giguere. The options handler is by Paul Manson. The revised
394 SAI module is by Jay Datta. Michael Akeroyd extended the postscript
395 facilities and developed the xreview routine for auditory image
396 cartoons.
397 .LP
398 The project was supported by the MRC and grants from the U.K. Defense
399 Research Agency, Farnborough (Research Contract 2239); the EEC Esprit
400 BR Programme, Project ACTS (3207); and the U.K. Hearing Research Trust.
401