comparison src/libvorbis-1.3.3/doc/stereo.html @ 1:05aa0afa9217
Bring in flac, ogg, vorbis
author | Chris Cannam |
---|---|
date | Tue, 19 Mar 2013 17:37:49 +0000 |
parents | |
children |
0:c7265573341e | 1:05aa0afa9217 |
---|---|
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> | |
2 <html> | |
3 <head> | |
4 | |
5 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/> | |
6 <title>Ogg Vorbis Documentation</title> | |
7 | |
8 <style type="text/css"> | |
9 body { | |
10 margin: 0 18px 0 18px; | |
11 padding-bottom: 30px; | |
12 font-family: Verdana, Arial, Helvetica, sans-serif; | |
13 color: #333333; | |
14 font-size: .8em; | |
15 } | |
16 | |
17 a { | |
18 color: #3366cc; | |
19 } | |
20 | |
21 img { | |
22 border: 0; | |
23 } | |
24 | |
25 #xiphlogo { | |
26 margin: 30px 0 16px 0; | |
27 } | |
28 | |
29 #content p { | |
30 line-height: 1.4; | |
31 } | |
32 | |
33 h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a { | |
34 font-weight: bold; | |
35 color: #ff9900; | |
36 margin: 1.3em 0 8px 0; | |
37 } | |
38 | |
39 h1 { | |
40 font-size: 1.3em; | |
41 } | |
42 | |
43 h2 { | |
44 font-size: 1.2em; | |
45 } | |
46 | |
47 h3 { | |
48 font-size: 1.1em; | |
49 } | |
50 | |
51 li { | |
52 line-height: 1.4; | |
53 } | |
54 | |
55 #copyright { | |
56 margin-top: 30px; | |
57 line-height: 1.5em; | |
58 text-align: center; | |
59 font-size: .8em; | |
60 color: #888888; | |
61 clear: both; | |
62 } | |
63 </style> | |
64 | |
65 </head> | |
66 | |
67 <body> | |
68 | |
69 <div id="xiphlogo"> | |
70 <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.Org"/></a> | |
71 </div> | |
72 | |
73 <h1>Ogg Vorbis stereo-specific channel coupling discussion</h1> | |
74 | |
75 <h2>Abstract</h2> | |
76 | |
77 <p>The Vorbis audio CODEC provides channel coupling | |
78 mechanisms designed to reduce effective bitrate by both eliminating | |
79 interchannel redundancy and eliminating stereo image information | |
80 labeled inaudible or undesirable according to spatial psychoacoustic | |
81 models. This document describes the mechanical coupling | |
82 mechanisms available within the Vorbis specification, as well as the | |
83 specific stereo coupling models used by the reference | |
84 <tt>libvorbis</tt> codec provided by xiph.org.</p> | |
85 | |
86 <h2>Mechanisms</h2> | |
87 | |
88 <p>In encoder release beta 4 and earlier, Vorbis supported multiple | |
89 channel encoding, but the channels were encoded entirely separately | |
90 with no cross-analysis or redundancy elimination between channels. | |
91 This multichannel strategy is very similar to mp3's <em>dual | |
92 stereo</em> mode and Vorbis uses the same name for its analogous | |
93 uncoupled multichannel modes.</p> | |
94 | |
95 <p>However, the Vorbis spec provides for, and Vorbis release 1.0 rc1 and | |
96 later implement, a coupled channel strategy. Vorbis has two specific | |
97 mechanisms that may be used alone or in conjunction to implement | |
98 channel coupling. The first is <em>channel interleaving</em> via | |
99 residue backend type 2, and the second is <em>square polar | |
100 mapping</em>. These two general mechanisms are particularly well | |
101 suited to coupling due to the structure of Vorbis encoding, as we'll | |
102 explore below, and using both we can implement both totally | |
103 <em>lossless stereo image coupling</em> [bit-for-bit decode-identical | |
104 to uncoupled modes], as well as various lossy models that seek to | |
105 eliminate inaudible or unimportant aspects of the stereo image in | |
106 order to enhance bitrate. The exact coupling implementation is | |
107 generalized to allow the encoder a great deal of flexibility in | |
108 implementation of a stereo or surround model without requiring any | |
109 significant complexity increase over the combinatorially simpler | |
110 mid/side joint stereo of mp3 and other current audio codecs.</p> | |
111 | |
112 <p>A particular Vorbis bitstream may apply channel coupling directly to | |
113 more than a pair of channels; polar mapping is hierarchical such that | |
114 polar coupling may be extrapolated to an arbitrary number of channels | |
115 and is not restricted to only stereo, quadraphonics, ambisonics or 5.1 | |
116 surround. However, the scope of this document restricts itself to the | |
117 stereo coupling case.</p> | |
118 | |
119 <a name="sqpm"></a> | |
120 <h3>Square Polar Mapping</h3> | |
121 | |
122 <h4>maximal correlation</h4> | |
123 | |
124 <p>Recall that the basic structure of a Vorbis I stream first generates | |
125 from input audio a spectral 'floor' function that serves as an | |
126 MDCT-domain whitening filter. This floor is meant to represent the | |
127 rough envelope of the frequency spectrum, using whatever metric the | |
128 encoder cares to define. This floor is subtracted from the log | |
129 frequency spectrum, effectively normalizing the spectrum by frequency. | |
130 Each input channel is associated with a unique floor function.</p> | |
131 | |
132 <p>The basic idea behind any stereo coupling is that the left and right | |
133 channels usually correlate. This correlation is even stronger if one | |
134 first accounts for energy differences in any given frequency band | |
135 across left and right; think for example of individual instruments | |
136 mixed into different portions of the stereo image, or a stereo | |
137 recording with a dominant feature not perfectly in the center. The | |
138 floor functions, each specific to a channel, provide the perfect means | |
139 of normalizing left and right energies across the spectrum to maximize | |
140 correlation before coupling. This feature of the Vorbis format is not | |
141 a convenient accident.</p> | |
142 | |
143 <p>Because we strive to maximally correlate the left and right channels | |
144 and generally succeed in doing so, left and right residue is typically | |
145 nearly identical. We could use channel interleaving (discussed below) | |
146 alone to efficiently remove the redundancy between the left and right | |
147 channels as a side effect of entropy encoding, but a polar | |
148 representation gives benefits when left/right correlation is | |
149 strong.</p> | |
150 | |
151 <h4>point and diffuse imaging</h4> | |
152 | |
153 <p>The first advantage of a polar representation is that it effectively | |
154 separates the spatial audio information into a 'point image' | |
155 (magnitude) at a given frequency and located somewhere in the sound | |
156 field, and a 'diffuse image' (angle) that fills a large amount of | |
157 space simultaneously. Even if we preserve only the magnitude (point) | |
158 data, a detailed and carefully chosen floor function in each channel | |
159 provides us with a free, fine-grained, frequency relative intensity | |
160 stereo*. Angle information represents diffuse sound fields, such as | |
161 reverberation that fills the entire space simultaneously.</p> | |
162 | |
163 <p>*<em>Because the Vorbis model supports a number of different possible | |
164 stereo models and these models may be mixed, we do not use the term | |
165 'intensity stereo' talking about Vorbis; instead we use the terms | |
166 'point stereo', 'phase stereo' and subcategories of each.</em></p> | |
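<p>The "free" intensity stereo described above can be sketched as follows.
This is only an illustration, assuming linear-domain floor curves that are
multiplied back onto a shared magnitude residue at decode; the function and
parameter names are hypothetical and are not taken from <tt>libvorbis</tt>.</p>

```c
#include <stddef.h>

/* Hypothetical sketch: rebuild left and right spectra from one shared,
   floor-normalized magnitude residue. The angle data is assumed to have
   been dropped, so all left/right level differences come from the two
   per-channel floor curves. */
void point_image_reconstruct(const float *residue,
                             const float *floor_left,
                             const float *floor_right,
                             size_t n,
                             float *left,
                             float *right)
{
    size_t i;
    for (i = 0; i < n; i++) {
        left[i]  = residue[i] * floor_left[i];   /* per-bin level, left  */
        right[i] = residue[i] * floor_right[i];  /* per-bin level, right */
    }
}
```

<p>In this sketch the left/right balance of each frequency bin is set
entirely by the ratio of the two floor values, which is why a detailed floor
gives fine-grained, frequency-relative positioning at no extra cost in the
residue.</p>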
167 | |
168 <p>The majority of a stereo image is representable by polar magnitude | |
169 alone, as strong sounds tend to be produced at near-point sources; | |
170 even non-diffuse, fast, sharp echoes track very accurately using | |
171 magnitude representation almost alone (for those experimenting with | |
172 Vorbis tuning, this strategy works much better with the precise, | |
173 piecewise control of floor 1; the continuous approximation of floor 0 | |
174 results in unstable imaging). Reverberation and diffuse sounds tend | |
175 to contain less energy and be psychoacoustically dominated by the | |
176 point sources embedded in them. Thus, we again tend to concentrate | |
177 more represented energy into a predictably smaller number of values. | |
178 Separating representation of point and diffuse imaging also allows us | |
179 to model and manipulate point and diffuse qualities separately.</p> | |
180 | |
181 <h4>controlling bit leakage and symbol crosstalk</h4> | |
182 | |
183 <p>Because polar | |
184 representation concentrates represented energy into fewer large | |
185 values, we reduce bit 'leakage' during cascading (multistage VQ | |
186 encoding) as a secondary benefit. A single large, monolithic VQ | |
187 codebook is more efficient than a cascaded book due to entropy | |
188 'crosstalk' among symbols between different stages of a multistage cascade. | |
189 Polar representation is a way of further concentrating entropy into | |
190 predictable locations so that codebook design can take steps to | |
191 improve multistage codebook efficiency. It also allows us to cascade | |
192 various elements of the stereo image independently.</p> | |
193 | |
194 <h4>eliminating trigonometry and rounding</h4> | |
195 | |
196 <p>Rounding and computational complexity are potential problems with a | |
197 polar representation. As our encoding process involves quantization, | |
198 mixing a polar representation and quantization makes it potentially | |
199 impossible, depending on implementation, to construct a coupled stereo | |
200 mechanism that results in bit-identical decompressed output compared | |
201 to an uncoupled encoding should the encoder desire it.</p> | |
202 | |
203 <p>Vorbis uses a mapping that preserves the most useful qualities of | |
204 polar representation, relies only on addition/subtraction (during | |
205 decode; high quality encoding still requires some trig), and makes it | |
206 trivial before or after quantization to represent an angle/magnitude | |
207 through a one-to-one mapping from possible left/right value | |
208 permutations. We do this by basing our polar representation on the | |
209 unit square rather than the unit-circle.</p> | |
210 | |
211 <p>Given a magnitude and angle, we recover left and right using the | |
212 following function (note that A/B may be left/right or right/left | |
213 depending on the coupling definition used by the encoder):</p> | |
214 | |
215 <pre> | |
216 if(magnitude>0){ | |
217   if(angle>0){ | |
218     A=magnitude; | |
219     B=magnitude-angle; | |
220   }else{ | |
221     B=magnitude; | |
222     A=magnitude+angle; | |
223   } | |
224 }else{ | |
225   if(angle>0){ | |
226     A=magnitude; | |
227     B=magnitude+angle; | |
228   }else{ | |
229     B=magnitude; | |
230     A=magnitude-angle; | |
231   } | |
232 } | |
233 </pre> | |
234 | |
235 <p>The function is antisymmetric for positive and negative magnitudes in | |
236 order to eliminate a redundant value when quantizing. For example, if | |
237 we're quantizing to integer values, we can visualize a magnitude of 5 | |
238 and an angle of -2 as follows:</p> | |
239 | |
240 <p><img src="squarepolar.png" alt="square polar"/></p> | |
241 | |
242 <p>This representation neither loses nor duplicates values; if A and B | |
243 each range over the integers -5 through 5, the number of possible Cartesian | |
244 permutations is 121. Represented in square polar notation, the | |
245 possible values are:</p> | |
246 | |
247 <pre> | |
248 0, 0 | |
249 | |
250 -1,-2 -1,-1 -1, 0 -1, 1 | |
251 | |
252 1,-2 1,-1 1, 0 1, 1 | |
253 | |
254 -2,-4 -2,-3 -2,-2 -2,-1 -2, 0 -2, 1 -2, 2 -2, 3 | |
255 | |
256 2,-4 2,-3 ... following the pattern ... | |
257 | |
258 ... 5, 1 5, 2 5, 3 5, 4 5, 5 5, 6 5, 7 5, 8 5, 9 | |
259 | |
260 </pre> | |
261 | |
262 <p>...for a grand total of 121 possible values, the same number as in | |
263 Cartesian representation (note that, for example, <tt>5,-10</tt> is | |
264 the same as <tt>-5,10</tt>, so there's no reason to represent | |
265 both. 2,10 cannot happen, and there's no reason to account for it.) | |
266 It's also obvious that this mapping is exactly reversible.</p> | |
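<p>The reversibility claim can be checked mechanically. The sketch below
pairs the decode function given earlier with one possible forward (encode
side) mapping and counts how many A/B pairs survive a round trip. The forward
mapping and all function names are assumptions made for illustration; a tie
in absolute value is resolved by taking B as the magnitude, which yields the
<tt>5,-10</tt> form rather than <tt>-5,10</tt>.</p>

```c
#include <stdlib.h>

/* Hypothetical forward mapping: A/B -> magnitude/angle.
   The value with the larger absolute value becomes the magnitude;
   ties go to B so that each A/B pair has exactly one representation. */
void sqpolar_forward(int a, int b, int *magnitude, int *angle)
{
    if (abs(a) > abs(b)) {
        *magnitude = a;
        *angle = (a > 0) ? a - b : b - a;   /* always > 0 here  */
    } else {
        *magnitude = b;
        *angle = (b > 0) ? a - b : b - a;   /* always <= 0 here */
    }
}

/* Inverse mapping, with the same branches as the decode function above. */
void sqpolar_inverse(int magnitude, int angle, int *a, int *b)
{
    if (magnitude > 0) {
        if (angle > 0) { *a = magnitude; *b = magnitude - angle; }
        else           { *b = magnitude; *a = magnitude + angle; }
    } else {
        if (angle > 0) { *a = magnitude; *b = magnitude + angle; }
        else           { *b = magnitude; *a = magnitude - angle; }
    }
}

/* Count the A/B pairs in [-range, range] x [-range, range] that are
   recovered exactly after forward mapping followed by inverse mapping. */
int sqpolar_count_roundtrips(int range)
{
    int a, b, ok = 0;
    for (a = -range; a <= range; a++) {
        for (b = -range; b <= range; b++) {
            int m, ang, a2, b2;
            sqpolar_forward(a, b, &m, &ang);
            sqpolar_inverse(m, ang, &a2, &b2);
            if (a2 == a && b2 == b)
                ok++;
        }
    }
    return ok;
}
```

<p>For a range of 5, every one of the 121 Cartesian pairs round-trips, which
also implies that the forward mapping produces 121 distinct magnitude/angle
pairs. The pair A=3, B=5 maps to magnitude 5 and angle -2, the example shown
in the figure above.</p>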
267 | |
268 <h3>Channel interleaving</h3> | |
269 | |
270 <p>We can remap an A/B vector using polar mapping into a magnitude/angle | |
271 vector, and it's clear that, in general, this concentrates energy in | |
272 the magnitude vector and reduces the amount of information to encode | |
273 in the angle vector. Encoding these vectors independently with | |
274 residue backend #0 or residue backend #1 will result in bitrate | |
275 savings. However, there are still implicit correlations between the | |
276 magnitude and angle vectors. The most obvious is that the amplitude | |
277 of the angle is bounded by its corresponding magnitude value.</p> | |
278 | |
279 <p>Entropy coding the results, then, further benefits from the entropy | |
280 model being able to compress magnitude and angle simultaneously. For | |
281 this reason, Vorbis implements residue backend #2 which pre-interleaves | |
282 a number of input vectors (in the stereo case, two, A and B) into a | |
283 single output vector (with the elements in the order of | |
284 A_0, B_0, A_1, B_1, A_2 ... A_n-1, B_n-1) before entropy encoding. Thus | |
285 each vector to be coded by the vector quantization backend consists of | |
286 matching magnitude and angle values.</p> | |
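<p>The interleaving order described above can be sketched in a few lines.
This is a minimal illustration of the element ordering only, not the residue
backend itself; the function names are hypothetical and integer vectors are
assumed for simplicity.</p>

```c
#include <stddef.h>

/* Interleave two n-element vectors into out[0 .. 2n-1] in the order
   a[0], b[0], a[1], b[1], ..., a[n-1], b[n-1]. */
void interleave2(const int *a, const int *b, size_t n, int *out)
{
    size_t i;
    for (i = 0; i < n; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Undo interleave2: split in[0 .. 2n-1] back into two n-element vectors. */
void deinterleave2(const int *in, size_t n, int *a, int *b)
{
    size_t i;
    for (i = 0; i < n; i++) {
        a[i] = in[2 * i];
        b[i] = in[2 * i + 1];
    }
}
```

<p>When A holds magnitudes and B holds angles, each adjacent pair in the
interleaved vector is a matching magnitude/angle pair, so a codebook of even
dimension sees them together.</p>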
287 | |
288 <p>The astute reader, at this point, will notice that in the theoretical | |
289 case in which we can use monolithic codebooks of arbitrarily large | |
290 size, we can directly interleave and encode left and right without | |
291 polar mapping; in fact, the polar mapping does not appear to lend any | |
292 benefit whatsoever to the efficiency of the entropy coding. Indeed, | |
293 it is perfectly possible and reasonable to build a Vorbis encoder that | |
294 dispenses with polar mapping entirely and merely interleaves the | |
295 channels. Libvorbis-based encoders may configure such an encoding and | |
296 it will work as intended.</p> | |
297 | |
298 <p>However, when we leave the ideal/theoretical domain, we notice that | |
299 polar mapping does give additional practical benefits, as discussed in | |
300 the above section on polar mapping and summarized again here:</p> | |
301 | |
302 <ul> | |
303 <li>Polar mapping aids in controlling entropy 'leakage' between stages | |
304 of a cascaded codebook.</li> | |
305 <li>Polar mapping separates the stereo image | |
306 into point and diffuse components which may be analyzed and handled | |
307 differently.</li> | |
308 </ul> | |
309 | |
310 <h2>Stereo Models</h2> | |
311 | |
312 <h3>Dual Stereo</h3> | |
313 | |
314 <p>Dual stereo refers to stereo encoding where the channels are entirely | |
315 separate; they are analyzed and encoded as entirely distinct entities. | |
316 This terminology is familiar from mp3.</p> | |
317 | |
318 <h3>Lossless Stereo</h3> | |
319 | |
320 <p>Using polar mapping and/or channel interleaving, it's possible to | |
321 couple Vorbis channels losslessly, that is, construct a stereo | |
322 coupling encoding that both saves space and decodes | |
323 bit-identically to dual stereo. OggEnc 1.0 and later uses this | |
324 mode in all high-bitrate encoding.</p> | |
325 | |
326 <p>Overall, this stereo mode is overkill; however, it offers a safe | |
327 alternative to users concerned about the slightest possible | |
328 degradation to the stereo image or archival quality audio.</p> | |
329 | |
330 <h3>Phase Stereo</h3> | |
331 | |
332 <p>Phase stereo is the least aggressive means of gracefully dropping | |
333 resolution from the stereo image; it affects only diffuse imaging.</p> | |
334 | |
335 <p>It's often quoted that the human ear is deaf to signal phase above | |
336 about 4kHz; this is nearly true and a passable rule of thumb, but it | |
337 can be demonstrated that even an average user can tell the difference | |
338 between high frequency in-phase and out-of-phase noise. Obviously | |
339 then, the statement is not entirely true. However, it's also the case | |
340 that one must resort to nearly such an extreme demonstration before | |
341 finding the counterexample.</p> | |
342 | |
343 <p>'Phase stereo' is simply a more aggressive quantization of the polar | |
344 angle vector; above 4kHz it's generally quite safe to quantize noise | |
345 and noisy elements to only a handful of allowed phases, or to thin the | |
346 phase with respect to the magnitude. The phases of high amplitude | |
347 pure tones may or may not be preserved more carefully (they are | |
348 relatively rare and L/R tend to be in phase, so there is generally | |
349 little reason not to spend a few more bits on them).</p> | |
350 | |
351 <h4>example: eight phase stereo</h4> | |
352 | |
353 <p>Vorbis may implement phase stereo coupling by preserving the entirety | |
354 of the magnitude vector (essential to fine amplitude and energy | |
355 resolution overall) and quantizing the angle vector to one of only | |
356 four possible values. Given that the magnitude vector may be positive | |
357 or negative, this results in left and right phase having eight | |
358 possible permutations, thus 'eight phase stereo':</p> | |
359 | |
360 <p><img src="eightphase.png" alt="eight phase"/></p> | |
361 | |
362 <p>Left and right may be in phase (positive or negative), the most common | |
363 case by far, or out of phase by 90 or 180 degrees.</p> | |
364 | |
365 <h4>example: four phase stereo</h4> | |
366 | |
367 <p>Similarly, four phase stereo takes the quantization one step further; | |
368 it allows only in-phase and 180 degree out-of-phase signals:</p> | |
369 | |
370 <p><img src="fourphase.png" alt="four phase"/></p> | |
371 | |
372 <h3>Point Stereo</h3> | |
373 | |
374 <p>Point stereo eliminates the possibility of out-of-phase signal | |
375 entirely. Any diffuse quality to a sound source tends to collapse | |
376 inward to a point somewhere within the stereo image. A practical | |
377 example would be balanced reverberations within a large, live space; | |
378 normally the sound is diffuse and soft, giving a sonic impression of | |
379 volume. In point-stereo, the reverberations would still exist, but | |
380 sound fairly firmly centered within the image (assuming the | |
381 reverberation was centered overall; if the reverberation is stronger | |
382 to the left, then the point of localization in point stereo would be | |
383 to the left). This effect is most noticeable at low and mid | |
384 frequencies and using headphones (which grant perfect stereo | |
385 separation). Point stereo is a graceful but generally easy to | |
386 detect degradation to the sound quality and is thus used in frequency | |
387 ranges where it is least noticeable.</p> | |
388 | |
389 <h3>Mixed Stereo</h3> | |
390 | |
391 <p>Mixed stereo is the simultaneous use of more than one of the above | |
392 stereo encoding models, generally using more aggressive modes in | |
393 higher frequencies, lower amplitudes or 'nearly' in-phase sound.</p> | |
394 | |
395 <p>It is also the case that near-DC frequencies should be encoded using | |
396 lossless coupling to avoid frame blocking artifacts.</p> | |
397 | |
398 <h3>Vorbis Stereo Modes</h3> | |
399 | |
400 <p>Vorbis, as of 1.0, uses lossless stereo and a number of mixed modes | |
401 constructed out of lossless and point stereo. Phase stereo was used | |
402 in the rc2 encoder, but is not currently used for simplicity's sake. It | |
403 will likely be re-added to the stereo model in the future.</p> | |
404 | |
405 <div id="copyright"> | |
406 The Xiph Fish Logo is a | |
407 trademark (™) of Xiph.Org.<br/> | |
408 | |
409 These pages © 1994 - 2005 Xiph.Org. All rights reserved. | |
410 </div> | |
411 | |
412 </body> | |
413 </html> | |