sv-dependency-builds: src/libvorbis-1.3.3/doc/stereo.html annotate

annotate src/libvorbis-1.3.3/doc/stereo.html @ 148:b4bfdf10c4b3

Update Win64 capnp builds to v0.6

author	Chris Cannam <cannam@all-day-breakfast.com>
date	Mon, 22 May 2017 18:56:49 +0100
parents	98c1576536ae
children

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
<title>Ogg Vorbis Documentation</title>

<style type="text/css">
body {
  margin: 0 18px 0 18px;
  padding-bottom: 30px;
  font-family: Verdana, Arial, Helvetica, sans-serif;
  color: #333333;
  font-size: .8em;
}

a {
  color: #3366cc;
}

img {
  border: 0;
}

#xiphlogo {
  margin: 30px 0 16px 0;
}

#content p {
  line-height: 1.4;
}

h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {
  font-weight: bold;
  color: #ff9900;
  margin: 1.3em 0 8px 0;
}

h1 {
  font-size: 1.3em;
}

h2 {
  font-size: 1.2em;
}

h3 {
  font-size: 1.1em;
}

li {
  line-height: 1.4;
}

#copyright {
  margin-top: 30px;
  line-height: 1.5em;
  text-align: center;
  font-size: .8em;
  color: #888888;
  clear: both;
}
</style>

</head>

<body>

<div id="xiphlogo">
<a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.Org"/></a>
</div>

<h1>Ogg Vorbis stereo-specific channel coupling discussion</h1>

<h2>Abstract</h2>

<p>The Vorbis audio codec provides channel coupling
mechanisms designed to reduce effective bitrate by both eliminating
interchannel redundancy and eliminating stereo image information
labeled inaudible or undesirable according to spatial psychoacoustic
models. This document describes both the mechanical coupling
mechanisms available within the Vorbis specification and the
specific stereo coupling models used by the reference
<tt>libvorbis</tt> codec provided by xiph.org.</p>

<h2>Mechanisms</h2>

<p>In encoder release beta 4 and earlier, Vorbis supported multiple
channel encoding, but the channels were encoded entirely separately
with no cross-analysis or redundancy elimination between channels.
This multichannel strategy is very similar to mp3's <em>dual
stereo</em> mode, and Vorbis uses the same name for its analogous
uncoupled multichannel modes.</p>

<p>However, the Vorbis spec provides for, and Vorbis release 1.0 rc1 and
later implement, a coupled channel strategy. Vorbis has two specific
mechanisms that may be used alone or in conjunction to implement
channel coupling. The first is <em>channel interleaving</em> via
residue backend type 2, and the second is <em>square polar
mapping</em>. These two general mechanisms are particularly well
suited to coupling due to the structure of Vorbis encoding, as we'll
explore below. Using both, we can implement totally
<em>lossless stereo image coupling</em> [bit-for-bit decode-identical
to uncoupled modes] as well as various lossy models that seek to
eliminate inaudible or unimportant aspects of the stereo image in
order to reduce bitrate. The exact coupling implementation is
generalized to allow the encoder a great deal of flexibility in
implementation of a stereo or surround model without requiring any
significant complexity increase over the combinatorially simpler
mid/side joint stereo of mp3 and other current audio codecs.</p>

<p>A particular Vorbis bitstream may apply channel coupling directly to
more than a pair of channels; polar mapping is hierarchical such that
polar coupling may be extrapolated to an arbitrary number of channels
and is not restricted to only stereo, quadraphonics, ambisonics or 5.1
surround. However, the scope of this document restricts itself to the
stereo coupling case.</p>

<a name="sqpm"></a>
<h3>Square Polar Mapping</h3>

<h4>maximal correlation</h4>

<p>Recall that Vorbis I encoding first generates
from input audio a spectral 'floor' function that serves as an
MDCT-domain whitening filter. This floor is meant to represent the
rough envelope of the frequency spectrum, using whatever metric the
encoder cares to define. This floor is subtracted from the log
frequency spectrum, effectively normalizing the spectrum by frequency.
Each input channel is associated with a unique floor function.</p>

<p>The basic idea behind any stereo coupling is that the left and right
channels usually correlate. This correlation is even stronger if one
first accounts for energy differences in any given frequency band
across left and right; think for example of individual instruments
mixed into different portions of the stereo image, or a stereo
recording with a dominant feature not perfectly in the center. The
floor functions, each specific to a channel, provide the perfect means
of normalizing left and right energies across the spectrum to maximize
correlation before coupling. This feature of the Vorbis format is not
a convenient accident.</p>
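
<p>As a rough illustration of the normalization described above, the
following sketch subtracts a per-channel floor curve from a log-magnitude
spectrum. The function name <tt>remove_floor_db</tt>, the use of decibel
values and the flat array layout are assumptions made for this example;
they are not taken from the Vorbis specification or from libvorbis.</p>

```c
#include <stddef.h>

/* Hypothetical helper: subtract a channel's floor curve from its
 * log-magnitude spectrum. Both inputs are assumed to be in dB; the
 * output is the whitened residue, also in dB. */
void remove_floor_db(const float *spectrum_db, const float *floor_db,
                     float *residue_db, size_t n)
{
    for (size_t i = 0; i < n; i++)
        residue_db[i] = spectrum_db[i] - floor_db[i];
}
```

<p>If the right channel is simply the left channel attenuated by a
constant amount and each channel's floor tracks its own envelope, the two
residue vectors produced this way come out identical, which is the
situation coupling is designed to exploit.</p>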

<p>Because we strive to maximally correlate the left and right channels
and generally succeed in doing so, left and right residue is typically
nearly identical. We could use channel interleaving (discussed below)
alone to efficiently remove the redundancy between the left and right
channels as a side effect of entropy encoding, but a polar
representation gives additional benefits when left/right correlation is
strong.</p>

<h4>point and diffuse imaging</h4>

<p>The first advantage of a polar representation is that it effectively
separates the spatial audio information into a 'point image'
(magnitude) at a given frequency and located somewhere in the sound
field, and a 'diffuse image' (angle) that fills a large amount of
space simultaneously. Even if we preserve only the magnitude (point)
data, a detailed and carefully chosen floor function in each channel
provides us with a free, fine-grained, frequency-relative intensity
stereo*. Angle information represents diffuse sound fields, such as
reverberation that fills the entire space simultaneously.</p>

<p>*<em>Because the Vorbis model supports a number of different possible
stereo models and these models may be mixed, we do not use the term
'intensity stereo' when talking about Vorbis; instead we use the terms
'point stereo', 'phase stereo' and subcategories of each.</em></p>

<p>The majority of a stereo image is representable by polar magnitude
alone, as strong sounds tend to be produced at near-point sources;
even non-diffuse, fast, sharp echoes track very accurately using
magnitude representation almost alone (for those experimenting with
Vorbis tuning, this strategy works much better with the precise,
piecewise control of floor 1; the continuous approximation of floor 0
results in unstable imaging). Reverberation and diffuse sounds tend
to contain less energy and be psychoacoustically dominated by the
point sources embedded in them. Thus, we again tend to concentrate
more represented energy into a predictably smaller number of values.
Separating representation of point and diffuse imaging also allows us
to model and manipulate point and diffuse qualities separately.</p>

<h4>controlling bit leakage and symbol crosstalk</h4>

<p>Because polar
representation concentrates represented energy into fewer large
values, we reduce bit 'leakage' during cascading (multistage VQ
encoding) as a secondary benefit. A single large, monolithic VQ
codebook is more efficient than a cascaded book due to entropy
'crosstalk' among symbols between different stages of a multistage cascade.
Polar representation is a way of further concentrating entropy into
predictable locations so that codebook design can take steps to
improve multistage codebook efficiency. It also allows us to cascade
various elements of the stereo image independently.</p>

<h4>eliminating trigonometry and rounding</h4>

<p>Rounding and computational complexity are potential problems with a
polar representation. As our encoding process involves quantization,
mixing a polar representation and quantization makes it potentially
impossible, depending on implementation, to construct a coupled stereo
mechanism that results in bit-identical decompressed output compared
to an uncoupled encoding should the encoder desire it.</p>

<p>Vorbis uses a mapping that preserves the most useful qualities of
polar representation, relies only on addition/subtraction (during
decode; high quality encoding still requires some trig), and makes it
trivial before or after quantization to represent an angle/magnitude
through a one-to-one mapping from possible left/right value
permutations. We do this by basing our polar representation on the
unit square rather than the unit circle.</p>

<p>Given a magnitude and angle, we recover left and right using the
following function (note that A/B may be left/right or right/left
depending on the coupling definition used by the encoder):</p>

<pre>
  if(magnitude>0){
    if(angle>0){
      A=magnitude;
      B=magnitude-angle;
    }else{
      B=magnitude;
      A=magnitude+angle;
    }
  }else{
    if(angle>0){
      A=magnitude;
      B=magnitude+angle;
    }else{
      B=magnitude;
      A=magnitude-angle;
    }
  }
</pre>

<p>The function is antisymmetric for positive and negative magnitudes in
order to eliminate a redundant value when quantizing. For example, if
we're quantizing to integer values, we can visualize a magnitude of 5
and an angle of -2 as follows:</p>

<p><img src="squarepolar.png" alt="square polar"/></p>

<p>This representation loses or replicates no values; if A and B each
range over the integers -5 through 5, the number of possible Cartesian
permutations is 121. Represented in square polar notation, the
possible values are:</p>

<pre>
 0, 0

-1,-2  -1,-1  -1, 0  -1, 1

 1,-2   1,-1   1, 0   1, 1

-2,-4  -2,-3  -2,-2  -2,-1  -2, 0  -2, 1  -2, 2  -2, 3

 2,-4   2,-3   ... following the pattern ...

 ...    5, 1   5, 2   5, 3   5, 4   5, 5   5, 6   5, 7   5, 8   5, 9

</pre>

<p>...for a grand total of 121 possible values, the same number as in
Cartesian representation (note that, for example, <tt>5,-10</tt> is
the same as <tt>-5,10</tt>, so there's no reason to represent
both. <tt>2,10</tt> cannot happen, and there's no reason to account for it.)
It's also obvious that this mapping is exactly reversible.</p>
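
<p>The reversibility claim can be checked mechanically. The sketch below
transcribes the decode function given above and pairs it with a forward
mapping obtained by inverting its four branches. The forward mapping and
all function names here are this example's own; they are not copied from
libvorbis, whose encoder may be organized differently.</p>

```c
#include <stdlib.h>

/* Decoder-side mapping, transcribed from the pseudocode in this section. */
void sqpolar_decode(int magnitude, int angle, int *a, int *b)
{
    if (magnitude > 0) {
        if (angle > 0) { *a = magnitude; *b = magnitude - angle; }
        else           { *b = magnitude; *a = magnitude + angle; }
    } else {
        if (angle > 0) { *a = magnitude; *b = magnitude + angle; }
        else           { *b = magnitude; *a = magnitude - angle; }
    }
}

/* Forward mapping (an assumption of this sketch): the value with the
 * larger absolute value becomes the magnitude. A strictly positive angle
 * marks the case where A carried the magnitude; zero or negative marks B. */
void sqpolar_encode(int a, int b, int *magnitude, int *angle)
{
    if (abs(a) > abs(b)) {
        *magnitude = a;
        *angle = (a > 0) ? a - b : b - a;
    } else {
        *magnitude = b;
        *angle = (b > 0) ? a - b : b - a;
    }
}

/* Count the (A,B) pairs in [-range, range] x [-range, range] that
 * survive an encode/decode round trip unchanged. */
int sqpolar_count_roundtrips(int range)
{
    int ok = 0;
    for (int a = -range; a <= range; a++) {
        for (int b = -range; b <= range; b++) {
            int m, g, a2, b2;
            sqpolar_encode(a, b, &m, &g);
            sqpolar_decode(m, g, &a2, &b2);
            if (a2 == a && b2 == b)
                ok++;
        }
    }
    return ok;
}
```

<p>With this forward mapping, the pair A=3, B=5 encodes to the magnitude 5
and angle -2 used in the example above, and
<tt>sqpolar_count_roundtrips(5)</tt> accounts for all 121 Cartesian
pairs.</p>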

<h3>Channel interleaving</h3>

<p>We can remap an A/B vector using polar mapping into a magnitude/angle
vector, and it's clear that, in general, this concentrates energy in
the magnitude vector and reduces the amount of information to encode
in the angle vector. Encoding these vectors independently with
residue backend #0 or residue backend #1 will result in bitrate
savings. However, there are still implicit correlations between the
magnitude and angle vectors. The most obvious is that the amplitude
of the angle is bounded by its corresponding magnitude value.</p>

<p>Entropy coding the results, then, further benefits from the entropy
model being able to compress magnitude and angle simultaneously. For
this reason, Vorbis implements residue backend #2 which pre-interleaves
a number of input vectors (in the stereo case, two, A and B) into a
single output vector (with the elements in the order of
A_0, B_0, A_1, B_1, A_2 ... A_n-1, B_n-1) before entropy encoding. Thus
each vector to be coded by the vector quantization backend consists of
matching magnitude and angle values.</p>
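
<p>The element ordering described above can be sketched as follows. This
shows only the interleaving step, not the residue backend itself; the
function name <tt>interleave</tt> and its signature are hypothetical.</p>

```c
#include <stddef.h>

/* Interleave `channels` input vectors of length n into one output vector
 * of length channels*n, ordered A_0, B_0, A_1, B_1, ... for two inputs. */
void interleave(const int *const *vectors, size_t channels, size_t n,
                int *out)
{
    for (size_t i = 0; i < n; i++)
        for (size_t c = 0; c < channels; c++)
            out[i * channels + c] = vectors[c][i];
}
```

<p>In the stereo case the two inputs are the coupled vectors, so adjacent
output elements hold a magnitude and its matching angle.</p>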

<p>The astute reader, at this point, will notice that in the theoretical
case in which we can use monolithic codebooks of arbitrarily large
size, we can directly interleave and encode left and right without
polar mapping; the polar mapping does not appear to lend any
benefit whatsoever to the efficiency of the entropy coding. In fact,
it is perfectly possible and reasonable to build a Vorbis encoder that
dispenses with polar mapping entirely and merely interleaves the
channels. Libvorbis-based encoders may configure such an encoding and
it will work as intended.</p>

<p>However, when we leave the ideal/theoretical domain, we notice that
polar mapping does give additional practical benefits, as discussed in
the above section on polar mapping and summarized again here:</p>

<ul>
<li>Polar mapping aids in controlling entropy 'leakage' between stages
of a cascaded codebook.</li>
<li>Polar mapping separates the stereo image
into point and diffuse components which may be analyzed and handled
differently.</li>
</ul>

<h2>Stereo Models</h2>

<h3>Dual Stereo</h3>

<p>Dual stereo refers to stereo encoding where the channels are entirely
separate; they are analyzed and encoded as entirely distinct entities.
This terminology is familiar from mp3.</p>

<h3>Lossless Stereo</h3>

<p>Using polar mapping and/or channel interleaving, it's possible to
couple Vorbis channels losslessly, that is, construct a stereo
coupling encoding that both saves space and decodes
bit-identically to dual stereo. OggEnc 1.0 and later uses this
mode in all high-bitrate encoding.</p>

<p>Overall, this stereo mode is overkill; however, it offers a safe
alternative to users concerned about the slightest possible
degradation to the stereo image or archival quality audio.</p>

<h3>Phase Stereo</h3>

<p>Phase stereo is the least aggressive means of gracefully dropping
resolution from the stereo image; it affects only diffuse imaging.</p>

<p>It's often quoted that the human ear is deaf to signal phase above
about 4kHz; this is nearly true and a passable rule of thumb, but it
can be demonstrated that even an average user can tell the difference
between high frequency in-phase and out-of-phase noise. Obviously
then, the statement is not entirely true. However, it's also the case
that one must resort to a demonstration nearly that extreme before
finding a counterexample.</p>

<p>'Phase stereo' is simply a more aggressive quantization of the polar
angle vector; above 4kHz it's generally quite safe to quantize noise
and noisy elements to only a handful of allowed phases, or to thin the
phase with respect to the magnitude. The phases of high amplitude
pure tones may or may not be preserved more carefully (they are
relatively rare and L/R tend to be in phase, so there is generally
little reason not to spend a few more bits on them).</p>

<h4>example: eight phase stereo</h4>

<p>Vorbis may implement phase stereo coupling by preserving the entirety
of the magnitude vector (essential to fine amplitude and energy
resolution overall) and quantizing the angle vector to one of only
four possible values. Given that the magnitude vector may be positive
or negative, this results in left and right phase having eight
possible permutations, thus 'eight phase stereo':</p>

<p><img src="eightphase.png" alt="eight phase"/></p>

<p>Left and right may be in phase (positive or negative), the most common
case by far, or out of phase by 90 or 180 degrees.</p>
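
<p>One way to picture this quantization is to snap each A/B pair to the
nearest corner or edge midpoint of its square and then apply the square
polar mapping. The sketch below does that. It assumes that the four
permitted angles for a magnitude M are 0, |M|, -|M| and -2|M|, which
follows from the decode function earlier in this document, and it assumes
a simple nearest-point rule; neither is taken from the libvorbis encoder,
and all names are hypothetical.</p>

```c
#include <stdlib.h>

/* Forward square polar mapping, derived by inverting the decode function
 * given earlier in this document (an assumption of this sketch). */
void sqpolar_encode(int a, int b, int *magnitude, int *angle)
{
    if (abs(a) > abs(b)) {
        *magnitude = a;
        *angle = (a > 0) ? a - b : b - a;
    } else {
        *magnitude = b;
        *angle = (b > 0) ? a - b : b - a;
    }
}

/* Snap x to the nearest of -m, 0, +m (m >= 0); ties go away from zero. */
static int snap3(int x, int m)
{
    if (2 * abs(x) < m)
        return 0;
    return (x < 0) ? -m : m;
}

/* Quantize an A/B pair to one of the eight phases on its square:
 * the four corners (in phase or 180 degrees out of phase) and the four
 * edge midpoints (one channel silent), then encode the result. */
void eight_phase_encode(int a, int b, int *magnitude, int *angle)
{
    int m = (abs(a) > abs(b)) ? abs(a) : abs(b);
    sqpolar_encode(snap3(a, m), snap3(b, m), magnitude, angle);
}
```

<p>The absolute value of the magnitude is always preserved. For pairs
that are close to 180 degrees out of phase, the sign of the magnitude may
differ from what the lossless mapping would produce, because an
out-of-phase corner has a single canonical representation (recall that
<tt>5,-10</tt> and <tt>-5,10</tt> decode to the same pair).</p>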

<h4>example: four phase stereo</h4>

<p>Similarly, four phase stereo takes the quantization one step further;
it allows only in-phase and 180 degree out-of-phase signals:</p>

<p><img src="fourphase.png" alt="four phase"/></p>

<h3>Point Stereo</h3>

<p>Point stereo eliminates the possibility of out-of-phase signal
entirely. Any diffuse quality to a sound source tends to collapse
inward to a point somewhere within the stereo image. A practical
example would be balanced reverberations within a large, live space;
normally the sound is diffuse and soft, giving a sonic impression of
volume. In point stereo, the reverberations would still exist, but
sound fairly firmly centered within the image (assuming the
reverberation was centered overall; if the reverberation is stronger
to the left, then the point of localization in point stereo would be
to the left). This effect is most noticeable at low and mid
frequencies and using headphones (which grant perfect stereo
separation). Point stereo is a graceful but generally easy to
detect degradation to the sound quality and is thus used in frequency
ranges where it is least noticeable.</p>

<h3>Mixed Stereo</h3>

<p>Mixed stereo is the simultaneous use of more than one of the above
stereo encoding models, generally using more aggressive modes in
higher frequencies, lower amplitudes or 'nearly' in-phase sound.</p>

<p>It is also the case that near-DC frequencies should be encoded using
lossless coupling to avoid frame blocking artifacts.</p>

<h3>Vorbis Stereo Modes</h3>

<p>Vorbis, as of 1.0, uses lossless stereo and a number of mixed modes
constructed out of lossless and point stereo. Phase stereo was used
in the rc2 encoder, but is not currently used for simplicity's sake. It
will likely be re-added to the stereo model in the future.</p>

<div id="copyright">
The Xiph Fish Logo is a
trademark (™) of Xiph.Org.<br/>

These pages © 1994 - 2005 Xiph.Org. All rights reserved.
</div>

</body>
</html>