sv-dependency-builds: src/libvorbis-1.3.3/doc/01-introduction.tex annotate

annotate src/libvorbis-1.3.3/doc/01-introduction.tex @ 94:d278df1123f9

Add liblo

author	Chris Cannam <cannam@all-day-breakfast.com>
date	Wed, 20 Mar 2013 15:25:02 +0000
parents	98c1576536ae
children

rev	line source
cannam@86	1 % -- mode: latex; TeX-master: "Vorbis_I_spec"; --
cannam@86	2 %!TEX root = Vorbis_I_spec.tex
cannam@86	3 % $Id$
cannam@86	4 \section{Introduction and Description} \label{vorbis:spec:intro}
cannam@86	5
cannam@86	6 \subsection{Overview}
cannam@86	7
cannam@86	8 This document provides a high-level description of the Vorbis codec's
cannam@86	9 construction. A bit-by-bit specification appears beginning in
cannam@86	10 \xref{vorbis:spec:codec}.
cannam@86	11 The later sections assume a high-level
cannam@86	12 understanding of the Vorbis decode process, which is
cannam@86	13 provided here.
cannam@86	14
cannam@86	15 \subsubsection{Application}
cannam@86	16 Vorbis is a general purpose perceptual audio CODEC intended to allow
cannam@86	17 maximum encoder flexibility, thus allowing it to scale competitively
cannam@86	18 over an exceptionally wide range of bitrates. At the high
cannam@86	19 quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits)
cannam@86	20 it is in the same league as MPEG-2 and MPC. Similarly, the 1.0
cannam@86	21 encoder can encode high-quality CD and DAT rate stereo at below 48kbps
cannam@86	22 without resampling to a lower rate. Vorbis is also intended for
cannam@86	23 lower and higher sample rates (from 8kHz telephony to 192kHz digital
cannam@86	24 masters) and a range of channel representations (monaural,
cannam@86	25 polyphonic, stereo, quadraphonic, 5.1, ambisonic, or up to 255
cannam@86	26 discrete channels).
cannam@86	27
cannam@86	28
cannam@86	29 \subsubsection{Classification}
cannam@86	30 Vorbis I is a forward-adaptive monolithic transform CODEC based on the
cannam@86	31 Modified Discrete Cosine Transform. The codec is structured to allow
cannam@86	32 addition of a hybrid wavelet filterbank in Vorbis II to offer better
cannam@86	33 transient response and reproduction using a transform better suited to
cannam@86	34 localized time events.
cannam@86	35
cannam@86	36
cannam@86	37 \subsubsection{Assumptions}
cannam@86	38
cannam@86	39 The Vorbis CODEC design assumes a complex, psychoacoustically-aware
cannam@86	40 encoder and simple, low-complexity decoder. Vorbis decode is
cannam@86	41 computationally simpler than mp3, although it does require more
cannam@86	42 working memory as Vorbis has no static probability model; the vector
cannam@86	43 codebooks used in the first stage of decoding from the bitstream are
cannam@86	44 packed in their entirety into the Vorbis bitstream headers. In
cannam@86	45 packed form, these codebooks occupy only a few kilobytes; the extent
cannam@86	46 to which they are pre-decoded into a cache is the dominant factor in
cannam@86	47 decoder memory usage.
cannam@86	48
cannam@86	49
cannam@86	50 Vorbis provides none of its own framing, synchronization or protection
cannam@86	51 against errors; it is solely a method of accepting input audio,
cannam@86	52 dividing it into individual frames and compressing these frames into
cannam@86	53 raw, unformatted 'packets'. The decoder then accepts these raw
cannam@86	54 packets in sequence, decodes them, synthesizes audio frames from
cannam@86	55 them, and reassembles the frames into a facsimile of the original
cannam@86	56 audio stream. Vorbis is a free-form variable bit rate (VBR) codec and packets have no
cannam@86	57 minimum size, maximum size, or fixed/expected size. Packets
cannam@86	58 are designed so that they may be truncated (or padded) and remain
cannam@86	59 decodable; this is not to be considered an error condition and is used
cannam@86	60 extensively in bitrate management in peeling. Both the transport
cannam@86	61 mechanism and decoder must allow that a packet may be any size, or
cannam@86	62 end before or after packet decode expects.
cannam@86	63
cannam@86	64 Vorbis packets are thus intended to be used with a transport mechanism
cannam@86	65 that provides free-form framing, sync, positioning and error correction
cannam@86	66 in accordance with these design assumptions, such as Ogg (for file
cannam@86	67 transport) or RTP (for network multicast). For purposes of a few
cannam@86	68 examples in this document, we will assume that Vorbis is to be
cannam@86	69 embedded in an Ogg stream specifically, although this is by no means a
cannam@86	70 requirement or fundamental assumption in the Vorbis design.
cannam@86	71
cannam@86	72 The specification for embedding Vorbis into
cannam@86	73 an Ogg transport stream is in \xref{vorbis:over:ogg}.
cannam@86	74
cannam@86	75
cannam@86	76
cannam@86	77 \subsubsection{Codec Setup and Probability Model}
cannam@86	78
cannam@86	79 Vorbis' heritage is as a research CODEC and its current design
cannam@86	80 reflects a desire to allow multiple decades of continuous encoder
cannam@86	81 improvement before running out of room within the codec specification.
cannam@86	82 For these reasons, configurable aspects of codec setup intentionally
cannam@86	83 lean toward the extreme of forward adaptive.
cannam@86	84
cannam@86	85 The single most controversial design decision in Vorbis (and the most
cannam@86	86 unusual for a Vorbis developer to keep in mind) is that the entire
cannam@86	87 probability model of the codec, the Huffman and VQ codebooks, is
cannam@86	88 packed into the bitstream header along with extensive CODEC setup
cannam@86	89 parameters (often several hundred fields). This makes it impossible,
cannam@86	90 as it would be with MPEG audio layers, to embed a simple frame type
cannam@86	91 flag in each audio packet, or begin decode at any frame in the stream
cannam@86	92 without having previously fetched the codec setup header.
cannam@86	93
cannam@86	94
cannam@86	95 \begin{note}
cannam@86	96 Vorbis \emph{can} initiate decode at any arbitrary packet within a
cannam@86	97 bitstream so long as the codec has been initialized/setup with the
cannam@86	98 setup headers.
cannam@86	99 \end{note}
cannam@86	100
cannam@86	101 Thus, Vorbis headers are both required for decode to begin and
cannam@86	102 relatively large as bitstream headers go. The header size is
cannam@86	103 unbounded, although for streaming a rule-of-thumb of 4kB or less is
cannam@86	104 recommended (and Xiph.Org's Vorbis encoder follows this suggestion).
cannam@86	105
cannam@86	106 Our own design work indicates the primary liability of the
cannam@86	107 required header is in mindshare; it is an unusual design and thus
cannam@86	108 causes some amount of complaint among engineers as this runs against
cannam@86	109 current design trends (and also points out limitations in some
cannam@86	110 existing software/interface designs, such as Windows' ACM codec
cannam@86	111 framework). However, we find that it does not fundamentally limit
cannam@86	112 Vorbis' suitable application space.
cannam@86	113
cannam@86	114
cannam@86	115 \subsubsection{Format Specification}
cannam@86	116 The Vorbis format is well-defined by its decode specification; any
cannam@86	117 encoder that produces packets that are correctly decoded by the
cannam@86	118 reference Vorbis decoder described below may be considered a proper
cannam@86	119 Vorbis encoder. A decoder must faithfully and completely implement
cannam@86	120 the specification defined below (except where noted) to be considered
cannam@86	121 a proper Vorbis decoder.
cannam@86	122
cannam@86	123 \subsubsection{Hardware Profile}
cannam@86	124 Although Vorbis decode is computationally simple, it may still run
cannam@86	125 into specific limitations of an embedded design. For this reason,
cannam@86	126 embedded designs are allowed to deviate in limited ways from the
cannam@86	127 `full' decode specification yet still be certified compliant. These
cannam@86	128 optional omissions are labelled in the spec where relevant.
cannam@86	129
cannam@86	130
cannam@86	131 \subsection{Decoder Configuration}
cannam@86	132
cannam@86	133 Decoder setup consists of configuration of multiple, self-contained
cannam@86	134 component abstractions that perform specific functions in the decode
cannam@86	135 pipeline. Each different component instance of a specific type is
cannam@86	136 semantically interchangeable; decoder configuration consists both of
cannam@86	137 internal component configuration, as well as arrangement of specific
cannam@86	138 instances into a decode pipeline. Componentry arrangement is roughly
cannam@86	139 as follows:
cannam@86	140
cannam@86	141 \begin{center}
cannam@86	142 \includegraphics[width=\textwidth]{components}
cannam@86	143 \captionof{figure}{decoder pipeline configuration}
cannam@86	144 \end{center}
cannam@86	145
cannam@86	146 \subsubsection{Global Config}
cannam@86	147 Global codec configuration consists of a few audio related fields
cannam@86	148 (sample rate, channels), Vorbis version (always '0' in Vorbis I),
cannam@86	149 bitrate hints, and the lists of component instances. All other
cannam@86	150 configuration is in the context of specific components.
cannam@86	151
cannam@86	152 \subsubsection{Mode}
cannam@86	153
cannam@86	154 Each Vorbis frame is coded according to a master 'mode'. A bitstream
cannam@86	155 may use one or many modes.
cannam@86	156
cannam@86	157 The mode mechanism is used to encode a frame according to one of
cannam@86	158 multiple possible methods with the intention of choosing a method best
cannam@86	159 suited to that frame. Modes are, for example, the mechanism by which frame size
cannam@86	160 is changed from frame to frame. The mode number of a frame serves as a
cannam@86	161 top level configuration switch for all other specific aspects of frame
cannam@86	162 decode.
cannam@86	163
cannam@86	164 A 'mode' configuration consists of a frame size setting, window type
cannam@86	165 (always 0, the Vorbis window, in Vorbis I), transform type (always
cannam@86	166 type 0, the MDCT, in Vorbis I) and a mapping number. The mapping
cannam@86	167 number specifies which mapping configuration instance to use for
cannam@86	168 low-level packet decode and synthesis.
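
A minimal sketch of the mode record described above, using Python purely for illustration. The names \texttt{Mode}, \texttt{long\_block} and \texttt{select\_mapping} are hypothetical and do not come from the specification; the frame size setting is modelled here as a flag choosing between the two frame sizes of the stream, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """One mode entry; field names are illustrative, not from the spec."""
    long_block: bool          # frame size setting: False = short, True = long
    mapping: int              # index into the list of mapping instances
    window_type: int = 0      # always 0 (the Vorbis window) in Vorbis I
    transform_type: int = 0   # always 0 (the MDCT) in Vorbis I

    def __post_init__(self):
        # Vorbis I defines only window type 0 and transform type 0.
        if self.window_type != 0 or self.transform_type != 0:
            raise ValueError("unsupported window or transform type for Vorbis I")


def select_mapping(modes, mappings, mode_number):
    """Use the mode number as a direct index, then follow its mapping number."""
    return mappings[modes[mode_number].mapping]
```

The point of the sketch is the double indirection: a packet names a mode, and the mode names the mapping used for low-level decode.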
cannam@86	169
cannam@86	170
cannam@86	171 \subsubsection{Mapping}
cannam@86	172
cannam@86	173 A mapping contains a channel coupling description and a list of
cannam@86	174 'submaps' that bundle sets of channel vectors together for grouped
cannam@86	175 encoding and decoding. These submaps are not references to external
cannam@86	176 components; the submap list is internal and specific to a mapping.
cannam@86	177
cannam@86	178 A 'submap' is a configuration/grouping that applies to a subset of
cannam@86	179 floor and residue vectors within a mapping. The submap functions as a
cannam@86	180 last layer of indirection such that specific special floor or residue
cannam@86	181 settings can be applied not only to all the vectors in a given mode,
cannam@86	182 but also specific vectors in a specific mode. Each submap specifies
cannam@86	183 the proper floor and residue instance number to use for decoding that
cannam@86	184 submap's spectral floor and spectral residue vectors.
cannam@86	185
cannam@86	186 As an example:
cannam@86	187
cannam@86	188 Assume a Vorbis stream that contains six channels in the standard 5.1
cannam@86	189 format. The sixth channel, as is normal in 5.1, is bass only.
cannam@86	190 Therefore it would be wasteful to encode a full-spectrum version of it
cannam@86	191 as with the other channels. The submapping mechanism can be used to
cannam@86	192 apply a full range floor and residue encoding to channels 0 through 4,
cannam@86	193 and a bass-only representation to the bass channel, thus saving space.
cannam@86	194 In this example, channels 0-4 belong to submap 0 (which indicates use
cannam@86	195 of a full-range floor) and channel 5 belongs to submap 1, which uses a
cannam@86	196 bass-only representation.
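
The 5.1 example above might be laid out as follows. This is only a sketch: the variable names, the per-channel submap list and the instance numbers are all hypothetical and chosen to mirror the example, not taken from a real stream.

```python
# Hypothetical layout: one submap index per channel, and per-submap
# floor/residue instance numbers. All names and numbers are illustrative.
channel_to_submap = [0, 0, 0, 0, 0, 1]   # channels 0-4 full range, channel 5 bass only
submaps = [
    {"floor": 0, "residue": 0},           # submap 0: full-range floor and residue
    {"floor": 1, "residue": 1},           # submap 1: bass-only representation
]


def floor_and_residue_for(channel):
    """Return the (floor, residue) instance numbers used to decode a channel."""
    submap = submaps[channel_to_submap[channel]]
    return submap["floor"], submap["residue"]
```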
cannam@86	197
cannam@86	198
cannam@86	199 \subsubsection{Floor}
cannam@86	200
cannam@86	201 Vorbis encodes a spectral 'floor' vector for each PCM channel. This
cannam@86	202 vector is a low-resolution representation of the audio spectrum for
cannam@86	203 the given channel in the current frame, generally used akin to a
cannam@86	204 whitening filter. It is named a 'floor' because the Xiph.Org
cannam@86	205 reference encoder has historically used it as a unit-baseline for
cannam@86	206 spectral resolution.
cannam@86	207
cannam@86	208 A floor encoding may be of two types. Floor 0 uses a packed LSP
cannam@86	209 representation on a dB amplitude scale and Bark frequency scale.
cannam@86	210 Floor 1 represents the curve as a piecewise linear interpolated
cannam@86	211 representation on a dB amplitude scale and linear frequency scale.
cannam@86	212 The two floors are semantically interchangeable in
cannam@86	213 encoding/decoding. However, floor type 1 provides more stable
cannam@86	214 inter-frame behavior, and so is the preferred choice in all
cannam@86	215 coupled-stereo and high bitrate modes. Floor 1 is also considerably
cannam@86	216 less expensive to decode than floor 0.
cannam@86	217
cannam@86	218 Floor 0 is not to be considered deprecated, but it is of limited
cannam@86	219 modern use. No known Vorbis encoder past Xiph.Org's own beta 4 makes
cannam@86	220 use of floor 0.
cannam@86	221
cannam@86	222 The values coded/decoded by a floor are both compactly formatted and
cannam@86	223 make use of entropy coding to save space. For this reason, a floor
cannam@86	224 configuration generally refers to multiple codebooks in the codebook
cannam@86	225 component list. Entropy coding is thus provided as an abstraction,
cannam@86	226 and each floor instance may choose from any and all available
cannam@86	227 codebooks when coding/decoding.
cannam@86	228
cannam@86	229
cannam@86	230 \subsubsection{Residue}
cannam@86	231 The spectral residue is the fine structure of the audio spectrum
cannam@86	232 once the floor curve has been subtracted out. In simplest terms, it
cannam@86	233 is coded in the bitstream using cascaded (multi-pass) vector
cannam@86	234 quantization according to one of three specific packing/coding
cannam@86	235 algorithms numbered 0 through 2. The packing algorithm details are
cannam@86	236 configured by residue instance. As with the floor components, the
cannam@86	237 final VQ/entropy encoding is provided by external codebook instances
cannam@86	238 and each residue instance may choose from any and all available
cannam@86	239 codebooks.
cannam@86	240
cannam@86	241 \subsubsection{Codebooks}
cannam@86	242
cannam@86	243 Codebooks are a self-contained abstraction that performs entropy
cannam@86	244 decoding and, optionally, use the entropy-decoded integer value as an
cannam@86	245 offset into an index of output value vectors, returning the indicated
cannam@86	246 vector of values.
cannam@86	247
cannam@86	248 The entropy coding in a Vorbis I codebook is provided by a standard
cannam@86	249 Huffman binary tree representation. This tree is tightly packed using
cannam@86	250 one of several methods, depending on whether codeword lengths are
cannam@86	251 ordered or unordered, or the tree is sparse.
cannam@86	252
cannam@86	253 The codebook vector index is similarly packed according to index
cannam@86	254 characteristic. Most commonly, the vector index is encoded as a
cannam@86	255 single list of possible values that are then permuted into
cannam@86	256 a list of n-dimensional rows (lattice VQ).
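
As a rough illustration of the lattice case, the sketch below expands a codebook entry number into an n-dimensional row by treating the entry number as a base-k number, where k is the length of the value list. It is a simplification: the function name and the digit ordering (least significant digit first) are assumptions, and any scaling or offsetting of the values is omitted.

```python
def lattice_vector(values, dimensions, entry):
    """Expand an entry number into one n-dimensional row of a lattice codebook.

    Each dimension picks one element of `values`; the entry number is read
    as a base-len(values) number, least significant digit first.
    """
    row = []
    divisor = 1
    for _ in range(dimensions):
        row.append(values[(entry // divisor) % len(values)])
        divisor *= len(values)
    return row
```

With three possible values and two dimensions, the nine entries 0 through 8 enumerate every pair of values, which is why only the short value list needs to be stored.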
cannam@86	257
cannam@86	258
cannam@86	259
cannam@86	260 \subsection{High-level Decode Process}
cannam@86	261
cannam@86	262 \subsubsection{Decode Setup}
cannam@86	263
cannam@86	264 Before decoding can begin, a decoder must initialize using the
cannam@86	265 bitstream headers matching the stream to be decoded. Vorbis uses
cannam@86	266 three header packets; all are required, in-order, by this
cannam@86	267 specification. Once set up, decode may begin at any audio packet
cannam@86	268 belonging to the Vorbis stream. In Vorbis I, all packets after the
cannam@86	269 three initial headers are audio packets.
cannam@86	270
cannam@86	271 The header packets are, in order, the identification
cannam@86	272 header, the comments header, and the setup header.
cannam@86	273
cannam@86	274 \paragraph{Identification Header}
cannam@86	275 The identification header identifies the bitstream as Vorbis and gives the Vorbis
cannam@86	276 version and the simple audio characteristics of the stream, such as
cannam@86	277 sample rate and number of channels.
cannam@86	278
cannam@86	279 \paragraph{Comment Header}
cannam@86	280 The comment header includes user text comments (``tags'') and a vendor
cannam@86	281 string for the application/library that produced the bitstream. The
cannam@86	282 encoding and proper use of the comment header are described in \xref{vorbis:spec:comment}.
cannam@86	283
cannam@86	284 \paragraph{Setup Header}
cannam@86	285 The setup header includes extensive CODEC setup information as well as
cannam@86	286 the complete VQ and Huffman codebooks needed for decode.
cannam@86	287
cannam@86	288
cannam@86	289 \subsubsection{Decode Procedure}
cannam@86	290
cannam@86	291 The decoding and synthesis procedure for all audio packets is
cannam@86	292 fundamentally the same.
cannam@86	293 \begin{enumerate}
cannam@86	294 \item decode packet type flag
cannam@86	295 \item decode mode number
cannam@86	296 \item decode window shape (long windows only)
cannam@86	297 \item decode floor
cannam@86	298 \item decode residue into residue vectors
cannam@86	299 \item inverse channel coupling of residue vectors
cannam@86	300 \item generate floor curve from decoded floor data
cannam@86	301 \item compute dot product of floor and residue, producing audio spectrum vector
cannam@86	302 \item inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I
cannam@86	303 \item overlap/add left-hand output of transform with right-hand output of previous frame
cannam@86	304 \item store right-hand data from transform of current frame for future lapping
cannam@86	305 \item if not first frame, return results of overlap/add as audio result of current frame
cannam@86	306 \end{enumerate}
cannam@86	307
cannam@86	308 Note that clever rearrangement of the synthesis arithmetic is
cannam@86	309 possible; as an example, one can take advantage of symmetries in the
cannam@86	310 MDCT to store the right-hand transform data of a partial MDCT for a
cannam@86	311 50\% inter-frame buffer space savings, and then complete the transform
cannam@86	312 later before overlap/add with the next frame. This optimization
cannam@86	313 produces entirely equivalent output and is naturally perfectly legal.
cannam@86	314 The decoder must be \emph{entirely mathematically equivalent} to the
cannam@86	315 specification; it need not be a literal semantic implementation.
cannam@86	316
cannam@86	317 \paragraph{Packet type decode}
cannam@86	318
cannam@86	319 Vorbis I uses four packet types. The first three packet types mark each
cannam@86	320 of the three Vorbis headers described above. The fourth packet type
cannam@86	321 marks an audio packet. All other packet types are reserved; packets
cannam@86	322 marked with a reserved type should be ignored.
cannam@86	323
cannam@86	324 Following the three header packets, all packets in a Vorbis I stream
cannam@86	325 are audio. The first step of audio packet decode is to read and
cannam@86	326 verify the packet type; \emph{a non-audio packet when audio is expected
cannam@86	327 indicates stream corruption or a non-compliant stream. The decoder
cannam@86	328 must ignore the packet and not attempt decoding it to
cannam@86	329 audio}.
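
A possible packet classifier is sketched below. It assumes details that are defined in later sections rather than here, and they should be checked there: header packets carry the type values 1, 3 and 5 followed by the six bytes \texttt{vorbis}, and an audio packet is one whose first bit (the least significant bit of the first byte) is 0. The function name and the returned labels are hypothetical.

```python
# Assumed header type values; verify against the header sections of the spec.
HEADER_TYPES = {1: "identification", 3: "comment", 5: "setup"}


def classify_packet(packet: bytes) -> str:
    """Label a raw packet as a header kind, 'audio', 'reserved' or 'empty'."""
    if not packet:
        return "empty"
    first = packet[0]
    if first & 1 == 0:
        return "audio"                      # packet type flag bit is 0
    if first in HEADER_TYPES and packet[1:7] == b"vorbis":
        return HEADER_TYPES[first]
    return "reserved"                       # to be ignored by the decoder
```

After the three headers have been consumed, a decoder following the text above would skip any packet for which this function does not return the audio label.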
cannam@86	330
cannam@86	331
cannam@86	332
cannam@86	333
cannam@86	334 \paragraph{Mode decode}
cannam@86	335 Vorbis allows an encoder to set up multiple, numbered packet 'modes',
cannam@86	336 as described earlier, all of which may be used in a given Vorbis
cannam@86	337 stream. The mode is encoded as an integer used as a direct offset into
cannam@86	338 the mode instance index.
cannam@86	339
cannam@86	340
cannam@86	341 \paragraph{Window shape decode (long windows only)} \label{vorbis:spec:window}
cannam@86	342
cannam@86	343 Vorbis frames may be one of two PCM sample sizes specified during
cannam@86	344 codec setup. In Vorbis I, legal frame sizes are powers of two from 64
cannam@86	345 to 8192 samples. Aside from coupling, Vorbis handles channels as
cannam@86	346 independent vectors and these frame sizes are in samples per channel.
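
The frame size rule can be checked with a small helper such as the one below. The function names are hypothetical, and the requirement that the short size not exceed the long size is an assumption about the setup constraints rather than something stated in this section.

```python
def is_legal_blocksize(n):
    """True if n is a power of two between 64 and 8192 inclusive."""
    return 64 <= n <= 8192 and (n & (n - 1)) == 0


def is_legal_pair(short_size, long_size):
    """Both sizes legal, with the short size not exceeding the long one."""
    return (is_legal_blocksize(short_size)
            and is_legal_blocksize(long_size)
            and short_size <= long_size)
```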
cannam@86	347
cannam@86	348 Vorbis uses an overlapping transform, namely the MDCT, to blend one
cannam@86	349 frame into the next, avoiding most inter-frame block boundary
cannam@86	350 artifacts. The MDCT output of one frame is windowed according to MDCT
cannam@86	351 requirements, overlapped 50\% with the output of the previous frame and
cannam@86	352 added. The window shape assures seamless reconstruction.
cannam@86	353
cannam@86	354 This is easy to visualize in the case of equal-sized windows:
cannam@86	355
cannam@86	356 \begin{center}
cannam@86	357 \includegraphics[width=\textwidth]{window1}
cannam@86	358 \captionof{figure}{overlap of two equal-sized windows}
cannam@86	359 \end{center}
cannam@86	360
cannam@86	361 And slightly more complex in the case of overlapping unequal-sized
cannam@86	362 windows:
cannam@86	363
cannam@86	364 \begin{center}
cannam@86	365 \includegraphics[width=\textwidth]{window2}
cannam@86	366 \captionof{figure}{overlap of a long and a short window}
cannam@86	367 \end{center}
cannam@86	368
cannam@86	369 In the unequal-sized window case, the window shape of the long window
cannam@86	370 must be modified for seamless lapping as above. It is possible to
cannam@86	371 correctly infer window shape to be applied to the current window from
cannam@86	372 knowing the sizes of the current, previous and next window. It is
cannam@86	373 legal for a decoder to use this method. However, in the case of a long
cannam@86	374 window (short windows require no modification), Vorbis also codes two
cannam@86	375 flag bits to specify pre- and post- window shape. Although not
cannam@86	376 strictly necessary for function, this minor redundancy allows a packet
cannam@86	377 to be fully decoded to the point of lapping entirely independently of
cannam@86	378 any other packet, allowing easier abstraction of decode layers as well
cannam@86	379 as allowing a greater level of easy parallelism in encode and
cannam@86	380 decode.
cannam@86	381
cannam@86	382 A description of valid window functions for use with an inverse MDCT
cannam@86	383 can be found in \cite{Sporer/Brandenburg/Edler}. Vorbis windows
cannam@86	384 all use the slope function
cannam@86	385 \[ y = \sin\left(\frac{\pi}{2}\,\sin^2\left(\frac{x+0.5}{n}\,\pi\right)\right) . \]
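
The slope function can be tabulated directly. The sketch below assumes that $n$ is the full window length, that $x$ runs from $0$ to $n-1$, and that the inner argument is $(x+0.5)/n$ multiplied by $\pi$; the function name is hypothetical and the code covers only the equal-sized window case.

```python
import math


def vorbis_window(n):
    """Return the n-point Vorbis window for equal-sized neighbouring frames."""
    return [
        math.sin(0.5 * math.pi * math.sin((x + 0.5) / n * math.pi) ** 2)
        for x in range(n)
    ]
```

Under these assumptions the window is symmetric, and the squares of two samples half a window apart sum to one, which is the property that lets overlapped frames reconstruct seamlessly.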
cannam@86	386
cannam@86	387
cannam@86	388
cannam@86	389 \paragraph{floor decode}
cannam@86	390 Each floor is encoded/decoded in channel order; however, each floor
cannam@86	391 belongs to a 'submap' that specifies which floor configuration to
cannam@86	392 use. All floors are decoded before residue decode begins.
cannam@86	393
cannam@86	394
cannam@86	395 \paragraph{residue decode}
cannam@86	396
cannam@86	397 Although the number of residue vectors equals the number of channels,
cannam@86	398 channel coupling may mean that the raw residue vectors extracted
cannam@86	399 during decode do not map directly to specific channels. When channel
cannam@86	400 coupling is in use, some vectors will correspond to coupled magnitude
cannam@86	401 or angle. The coupling relationships are described in the codec setup
cannam@86	402 and may differ from frame to frame, due to different mode numbers.
cannam@86	403
cannam@86	404 Vorbis codes residue vectors in groups by submap; the coding is done
cannam@86	405 in submap order from submap 0 through n-1. This differs from floors
cannam@86	406 which are coded using a configuration provided by submap number, but
cannam@86	407 are coded individually in channel order.
cannam@86	408
cannam@86	409
cannam@86	410
cannam@86	411 \paragraph{inverse channel coupling}
cannam@86	412
cannam@86	413 A detailed discussion of stereo in the Vorbis codec can be found in
cannam@86	414 the document \href{stereo.html}{Stereo Channel Coupling in the
cannam@86	415 Vorbis CODEC}. Vorbis is not limited to only stereo coupling, but
cannam@86	416 the stereo document also gives a good overview of the generic coupling
cannam@86	417 mechanism.
cannam@86	418
cannam@86	419 Vorbis coupling applies to pairs of residue vectors at a time;
cannam@86	420 decoupling is done in-place a pair at a time in the order and using
cannam@86	421 the vectors specified in the current mapping configuration. The
cannam@86	422 decoupling operation is the same for all pairs, converting square
cannam@86	423 polar representation (where one vector is magnitude and the second
cannam@86	424 angle) back to Cartesian representation.
cannam@86	425
cannam@86	426 After decoupling, in order, each pair of vectors on the coupling list,
cannam@86	427 the resulting residue vectors represent the fine spectral detail
cannam@86	428 of each output channel.
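
The decoupling step for one pair of vectors might look like the sketch below. The sign rule used here is recalled from the detailed decode section of the specification and should be checked against it; the function names are hypothetical.

```python
def decouple(magnitude, angle):
    """Convert one square-polar (magnitude, angle) pair back to Cartesian."""
    if magnitude > 0:
        if angle > 0:
            return magnitude, magnitude - angle
        return magnitude + angle, magnitude
    if angle > 0:
        return magnitude, magnitude + angle
    return magnitude - angle, magnitude


def decouple_vectors(mag_vec, ang_vec):
    """Decouple two residue vectors in place, element by element."""
    for i, (m, a) in enumerate(zip(mag_vec, ang_vec)):
        mag_vec[i], ang_vec[i] = decouple(m, a)
```

Applying \texttt{decouple\_vectors} once per entry of the coupling list, in list order, matches the in-place, pair-at-a-time procedure described above.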
cannam@86	429
cannam@86	430
cannam@86	431
cannam@86	432 \paragraph{generate floor curve}
cannam@86	433
cannam@86	434 The decoder may choose to generate the floor curve at any appropriate
cannam@86	435 time. It is reasonable to generate the output curve when the floor
cannam@86	436 data is decoded from the raw packet, or it can be generated after
cannam@86	437 inverse coupling and applied to the spectral residue directly,
cannam@86	438 combining generation and the dot product into one step and eliminating
cannam@86	439 some working space.
cannam@86	440
cannam@86	441 Both floor 0 and floor 1 generate a linear-range, linear-domain output
cannam@86	442 vector to be multiplied (dot product) by the linear-range,
cannam@86	443 linear-domain spectral residue.
cannam@86	444
cannam@86	445
cannam@86	446
cannam@86	447 \paragraph{compute floor/residue dot product}
cannam@86	448
cannam@86	449 This step is straightforward; for each output channel, the decoder
cannam@86	450 multiplies the floor curve and residue vectors element by element,
cannam@86	451 producing the finished audio spectrum of each channel.
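
Note that the operation described here is an element-by-element product that yields a vector, not a scalar inner product. A minimal floating-point sketch, with a hypothetical function name:

```python
def apply_floor(floor_curve, residue):
    """Multiply floor and residue element by element to get the spectrum."""
    if len(floor_curve) != len(residue):
        raise ValueError("floor and residue must have the same length")
    return [f * r for f, r in zip(floor_curve, residue)]
```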
cannam@86	452
cannam@86	453 % TODO/FIXME: The following two paragraphs have identical twins
cannam@86	454 % in section 4 (under "dot product")
cannam@86	455 One point is worth mentioning about this dot product; a common mistake
cannam@86	456 in a fixed point implementation might be to assume that a 32 bit
cannam@86	457 fixed-point representation for floor and residue and direct
cannam@86	458 multiplication of the vectors is sufficient for acceptable spectral
cannam@86	459 depth in all cases because it happens to mostly work with the current
cannam@86	460 Xiph.Org reference encoder.
cannam@86	461
cannam@86	462 However, floor vector values can span \~{}140dB (\~{}24 bits unsigned), and
cannam@86	463 the audio spectrum vector should represent a minimum of 120dB (\~{}21
cannam@86	464 bits with sign), even when output is to a 16 bit PCM device. For the
cannam@86	465 residue vector to represent full scale if the floor is nailed to
cannam@86	466 $-140$dB, it must be able to span 0 to $+140$dB. For the residue vector
cannam@86	467 to reach full scale if the floor is nailed at 0dB, it must be able to
cannam@86	468 represent $-140$dB to $+0$dB. Thus, in order to handle full range
cannam@86	469 dynamics, a residue vector may span $-140$dB to $+140$dB entirely within
cannam@86	470 spec. A 280dB range is approximately 48 bits with sign; thus the
cannam@86	471 residue vector must be able to represent a 48 bit range and the dot
cannam@86	472 product must be able to handle an effective 48 bit times 24 bit
cannam@86	473 multiplication. This range may be achieved using large (64 bit or
cannam@86	474 larger) integers, or implementing a movable binary point
cannam@86	475 representation.
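
The bit counts quoted above follow from the usual rule that each bit adds about 6.02dB of amplitude range ($20\log_{10}2$). The helper below is a hypothetical sketch that reproduces the arithmetic, rounding up to whole bits and adding one bit where a sign is needed.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)   # about 6.02 dB of range per bit


def bits_for_range(db, signed=False):
    """Whole bits needed to span `db` decibels, plus one sign bit if requested."""
    return math.ceil(db / DB_PER_BIT) + (1 if signed else 0)
```

For the three figures in the text, this gives 24 bits for 140dB unsigned, 21 bits for 120dB with sign, and 48 bits for 280dB with sign.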
cannam@86	476
cannam@86	477
cannam@86	478
cannam@86	479 \paragraph{inverse monolithic transform (MDCT)}
cannam@86	480
cannam@86	481 The audio spectrum is converted back into time domain PCM audio via an
cannam@86	482 inverse Modified Discrete Cosine Transform (MDCT). A detailed
cannam@86	483 description of the MDCT is available in \cite{Sporer/Brandenburg/Edler}.
cannam@86	484
cannam@86	485 Note that the PCM produced directly from the MDCT is not yet finished
cannam@86	486 audio; it must be lapped with surrounding frames using an appropriate
cannam@86	487 window (such as the Vorbis window) before the MDCT can be considered
cannam@86	488 orthogonal.
cannam@86	489
cannam@86	490
cannam@86	491
cannam@86	492 \paragraph{overlap/add data}
cannam@86	493 Windowed MDCT output is overlapped and added with the right hand data
cannam@86	494 of the previous window such that the 3/4 point of the previous window
cannam@86	495 is aligned with the 1/4 point of the current window (as illustrated in
cannam@86	496 the window overlap diagram). At this point, the audio data between the
cannam@86	497 center of the previous frame and the center of the current frame is
cannam@86	498 now finished and ready to be returned.
cannam@86	499
cannam@86	500
cannam@86	501 \paragraph{cache right hand data}
cannam@86	502 The decoder must cache the right hand portion of the current frame to
cannam@86	503 be lapped with the left hand portion of the next frame.
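
For the equal-sized window case, the lapping and caching steps can be sketched as follows. The function name and calling convention are hypothetical; \texttt{cur\_frame} is assumed to be the already windowed inverse-MDCT output of one frame, and unequal window sizes are not handled.

```python
def overlap_add(prev_right, cur_frame):
    """Lap one windowed frame against the cached half of the previous frame.

    Returns (finished_samples, new_cache). `prev_right` is None for the
    first frame, which only primes the cache and yields no audio.
    """
    half = len(cur_frame) // 2
    left, right = cur_frame[:half], cur_frame[half:]
    if prev_right is None:
        return [], right
    finished = [p + c for p, c in zip(prev_right, left)]
    return finished, right
```

A decoder loop would keep the returned cache between calls, passing it back in as \texttt{prev\_right} for the next frame.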
cannam@86	504
cannam@86	505
cannam@86	506
cannam@86	507 \paragraph{return finished audio data}
cannam@86	508
cannam@86	509 The overlapped portion produced from overlapping the previous and
cannam@86	510 current frame data is finished data to be returned by the decoder.
cannam@86	511 This data spans from the center of the previous window to the center
cannam@86	512 of the current window. In the case of same-sized windows, the amount
cannam@86	513 of data to return is one-half block consisting of and only of the
cannam@86	514 overlapped portions. When overlapping a short and long window, much of
cannam@86	515 the returned range is not actually overlap. This does not damage
cannam@86	516 transform orthogonality. Pay attention, however, to returning the
cannam@86	517 correct data range; the amount of data to be returned is:
cannam@86	518
cannam@86	519 \begin{Verbatim}[commandchars=\\\{\}]
cannam@86	520 window\_blocksize(previous\_window)/4+window\_blocksize(current\_window)/4
cannam@86	521 \end{Verbatim}
cannam@86	522
cannam@86	523 from the center of the previous window to the center of the current
cannam@86	524 window.
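
The expression above translates directly into code. The function name below is hypothetical; integer division is safe because legal block sizes are powers of two no smaller than 64.

```python
def samples_returned(previous_blocksize, current_blocksize):
    """Samples per channel finished by lapping the previous and current frame."""
    return previous_blocksize // 4 + current_blocksize // 4
```

For two 2048-sample windows this gives 1024 samples, one half block; for a 256-sample window followed by a 2048-sample window it gives 576.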
cannam@86	525
cannam@86	526 Data is not returned from the first frame; it must be used to 'prime'
cannam@86	527 the decode engine. The encoder accounts for this priming when
cannam@86	528 calculating PCM offsets; after the first frame, the proper PCM output
cannam@86	529 offset is '0' (as no data has been returned yet).
