% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Introduction and Description} \label{vorbis:spec:intro}

\subsection{Overview}

This document provides a high level description of the Vorbis codec's
construction.  A bit-by-bit specification appears beginning in
\xref{vorbis:spec:codec}.
The later sections assume a high-level
understanding of the Vorbis decode process, which is
provided here.

\subsubsection{Application}
Vorbis is a general purpose perceptual audio CODEC intended to allow
maximum encoder flexibility, thus allowing it to scale competitively
over an exceptionally wide range of bitrates.  At the high
quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits)
it is in the same league as MPEG-2 and MPC.  Similarly, the 1.0
encoder can encode high-quality CD and DAT rate stereo at below 48kbps
without resampling to a lower rate.  Vorbis is also intended for
lower and higher sample rates (from 8kHz telephony to 192kHz digital
masters) and a range of channel representations (monaural,
polyphonic, stereo, quadraphonic, 5.1, ambisonic, or up to 255
discrete channels).


\subsubsection{Classification}
Vorbis I is a forward-adaptive monolithic transform CODEC based on the
Modified Discrete Cosine Transform.  The codec is structured to allow
addition of a hybrid wavelet filterbank in Vorbis II to offer better
transient response and reproduction using a transform better suited to
localized time events.


\subsubsection{Assumptions}

The Vorbis CODEC design assumes a complex, psychoacoustically-aware
encoder and a simple, low-complexity decoder.  Vorbis decode is
computationally simpler than mp3, although it does require more
working memory as Vorbis has no static probability model; the vector
codebooks used in the first stage of decoding from the bitstream are
packed in their entirety into the Vorbis bitstream headers.  In
packed form, these codebooks occupy only a few kilobytes; the extent
to which they are pre-decoded into a cache is the dominant factor in
decoder memory usage.


Vorbis provides none of its own framing, synchronization or protection
against errors; it is solely a method of accepting input audio,
dividing it into individual frames and compressing these frames into
raw, unformatted 'packets'.  The decoder then accepts these raw
packets in sequence, decodes them, synthesizes audio frames from
them, and reassembles the frames into a facsimile of the original
audio stream.  Vorbis is a free-form variable bit rate (VBR) codec and packets have no
minimum size, maximum size, or fixed/expected size.  Packets
are designed so that they may be truncated (or padded) and remain
decodable; this is not to be considered an error condition and is used
extensively in bitrate management by peeling.  Both the transport
mechanism and decoder must allow that a packet may be any size, or
end before or after packet decode expects.
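
As an illustration of the requirement that a packet may end before
decode expects, the following Python sketch shows a bit reader that
reports end-of-packet as an ordinary condition instead of raising an
error.  The class name \texttt{BitReader}, its \texttt{read} method and
the use of \texttt{None} as the end-of-packet signal are hypothetical
choices made for this example only; they are not defined by this
specification.  Bits are taken least-significant-bit first, following
the Vorbis bitpacking convention.

```python
class BitReader:
    """Hypothetical packet bit reader; names are illustrative only."""

    def __init__(self, packet: bytes):
        self.packet = packet
        self.bitpos = 0  # absolute bit position within the packet

    def read(self, nbits: int):
        """Return the next nbits as an unsigned integer, or None at end-of-packet.

        Running off the end of a (possibly truncated) packet is not an
        error; the caller simply stops decoding this packet.
        """
        total_bits = 8 * len(self.packet)
        if self.bitpos + nbits > total_bits:
            self.bitpos = total_bits  # stay at end-of-packet from now on
            return None
        value = 0
        for i in range(nbits):
            byte = self.packet[self.bitpos >> 3]
            bit = (byte >> (self.bitpos & 7)) & 1  # LSb of each byte comes first
            value |= bit << i
            self.bitpos += 1
        return value
```

A caller in this sketch would check for \texttt{None} after each read
and treat it as a normal early end of the packet.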

Vorbis packets are thus intended to be used with a transport mechanism
that provides free-form framing, sync, positioning and error correction
in accordance with these design assumptions, such as Ogg (for file
transport) or RTP (for network multicast).  For purposes of a few
examples in this document, we will assume that Vorbis is to be
embedded in an Ogg stream specifically, although this is by no means a
requirement or fundamental assumption in the Vorbis design.

The specification for embedding Vorbis into
an Ogg transport stream is in \xref{vorbis:over:ogg}.



\subsubsection{Codec Setup and Probability Model}

Vorbis' heritage is as a research CODEC and its current design
reflects a desire to allow multiple decades of continuous encoder
improvement before running out of room within the codec specification.
For these reasons, configurable aspects of codec setup intentionally
lean toward the extreme of forward adaptive.

The single most controversial design decision in Vorbis (and the most
unusual for a Vorbis developer to keep in mind) is that the entire
probability model of the codec, the Huffman and VQ codebooks, is
packed into the bitstream header along with extensive CODEC setup
parameters (often several hundred fields).  Unlike the MPEG audio
layers, Vorbis therefore cannot embed a simple frame type
flag in each audio packet, nor begin decode at any frame in the stream
without having previously fetched the codec setup header.


\begin{note}
Vorbis \emph{can} initiate decode at any arbitrary packet within a
bitstream so long as the codec has been initialized/setup with the
setup headers.
\end{note}

Thus, Vorbis headers are both required for decode to begin and
relatively large as bitstream headers go.  The header size is
unbounded, although for streaming a rule-of-thumb of 4kB or less is
recommended (and Xiph.Org's Vorbis encoder follows this suggestion).

Our own design work indicates the primary liability of the
required header is in mindshare; it is an unusual design and thus
causes some amount of complaint among engineers as this runs against
current design trends (and also points out limitations in some
existing software/interface designs, such as Windows' ACM codec
framework).  However, we find that it does not fundamentally limit
Vorbis' suitable application space.


\subsubsection{Format Specification}
The Vorbis format is well-defined by its decode specification; any
encoder that produces packets that are correctly decoded by the
reference Vorbis decoder described below may be considered a proper
Vorbis encoder.  A decoder must faithfully and completely implement
the specification defined below (except where noted) to be considered
a proper Vorbis decoder.

\subsubsection{Hardware Profile}
Although Vorbis decode is computationally simple, it may still run
into specific limitations of an embedded design.  For this reason,
embedded designs are allowed to deviate in limited ways from the
`full' decode specification yet still be certified compliant.
These
optional omissions are labelled in the spec where relevant.


\subsection{Decoder Configuration}

Decoder setup consists of configuration of multiple, self-contained
component abstractions that perform specific functions in the decode
pipeline.  Each different component instance of a specific type is
semantically interchangeable; decoder configuration consists both of
internal component configuration, as well as arrangement of specific
instances into a decode pipeline.  Componentry arrangement is roughly
as follows:

\begin{center}
\includegraphics[width=\textwidth]{components}
\captionof{figure}{decoder pipeline configuration}
\end{center}

\subsubsection{Global Config}
Global codec configuration consists of a few audio-related fields
(sample rate, channels), Vorbis version (always '0' in Vorbis I),
bitrate hints, and the lists of component instances.  All other
configuration is in the context of specific components.

\subsubsection{Mode}

Each Vorbis frame is coded according to a master 'mode'.  A bitstream
may use one or many modes.

The mode mechanism is used to encode a frame according to one of
multiple possible methods with the intention of choosing a method best
suited to that frame.  Switching between modes is, for example, how the
frame size is changed from frame to frame.  The mode number of a frame serves as a
top level configuration switch for all other specific aspects of frame
decode.

A 'mode' configuration consists of a frame size setting, window type
(always 0, the Vorbis window, in Vorbis I), transform type (always
type 0, the MDCT, in Vorbis I) and a mapping number.  The mapping
number specifies which mapping configuration instance to use for
low-level packet decode and synthesis.


\subsubsection{Mapping}

A mapping contains a channel coupling description and a list of
'submaps' that bundle sets of channel vectors together for grouped
encoding and decoding.  These submaps are not references to external
components; the submap list is internal and specific to a mapping.

A 'submap' is a configuration/grouping that applies to a subset of
floor and residue vectors within a mapping.  The submap functions as a
last layer of indirection such that specific special floor or residue
settings can be applied not only to all the vectors in a given mode,
but also specific vectors in a specific mode.  Each submap specifies
the proper floor and residue instance number to use for decoding that
submap's spectral floor and spectral residue vectors.

As an example:

Assume a Vorbis stream that contains six channels in the standard 5.1
format.  The sixth channel, as is normal in 5.1, is bass only.
Therefore it would be wasteful to encode a full-spectrum version of it
as with the other channels.  The submapping mechanism can be used to
apply a full range floor and residue encoding to channels 0 through 4,
and a bass-only representation to the bass channel, thus saving space.
In this example, channels 0-4 belong to submap 0 (which indicates use
of a full-range floor) and channel 5 belongs to submap 1, which uses a
bass-only representation.


\subsubsection{Floor}

Vorbis encodes a spectral 'floor' vector for each PCM channel.  This
vector is a low-resolution representation of the audio spectrum for
the given channel in the current frame, generally used akin to a
whitening filter.  It is named a 'floor' because the Xiph.Org
reference encoder has historically used it as a unit-baseline for
spectral resolution.

A floor encoding may be of two types.  Floor 0 uses a packed LSP
representation on a dB amplitude scale and Bark frequency scale.
Floor 1 represents the curve as a piecewise linear interpolated
representation on a dB amplitude scale and linear frequency scale.
The two floors are semantically interchangeable in
encoding/decoding.  However, floor type 1 provides more stable
inter-frame behavior, and so is the preferred choice in all
coupled-stereo and high bitrate modes.  Floor 1 is also considerably
less expensive to decode than floor 0.

Floor 0 is not to be considered deprecated, but it is of limited
modern use.  No known Vorbis encoder past Xiph.Org's own beta 4 makes
use of floor 0.

The values coded/decoded by a floor are both compactly formatted and
make use of entropy coding to save space.  For this reason, a floor
configuration generally refers to multiple codebooks in the codebook
component list.  Entropy coding is thus provided as an abstraction,
and each floor instance may choose from any and all available
codebooks when coding/decoding.


\subsubsection{Residue}
The spectral residue is the fine structure of the audio spectrum
once the floor curve has been subtracted out.  In simplest terms, it
is coded in the bitstream using cascaded (multi-pass) vector
quantization according to one of three specific packing/coding
algorithms numbered 0 through 2.  The packing algorithm details are
configured by residue instance.  As with the floor components, the
final VQ/entropy encoding is provided by external codebook instances
and each residue instance may choose from any and all available
codebooks.

\subsubsection{Codebooks}

Codebooks are a self-contained abstraction that performs entropy
decoding and, optionally, uses the entropy-decoded integer value as an
offset into an index of output value vectors, returning the indicated
vector of values.

The entropy coding in a Vorbis I codebook is provided by a standard
Huffman binary tree representation.  This tree is tightly packed using
one of several methods, depending on whether codeword lengths are
ordered or unordered, or the tree is sparse.

The codebook vector index is similarly packed according to index
characteristic.  Most commonly, the vector index is encoded as a
single list of possible values that are then permuted into
a list of n-dimensional rows (lattice VQ).



\subsection{High-level Decode Process}

\subsubsection{Decode Setup}

Before decoding can begin, a decoder must initialize using the
bitstream headers matching the stream to be decoded.
Vorbis uses
three header packets; all are required, in-order, by this
specification.  Once set up, decode may begin at any audio packet
belonging to the Vorbis stream.  In Vorbis I, all packets after the
three initial headers are audio packets.

The header packets are, in order, the identification
header, the comment header, and the setup header.

\paragraph{Identification Header}
The identification header identifies the bitstream as Vorbis and gives
the Vorbis version and the simple audio characteristics of the stream, such as
sample rate and number of channels.

\paragraph{Comment Header}
The comment header includes user text comments (``tags'') and a vendor
string for the application/library that produced the bitstream.  The
encoding and proper use of the comment header is described in \xref{vorbis:spec:comment}.

\paragraph{Setup Header}
The setup header includes extensive CODEC setup information as well as
the complete VQ and Huffman codebooks needed for decode.


\subsubsection{Decode Procedure}

The decoding and synthesis procedure for all audio packets is
fundamentally the same.
\begin{enumerate}
\item decode packet type flag
\item decode mode number
\item decode window shape (long windows only)
\item decode floor
\item decode residue into residue vectors
\item inverse channel coupling of residue vectors
\item generate floor curve from decoded floor data
\item compute dot product of floor and residue, producing audio spectrum vector
\item inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I
\item overlap/add left-hand output of transform with right-hand output of previous frame
\item store right-hand data from transform of current frame for future lapping
\item if not first frame, return results of overlap/add as audio result of current frame
\end{enumerate}

Note that clever rearrangement of the synthesis arithmetic is
possible; as an example, one can take advantage of symmetries in the
MDCT to store the right-hand transform data of a partial MDCT for a
50\% inter-frame buffer space savings, and then complete the transform
later before overlap/add with the next frame.  This optimization
produces entirely equivalent output and is naturally perfectly legal.
The decoder must be \emph{entirely mathematically equivalent} to the
specification; it need not be a literal semantic implementation.

\paragraph{Packet type decode}

Vorbis I uses four packet types.  The first three packet types mark each
of the three Vorbis headers described above.  The fourth packet type
marks an audio packet.  All other packet types are reserved; packets
marked with a reserved type should be ignored.

Following the three header packets, all packets in a Vorbis I stream
are audio.
The first step of audio packet decode is to read and
verify the packet type; \emph{a non-audio packet when audio is expected
indicates stream corruption or a non-compliant stream.  The decoder
must ignore the packet and not attempt decoding it to
audio}.




\paragraph{Mode decode}
Vorbis allows an encoder to set up multiple, numbered packet 'modes',
as described earlier, all of which may be used in a given Vorbis
stream.  The mode is encoded as an integer used as a direct offset into
the mode instance index.


\paragraph{Window shape decode (long windows only)} \label{vorbis:spec:window}

Vorbis frames may be one of two PCM sample sizes specified during
codec setup.  In Vorbis I, legal frame sizes are powers of two from 64
to 8192 samples.  Aside from coupling, Vorbis handles channels as
independent vectors and these frame sizes are in samples per channel.

Vorbis uses an overlapping transform, namely the MDCT, to blend one
frame into the next, avoiding most inter-frame block boundary
artifacts.  The MDCT output of one frame is windowed according to MDCT
requirements, overlapped 50\% with the output of the previous frame and
added.  The window shape assures seamless reconstruction.
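
The seamless-reconstruction property can be checked numerically.  The
Python sketch below evaluates the Vorbis window slope function
$y = \sin(.5\pi\,\sin^2((x+.5)/n\cdot\pi))$ for an equal-sized window of
$n$ samples and verifies that the squares of two samples half a window
apart sum to one.  This power-complementary condition is what a 50\%
overlapped MDCT needs when the same window is applied on both the
analysis and synthesis sides, which is the assumption made here.  The
function names are illustrative only and do not come from this
specification.

```python
import math


def vorbis_window(n):
    """Full equal-sized Vorbis window of n samples (illustrative helper)."""
    return [
        math.sin(0.5 * math.pi * math.sin((x + 0.5) / n * math.pi) ** 2)
        for x in range(n)
    ]


def power_complementary_error(n):
    """Largest deviation of w[i]^2 + w[i + n/2]^2 from 1 over the overlap."""
    w = vorbis_window(n)
    half = n // 2
    return max(abs(w[i] ** 2 + w[i + half] ** 2 - 1.0) for i in range(half))
```

The error is zero up to floating-point rounding for any even $n$,
because the sample half a window later equals
$\cos(.5\pi\,\sin^2(\cdot))$ of the same argument.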

This is easy to visualize in the case of equal-sized windows:

\begin{center}
\includegraphics[width=\textwidth]{window1}
\captionof{figure}{overlap of two equal-sized windows}
\end{center}

And slightly more complex in the case of overlapping unequal-sized
windows:

\begin{center}
\includegraphics[width=\textwidth]{window2}
\captionof{figure}{overlap of a long and a short window}
\end{center}

In the unequal-sized window case, the window shape of the long window
must be modified for seamless lapping as above.  It is possible to
correctly infer window shape to be applied to the current window from
knowing the sizes of the current, previous and next window.  It is
legal for a decoder to use this method.  However, in the case of a long
window (short windows require no modification), Vorbis also codes two
flag bits to specify pre- and post- window shape.  Although not
strictly necessary for function, this minor redundancy allows a packet
to be fully decoded to the point of lapping entirely independently of
any other packet, allowing easier abstraction of decode layers as well
as allowing a greater level of easy parallelism in encode and
decode.

A description of valid window functions for use with an inverse MDCT
can be found in \cite{Sporer/Brandenburg/Edler}.  Vorbis windows
all use the slope function
\[ y = \sin\left(\frac{\pi}{2} \, \sin^2\left(\frac{x+0.5}{n} \, \pi\right)\right) . \]



\paragraph{floor decode}
Each floor is encoded/decoded in channel order; however, each floor
belongs to a 'submap' that specifies which floor configuration to
use.
All floors are decoded before residue decode begins.


\paragraph{residue decode}

Although the number of residue vectors equals the number of channels,
channel coupling may mean that the raw residue vectors extracted
during decode do not map directly to specific channels.  When channel
coupling is in use, some vectors will correspond to coupled magnitude
or angle.  The coupling relationships are described in the codec setup
and may differ from frame to frame, due to different mode numbers.

Vorbis codes residue vectors in groups by submap; the coding is done
in submap order from submap 0 through n-1.  This differs from floors
which are coded using a configuration provided by submap number, but
are coded individually in channel order.



\paragraph{inverse channel coupling}

A detailed discussion of stereo in the Vorbis codec can be found in
the document \href{stereo.html}{Stereo Channel Coupling in the
Vorbis CODEC}.  Vorbis is not limited to only stereo coupling, but
the stereo document also gives a good overview of the generic coupling
mechanism.

Vorbis coupling applies to pairs of residue vectors at a time;
decoupling is done in-place a pair at a time in the order and using
the vectors specified in the current mapping configuration.  The
decoupling operation is the same for all pairs, converting square
polar representation (where one vector is magnitude and the second
angle) back to Cartesian representation.

After decoupling, in order, each pair of vectors on the coupling list,
the resulting residue vectors represent the fine spectral detail
of each output channel.
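
To make the square polar to Cartesian conversion concrete, the
following Python sketch decouples one magnitude/angle pair of residue
vectors in place.  The per-element rule shown is our reading of the
decoupling step given in the codec decode section of this
specification; that section remains authoritative, and the function
names here are hypothetical.

```python
def decouple_element(m, a):
    """Convert one (magnitude, angle) element pair to Cartesian values."""
    if m > 0:
        if a > 0:
            return m, m - a
        return m + a, m
    if a > 0:
        return m, m + a
    return m - a, m


def decouple_in_place(magnitude, angle):
    """Decouple two equal-length residue vectors element by element."""
    for i in range(len(magnitude)):
        magnitude[i], angle[i] = decouple_element(magnitude[i], angle[i])
```

In this sketch the lists passed as \texttt{magnitude} and
\texttt{angle} hold the two output channel vectors after the call; a
zero angle leaves both channels equal to the magnitude.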
cannam@86: cannam@86: cannam@86: cannam@86: \paragraph{generate floor curve} cannam@86: cannam@86: The decoder may choose to generate the floor curve at any appropriate cannam@86: time. It is reasonable to generate the output curve when the floor cannam@86: data is decoded from the raw packet, or it can be generated after cannam@86: inverse coupling and applied to the spectral residue directly, cannam@86: combining generation and the dot product into one step and eliminating cannam@86: some working space. cannam@86: cannam@86: Both floor 0 and floor 1 generate a linear-range, linear-domain output cannam@86: vector to be multiplied (dot product) by the linear-range, cannam@86: linear-domain spectral residue. cannam@86: cannam@86: cannam@86: cannam@86: \paragraph{compute floor/residue dot product} cannam@86: cannam@86: This step is straightforward; for each output channel, the decoder cannam@86: multiplies the floor curve and residue vectors element by element, cannam@86: producing the finished audio spectrum of each channel. cannam@86: cannam@86: % TODO/FIXME: The following two paragraphs have identical twins cannam@86: % in section 4 (under "dot product") cannam@86: One point is worth mentioning about this dot product; a common mistake cannam@86: in a fixed point implementation might be to assume that a 32 bit cannam@86: fixed-point representation for floor and residue and direct cannam@86: multiplication of the vectors is sufficient for acceptable spectral cannam@86: depth in all cases because it happens to mostly work with the current cannam@86: Xiph.Org reference encoder. cannam@86: cannam@86: However, floor vector values can span \~{}140dB (\~{}24 bits unsigned), and cannam@86: the audio spectrum vector should represent a minimum of 120dB (\~{}21 cannam@86: bits with sign), even when output is to a 16 bit PCM device. For the cannam@86: residue vector to represent full scale if the floor is nailed to cannam@86: $-140$dB, it must be able to span 0 to $+140$dB. 
For the residue vector
to reach full scale if the floor is nailed at 0dB, it must be able to
represent $-140$dB to $+0$dB.  Thus, in order to handle full range
dynamics, a residue vector may span $-140$dB to $+140$dB entirely within
spec.  A 280dB range is approximately 48 bits with sign; thus the
residue vector must be able to represent a 48 bit range and the dot
product must be able to handle an effective 48 bit times 24 bit
multiplication.  This range may be achieved by using large (64 bit or
larger) integers, or by implementing a movable binary point
representation.



\paragraph{inverse monolithic transform (MDCT)}

The audio spectrum is converted back into time domain PCM audio via an
inverse Modified Discrete Cosine Transform (MDCT).  A detailed
description of the MDCT is available in \cite{Sporer/Brandenburg/Edler}.

Note that the PCM produced directly from the MDCT is not yet finished
audio; it must be lapped with surrounding frames using an appropriate
window (such as the Vorbis window) before the MDCT can be considered
orthogonal.



\paragraph{overlap/add data}
Windowed MDCT output is overlapped and added with the right hand data
of the previous window such that the 3/4 point of the previous window
is aligned with the 1/4 point of the current window (as illustrated in
the window overlap diagram).  At this point, the audio data between the
center of the previous frame and the center of the current frame is
now finished and ready to be returned.


\paragraph{cache right hand data}
The decoder must cache the right hand portion of the current frame to
be lapped with the left hand portion of the next frame.



\paragraph{return finished audio data}

The overlapped portion produced from overlapping the previous and
current frame data is finished data to be returned by the decoder.
This data spans from the center of the previous window to the center
of the current window.  In the case of same-sized windows, the amount
of data to return is one-half block consisting of and only of the
overlapped portions.  When overlapping a short and long window, much of
the returned range is not actually overlap.  This does not damage
transform orthogonality.  Pay attention however to returning the
correct data range; the amount of data to be returned is:

\begin{Verbatim}[commandchars=\\\{\}]
window\_blocksize(previous\_window)/4+window\_blocksize(current\_window)/4
\end{Verbatim}

from the center of the previous window to the center of the current
window.

Data is not returned from the first frame; it must be used to 'prime'
the decode engine.  The encoder accounts for this priming when
calculating PCM offsets; after the first frame, the proper PCM output
offset is '0' (as no data has been returned yet).
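
The lapping and return range described above can be sketched in Python
as follows.  This is a minimal illustration and not the reference
implementation: it assumes both inputs are already windowed (so a long
window is zero outside its overlap with a short neighbour), that
\texttt{prev\_right} is the cached right half of the previous frame
and \texttt{cur\_left} is the left half of the current frame, and that
samples are plain Python lists.  The function name
\texttt{lap\_and\_return} is hypothetical.

```python
def lap_and_return(prev_right, cur_left):
    """Overlap/add two windowed half-frames and return the finished samples.

    The 3/4 point of the previous window is aligned with the 1/4 point of
    the current window.  The result runs from the center of the previous
    window to the center of the current window, so its length is
    blocksize(previous)/4 + blocksize(current)/4.
    """
    half_prev, half_cur = len(prev_right), len(cur_left)
    quarter_prev, quarter_cur = half_prev // 2, half_cur // 2
    out = []
    for i in range(quarter_prev + quarter_cur):
        # Output index 0 is the center of the previous window.
        p = prev_right[i] if i < half_prev else 0.0
        # Shift so that index quarter_cur of cur_left meets the alignment point.
        j = i - quarter_prev + quarter_cur
        c = cur_left[j] if 0 <= j < half_cur else 0.0
        out.append(p + c)
    return out
```

For two 2048-sample windows the function receives two 1024-sample
halves and returns 1024 samples; for a 2048-sample window followed by
a 256-sample window it returns $512 + 64 = 576$ samples.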