% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Codec Setup and Packet Decode} \label{vorbis:spec:codec}

\subsection{Overview}

This document serves as the top-level reference document for the
bit-by-bit decode specification of Vorbis I. This document assumes a
high-level understanding of the Vorbis decode process, which is
provided in \xref{vorbis:spec:intro}. \xref{vorbis:spec:bitpacking} covers reading and writing bit fields from
and to bitstream packets.



\subsection{Header decode and decode setup}

A Vorbis bitstream begins with three header packets. The header
packets are, in order, the identification header, the comment header,
and the setup header. All are required for decode compliance. An
end-of-packet condition while decoding the first or third header
packet renders the stream undecodable. An end-of-packet condition while
decoding the comment header is a non-fatal error condition.

\subsubsection{Common header decode}

Each header packet begins with the same header fields.


\begin{Verbatim}[commandchars=\\\{\}]
  1) [packet\_type] : 8 bit value
  2) 0x76, 0x6f, 0x72, 0x62, 0x69, 0x73: the characters 'v','o','r','b','i','s' as six octets
\end{Verbatim}

Decode continues according to packet type; the identification header
is type 1, the comment header type 3 and the setup header type 5
(these types are all odd as a packet with a leading single bit of '0'
is an audio packet). The packets must occur in the order of
identification, comment, setup.
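
The common header check described above can be sketched in C as follows. This is a minimal illustration, not part of the specification: the function name \texttt{vorbis\_common\_header\_type} and its return convention are assumptions made for this example. It relies on the first bit read from a packet being the least significant bit of the first octet, as described in \xref{vorbis:spec:bitpacking}.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not defined by the specification).
 * Returns 1, 3 or 5 for a well-formed header packet start,
 * 0 if the leading bit marks the packet as audio,
 * and -1 if the packet is too short or malformed. */
static int vorbis_common_header_type(const unsigned char *pkt, size_t len)
{
    /* 'v','o','r','b','i','s' as six octets */
    static const unsigned char magic[6] = {0x76, 0x6f, 0x72, 0x62, 0x69, 0x73};

    if (len < 1)
        return -1;
    if ((pkt[0] & 1) == 0)      /* leading single bit of 0: audio packet */
        return 0;
    if (len < 7)                /* 1 type octet + 6 magic octets */
        return -1;
    if (memcmp(pkt + 1, magic, sizeof magic) != 0)
        return -1;
    if (pkt[0] != 1 && pkt[0] != 3 && pkt[0] != 5)
        return -1;              /* odd, but not a known header type */
    return pkt[0];
}
```

A caller would additionally track that types 1, 3 and 5 arrive in that order, since this sketch inspects one packet in isolation.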



\subsubsection{Identification header}

The identification header is a short header of only a few fields used
to declare the stream definitively as Vorbis, and provide a few externally
relevant pieces of information about the audio stream. The
identification header is coded as follows:

\begin{Verbatim}[commandchars=\\\{\}]
 1) [vorbis\_version] = read 32 bits as unsigned integer
 2) [audio\_channels] = read 8 bits as unsigned integer
 3) [audio\_sample\_rate] = read 32 bits as unsigned integer
 4) [bitrate\_maximum] = read 32 bits as signed integer
 5) [bitrate\_nominal] = read 32 bits as signed integer
 6) [bitrate\_minimum] = read 32 bits as signed integer
 7) [blocksize\_0] = 2 exponent (read 4 bits as unsigned integer)
 8) [blocksize\_1] = 2 exponent (read 4 bits as unsigned integer)
 9) [framing\_flag] = read one bit
\end{Verbatim}

\varname{[vorbis\_version]} is to read '0' in order to be compatible
with this document. Both \varname{[audio\_channels]} and
\varname{[audio\_sample\_rate]} must read greater than zero. Allowed final
blocksize values are 64, 128, 256, 512, 1024, 2048, 4096 and 8192 in
Vorbis I. \varname{[blocksize\_0]} must be less than or equal to
\varname{[blocksize\_1]}. The framing bit must be nonzero. Failure to
meet any of these conditions renders a stream undecodable.

The bitrate fields above are used only as hints. The nominal bitrate
field especially may be considerably off in purely VBR streams. The
fields are meaningful only when greater than zero.
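
The field layout and validity rules of the identification header can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not a normative implementation: the struct \texttt{vorbis\_ident}, the helper \texttt{read\_le32} and the function \texttt{vorbis\_parse\_ident} are names invented for this example. It assumes the caller has already consumed the seven octets of the common header, and it uses the fact that every field except the two blocksize nibbles and the framing bit is octet-aligned, so that a 32 bit read is equivalent to a little-endian load.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
typedef struct {
    uint32_t version;
    uint8_t  channels;
    uint32_t sample_rate;
    int32_t  bitrate_maximum, bitrate_nominal, bitrate_minimum;
    uint32_t blocksize_0, blocksize_1;
} vorbis_ident;

/* Byte-aligned 32 bit read in Vorbis bit order (little-endian). */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* 'body' points just past the common header.
 * Returns 0 on success, -1 if the stream is undecodable. */
static int vorbis_parse_ident(const unsigned char *body, size_t len,
                              vorbis_ident *out)
{
    unsigned exp0, exp1;

    if (len < 23)                       /* end-of-packet: undecodable */
        return -1;
    out->version     = read_le32(body);
    out->channels    = body[4];
    out->sample_rate = read_le32(body + 5);
    /* Conversion to signed assumes two's complement, true on common targets. */
    out->bitrate_maximum = (int32_t)read_le32(body + 9);
    out->bitrate_nominal = (int32_t)read_le32(body + 13);
    out->bitrate_minimum = (int32_t)read_le32(body + 17);
    exp0 = body[21] & 0x0f;             /* first 4 bits read: low nibble */
    exp1 = body[21] >> 4;               /* next 4 bits: high nibble */

    if (out->version != 0 || out->channels == 0 || out->sample_rate == 0)
        return -1;
    if (exp0 < 6 || exp0 > 13 || exp1 < 6 || exp1 > 13)
        return -1;                      /* allowed sizes are 64 .. 8192 */
    if (exp0 > exp1)
        return -1;                      /* blocksize_0 <= blocksize_1 */
    if ((body[22] & 1) == 0)
        return -1;                      /* framing bit must be set */
    out->blocksize_0 = 1u << exp0;
    out->blocksize_1 = 1u << exp1;
    return 0;
}
```

The bitrate fields are stored without validation because they are hints only; their interpretation is summarized in the list that follows.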

\begin{itemize}
 \item All three fields set to the same value implies a fixed rate, or tightly bounded, nearly fixed-rate bitstream
 \item Only nominal set implies a VBR or ABR stream that averages the nominal bitrate
 \item Maximum and/or minimum set implies a VBR bitstream that obeys the bitrate limits
 \item None set indicates the encoder does not care to speculate.
\end{itemize}




\subsubsection{Comment header}
Comment header decode and data specification is covered in
\xref{vorbis:spec:comment}.


\subsubsection{Setup header}

Vorbis codec setup is configurable to an extreme degree:

\begin{center}
\includegraphics[width=\textwidth]{components}
\captionof{figure}{decoder pipeline configuration}
\end{center}


The setup header contains the bulk of the codec setup information
needed for decode. The setup header contains, in order, the lists of
codebook configurations, time-domain transform configurations
(placeholders in Vorbis I), floor configurations, residue
configurations, channel mapping configurations and mode
configurations. It finishes with a framing bit of '1'. Header decode
proceeds in the following order:

\paragraph{Codebooks}

\begin{enumerate}
\item \varname{[vorbis\_codebook\_count]} = read eight bits as unsigned integer and add one
\item Decode \varname{[vorbis\_codebook\_count]} codebooks in order as defined
in \xref{vorbis:spec:codebook}. Save each configuration, in
order, in an array of
codebook configurations \varname{[vorbis\_codebook\_configurations]}.
\end{enumerate}



\paragraph{Time domain transforms}

These hooks are placeholders in Vorbis I. Nevertheless, the
configuration placeholder values must be read to maintain bitstream
sync.

\begin{enumerate}
\item \varname{[vorbis\_time\_count]} = read 6 bits as unsigned integer and add one
\item read \varname{[vorbis\_time\_count]} 16 bit values; each value should be zero. If any value is nonzero, this is an error condition and the stream is undecodable.
\end{enumerate}



\paragraph{Floors}

Vorbis uses two floor types; header decode is handed to the decode
abstraction of the appropriate type.

\begin{enumerate}
 \item \varname{[vorbis\_floor\_count]} = read 6 bits as unsigned integer and add one
 \item For each \varname{[i]} of \varname{[vorbis\_floor\_count]} floor numbers:
  \begin{enumerate}
   \item read the floor type: vector \varname{[vorbis\_floor\_types]} element \varname{[i]} =
read 16 bits as unsigned integer
   \item If the floor type is zero, decode the floor
configuration as defined in \xref{vorbis:spec:floor0}; save
this
configuration in slot \varname{[i]} of the floor configuration array \varname{[vorbis\_floor\_configurations]}.
   \item If the floor type is one,
decode the floor configuration as defined in \xref{vorbis:spec:floor1}; save this configuration in slot \varname{[i]} of the floor configuration array \varname{[vorbis\_floor\_configurations]}.
   \item If the floor type is greater than one, this stream is undecodable; ERROR CONDITION
  \end{enumerate}

\end{enumerate}



\paragraph{Residues}

Vorbis uses three residue types; header decode of each type is identical.


\begin{enumerate}
\item \varname{[vorbis\_residue\_count]} = read 6 bits as unsigned integer and add one

\item For each \varname{[i]} of \varname{[vorbis\_residue\_count]} residue numbers:
 \begin{enumerate}
  \item read the residue type; vector \varname{[vorbis\_residue\_types]} element \varname{[i]} = read 16 bits as unsigned integer
  \item If the residue type is zero,
one or two, decode the residue configuration as defined in \xref{vorbis:spec:residue}; save this configuration in slot \varname{[i]} of the residue configuration array \varname{[vorbis\_residue\_configurations]}.
  \item If the residue type is greater than two, this stream is undecodable; ERROR CONDITION
 \end{enumerate}

\end{enumerate}



\paragraph{Mappings}

Mappings are used to set up specific pipelines for encoding
multichannel audio with varying channel mapping applications. Vorbis I
uses a single mapping type (0), with implicit PCM channel mappings.

% FIXME/TODO: LaTeX cannot nest enumerate that deeply, so I have to use
% itemize at the innermost level. However, it would be much better to
% rewrite this pseudocode using listings or algorithmicx or some other
% package geared towards this.
\begin{enumerate}
 \item \varname{[vorbis\_mapping\_count]} = read 6 bits as unsigned integer and add one
 \item For each \varname{[i]} of \varname{[vorbis\_mapping\_count]} mapping numbers:
  \begin{enumerate}
   \item read the mapping type: 16 bits as unsigned integer. There's no reason to save the mapping type in Vorbis I.
   \item If the mapping type is nonzero, the stream is undecodable
   \item If the mapping type is zero:
    \begin{enumerate}
     \item read 1 bit as a boolean flag
      \begin{enumerate}
       \item if set, \varname{[vorbis\_mapping\_submaps]} = read 4 bits as unsigned integer and add one
       \item if unset, \varname{[vorbis\_mapping\_submaps]} = 1
      \end{enumerate}


     \item read 1 bit as a boolean flag
       \begin{enumerate}
         \item if set, square polar channel mapping is in use:
           \begin{itemize}
             \item \varname{[vorbis\_mapping\_coupling\_steps]} = read 8 bits as unsigned integer and add one
             \item for each \varname{[j]} of \varname{[vorbis\_mapping\_coupling\_steps]} steps:
               \begin{itemize}
                 \item vector \varname{[vorbis\_mapping\_magnitude]} element \varname{[j]}= read \link{vorbis:spec:ilog}{ilog}(\varname{[audio\_channels]} - 1) bits as unsigned integer
                 \item vector \varname{[vorbis\_mapping\_angle]} element \varname{[j]}= read \link{vorbis:spec:ilog}{ilog}(\varname{[audio\_channels]} - 1) bits as unsigned integer
                 \item the numbers read in the above two steps are channel numbers representing the channel to treat as magnitude and the channel to treat as angle, respectively.
If for any coupling step the angle channel number equals the magnitude channel number, the magnitude channel number is greater than \varname{[audio\_channels]}-1, or the angle channel is greater than \varname{[audio\_channels]}-1, the stream is undecodable.
               \end{itemize}


           \end{itemize}


         \item if unset, \varname{[vorbis\_mapping\_coupling\_steps]} = 0
       \end{enumerate}


     \item read 2 bits (reserved field); if the value is nonzero, the stream is undecodable
     \item if \varname{[vorbis\_mapping\_submaps]} is greater than one, we read channel multiplex settings. For each \varname{[j]} of \varname{[audio\_channels]} channels:
      \begin{enumerate}
       \item vector \varname{[vorbis\_mapping\_mux]} element \varname{[j]} = read 4 bits as unsigned integer
       \item if the value is greater than the highest numbered submap (\varname{[vorbis\_mapping\_submaps]} - 1), this is an error condition rendering the stream undecodable
      \end{enumerate}

     \item for each submap \varname{[j]} of \varname{[vorbis\_mapping\_submaps]} submaps, read the floor and residue numbers for use in decoding that submap:
      \begin{enumerate}
       \item read and discard 8 bits (the unused time configuration placeholder)
       \item read 8 bits as unsigned integer for the floor number; save in vector \varname{[vorbis\_mapping\_submap\_floor]} element \varname{[j]}
       \item verify the floor number is not greater than the highest number floor configured for the bitstream. If it is, the bitstream is undecodable
       \item read 8 bits as unsigned integer for the residue number; save in vector \varname{[vorbis\_mapping\_submap\_residue]} element \varname{[j]}
       \item verify the residue number is not greater than the highest number residue configured for the bitstream.
If it is, the bitstream is undecodable
      \end{enumerate}

     \item save this mapping configuration in slot \varname{[i]} of the mapping configuration array \varname{[vorbis\_mapping\_configurations]}.
    \end{enumerate}

  \end{enumerate}

\end{enumerate}



\paragraph{Modes}

\begin{enumerate}
 \item \varname{[vorbis\_mode\_count]} = read 6 bits as unsigned integer and add one
 \item For each \varname{[i]} of \varname{[vorbis\_mode\_count]} mode numbers:
  \begin{enumerate}
   \item \varname{[vorbis\_mode\_blockflag]} = read 1 bit
   \item \varname{[vorbis\_mode\_windowtype]} = read 16 bits as unsigned integer
   \item \varname{[vorbis\_mode\_transformtype]} = read 16 bits as unsigned integer
   \item \varname{[vorbis\_mode\_mapping]} = read 8 bits as unsigned integer
   \item verify ranges; zero is the only legal value in Vorbis I for
\varname{[vorbis\_mode\_windowtype]}
and \varname{[vorbis\_mode\_transformtype]}. \varname{[vorbis\_mode\_mapping]} must not be greater than the highest number mapping in use. Any illegal values render the stream undecodable.
   \item save this mode configuration in slot \varname{[i]} of the mode configuration array
\varname{[vorbis\_mode\_configurations]}.
  \end{enumerate}

 \item read 1 bit as a framing flag. If unset, a framing error occurred and the stream is not
decodable.
\end{enumerate}

After reading mode descriptions, setup header decode is complete.




\subsection{Audio packet decode and synthesis}

Following the three header packets, all packets in a Vorbis I stream
are audio.
The first step of audio packet decode is to read and
verify the packet type. \emph{A non-audio packet when audio is expected
indicates stream corruption or a non-compliant stream. The decoder
must ignore the packet and not attempt decoding it to audio}.


\subsubsection{packet type, mode and window decode}

\begin{enumerate}
 \item read 1 bit \varname{[packet\_type]}; check that packet type is 0 (audio)
 \item read \link{vorbis:spec:ilog}{ilog}([vorbis\_mode\_count]-1) bits
\varname{[mode\_number]}
 \item decode blocksize \varname{[n]} is equal to \varname{[blocksize\_0]} if
\varname{[vorbis\_mode\_blockflag]} is 0, else \varname{[n]} is equal to \varname{[blocksize\_1]}.
 \item perform window selection and setup; this window is used later by the inverse MDCT:
  \begin{enumerate}
   \item if this is a long window (the \varname{[vorbis\_mode\_blockflag]} flag of this mode is
set):
    \begin{enumerate}
     \item read 1 bit for \varname{[previous\_window\_flag]}
     \item read 1 bit for \varname{[next\_window\_flag]}
     \item if \varname{[previous\_window\_flag]} is not set, the left half
         of the window will be a hybrid window for lapping with a
         short block. See \xref{vorbis:spec:window} for an illustration of overlapping
dissimilar
         windows. Else, the left half window will have normal long
         shape.
     \item if \varname{[next\_window\_flag]} is not set, the right half of
         the window will be a hybrid window for lapping with a short
         block. See \xref{vorbis:spec:window} for an
illustration of overlapping dissimilar
         windows. Else, the right half window will have normal long
         shape.
    \end{enumerate}

   \item  if this is a short window, the window is always the same
       short-window shape.
  \end{enumerate}

\end{enumerate}

Vorbis windows all use the slope function $y=\sin(\frac{\pi}{2} * \sin^2((x+0.5)/n * \pi))$,
where $n$ is window size and $x$ ranges $0 \ldots n-1$, but dissimilar
lapping requirements can affect overall shape. Window generation
proceeds as follows:

\begin{enumerate}
 \item  \varname{[window\_center]} = \varname{[n]} / 2
 \item  if (\varname{[vorbis\_mode\_blockflag]} is set and \varname{[previous\_window\_flag]} is
not set) then
  \begin{enumerate}
   \item \varname{[left\_window\_start]} = \varname{[n]}/4 -
\varname{[blocksize\_0]}/4
   \item \varname{[left\_window\_end]} = \varname{[n]}/4 + \varname{[blocksize\_0]}/4
   \item \varname{[left\_n]} = \varname{[blocksize\_0]}/2
  \end{enumerate}
 else
  \begin{enumerate}
   \item \varname{[left\_window\_start]} = 0
   \item \varname{[left\_window\_end]} = \varname{[window\_center]}
   \item \varname{[left\_n]} = \varname{[n]}/2
  \end{enumerate}

 \item  if (\varname{[vorbis\_mode\_blockflag]} is set and \varname{[next\_window\_flag]} is not
set) then
  \begin{enumerate}
   \item \varname{[right\_window\_start]} = \varname{[n]*3}/4 -
\varname{[blocksize\_0]}/4
   \item \varname{[right\_window\_end]} = \varname{[n]*3}/4 +
\varname{[blocksize\_0]}/4
   \item \varname{[right\_n]} = \varname{[blocksize\_0]}/2
  \end{enumerate}
 else
  \begin{enumerate}
   \item \varname{[right\_window\_start]} = \varname{[window\_center]}
   \item \varname{[right\_window\_end]} = \varname{[n]}
   \item \varname{[right\_n]} = \varname{[n]}/2
  \end{enumerate}

 \item  window from range 0 ... \varname{[left\_window\_start]}-1 inclusive is zero
 \item  for \varname{[i]} in range \varname{[left\_window\_start]} ...
\varname{[left\_window\_end]}-1, window(\varname{[i]}) = $\sin(\frac{\pi}{2} * \sin^2($ (\varname{[i]}-\varname{[left\_window\_start]}+0.5) / \varname{[left\_n]} $* \frac{\pi}{2})$ )
 \item  window from range \varname{[left\_window\_end]} ... \varname{[right\_window\_start]}-1
inclusive is one
 \item  for \varname{[i]} in range \varname{[right\_window\_start]} ... \varname{[right\_window\_end]}-1, window(\varname{[i]}) = $\sin(\frac{\pi}{2} * \sin^2($ (\varname{[i]}-\varname{[right\_window\_start]}+0.5) / \varname{[right\_n]} $ * \frac{\pi}{2} + \frac{\pi}{2})$ )
 \item  window from range \varname{[right\_window\_end]} ... \varname{[n]}-1 is
zero
\end{enumerate}

An end-of-packet condition up to this point should be considered an
error that discards this packet from the stream. An end of packet
condition past this point is to be considered a possible nominal
occurrence.



\subsubsection{floor curve decode}

From this point on, we assume our decode context is using mode number
\varname{[mode\_number]} from configuration array
\varname{[vorbis\_mode\_configurations]} and the map number
\varname{[vorbis\_mode\_mapping]} (specified by the current mode) taken
from the mapping configuration array
\varname{[vorbis\_mapping\_configurations]}.

Floor curves are decoded one-by-one in channel order.

For each floor \varname{[i]} of \varname{[audio\_channels]}
 \begin{enumerate}
  \item \varname{[submap\_number]} = element \varname{[i]} of vector \varname{[vorbis\_mapping\_mux]}
  \item \varname{[floor\_number]} = element \varname{[submap\_number]} of vector
\varname{[vorbis\_mapping\_submap\_floor]}
  \item if the floor type of this
floor (vector \varname{[vorbis\_floor\_types]} element
\varname{[floor\_number]}) is zero then decode the floor for
channel \varname{[i]} according to the
\xref{vorbis:spec:floor0-decode}
  \item if the type of this floor
is one then decode the floor for channel \varname{[i]} according
to the \xref{vorbis:spec:floor1-decode}
  \item save the needed decoded floor information for channel \varname{[i]} for later synthesis
  \item if the decoded floor returned 'unused', set vector \varname{[no\_residue]} element
\varname{[i]} to true, else set vector \varname{[no\_residue]} element \varname{[i]} to
false
 \end{enumerate}


An end-of-packet condition during floor decode shall result in packet
decode zeroing all channel output vectors and skipping to the
add/overlap output stage.



\subsubsection{nonzero vector propagate}

A possible result of floor decode is that a specific vector is marked
'unused', which indicates that the final output vector is all-zero
values (and the floor is zero). The residue for that vector is not
coded in the stream, save for one complication. If some vectors are
used and some are not, channel coupling could result in mixing a
zeroed and nonzeroed vector to produce two nonzeroed vectors.

for each \varname{[i]} from 0 ...
\varname{[vorbis\_mapping\_coupling\_steps]}-1

\begin{enumerate}
 \item if either \varname{[no\_residue]} entry for channel
(\varname{[vorbis\_mapping\_magnitude]} element \varname{[i]})
or channel
(\varname{[vorbis\_mapping\_angle]} element \varname{[i]})
is set to false, then both must be set to false. Note that an 'unused'
floor has no decoded floor information; it is important that this is
remembered at floor curve synthesis time.
\end{enumerate}




\subsubsection{residue decode}

Unlike floors, which are decoded in channel order, the residue vectors
are decoded in submap order.

for each submap \varname{[i]} in order from 0 ... \varname{[vorbis\_mapping\_submaps]}-1

\begin{enumerate}
 \item \varname{[ch]} = 0
 \item for each channel \varname{[j]} in order from 0 ...
\varname{[audio\_channels]} - 1
  \begin{enumerate}
   \item if channel \varname{[j]} is in submap \varname{[i]} (vector \varname{[vorbis\_mapping\_mux]} element \varname{[j]} is equal to \varname{[i]})
    \begin{enumerate}
     \item if vector \varname{[no\_residue]} element \varname{[j]} is true
      \begin{enumerate}
       \item vector \varname{[do\_not\_decode\_flag]} element \varname{[ch]} is set
      \end{enumerate}
     else
      \begin{enumerate}
       \item vector \varname{[do\_not\_decode\_flag]} element \varname{[ch]} is unset
      \end{enumerate}

     \item increment \varname{[ch]}
    \end{enumerate}

  \end{enumerate}
 \item \varname{[residue\_number]} = vector \varname{[vorbis\_mapping\_submap\_residue]} element \varname{[i]}
 \item \varname{[residue\_type]} = vector \varname{[vorbis\_residue\_types]} element \varname{[residue\_number]}
 \item decode \varname{[ch]} vectors using residue \varname{[residue\_number]}, according to type \varname{[residue\_type]}, also passing vector \varname{[do\_not\_decode\_flag]} to indicate which vectors in the bundle should not be decoded. Correct per-vector decode length is \varname{[n]}/2.
 \item \varname{[ch]} = 0
 \item for each channel \varname{[j]} in order from 0 ...
\varname{[audio\_channels]} - 1
  \begin{enumerate}
   \item if channel \varname{[j]} is in submap \varname{[i]} (vector \varname{[vorbis\_mapping\_mux]} element \varname{[j]} is equal to \varname{[i]})
    \begin{enumerate}
     \item residue vector for channel \varname{[j]} is set to decoded residue vector \varname{[ch]}
     \item increment \varname{[ch]}
    \end{enumerate}

  \end{enumerate}

\end{enumerate}



\subsubsection{inverse coupling}

for each \varname{[i]} from \varname{[vorbis\_mapping\_coupling\_steps]}-1 descending to 0

\begin{enumerate}
 \item \varname{[magnitude\_vector]} = the residue vector for channel
(vector \varname{[vorbis\_mapping\_magnitude]} element \varname{[i]})
 \item \varname{[angle\_vector]} = the residue vector for channel (vector
\varname{[vorbis\_mapping\_angle]} element \varname{[i]})
 \item for each scalar value \varname{[M]} in vector \varname{[magnitude\_vector]} and the corresponding scalar value \varname{[A]} in vector \varname{[angle\_vector]}:
  \begin{enumerate}
   \item if (\varname{[M]} is greater than zero)
    \begin{enumerate}
     \item if (\varname{[A]} is greater than zero)
      \begin{enumerate}
       \item \varname{[new\_M]} = \varname{[M]}
       \item \varname{[new\_A]} = \varname{[M]}-\varname{[A]}
      \end{enumerate}
     else
      \begin{enumerate}
       \item \varname{[new\_A]} = \varname{[M]}
       \item \varname{[new\_M]} = \varname{[M]}+\varname{[A]}
      \end{enumerate}

    \end{enumerate}
   else
    \begin{enumerate}
     \item if (\varname{[A]} is greater than zero)
      \begin{enumerate}
       \item \varname{[new\_M]} = \varname{[M]}
       \item
\varname{[new\_A]} = \varname{[M]}+\varname{[A]}
      \end{enumerate}
     else
      \begin{enumerate}
       \item \varname{[new\_A]} = \varname{[M]}
       \item \varname{[new\_M]} = \varname{[M]}-\varname{[A]}
      \end{enumerate}

    \end{enumerate}

   \item set scalar value \varname{[M]} in vector \varname{[magnitude\_vector]} to \varname{[new\_M]}
   \item set scalar value \varname{[A]} in vector \varname{[angle\_vector]} to \varname{[new\_A]}
  \end{enumerate}

\end{enumerate}




\subsubsection{dot product}

For each channel, synthesize the floor curve from the decoded floor
information, according to packet type. Note that the vector synthesis
length for floor computation is \varname{[n]}/2.

For each channel, multiply each element of the floor curve by the
corresponding element of that channel's residue vector. The result is the dot
product of the floor and residue vectors for each channel; the produced
vectors are the length \varname{[n]}/2 audio spectrum for each
channel.

% TODO/FIXME: The following two paragraphs have identical twins
%   in section 1 (under "compute floor/residue dot product")
One point is worth mentioning about this dot product; a common mistake
in a fixed point implementation might be to assume that a 32 bit
fixed-point representation for floor and residue and direct
multiplication of the vectors is sufficient for acceptable spectral
depth in all cases because it happens to mostly work with the current
Xiph.Org reference encoder.

However, floor vector values can span \~140dB (\~24 bits unsigned), and
the audio spectrum vector should represent a minimum of 120dB (\~21
bits with sign), even when output is to a 16 bit PCM device. For the
residue vector to represent full scale if the floor is nailed to
$-140$dB, it must be able to span 0 to $+140$dB. For the residue vector
to reach full scale if the floor is nailed at 0dB, it must be able to
represent $-140$dB to $+0$dB. Thus, in order to handle full range
dynamics, a residue vector may span $-140$dB to $+140$dB entirely within
spec. A 280dB range is approximately 48 bits with sign; thus the
residue vector must be able to represent a 48 bit range and the dot
product must be able to handle an effective 48 bit times 24 bit
multiplication. This range may be achieved using large (64 bit or
larger) integers, or implementing a movable binary point
representation.



\subsubsection{inverse MDCT}

Convert the audio spectrum vector of each channel back into time
domain PCM audio via an inverse Modified Discrete Cosine Transform
(MDCT). A detailed description of the MDCT is available in \cite{Sporer/Brandenburg/Edler}. The window
function used for the MDCT is the function described earlier.



\subsubsection{overlap\_add}

Windowed MDCT output is overlapped and added with the right hand data
of the previous window such that the 3/4 point of the previous window
is aligned with the 1/4 point of the current window (as illustrated in
\xref{vorbis:spec:window}).
The overlapped portion
produced from overlapping the previous and current frame data is
finished data to be returned by the decoder. This data spans from the
center of the previous window to the center of the current window. In
the case of same-sized windows, the amount of data to return is
one-half block consisting of and only of the overlapped portions. When
overlapping a short and long window, much of the returned range does not
actually overlap. This does not damage transform orthogonality. Pay
attention however to returning the correct data range; the amount of
data to be returned is:

\begin{programlisting}
window\_blocksize(previous\_window)/4+window\_blocksize(current\_window)/4
\end{programlisting}

from the center (element windowsize/2) of the previous window to the
center (element windowsize/2-1, inclusive) of the current window.

Data is not returned from the first frame; it must be used to 'prime'
the decode engine. The encoder accounts for this priming when
calculating PCM offsets; after the first frame, the proper PCM output
offset is '0' (as no data has been returned yet).



\subsubsection{output channel order}

Vorbis I specifies only a channel mapping type 0. In mapping type 0,
channel mapping is implicitly defined as follows for standard audio
applications. As of revision 16781 (20100113), the specification adds
defined channel locations for 6.1 and 7.1 surround. Ordering/location
for greater-than-eight channels remains 'left to the implementation'.

These channel orderings refer to order within the encoded stream.
It
is naturally possible for a decoder to produce output with channels in
any order. Any such decoder should explicitly document channel
reordering behavior.

\begin{description} %[style=nextline]
 \item[one channel]
	the stream is monophonic

 \item[two channels]
	the stream is stereo.  channel order: left, right

 \item[three channels]
	the stream is a 1d-surround encoding.  channel order: left,
center, right

 \item[four channels]
	the stream is quadraphonic surround.  channel order: front left,
front right, rear left, rear right

 \item[five channels]
	the stream is five-channel surround.  channel order: front left,
center, front right, rear left, rear right

 \item[six channels]
	the stream is 5.1 surround.  channel order: front left, center,
front right, rear left, rear right, LFE

 \item[seven channels]
	the stream is 6.1 surround.  channel order: front left, center,
front right, side left, side right, rear center, LFE

 \item[eight channels]
	the stream is 7.1 surround.  channel order: front left, center,
front right, side left, side right, rear left, rear right,
LFE

 \item[greater than eight channels]
	channel use and order is defined by the application

\end{description}

Applications using Vorbis for dedicated purposes may define channel
mapping as seen fit. Future channel mappings (such as three and four
channel \href{http://www.ambisonic.net/}{Ambisonics}) will
make use of channel mappings other than mapping 0.
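
The implicit mapping type 0 channel orders listed above can be captured in a small lookup table. The following C sketch is illustrative only: the table \texttt{vorbis\_channel\_order}, the function \texttt{vorbis\_channel\_name} and the label \texttt{"mono"} for the single-channel case are assumptions made for this example, while the remaining labels are taken directly from the list above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: row (channels - 1) lists the stream channel order
 * for mapping type 0. Unused trailing entries are NULL. */
static const char *const vorbis_channel_order[8][8] = {
    {"mono"},
    {"left", "right"},
    {"left", "center", "right"},
    {"front left", "front right", "rear left", "rear right"},
    {"front left", "center", "front right", "rear left", "rear right"},
    {"front left", "center", "front right", "rear left", "rear right", "LFE"},
    {"front left", "center", "front right", "side left", "side right",
     "rear center", "LFE"},
    {"front left", "center", "front right", "side left", "side right",
     "rear left", "rear right", "LFE"},
};

/* Returns the label of stream channel 'index' for a stream with
 * 'channels' channels, or NULL when the order is application-defined
 * (more than eight channels) or the arguments are out of range. */
static const char *vorbis_channel_name(int channels, int index)
{
    if (channels < 1 || channels > 8)
        return NULL;
    if (index < 0 || index >= channels)
        return NULL;
    return vorbis_channel_order[channels - 1][index];
}
```

A decoder that reorders channels for its output device could use such a table to document the reordering it performs, as recommended above.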