annotate src/libvorbis-1.3.3/doc/04-codec.tex @ 88:fe7c3a0b0259

Add some MinGW builds
author Chris Cannam <cannam@all-day-breakfast.com>
date Wed, 20 Mar 2013 13:49:36 +0000
parents 98c1576536ae
cannam@86 1
cannam@86 2 % -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
cannam@86 3 %!TEX root = Vorbis_I_spec.tex
cannam@86 4 % $Id$
cannam@86 5 \section{Codec Setup and Packet Decode} \label{vorbis:spec:codec}
cannam@86 6
cannam@86 7 \subsection{Overview}
cannam@86 8
cannam@86 9 This document serves as the top-level reference document for the
cannam@86 10 bit-by-bit decode specification of Vorbis I. This document assumes a
cannam@86 11 high-level understanding of the Vorbis decode process, which is
cannam@86 12 provided in \xref{vorbis:spec:intro}. \xref{vorbis:spec:bitpacking} covers reading and writing bit fields from
cannam@86 13 and to bitstream packets.
cannam@86 14
cannam@86 15
cannam@86 16
cannam@86 17 \subsection{Header decode and decode setup}
cannam@86 18
cannam@86 19 A Vorbis bitstream begins with three header packets. The header
cannam@86 20 packets are, in order, the identification header, the comments header,
cannam@86 21 and the setup header. All are required for decode compliance. An
cannam@86 22 end-of-packet condition while decoding the first or third header
cannam@86 23 packet renders the stream undecodable. An end-of-packet condition while
cannam@86 24 decoding the comment header is a non-fatal error condition.
cannam@86 25
cannam@86 26 \subsubsection{Common header decode}
cannam@86 27
cannam@86 28 Each header packet begins with the same header fields.
cannam@86 29
cannam@86 30
cannam@86 31 \begin{Verbatim}[commandchars=\\\{\}]
cannam@86 32 1) [packet\_type] : 8 bit value
cannam@86 33 2) 0x76, 0x6f, 0x72, 0x62, 0x69, 0x73: the characters 'v','o','r','b','i','s' as six octets
cannam@86 34 \end{Verbatim}
cannam@86 35
cannam@86 36 Decode continues according to packet type; the identification header
cannam@86 37 is type 1, the comment header type 3 and the setup header type 5
cannam@86 38 (these types are all odd as a packet with a leading single bit of '0'
cannam@86 39 is an audio packet). The packets must occur in the order of
cannam@86 40 identification, comment, setup.
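
As an illustration of the common header check, the following Python sketch
validates the first seven octets of a header packet. The names
\texttt{parse\_common\_header} and \texttt{HEADER\_TYPES} are hypothetical and
introduced only for this example; they are not part of the specification.

```python
# Hypothetical lookup table: header packet type -> header kind.
HEADER_TYPES = {1: "identification", 3: "comment", 5: "setup"}


def parse_common_header(packet: bytes) -> str:
    """Return the header kind, or raise ValueError for a non-header packet."""
    if len(packet) < 7:
        raise ValueError("packet too short for the common header")
    packet_type = packet[0]
    # Octets 0x76 0x6f 0x72 0x62 0x69 0x73 spell 'vorbis'.
    if packet[1:7] != b"vorbis":
        raise ValueError("missing 'vorbis' signature")
    if packet_type not in HEADER_TYPES:
        raise ValueError(f"unexpected header packet type {packet_type}")
    return HEADER_TYPES[packet_type]
```

A caller would typically also check that the three kinds arrive in the order
identification, comment, setup.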
cannam@86 41
cannam@86 42
cannam@86 43
cannam@86 44 \subsubsection{Identification header}
cannam@86 45
cannam@86 46 The identification header is a short header of only a few fields used
cannam@86 47 to declare the stream definitively as Vorbis, and provide a few externally
cannam@86 48 relevant pieces of information about the audio stream. The
cannam@86 49 identification header is coded as follows:
cannam@86 50
cannam@86 51 \begin{Verbatim}[commandchars=\\\{\}]
cannam@86 52 1) [vorbis\_version] = read 32 bits as unsigned integer
cannam@86 53 2) [audio\_channels] = read 8 bit integer as unsigned
cannam@86 54 3) [audio\_sample\_rate] = read 32 bits as unsigned integer
cannam@86 55 4) [bitrate\_maximum] = read 32 bits as signed integer
cannam@86 56 5) [bitrate\_nominal] = read 32 bits as signed integer
cannam@86 57 6) [bitrate\_minimum] = read 32 bits as signed integer
cannam@86 58 7) [blocksize\_0] = 2 exponent (read 4 bits as unsigned integer)
cannam@86 59 8) [blocksize\_1] = 2 exponent (read 4 bits as unsigned integer)
cannam@86 60 9) [framing\_flag] = read one bit
cannam@86 61 \end{Verbatim}
cannam@86 62
cannam@86 63 \varname{[vorbis\_version]} is to read '0' in order to be compatible
cannam@86 64 with this document. Both \varname{[audio\_channels]} and
cannam@86 65 \varname{[audio\_sample\_rate]} must read greater than zero. Allowed final
cannam@86 66 blocksize values are 64, 128, 256, 512, 1024, 2048, 4096 and 8192 in
cannam@86 67 Vorbis I. \varname{[blocksize\_0]} must be less than or equal to
cannam@86 68 \varname{[blocksize\_1]}. The framing bit must be nonzero. Failure to
cannam@86 69 meet any of these conditions renders a stream undecodable.
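
The field list and validity rules above can be sketched in Python as follows.
This is a minimal sketch, not a reference implementation: it assumes the
LSb-first packing described in \xref{vorbis:spec:bitpacking}, under which every
field up to the blocksizes is byte aligned and \varname{[blocksize\_0]} occupies
the low nibble of its octet. The function name
\texttt{parse\_identification\_body} is hypothetical, and the input is assumed
to be the packet contents following the seven-octet common header.

```python
import struct

LEGAL_BLOCKSIZES = {64, 128, 256, 512, 1024, 2048, 4096, 8192}


def parse_identification_body(body: bytes) -> dict:
    """Parse and validate the identification header fields (sketch)."""
    if len(body) < 23:
        raise ValueError("end of packet inside identification header")
    (version, channels, sample_rate, bitrate_maximum, bitrate_nominal,
     bitrate_minimum, sizes, last) = struct.unpack_from("<IBIiiiBB", body)
    blocksize_0 = 1 << (sizes & 0x0F)   # first 4-bit field: low nibble (assumed)
    blocksize_1 = 1 << (sizes >> 4)     # second 4-bit field: high nibble
    framing_flag = last & 1
    if version != 0:
        raise ValueError("unsupported vorbis_version")
    if channels == 0 or sample_rate == 0:
        raise ValueError("audio_channels and audio_sample_rate must be > 0")
    if blocksize_0 not in LEGAL_BLOCKSIZES or blocksize_1 not in LEGAL_BLOCKSIZES:
        raise ValueError("illegal blocksize")
    if blocksize_0 > blocksize_1:
        raise ValueError("blocksize_0 must not exceed blocksize_1")
    if framing_flag == 0:
        raise ValueError("framing bit not set")
    return {
        "audio_channels": channels,
        "audio_sample_rate": sample_rate,
        "bitrate_maximum": bitrate_maximum,
        "bitrate_nominal": bitrate_nominal,
        "bitrate_minimum": bitrate_minimum,
        "blocksize_0": blocksize_0,
        "blocksize_1": blocksize_1,
    }
```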
cannam@86 70
cannam@86 71 The bitrate fields above are used only as hints. The nominal bitrate
cannam@86 72 field especially may be considerably off in purely VBR streams. The
cannam@86 73 fields are meaningful only when greater than zero.
cannam@86 74
cannam@86 75 \begin{itemize}
cannam@86 76 \item All three fields set to the same value implies a fixed-rate, or tightly bounded, nearly fixed-rate bitstream.
cannam@86 77 \item Only nominal set implies a VBR or ABR stream that averages the nominal bitrate.
cannam@86 78 \item Maximum and/or minimum set implies a VBR bitstream that obeys the bitrate limits.
cannam@86 79 \item None set indicates the encoder does not care to speculate.
cannam@86 80 \end{itemize}
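
One possible reading of the four cases above is sketched below in Python. The
function name \texttt{describe\_bitrate\_hints} and the returned strings are
hypothetical. How to treat combinations the list does not mention (for example
nominal plus maximum with different values) is an assumption of this sketch;
here they fall under the bounded VBR case.

```python
def describe_bitrate_hints(maximum: int, nominal: int, minimum: int) -> str:
    """Classify the bitrate hint fields; values <= 0 count as 'not set'."""
    has_max, has_nom, has_min = maximum > 0, nominal > 0, minimum > 0
    if has_max and maximum == nominal == minimum:
        return "fixed or nearly fixed rate"
    if has_nom and not has_max and not has_min:
        return "VBR/ABR averaging the nominal bitrate"
    if has_max or has_min:
        return "VBR within the stated limits"
    return "no hint"
```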
cannam@86 81
cannam@86 82
cannam@86 83
cannam@86 84
cannam@86 85 \subsubsection{Comment header}
cannam@86 86 Comment header decode and data specification is covered in
cannam@86 87 \xref{vorbis:spec:comment}.
cannam@86 88
cannam@86 89
cannam@86 90 \subsubsection{Setup header}
cannam@86 91
cannam@86 92 Vorbis codec setup is configurable to an extreme degree:
cannam@86 93
cannam@86 94 \begin{center}
cannam@86 95 \includegraphics[width=\textwidth]{components}
cannam@86 96 \captionof{figure}{decoder pipeline configuration}
cannam@86 97 \end{center}
cannam@86 98
cannam@86 99
cannam@86 100 The setup header contains the bulk of the codec setup information
cannam@86 101 needed for decode. The setup header contains, in order, the lists of
cannam@86 102 codebook configurations, time-domain transform configurations
cannam@86 103 (placeholders in Vorbis I), floor configurations, residue
cannam@86 104 configurations, channel mapping configurations and mode
cannam@86 105 configurations. It finishes with a framing bit of '1'. Header decode
cannam@86 106 proceeds in the following order:
cannam@86 107
cannam@86 108 \paragraph{Codebooks}
cannam@86 109
cannam@86 110 \begin{enumerate}
cannam@86 111 \item \varname{[vorbis\_codebook\_count]} = read eight bits as unsigned integer and add one
cannam@86 112 \item Decode \varname{[vorbis\_codebook\_count]} codebooks in order as defined
cannam@86 113 in \xref{vorbis:spec:codebook}. Save each configuration, in
cannam@86 114 order, in an array of
cannam@86 115 codebook configurations \varname{[vorbis\_codebook\_configurations]}.
cannam@86 116 \end{enumerate}
cannam@86 117
cannam@86 118
cannam@86 119
cannam@86 120 \paragraph{Time domain transforms}
cannam@86 121
cannam@86 122 These hooks are placeholders in Vorbis I. Nevertheless, the
cannam@86 123 configuration placeholder values must be read to maintain bitstream
cannam@86 124 sync.
cannam@86 125
cannam@86 126 \begin{enumerate}
cannam@86 127 \item \varname{[vorbis\_time\_count]} = read 6 bits as unsigned integer and add one
cannam@86 128 \item read \varname{[vorbis\_time\_count]} 16 bit values; each value should be zero. If any value is nonzero, this is an error condition and the stream is undecodable.
cannam@86 129 \end{enumerate}
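
The two steps above can be sketched in Python as follows. The
\texttt{BitReader} class is a hypothetical helper written for this example; it
assumes the LSb-first bit order described in \xref{vorbis:spec:bitpacking} and
signals end-of-packet with \texttt{EOFError}.

```python
class BitReader:
    """Minimal LSb-first bit reader (hypothetical helper, assumed packing)."""

    def __init__(self, data: bytes):
        self.data = data
        self.position = 0  # absolute bit offset into the packet

    def read(self, count: int) -> int:
        value = 0
        for i in range(count):
            byte_index, bit_index = divmod(self.position, 8)
            if byte_index >= len(self.data):
                raise EOFError("end of packet")
            value |= ((self.data[byte_index] >> bit_index) & 1) << i
            self.position += 1
        return value


def read_time_placeholders(reader: BitReader) -> int:
    """Consume the time-domain placeholders and return vorbis_time_count."""
    time_count = reader.read(6) + 1
    for _ in range(time_count):
        if reader.read(16) != 0:
            raise ValueError("nonzero time-domain placeholder; stream undecodable")
    return time_count
```

The same read-count-then-loop pattern applies to the floor, residue, mapping
and mode lists that follow.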
cannam@86 130
cannam@86 131
cannam@86 132
cannam@86 133 \paragraph{Floors}
cannam@86 134
cannam@86 135 Vorbis uses two floor types; header decode is handed to the decode
cannam@86 136 abstraction of the appropriate type.
cannam@86 137
cannam@86 138 \begin{enumerate}
cannam@86 139 \item \varname{[vorbis\_floor\_count]} = read 6 bits as unsigned integer and add one
cannam@86 140 \item For each \varname{[i]} of \varname{[vorbis\_floor\_count]} floor numbers:
cannam@86 141 \begin{enumerate}
cannam@86 142 \item read the floor type: vector \varname{[vorbis\_floor\_types]} element \varname{[i]} =
cannam@86 143 read 16 bits as unsigned integer
cannam@86 144 \item If the floor type is zero, decode the floor
cannam@86 145 configuration as defined in \xref{vorbis:spec:floor0}; save
cannam@86 146 this
cannam@86 147 configuration in slot \varname{[i]} of the floor configuration array \varname{[vorbis\_floor\_configurations]}.
cannam@86 148 \item If the floor type is one,
cannam@86 149 decode the floor configuration as defined in \xref{vorbis:spec:floor1}; save this configuration in slot \varname{[i]} of the floor configuration array \varname{[vorbis\_floor\_configurations]}.
cannam@86 150 \item If the floor type is greater than one, this stream is undecodable; ERROR CONDITION
cannam@86 151 \end{enumerate}
cannam@86 152
cannam@86 153 \end{enumerate}
cannam@86 154
cannam@86 155
cannam@86 156
cannam@86 157 \paragraph{Residues}
cannam@86 158
cannam@86 159 Vorbis uses three residue types; header decode of each type is identical.
cannam@86 160
cannam@86 161
cannam@86 162 \begin{enumerate}
cannam@86 163 \item \varname{[vorbis\_residue\_count]} = read 6 bits as unsigned integer and add one
cannam@86 164
cannam@86 165 \item For each \varname{[i]} of \varname{[vorbis\_residue\_count]} residue numbers:
cannam@86 166 \begin{enumerate}
cannam@86 167 \item read the residue type; vector \varname{[vorbis\_residue\_types]} element \varname{[i]} = read 16 bits as unsigned integer
cannam@86 168 \item If the residue type is zero,
cannam@86 169 one or two, decode the residue configuration as defined in \xref{vorbis:spec:residue}; save this configuration in slot \varname{[i]} of the residue configuration array \varname{[vorbis\_residue\_configurations]}.
cannam@86 170 \item If the residue type is greater than two, this stream is undecodable; ERROR CONDITION
cannam@86 171 \end{enumerate}
cannam@86 172
cannam@86 173 \end{enumerate}
cannam@86 174
cannam@86 175
cannam@86 176
cannam@86 177 \paragraph{Mappings}
cannam@86 178
cannam@86 179 Mappings are used to set up specific pipelines for encoding
cannam@86 180 multichannel audio with varying channel mapping applications. Vorbis I
cannam@86 181 uses a single mapping type (0), with implicit PCM channel mappings.
cannam@86 182
cannam@86 183 % FIXME/TODO: LaTeX cannot nest enumerate that deeply, so I have to use
cannam@86 184 % itemize at the innermost level. However, it would be much better to
cannam@86 185 % rewrite this pseudocode using listings or algorithmicx or some other
cannam@86 186 % package geared towards this.
cannam@86 187 \begin{enumerate}
cannam@86 188 \item \varname{[vorbis\_mapping\_count]} = read 6 bits as unsigned integer and add one
cannam@86 189 \item For each \varname{[i]} of \varname{[vorbis\_mapping\_count]} mapping numbers:
cannam@86 190 \begin{enumerate}
cannam@86 191 \item read the mapping type: 16 bits as unsigned integer. There's no reason to save the mapping type in Vorbis I.
cannam@86 192 \item If the mapping type is nonzero, the stream is undecodable
cannam@86 193 \item If the mapping type is zero:
cannam@86 194 \begin{enumerate}
cannam@86 195 \item read 1 bit as a boolean flag
cannam@86 196 \begin{enumerate}
cannam@86 197 \item if set, \varname{[vorbis\_mapping\_submaps]} = read 4 bits as unsigned integer and add one
cannam@86 198 \item if unset, \varname{[vorbis\_mapping\_submaps]} = 1
cannam@86 199 \end{enumerate}
cannam@86 200
cannam@86 201
cannam@86 202 \item read 1 bit as a boolean flag
cannam@86 203 \begin{enumerate}
cannam@86 204 \item if set, square polar channel mapping is in use:
cannam@86 205 \begin{itemize}
cannam@86 206 \item \varname{[vorbis\_mapping\_coupling\_steps]} = read 8 bits as unsigned integer and add one
cannam@86 207 \item for each \varname{[j]} of \varname{[vorbis\_mapping\_coupling\_steps]} steps:
cannam@86 208 \begin{itemize}
cannam@86 209 \item vector \varname{[vorbis\_mapping\_magnitude]} element \varname{[j]}= read \link{vorbis:spec:ilog}{ilog}(\varname{[audio\_channels]} - 1) bits as unsigned integer
cannam@86 210 \item vector \varname{[vorbis\_mapping\_angle]} element \varname{[j]}= read \link{vorbis:spec:ilog}{ilog}(\varname{[audio\_channels]} - 1) bits as unsigned integer
cannam@86 211 \item the numbers read in the above two steps are channel numbers representing the channel to treat as magnitude and the channel to treat as angle, respectively. If for any coupling step the angle channel number equals the magnitude channel number, the magnitude channel number is greater than \varname{[audio\_channels]}-1, or the angle channel is greater than \varname{[audio\_channels]}-1, the stream is undecodable.
cannam@86 212 \end{itemize}
cannam@86 213
cannam@86 214
cannam@86 215 \end{itemize}
cannam@86 216
cannam@86 217
cannam@86 218 \item if unset, \varname{[vorbis\_mapping\_coupling\_steps]} = 0
cannam@86 219 \end{enumerate}
cannam@86 220
cannam@86 221
cannam@86 222 \item read 2 bits (reserved field); if the value is nonzero, the stream is undecodable
cannam@86 223 \item if \varname{[vorbis\_mapping\_submaps]} is greater than one, we read channel multiplex settings. For each \varname{[j]} of \varname{[audio\_channels]} channels:
cannam@86 224 \begin{enumerate}
cannam@86 225 \item vector \varname{[vorbis\_mapping\_mux]} element \varname{[j]} = read 4 bits as unsigned integer
cannam@86 226 \item if the value is greater than the highest numbered submap (\varname{[vorbis\_mapping\_submaps]} - 1), this is an error condition rendering the stream undecodable
cannam@86 227 \end{enumerate}
cannam@86 228
cannam@86 229 \item for each submap \varname{[j]} of \varname{[vorbis\_mapping\_submaps]} submaps, read the floor and residue numbers for use in decoding that submap:
cannam@86 230 \begin{enumerate}
cannam@86 231 \item read and discard 8 bits (the unused time configuration placeholder)
cannam@86 232 \item read 8 bits as unsigned integer for the floor number; save in vector \varname{[vorbis\_mapping\_submap\_floor]} element \varname{[j]}
cannam@86 233 \item verify the floor number is not greater than the highest number floor configured for the bitstream. If it is, the bitstream is undecodable
cannam@86 234 \item read 8 bits as unsigned integer for the residue number; save in vector \varname{[vorbis\_mapping\_submap\_residue]} element \varname{[j]}
cannam@86 235 \item verify the residue number is not greater than the highest number residue configured for the bitstream. If it is, the bitstream is undecodable
cannam@86 236 \end{enumerate}
cannam@86 237
cannam@86 238 \item save this mapping configuration in slot \varname{[i]} of the mapping configuration array \varname{[vorbis\_mapping\_configurations]}.
cannam@86 239 \end{enumerate}
cannam@86 240
cannam@86 241 \end{enumerate}
cannam@86 242
cannam@86 243 \end{enumerate}
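
The coupling-step field width and validity rule above can be illustrated with a
short Python sketch. The \texttt{ilog} function follows the usual definition
(position of the highest set bit, zero for non-positive input) given in the
section the text links to; \texttt{coupling\_field\_bits} and
\texttt{validate\_coupling} are hypothetical names for this example.

```python
def ilog(x: int) -> int:
    """Number of bits needed to represent x; 0 for zero or negative x."""
    return x.bit_length() if x > 0 else 0


def coupling_field_bits(audio_channels: int) -> int:
    """Width in bits of each magnitude/angle channel number field."""
    return ilog(audio_channels - 1)


def validate_coupling(magnitude, angle, audio_channels: int) -> None:
    """Raise ValueError if any coupling step names an illegal channel pair."""
    for m, a in zip(magnitude, angle):
        if m == a or m > audio_channels - 1 or a > audio_channels - 1:
            raise ValueError("illegal coupling step; stream undecodable")
```

For a stereo stream each channel number is therefore a single bit wide.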
cannam@86 244
cannam@86 245
cannam@86 246
cannam@86 247 \paragraph{Modes}
cannam@86 248
cannam@86 249 \begin{enumerate}
cannam@86 250 \item \varname{[vorbis\_mode\_count]} = read 6 bits as unsigned integer and add one
cannam@86 251 \item For each \varname{[i]} of \varname{[vorbis\_mode\_count]} mode numbers:
cannam@86 252 \begin{enumerate}
cannam@86 253 \item \varname{[vorbis\_mode\_blockflag]} = read 1 bit
cannam@86 254 \item \varname{[vorbis\_mode\_windowtype]} = read 16 bits as unsigned integer
cannam@86 255 \item \varname{[vorbis\_mode\_transformtype]} = read 16 bits as unsigned integer
cannam@86 256 \item \varname{[vorbis\_mode\_mapping]} = read 8 bits as unsigned integer
cannam@86 257 \item verify ranges; zero is the only legal value in Vorbis I for
cannam@86 258 \varname{[vorbis\_mode\_windowtype]}
cannam@86 259 and \varname{[vorbis\_mode\_transformtype]}. \varname{[vorbis\_mode\_mapping]} must not be greater than the highest number mapping in use. Any illegal values render the stream undecodable.
cannam@86 260 \item save this mode configuration in slot \varname{[i]} of the mode configuration array
cannam@86 261 \varname{[vorbis\_mode\_configurations]}.
cannam@86 262 \end{enumerate}
cannam@86 263
cannam@86 264 \item read 1 bit as a framing flag. If unset, a framing error occurred and the stream is not
cannam@86 265 decodable.
cannam@86 266 \end{enumerate}
cannam@86 267
cannam@86 268 After reading mode descriptions, setup header decode is complete.
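
The range checks applied to each mode can be sketched as below. The function
name \texttt{validate\_mode} and the dictionary layout are hypothetical; the
field values are assumed to have been read from the bitstream already.

```python
def validate_mode(blockflag: int, windowtype: int, transformtype: int,
                  mapping: int, mapping_count: int) -> dict:
    """Return a mode configuration, or raise ValueError on an illegal value."""
    if windowtype != 0 or transformtype != 0:
        raise ValueError("window and transform type must be zero in Vorbis I")
    if mapping > mapping_count - 1:
        raise ValueError("mode refers to a mapping that was not configured")
    return {"blockflag": blockflag, "mapping": mapping}
```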
cannam@86 269
cannam@86 270
cannam@86 271
cannam@86 272
cannam@86 273
cannam@86 274
cannam@86 275
cannam@86 276
cannam@86 277 \subsection{Audio packet decode and synthesis}
cannam@86 278
cannam@86 279 Following the three header packets, all packets in a Vorbis I stream
cannam@86 280 are audio. The first step of audio packet decode is to read and
cannam@86 281 verify the packet type. \emph{A non-audio packet when audio is expected
cannam@86 282 indicates stream corruption or a non-compliant stream. The decoder
cannam@86 283 must ignore the packet and not attempt decoding it to audio}.
cannam@86 284
cannam@86 285
cannam@86 286 \subsubsection{packet type, mode and window decode}
cannam@86 287
cannam@86 288 \begin{enumerate}
cannam@86 289 \item read 1 bit \varname{[packet\_type]}; check that packet type is 0 (audio)
cannam@86 290 \item read \link{vorbis:spec:ilog}{ilog}(\varname{[vorbis\_mode\_count]}-1) bits
cannam@86 291 \varname{[mode\_number]}
cannam@86 292 \item decode blocksize \varname{[n]} is equal to \varname{[blocksize\_0]} if
cannam@86 293 \varname{[vorbis\_mode\_blockflag]} is 0, else \varname{[n]} is equal to \varname{[blocksize\_1]}.
cannam@86 294 \item perform window selection and setup; this window is used later by the inverse MDCT:
cannam@86 295 \begin{enumerate}
cannam@86 296 \item if this is a long window (the \varname{[vorbis\_mode\_blockflag]} flag of this mode is
cannam@86 297 set):
cannam@86 298 \begin{enumerate}
cannam@86 299 \item read 1 bit for \varname{[previous\_window\_flag]}
cannam@86 300 \item read 1 bit for \varname{[next\_window\_flag]}
cannam@86 301 \item if \varname{[previous\_window\_flag]} is not set, the left half
cannam@86 302 of the window will be a hybrid window for lapping with a
cannam@86 303 short block. See \xref{vorbis:spec:window} for an illustration of overlapping
cannam@86 304 dissimilar
cannam@86 305 windows. Else, the left half window will have normal long
cannam@86 306 shape.
cannam@86 307 \item if \varname{[next\_window\_flag]} is not set, the right half of
cannam@86 308 the window will be a hybrid window for lapping with a short
cannam@86 309 block. See \xref{vorbis:spec:window} for an
cannam@86 310 illustration of overlapping dissimilar
cannam@86 311 windows. Else, the right half window will have normal long
cannam@86 312 shape.
cannam@86 313 \end{enumerate}
cannam@86 314
cannam@86 315 \item if this is a short window, the window is always the same
cannam@86 316 short-window shape.
cannam@86 317 \end{enumerate}
cannam@86 318
cannam@86 319 \end{enumerate}
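
As a small worked example of the first steps above, the Python sketch below
computes how many bits the mode number occupies and selects the blocksize. The
helper names \texttt{mode\_number\_bits} and \texttt{select\_blocksize} are
hypothetical; \texttt{ilog} follows the usual highest-set-bit definition.

```python
def ilog(x: int) -> int:
    """Number of bits needed to represent x; 0 for zero or negative x."""
    return x.bit_length() if x > 0 else 0


def mode_number_bits(vorbis_mode_count: int) -> int:
    """Width in bits of the [mode_number] field of an audio packet."""
    return ilog(vorbis_mode_count - 1)


def select_blocksize(blockflag: int, blocksize_0: int, blocksize_1: int) -> int:
    """Blocksize [n] of the packet: short if the flag is 0, else long."""
    return blocksize_1 if blockflag else blocksize_0
```

A stream with a single mode therefore spends no bits on the mode number.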
cannam@86 320
cannam@86 321 Vorbis windows all use the slope function $y=\sin(\frac{\pi}{2} * \sin^2((x+0.5)/n * \pi))$,
cannam@86 322 where $n$ is window size and $x$ ranges $0 \ldots n-1$, but dissimilar
cannam@86 323 lapping requirements can affect overall shape. Window generation
cannam@86 324 proceeds as follows:
cannam@86 325
cannam@86 326 \begin{enumerate}
cannam@86 327 \item \varname{[window\_center]} = \varname{[n]} / 2
cannam@86 328 \item if (\varname{[vorbis\_mode\_blockflag]} is set and \varname{[previous\_window\_flag]} is
cannam@86 329 not set) then
cannam@86 330 \begin{enumerate}
cannam@86 331 \item \varname{[left\_window\_start]} = \varname{[n]}/4 -
cannam@86 332 \varname{[blocksize\_0]}/4
cannam@86 333 \item \varname{[left\_window\_end]} = \varname{[n]}/4 + \varname{[blocksize\_0]}/4
cannam@86 334 \item \varname{[left\_n]} = \varname{[blocksize\_0]}/2
cannam@86 335 \end{enumerate}
cannam@86 336 else
cannam@86 337 \begin{enumerate}
cannam@86 338 \item \varname{[left\_window\_start]} = 0
cannam@86 339 \item \varname{[left\_window\_end]} = \varname{[window\_center]}
cannam@86 340 \item \varname{[left\_n]} = \varname{[n]}/2
cannam@86 341 \end{enumerate}
cannam@86 342
cannam@86 343 \item if (\varname{[vorbis\_mode\_blockflag]} is set and \varname{[next\_window\_flag]} is not
cannam@86 344 set) then
cannam@86 345 \begin{enumerate}
cannam@86 346 \item \varname{[right\_window\_start]} = \varname{[n]*3}/4 -
cannam@86 347 \varname{[blocksize\_0]}/4
cannam@86 348 \item \varname{[right\_window\_end]} = \varname{[n]*3}/4 +
cannam@86 349 \varname{[blocksize\_0]}/4
cannam@86 350 \item \varname{[right\_n]} = \varname{[blocksize\_0]}/2
cannam@86 351 \end{enumerate}
cannam@86 352 else
cannam@86 353 \begin{enumerate}
cannam@86 354 \item \varname{[right\_window\_start]} = \varname{[window\_center]}
cannam@86 355 \item \varname{[right\_window\_end]} = \varname{[n]}
cannam@86 356 \item \varname{[right\_n]} = \varname{[n]}/2
cannam@86 357 \end{enumerate}
cannam@86 358
cannam@86 359 \item window from range 0 ... \varname{[left\_window\_start]}-1 inclusive is zero
cannam@86 360 \item for \varname{[i]} in range \varname{[left\_window\_start]} ...
cannam@86 361 \varname{[left\_window\_end]}-1, window(\varname{[i]}) = $\sin(\frac{\pi}{2} * \sin^2($ (\varname{[i]}-\varname{[left\_window\_start]}+0.5) / \varname{[left\_n]} $* \frac{\pi}{2})$ )
cannam@86 362 \item window from range \varname{[left\_window\_end]} ... \varname{[right\_window\_start]}-1 inclusive is one
cannam@86 363 \item for \varname{[i]} in range \varname{[right\_window\_start]} ... \varname{[right\_window\_end]}-1, window(\varname{[i]}) = $\sin(\frac{\pi}{2} * \sin^2($ (\varname{[i]}-\varname{[right\_window\_start]}+0.5) / \varname{[right\_n]} $ * \frac{\pi}{2} + \frac{\pi}{2})$ )
cannam@86 364 \item window from range \varname{[right\_window\_end]} ... \varname{[n]}-1 is
cannam@86 365 zero
cannam@86 366 \end{enumerate}
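
The window generation steps can be sketched in Python as follows. This is an
illustrative sketch using floating point and integer division; the function
name \texttt{vorbis\_window} and its argument order are hypothetical. Samples
outside the two slopes and the flat middle are left at zero.

```python
import math


def vorbis_window(n, blocksize_0, blockflag, previous_window_flag, next_window_flag):
    """Return the length-n window as a list of floats (sketch)."""
    window_center = n // 2
    if blockflag and not previous_window_flag:
        left_start = n // 4 - blocksize_0 // 4
        left_end = n // 4 + blocksize_0 // 4
        left_n = blocksize_0 // 2
    else:
        left_start, left_end, left_n = 0, window_center, n // 2
    if blockflag and not next_window_flag:
        right_start = n * 3 // 4 - blocksize_0 // 4
        right_end = n * 3 // 4 + blocksize_0 // 4
        right_n = blocksize_0 // 2
    else:
        right_start, right_end, right_n = window_center, n, n // 2

    def slope(phase):
        return math.sin(math.pi / 2 * math.sin(phase) ** 2)

    window = [0.0] * n  # regions not written below stay zero
    for i in range(left_start, left_end):
        window[i] = slope((i - left_start + 0.5) / left_n * math.pi / 2)
    for i in range(left_end, right_start):
        window[i] = 1.0
    for i in range(right_start, right_end):
        window[i] = slope((i - right_start + 0.5) / right_n * math.pi / 2
                          + math.pi / 2)
    return window
```

For a short window the two halves are mirror images, and the squares of
samples half a block apart sum to one, which is the property that makes the
overlapped halves of consecutive blocks reconstruct cleanly.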
cannam@86 367
cannam@86 368 An end-of-packet condition up to this point should be considered an
cannam@86 369 error that discards this packet from the stream. An end-of-packet
cannam@86 370 condition past this point is to be considered a possible nominal
cannam@86 371 occurrence.
cannam@86 372
cannam@86 373
cannam@86 374
cannam@86 375 \subsubsection{floor curve decode}
cannam@86 376
cannam@86 377 From this point on, we assume our decode context is using mode number
cannam@86 378 \varname{[mode\_number]} from configuration array
cannam@86 379 \varname{[vorbis\_mode\_configurations]} and the map number
cannam@86 380 \varname{[vorbis\_mode\_mapping]} (specified by the current mode) taken
cannam@86 381 from the mapping configuration array
cannam@86 382 \varname{[vorbis\_mapping\_configurations]}.
cannam@86 383
cannam@86 384 Floor curves are decoded one-by-one in channel order.
cannam@86 385
cannam@86 386 For each floor \varname{[i]} of \varname{[audio\_channels]}
cannam@86 387 \begin{enumerate}
cannam@86 388 \item \varname{[submap\_number]} = element \varname{[i]} of vector \varname{[vorbis\_mapping\_mux]}
cannam@86 389 \item \varname{[floor\_number]} = element \varname{[submap\_number]} of vector
cannam@86 390 \varname{[vorbis\_mapping\_submap\_floor]}
cannam@86 391 \item if the floor type of this
cannam@86 392 floor (vector \varname{[vorbis\_floor\_types]} element
cannam@86 393 \varname{[floor\_number]}) is zero then decode the floor for
cannam@86 394 channel \varname{[i]} according to the
cannam@86 395 \xref{vorbis:spec:floor0-decode}
cannam@86 396 \item if the type of this floor
cannam@86 397 is one then decode the floor for channel \varname{[i]} according
cannam@86 398 to the \xref{vorbis:spec:floor1-decode}
cannam@86 399 \item save the needed decoded floor information for channel \varname{[i]} for later synthesis
cannam@86 400 \item if the decoded floor returned 'unused', set vector \varname{[no\_residue]} element
cannam@86 401 \varname{[i]} to true, else set vector \varname{[no\_residue]} element \varname{[i]} to
cannam@86 402 false
cannam@86 403 \end{enumerate}
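
The channel-to-floor lookup in the first steps above amounts to two table
indirections, sketched here in Python. The function name
\texttt{floor\_for\_channel} is hypothetical. The sketch assumes the mux vector
holds a submap number for every channel; when only one submap exists and no mux
values were read, a vector of zeros is assumed.

```python
def floor_for_channel(channel, mapping_mux, submap_floor, floor_types):
    """Return (floor_number, floor_type) used to decode the given channel."""
    submap_number = mapping_mux[channel]
    floor_number = submap_floor[submap_number]
    return floor_number, floor_types[floor_number]
```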
cannam@86 404
cannam@86 405
cannam@86 406 An end-of-packet condition during floor decode shall result in packet
cannam@86 407 decode zeroing all channel output vectors and skipping to the
cannam@86 408 add/overlap output stage.
cannam@86 409
cannam@86 410
cannam@86 411
cannam@86 412 \subsubsection{nonzero vector propagate}
cannam@86 413
cannam@86 414 A possible result of floor decode is that a specific vector is marked
cannam@86 415 'unused', which indicates that the final output vector is all-zero
cannam@86 416 values (and the floor is zero). The residue for that vector is not
cannam@86 417 coded in the stream, save for one complication. If some vectors are
cannam@86 418 used and some are not, channel coupling could result in mixing a
cannam@86 419 zeroed and nonzeroed vector to produce two nonzeroed vectors.
cannam@86 420
cannam@86 421 for each \varname{[i]} from 0 ... \varname{[vorbis\_mapping\_coupling\_steps]}-1
cannam@86 422
cannam@86 423 \begin{enumerate}
cannam@86 424 \item if either \varname{[no\_residue]} entry for channel
cannam@86 425 (\varname{[vorbis\_mapping\_magnitude]} element \varname{[i]})
cannam@86 426 or channel
cannam@86 427 (\varname{[vorbis\_mapping\_angle]} element \varname{[i]})
cannam@86 428 is set to false, then both must be set to false. Note that an 'unused'
cannam@86 429 floor has no decoded floor information; it is important that this is
cannam@86 430 remembered at floor curve synthesis time.
cannam@86 431 \end{enumerate}
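
The propagation rule can be sketched in Python as follows. The function name
\texttt{propagate\_nonzero} is hypothetical; the sketch returns a new list
rather than modifying the caller's \varname{[no\_residue]} vector.

```python
def propagate_nonzero(no_residue, magnitude, angle):
    """Clear no_residue for both channels of any coupled pair where one is used."""
    result = list(no_residue)
    for m, a in zip(magnitude, angle):
        if not result[m] or not result[a]:
            result[m] = False
            result[a] = False
    return result
```

A pair in which both channels are unused stays unused; a pair with one used
channel has residue decoded for both.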
cannam@86 432
cannam@86 433
cannam@86 434
cannam@86 435
cannam@86 436 \subsubsection{residue decode}
cannam@86 437
cannam@86 438 Unlike floors, which are decoded in channel order, the residue vectors
cannam@86 439 are decoded in submap order.
cannam@86 440
cannam@86 441 for each submap \varname{[i]} in order from 0 ... \varname{[vorbis\_mapping\_submaps]}-1
cannam@86 442
cannam@86 443 \begin{enumerate}
cannam@86 444 \item \varname{[ch]} = 0
cannam@86 445 \item for each channel \varname{[j]} in order from 0 ... \varname{[audio\_channels]} - 1
cannam@86 446 \begin{enumerate}
cannam@86 447 \item if channel \varname{[j]} is in submap \varname{[i]} (vector \varname{[vorbis\_mapping\_mux]} element \varname{[j]} is equal to \varname{[i]})
cannam@86 448 \begin{enumerate}
cannam@86 449 \item if vector \varname{[no\_residue]} element \varname{[j]} is true
cannam@86 450 \begin{enumerate}
cannam@86 451 \item vector \varname{[do\_not\_decode\_flag]} element \varname{[ch]} is set
cannam@86 452 \end{enumerate}
cannam@86 453 else
cannam@86 454 \begin{enumerate}
cannam@86 455 \item vector \varname{[do\_not\_decode\_flag]} element \varname{[ch]} is unset
cannam@86 456 \end{enumerate}
cannam@86 457
cannam@86 458 \item increment \varname{[ch]}
cannam@86 459 \end{enumerate}
cannam@86 460
cannam@86 461 \end{enumerate}
cannam@86 462 \item \varname{[residue\_number]} = vector \varname{[vorbis\_mapping\_submap\_residue]} element \varname{[i]}
cannam@86 463 \item \varname{[residue\_type]} = vector \varname{[vorbis\_residue\_types]} element \varname{[residue\_number]}
cannam@86 464 \item decode \varname{[ch]} vectors using residue \varname{[residue\_number]}, according to type \varname{[residue\_type]}, also passing vector \varname{[do\_not\_decode\_flag]} to indicate which vectors in the bundle should not be decoded. Correct per-vector decode length is \varname{[n]}/2.
cannam@86 465 \item \varname{[ch]} = 0
cannam@86 466 \item for each channel \varname{[j]} in order from 0 ... \varname{[audio\_channels]} - 1
cannam@86 467 \begin{enumerate}
cannam@86 468 \item if channel \varname{[j]} is in submap \varname{[i]} (vector \varname{[vorbis\_mapping\_mux]} element \varname{[j]} is equal to \varname{[i]})
cannam@86 469 \begin{enumerate}
cannam@86 470 \item residue vector for channel \varname{[j]} is set to decoded residue vector \varname{[ch]}
cannam@86 471 \item increment \varname{[ch]}
cannam@86 472 \end{enumerate}
cannam@86 473
cannam@86 474 \end{enumerate}
cannam@86 475
cannam@86 476 \end{enumerate}
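
The bundling and unbundling of channels per submap can be sketched in Python as
below. The actual residue decode is outside the scope of this section, so it is
represented by a caller-supplied function \texttt{decode\_bundle}; that
callback, its signature and the name \texttt{decode\_residues} are hypothetical.

```python
def decode_residues(submaps, mapping_mux, no_residue, submap_residue,
                    residue_types, n, decode_bundle):
    """Return one residue vector per channel, decoded submap by submap."""
    audio_channels = len(mapping_mux)
    residue = [None] * audio_channels
    for i in range(submaps):
        # Channels belonging to submap i, in channel order.
        members = [j for j in range(audio_channels) if mapping_mux[j] == i]
        do_not_decode_flag = [bool(no_residue[j]) for j in members]
        residue_number = submap_residue[i]
        residue_type = residue_types[residue_number]
        # decode_bundle is a stand-in for the residue decode of this type.
        vectors = decode_bundle(residue_number, residue_type, len(members),
                                do_not_decode_flag, n // 2)
        # Hand each decoded vector back to the channel it belongs to.
        for ch, j in enumerate(members):
            residue[j] = vectors[ch]
    return residue
```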
cannam@86 477
cannam@86 478
cannam@86 479
cannam@86 480 \subsubsection{inverse coupling}
cannam@86 481
cannam@86 482 for each \varname{[i]} from \varname{[vorbis\_mapping\_coupling\_steps]}-1 descending to 0
cannam@86 483
cannam@86 484 \begin{enumerate}
cannam@86 485 \item \varname{[magnitude\_vector]} = the residue vector for channel
cannam@86 486 (vector \varname{[vorbis\_mapping\_magnitude]} element \varname{[i]})
cannam@86 487 \item \varname{[angle\_vector]} = the residue vector for channel (vector
cannam@86 488 \varname{[vorbis\_mapping\_angle]} element \varname{[i]})
cannam@86 489 \item for each scalar value \varname{[M]} in vector \varname{[magnitude\_vector]} and the corresponding scalar value \varname{[A]} in vector \varname{[angle\_vector]}:
cannam@86 490 \begin{enumerate}
cannam@86 491 \item if (\varname{[M]} is greater than zero)
cannam@86 492 \begin{enumerate}
cannam@86 493 \item if (\varname{[A]} is greater than zero)
cannam@86 494 \begin{enumerate}
cannam@86 495 \item \varname{[new\_M]} = \varname{[M]}
cannam@86 496 \item \varname{[new\_A]} = \varname{[M]}-\varname{[A]}
cannam@86 497 \end{enumerate}
cannam@86 498 else
cannam@86 499 \begin{enumerate}
cannam@86 500 \item \varname{[new\_A]} = \varname{[M]}
cannam@86 501 \item \varname{[new\_M]} = \varname{[M]}+\varname{[A]}
cannam@86 502 \end{enumerate}
cannam@86 503
cannam@86 504 \end{enumerate}
cannam@86 505 else
cannam@86 506 \begin{enumerate}
cannam@86 507 \item if (\varname{[A]} is greater than zero)
cannam@86 508 \begin{enumerate}
cannam@86 509 \item \varname{[new\_M]} = \varname{[M]}
cannam@86 510 \item \varname{[new\_A]} = \varname{[M]}+\varname{[A]}
cannam@86 511 \end{enumerate}
cannam@86 512 else
cannam@86 513 \begin{enumerate}
cannam@86 514 \item \varname{[new\_A]} = \varname{[M]}
cannam@86 515 \item \varname{[new\_M]} = \varname{[M]}-\varname{[A]}
cannam@86 516 \end{enumerate}
cannam@86 517
cannam@86 518 \end{enumerate}
cannam@86 519
cannam@86 520 \item set scalar value \varname{[M]} in vector \varname{[magnitude\_vector]} to \varname{[new\_M]}
cannam@86 521 \item set scalar value \varname{[A]} in vector \varname{[angle\_vector]} to \varname{[new\_A]}
cannam@86 522 \end{enumerate}
cannam@86 523
cannam@86 524 \end{enumerate}
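
The per-scalar rule above translates directly into Python. In this sketch the
function names \texttt{inverse\_couple} and \texttt{apply\_inverse\_coupling}
are hypothetical, and the vectors are plain lists modified in place.

```python
def inverse_couple(magnitude_vector, angle_vector):
    """Undo square polar coupling for one magnitude/angle pair, in place."""
    for k, (m, a) in enumerate(zip(magnitude_vector, angle_vector)):
        if m > 0:
            if a > 0:
                new_m, new_a = m, m - a
            else:
                new_a, new_m = m, m + a
        else:
            if a > 0:
                new_m, new_a = m, m + a
            else:
                new_a, new_m = m, m - a
        magnitude_vector[k] = new_m
        angle_vector[k] = new_a


def apply_inverse_coupling(residue, magnitude, angle):
    """Process the coupling steps from the last one down to step 0."""
    for i in reversed(range(len(magnitude))):
        inverse_couple(residue[magnitude[i]], residue[angle[i]])
```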
cannam@86 525
cannam@86 526
cannam@86 527
cannam@86 528
cannam@86 529 \subsubsection{dot product}
cannam@86 530
cannam@86 531 For each channel, synthesize the floor curve from the decoded floor
cannam@86 532 information, according to floor type. Note that the vector synthesis
cannam@86 533 length for floor computation is \varname{[n]}/2.
cannam@86 534
cannam@86 535 For each channel, multiply each element of the floor curve by the
cannam@86 536 corresponding element of that channel's residue vector. The result is the
cannam@86 537 element-wise product (the 'dot product' of this section) of the floor and residue vectors for each channel; the produced
cannam@86 538 vectors are the length \varname{[n]}/2 audio spectrum for each
cannam@86 539 channel.
cannam@86 540
cannam@86 541 % TODO/FIXME: The following two paragraphs have identical twins
cannam@86 542 % in section 1 (under "compute floor/residue dot product")
cannam@86 543 One point is worth mentioning about this dot product; a common mistake
cannam@86 544 in a fixed point implementation might be to assume that a 32 bit
cannam@86 545 fixed-point representation for floor and residue and direct
cannam@86 546 multiplication of the vectors is sufficient for acceptable spectral
cannam@86 547 depth in all cases because it happens to mostly work with the current
cannam@86 548 Xiph.Org reference encoder.
cannam@86 549
cannam@86 550 However, floor vector values can span $\sim$140dB ($\sim$24 bits unsigned), and
cannam@86 551 the audio spectrum vector should represent a minimum of 120dB ($\sim$21
cannam@86 552 bits with sign), even when output is to a 16 bit PCM device. For the
cannam@86 553 residue vector to represent full scale if the floor is nailed to
cannam@86 554 $-140$dB, it must be able to span 0 to $+140$dB. For the residue vector
cannam@86 555 to reach full scale if the floor is nailed at 0dB, it must be able to
cannam@86 556 represent $-140$dB to $+0$dB. Thus, in order to handle full range
cannam@86 557 dynamics, a residue vector may span $-140$dB to $+140$dB entirely within
cannam@86 558 spec. A 280dB range is approximately 48 bits with sign; thus the
cannam@86 559 residue vector must be able to represent a 48 bit range and the dot
cannam@86 560 product must be able to handle an effective 48 bit times 24 bit
cannam@86 561 multiplication. This range may be achieved using large (64 bit or
cannam@86 562 larger) integers, or implementing a movable binary point
cannam@86 563 representation.
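
The bit widths quoted above follow from the rule of thumb that one bit of
amplitude resolution corresponds to $20\log_{10}2 \approx 6.02$dB. The Python
sketch below reproduces the arithmetic; the helper name
\texttt{bits\_for\_range} is hypothetical, and rounding up to whole bits is an
assumption of this example.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit of amplitude


def bits_for_range(decibels: float, signed: bool = False) -> int:
    """Whole bits needed to span a dynamic range, plus one if signed."""
    return math.ceil(decibels / DB_PER_BIT + (1 if signed else 0))
```

With these definitions, 140dB unsigned needs 24 bits, 120dB with sign needs 21
bits, and 280dB with sign needs 48 bits, matching the figures in the text.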
cannam@86 564
cannam@86 565
cannam@86 566
cannam@86 567 \subsubsection{inverse MDCT}
cannam@86 568
cannam@86 569 Convert the audio spectrum vector of each channel back into time
cannam@86 570 domain PCM audio via an inverse Modified Discrete Cosine Transform
cannam@86 571 (MDCT). A detailed description of the MDCT is available in \cite{Sporer/Brandenburg/Edler}. The window
cannam@86 572 function used for the MDCT is the function described earlier.
cannam@86 573
cannam@86 574
cannam@86 575
cannam@86 576 \subsubsection{overlap\_add}
cannam@86 577
cannam@86 578 Windowed MDCT output is overlapped and added with the right hand data
cannam@86 579 of the previous window such that the 3/4 point of the previous window
cannam@86 580 is aligned with the 1/4 point of the current window (as illustrated in
cannam@86 581 \xref{vorbis:spec:window}). The overlapped portion
cannam@86 582 produced from overlapping the previous and current frame data is
cannam@86 583 finished data to be returned by the decoder. This data spans from the
cannam@86 584 center of the previous window to the center of the current window. In
cannam@86 585 the case of same-sized windows, the amount of data to return is
cannam@86 586 one-half block, consisting entirely of the overlapped portion. When
cannam@86 587 overlapping a short and long window, much of the returned range does not
cannam@86 588 actually overlap. This does not damage transform orthogonality. Take
cannam@86 589 care, however, to return the correct data range; the amount of
cannam@86 590 data to be returned is:
cannam@86 591
cannam@86 592 \begin{programlisting}
cannam@86 593 window\_blocksize(previous\_window)/4+window\_blocksize(current\_window)/4
cannam@86 594 \end{programlisting}
cannam@86 595
cannam@86 596 from the center (element windowsize/2) of the previous window to the
cannam@86 597 center (element windowsize/2-1, inclusive) of the current window.
cannam@86 598
cannam@86 599 Data is not returned from the first frame; it must be used to 'prime'
cannam@86 600 the decode engine. The encoder accounts for this priming when
cannam@86 601 calculating PCM offsets; after the first frame, the proper PCM output
cannam@86 602 offset is '0' (as no data has been returned yet).
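
The return-size rule and the priming behaviour described above can be
sketched in C as follows. This is an illustrative sketch only: the
function name returned\_samples and the convention that a previous
blocksize of 0 means ``no previous window yet'' are assumptions made for
this example, and are not part of the specification or of libvorbis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the spec or libvorbis): number of finished
 * PCM samples per channel produced when the current window is overlapped
 * with the previous one.
 *
 * Assumption for this sketch: prev_blocksize == 0 means there is no
 * previous window, i.e. this is the first frame, which only primes the
 * decoder and returns no data. */
static size_t returned_samples(size_t prev_blocksize, size_t cur_blocksize)
{
    /* Vorbis blocksizes are powers of two, so both divide evenly by 4. */
    assert(prev_blocksize % 4 == 0 && cur_blocksize % 4 == 0);

    if (prev_blocksize == 0)
        return 0; /* first frame: prime only */

    /* window_blocksize(previous)/4 + window_blocksize(current)/4 */
    return prev_blocksize / 4 + cur_blocksize / 4;
}
```

For two windows of equal size the result is one-half block, as stated
above; for a short window followed by a long window it is one quarter of
each blocksize.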
cannam@86 603
cannam@86 604
cannam@86 605
cannam@86 606 \subsubsection{output channel order}
cannam@86 607
cannam@86 608 Vorbis I specifies only a channel mapping type 0. In mapping type 0,
cannam@86 609 channel mapping is implicitly defined as follows for standard audio
cannam@86 610 applications. As of revision 16781 (20100113), the specification adds
cannam@86 611 defined channel locations for 6.1 and 7.1 surround. Ordering/location
cannam@86 612 for greater-than-eight channels remains 'left to the implementation'.
cannam@86 613
cannam@86 614 These channel orderings refer to order within the encoded stream. It
cannam@86 615 is naturally possible for a decoder to produce output with channels in
cannam@86 616 any order. Any such decoder should explicitly document channel
cannam@86 617 reordering behavior.
cannam@86 618
cannam@86 619 \begin{description} %[style=nextline]
cannam@86 620 \item[one channel]
cannam@86 621 the stream is monophonic
cannam@86 622
cannam@86 623 \item[two channels]
cannam@86 624 the stream is stereo. channel order: left, right
cannam@86 625
cannam@86 626 \item[three channels]
cannam@86 627 the stream is a 1d-surround encoding. channel order: left,
cannam@86 628 center, right
cannam@86 629
cannam@86 630 \item[four channels]
cannam@86 631 the stream is quadraphonic surround. channel order: front left,
cannam@86 632 front right, rear left, rear right
cannam@86 633
cannam@86 634 \item[five channels]
cannam@86 635 the stream is five-channel surround. channel order: front left,
cannam@86 636 center, front right, rear left, rear right
cannam@86 637
cannam@86 638 \item[six channels]
cannam@86 639 the stream is 5.1 surround. channel order: front left, center,
cannam@86 640 front right, rear left, rear right, LFE
cannam@86 641
cannam@86 642 \item[seven channels]
cannam@86 643 the stream is 6.1 surround. channel order: front left, center,
cannam@86 644 front right, side left, side right, rear center, LFE
cannam@86 645
cannam@86 646 \item[eight channels]
cannam@86 647 the stream is 7.1 surround. channel order: front left, center,
cannam@86 648 front right, side left, side right, rear left, rear right,
cannam@86 649 LFE
cannam@86 650
cannam@86 651 \item[greater than eight channels]
cannam@86 652 channel use and order are defined by the application
cannam@86 653
cannam@86 654 \end{description}
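
A decoder might hold the orderings listed above in a small lookup table.
The following C fragment is a sketch under stated assumptions: the names
mapping0\_order and mapping0\_channel\_name are hypothetical, the label
strings are informal, and returning NULL for more than eight channels is
simply one way of signalling that the layout is application-defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table (names are illustrative, not from libvorbis):
 * row [channels - 1] gives the stream order for mapping type 0. */
static const char *const mapping0_order[8][8] = {
    { "mono" },
    { "left", "right" },
    { "left", "center", "right" },
    { "front left", "front right", "rear left", "rear right" },
    { "front left", "center", "front right", "rear left", "rear right" },
    { "front left", "center", "front right", "rear left", "rear right",
      "LFE" },
    { "front left", "center", "front right", "side left", "side right",
      "rear center", "LFE" },
    { "front left", "center", "front right", "side left", "side right",
      "rear left", "rear right", "LFE" },
};

/* Returns the label of channel `index` in a stream with `channels`
 * channels, or NULL when the layout is application-defined
 * (fewer than one or more than eight channels). */
static const char *mapping0_channel_name(int channels, int index)
{
    if (channels < 1 || channels > 8)
        return NULL;
    /* An out-of-range index is treated as a caller error in this sketch. */
    assert(index >= 0 && index < channels);
    return mapping0_order[channels - 1][index];
}
```
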
cannam@86 655
cannam@86 656 Applications using Vorbis for dedicated purposes may define channel
cannam@86 657 mapping as seen fit. Future channel configurations (such as three and four
cannam@86 658 channel \href{http://www.ambisonic.net/}{Ambisonics}) will
cannam@86 659 make use of mapping types other than mapping 0.
cannam@86 660
cannam@86 661