Vorbis I specification

Xiph.Org Foundation

February 3, 2012

1 Introduction and Description
  1.1 Overview
   1.1.1 Application
   1.1.2 Classification
   1.1.3 Assumptions
   1.1.4 Codec Setup and Probability Model
   1.1.5 Format Specification
   1.1.6 Hardware Profile
  1.2 Decoder Configuration
   1.2.1 Global Config
   1.2.2 Mode
   1.2.3 Mapping
   1.2.4 Floor
   1.2.5 Residue
   1.2.6 Codebooks
  1.3 High-level Decode Process
   1.3.1 Decode Setup
   1.3.2 Decode Procedure
2 Bitpacking Convention
  2.1 Overview
   2.1.1 octets, bytes and words
   2.1.2 bit order
   2.1.3 byte order
   2.1.4 coding bits into byte sequences
   2.1.5 signedness
   2.1.6 coding example
   2.1.7 decoding example
   2.1.8 end-of-packet alignment
   2.1.9 reading zero bits
3 Probability Model and Codebooks
  3.1 Overview
   3.1.1 Bitwise operation
  3.2 Packed codebook format
   3.2.1 codebook decode
  3.3 Use of the codebook abstraction
4 Codec Setup and Packet Decode
  4.1 Overview
  4.2 Header decode and decode setup
   4.2.1 Common header decode
   4.2.2 Identification header
   4.2.3 Comment header
   4.2.4 Setup header
  4.3 Audio packet decode and synthesis
   4.3.1 packet type, mode and window decode
   4.3.2 floor curve decode
   4.3.3 nonzero vector propagate
   4.3.4 residue decode
   4.3.5 inverse coupling
   4.3.6 dot product
   4.3.7 inverse MDCT
   4.3.8 overlap_add
   4.3.9 output channel order
5 comment field and header specification
  5.1 Overview
  5.2 Comment encoding
   5.2.1 Structure
   5.2.2 Content vector format
   5.2.3 Encoding
6 Floor type 0 setup and decode
  6.1 Overview
  6.2 Floor 0 format
   6.2.1 header decode
   6.2.2 packet decode
   6.2.3 curve computation
7 Floor type 1 setup and decode
  7.1 Overview
  7.2 Floor 1 format
   7.2.1 model
   7.2.2 header decode
   7.2.3 packet decode
   7.2.4 curve computation
8 Residue setup and decode
  8.1 Overview
  8.2 Residue format
  8.3 residue 0
  8.4 residue 1
  8.5 residue 2
  8.6 Residue decode
   8.6.1 header decode
   8.6.2 packet decode
   8.6.3 format 0 specifics
   8.6.4 format 1 specifics
   8.6.5 format 2 specifics
9 Helper equations
  9.1 Overview
  9.2 Functions
   9.2.1 ilog
   9.2.2 float32_unpack
   9.2.3 lookup1_values
   9.2.4 low_neighbor
   9.2.5 high_neighbor
   9.2.6 render_point
   9.2.7 render_line
10 Tables
  10.1 floor1_inverse_dB_table
A Embedding Vorbis into an Ogg stream
  A.1 Overview
   A.1.1 Restrictions
   A.1.2 MIME type
  A.2 Encapsulation
B Vorbis encapsulation in RTP

1. Introduction and Description

1.1. Overview

This document provides a high-level description of the Vorbis codec’s construction. A bit-by-bit specification appears beginning in Section 4, “Codec Setup and Packet Decode”. The later sections assume a high-level understanding of the Vorbis decode process, which is provided here.

1.1.1. Application

Vorbis is a general-purpose perceptual audio CODEC intended to allow maximum encoder flexibility, thus allowing it to scale competitively over an exceptionally wide range of bitrates. At the high quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits) it is in the same league as MPEG-2 and MPC. At the other end of the scale, the 1.0 encoder can encode high-quality CD and DAT rate stereo at below 48kbps without resampling to a lower rate. Vorbis is also intended for lower and higher sample rates (from 8kHz telephony to 192kHz digital masters) and a range of channel representations (monaural, polyphonic, stereo, quadraphonic, 5.1, ambisonic, or up to 255 discrete channels).

1.1.2. Classification

Vorbis I is a forward-adaptive monolithic transform CODEC based on the Modified Discrete Cosine Transform. The codec is structured to allow addition of a hybrid wavelet filterbank in Vorbis II to offer better transient response and reproduction using a transform better suited to localized time events.

1.1.3. Assumptions

The Vorbis CODEC design assumes a complex, psychoacoustically-aware encoder and simple, low-complexity decoder. Vorbis decode is computationally simpler than mp3, although it does require more working memory as Vorbis has no static probability model; the vector codebooks used in the first stage of decoding from the bitstream are packed in their entirety into the Vorbis bitstream headers. In packed form, these codebooks occupy only a few kilobytes; the extent to which they are pre-decoded into a cache is the dominant factor in decoder memory usage.

Vorbis provides none of its own framing, synchronization or protection against errors; it is solely a method of accepting input audio, dividing it into individual frames and compressing these frames into raw, unformatted ’packets’. The decoder then accepts these raw packets in sequence, decodes them, synthesizes audio frames from them, and reassembles the frames into a facsimile of the original audio stream. Vorbis is a free-form variable bit rate (VBR) codec and packets have no minimum size, maximum size, or fixed/expected size. Packets are designed so that they may be truncated (or padded) and remain decodable; this is not to be considered an error condition and is used extensively in bitrate management and peeling. Both the transport mechanism and decoder must allow that a packet may be any size, or end before or after packet decode expects.

Vorbis packets are thus intended to be used with a transport mechanism that provides free-form framing, sync, positioning and error correction in accordance with these design assumptions, such as Ogg (for file transport) or RTP (for network multicast). For purposes of a few examples in this document, we will assume that Vorbis is to be embedded in an Ogg stream specifically, although this is by no means a requirement or fundamental assumption in the Vorbis design.

The specification for embedding Vorbis into an Ogg transport stream is in Appendix A, “Embedding Vorbis into an Ogg stream”.

1.1.4. Codec Setup and Probability Model

Vorbis’ heritage is as a research CODEC, and its current design reflects a desire to allow multiple decades of continuous encoder improvement before running out of room within the codec specification. For these reasons, configurable aspects of codec setup intentionally lean toward the extreme of forward adaptivity.

The single most controversial design decision in Vorbis (and the most unusual for a Vorbis developer to keep in mind) is that the entire probability model of the codec, the Huffman and VQ codebooks, is packed into the bitstream header along with extensive CODEC setup parameters (often several hundred fields). Unlike MPEG audio layers, Vorbis therefore cannot rely on a simple frame type flag embedded in each audio packet, and decode cannot begin at any frame in the stream without the codec setup header having been fetched first.

Note: Vorbis can initiate decode at any arbitrary packet within a bitstream so long as the codec has been initialized and set up with the setup headers.

Thus, Vorbis headers are both required for decode to begin and relatively large as bitstream headers go. The header size is unbounded, although for streaming a rule-of-thumb of 4kB or less is recommended (and Xiph.Org’s Vorbis encoder follows this suggestion).

Our own design work indicates the primary liability of the required header is in mindshare; it is an unusual design and thus causes some amount of complaint among engineers as this runs against current design trends (and also points out limitations in some existing software/interface designs, such as Windows’ ACM codec framework). However, we find that it does not fundamentally limit Vorbis’ suitable application space.

1.1.5. Format Specification

The Vorbis format is well-defined by its decode specification; any encoder that produces packets that are correctly decoded by the reference Vorbis decoder described below may be considered a proper Vorbis encoder. A decoder must faithfully and completely implement the specification defined below (except where noted) to be considered a proper Vorbis decoder.

1.1.6. Hardware Profile

Although Vorbis decode is computationally simple, it may still run into specific limitations of an embedded design. For this reason, embedded designs are allowed to deviate in limited ways from the ‘full’ decode specification yet still be certified compliant. These optional omissions are labelled in the spec where relevant.

1.2. Decoder Configuration

Decoder setup consists of configuration of multiple, self-contained component abstractions that perform specific functions in the decode pipeline. Each different component instance of a specific type is semantically interchangeable; decoder configuration consists of both internal component configuration and the arrangement of specific instances into a decode pipeline. Componentry arrangement is roughly as follows:

Figure 1: decoder pipeline configuration

1.2.1. Global Config

Global codec configuration consists of a few audio related fields (sample rate, channels), Vorbis version (always ’0’ in Vorbis I), bitrate hints, and the lists of component instances. All other configuration is in the context of specific components.

1.2.2. Mode

Each Vorbis frame is coded according to a master ’mode’. A bitstream may use one or many modes.

The mode mechanism is used to encode a frame according to one of multiple possible methods, with the intention of choosing the method best suited to that frame. For example, switching between modes is how the frame size changes from frame to frame. The mode number of a frame serves as a top-level configuration switch for all other specific aspects of frame decode.

A ’mode’ configuration consists of a frame size setting, window type (always 0, the Vorbis window, in Vorbis I), transform type (always type 0, the MDCT, in Vorbis I) and a mapping number. The mapping number specifies which mapping configuration instance to use for low-level packet decode and synthesis.

1.2.3. Mapping

A mapping contains a channel coupling description and a list of ’submaps’ that bundle sets of channel vectors together for grouped encoding and decoding. These submaps are not references to external components; the submap list is internal and specific to a mapping.

A ’submap’ is a configuration/grouping that applies to a subset of floor and residue vectors within a mapping. The submap functions as a last layer of indirection such that specific special floor or residue settings can be applied not only to all the vectors in a given mode, but also to specific vectors in a specific mode. Each submap specifies the proper floor and residue instance number to use for decoding that submap’s spectral floor and spectral residue vectors.

As an example:

Assume a Vorbis stream that contains six channels in the standard 5.1 format. The sixth channel, as is normal in 5.1, is bass only. Therefore it would be wasteful to encode a full-spectrum version of it as with the other channels. The submapping mechanism can be used to apply a full range floor and residue encoding to channels 0 through 4, and a bass-only representation to the bass channel, thus saving space. In this example, channels 0-4 belong to submap 0 (which indicates use of a full-range floor) and channel 5 belongs to submap 1, which uses a bass-only representation.
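
As a rough illustration of the channel-to-submap-to-floor/residue indirection, the C sketch below encodes the 5.1 example. The structure `mapping_t`, its fields and the helper functions are hypothetical names chosen for this sketch, not part of the specification. It also assumes that floor and residue instance 0 are the full-range configurations and instance 1 the bass-only ones; the array bounds are simply chosen large enough for the sketch.

```c
/* Hypothetical, simplified mapping structure; names are illustrative only. */
#define MAX_CHANNELS 255
#define MAX_SUBMAPS  16

typedef struct {
    int submaps;                      /* number of submaps in this mapping   */
    int submap_floor[MAX_SUBMAPS];    /* floor instance used by each submap  */
    int submap_residue[MAX_SUBMAPS];  /* residue instance used by each submap */
    int mux[MAX_CHANNELS];            /* submap number assigned to a channel */
} mapping_t;

/* channel -> submap -> floor instance */
int floor_for_channel(const mapping_t *m, int channel)
{
    return m->submap_floor[m->mux[channel]];
}

/* channel -> submap -> residue instance */
int residue_for_channel(const mapping_t *m, int channel)
{
    return m->submap_residue[m->mux[channel]];
}

/* The 5.1 example: channels 0-4 use submap 0 (assumed full-range floor and
   residue, instance 0); channel 5 uses submap 1 (assumed bass-only, instance 1). */
void make_5_1_mapping(mapping_t *m)
{
    m->submaps = 2;
    m->submap_floor[0] = 0;
    m->submap_residue[0] = 0;
    m->submap_floor[1] = 1;
    m->submap_residue[1] = 1;
    for (int ch = 0; ch < 6; ch++)
        m->mux[ch] = (ch < 5) ? 0 : 1;
}
```

With this mapping, `floor_for_channel(&m, 5)` resolves to instance 1 while channels 0 through 4 resolve to instance 0; changing what a whole group of channels uses only requires editing one submap entry.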

1.2.4. Floor

Vorbis encodes a spectral ’floor’ vector for each PCM channel. This vector is a low-resolution representation of the audio spectrum for the given channel in the current frame, generally used akin to a whitening filter. It is named a ’floor’ because the Xiph.Org reference encoder has historically used it as a unit-baseline for spectral resolution.

A floor encoding may be of two types. Floor 0 uses a packed LSP representation on a dB amplitude scale and Bark frequency scale. Floor 1 represents the curve as a piecewise linear interpolated representation on a dB amplitude scale and linear frequency scale. The two floors are semantically interchangeable in encoding/decoding. However, floor type 1 provides more stable inter-frame behavior, and so is the preferred choice in all coupled-stereo and high bitrate modes. Floor 1 is also considerably less expensive to decode than floor 0.

Floor 0 is not to be considered deprecated, but it is of limited modern use. No known Vorbis encoder past Xiph.Org’s own beta 4 makes use of floor 0.

The values coded/decoded by a floor are both compactly formatted and make use of entropy coding to save space. For this reason, a floor configuration generally refers to multiple codebooks in the codebook component list. Entropy coding is thus provided as an abstraction, and each floor instance may choose from any and all available codebooks when coding/decoding.

1.2.5. Residue

The spectral residue is the fine structure of the audio spectrum once the floor curve has been subtracted out. In simplest terms, it is coded in the bitstream using cascaded (multi-pass) vector quantization according to one of three specific packing/coding algorithms numbered 0 through 2. The packing algorithm details are configured per residue instance. As with the floor components, the final VQ/entropy encoding is provided by external codebook instances and each residue instance may choose from any and all available codebooks.

1.2.6. Codebooks

Codebooks are a self-contained abstraction that perform entropy decoding and, optionally, use the entropy-decoded integer value as an offset into an index of output value vectors, returning the indicated vector of values.

The entropy coding in a Vorbis I codebook is provided by a standard Huffman binary tree representation. This tree is tightly packed using one of several methods, depending on whether codeword lengths are ordered or unordered, or the tree is sparse.

The codebook vector index is similarly packed according to index characteristic. Most commonly, the vector index is encoded as a single list of possible values that are then permuted into a list of n-dimensional rows (lattice VQ).
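
A simplified sketch of how a lattice index turns one entropy-decoded integer into an n-dimensional vector: each dimension takes one digit of the index written in base `lookup_values` and uses it to select from the shared list of possible values. The function name and parameters are hypothetical, and the sketch deliberately ignores the minimum/delta scaling and the cumulative (sequence) option that the full codebook format in Section 3 applies to the selected values.

```c
#include <stddef.h>

/* Hypothetical helper: expand codebook entry `index` into `dims` values,
   picking each value from the shared list `multiplicands` of length
   `lookup_values`. Dimension d uses digit d of `index` in base lookup_values.
   Simplified: no minimum/delta scaling, no cumulative (sequence) mode. */
void lattice_lookup(const float *multiplicands, size_t lookup_values,
                    size_t dims, size_t index, float *out)
{
    size_t divisor = 1;
    for (size_t d = 0; d < dims; d++) {
        size_t offset = (index / divisor) % lookup_values;
        out[d] = multiplicands[offset];
        divisor *= lookup_values;
    }
}
```

With the value list {-1, 0, 1} and two dimensions, the nine entries 0 through 8 enumerate every point of a 3×3 lattice; entry 5, for example, expands to (1, 0).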

1.3. High-level Decode Process

1.3.1. Decode Setup

Before decoding can begin, a decoder must initialize using the bitstream headers matching the stream to be decoded. Vorbis uses three header packets; all are required, in-order, by this specification. Once set up, decode may begin at any audio packet belonging to the Vorbis stream. In Vorbis I, all packets after the three initial headers are audio packets.

The header packets are, in order, the identification header, the comments header, and the setup header.

Identification Header
The identification header identifies the bitstream as Vorbis and gives the Vorbis version and the simple audio characteristics of the stream, such as sample rate and number of channels.

Comment Header
The comment header includes user text comments (“tags”) and a vendor string for the application/library that produced the bitstream. The encoding and proper use of the comment header is described in Section 5, “comment field and header specification”.

Setup Header
The setup header includes extensive CODEC setup information as well as the complete VQ and Huffman codebooks needed for decode.

1.3.2. Decode Procedure

The decoding and synthesis procedure for all audio packets is fundamentally the same.

1. decode packet type flag
2. decode mode number
3. decode window shape (long windows only)
4. decode floor
5. decode residue into residue vectors
6. inverse channel coupling of residue vectors
7. generate floor curve from decoded floor data
8. compute dot product of floor and residue, producing audio spectrum vector
9. inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I
10. overlap/add left-hand output of transform with right-hand output of previous frame
11. store right-hand data from transform of current frame for future lapping
12. if not first frame, return results of overlap/add as audio result of current frame

Note that clever rearrangement of the synthesis arithmetic is possible; as an example, one can take advantage of symmetries in the MDCT to store the right-hand transform data of a partial MDCT for a 50% inter-frame buffer space savings, and then complete the transform later before overlap/add with the next frame. This optimization produces entirely equivalent output and is naturally perfectly legal. The decoder must be entirely mathematically equivalent to the specification; it need not be a literal semantic implementation.

Packet type decode
Vorbis I uses four packet types. The first three packet types mark each of the three Vorbis headers described above. The fourth packet type marks an audio packet. All other packet types are reserved; packets marked with a reserved type should be ignored.
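
A minimal sketch of packet type detection, assuming the layouts given later in Section 4: a header packet starts with a type octet of 1 (identification), 3 (comment) or 5 (setup) followed by the six octets `vorbis`, while an audio packet starts with a single packet type bit of 0, which under the bit order of Section 2 is the least significant bit of the first byte. The enum and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification result. */
typedef enum {
    PKT_EMPTY, PKT_AUDIO, PKT_IDENT, PKT_COMMENT, PKT_SETUP, PKT_RESERVED
} packet_kind;

packet_kind classify_packet(const unsigned char *p, size_t len)
{
    if (len == 0)
        return PKT_EMPTY;                 /* nothing to inspect */
    if ((p[0] & 1) == 0)
        return PKT_AUDIO;                 /* first bit (LSb of byte 0) is 0 */
    if (len >= 7 && memcmp(p + 1, "vorbis", 6) == 0) {
        switch (p[0]) {
        case 1: return PKT_IDENT;
        case 3: return PKT_COMMENT;
        case 5: return PKT_SETUP;
        }
    }
    return PKT_RESERVED;                  /* any other header-style type */
}
```

A decoder expecting audio would skip anything that is not `PKT_AUDIO`, as the text above requires; how a zero-length packet is treated is left to the caller in this sketch.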

Following the three header packets, all packets in a Vorbis I stream are audio. The first step of audio packet decode is to read and verify the packet type; a non-audio packet when audio is expected indicates stream corruption or a non-compliant stream. The decoder must ignore the packet and not attempt decoding it to audio.

Mode decode
Vorbis allows an encoder to set up multiple, numbered packet ’modes’, as described earlier, all of which may be used in a given Vorbis stream. The mode is encoded as an integer used as a direct offset into the mode instance index.
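
The width of the mode number field is not fixed. Assuming the rule from Section 4.3.1 that it occupies `ilog(mode_count - 1)` bits, with `ilog` as defined in Section 9.2.1 (the number of bits needed to represent a value, zero for values of zero or less), the field width can be computed as in this C sketch; `mode_field_bits` is a hypothetical helper name.

```c
/* ilog: position of the highest set bit, i.e. the number of bits needed to
   represent x; returns 0 for x <= 0. */
int ilog(long x)
{
    int bits = 0;
    while (x > 0) {
        bits++;
        x >>= 1;
    }
    return bits;
}

/* Hypothetical helper: width in bits of the mode number field. */
int mode_field_bits(int mode_count)
{
    return ilog((long)mode_count - 1);
}
```

Under this rule a stream with a single mode spends zero bits on the mode number, two modes need one bit, and 64 modes need six.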

Window shape decode (long windows only)
Vorbis frames may be one of two PCM sample sizes specified during codec setup. In Vorbis I, legal frame sizes are powers of two from 64 to 8192 samples. Aside from coupling, Vorbis handles channels as independent vectors and these frame sizes are in samples per channel.
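
A small validity check for the frame sizes described above, as a sketch. The requirement that the short size not exceed the long size is taken from the identification header rules in Section 4.2.2; the function names are hypothetical.

```c
#include <stdbool.h>

/* True if n is a legal Vorbis I frame size: a power of two from 64 to 8192. */
bool legal_blocksize(unsigned long n)
{
    return n >= 64 && n <= 8192 && (n & (n - 1)) == 0;
}

/* Both sizes must be legal, and the short size must not exceed the long one. */
bool legal_blocksize_pair(unsigned long short_size, unsigned long long_size)
{
    return legal_blocksize(short_size) && legal_blocksize(long_size)
        && short_size <= long_size;
}
```

The `(n & (n - 1)) == 0` test is the usual power-of-two check; the range test excludes 0, so it cannot be fooled by that case.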

Vorbis uses an overlapping transform, namely the MDCT, to blend one frame into the next, avoiding most inter-frame block boundary artifacts. The MDCT output of one frame is windowed according to MDCT requirements, overlapped 50% with the output of the previous frame and added. The window shape assures seamless reconstruction.

This is easy to visualize in the case of equal-sized windows:

Figure 2: overlap of two equal-sized windows

The case of overlapping unequal-sized windows is slightly more complex:

Figure 3: overlap of a long and a short window

In the unequal-sized window case, the window shape of the long window must be modified for seamless lapping as above. It is possible to correctly infer the window shape to be applied to the current window from knowing the sizes of the current, previous and next windows. It is legal for a decoder to use this method. However, in the case of a long window (short windows require no modification), Vorbis also codes two flag bits to specify pre- and post-window shape. Although not strictly necessary for function, this minor redundancy allows a packet to be fully decoded to the point of lapping entirely independently of any other packet, allowing easier abstraction of decode layers as well as allowing a greater level of easy parallelism in encode and decode.

A description of valid window functions for use with an inverse MDCT can be found in [1]. Vorbis windows all use the slope function

$$y = \sin\left(\frac{\pi}{2}\,\sin^2\left(\frac{x + 0.5}{n}\,\pi\right)\right)$$

floor decode
Each floor is encoded/decoded in channel order; however, each floor belongs to a ’submap’ that specifies which floor configuration to use. All floors are decoded before residue decode begins.

residue decode
Although the number of residue vectors equals the number of channels, channel coupling may mean that the raw residue vectors extracted during decode do not map directly to specific channels. When channel coupling is in use, some vectors will correspond to coupled magnitude or angle. The coupling relationships are described in the codec setup and may differ from frame to frame, due to different mode numbers.

Vorbis codes residue vectors in groups by submap; the coding is done in submap order from submap 0 through n-1. This differs from floors which are coded using a configuration provided by submap number, but are coded individually in channel order.
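
To make the ordering difference concrete, this sketch lists the order in which channels are visited during residue decode: every channel of submap 0 first, then submap 1, and so on, keeping channel order within a submap. Floors, by contrast, are simply visited as channels 0, 1, 2, and so on. The `mux` array (channel to submap number) and the function name are hypothetical.

```c
/* Hypothetical helper: write the residue-decode visiting order of channels
   into order[] (grouped by submap, channel order preserved within a group)
   and return how many channels were listed. */
int residue_channel_order(const int *mux, int channels, int submaps, int *order)
{
    int k = 0;
    for (int s = 0; s < submaps; s++)         /* submap 0 .. submaps-1 */
        for (int ch = 0; ch < channels; ch++)
            if (mux[ch] == s)
                order[k++] = ch;
    return k;
}
```

For four channels with submap assignments {1, 0, 1, 0}, floors would be decoded in the order 0, 1, 2, 3, while residue decode visits channels 1, 3, 0, 2.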

inverse channel coupling
A detailed discussion of stereo in the Vorbis codec can be found in the document Stereo Channel Coupling in the Vorbis CODEC. Vorbis is not limited to only stereo coupling, but the stereo document also gives a good overview of the generic coupling mechanism.

Vorbis coupling applies to pairs of residue vectors at a time; decoupling is done in-place a pair at a time in the order and using the vectors specified in the current mapping configuration. The decoupling operation is the same for all pairs, converting square polar representation (where one vector is magnitude and the second angle) back to Cartesian representation.

After each pair of vectors on the coupling list has been decoupled in order, the resulting residue vectors represent the fine spectral detail of each output channel.
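
The per-element decoupling rule can be sketched as below, following the magnitude/angle case analysis given for inverse coupling in Section 4.3.5. The function name is hypothetical; the sketch handles a single pair in place and leaves the choice of pairs, and the order in which they are processed, to the mapping configuration.

```c
#include <stddef.h>

/* Hypothetical helper: convert one magnitude/angle pair of residue vectors
   back to two channel vectors, in place. */
void inverse_couple(float *mag, float *ang, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float m = mag[i], a = ang[i];
        float new_m, new_a;
        if (m > 0) {
            if (a > 0) { new_m = m;     new_a = m - a; }
            else       { new_a = m;     new_m = m + a; }
        } else {
            if (a > 0) { new_m = m;     new_a = m + a; }
            else       { new_a = m;     new_m = m - a; }
        }
        mag[i] = new_m;   /* now the first channel of the pair  */
        ang[i] = new_a;   /* now the second channel of the pair */
    }
}
```

For example, a magnitude of 3 with an angle of 1 decodes to the channel pair (3, 2), while a magnitude of 3 with an angle of −1 decodes to (2, 3): the sign of the angle selects which channel carries the full magnitude.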

generate floor curve
The decoder may choose to generate the floor curve at any appropriate time. It is reasonable to generate the output curve when the floor data is decoded from the raw packet, or it can be generated after inverse coupling and applied to the spectral residue directly, combining generation and the dot product into one step and eliminating some working space.

Both floor 0 and floor 1 generate a linear-range, linear-domain output vector to be multiplied (dot product) by the linear-range, linear-domain spectral residue.

compute floor/residue dot product
This step is straightforward; for each output channel, the decoder multiplies the floor curve and residue vectors element by element, producing the finished audio spectrum of each channel.
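
Despite the name, the operation is an element-wise product rather than a scalar dot product. A minimal floating-point sketch, done in place on the residue vector to save working space (the function name is hypothetical):

```c
#include <stddef.h>

/* Hypothetical helper: multiply the residue by the floor curve element by
   element, leaving the finished audio spectrum in residue[]. */
void apply_floor(float *residue, const float *floor_curve, size_t n)
{
    for (size_t i = 0; i < n; i++)
        residue[i] *= floor_curve[i];
}
```

A fixed-point version needs far more care with range than this float sketch, as the following paragraphs explain.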

One point is worth mentioning about this dot product; a common mistake in a fixed point implementation might be to assume that a 32 bit fixed-point representation for floor and residue and direct multiplication of the vectors is sufficient for acceptable spectral depth in all cases because it happens to mostly work with the current Xiph.Org reference encoder.

However, floor vector values can span ∼140dB (∼24 bits unsigned), and the audio spectrum vector should represent a minimum of 120dB (∼21 bits with sign), even when output is to a 16 bit PCM device. For the residue vector to represent full scale if the floor is nailed to −140dB, it must be able to span 0 to +140dB. For the residue vector to reach full scale if the floor is nailed at 0dB, it must be able to represent −140dB to +0dB. Thus, in order to handle full range dynamics, a residue vector may span −140dB to +140dB entirely within spec. A 280dB range is approximately 48 bits with sign; thus the residue vector must be able to represent a 48 bit range and the dot product must be able to handle an effective 48 bit times 24 bit multiplication. This range may be achieved using large (64 bit or larger) integers, or implementing a movable binary point representation.

inverse monolithic transform (MDCT)
The audio spectrum is converted back into time domain PCM audio via an inverse Modified Discrete Cosine Transform (MDCT). A detailed description of the MDCT is available in [1].

Note that the PCM produced directly from the MDCT is not yet finished audio; it must be lapped with surrounding frames using an appropriate window (such as the Vorbis window) before the MDCT can be considered orthogonal.

overlap/add data
Windowed MDCT output is overlapped and added with the right-hand data of the previous window such that the 3/4 point of the previous window is aligned with the 1/4 point of the current window (as illustrated in the window overlap diagram). At this point, the audio data between the center of the previous frame and the center of the current frame is finished and ready to be returned.

cache right-hand data
The decoder must cache the right-hand portion of the current frame to be lapped with the left-hand portion of the next frame.

return finished audio data
The overlapped portion produced from overlapping the previous and current frame data is finished data to be returned by the decoder. This data spans from the center of the previous window to the center of the current window. In the case of same-sized windows, the amount of data to return is one-half block, consisting solely of the overlapped portions. When overlapping a short and long window, much of the returned range is not actually overlap. This does not damage transform orthogonality. Pay attention, however, to returning the correct data range; the amount of data to be returned is:

    window_blocksize(previous_window)/4 + window_blocksize(current_window)/4

from the center of the previous window to the center of the current window.
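
The alignment and return-size rules can be sketched together. The function below (hypothetical name; it assumes both inputs are already windowed IMDCT outputs of p and c samples) places the current block so that its 1/4 point coincides with the previous block's 3/4 point, then returns the p/4 + c/4 samples lying between the two window centers. Where only one block covers a position, that block's value is used as-is.

```c
#include <stddef.h>

/* Hypothetical helper: overlap-add two windowed IMDCT outputs and write the
   finished samples between the previous and current window centers.
   prev has p samples, cur has c samples; out must hold p/4 + c/4 samples.
   Returns the number of samples written. */
size_t overlap_add(const float *prev, size_t p,
                   const float *cur, size_t c, float *out)
{
    size_t count = p / 4 + c / 4;              /* center of prev to center of cur */
    for (size_t j = 0; j < count; j++) {
        float s = 0.0f;
        size_t ip = p / 2 + j;                 /* index into prev */
        long ic = (long)j - (long)(p / 4) + (long)(c / 4);  /* index into cur */
        if (ip < p)
            s += prev[ip];
        if (ic >= 0 && ic < (long)c)
            s += cur[ic];
        out[j] = s;
    }
    return count;
}
```

For a 2048-sample long window following a 256-sample short window, the call returns 256/4 + 2048/4 = 576 samples; for two equal windows of n samples it returns n/2, exactly the overlapped half block.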

Data is not returned from the first frame; it must be used to ’prime’ the decode engine. The encoder accounts for this priming when calculating PCM offsets; after the first frame, the proper PCM output offset is ’0’ (as no data has been returned yet).

2. Bitpacking Convention

2.1. Overview

The Vorbis codec uses relatively unstructured raw packets containing arbitrary-width binary integer fields. Logically, these packets are a bitstream in which bits are coded one-by-one by the encoder and then read one-by-one in the same monotonically increasing order by the decoder. Most current binary storage arrangements group bits into a native word size of eight bits (octets), sixteen bits, thirty-two bits or, less commonly, other fixed word sizes. The Vorbis bitpacking convention specifies the correct mapping of the logical packet bitstream into an actual representation in fixed-width words.

2.1.1. octets, bytes and words

In most contemporary architectures, a ’byte’ is synonymous with an ’octet’, that is, eight bits. This has not always been the case; seven-, ten-, eleven- and sixteen-bit ’bytes’ have been used. For purposes of the bitpacking convention, a byte implies the native, smallest integer storage representation offered by a platform. On modern platforms, this is generally assumed to be eight bits (not necessarily because of the processor but because of the filesystem/memory architecture; modern filesystems invariably offer bytes as the fundamental atom of storage). A ’word’ is an integer size that is a grouped multiple of this smallest size.

The most ubiquitous architectures today consider a ’byte’ to be an octet (eight bits) and a word cannam@86: to be a group of two, four or eight bytes (16, 32 or 64 bits). Note however that the Vorbis cannam@86: bitpacking convention is still well defined for any native byte size; Vorbis uses the native cannam@86: bit-width of a given storage system. This document assumes that a byte is one octet for purposes cannam@86: of example. cannam@86:

2.1.2. bit order

A byte has a well-defined ’least significant’ bit (LSb), which is the only bit set when the byte is storing the two’s complement integer value +1. A byte’s ’most significant’ bit (MSb) is at the opposite end of the byte. Bits in a byte are numbered from zero at the LSb to n (n = 7 in an octet) for the MSb.

2.1.3. byte order

Words are native groupings of multiple bytes. Several byte orderings are possible in a word. The common ones are 3-2-1-0 (’big endian’ or ’most significant byte first’, in which the highest-valued byte comes first) and 0-1-2-3 (’little endian’ or ’least significant byte first’, in which the lowest-valued byte comes first). Less commonly, 3-1-2-0 and 0-2-1-3 (’mixed endian’) are used.

The Vorbis bitpacking convention specifies storage and bitstream manipulation at the byte level, not the word level. Host word ordering is therefore a concern only during optimization, when writing high-performance code that operates on a word of storage at a time rather than a byte at a time. Logically, bytes are always coded and decoded in order from byte zero through byte n.

2.1.4. coding bits into byte sequences

The Vorbis codec needs to code arbitrary bit-width integers, from zero to 32 bits wide, into packets. These integer fields are not aligned to the boundaries of the byte representation; the next field is written at the bit position at which the previous field ends.

The encoder logically packs integers by writing the LSb of a binary integer to the logical bitstream first, followed by the next least significant bit, and so on, until the requested number of bits have been coded. When packing the bits into bytes, the encoder places the LSb of the integer into the least significant unused bit position of the destination byte. The next-least significant bit of the source integer follows, and so on up to the requested number of bits. When all bits of the destination byte have been filled, encoding continues by zeroing all bits of the next byte and writing the next bit into bit position 0 of that byte. Decoding follows the same process as encoding, but reads bits from the byte stream and reassembles them into integers.
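
The following C sketch illustrates the packing rule just described. It appends integer fields to a byte buffer, least significant bit first. It is a minimal example under stated assumptions, not a reference implementation: bytes are assumed to be octets, and the `bitwriter` type, its fixed 64-byte capacity and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-capacity bit writer; octet bytes assumed. */
typedef struct {
    uint8_t buf[64];
    size_t bitpos;              /* number of bits written so far */
} bitwriter;

/* Append the low `bits` bits (0..32) of `value`, LSb first.
   Returns 0 on success, -1 if the width is invalid or the buffer would overflow. */
static int bw_write(bitwriter *bw, uint32_t value, int bits) {
    if (bits < 0 || bits > 32 || bw->bitpos + (size_t)bits > 8 * sizeof bw->buf)
        return -1;
    for (int i = 0; i < bits; i++, bw->bitpos++) {
        size_t byte = bw->bitpos >> 3;
        unsigned shift = (unsigned)(bw->bitpos & 7);
        if (shift == 0)
            bw->buf[byte] = 0;                   /* zero each new byte before use */
        bw->buf[byte] |= (uint8_t)(((value >> i) & 1u) << shift);
    }
    return 0;
}

/* Number of bytes occupied so far; unused high bits of the last byte stay zero. */
static size_t bw_bytes(const bitwriter *bw) {
    return (bw->bitpos + 7) >> 3;
}
```

With this sketch, writing 12 in 4 bits, -1 in 3 bits, 17 in 7 bits and 6969 in 13 bits yields the bytes 0xFC 0x48 0xCE 0x06. This matches the coding example in section 2.1.6. The -1 is passed as `(uint32_t)-1`, and only its low three bits are stored. Whether those bits mean -1 or 7 is left to the decoder's context.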

2.1.5. signedness

The signedness of a specific number resulting from decode is to be interpreted by the decoder given decode context. That is, the three bit binary pattern ’b111’ can be taken to represent either ’seven’ as an unsigned integer, or ’-1’ as a signed, two’s complement integer. The encoder and decoder are responsible for knowing if fields are to be treated as signed or unsigned.
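
The bitstream carries no sign information, so a decoder typically reads every field as unsigned. It converts a field only where the context calls for a signed value. Below is a minimal C sketch of that conversion, assuming field widths of 1 to 32 bits; the helper name `sign_extend` is hypothetical.

```c
#include <stdint.h>

/* Interpret the low `bits` bits (1..32) of `v` as a two's complement integer.
   The decoder, not the bitstream, decides when this conversion applies. */
static int64_t sign_extend(uint32_t v, int bits) {
    if (bits <= 0 || bits > 32)
        return 0;                                /* outside the range this sketch handles */
    uint64_t mask = ((uint64_t)1 << bits) - 1;
    uint64_t sign = (uint64_t)1 << (bits - 1);
    uint64_t x = (uint64_t)v & mask;
    return (int64_t)(x ^ sign) - (int64_t)sign;  /* flip the sign bit, then remove its weight */
}
```

With this helper, the pattern b111 read as a 3 bit field gives -1. The same value treated as a 4 bit field gives 7.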

2.1.6. coding example

Code the 4 bit integer value ’12’ [b1100] into an empty bytestream. Bytestream result:

              |
              V

        7 6 5 4 3 2 1 0
byte 0 [0 0 0 0 1 1 0 0]  <-
byte 1 [               ]
byte 2 [               ]
byte 3 [               ]
             ...
byte n [               ]  bytestream length == 1 byte

Continue by coding the 3 bit integer value ’-1’ [b111]:

        |
        V

        7 6 5 4 3 2 1 0
byte 0 [0 1 1 1 1 1 0 0]  <-
byte 1 [               ]
byte 2 [               ]
byte 3 [               ]
             ...
byte n [               ]  bytestream length == 1 byte

Continue by coding the 7 bit integer value ’17’ [b0010001]:

          |
          V

        7 6 5 4 3 2 1 0
byte 0 [1 1 1 1 1 1 0 0]
byte 1 [0 0 0 0 1 0 0 0]  <-
byte 2 [               ]
byte 3 [               ]
             ...
byte n [               ]  bytestream length == 2 bytes
                          bit cursor == 6

Continue by coding the 13 bit integer value ’6969’ [b110 11001110 01]:

                |
                V

        7 6 5 4 3 2 1 0
byte 0 [1 1 1 1 1 1 0 0]
byte 1 [0 1 0 0 1 0 0 0]
byte 2 [1 1 0 0 1 1 1 0]
byte 3 [0 0 0 0 0 1 1 0]  <-
             ...
byte n [               ]  bytestream length == 4 bytes

2.1.7. decoding example

Reading from the beginning of the bytestream encoded in the above example:

                      |
                      V

        7 6 5 4 3 2 1 0
byte 0 [1 1 1 1 1 1 0 0]  <-
byte 1 [0 1 0 0 1 0 0 0]
byte 2 [1 1 0 0 1 1 1 0]
byte 3 [0 0 0 0 0 1 1 0]  bytestream length == 4 bytes

We read two two-bit integer fields, resulting in the returned numbers ’b00’ and ’b11’. Two things are worth noting here:

Although these four bits were originally written as a single four-bit integer, reading some other combination of bit-widths from the bitstream is well defined. There are no artificial alignment boundaries maintained in the bitstream.
The second value is the two-bit-wide integer ’b11’. This value may be interpreted either as the unsigned value ’3’ or as the signed value ’-1’. Signedness is dependent on decode context.
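
The decoding example can be reproduced with a few lines of C. This sketch assumes octet bytes and omits bounds checking; end-of-packet handling is described in the next subsection. The function name `read_bits` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read `bits` (0..32) bits LSb first, starting at bit offset *pos, and advance *pos.
   No bounds checking: the caller guarantees that the data is long enough. */
static uint32_t read_bits(const uint8_t *buf, size_t *pos, int bits) {
    uint32_t v = 0;
    for (int i = 0; i < bits; i++, (*pos)++)
        v |= (uint32_t)((buf[*pos >> 3] >> (*pos & 7)) & 1u) << i;
    return v;
}
```

Reading two 2 bit fields from the example bytes returns 0 and 3. Rewinding and reading fields of 4, 3, 7 and 13 bits instead recovers 12, 7, 17 and 6969. Here 7 is the unsigned view of the 3 bit -1 written earlier.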

2.1.8. end-of-packet alignment

The typical use of bitpacking is to produce many independent byte-aligned packets which are embedded into a larger byte-aligned container structure, such as an Ogg transport bitstream. Externally, each bytestream (encoded bitstream) must begin and end on a byte boundary. Often, the encoded bitstream is not an integer number of bytes, and so there is unused (uncoded) space in the last byte of a packet.

Unused space in the last byte of a bytestream is always zeroed during the coding process. Thus, should this unused space be read, it will return binary zeroes.

Attempting to read past the end of an encoded packet results in an ’end-of-packet’ condition. End-of-packet is not to be considered an error; it is merely a state indicating that there is insufficient remaining data to fulfill the desired read size. Vorbis uses truncated packets as a normal mode of operation, and as such, decoders must handle reading past the end of a packet as a typical mode of operation. Any further read operations after an ’end-of-packet’ condition shall also return ’end-of-packet’.

2.1.9. reading zero bits

Reading a zero-bit-wide integer returns the value ’0’ and does not increment the stream cursor. Reading to the end of the packet (but not past it, so that no ’end-of-packet’ condition has been triggered) and then reading a zero-bit integer shall succeed, returning 0, and shall not trigger an end-of-packet condition. Reading a zero-bit-wide integer after a previous read has set ’end-of-packet’ shall also fail with ’end-of-packet’.
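
The end-of-packet and zero-bit rules above require only a little state in the reader. The C sketch below models them with a sticky flag. The `bitreader` layout and `br_read` are hypothetical. This sketch also chooses that a read which would cross the end of the packet consumes no bits; the text does not specify this.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *buf;
    size_t len;        /* packet length in bytes */
    size_t pos;        /* bit cursor */
    int eop;           /* sticky end-of-packet flag */
} bitreader;

/* Read `bits` (0..32) bits LSb first. Returns 0 and stores the value on success,
   or -1 on end-of-packet. A zero-bit read succeeds with value 0 and leaves the
   cursor alone, unless end-of-packet has already been set. */
static int br_read(bitreader *br, int bits, uint32_t *out) {
    if (br->eop || br->pos + (size_t)bits > 8 * br->len) {
        br->eop = 1;                             /* every later read fails too */
        return -1;
    }
    uint32_t v = 0;
    for (int i = 0; i < bits; i++, br->pos++)
        v |= (uint32_t)((br->buf[br->pos >> 3] >> (br->pos & 7)) & 1u) << i;
    *out = v;
    return 0;
}
```

In a one-byte packet, an 8 bit read succeeds and a following zero-bit read still succeeds. A 1 bit read then reports end-of-packet, and from that point even a zero-bit read fails.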

3. Probability Model and Codebooks

3.1. Overview

Unlike practically every other mainstream audio codec, Vorbis has no statically configured probability model. Instead, it packs all entropy decoding configuration, both VQ and Huffman, into the bitstream itself, in the third header, the codec setup header. This packed configuration consists of multiple ’codebooks’. Each codebook contains a specific Huffman-equivalent representation for decoding compressed codewords, and optionally a lookup table of output vector values. A decoded Huffman value is applied as an offset into that table, generating the final decoded output corresponding to a given compressed codeword.

3.1.1. Bitwise operation

The codebook mechanism is built on top of the Vorbis bitpacker. Both the codebooks themselves and the codewords they decode are unrolled from a packet as a series of arbitrary-width values read from the stream according to Section 2, “Bitpacking Convention”.

3.2. Packed codebook format

For purposes of the examples below, we assume that the storage system’s native byte width is eight bits. This is not universally true; see Section 2, “Bitpacking Convention” for discussion relating to non-eight-bit bytes.

3.2.1. codebook decode

A codebook begins with a 24 bit sync pattern, 0x564342:

byte 0: [ 0 1 0 0 0 0 1 0 ] (0x42)
byte 1: [ 0 1 0 0 0 0 1 1 ] (0x43)
byte 2: [ 0 1 0 1 0 1 1 0 ] (0x56)

16 bit [codebook_dimensions] and 24 bit [codebook_entries] fields:

byte 3: [ X X X X X X X X ]
byte 4: [ X X X X X X X X ] [codebook_dimensions] (16 bit unsigned)

byte 5: [ X X X X X X X X ]
byte 6: [ X X X X X X X X ]
byte 7: [ X X X X X X X X ] [codebook_entries] (24 bit unsigned)

Next is the [ordered] bit flag:

byte 8: [               X ] [ordered] (1 bit)
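
Below is a sketch of reading these first fields in C, assuming octet bytes. It includes a compact LSb-first reader so that it compiles on its own; all type and function names are hypothetical. The sync pattern 0x564342, read as a 24 bit field, corresponds to the byte sequence 0x42 0x43 0x56 shown above.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { const uint8_t *buf; size_t len, pos; int eop; } bitreader;

/* Compact LSb-first reader: returns 0 and sets eop if the read would pass the end. */
static uint32_t br_read(bitreader *br, int bits) {
    if (br->eop || br->pos + (size_t)bits > 8 * br->len) { br->eop = 1; return 0; }
    uint32_t v = 0;
    for (int i = 0; i < bits; i++, br->pos++)
        v |= (uint32_t)((br->buf[br->pos >> 3] >> (br->pos & 7)) & 1u) << i;
    return v;
}

typedef struct { uint32_t dimensions, entries; int ordered; } codebook_header;

/* Returns 0 on success, -1 on a bad sync pattern or end-of-packet. */
static int read_codebook_header(bitreader *br, codebook_header *h) {
    if (br_read(br, 24) != 0x564342u)
        return -1;
    h->dimensions = br_read(br, 16);
    h->entries    = br_read(br, 24);
    h->ordered    = (int)br_read(br, 1);
    return br->eop ? -1 : 0;
}
```

This sketch reports both failure modes as -1. Each of them renders the stream undecodable, but a real decoder might still want to distinguish them for diagnostics.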

Each entry, numbering a total of [codebook_entries], is assigned a codeword length. We now read the list of codeword lengths and store these lengths in the array [codebook_codeword_lengths]. Decode of lengths is according to whether the [ordered] flag is set or unset.

If the [ordered] flag is unset, the codeword list is not length ordered and the decoder needs to read each codeword length one-by-one.
The decoder first reads one additional bit flag, the [sparse] flag. This flag determines whether or not the codebook contains unused entries that are not to be included in the codeword decode tree:

byte 8: [             X 0 ] [sparse] flag (1 bit)

The decoder now performs the following for each of the [codebook_entries] codebook entries:

  1) if ([sparse] is set) {

       2) [flag] = read one bit;
       3) if ([flag] is set) {

            4) [length] = read a five bit unsigned integer;
            5) codeword length for this entry is [length]+1;

          } else {

            6) this entry is unused.  mark it as such.

          }

     } else the sparse flag is not set {

       7) [length] = read a five bit unsigned integer;
       8) the codeword length for this entry is [length]+1;

     }
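
The unordered branch can be sketched in C as follows. A compact reader is included so the block stands alone. Storing unused entries as length 0 is a convention of this sketch, since decoded lengths are always at least 1. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { const uint8_t *buf; size_t len, pos; int eop; } bitreader;

static uint32_t br_read(bitreader *br, int bits) {
    if (br->eop || br->pos + (size_t)bits > 8 * br->len) { br->eop = 1; return 0; }
    uint32_t v = 0;
    for (int i = 0; i < bits; i++, br->pos++)
        v |= (uint32_t)((br->buf[br->pos >> 3] >> (br->pos & 7)) & 1u) << i;
    return v;
}

/* Reads the [sparse] flag and then one length per entry; returns -1 on end-of-packet. */
static int read_unordered_lengths(bitreader *br, uint32_t entries, uint8_t *lengths) {
    int sparse = (int)br_read(br, 1);
    for (uint32_t i = 0; i < entries; i++) {
        if (sparse && br_read(br, 1) == 0) {     /* steps 2-3: per-entry flag */
            lengths[i] = 0;                      /* step 6: unused entry */
            continue;
        }
        lengths[i] = (uint8_t)(br_read(br, 5) + 1); /* steps 4-5 or 7-8 */
    }
    return br->eop ? -1 : 0;
}
```

For example, the bytes 0x07 0x05 encode a sparse codebook of three entries with lengths 2, unused and 3.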

If the [ordered] flag is set, the codeword list for this codebook is encoded in ascending length order. Rather than reading a length for every codeword, the decoder reads the number of codewords per length. That is, beginning at entry zero:

  1) [current_entry] = 0;
  2) [current_length] = read a five bit unsigned integer and add 1;
  3) [number] = read ilog([codebook_entries] - [current_entry]) bits as an unsigned integer
  4) set the entries [current_entry] through [current_entry]+[number]-1, inclusive,
     of the [codebook_codeword_lengths] array to [current_length]
  5) set [current_entry] to [number] + [current_entry]
  6) increment [current_length] by 1
  7) if [current_entry] is greater than [codebook_entries] ERROR CONDITION;
     the decoder will not be able to read this stream.
  8) if [current_entry] is less than [codebook_entries], repeat process starting at 3)
  9) done.
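
The ordered branch can be sketched in C as shown below. Here `ilog` is assumed to return the number of bits needed to represent a non-negative integer, with ilog(0) = 0. Treat that definition, the compact reader and all names as assumptions of the sketch. The overrun check from step 7 runs before the array is written, so a bad stream cannot write past the end of `lengths`.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { const uint8_t *buf; size_t len, pos; int eop; } bitreader;

static uint32_t br_read(bitreader *br, int bits) {
    if (br->eop || br->pos + (size_t)bits > 8 * br->len) { br->eop = 1; return 0; }
    uint32_t v = 0;
    for (int i = 0; i < bits; i++, br->pos++)
        v |= (uint32_t)((br->buf[br->pos >> 3] >> (br->pos & 7)) & 1u) << i;
    return v;
}

/* Number of bits needed to represent x; 0 for x <= 0 (assumed definition). */
static int ilog(int64_t x) {
    int r = 0;
    while (x > 0) { r++; x >>= 1; }
    return r;
}

/* Returns 0 on success, -1 on overrun (step 7) or end-of-packet. */
static int read_ordered_lengths(bitreader *br, uint32_t entries, uint8_t *lengths) {
    uint32_t current_entry = 0;                                   /* step 1 */
    uint32_t current_length = br_read(br, 5) + 1;                 /* step 2 */
    while (current_entry < entries) {                             /* step 8 */
        uint32_t number = br_read(br, ilog((int64_t)entries - current_entry)); /* step 3 */
        if (br->eop || number > entries - current_entry)
            return -1;                                            /* step 7 */
        for (uint32_t i = 0; i < number; i++)
            lengths[current_entry + i] = (uint8_t)current_length; /* step 4 */
        current_entry += number;                                  /* step 5 */
        current_length++;                                         /* step 6 */
    }
    return 0;
}
```

For example, the bytes 0x41 0x28 describe eight entries: two of length 2, four of length 3 and two of length 4.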

After all codeword lengths have been decoded, the decoder reads the vector lookup table. Vorbis I supports three lookup types:

1. No lookup
2. Implicitly populated value mapping (lattice VQ)
3. Explicitly populated value mapping (tessellated or ’foam’ VQ)

The lookup table type is read as a four bit unsigned integer:

  1) [codebook_lookup_type] = read four bits as an unsigned integer

Codebook decode proceeds according to [codebook_lookup_type]:

Lookup type zero indicates that there is no lookup table to read. Proceed past lookup decode.
Lookup types one and two are similar, differing only in the number of lookup values to be read. Lookup type one reads a list of values that are permuted in a set pattern to build a list of vectors, each vector of order [codebook_dimensions] scalars. Lookup type two builds the same vector list, but reads each scalar for each vector explicitly, rather than building vectors from a smaller list of possible scalar values. Lookup decode proceeds as follows:

  1) [codebook_minimum_value] = float32_unpack( read 32 bits as an unsigned integer)
  2) [codebook_delta_value] = float32_unpack( read 32 bits as an unsigned integer)
  3) [codebook_value_bits] = read 4 bits as an unsigned integer and add 1
  4) [codebook_sequence_p] = read 1 bit as a boolean flag

  if ( [codebook_lookup_type] is 1 ) {

     5) [codebook_lookup_values] = lookup1_values([codebook_entries], [codebook_dimensions] )

  } else {

     6) [codebook_lookup_values] = [codebook_entries] * [codebook_dimensions]

  }

  7) read a total of [codebook_lookup_values] unsigned integers of [codebook_value_bits] each;
     store these in order in the array [codebook_multiplicands]

A [codebook_lookup_type] of greater than two is reserved and indicates a stream that is not decodable by the specification in this document.
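
The listing relies on two helpers, `float32_unpack` and `lookup1_values`, which this specification defines outside this section. The C sketch below follows our understanding of those definitions, so check it against the helper definitions before relying on it:
- the packed float has a 21 bit mantissa, a 10 bit exponent biased by 788, and a sign bit;
- the value count is the greatest integer r with r^[codebook_dimensions] <= [codebook_entries].

`double` is used only to keep the arithmetic exact.

```c
#include <stdint.h>

/* Packed float: bits 0-20 mantissa, bits 21-30 exponent (bias 788), bit 31 sign. */
static double float32_unpack(uint32_t x) {
    double mantissa = (double)(x & 0x1fffffu);
    int exponent = (int)((x & 0x7fe00000u) >> 21) - 788;
    if (x & 0x80000000u)
        mantissa = -mantissa;
    /* Equivalent to ldexp(mantissa, exponent), written out so no libm link is needed;
       each step is an exact power-of-two scaling. */
    for (; exponent > 0; exponent--) mantissa *= 2.0;
    for (; exponent < 0; exponent++) mantissa *= 0.5;
    return mantissa;
}

/* Greatest r such that r^dimensions <= entries. */
static uint32_t lookup1_values(uint32_t entries, uint32_t dimensions) {
    if (dimensions == 0)
        return 0;                                /* degenerate input, not covered by the text */
    uint32_t r = 0;
    for (;;) {
        uint64_t p = 1;                          /* (r + 1)^dimensions, stopped once above entries */
        for (uint32_t d = 0; d < dimensions && p <= entries; d++)
            p *= (uint64_t)r + 1;
        if (p > entries)
            return r;
        r++;
    }
}
```

For example, a lookup type 1 codebook with 81 entries of dimension 4 has 3 lookup values, because 3^4 = 81 and 4^4 = 256 exceeds 81.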

An ’end-of-packet’ condition during any read operation in the above steps is considered an error condition rendering the stream undecodable.

Huffman decision tree representation

The [codebook_codeword_lengths] array and [codebook_entries] value uniquely define the Huffman decision tree used for entropy decoding.

Briefly, each used codebook entry (recall that length-unordered codebooks support unused codeword entries) is assigned, in order, the lowest valued unused binary Huffman codeword possible. Assume the following codeword length list:

entry 0: length 2
entry 1: length 4
entry 2: length 4
entry 3: length 4
entry 4: length 4
entry 5: length 2
entry 6: length 3
entry 7: length 3

Assigning codewords in order (lowest possible value of the appropriate length to highest) results in the following codeword list:

entry 0: length 2 codeword 00
entry 1: length 4 codeword 0100
entry 2: length 4 codeword 0101
entry 3: length 4 codeword 0110
entry 4: length 4 codeword 0111
entry 5: length 2 codeword 10
entry 6: length 3 codeword 110
entry 7: length 3 codeword 111
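
One direct, unoptimized way to implement ’lowest valued unused codeword’ is to scan candidate values of the required length upward. A candidate is skipped if it lies under an already assigned codeword or is a prefix of one. The C sketch below does exactly that. It is not necessarily how production decoders build their tables. The function name and the use of length 0 for unused entries are assumptions of the sketch.

```c
#include <stdint.h>

/* Assign codewords (first bit = MSb) in entry order. lengths[i] is 1..32, or 0 for an
   unused entry, which gets no codeword. Returns 0, or -1 if the tree is overspecified. */
static int assign_codewords(const int *lengths, int n, uint32_t *codes) {
    for (int i = 0; i < n; i++) {
        int len = lengths[i];
        if (len == 0)
            continue;                            /* unused entries are skipped */
        uint64_t c = 0, limit = (uint64_t)1 << len;
        int j = 0;
        while (c < limit && j < i) {             /* restart the scan whenever c changes */
            int lw = lengths[j];
            if (lw == 0) { j++; continue; }      /* unused entries never conflict */
            uint64_t w = codes[j];
            if (lw <= len && (c >> (len - lw)) == w) {
                c = (w + 1) << (len - lw);       /* c lies under leaf w: skip that subtree */
                j = 0;
            } else if (lw > len && (w >> (lw - len)) == c) {
                c++;                             /* c is an inner node above leaf w */
                j = 0;
            } else {
                j++;
            }
        }
        if (c >= limit)
            return -1;                           /* no free codeword of this length */
        codes[i] = (uint32_t)c;
    }
    return 0;
}
```

For the length list above, the sketch produces 00, 0100, 0101, 0110, 0111, 10, 110 and 111 (as integers: 0, 4, 5, 6, 7, 2, 6 and 7). It returns -1 for a list such as {1, 1, 1}, where no free codeword remains for the third entry.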

Note: Unlike most binary numerical values in this document, we intend the above codewords to be read and used bit by bit from left to right; thus the codeword ’001’ is the bit string ’zero, zero, one’. When determining ’lowest possible value’ in the assignment definition above, the leftmost bit is the MSb.

It is clear that the codeword length list represents a Huffman decision tree with the entry numbers equivalent to the leaves numbered left-to-right:

Figure 4: Huffman tree illustration

As we assign codewords in order, we see that each choice constructs a new leaf in the leftmost possible position.

Note that it’s possible to underspecify or overspecify a Huffman tree via the length list. In the above example, if codeword seven were eliminated, it’s clear that the tree is unfinished:

Figure 5: underspecified Huffman tree illustration

Similarly, in the original codebook, it’s clear that the tree is fully populated and a ninth codeword is impossible. Both underspecified and overspecified trees are an error condition rendering the stream undecodable. Take special care that a codebook with a single used entry is handled properly. It consists of a single codeword of zero bits, and ’reading’ a value out of such a codebook always returns the single used value and sinks zero bits.
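
One common way to check for both failure modes before building the tree is the Kraft sum. With lengths between 1 and 32, the used entries fill the tree exactly when the sum of 2^(32 - length) equals 2^32. The C sketch below applies that test; this technique is our choice rather than something this section prescribes. It treats a single used entry as the special case described above. The function name is hypothetical, and a length of 0 marks an unused entry.

```c
#include <stdint.h>

/* Returns 1 if the used lengths exactly fill the decision tree, 0 if they under- or
   overspecify it. A single used entry is accepted as the special case. */
static int tree_is_complete(const int *lengths, int n) {
    uint64_t sum = 0;
    int used = 0;
    for (int i = 0; i < n; i++) {
        if (lengths[i] == 0)
            continue;                            /* unused entry */
        if (lengths[i] < 0 || lengths[i] > 32)
            return 0;                            /* outside the 1..32 range */
        sum += (uint64_t)1 << (32 - lengths[i]);
        used++;
    }
    if (used == 1)
        return 1;
    return sum == ((uint64_t)1 << 32);
}
```

The eight-entry example passes this check. Dropping its last entry leaves the tree underspecified, and adding a ninth entry of any length overspecifies it.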

Codebook entries marked ’unused’ are simply skipped in the assigning process. They have no codeword and do not appear in the decision tree; thus it’s impossible for any bit pattern read from the stream to decode to that entry number.

VQ lookup table vector representation

Unpacking the VQ lookup table vectors relies on the following values:

the [codebook_multiplicands] array
[codebook_minimum_value]
[codebook_delta_value]
[codebook_sequence_p]
[codebook_lookup_type]
[codebook_entries]
[codebook_dimensions]
[codebook_lookup_values]

Decoding (unpacking) a specific vector in the vector lookup table proceeds according to [codebook_lookup_type]. The unpacked vector values are what a codebook would return during audio packet decode in a VQ context.

Vector value decode: Lookup type 1

Lookup type one specifies a lattice VQ lookup table built algorithmically from a list of scalar values. Calculate (unpack) the final values of a codebook entry vector from the entries in [codebook_multiplicands] as follows ([value_vector] is the output vector representing the vector of values for entry number [lookup_offset] in this codebook):

  1) [last] = 0;
  2) [index_divisor] = 1;
  3) iterate [i] over the range 0 ... [codebook_dimensions]-1 (once for each scalar value in the value vector) {

       4) [multiplicand_offset] = ( [lookup_offset] divided by [index_divisor] using integer
          division ) integer modulo [codebook_lookup_values]

       5) vector [value_vector] element [i] =
            ( [codebook_multiplicands] array element number [multiplicand_offset] ) *
            [codebook_delta_value] + [codebook_minimum_value] + [last];

       6) if ( [codebook_sequence_p] is set ) then set [last] = vector [value_vector] element [i]

       7) [index_divisor] = [index_divisor] * [codebook_lookup_values]

     }

  8) vector calculation completed.
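
The lookup type 1 steps translate almost line for line into C. This sketch assumes that the multiplicands, minimum, delta and value count have already been unpacked as described earlier in this section. It uses `double` and hypothetical parameter names, and does not validate [lookup_offset].

```c
#include <stdint.h>

/* Unpack the vector for entry `lookup_offset` of a lookup type 1 codebook into `out`,
   which must have room for `dimensions` values. */
static void lookup1_vector(const uint32_t *multiplicands, uint32_t lookup_values,
                           uint32_t dimensions, double minimum_value, double delta_value,
                           int sequence_p, uint32_t lookup_offset, double *out) {
    double last = 0.0;                                            /* step 1 */
    uint64_t index_divisor = 1;                                   /* step 2 */
    for (uint32_t i = 0; i < dimensions; i++) {                   /* step 3 */
        uint64_t multiplicand_offset = (lookup_offset / index_divisor) % lookup_values; /* step 4 */
        out[i] = multiplicands[multiplicand_offset] * delta_value
                 + minimum_value + last;                          /* step 5 */
        if (sequence_p)
            last = out[i];                                        /* step 6 */
        index_divisor *= lookup_values;                           /* step 7 */
    }
}
```

For example, with [codebook_lookup_values] = 3 and two dimensions, entry 5 selects multiplicands 2 and 1. These are the digits of 5 written in base 3, least significant digit first.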

Vector value decode: Lookup type 2

Lookup type two specifies a VQ lookup table in which each scalar in each vector is explicitly set by the [codebook_multiplicands] array in a one-to-one mapping. Calculate (unpack) the final values of a codebook entry vector from the entries in [codebook_multiplicands] as follows ([value_vector] is the output vector representing the vector of values for entry number [lookup_offset] in this codebook):

  1) [last] = 0;
  2) [multiplicand_offset] = [lookup_offset] * [codebook_dimensions]
  3) iterate [i] over the range 0 ... [codebook_dimensions]-1 (once for each scalar value in the value vector) {

       4) vector [value_vector] element [i] =
            ( [codebook_multiplicands] array element number [multiplicand_offset] ) *
            [codebook_delta_value] + [codebook_minimum_value] + [last];

       5) if ( [codebook_sequence_p] is set ) then set [last] = vector [value_vector] element [i]

       6) increment [multiplicand_offset]

     }

  7) vector calculation completed.
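
A C sketch of the lookup type 2 steps follows. It assumes the multiplicands and scalar parameters have already been unpacked, uses `double` and hypothetical parameter names, and does not check that [lookup_offset] is in range.

```c
#include <stdint.h>

/* Unpack the vector for entry `lookup_offset` of a lookup type 2 codebook into `out`,
   which must have room for `dimensions` values. */
static void lookup2_vector(const uint32_t *multiplicands, uint32_t dimensions,
                           double minimum_value, double delta_value, int sequence_p,
                           uint32_t lookup_offset, double *out) {
    double last = 0.0;                                                 /* step 1 */
    uint64_t multiplicand_offset = (uint64_t)lookup_offset * dimensions; /* step 2 */
    for (uint32_t i = 0; i < dimensions; i++, multiplicand_offset++) { /* steps 3 and 6 */
        out[i] = multiplicands[multiplicand_offset] * delta_value
                 + minimum_value + last;                               /* step 4 */
        if (sequence_p)
            last = out[i];                                             /* step 5 */
    }
}
```

Unlike lookup type 1, each entry here owns its own run of [codebook_dimensions] consecutive multiplicands. This is why the table holds [codebook_entries] * [codebook_dimensions] values.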

3.3. Use of the codebook abstraction

The decoder uses the codebook abstraction much as it does the bit-unpacking convention. A specific codebook reads a codeword from the bitstream and decodes it into an entry number. In a scalar entropy coding context, it returns that entry number to the decoder. In a context desiring a VQ value, it uses that entry number as an offset into the VQ lookup table and returns a vector of values. Scalar or VQ context is always explicit; any call to the codebook mechanism requests either a scalar entry number or a lookup vector.

Note that VQ lookup type zero indicates that there is no lookup table. Requesting decode using a codebook of lookup type 0 in any context expecting a vector return value is forbidden, even when the requested vector has dimension one. If decoder setup or decode requests such an action, that is an error condition rendering the packet undecodable.

Using a codebook to read from the packet bitstream consists first of reading and decoding the next codeword in the bitstream. The decoder reads bits until the accumulated bits match a codeword in the codebook. This process can be thought of as logically walking the Huffman decode tree by reading one bit at a time from the bitstream. Each bit acts as a decision boolean, taking the 0 branch (left in the above examples) or the 1 branch (right in the above examples). Walking the tree finishes when the decode process hits a leaf in the decision tree; the result is the entry number corresponding to that leaf. Reading past the end of a packet propagates the ’end-of-packet’ condition to the decoder.
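
The following C sketch decodes one entry number in the way just described. It reads one bit at a time and accumulates the codeword with the first bit as MSb. Instead of an explicit tree, it searches the (codeword, length) pairs linearly after each bit, which is logically equivalent but slow. The codeword table is assumed to have been built as described earlier in this section, with length 0 marking unused entries. A compact reader is included so the block stands alone, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { const uint8_t *buf; size_t len, pos; int eop; } bitreader;

static uint32_t br_read(bitreader *br, int bits) {
    if (br->eop || br->pos + (size_t)bits > 8 * br->len) { br->eop = 1; return 0; }
    uint32_t v = 0;
    for (int i = 0; i < bits; i++, br->pos++)
        v |= (uint32_t)((br->buf[br->pos >> 3] >> (br->pos & 7)) & 1u) << i;
    return v;
}

/* Returns the entry number, -1 on end-of-packet, or -2 if no codeword matches. */
static int32_t decode_entry(bitreader *br, const uint32_t *codes, const int *lengths, int n) {
    uint32_t acc = 0;                            /* bits read so far, first bit = MSb */
    for (int len = 1; len <= 32; len++) {
        uint32_t bit = br_read(br, 1);
        if (br->eop)
            return -1;                           /* propagate end-of-packet */
        acc = (acc << 1) | bit;
        for (int i = 0; i < n; i++)              /* linear search stands in for the tree walk */
            if (lengths[i] == len && codes[i] == acc)
                return i;
    }
    return -2;
}
```

Take the eight-entry example codebook above. The single byte 0x03 carries the bits 1, 1, 0, 0, 0, 0, 0, 0 in read order, which decode to entries 6, 0 and 0. The next attempt then reports end-of-packet.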

When used in a scalar context, the resulting codeword entry is the desired return value.

When used in a VQ context, the codeword entry number is used as an offset into the VQ lookup table. The value returned to the decoder is the vector of scalars corresponding to this offset.

4. Codec Setup and Packet Decode

4.1. Overview

This document serves as the top-level reference document for the bit-by-bit decode specification of Vorbis I. This document assumes a high-level understanding of the Vorbis decode process, which is provided in Section 1, “Introduction and Description”. Section 2, “Bitpacking Convention” covers reading and writing bit fields from and to bitstream packets.

4.2. Header decode and decode setup

A Vorbis bitstream begins with three header packets. The header packets are, in order, the identification header, the comment header, and the setup header. All are required for decode compliance. An end-of-packet condition while decoding the first or third header packet renders the stream undecodable. End-of-packet while decoding the comment header is a non-fatal error condition.

4.2.1. Common header decode

Each header packet begins with the same header fields.

  1) [packet_type] : 8 bit value
  2) 0x76, 0x6f, 0x72, 0x62, 0x69, 0x73: the characters ’v’,’o’,’r’,’b’,’i’,’s’ as six octets

Decode continues according to packet type: the identification header is type 1, the comment header type 3 and the setup header type 5. These types are all odd because a packet with a leading single bit of ’0’ is an audio packet. The packets must occur in the order of identification, comment, setup.
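
The common header check can be sketched in C as shown below. The function names are hypothetical. Tracking that the three headers arrive in the required order is left to the caller. The first bit read from a packet is bit 0 of byte 0, so an odd first byte marks a header and an even one marks an audio packet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A leading '0' bit (bit 0 of byte 0) marks an audio packet. */
static int is_audio_packet(const uint8_t *pkt, size_t len) {
    return len > 0 && (pkt[0] & 1u) == 0;
}

/* Returns the header type (1, 3 or 5), or -1 if this is not a valid Vorbis header. */
static int header_type(const uint8_t *pkt, size_t len) {
    static const uint8_t magic[6] = {'v', 'o', 'r', 'b', 'i', 's'};
    if (len < 7 || memcmp(pkt + 1, magic, sizeof magic) != 0)
        return -1;
    if (pkt[0] != 1 && pkt[0] != 3 && pkt[0] != 5)
        return -1;
    return pkt[0];
}
```

A decoder built on this might keep an "expected next type" counter that steps through 1, 3 and 5. Any header arriving out of that order would then be rejected.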

4.2.2. Identification header

The identification header is a short header of only a few fields used to declare the stream definitively as Vorbis, and provide a few externally relevant pieces of information about the audio stream. The identification header is coded as follows:

  1) [vorbis_version] = read 32 bits as unsigned integer
  2) [audio_channels] = read 8 bit integer as unsigned
  3) [audio_sample_rate] = read 32 bits as unsigned integer
  4) [bitrate_maximum] = read 32 bits as signed integer
  5) [bitrate_nominal] = read 32 bits as signed integer
  6) [bitrate_minimum] = read 32 bits as signed integer
  7) [blocksize_0] = 2 exponent (read 4 bits as unsigned integer)
  8) [blocksize_1] = 2 exponent (read 4 bits as unsigned integer)
  9) [framing_flag] = read one bit

[vorbis_version] must read ’0’ in order to be compatible with this document. Both [audio_channels] and [audio_sample_rate] must read greater than zero. Allowed final blocksize values are 64, 128, 256, 512, 1024, 2048, 4096 and 8192 in Vorbis I. [blocksize_0] must be less than or equal to [blocksize_1]. The framing bit must be nonzero. Failure to meet any of these conditions renders a stream undecodable.
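
Every field before the blocksizes is a whole number of bytes, so LSb-first bitpacking makes those fields plain little-endian integers. The two 4 bit blocksize exponents share one byte, with [blocksize_0] in the low nibble because it is read first. The C sketch below parses the 23 bytes that follow the common header and applies the checks listed above. It assumes octet bytes and a two's complement conversion for the signed bitrate fields. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t version, sample_rate;
    int32_t bitrate_maximum, bitrate_nominal, bitrate_minimum;
    unsigned channels, blocksize_0, blocksize_1;
} vorbis_ident;

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* p points just past the 7-byte common header; n is the number of bytes remaining.
   Returns 0 if the header is valid, -1 if the stream is undecodable. */
static int parse_ident(const uint8_t *p, size_t n, vorbis_ident *id) {
    if (n < 23)
        return -1;                               /* end-of-packet here is fatal */
    id->version         = le32(p);
    id->channels        = p[4];
    id->sample_rate     = le32(p + 5);
    id->bitrate_maximum = (int32_t)le32(p + 9);  /* two's complement assumed */
    id->bitrate_nominal = (int32_t)le32(p + 13);
    id->bitrate_minimum = (int32_t)le32(p + 17);
    id->blocksize_0     = 1u << (p[21] & 0x0f);  /* low nibble is read first */
    id->blocksize_1     = 1u << (p[21] >> 4);
    int framing         = p[22] & 1;
    if (id->version != 0 || id->channels == 0 || id->sample_rate == 0)
        return -1;
    /* Together these keep both sizes within 64..8192 and blocksize_0 <= blocksize_1. */
    if (id->blocksize_0 < 64 || id->blocksize_1 > 8192 || id->blocksize_0 > id->blocksize_1)
        return -1;
    return framing ? 0 : -1;
}
```

For a 44100 Hz stereo header whose blocksize byte is 0xB8, the sketch reports blocksizes of 256 and 2048.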

The bitrate fields above are used only as hints. The nominal bitrate field especially may be considerably off in purely VBR streams. The fields are meaningful only when greater than zero.

All three fields set to the same value implies a fixed-rate, or tightly bounded, nearly fixed-rate bitstream.
Only nominal set implies a VBR or ABR stream that averages the nominal bitrate.
Maximum and/or minimum set implies a VBR bitstream that obeys the bitrate limits.
None set indicates the encoder does not care to speculate.
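
The four cases above can be expressed as a small classification function. This C sketch treats a field as set only when it is greater than zero, as stated above. The enum and function names are hypothetical, and the result is only a hint, like the fields themselves.

```c
#include <stdint.h>

typedef enum {
    BITRATE_UNSPECIFIED,    /* none set */
    BITRATE_FIXED,          /* all three set and equal */
    BITRATE_AVERAGE,        /* only nominal set */
    BITRATE_BOUNDED_VBR     /* maximum and/or minimum set */
} bitrate_hint;

static bitrate_hint classify_bitrate(int32_t maximum, int32_t nominal, int32_t minimum) {
    int has_max = maximum > 0, has_nom = nominal > 0, has_min = minimum > 0;
    if (has_max && has_nom && has_min && maximum == nominal && nominal == minimum)
        return BITRATE_FIXED;
    if (has_max || has_min)
        return BITRATE_BOUNDED_VBR;
    if (has_nom)
        return BITRATE_AVERAGE;
    return BITRATE_UNSPECIFIED;
}
```

The fixed-rate case is tested first because it is a special case of "maximum and/or minimum set". Without that ordering, every fixed-rate stream would be classified as bounded VBR.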

4.2.3. Comment header

Comment header decode and data specification is covered in Section 5, “comment field and header specification”.

4.2.4. Setup header

Vorbis codec setup is configurable to an extreme degree:

Figure 6: decoder pipeline configuration

The setup header contains the bulk of the codec setup information needed for decode. The setup header contains, in order, the lists of codebook configurations, time-domain transform configurations (placeholders in Vorbis I), floor configurations, residue configurations, channel mapping configurations and mode configurations. It finishes with a framing bit of ’1’. Header decode proceeds in the following order:

Codebooks

1. [vorbis_codebook_count] = read eight bits as unsigned integer and add one
2. Decode [vorbis_codebook_count] codebooks in order as defined in Section 3, “Probability Model and Codebooks”. Save each configuration, in order, in an array of codebook configurations [vorbis_codebook_configurations].

Time domain transforms

These hooks are placeholders in Vorbis I. Nevertheless, the configuration placeholder values must be read to maintain bitstream sync.

1. [vorbis_time_count] = read 6 bits as unsigned integer and add one
2. Read [vorbis_time_count] 16 bit values; each value should be zero. If any value is nonzero, this is an error condition and the stream is undecodable.

Floors

Vorbis uses two floor types; header decode is handed to the decode abstraction of the appropriate type.

1. [vorbis_floor_count] = read 6 bits as unsigned integer and add one
2. For each [i] of [vorbis_floor_count] floor numbers:
   a) read the floor type: vector [vorbis_floor_types] element [i] = read 16 bits as unsigned integer
   b) If the floor type is zero, decode the floor configuration as defined in Section 6, “Floor type 0 setup and decode”; save this configuration in slot [i] of the floor configuration array [vorbis_floor_configurations].
   c) If the floor type is one, decode the floor configuration as defined in Section 7, “Floor type 1 setup and decode”; save this configuration in slot [i] of the floor configuration array [vorbis_floor_configurations].
   d) If the floor type is greater than one, this stream is undecodable; ERROR CONDITION

cannam@86:

Residues
Vorbis uses three residue types; header decode of each type is identical.

1. [vorbis_residue_count] = read 6 bits as unsigned integer and add one
2. For each [i] of [vorbis_residue_count] residue numbers:
   a) read the residue type: vector [vorbis_residue_types] element [i] = read 16 bits as unsigned integer
   b) If the residue type is zero, one or two, decode the residue configuration as defined in Section 8, “Residue setup and decode”; save this configuration in slot [i] of the residue configuration array [vorbis_residue_configurations].
   c) If the residue type is greater than two, this stream is undecodable; ERROR CONDITION

Mappings
Mappings are used to set up specific pipelines for encoding multichannel audio with varying channel mapping applications. Vorbis I uses a single mapping type (0), with implicit PCM channel mappings.

1. [vorbis_mapping_count] = read 6 bits as unsigned integer and add one
2. For each [i] of [vorbis_mapping_count] mapping numbers:
   a) read the mapping type: 16 bits as unsigned integer. There’s no reason to save the mapping type in Vorbis I.
   b) If the mapping type is nonzero, the stream is undecodable
   c) If the mapping type is zero:
      i. read 1 bit as a boolean flag
         A. if set, [vorbis_mapping_submaps] = read 4 bits as unsigned integer and add one
         B. if unset, [vorbis_mapping_submaps] = 1
      ii. read 1 bit as a boolean flag
         A. if set, square polar channel mapping is in use:
            - [vorbis_mapping_coupling_steps] = read 8 bits as unsigned integer and add one
            - for each [j] of [vorbis_mapping_coupling_steps] steps:
              - vector [vorbis_mapping_magnitude] element [j] = read ilog([audio_channels] - 1) bits as unsigned integer
              - vector [vorbis_mapping_angle] element [j] = read ilog([audio_channels] - 1) bits as unsigned integer
              - the numbers read in the above two steps are channel numbers representing the channel to treat as magnitude and the channel to treat as angle, respectively. If for any coupling step the angle channel number equals the magnitude channel number, the magnitude channel number is greater than [audio_channels]-1, or the angle channel number is greater than [audio_channels]-1, the stream is undecodable.
         B. if unset, [vorbis_mapping_coupling_steps] = 0
      iii. read 2 bits (reserved field); if the value is nonzero, the stream is undecodable
      iv. if [vorbis_mapping_submaps] is greater than one, we read channel multiplex settings. For each [j] of [audio_channels] channels:
         A. vector [vorbis_mapping_mux] element [j] = read 4 bits as unsigned integer
         B. if the value is greater than the highest numbered submap ([vorbis_mapping_submaps] - 1), this is an error condition rendering the stream undecodable
      v. for each submap [j] of [vorbis_mapping_submaps] submaps, read the floor and residue numbers for use in decoding that submap:
         A. read and discard 8 bits (the unused time configuration placeholder)
         B. read 8 bits as unsigned integer for the floor number; save in vector [vorbis_mapping_submap_floor] element [j]
         C. verify the floor number is not greater than the highest numbered floor configured for the bitstream. If it is, the bitstream is undecodable
         D. read 8 bits as unsigned integer for the residue number; save in vector [vorbis_mapping_submap_residue] element [j]
         E. verify the residue number is not greater than the highest numbered residue configured for the bitstream. If it is, the bitstream is undecodable
      vi. save this mapping configuration in slot [i] of the mapping configuration array [vorbis_mapping_configurations].
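
One mapping configuration (the body of step 2 for a single [i]) can be sketched in C as below. `BitReader`, `br_read`, `Mapping` and `read_mapping` are hypothetical names for illustration; `ilog` follows the specification's definition (the number of significant bits, zero for values of zero or less). Because no multiplex values are read when [vorbis_mapping_submaps] is 1, the sketch assumes every channel then belongs to submap 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader (Section 2); sets eop and returns 0 on overrun. */
typedef struct { const uint8_t *data; size_t len, bitpos; int eop; } BitReader;

static uint32_t br_read(BitReader *br, int nbits) {
    uint32_t v = 0;
    for (int i = 0; i < nbits; i++, br->bitpos++) {
        if ((br->bitpos >> 3) >= br->len) { br->eop = 1; return 0; }
        v |= (uint32_t)((br->data[br->bitpos >> 3] >> (br->bitpos & 7)) & 1u) << i;
    }
    return v;
}

/* ilog(x): number of bits needed to represent x; 0 for x <= 0. */
static int ilog(int x) {
    int n = 0;
    while (x > 0) { n++; x >>= 1; }
    return n;
}

typedef struct {
    int submaps, coupling_steps;
    uint8_t magnitude[256], angle[256];      /* per coupling step */
    uint8_t mux[256];                        /* submap number per channel */
    uint8_t submap_floor[16], submap_residue[16];
} Mapping;

/* Decodes one mapping; returns 0 on success, -1 if the stream is undecodable. */
static int read_mapping(BitReader *br, int audio_channels, int floor_count,
                        int residue_count, Mapping *m) {
    if (br_read(br, 16) != 0) return -1;              /* only type 0 exists */
    m->submaps = br_read(br, 1) ? (int)br_read(br, 4) + 1 : 1;
    m->coupling_steps = 0;
    if (br_read(br, 1)) {                              /* square polar mapping */
        int bits = ilog(audio_channels - 1);
        m->coupling_steps = (int)br_read(br, 8) + 1;
        for (int j = 0; j < m->coupling_steps; j++) {
            int mag = (int)br_read(br, bits);
            int ang = (int)br_read(br, bits);
            if (mag == ang || mag > audio_channels - 1 || ang > audio_channels - 1)
                return -1;
            m->magnitude[j] = (uint8_t)mag;
            m->angle[j] = (uint8_t)ang;
        }
    }
    if (br_read(br, 2) != 0) return -1;               /* reserved field */
    for (int j = 0; j < audio_channels; j++)
        m->mux[j] = 0;                                 /* assumed default: submap 0 */
    if (m->submaps > 1) {
        for (int j = 0; j < audio_channels; j++) {
            int mux = (int)br_read(br, 4);
            if (mux > m->submaps - 1) return -1;
            m->mux[j] = (uint8_t)mux;
        }
    }
    for (int j = 0; j < m->submaps; j++) {
        br_read(br, 8);                                /* unused time placeholder */
        int floor_number = (int)br_read(br, 8);
        if (floor_number > floor_count - 1) return -1;
        int residue_number = (int)br_read(br, 8);
        if (residue_number > residue_count - 1) return -1;
        m->submap_floor[j] = (uint8_t)floor_number;
        m->submap_residue[j] = (uint8_t)residue_number;
    }
    return br->eop ? -1 : 0;
}
```

For a stereo stream, ilog(2 - 1) = 1, so each coupling step stores its magnitude and angle channel numbers in one bit apiece.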

Modes

1. [vorbis_mode_count] = read 6 bits as unsigned integer and add one
2. For each [i] of [vorbis_mode_count] mode numbers:
   a) [vorbis_mode_blockflag] = read 1 bit
   b) [vorbis_mode_windowtype] = read 16 bits as unsigned integer
   c) [vorbis_mode_transformtype] = read 16 bits as unsigned integer
   d) [vorbis_mode_mapping] = read 8 bits as unsigned integer
   e) verify ranges; zero is the only legal value in Vorbis I for [vorbis_mode_windowtype] and [vorbis_mode_transformtype]. [vorbis_mode_mapping] must not be greater than the highest numbered mapping configured for the bitstream. Any illegal values render the stream undecodable.
   f) save this mode configuration in slot [i] of the mode configuration array [vorbis_mode_configurations].
3. read 1 bit as a framing flag. If unset, a framing error occurred and the stream is not decodable.

After reading mode descriptions, setup header decode is complete.
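
The mode list and the closing framing flag can be sketched in C as follows. The names (`BitReader`, `br_read`, `Mode`, `read_modes`) are hypothetical, and the caller is assumed to supply room for 64 modes, the maximum a 6-bit count plus one allows.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader (Section 2); sets eop and returns 0 on overrun. */
typedef struct { const uint8_t *data; size_t len, bitpos; int eop; } BitReader;

static uint32_t br_read(BitReader *br, int nbits) {
    uint32_t v = 0;
    for (int i = 0; i < nbits; i++, br->bitpos++) {
        if ((br->bitpos >> 3) >= br->len) { br->eop = 1; return 0; }
        v |= (uint32_t)((br->data[br->bitpos >> 3] >> (br->bitpos & 7)) & 1u) << i;
    }
    return v;
}

typedef struct { int blockflag, windowtype, transformtype, mapping; } Mode;

/* Decodes the mode list and framing flag. `modes` needs room for 64 entries.
   Returns the mode count, or -1 if the stream is undecodable. */
static int read_modes(BitReader *br, int mapping_count, Mode *modes) {
    int count = (int)br_read(br, 6) + 1;          /* [vorbis_mode_count] */
    for (int i = 0; i < count; i++) {
        Mode *md = &modes[i];
        md->blockflag = (int)br_read(br, 1);
        md->windowtype = (int)br_read(br, 16);
        md->transformtype = (int)br_read(br, 16);
        md->mapping = (int)br_read(br, 8);
        if (md->windowtype != 0 || md->transformtype != 0 ||
            md->mapping > mapping_count - 1)
            return -1;                            /* illegal value */
    }
    if (br_read(br, 1) != 1 || br->eop)
        return -1;                                /* framing flag unset, or truncated */
    return count;
}
```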

4.3. Audio packet decode and synthesis

Following the three header packets, all packets in a Vorbis I stream are audio. The first step of audio packet decode is to read and verify the packet type. A non-audio packet when audio is expected indicates stream corruption or a non-compliant stream. The decoder must ignore the packet and not attempt decoding it to audio.

4.3.1. packet type, mode and window decode

1. read 1 bit [packet_type]; check that packet type is 0 (audio)
2. read ilog([vorbis_mode_count]-1) bits [mode_number]
3. the decode blocksize [n] is equal to [blocksize_0] if [vorbis_mode_blockflag] of mode [mode_number] is 0, else [n] is equal to [blocksize_1].
4. perform window selection and setup; this window is used later by the inverse MDCT:
   a) if this is a long window (the [vorbis_mode_blockflag] flag of this mode is set):
      i. read 1 bit for [previous_window_flag]
      ii. read 1 bit for [next_window_flag]
      iii. if [previous_window_flag] is not set, the left half of the window will be a hybrid window for lapping with a short block. See paragraph 1.3.2, “Window shape decode (long windows only)” for an illustration of overlapping dissimilar windows. Else, the left half window will have normal long shape.
      iv. if [next_window_flag] is not set, the right half of the window will be a hybrid window for lapping with a short block. See paragraph 1.3.2, “Window shape decode (long windows only)” for an illustration of overlapping dissimilar windows. Else, the right half window will have normal long shape.
   b) if this is a short window, the window is always the same short-window shape.
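
Steps 1 to 4 can be sketched in C as below. `BitReader`, `br_read`, `PacketHeader` and `read_packet_header` are hypothetical names, and `blockflags` is assumed to hold [vorbis_mode_blockflag] for each configured mode. The bound check on [mode_number] is an added defensive assumption: ilog([vorbis_mode_count]-1) bits can encode values above the highest configured mode, and this section does not say how to treat them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader (Section 2); sets eop and returns 0 on overrun. */
typedef struct { const uint8_t *data; size_t len, bitpos; int eop; } BitReader;

static uint32_t br_read(BitReader *br, int nbits) {
    uint32_t v = 0;
    for (int i = 0; i < nbits; i++, br->bitpos++) {
        if ((br->bitpos >> 3) >= br->len) { br->eop = 1; return 0; }
        v |= (uint32_t)((br->data[br->bitpos >> 3] >> (br->bitpos & 7)) & 1u) << i;
    }
    return v;
}

/* ilog(x): number of bits needed to represent x; 0 for x <= 0. */
static int ilog(int x) {
    int n = 0;
    while (x > 0) { n++; x >>= 1; }
    return n;
}

typedef struct { int mode_number, n, previous_window_flag, next_window_flag; } PacketHeader;

/* Returns 0 on success, or -1 if the packet must be discarded. */
static int read_packet_header(BitReader *br, const int *blockflags, int mode_count,
                              int blocksize_0, int blocksize_1, PacketHeader *h) {
    if (br_read(br, 1) != 0) return -1;                  /* not an audio packet */
    h->mode_number = (int)br_read(br, ilog(mode_count - 1));
    if (h->mode_number > mode_count - 1) return -1;      /* assumed defensive check */
    int long_block = blockflags[h->mode_number];
    h->n = long_block ? blocksize_1 : blocksize_0;
    h->previous_window_flag = h->next_window_flag = 0;   /* unused for short blocks */
    if (long_block) {
        h->previous_window_flag = (int)br_read(br, 1);
        h->next_window_flag = (int)br_read(br, 1);
    }
    return br->eop ? -1 : 0;    /* end-of-packet up to here discards the packet */
}
```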

Vorbis windows all use the slope function $y = \sin\left(\frac{\pi}{2}\sin^2\left(\frac{x + 0.5}{n}\pi\right)\right)$, where $n$ is window size and $x$ ranges $0 \ldots n-1$, but dissimilar lapping requirements can affect overall shape. Window generation proceeds as follows:

1. [window_center] = [n] / 2
2. if ([vorbis_mode_blockflag] is set and [previous_window_flag] is not set) then
   a) [left_window_start] = [n]/4 - [blocksize_0]/4
   b) [left_window_end] = [n]/4 + [blocksize_0]/4
   c) [left_n] = [blocksize_0]/2
   else
   a) [left_window_start] = 0
   b) [left_window_end] = [window_center]
   c) [left_n] = [n]/2
3. if ([vorbis_mode_blockflag] is set and [next_window_flag] is not set) then
   a) [right_window_start] = [n]*3/4 - [blocksize_0]/4
   b) [right_window_end] = [n]*3/4 + [blocksize_0]/4
   c) [right_n] = [blocksize_0]/2
   else
   a) [right_window_start] = [window_center]
   b) [right_window_end] = [n]
   c) [right_n] = [n]/2
4. window from range 0 ... [left_window_start]-1 inclusive is zero
5. for [i] in range [left_window_start] ... [left_window_end]-1, window([i]) = $\sin\left(\frac{\pi}{2}\sin^2\left(\frac{i - \mathrm{left\_window\_start} + 0.5}{\mathrm{left\_n}}\cdot\frac{\pi}{2}\right)\right)$
6. window from range [left_window_end] ... [right_window_start]-1 inclusive is one
7. for [i] in range [right_window_start] ... [right_window_end]-1, window([i]) = $\sin\left(\frac{\pi}{2}\sin^2\left(\frac{i - \mathrm{right\_window\_start} + 0.5}{\mathrm{right\_n}}\cdot\frac{\pi}{2} + \frac{\pi}{2}\right)\right)$
8. window from range [right_window_end] ... [n]-1 inclusive is zero

An end-of-packet condition up to this point should be considered an error that discards this packet from the stream. An end-of-packet condition past this point is to be considered a possible nominal occurrence.

4.3.2. floor curve decode

From this point on, we assume our decode context is using mode number [mode_number] from configuration array [vorbis_mode_configurations] and the map number [vorbis_mode_mapping] (specified by the current mode) taken from the mapping configuration array [vorbis_mapping_configurations].

Floor curves are decoded one-by-one in channel order.

For each channel [i] of [audio_channels]:

1. [submap_number] = element [i] of vector [vorbis_mapping_mux]
2. [floor_number] = element [submap_number] of vector [vorbis_mapping_submap_floor]
3. if the floor type of this floor (vector [vorbis_floor_types] element [floor_number]) is zero then decode the floor for channel [i] according to the subsubsection 6.2.2, “packet decode”
4. if the type of this floor is one then decode the floor for channel [i] according to the subsubsection 7.2.3, “packet decode”
5. save the needed decoded floor information for channel [i] for later synthesis
6. if the decoded floor returned ’unused’, set vector [no_residue] element [i] to true, else set vector [no_residue] element [i] to false

An end-of-packet condition during floor decode shall result in packet decode zeroing all channel output vectors and skipping to the add/overlap output stage.

4.3.3. nonzero vector propagate

A possible result of floor decode is that a specific vector is marked ’unused’, which indicates that the final output vector is all-zero values (and the floor is zero). The residue for that vector is not coded in the stream, save for one complication. If some vectors are used and some are not, channel coupling could result in mixing a zeroed and nonzeroed vector to produce two nonzeroed vectors.

for each [i] from 0 ... [vorbis_mapping_coupling_steps]-1

1. if either [no_residue] entry for channel ([vorbis_mapping_magnitude] element [i]) or channel ([vorbis_mapping_angle] element [i]) is set to false, then both must be set to false. Note that an ’unused’ floor has no decoded floor information; it is important that this is remembered at floor curve synthesis time.
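
For illustration, the propagation rule fits in a short C function. `propagate_nonzero` is a hypothetical name, and the `magnitude`/`angle` arrays are assumed to be the [vorbis_mapping_magnitude] and [vorbis_mapping_angle] vectors decoded from the mapping header.

```c
#include <stdbool.h>
#include <stdint.h>

/* For every coupling step, if either channel of the pair has residue,
   mark both channels as needing residue decode. */
static void propagate_nonzero(bool *no_residue, const uint8_t *magnitude,
                              const uint8_t *angle, int coupling_steps) {
    for (int i = 0; i < coupling_steps; i++) {
        if (!no_residue[magnitude[i]] || !no_residue[angle[i]]) {
            no_residue[magnitude[i]] = false;
            no_residue[angle[i]] = false;
        }
    }
}
```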

4.3.4. residue decode

Unlike floors, which are decoded in channel order, the residue vectors are decoded in submap order.

for each submap [i] in order from 0 ... [vorbis_mapping_submaps]-1

1. [ch] = 0
2. for each channel [j] in order from 0 ... [audio_channels] - 1
   a) if channel [j] is in submap [i] (vector [vorbis_mapping_mux] element [j] is equal to [i])
      i. if vector [no_residue] element [j] is true
         A. vector [do_not_decode_flag] element [ch] is set
         else
         A. vector [do_not_decode_flag] element [ch] is unset
      ii. increment [ch]
3. [residue_number] = vector [vorbis_mapping_submap_residue] element [i]
4. [residue_type] = vector [vorbis_residue_types] element [residue_number]
5. decode [ch] vectors using residue [residue_number], according to type [residue_type], also passing vector [do_not_decode_flag] to indicate which vectors in the bundle should not be decoded. Correct per-vector decode length is [n]/2.
6. [ch] = 0
7. for each channel [j] in order from 0 ... [audio_channels] - 1
   a) if channel [j] is in submap [i] (vector [vorbis_mapping_mux] element [j] is equal to [i])
      i. residue vector for channel [j] is set to decoded residue vector [ch]
      ii. increment [ch]
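
The channel bookkeeping of steps 1, 2 and 7 is easy to get wrong, so a minimal C sketch of it follows. `build_submap_bundle` is a hypothetical helper; the residue decode of step 5 (Section 8) is not shown, and instead of copying vectors the sketch records which channel receives each decoded vector [ch].

```c
#include <stdbool.h>
#include <stdint.h>

/* Builds the residue bundle for one submap: fills do_not_decode[ch] and
   records in bundle_channel[ch] the channel that receives decoded vector [ch].
   Returns [ch], the number of vectors in the bundle. */
static int build_submap_bundle(int submap, int audio_channels, const uint8_t *mux,
                               const bool *no_residue, bool *do_not_decode,
                               int *bundle_channel) {
    int ch = 0;
    for (int j = 0; j < audio_channels; j++) {
        if (mux[j] != submap) continue;      /* channel j is in another submap */
        do_not_decode[ch] = no_residue[j];
        bundle_channel[ch] = j;
        ch++;
    }
    return ch;
}
```

With `mux = {0, 1, 0}`, submap 0 bundles channels 0 and 2 (in that order) and submap 1 bundles only channel 1.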

4.3.5. inverse coupling

for each [i] from [vorbis_mapping_coupling_steps]-1 descending to 0

1. [magnitude_vector] = the residue vector for channel (vector [vorbis_mapping_magnitude] element [i])
2. [angle_vector] = the residue vector for channel (vector [vorbis_mapping_angle] element [i])
3. for each scalar value [M] in vector [magnitude_vector] and the corresponding scalar value [A] in vector [angle_vector]:
   a) if ([M] is greater than zero)
      i. if ([A] is greater than zero)
         A. [new_M] = [M]
         B. [new_A] = [M]-[A]
         else
         A. [new_A] = [M]
         B. [new_M] = [M]+[A]
      else
      i. if ([A] is greater than zero)
         A. [new_M] = [M]
         B. [new_A] = [M]+[A]
         else
         A. [new_A] = [M]
         B. [new_M] = [M]-[A]
   b) set scalar value [M] in vector [magnitude_vector] to [new_M]
   c) set scalar value [A] in vector [angle_vector] to [new_A]
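
The inverse coupling steps map directly onto C. This sketch uses hypothetical names (`uncouple_pair`, `inverse_coupling`) and assumes the residue vectors are `float` arrays indexed by channel number, each of length [n]/2.

```c
#include <stdint.h>

/* Undoes square polar coupling for one magnitude/angle pair, in place. */
static void uncouple_pair(float *mag, float *ang, int len) {
    for (int k = 0; k < len; k++) {
        float M = mag[k], A = ang[k], new_M, new_A;
        if (M > 0) {
            if (A > 0) { new_M = M; new_A = M - A; }
            else       { new_A = M; new_M = M + A; }
        } else {
            if (A > 0) { new_M = M; new_A = M + A; }
            else       { new_A = M; new_M = M - A; }
        }
        mag[k] = new_M;
        ang[k] = new_A;
    }
}

/* Applies the coupling steps from last to first; residue[c] is channel c's vector. */
static void inverse_coupling(float **residue, const uint8_t *magnitude,
                             const uint8_t *angle, int coupling_steps, int len) {
    for (int i = coupling_steps - 1; i >= 0; i--)
        uncouple_pair(residue[magnitude[i]], residue[angle[i]], len);
}
```

The descending loop follows the steps above; the order only matters when a channel takes part in more than one coupling step, since steps on disjoint channel pairs do not interact.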

4.3.6. dot product

For each channel, synthesize the floor curve from the decoded floor information, according to floor type. Note that the vector synthesis length for floor computation is [n]/2.

For each channel, multiply each element of the floor curve by the corresponding element of that channel’s residue vector. The result, the element-wise product of the floor and residue vectors for each channel, is the length [n]/2 audio spectrum of that channel.

4.3.7. inverse MDCT

Convert the audio spectrum vector of each channel back into time domain PCM audio via an inverse Modified Discrete Cosine Transform (MDCT). A detailed description of the MDCT is available in [1]. The window function used for the MDCT is the function described earlier.

4.3.8. overlap_add

Windowed MDCT output is overlapped and added with the right hand data of the previous window such that the 3/4 point of the previous window is aligned with the 1/4 point of the current window (as illustrated in paragraph 1.3.2, “Window shape decode (long windows only)”). The overlapped portion produced from overlapping the previous and current frame data is finished data to be returned by the decoder. This data spans from the center of the previous window to the center of the current window. In the case of same-sized windows, the amount of data to return is one-half block consisting of and only of the overlapped portions. When overlapping a short and long window, much of the returned range does not actually overlap. This does not damage transform orthogonality. Pay attention however to returning the correct data range; the amount of data to be returned is:

    window_blocksize(previous_window)/4 + window_blocksize(current_window)/4

from the center (element windowsize/2) of the previous window to the center (element windowsize/2-1, inclusive) of the current window.
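
The returned range can be computed with index arithmetic alone. In this C sketch, `overlap_add` is a hypothetical helper; `prev` and `cur` are assumed to hold the full windowed inverse MDCT output of the previous and current blocks (blocksizes `pn` and `cn`), with samples outside each buffer treated as zero. Output sample t maps to `prev[pn/2 + t]` and `cur[t - pn/4 + cn/4]`, which places the previous window's 3/4 point on the current window's 1/4 point.

```c
/* Overlap-adds two windowed blocks and writes the finished samples to out.
   Returns the number of samples written, pn/4 + cn/4. */
static int overlap_add(const float *prev, int pn, const float *cur, int cn, float *out) {
    int len = pn / 4 + cn / 4;
    for (int t = 0; t < len; t++) {
        int p_idx = pn / 2 + t;            /* starts at the previous window's center */
        int c_idx = t - pn / 4 + cn / 4;   /* ends at the current window's center - 1 */
        float v = 0.0f;
        if (p_idx < pn) v += prev[p_idx];  /* outside a buffer counts as zero */
        if (c_idx >= 0 && c_idx < cn) v += cur[c_idx];
        out[t] = v;
    }
    return len;
}
```

For two blocks of 256 samples this returns 128 samples (one-half block); for a 256-sample block followed by a 2048-sample block it returns 64 + 512 = 576.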

4.3.9. output channel order

Vorbis I specifies only a channel mapping type 0. In mapping type 0, channel mapping is implicitly defined as follows for standard audio applications. As of revision 16781 (20100113), the specification adds defined channel locations for 6.1 and 7.1 surround. Ordering/location for greater-than-eight channels remains ’left to the implementation’.

These channel orderings refer to order within the encoded stream. It is naturally possible for a decoder to produce output with channels in any order. Any such decoder should explicitly document channel reordering behavior.

one channel: the stream is monophonic
two channels: the stream is stereo. channel order: left, right
three channels: the stream is a 1d-surround encoding. channel order: left, center, right
four channels: the stream is quadraphonic surround. channel order: front left, front right, rear left, rear right
five channels: the stream is five-channel surround. channel order: front left, center, front right, rear left, rear right
six channels: the stream is 5.1 surround. channel order: front left, center, front right, rear left, rear right, LFE
seven channels: the stream is 6.1 surround. channel order: front left, center, front right, side left, side right, rear center, LFE
eight channels: the stream is 7.1 surround. channel order: front left, center, front right, side left, side right, rear left, rear right, LFE
greater than eight channels: channel use and order is defined by the application

Applications using Vorbis for dedicated purposes may define channel mapping as seen fit. Future channel mappings (such as three and four channel Ambisonics) will make use of channel mappings other than mapping 0.

5. comment field and header specification


5.1. Overview

The Vorbis text comment header is the second (of three) header packets that begin a Vorbis bitstream. It is meant for short text comments, not arbitrary metadata; arbitrary metadata belongs in a separate logical bitstream (usually an XML stream type) that provides greater structure and machine parseability.

The comment field is meant to be used much like someone jotting a quick note on the bottom of a CDR. It should be a little information to remember the disc by and explain it to others; a short, to-the-point text note that need not only be a couple words, but isn’t going to be more than a short paragraph. The essentials, in other words, whatever they turn out to be, e.g.:

Honest Bob and the Factory-to-Dealer-Incentives, “I’m Still Around”, opening for Moxy Früvous, 1997.

5.2. Comment encoding


5.2.1. Structure

The comment header is logically a list of eight-bit-clean vectors; the number of vectors is bounded to 2³² − 1 and the length of each vector is limited to 2³² − 1 bytes. The vector length is encoded; the vector contents themselves are not null terminated. In addition to the vector list, there is a single vector for vendor name (also 8 bit clean, length encoded in 32 bits). For example, the 1.0 release of libvorbis set the vendor string to “Xiph.Org libVorbis I 20020717”.

The vector lengths and number of vectors are stored lsb first, according to the bit packing conventions of the vorbis codec. However, since data in the comment header is octet-aligned, they can simply be read as unaligned 32 bit little endian unsigned integers.

The comment header is decoded as follows:

    1) [vendor_length] = read an unsigned integer of 32 bits
    2) [vendor_string] = read a UTF-8 vector as [vendor_length] octets
    3) [user_comment_list_length] = read an unsigned integer of 32 bits
    4) iterate [user_comment_list_length] times {
         5) [length] = read an unsigned integer of 32 bits
         6) this iteration’s user comment = read a UTF-8 vector as [length] octets
       }
    7) [framing_bit] = read a single bit as boolean
    8) if ( [framing_bit] unset or end-of-packet ) then ERROR
    9) done.
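
Because the comment header is octet-aligned, the steps above can be implemented with plain byte arithmetic. This C sketch is illustrative: `CommentHeader`, `read_le32` and `parse_comment_header` are hypothetical names, `buf` is assumed to start after the common header (packet type and "vorbis" signature), UTF-8 is not validated, and comments are only located, not copied. The framing bit is taken as the least significant bit of the octet that follows the last comment, per the LSB-first packing convention.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *vendor;      /* points into buf; not null terminated */
    uint32_t vendor_length;
    uint32_t comment_count;
} CommentHeader;

/* Reads a little-endian 32-bit unsigned integer at *pos if 4 bytes remain. */
static int read_le32(const uint8_t *buf, size_t len, size_t *pos, uint32_t *out) {
    if (len - *pos < 4) return -1;                    /* end-of-packet */
    const uint8_t *p = buf + *pos;
    *out = (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
           (uint32_t)p[3] << 24;
    *pos += 4;
    return 0;
}

/* Returns 0 on success, -1 on end-of-packet or an unset framing bit. */
static int parse_comment_header(const uint8_t *buf, size_t len, CommentHeader *ch) {
    size_t pos = 0;
    uint32_t length;
    if (read_le32(buf, len, &pos, &ch->vendor_length)) return -1;
    if (ch->vendor_length > len - pos) return -1;
    ch->vendor = buf + pos;
    pos += ch->vendor_length;
    if (read_le32(buf, len, &pos, &ch->comment_count)) return -1;
    for (uint32_t i = 0; i < ch->comment_count; i++) {
        if (read_le32(buf, len, &pos, &length)) return -1;
        if (length > len - pos) return -1;
        /* comment i occupies buf[pos] .. buf[pos + length - 1] */
        pos += length;
    }
    if (pos >= len || (buf[pos] & 1u) == 0) return -1;  /* framing bit */
    return 0;
}
```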

5.2.2. Content vector format

The comment vectors are structured similarly to a UNIX environment variable. That is, comment fields consist of a field name and a corresponding value and look like:

    comment[0]="ARTIST=me";
    comment[1]="TITLE=the sound of Vorbis";

The field name is case-insensitive and may consist of ASCII 0x20 through 0x7D, 0x3D (’=’) excluded. ASCII 0x41 through 0x5A inclusive (characters A-Z) is to be considered equivalent to ASCII 0x61 through 0x7A inclusive (characters a-z).

The field name is immediately followed by ASCII 0x3D (’=’); this equals sign is used to terminate the field name.

0x3D is followed by the 8 bit clean UTF-8 encoded value of the field contents to the end of the field.
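
The field-name rules can be checked mechanically. In this C sketch, `field_name_length` and `field_name_matches` are hypothetical names. The first locates the terminating '=' while rejecting bytes outside 0x20 through 0x7D; the second compares names case-insensitively by folding only A-Z onto a-z, as the text specifies. It avoids `tolower`, whose result can depend on the current C locale.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns the length of the field name (bytes before '='), or -1 if there is
   no '=' or the name contains a byte outside 0x20..0x7D. */
static long field_name_length(const char *comment, size_t len) {
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)comment[i];
        if (c == 0x3D) return (long)i;             /* '=' terminates the name */
        if (c < 0x20 || c > 0x7D) return -1;
    }
    return -1;
}

/* Folds ASCII A-Z (0x41-0x5A) onto a-z (0x61-0x7A); other bytes unchanged. */
static unsigned char fold(unsigned char c) {
    return (c >= 0x41 && c <= 0x5A) ? (unsigned char)(c + 0x20) : c;
}

/* True if the comment's field name equals `name`, ignoring ASCII case. */
static bool field_name_matches(const char *comment, size_t len, const char *name) {
    long n = field_name_length(comment, len);
    if (n < 0 || (size_t)n != strlen(name)) return false;
    for (long i = 0; i < n; i++)
        if (fold((unsigned char)comment[i]) != fold((unsigned char)name[i]))
            return false;
    return true;
}
```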

Field names
Below is a proposed, minimal list of standard field names with a description of intended use. No single or group of field names is mandatory; a comment header may contain one, all or none of the names in this list.

TITLE: Track/Work name
VERSION: The version field may be used to differentiate multiple versions of the same track title in a single collection. (e.g. remix info)
ALBUM: The collection name to which this track belongs
TRACKNUMBER: The track number of this piece if part of a specific larger collection or album
ARTIST: The artist generally considered responsible for the work. In popular music this is usually the performing band or singer. For classical music it would be the composer. For an audio book it would be the author of the original text.
PERFORMER: The artist(s) who performed the work. In classical music this would be the conductor, orchestra, soloists. In an audio book it would be the actor who did the reading. In popular music this is typically the same as the ARTIST and is omitted.
COPYRIGHT: Copyright attribution, e.g., ’2001 Nobody’s Band’ or ’1999 Jack Moffitt’
LICENSE: License information, e.g., ’All Rights Reserved’, ’Any Use Permitted’, a URL to a license such as a Creative Commons license (“www.creativecommons.org/blahblah/license.html”) or the EFF Open Audio License (’distributed under the terms of the Open Audio License. see http://www.eff.org/IP/Open_licenses/eff_oal.html for details’), etc.
ORGANIZATION: Name of the organization producing the track (i.e. the ’record label’)
DESCRIPTION: A short text description of the contents
GENRE: A short text indication of music genre
DATE: Date the track was recorded
LOCATION: Location where track was recorded
CONTACT: Contact information for the creators or distributors of the track. This could be a URL, an email address, the physical address of the producing label.
ISRC: International Standard Recording Code for the track; see the ISRC intro page for more information on ISRC numbers.

Implications
Field names should not be ’internationalized’; this is a concession to simplicity, not an attempt to exclude the majority of the world that doesn’t speak English. Field contents, however, use the UTF-8 character encoding to allow easy representation of any language.

We have the length of the entirety of the field and restrictions on the field name so that the field name is bounded in a known way. Thus we also have the length of the field contents.

Individual ’vendors’ may use non-standard field names within reason. The proper use of comment fields should be clear through context at this point. Abuse will be discouraged.

There is no vendor-specific prefix to ’nonstandard’ field names. Vendors should make some effort to avoid arbitrarily polluting the common namespace. We will generally collect the more useful tags here to help with standardization.

Field names are not required to be unique (occur once) within a comment header. As an example, assume a track was recorded by three well-known artists; the following is permissible, and encouraged:

    ARTIST=Dizzy Gillespie
    ARTIST=Sonny Rollins
    ARTIST=Sonny Stitt

5.2.3. Encoding

The comment header comprises the entirety of the second bitstream header packet. Unlike the first bitstream header packet, it is not generally the only packet on the second page and may not be restricted to within the second bitstream page. The length of the comment header packet is (practically) unbounded. The comment header packet is not optional; it must be present in the bitstream even if it is effectively empty.

The comment header is encoded as follows (as per Ogg’s standard bitstream mapping which renders least-significant-bit of the word to be coded into the least significant available bit of the current bitstream octet first):

1. Vendor string length (32 bit unsigned quantity specifying number of octets)
2. Vendor string ([vendor string length] octets coded from beginning of string to end of string, not null terminated)
3. Number of comment fields (32 bit unsigned quantity specifying number of fields)
4. Comment field 0 length (if [Number of comment fields] > 0; 32 bit unsigned quantity specifying number of octets)
5. Comment field 0 ([Comment field 0 length] octets coded from beginning of string to end of string, not null terminated)
6. Comment field 1 length (if [Number of comment fields] > 1...)...

This is actually somewhat easier to describe in code; implementation of the above can be found in vorbis/lib/info.c, _vorbis_pack_comment() and _vorbis_unpack_comment().

6. Floor type 0 setup and decode


6.1. Overview

Vorbis floor type zero uses Line Spectral Pair (LSP, also alternately known as Line Spectral Frequency or LSF) representation to encode a smooth spectral envelope curve as the frequency response of the LSP filter. This representation is equivalent to a traditional all-pole infinite impulse response filter as would be used in linear predictive coding; LSP representation may be converted to LPC representation and vice-versa.

6.2. Floor 0 format

Floor zero configuration consists of six integer fields and a list of VQ codebooks for use in coding/decoding the LSP filter coefficient values used by each frame.

6.2.1. header decode

Configuration information for instances of floor zero decodes from the codec setup header (third packet). Configuration decode proceeds as follows:

    1) [floor0_order] = read an unsigned integer of 8 bits
    2) [floor0_rate] = read an unsigned integer of 16 bits
    3) [floor0_bark_map_size] = read an unsigned integer of 16 bits
    4) [floor0_amplitude_bits] = read an unsigned integer of six bits
    5) [floor0_amplitude_offset] = read an unsigned integer of eight bits
    6) [floor0_number_of_books] = read an unsigned integer of four bits and add 1
    7) array [floor0_book_list] = read a list of [floor0_number_of_books] unsigned integers of eight bits each;

An end-of-packet condition during any of these bitstream reads renders this stream undecodable. In addition, any element of the array [floor0_book_list] that is greater than the maximum codebook number for this bitstream is an error condition that also renders the stream undecodable.
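
A C sketch of floor 0 configuration decode, including both error conditions, follows. `BitReader`, `br_read`, `Floor0Config` and `read_floor0_config` are hypothetical names for illustration, and `codebook_count` is assumed to be [vorbis_codebook_count].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader (Section 2); sets eop and returns 0 on overrun. */
typedef struct { const uint8_t *data; size_t len, bitpos; int eop; } BitReader;

static uint32_t br_read(BitReader *br, int nbits) {
    uint32_t v = 0;
    for (int i = 0; i < nbits; i++, br->bitpos++) {
        if ((br->bitpos >> 3) >= br->len) { br->eop = 1; return 0; }
        v |= (uint32_t)((br->data[br->bitpos >> 3] >> (br->bitpos & 7)) & 1u) << i;
    }
    return v;
}

typedef struct {
    int order, rate, bark_map_size, amplitude_bits, amplitude_offset, number_of_books;
    uint8_t book_list[16];                    /* at most 15 + 1 books */
} Floor0Config;

/* Returns 0 on success, -1 if the stream is undecodable. */
static int read_floor0_config(BitReader *br, int codebook_count, Floor0Config *f) {
    f->order = (int)br_read(br, 8);
    f->rate = (int)br_read(br, 16);
    f->bark_map_size = (int)br_read(br, 16);
    f->amplitude_bits = (int)br_read(br, 6);
    f->amplitude_offset = (int)br_read(br, 8);
    f->number_of_books = (int)br_read(br, 4) + 1;
    for (int i = 0; i < f->number_of_books; i++) {
        f->book_list[i] = (uint8_t)br_read(br, 8);
        if (f->book_list[i] > codebook_count - 1) return -1;  /* no such codebook */
    }
    return br->eop ? -1 : 0;       /* end-of-packet here: stream undecodable */
}
```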

6.2.2. packet decode

cannam@86:

Extracting a floor0 curve from an audio packet consists of first decoding the curve cannam@86: amplitude and [floor0_order] LSP coefficient values from the bitstream, and then cannam@86: computing the floor curve, which is defined as the frequency response of the decoded LSP cannam@86: filter. cannam@86:

Packet decode proceeds as follows: cannam@86:

cannam@86: 1    1) [amplitude] = read an unsigned integer of [floor0_amplitude_bits] bits cannam@86:
2    2) if ( [amplitude] is greater than zero ) { cannam@86:
3         3) [coefficients] is an empty, zero length vector cannam@86:
4         4) [booknumber] = read an unsigned integer of ilog( [floor0_number_of_books] ) bits cannam@86:
5         5) if ( [booknumber] is greater than the highest number decode codebook ) then packet is undecodable cannam@86:
6         6) [last] = zero; cannam@86:
7         7) vector [temp_vector] = read vector from bitstream using codebook number [floor0_book_list] element [booknumber] in VQ context. cannam@86:
8         8) add the scalar value [last] to each scalar in vector [temp_vector] cannam@86:
9         9) [last] = the value of the last scalar in vector [temp_vector] cannam@86:
10        10) concatenate [temp_vector] onto the end of the [coefficients] vector cannam@86:
11        11) if (length of vector [coefficients] is less than [floor0_order], continue at step 6 cannam@86:
12   cannam@86:
13       } cannam@86:
14   cannam@86:
15   12) done. cannam@86:
16   cannam@86:

Take note of the following properties of decode:

An [amplitude] value of zero must result in a return code that indicates this channel is unused in this frame (the output of the channel will be all-zeroes in synthesis). Several later stages of decode don't occur for an unused channel.

An end-of-packet condition during decode should be considered a nominal occurrence; if end-of-packet is reached during any read operation above, floor decode is to return 'unused' status as if the [amplitude] value had read zero at the beginning of decode.

The book number used for decode can, in fact, be stored in the bitstream in ilog( [floor0_number_of_books] - 1 ) bits. Nevertheless, the above specification is correct and values greater than the maximum possible book value are reserved.

The number of scalars read into the vector [coefficients] may be greater than [floor0_order], the number actually required for curve computation. For example, if the VQ codebook used for the floor currently being decoded has a [codebook_dimensions] value of three and [floor0_order] is ten, the only way to fill all the needed scalars in [coefficients] is to read a total of twelve scalars as four vectors of three scalars each. This is not an error condition, and care must be taken not to allow a buffer overflow in decode. The extra values are not used and may be ignored or discarded.
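The coefficient-assembly loop (steps 6 through 11) can be sketched in Python as below. `read_vector` is a hypothetical callback returning the next VQ vector decoded with the selected codebook. One assumption deserves a flag: this sketch sets [last] to zero once and carries it from vector to vector, which is what the libvorbis reference decoder does; a literal reading of "continue at step 6" would reset it before every vector. The loop overreads whole vectors as described above and then discards the surplus.

```python
from typing import Callable, List


def floor0_coefficients(read_vector: Callable[[], List[float]], order: int) -> List[float]:
    """Assemble [floor0_order] LSP coefficients from delta-coded VQ vectors."""
    coefficients: List[float] = []
    last = 0.0  # step 6, applied once (see the note above)
    while len(coefficients) < order:
        temp = [value + last for value in read_vector()]  # steps 7-8
        last = temp[-1]                                     # step 9
        coefficients.extend(temp)                           # step 10
    # Whole vectors may overshoot [floor0_order]; the extra scalars are unused.
    return coefficients[:order]
```

With a three-dimensional codebook and an order of ten, the loop reads four vectors (twelve scalars) and the last two scalars are dropped.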

6.2.3. curve computation

Given an [amplitude] integer and [coefficients] vector from packet decode as well as the [floor0_order], [floor0_rate], [floor0_bark_map_size], [floor0_amplitude_bits] and [floor0_amplitude_offset] values from floor setup, and an output vector size [n] specified by the decode process, we compute a floor output vector.

If the value [amplitude] is zero, the return value is a length [n] vector with all-zero scalars. Otherwise, begin by assuming the following definitions for the given vector to be synthesized:

$$
\mathrm{map}_i =
\begin{cases}
\min\left(\mathrm{floor0\_bark\_map\_size} - 1,\ \mathrm{foobar}\right) & \text{for } i \in [0, n-1] \\
-1 & \text{for } i = n
\end{cases}
$$

where

$$
\mathrm{foobar} = \left\lfloor \mathrm{bark}\!\left(\frac{\mathrm{floor0\_rate} \cdot i}{2n}\right) \cdot \frac{\mathrm{floor0\_bark\_map\_size}}{\mathrm{bark}(0.5 \cdot \mathrm{floor0\_rate})} \right\rfloor
$$

and

$$
\mathrm{bark}(x) = 13.1 \arctan(0.00074\,x) + 2.24 \arctan(0.0000000185\,x^2) + 0.0001\,x
$$
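The map definition translates directly into Python. The sketch below (the names `bark` and `floor0_map` are illustrative) uses Python's double-precision floats and has not been checked against a reference decoder for bit-exact map values; it only shows the shape of the computation, including the -1 sentinel at index [n].

```python
import math
from typing import List


def bark(x: float) -> float:
    """Bark-scale approximation used by floor 0."""
    return (13.1 * math.atan(0.00074 * x)
            + 2.24 * math.atan(0.0000000185 * x * x)
            + 0.0001 * x)


def floor0_map(n: int, rate: int, bark_map_size: int) -> List[int]:
    """Return map[0..n]; map[n] is the -1 sentinel that ends synthesis."""
    scale = bark_map_size / bark(0.5 * rate)
    result = [min(bark_map_size - 1, math.floor(bark(rate * i / (2 * n)) * scale))
              for i in range(n)]
    result.append(-1)
    return result
```

Because bark() increases monotonically, the map is non-decreasing, which is what lets synthesis reuse one computed value across a run of equal map entries.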

The above is used to synthesize the LSP curve on a Bark-scale frequency axis, then map the result to a linear-scale frequency axis. Similarly, the below calculation synthesizes the output LSP curve [output] on a log (dB) amplitude scale, mapping it to linear amplitude in the last step:

1. [i] = 0

2. [ω] = π * map element [i] / [floor0_bark_map_size]

3. if ( [floor0_order] is odd )

   a) calculate [p] and [q] according to:

   $$
   p = (1 - \cos^2\omega) \prod_{j=0}^{\frac{\mathrm{floor0\_order}-3}{2}} 4\left(\cos([\mathrm{coefficients}]_{2j+1}) - \cos\omega\right)^2
   $$

   $$
   q = \frac{1}{4} \prod_{j=0}^{\frac{\mathrm{floor0\_order}-1}{2}} 4\left(\cos([\mathrm{coefficients}]_{2j}) - \cos\omega\right)^2
   $$

   else [floor0_order] is even

   b) calculate [p] and [q] according to:

   $$
   p = \frac{1 - \cos\omega}{2} \prod_{j=0}^{\frac{\mathrm{floor0\_order}-2}{2}} 4\left(\cos([\mathrm{coefficients}]_{2j+1}) - \cos\omega\right)^2
   $$

   $$
   q = \frac{1 + \cos\omega}{2} \prod_{j=0}^{\frac{\mathrm{floor0\_order}-2}{2}} 4\left(\cos([\mathrm{coefficients}]_{2j}) - \cos\omega\right)^2
   $$

4. calculate [linear_floor_value] according to:

   $$
   \mathrm{linear\_floor\_value} = \exp\left(0.11512925 \left(\frac{\mathrm{amplitude} \cdot \mathrm{floor0\_amplitude\_offset}}{\left(2^{\mathrm{floor0\_amplitude\_bits}} - 1\right)\sqrt{p + q}} - \mathrm{floor0\_amplitude\_offset}\right)\right)
   $$

5. [iteration_condition] = map element [i]

6. [output] element [i] = [linear_floor_value]

7. increment [i]

8. if ( map element [i] is equal to [iteration_condition] ) continue at step 5

9. if ( [i] is less than [n] ) continue at step 2

10. done
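The synthesis loop above, including the reuse of [linear_floor_value] across a run of equal map entries in steps 5 to 8, can be sketched in Python as follows. It takes the map (with its -1 sentinel at index [n]) as input rather than recomputing it, uses double-precision floats, and assumes [p] + [q] is nonzero; `floor0_curve` is a hypothetical name.

```python
import math
from typing import List, Sequence


def floor0_curve(amplitude: int, coefficients: Sequence[float], order: int,
                 amplitude_bits: int, amplitude_offset: int,
                 bark_map_size: int, bark_map: Sequence[int], n: int) -> List[float]:
    """Linear-amplitude floor 0 curve; bark_map has n + 1 entries, bark_map[n] == -1."""
    if amplitude == 0:
        return [0.0] * n
    output = [0.0] * n
    i = 0                                                       # step 1
    while i < n:
        cos_w = math.cos(math.pi * bark_map[i] / bark_map_size)  # step 2
        if order % 2:                                           # step 3a: odd order
            p = 1.0 - cos_w * cos_w
            for j in range((order - 3) // 2 + 1):
                p *= 4.0 * (math.cos(coefficients[2 * j + 1]) - cos_w) ** 2
            q = 0.25
            for j in range((order - 1) // 2 + 1):
                q *= 4.0 * (math.cos(coefficients[2 * j]) - cos_w) ** 2
        else:                                                   # step 3b: even order
            p = (1.0 - cos_w) / 2.0
            q = (1.0 + cos_w) / 2.0
            for j in range((order - 2) // 2 + 1):
                p *= 4.0 * (math.cos(coefficients[2 * j + 1]) - cos_w) ** 2
                q *= 4.0 * (math.cos(coefficients[2 * j]) - cos_w) ** 2
        max_amp = (1 << amplitude_bits) - 1                     # step 4
        value = math.exp(0.11512925 * (amplitude * amplitude_offset
                                       / (max_amp * math.sqrt(p + q))
                                       - amplitude_offset))
        condition = bark_map[i]                                 # step 5
        while True:
            output[i] = value                                   # step 6
            i += 1                                              # step 7
            if bark_map[i] != condition:                        # step 8; map[n] == -1 stops the run
                break
    return output                                               # steps 9-10
```

The constant 0.11512925 is ln(10)/20, so step 4 converts a dB value to linear amplitude.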

7. Floor type 1 setup and decode

7.1. Overview

Vorbis floor type one uses a piecewise straight-line representation to encode a spectral envelope curve. The representation plots this curve mechanically on a linear frequency axis and a logarithmic (dB) amplitude axis. The integer plotting algorithm used is similar to Bresenham's algorithm.

7.2. Floor 1 format

7.2.1. model

Floor type one represents a spectral curve as a series of line segments. Synthesis constructs a floor curve using iterative prediction in a process roughly equivalent to the following simplified description:

The first line segment (base case) is a logical line spanning from x_0,y_0 to x_1,y_1, where in the base case x_0=0 and x_1=[n], the full range of the spectral floor to be computed.

The induction step chooses a point x_new within an existing logical line segment and produces a y_new value at that point computed from the existing line's y value at x_new (as plotted by the line) and a difference value decoded from the bitstream packet.

Floor computation produces two new line segments, one running from x_0,y_0 to x_new,y_new and the other from x_new,y_new to x_1,y_1. This step is performed logically even if y_new represents no change to the amplitude value at x_new, so that later refinement is additionally bounded at x_new.

The induction step repeats, using a list of x values specified in the codec setup header at floor 1 initialization time. Computation is completed at the end of the x value list.

Consider the following example, with values chosen for ease of understanding rather than representing typical configuration:

For the example below, we assume a floor setup with an [n] of 128. The list of selected X values in increasing order is 0, 16, 32, 48, 64, 80, 96, 112 and 128. In list order, the values interleave as 0, 128, 64, 32, 96, 16, 48, 80 and 112. The corresponding list-order Y values as decoded from an example packet are 110, 20, -5, -45, 0, -25, -10, 30 and -10. We compute the floor in the following way, beginning with the first line:

Figure 7: graph of example floor

We now draw new logical lines to reflect the correction to new_Y, and iterate for X positions 32 and 96:

Figure 8: graph of example floor

Although the new Y value at X position 96 is unchanged, it is still used later as an endpoint for further refinement. From here on, the pattern should be clear; we complete the floor computation as follows:

Figure 9: graph of example floor

Figure 10: graph of example floor

A more efficient algorithm with carefully defined integer rounding behavior is used for actual decode, as described later. The actual algorithm splits Y value computation and line plotting into two steps with modifications to the above algorithm to eliminate noise accumulation through integer roundoff/truncation.
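The simplified model can be traced on the example above with a short Python sketch. It is not the normative algorithm: it uses unrounded arithmetic and treats every list-order Y value after the first two as a plain signed difference added to the value predicted by the enclosing line, which is this sketch's reading of the model description. The wrapping and integer rounding of the real decoder are omitted. `simplified_floor1` is a hypothetical name.

```python
from typing import Dict, Sequence


def simplified_floor1(x_list: Sequence[int], y_list: Sequence[int]) -> Dict[int, float]:
    """Toy floor 1 model: predict each new point from its neighbours, add a delta."""
    final: Dict[int, float] = {x_list[0]: y_list[0], x_list[1]: y_list[1]}
    for x, delta in zip(x_list[2:], y_list[2:]):
        # The nearest already-placed points on either side bound the current segment.
        lo = max(p for p in final if p < x)
        hi = min(p for p in final if p > x)
        predicted = final[lo] + (final[hi] - final[lo]) * (x - lo) / (hi - lo)
        final[x] = predicted + delta
    return final
```

Under these assumptions, the point at X=64 lands at 60 (predicted 65, difference -5), and the point at X=96 stays at its predicted value of 40, consistent with the remark that its Y value is unchanged.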

7.2.2. header decode

A list of floor X values is stored in the packet header in interleaved format (used in list order during packet decode and synthesis). This list is split into partitions, and each partition is assigned to a partition class. X positions 0 and [n] are implicit and do not belong to an explicit partition or partition class.

A partition class consists of a representation vector width (the number of Y values which the partition class encodes at once), a 'subclass' value giving the base-2 logarithm of the number of alternate entropy books the partition class may use in representing Y values, the list of subclass books and a master book used to encode which alternate books were chosen for representation in a given packet. The master/subclass mechanism is meant to be used as a flexible representation cascade while still using codebooks only in a scalar context.

    1) [floor1_partitions] = read 5 bits as unsigned integer
    2) [maximum_class] = -1
    3) iterate [i] over the range 0 ... [floor1_partitions]-1 {
         4) vector [floor1_partition_class_list] element [i] = read 4 bits as unsigned integer
       }
    5) [maximum_class] = largest integer scalar value in vector [floor1_partition_class_list]
    6) iterate [i] over the range 0 ... [maximum_class] {
         7) vector [floor1_class_dimensions] element [i] = read 3 bits as unsigned integer and add 1
         8) vector [floor1_class_subclasses] element [i] = read 2 bits as unsigned integer
         9) if ( vector [floor1_class_subclasses] element [i] is nonzero ) {
              10) vector [floor1_class_masterbooks] element [i] = read 8 bits as unsigned integer
            }
        11) iterate [j] over the range 0 ... (2 exponent [floor1_class_subclasses] element [i]) - 1 {
              12) array [floor1_subclass_books] element [i],[j] =
                  read 8 bits as unsigned integer and subtract one
            }
       }
    13) [floor1_multiplier] = read 2 bits as unsigned integer and add one
    14) [rangebits] = read 4 bits as unsigned integer
    15) vector [floor1_X_list] element [0] = 0
    16) vector [floor1_X_list] element [1] = 2 exponent [rangebits];
    17) [floor1_values] = 2
    18) iterate [i] over the range 0 ... [floor1_partitions]-1 {
         19) [current_class_number] = vector [floor1_partition_class_list] element [i]
         20) iterate [j] over the range 0 ... ([floor1_class_dimensions] element [current_class_number])-1 {
              21) vector [floor1_X_list] element ([floor1_values]) =
                  read [rangebits] bits as unsigned integer
              22) increment [floor1_values] by one
            }
       }
    23) done

An end-of-packet condition while reading any aspect of a floor 1 configuration during setup renders a stream undecodable. In addition, a [floor1_class_masterbooks] or [floor1_subclass_books] scalar element greater than the highest numbered codebook configured in this stream is an error condition that renders the stream undecodable. Vector [floor1_X_list] is limited to a maximum length of 65 elements; a setup indicating more than 65 total elements (including elements 0 and 1 set prior to the read loop) renders the stream undecodable. All vector [floor1_X_list] element values must be unique within the vector; a non-unique value renders the stream undecodable.
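Below is a Python sketch of the floor 1 header decode together with the validity checks just listed. To stay self-contained it takes a hypothetical `read(nbits)` callback (returning an unsigned integer and raising `EOFError` at end-of-packet) instead of a concrete bit reader; `Floor1Setup`, `decode_floor1_setup` and `codebook_count` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Floor1Setup:
    partition_class_list: List[int]
    class_dimensions: List[int]
    class_subclasses: List[int]
    class_masterbooks: List[Optional[int]]
    subclass_books: List[List[int]]
    multiplier: int
    x_list: List[int]


def decode_floor1_setup(read: Callable[[int], int], codebook_count: int) -> Floor1Setup:
    partitions = read(5)
    class_list = [read(4) for _ in range(partitions)]
    maximum_class = max(class_list, default=-1)
    dims: List[int] = []
    subclasses: List[int] = []
    masterbooks: List[Optional[int]] = []
    subclass_books: List[List[int]] = []
    for _ in range(maximum_class + 1):
        dims.append(read(3) + 1)
        sub = read(2)
        subclasses.append(sub)
        masterbooks.append(read(8) if sub else None)  # no master book without subclasses
        subclass_books.append([read(8) - 1 for _ in range(1 << sub)])  # -1 means no book
    multiplier = read(2) + 1
    rangebits = read(4)
    x_list = [0, 1 << rangebits]
    for cls in class_list:
        x_list.extend(read(rangebits) for _ in range(dims[cls]))

    books = [b for b in masterbooks if b is not None]
    books += [b for row in subclass_books for b in row]
    if any(b >= codebook_count for b in books):
        raise ValueError("floor1 codebook number out of range")
    if len(x_list) > 65:
        raise ValueError("floor1 X list longer than 65 elements")
    if len(set(x_list)) != len(x_list):
        raise ValueError("floor1 X values are not unique")
    return Floor1Setup(class_list, dims, subclasses, masterbooks,
                       subclass_books, multiplier, x_list)
```

Passing the reader as a callback keeps the sketch focused on field order and validation; any LSb-first bit reader can be plugged in.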

7.2.3. packet decode

Packet decode begins by checking the [nonzero] flag:

    1) [nonzero] = read 1 bit as boolean

If [nonzero] is unset, that indicates this channel contained no audio energy in this frame. Decode immediately returns a status indicating this floor curve (and thus this channel) is unused in this frame. (A return status of 'unused' is different from decoding a floor that has all points set to minimum representation amplitude, which happens to be approximately -140dB.)

Assuming [nonzero] is set, decode proceeds as follows:

    1) [range] = vector { 256, 128, 86, 64 } element ([floor1_multiplier]-1)
    2) vector [floor1_Y] element [0] = read ilog([range]-1) bits as unsigned integer
    3) vector [floor1_Y] element [1] = read ilog([range]-1) bits as unsigned integer
    4) [offset] = 2;
    5) iterate [i] over the range 0 ... [floor1_partitions]-1 {
         6) [class] = vector [floor1_partition_class_list] element [i]
         7) [cdim]  = vector [floor1_class_dimensions] element [class]
         8) [cbits] = vector [floor1_class_subclasses] element [class]
         9) [csub]  = (2 exponent [cbits])-1
        10) [cval]  = 0
        11) if ( [cbits] is greater than zero ) {
              12) [cval] = read from packet using codebook number
                  (vector [floor1_class_masterbooks] element [class]) in scalar context
            }
        13) iterate [j] over the range 0 ... [cdim]-1 {
              14) [book] = array [floor1_subclass_books] element [class],([cval] bitwise AND [csub])
              15) [cval] = [cval] right shifted [cbits] bits
              16) if ( [book] is not less than zero ) {
                    17) vector [floor1_Y] element ([j]+[offset]) = read from packet using codebook
                        [book] in scalar context
                  } else [book] is less than zero {
                    18) vector [floor1_Y] element ([j]+[offset]) = 0
                  }
            }
        19) [offset] = [offset] + [cdim]
       }
    20) done

An end-of-packet condition during curve decode should be considered a nominal occurrence; if end-of-packet is reached during any read operation above, floor decode is to return 'unused' status as if the [nonzero] flag had been unset at the beginning of decode.

Vector [floor1_Y] contains the values from packet decode needed for floor 1 synthesis.
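A Python sketch of this packet decode follows. `read_bits(n)` and `read_scalar(book)` are hypothetical callbacks (the latter decodes one entry from the given codebook in scalar context); both are assumed to raise `EOFError` at end-of-packet, which the sketch maps to the 'unused' result `None`. The setup vectors are passed in as plain lists named after the header fields.

```python
from typing import Callable, List, Optional, Sequence

RANGES = (256, 128, 86, 64)  # [range] indexed by [floor1_multiplier]-1


def ilog(x: int) -> int:
    """Number of bits needed to hold x; 0 for x <= 0."""
    return x.bit_length() if x > 0 else 0


def decode_floor1_packet(read_bits: Callable[[int], int],
                         read_scalar: Callable[[int], int],
                         partition_class_list: Sequence[int],
                         class_dimensions: Sequence[int],
                         class_subclasses: Sequence[int],
                         class_masterbooks: Sequence[Optional[int]],
                         subclass_books: Sequence[Sequence[int]],
                         multiplier: int) -> Optional[List[int]]:
    """Return vector [floor1_Y], or None when the floor is 'unused'."""
    try:
        if not read_bits(1):                      # [nonzero] flag
            return None
        bits = ilog(RANGES[multiplier - 1] - 1)
        y = [read_bits(bits), read_bits(bits)]
        for cls in partition_class_list:
            cbits = class_subclasses[cls]
            csub = (1 << cbits) - 1
            cval = read_scalar(class_masterbooks[cls]) if cbits > 0 else 0
            for _ in range(class_dimensions[cls]):
                book = subclass_books[cls][cval & csub]
                cval >>= cbits
                # Appending in order plays the role of index [j]+[offset].
                y.append(read_scalar(book) if book >= 0 else 0)
        return y
    except EOFError:
        return None  # end-of-packet mid-decode is nominal: report 'unused'
```

The master-book value `cval` is consumed `cbits` bits at a time, so each Y value in a partition can select a different subclass book.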

7.2.4. curve computation

Curve computation is split into two logical steps; the first step derives final Y amplitude values from the encoded, wrapped difference values taken from the bitstream. The second step plots the curve lines. Also, although zero-difference values are used in the iterative prediction to find final Y values, these points are conditionally skipped during final line computation in step two. Skipping zero-difference values allows a smoother line fit.

Although some aspects of the below algorithm look like inconsequential optimizations, implementors are warned to follow the details closely. Deviation from implementing a strictly equivalent algorithm can result in serious decoding errors.

Additional note: Although [floor1_final_Y] values in the prediction loop and at the end of step 1 are inherently limited by the prediction algorithm to [0, [range]), it is possible to abuse the setup and codebook machinery to produce negative or over-range results. We suggest that decoder implementations guard the values in vector [floor1_final_Y] by clamping each element to [0, [range]) after step 1. Variants of this suggestion are acceptable, as valid floor1 setups cannot produce out-of-range values.

step 1: amplitude value synthesis

Unwrap the always-positive-or-zero values read from the packet into +/- difference values, then apply them to line prediction.

    1) [range] = vector { 256, 128, 86, 64 } element ([floor1_multiplier]-1)
    2) vector [floor1_step2_flag] element [0] = set
    3) vector [floor1_step2_flag] element [1] = set
    4) vector [floor1_final_Y] element [0] = vector [floor1_Y] element [0]
    5) vector [floor1_final_Y] element [1] = vector [floor1_Y] element [1]
    6) iterate [i] over the range 2 ... [floor1_values]-1 {
         7) [low_neighbor_offset] = low_neighbor([floor1_X_list],[i])
         8) [high_neighbor_offset] = high_neighbor([floor1_X_list],[i])
         9) [predicted] = render_point( vector [floor1_X_list] element [low_neighbor_offset],
                                        vector [floor1_final_Y] element [low_neighbor_offset],
                                        vector [floor1_X_list] element [high_neighbor_offset],
                                        vector [floor1_final_Y] element [high_neighbor_offset],
                                        vector [floor1_X_list] element [i] )
        10) [val] = vector [floor1_Y] element [i]
        11) [highroom] = [range] - [predicted]
        12) [lowroom]  = [predicted]
        13) if ( [highroom] is less than [lowroom] ) {
              14) [room] = [highroom] * 2
            } else [highroom] is not less than [lowroom] {
              15) [room] = [lowroom] * 2
            }
        16) if ( [val] is nonzero ) {
              17) vector [floor1_step2_flag] element [low_neighbor_offset] = set
              18) vector [floor1_step2_flag] element [high_neighbor_offset] = set
              19) vector [floor1_step2_flag] element [i] = set
              20) if ( [val] is greater than or equal to [room] ) {
                    21) if ( [highroom] is greater than [lowroom] ) {
                          22) vector [floor1_final_Y] element [i] = [val] - [lowroom] + [predicted]
                        } else [highroom] is not greater than [lowroom] {
                          23) vector [floor1_final_Y] element [i] = [predicted] - [val] + [highroom] - 1
                        }
                  } else [val] is less than [room] {
                    24) if ( [val] is odd ) {
                          25) vector [floor1_final_Y] element [i] =
                              [predicted] - (([val] + 1) divided by 2 using integer division)
                        } else [val] is even {
                          26) vector [floor1_final_Y] element [i] =
                              [predicted] + ([val] / 2 using integer division)
                        }
                  }
            } else [val] is zero {
              27) vector [floor1_step2_flag] element [i] = unset
              28) vector [floor1_final_Y] element [i] = [predicted]
            }
       }
    29) done
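Step 1 can be sketched in Python as below. The helpers `low_neighbor`, `high_neighbor` and `render_point` are used, but not defined, in this section; the versions here follow this sketch's understanding of their definitions in the specification's helper-function section (nearest earlier X value below or above, and integer line interpolation with truncating division), so treat them as assumptions to check against that text. The final clamp is the optional guard suggested in the note above.

```python
from typing import List, Sequence, Tuple

RANGES = (256, 128, 86, 64)


def low_neighbor(v: Sequence[int], x: int) -> int:
    """Index n < x whose v[n] is the greatest value below v[x]."""
    return max((n for n in range(x) if v[n] < v[x]), key=lambda n: v[n])


def high_neighbor(v: Sequence[int], x: int) -> int:
    """Index n < x whose v[n] is the smallest value above v[x]."""
    return min((n for n in range(x) if v[n] > v[x]), key=lambda n: v[n])


def render_point(x0: int, y0: int, x1: int, y1: int, x: int) -> int:
    dy = y1 - y0
    off = abs(dy) * (x - x0) // (x1 - x0)  # operands are non-negative, so // truncates
    return y0 - off if dy < 0 else y0 + off


def floor1_amplitudes(x_list: Sequence[int], y: Sequence[int],
                      multiplier: int) -> Tuple[List[int], List[bool]]:
    """Return ([floor1_final_Y], [floor1_step2_flag])."""
    rng = RANGES[multiplier - 1]
    final_y = list(y[:2]) + [0] * (len(x_list) - 2)
    step2 = [True, True] + [False] * (len(x_list) - 2)
    for i in range(2, len(x_list)):
        lo = low_neighbor(x_list, i)
        hi = high_neighbor(x_list, i)
        predicted = render_point(x_list[lo], final_y[lo], x_list[hi], final_y[hi], x_list[i])
        val = y[i]
        highroom = rng - predicted
        lowroom = predicted
        room = highroom * 2 if highroom < lowroom else lowroom * 2
        if val:
            step2[lo] = step2[hi] = step2[i] = True
            if val >= room:
                if highroom > lowroom:
                    final_y[i] = val - lowroom + predicted
                else:
                    final_y[i] = predicted - val + highroom - 1
            elif val % 2:
                final_y[i] = predicted - (val + 1) // 2   # odd values step down
            else:
                final_y[i] = predicted + val // 2         # even values step up
        else:
            step2[i] = False
            final_y[i] = predicted
    # Suggested guard: clamp to [0, range) against abusive setups.
    return [min(max(v, 0), rng - 1) for v in final_y], step2
```

Small odd and even codes alternate below and above the prediction (3 maps to -2, 4 to +2), while codes at or beyond `room` are taken to lie entirely on the side with more headroom.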

step 2: curve synthesis

Curve synthesis generates a return vector [floor] of length [n] (where [n] is provided by the decode process calling floor decode). Floor 1 curve synthesis makes use of the [floor1_X_list], [floor1_final_Y] and [floor1_step2_flag] vectors, as well as the [floor1_multiplier] and [floor1_values] values.

Decode begins by sorting the scalars from vectors [floor1_X_list], [floor1_final_Y] and [floor1_step2_flag] together into new vectors [floor1_X_list]', [floor1_final_Y]' and [floor1_step2_flag]' according to ascending sort order of the values in [floor1_X_list]. That is, sort the values of [floor1_X_list] and then apply the same permutation to elements of the other two vectors so that the X, Y and step2_flag values still match.

Then compute the final curve in one pass:

    1) [hx] = 0
    2) [lx] = 0
    3) [ly] = vector [floor1_final_Y]' element [0] * [floor1_multiplier]
    4) iterate [i] over the range 1 ... [floor1_values]-1 {
         5) if ( [floor1_step2_flag]' element [i] is set ) {
              6) [hy] = [floor1_final_Y]' element [i] * [floor1_multiplier]
              7) [hx] = [floor1_X_list]' element [i]
              8) render_line( [lx], [ly], [hx], [hy], [floor] )
              9) [lx] = [hx]
             10) [ly] = [hy]
            }
       }
    11) if ( [hx] is less than [n] ) {
         12) render_line( [hx], [hy], [n], [hy], [floor] )
       }
    13) if ( [hx] is greater than [n] ) {
         14) truncate vector [floor] to [n] elements
       }
    15) for each scalar in vector [floor], perform a lookup substitution using
        the scalar value from [floor] as an offset into the vector [floor1_inverse_dB_static_table]
    16) done
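A Python sketch of step 2 follows, including a `render_line` helper. Like `render_point`, `render_line` is used but not defined in this section; the version here is this sketch's rendering of the specification's integer, Bresenham-style line routine (truncating division plus an error accumulator) and should be checked against that definition. The sketch stops before step 15: it returns the integer offsets that would be looked up in [floor1_inverse_dB_static_table], which is not reproduced here. It also assumes distinct X values, as the setup rules require.

```python
from typing import List, Sequence


def render_line(x0: int, y0: int, x1: int, y1: int, v: List[int]) -> None:
    """Write integer line points for x in [x0, x1) into v."""
    dy = y1 - y0
    adx = x1 - x0
    ady = abs(dy)
    base = -(ady // adx) if dy < 0 else ady // adx  # division truncating toward zero
    sy = base - 1 if dy < 0 else base + 1
    ady -= abs(base) * adx
    y, err = y0, 0
    v[x0] = y
    for x in range(x0 + 1, x1):
        err += ady
        if err >= adx:
            err -= adx
            y += sy
        else:
            y += base
        v[x] = y


def floor1_curve(x_list: Sequence[int], final_y: Sequence[int],
                 step2: Sequence[bool], multiplier: int, n: int) -> List[int]:
    """Steps 1-14: return [floor] as integer table offsets, length n."""
    order = sorted(range(len(x_list)), key=lambda k: x_list[k])
    xs = [x_list[k] for k in order]
    ys = [final_y[k] for k in order]
    flags = [step2[k] for k in order]
    floor = [0] * max(n, xs[-1])
    hx = lx = 0
    ly = ys[0] * multiplier
    hy = ly  # defensive default; valid setups always assign hy in the loop
    for i in range(1, len(xs)):
        if flags[i]:
            hy = ys[i] * multiplier
            hx = xs[i]
            render_line(lx, ly, hx, hy, floor)
            lx, ly = hx, hy
    if hx < n:
        render_line(hx, hy, n, hy, floor)
    return floor[:n]  # also covers step 14 when hx > n
```

For X values [0, 8, 4] with final Y values [10, 2, 6] and a multiplier of 1, the curve for n = 8 steps down by one per bin from 10 to 3; with n = 10 it is extended flat at 2.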

8. Residue setup and decode

8.1. Overview

A residue vector represents the fine detail of the audio spectrum of one channel in an audio frame after the encoder subtracts the floor curve and performs any channel coupling. A residue vector may represent spectral lines, spectral magnitude, spectral phase or hybrids as mixed by channel coupling. The exact semantic content of the vector does not matter to the residue abstraction.

Whatever the exact qualities, the Vorbis residue abstraction codes the residue vectors into the bitstream packet, and then reconstructs the vectors during decode. Vorbis makes use of three different encoding variants (numbered 0, 1 and 2) of the same basic vector encoding abstraction.

8.2. Residue format

Residue format partitions each vector in the vector bundle into chunks, classifies each chunk, encodes the chunk classifications and finally encodes the chunks themselves using the specific VQ arrangement defined for each selected classification. The exact interleaving and partitioning vary by residue encoding number; however, the high-level process used to classify and encode the residue vector is the same in all three variants.

The coded residue vectors in a set are all of the same length. The high-level coding structure, ignoring for the moment exactly how a partition is encoded and simply trusting that it is, is as follows:

Each vector is partitioned into multiple equal-sized chunks according to the configuration specified. If we have a vector size of n, a partition size residue_partition_size, and a total of ch residue vectors, the total number of partitioned chunks coded is n/residue_partition_size*ch. It is important to note that the integer division truncates. In the example below, we assume a residue_partition_size of 8.

Each partition in each vector has a classification number that specifies which of multiple configured VQ codebook setups are used to decode that partition. The classification numbers of each partition can be thought of as forming a vector in their own right, as in the illustration below. Just as the residue vectors are coded in grouped partitions to increase encoding efficiency, the classification vector is also partitioned into chunks. The classification numbers in a classification chunk are combined into a single scalar that represents all the classification numbers in that chunk. In the example below, each classification codeword encodes two classification numbers.

The values in a residue vector may be encoded monolithically in a single pass through the residue vector, but more often efficient codebook design dictates that each vector is encoded as the additive sum of several passes through the residue vector using more than one VQ codebook. Thus, each residue value potentially accumulates values from multiple decode passes. The classification value associated with a partition is the same in each pass, thus the classification codeword is coded only in the first pass.
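The chunk arithmetic and the grouping of classification numbers into one codeword value can be illustrated in Python. The digit order is an assumption of this sketch: it treats the first classification number in a chunk as the most significant base-[residue_classifications] digit. All function names are hypothetical.

```python
from typing import List, Sequence


def partitions_coded(n: int, partition_size: int, ch: int) -> int:
    """Total partitioned chunks: n / partition_size * ch with truncating division."""
    return n // partition_size * ch


def pack_classifications(classes: Sequence[int], classifications: int) -> int:
    """Combine one chunk of classification numbers into a single codeword value."""
    word = 0
    for c in classes:
        word = word * classifications + c
    return word


def unpack_classifications(word: int, classifications: int, count: int) -> List[int]:
    """Inverse of pack_classifications: peel digits off starting from the last one."""
    out = [0] * count
    for k in range(count - 1, -1, -1):
        out[k] = word % classifications
        word //= classifications
    return out
```

With n = 20, a partition size of 8 and two vectors, `partitions_coded` gives 4, so only 16 of the 20 values in each vector are covered by whole partitions. With four classifications, the pair (2, 1) packs to 2 * 4 + 1 = 9.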

Figure 11: illustration of residue vector format

8.3. residue 0

Residue 0 and 1 differ only in the way the values within a residue partition are interleaved during partition encoding (visually treated as a black box, or cyan box or brown box, in the above figure).

Residue encoding 0 interleaves VQ encoding according to the dimension of the codebook used to encode a partition in a specific pass. The dimension of the codebook need not be the same in multiple passes; however, the partition size must be an integer multiple of the codebook dimension.

As an example, assume a partition vector of size eight, to be encoded by residue 0 using codebook dimensions of 8, 4, 2 and 1:

                original residue vector: [ 0 1 2 3 4 5 6 7 ]
    codebook dimensions = 8  encoded as: [ 0 1 2 3 4 5 6 7 ]
    codebook dimensions = 4  encoded as: [ 0 2 4 6 ], [ 1 3 5 7 ]
    codebook dimensions = 2  encoded as: [ 0 4 ], [ 1 5 ], [ 2 6 ], [ 3 7 ]
    codebook dimensions = 1  encoded as: [ 0 ], [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ], [ 6 ], [ 7 ]

It is worth mentioning at this point that no configurable value in the residue coding setup is restricted to a power of two.
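The residue 0 interleave pattern can be generated with a short Python sketch (the name `residue0_vectors` is hypothetical): for a partition of size `partition_size` coded with dimension `dim`, VQ vector k collects partition elements k, k + partition_size/dim, k + 2*partition_size/dim, and so on. The second call shows a configuration that is not a power of two.

```python
from typing import List


def residue0_vectors(partition_size: int, dim: int) -> List[List[int]]:
    """Partition element indices gathered by each VQ vector under residue 0."""
    if partition_size % dim:
        raise ValueError("partition size must be an integer multiple of the dimension")
    step = partition_size // dim
    return [[k + j * step for j in range(dim)] for k in range(step)]


print(residue0_vectors(8, 4))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(residue0_vectors(6, 3))  # [[0, 2, 4], [1, 3, 5]]
```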

8.4. residue 1

Residue 1 does not interleave VQ encoding. It represents partition vector scalars in order. As with residue 0, however, partition length must be an integer multiple of the codebook dimension, although dimension may vary from pass to pass.

As an example, assume a partition vector of size eight, to be encoded by residue 1 using codebook dimensions of 8, 4, 2 and 1:

                original residue vector: [ 0 1 2 3 4 5 6 7 ]
    codebook dimensions = 8  encoded as: [ 0 1 2 3 4 5 6 7 ]
    codebook dimensions = 4  encoded as: [ 0 1 2 3 ], [ 4 5 6 7 ]
    codebook dimensions = 2  encoded as: [ 0 1 ], [ 2 3 ], [ 4 5 ], [ 6 7 ]
    codebook dimensions = 1  encoded as: [ 0 ], [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ], [ 6 ], [ 7 ]
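For comparison with residue 0, the residue 1 grouping is simply consecutive runs of `dim` partition elements. A Python sketch with a hypothetical name:

```python
from typing import List


def residue1_vectors(partition_size: int, dim: int) -> List[List[int]]:
    """Partition element indices gathered by each VQ vector under residue 1."""
    if partition_size % dim:
        raise ValueError("partition size must be an integer multiple of the dimension")
    return [list(range(k, k + dim)) for k in range(0, partition_size, dim)]
```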

8.5. residue 2

Residue type two can be thought of as a variant of residue type 1. Rather than encoding multiple passed-in vectors as in residue type 1, the ch passed-in vectors of length n are first interleaved and flattened into a single vector of length ch*n. Encoding then proceeds as in type 1. Decoding is as in type 1 with decode interleave reversed. If operating on a single vector to begin with, residue type 1 and type 2 are equivalent.
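The interleave step of residue type 2 can be sketched in Python as below (hypothetical names). The sketch assumes that "interleaved" means element i of vector c lands at position i*ch + c of the flattened vector; decode would apply the inverse after decoding the flat vector as in type 1. With a single vector both functions leave the data unchanged, matching the equivalence noted above.

```python
from typing import List, Sequence


def residue2_interleave(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Flatten ch vectors of length n into one vector of length ch*n."""
    ch = len(vectors)
    n = len(vectors[0])
    return [vectors[c][i] for i in range(n) for c in range(ch)]


def residue2_deinterleave(flat: Sequence[float], ch: int) -> List[List[float]]:
    """Undo residue2_interleave: every ch-th element belongs to the same vector."""
    return [list(flat[c::ch]) for c in range(ch)]
```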

Figure 12: illustration of residue type 2

8.6. Residue decode

8.6.1. header decode

Header decode for all three residue types is identical.

    1) [residue_begin] = read 24 bits as unsigned integer
    2) [residue_end] = read 24 bits as unsigned integer
    3) [residue_partition_size] = read 24 bits as unsigned integer and add one
    4) [residue_classifications] = read 6 bits as unsigned integer and add one
    5) [residue_classbook] = read 8 bits as unsigned integer

[residue_begin] and [residue_end] select the specific sub-portion of each vector that is actually coded; this acts much like a bandpass where, for coding purposes, the vector effectively begins at element [residue_begin] and ends at [residue_end]. Preceding and following values in the unpacked vectors are zeroed. Note that for residue type 2, these values as well as [residue_partition_size] apply to the interleaved vector, not the individual vectors before interleave. [residue_partition_size] is as explained above, [residue_classifications] is the number of possible classifications to which a partition can belong and [residue_classbook] is the codebook number used to code classification codewords. The number of dimensions in book [residue_classbook] determines how many classification values are grouped into a single classification codeword. Note that the number of entries and dimensions in book [residue_classbook], along with [residue_classifications], overdetermines the possible number of classification codewords. If [residue_classifications]^[residue_classbook].dimensions exceeds [residue_classbook].entries, the bitstream should be regarded as undecodable.
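The five header fields and the classbook sanity check can be sketched in Python as follows. `read(nbits)` is a hypothetical callback returning an unsigned integer (raising `EOFError` at end-of-packet), and `codebooks` is an assumed list of `(dimensions, entries)` pairs describing the codebooks already configured in the stream; the other names are illustrative too.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class ResidueHeader:
    begin: int
    end: int
    partition_size: int
    classifications: int
    classbook: int


def decode_residue_header(read: Callable[[int], int],
                          codebooks: Sequence[Tuple[int, int]]) -> ResidueHeader:
    begin = read(24)
    end = read(24)
    partition_size = read(24) + 1
    classifications = read(6) + 1
    classbook = read(8)
    if classbook >= len(codebooks):
        raise ValueError("residue classbook does not exist")
    dimensions, entries = codebooks[classbook]
    # classifications ** dimensions possible codewords must fit in the book.
    if classifications ** dimensions > entries:
        raise ValueError("classbook cannot represent every classification codeword")
    return ResidueHeader(begin, end, partition_size, classifications, classbook)
```

For instance, four classifications grouped two per codeword need a classbook with at least 4^2 = 16 entries.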

Next we read a bitmap pattern that specifies which partition classes code values in which passes.

  1) iterate [i] over the range 0 ... [residue_classifications]-1 {

       2) [high_bits] = 0
       3) [low_bits] = read 3 bits as unsigned integer
       4) [bitflag] = read one bit as boolean
       5) if ( [bitflag] is set ) then [high_bits] = read five bits as unsigned integer
       6) vector [residue_cascade] element [i] = [high_bits] * 8 + [low_bits]
     }
  7) done

Finally, we read in a list of book numbers, each corresponding to a specific bit set in the cascade bitmap. We loop over the possible codebook classifications and the maximum possible number of encoding stages (8 in Vorbis I, as constrained by the elements of the cascade bitmap being eight bits):

  1) iterate [i] over the range 0 ... [residue_classifications]-1 {

       2) iterate [j] over the range 0 ... 7 {

            3) if ( vector [residue_cascade] element [i] bit [j] is set ) {

                 4) array [residue_books] element [i][j] = read 8 bits as unsigned integer

               } else {

                 5) array [residue_books] element [i][j] = unused

               }
          }
     }

  6) done
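
A minimal C sketch of the complete residue header decode (the five fields, the cascade bitmap and the book list) is given below. It assumes a hand-rolled LSb-first bit reader (`bitreader`, `read_bits`) rather than any real library API, and a caller-supplied count of the codebooks set up earlier in the stream; all names are hypothetical. It does not verify that each book in [residue_books] has a value mapping, since that requires the codebook data itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSb-first bit reader following the Vorbis bitpacking
 * convention (not the libogg oggpack_buffer API). */
typedef struct {
    const uint8_t *buf;
    size_t len;     /* packet length in bytes */
    size_t bitpos;  /* index of the next bit to read */
    int eop;        /* set once a read runs past the end of the packet */
} bitreader;

static uint32_t read_bits(bitreader *br, int n)
{
    uint32_t value = 0;
    for (int i = 0; i < n; i++) {
        size_t byte = br->bitpos >> 3;
        if (byte >= br->len) {
            br->eop = 1;
            return 0;
        }
        uint32_t bit = (br->buf[byte] >> (br->bitpos & 7)) & 1u;
        value |= bit << i;  /* the first bit read becomes the LSb */
        br->bitpos++;
    }
    return value;
}

#define RESIDUE_BOOK_UNUSED (-1)

typedef struct {
    uint32_t begin, end, partition_size;
    int classifications;  /* 1 ... 64 */
    int classbook;
    int cascade[64];
    int books[64][8];     /* RESIDUE_BOOK_UNUSED where the cascade bit is clear */
} residue_header;

/* Returns 0 on success, -1 if the header renders the stream undecodable. */
static int residue_header_decode(bitreader *br, residue_header *h,
                                 int codebook_count)
{
    h->begin           = read_bits(br, 24);
    h->end             = read_bits(br, 24);
    h->partition_size  = read_bits(br, 24) + 1;
    h->classifications = (int)read_bits(br, 6) + 1;
    h->classbook       = (int)read_bits(br, 8);
    if (h->classbook >= codebook_count)
        return -1;

    for (int i = 0; i < h->classifications; i++) {
        uint32_t high_bits = 0;
        uint32_t low_bits = read_bits(br, 3);
        if (read_bits(br, 1))
            high_bits = read_bits(br, 5);
        h->cascade[i] = (int)(high_bits * 8 + low_bits);
    }

    for (int i = 0; i < h->classifications; i++) {
        for (int j = 0; j < 8; j++) {
            if (h->cascade[i] & (1 << j)) {
                h->books[i][j] = (int)read_bits(br, 8);
                if (h->books[i][j] >= codebook_count)
                    return -1;  /* references a codebook that was never set up */
            } else {
                h->books[i][j] = RESIDUE_BOOK_UNUSED;
            }
        }
    }
    return br->eop ? -1 : 0;  /* end-of-packet during header decode is fatal */
}
```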

An end-of-packet condition at any point in header decode renders the stream undecodable. In addition, any codebook number greater than the maximum numbered codebook set up in this stream also renders the stream undecodable. All codebooks in array [residue_books] are required to have a value mapping. The presence of a codebook in array [residue_books] without a value mapping (maptype equals zero) renders the stream undecodable.

8.6.2. packet decode

Format 0 and 1 packet decode is identical except for specific partition interleave. Format 2 packet decode can be built out of the format 1 decode process. Thus we first describe the decode infrastructure common to all three formats.

In addition to configuration information, the residue decode process is passed the number of vectors in the submap bundle and a vector of flags indicating whether any of the vectors are not to be decoded. If the passed-in number of vectors is 3 and vector number 1 is marked ’do not decode’, decode skips vector 1 during the decode loop. However, even ’do not decode’ vectors are allocated and zeroed.

Depending on the values of [residue_begin] and [residue_end], the encoded portion of a residue vector may be the entire possible residue vector or some strict subset of it, with zero padding at either uncoded end. However, it is also possible to set [residue_begin] and [residue_end] to specify a range partially or wholly beyond the maximum vector size. Before beginning residue decode, limit [residue_begin] and [residue_end] to the maximum possible vector size as follows. We assume that the number of vectors being encoded, [ch], is provided by the higher level decoding process.

  1) [actual_size] = current blocksize/2;
  2) if residue encoding is format 2
       3) [actual_size] = [actual_size] * [ch];
  4) [limit_residue_begin] = minimum of ([residue_begin],[actual_size]);
  5) [limit_residue_end] = minimum of ([residue_end],[actual_size]);

The following convenience values are conceptually useful for clarifying the decode process:

  1) [classwords_per_codeword] = [codebook_dimensions] value of codebook [residue_classbook]
  2) [n_to_read] = [limit_residue_end] - [limit_residue_begin]
  3) [partitions_to_read] = [n_to_read] / [residue_partition_size]
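
The clamping step and the derived counts can be computed together. This is a minimal C sketch with hypothetical names (`residue_range`, `residue_limits`); [classwords_per_codeword] is omitted because it is simply the classbook's dimension count. Treating an inverted range ([residue_end] below [residue_begin]) as empty is an assumption of this sketch, not something the text above specifies.

```c
/* Hypothetical container for the clamped range and derived counts. */
typedef struct {
    long limit_begin, limit_end;
    long n_to_read;
    long partitions_to_read;
} residue_range;

/* format is 0, 1 or 2; ch is the number of vectors in the submap bundle. */
static residue_range residue_limits(long residue_begin, long residue_end,
                                    long partition_size, long blocksize,
                                    int ch, int format)
{
    residue_range r;
    long actual_size = blocksize / 2;
    if (format == 2)
        actual_size *= ch;  /* format 2 codes one interleaved vector */

    r.limit_begin = residue_begin < actual_size ? residue_begin : actual_size;
    r.limit_end   = residue_end   < actual_size ? residue_end   : actual_size;

    r.n_to_read = r.limit_end - r.limit_begin;
    if (r.n_to_read < 0)
        r.n_to_read = 0;    /* assumption: an inverted range decodes nothing */
    r.partitions_to_read = r.n_to_read / partition_size;
    return r;
}
```

For a 2048-sample block with two channels and a header range of 0 to 10000, formats 0 and 1 clamp the end to 1024, while format 2 clamps it to 2048.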

Packet decode proceeds as follows, matching the description offered earlier in the document.

  1) allocate and zero all vectors that will be returned.
  2) if ([n_to_read] is zero), stop; there is no residue to decode.
  3) iterate [pass] over the range 0 ... 7 {

       4) [partition_count] = 0

       5) while [partition_count] is less than [partitions_to_read] {

            6) if ([pass] is zero) {

                 7) iterate [j] over the range 0 .. [ch]-1 {

                      8) if vector [j] is not marked ’do not decode’ {

                           9) [temp] = read from packet using codebook [residue_classbook] in scalar context
                          10) iterate [i] descending over the range [classwords_per_codeword]-1 ... 0 {

                               11) array [classifications] element [j],([i]+[partition_count]) =
                                   [temp] integer modulo [residue_classifications]
                               12) [temp] = [temp] / [residue_classifications] using integer division

                              }

                         }

                    }

               }

           13) iterate [i] over the range 0 .. ([classwords_per_codeword] - 1) while [partition_count]
               is also less than [partitions_to_read] {

                 14) iterate [j] over the range 0 .. [ch]-1 {

                      15) if vector [j] is not marked ’do not decode’ {

                           16) [vqclass] = array [classifications] element [j],[partition_count]
                           17) [vqbook] = array [residue_books] element [vqclass],[pass]
                           18) if ([vqbook] is not ’unused’) {

                                19) decode partition into output vector number [j], starting at scalar
                                    offset [limit_residue_begin]+[partition_count]*[residue_partition_size] using
                                    codebook number [vqbook] in VQ context
                               }
                         }
                    }

                 20) increment [partition_count] by one

               }
          }
     }

 21) done
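
Steps 10 through 12 treat one classification codeword value as a number in base [residue_classifications], with the first partition receiving the most significant digit. A minimal C sketch of just that unpacking follows; the function name is hypothetical, and the caller is assumed to pass a pointer to element [j],[partition_count] of its own classifications array, with room for [classwords_per_codeword] entries.

```c
/* Split [temp] into classwords_per_codeword base-residue_classifications
 * digits, most significant digit first (out[0] is the earliest partition). */
static void unpack_classwords(long temp, int classwords_per_codeword,
                              int residue_classifications, int *out)
{
    for (int i = classwords_per_codeword - 1; i >= 0; i--) {
        out[i] = (int)(temp % residue_classifications);
        temp /= residue_classifications;
    }
}
```

For example, with three classifications and three classwords per codeword, the value 15 (1*9 + 2*3 + 0) unpacks to the classifications 1, 2 and 0.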

An end-of-packet condition during packet decode is to be considered a nominal occurrence. Decode returns the result of vector decode up to that point.

8.6.3. format 0 specifics

Format zero decodes partitions exactly as described earlier in the ’Residue Format: residue 0’ section. The following pseudocode presents the same algorithm. Assume:

[n] is the value in [residue_partition_size]
[v] is the residue vector
[offset] is the beginning read offset in [v]

  1) [step] = [n] / [codebook_dimensions]
  2) iterate [i] over the range 0 ... [step]-1 {

       3) vector [entry_temp] = read vector from packet using current codebook in VQ context
       4) iterate [j] over the range 0 ... [codebook_dimensions]-1 {

            5) vector [v] element ([offset]+[i]+[j]*[step]) =
                 vector [v] element ([offset]+[i]+[j]*[step]) +
                 vector [entry_temp] element [j]

          }

     }

  6) done
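
A C sketch of the format 0 partition loop follows. To stay self-contained it uses a hypothetical interface: instead of reading from the packet, the caller supplies the [step] entry vectors that the codebook would return, stored back to back in `entries`. The point it illustrates is the interleave: element [j] of entry [i] lands [step] positions apart.

```c
#include <stddef.h>

/* Format 0: add each decoded entry vector into v with stride [step]. */
static void residue0_decode_partition(float *v, int offset, int n, int dims,
                                      const float *entries)
{
    int step = n / dims;
    for (int i = 0; i < step; i++) {
        const float *entry_temp = entries + (size_t)i * dims;
        for (int j = 0; j < dims; j++)
            v[offset + i + j * step] += entry_temp[j];
    }
}
```

With a partition size of 4, two-dimensional entries (1,2) and (3,4) and an offset of 1, the partition receives 1, 3, 2, 4 in order.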

8.6.4. format 1 specifics

Format 1 decodes partitions exactly as described earlier in the ’Residue Format: residue 1’ section. The following pseudocode presents the same algorithm. Assume:

[n] is the value in [residue_partition_size]
[v] is the residue vector
[offset] is the beginning read offset in [v]

  1) [i] = 0
  2) vector [entry_temp] = read vector from packet using current codebook in VQ context
  3) iterate [j] over the range 0 ... [codebook_dimensions]-1 {

       4) vector [v] element ([offset]+[i]) =
            vector [v] element ([offset]+[i]) +
            vector [entry_temp] element [j]
       5) increment [i]

     }

  6) if ( [i] is less than [n] ) continue at step 2
  7) done
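
A C sketch of the format 1 loop follows. It assumes a hypothetical interface in which the caller supplies the entry vectors the codebook would return, stored back to back in `entries`, rather than reading them from a packet. Unlike format 0, values are added to consecutive elements.

```c
#include <stddef.h>

/* Format 1: add decoded entry vectors into consecutive elements of v until
 * n values of the partition have been covered. */
static void residue1_decode_partition(float *v, int offset, int n, int dims,
                                      const float *entries)
{
    int i = 0;
    size_t k = 0;  /* index of the next entry vector */
    do {
        const float *entry_temp = entries + k * (size_t)dims;
        k++;
        for (int j = 0; j < dims; j++) {
            v[offset + i] += entry_temp[j];
            i++;
        }
    } while (i < n);
}
```

With a partition size of 4 and two-dimensional entries (1,2) and (3,4), the partition receives 1, 2, 3, 4 in order.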

8.6.5. format 2 specifics

Format 2 is reducible to format 1. It may be implemented as an additional step prior to and an additional post-decode step after a normal format 1 decode.

Format 2 handles ’do not decode’ vectors differently than residue 0 or 1; if all vectors are marked ’do not decode’, no decode occurs. However, if at least one vector is to be decoded, all the vectors are decoded. We then request normal format 1 to decode a single vector representing all output channels, rather than a vector for each channel. After decode, deinterleave the vector into independent vectors, one for each output channel. That is:

1. If all vectors 0 through ch-1 are marked ’do not decode’, allocate and clear a single vector [v] of length ch*n and skip step 2 below; proceed directly to the post-decode step.
2. Rather than performing format 1 decode to produce ch vectors of length n each, call format 1 decode to produce a single vector [v] of length ch*n.
3. Post decode: Deinterleave the single vector [v] returned by format 1 decode as described above into ch independent vectors, one for each output channel, according to:

     1) iterate [i] over the range 0 ... [n]-1 {

          2) iterate [j] over the range 0 ... [ch]-1 {

               3) output vector number [j] element [i] = vector [v] element ([i] * [ch] + [j])

             }
        }

     4) done
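
The post-decode deinterleave step can be sketched in C as follows; the function name and the array-of-pointers output layout are assumptions made for illustration.

```c
/* Split the interleaved vector v (length ch * n) into ch output vectors of
 * length n each: element [i] of channel [j] is v[i * ch + j]. */
static void residue2_deinterleave(const float *v, int n, int ch, float **out)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < ch; j++)
            out[j][i] = v[i * ch + j];
}
```

For a stereo vector 10, 20, 11, 21, 12, 22, channel 0 receives 10, 11, 12 and channel 1 receives 20, 21, 22.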

9. Helper equations

9.1. Overview

The equations below are used in multiple places by the Vorbis codec specification. Rather than cluttering up the main specification documents, they are defined here and referenced where appropriate.

9.2. Functions

9.2.1. ilog

The “ilog(x)” function returns the position number (1 through n) of the highest set bit in the two’s complement integer value [x]. Values of [x] less than zero are defined to return zero.

  1) [return_value] = 0;
  2) if ( [x] is greater than zero ) {

       3) increment [return_value];
       4) logical shift [x] one bit to the right, padding the MSb with zero
       5) repeat at step 2)

     }

  6) done

Examples:

  ilog(0) = 0;
  ilog(1) = 1;
  ilog(2) = 2;
  ilog(3) = 2;
  ilog(4) = 3;
  ilog(7) = 3;
  ilog(negative number) = 0;
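
A direct C transcription of the ilog steps is sketched below, assuming a 32-bit signed input; the cast through `uint32_t` makes the shift explicitly logical, although for the positive values that reach it an arithmetic shift would give the same result.

```c
#include <stdint.h>

/* ilog(x): 1-based position of the highest set bit; 0 for x <= 0. */
static int ilog(int32_t x)
{
    int return_value = 0;
    while (x > 0) {
        return_value++;
        x = (int32_t)((uint32_t)x >> 1);  /* logical shift right by one */
    }
    return return_value;
}
```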

9.2.2. float32_unpack

“float32_unpack(x)” is intended to translate the packed binary representation of a Vorbis codebook float value into the representation used by the decoder for floating point numbers. For purposes of this example, we will unpack a Vorbis float32 into a host-native floating point number.

  1) [mantissa] = [x] bitwise AND 0x1fffff (unsigned result)
  2) [sign] = [x] bitwise AND 0x80000000 (unsigned result)
  3) [exponent] = ( [x] bitwise AND 0x7fe00000) shifted right 21 bits (unsigned result)
  4) if ( [sign] is nonzero ) then negate [mantissa]
  5) return [mantissa] * ( 2 ^ ( [exponent] - 788 ) )
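
A C sketch of these five steps follows, unpacking into a host `double`. It uses the standard `ldexp` from `<math.h>` to apply the power of two; the choice of `double` as the host type is an assumption of the sketch.

```c
#include <math.h>
#include <stdint.h>

static double float32_unpack(uint32_t x)
{
    double mantissa = (double)(x & 0x1fffffu);      /* 21-bit mantissa */
    uint32_t sign = x & 0x80000000u;
    int exponent = (int)((x & 0x7fe00000u) >> 21);  /* 10-bit biased exponent */
    if (sign)
        mantissa = -mantissa;
    return ldexp(mantissa, exponent - 788);         /* mantissa * 2^(exponent - 788) */
}
```

With an exponent field of 788 (0x62800000) and a mantissa of 1, the result is exactly 1.0; setting the sign bit (0xE2800001) gives -1.0.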

9.2.3. lookup1_values

“lookup1_values(codebook_entries,codebook_dimensions)” is used to compute the correct length of the value index for a codebook VQ lookup table of lookup type 1. The values on this list are permuted to construct the VQ vector lookup table of size [codebook_entries].

The return value for this function is defined to be ’the greatest integer value for which [return_value] to the power of [codebook_dimensions] is less than or equal to [codebook_entries]’.
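
The definition can be implemented with exact integer arithmetic by searching upward for the first value whose power exceeds [codebook_entries]. The following C sketch is one simple way to do that, not necessarily how any particular decoder does it; the guard for a zero dimension count is an assumption added for safety.

```c
#include <stdint.h>

/* Greatest r such that r^dimensions <= entries. The running product is
 * checked before each multiply, so with 24-bit entry counts it stays far
 * below the int64_t limit. */
static long lookup1_values(long entries, int dimensions)
{
    long r = 0;
    if (dimensions < 1)
        return 0;  /* assumption: guard against a degenerate codebook */
    for (;;) {
        int64_t next = r + 1, power = 1;
        for (int k = 0; k < dimensions && power <= entries; k++)
            power *= next;
        if (power > entries)
            return r;
        r = (long)next;
    }
}
```

For example, a two-dimensional book with 256 entries yields 16, while one with 255 entries yields 15.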

9.2.4. low_neighbor

“low_neighbor(v,x)” finds the position n in vector [v] of the greatest value scalar element for which n is less than [x] and vector [v] element n is less than vector [v] element [x].

9.2.5. high_neighbor

“high_neighbor(v,x)” finds the position n in vector [v] of the lowest value scalar element for which n is less than [x] and vector [v] element n is greater than vector [v] element [x].
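
Both searches scan only the elements before position [x]. A minimal C sketch follows; returning -1 when no qualifying element exists is a hypothetical convention of this sketch.

```c
/* low_neighbor: index n < x with the greatest v[n] among those below v[x]. */
static int low_neighbor(const int *v, int x)
{
    int best = -1;
    for (int n = 0; n < x; n++)
        if (v[n] < v[x] && (best < 0 || v[n] > v[best]))
            best = n;
    return best;
}

/* high_neighbor: index n < x with the lowest v[n] among those above v[x]. */
static int high_neighbor(const int *v, int x)
{
    int best = -1;
    for (int n = 0; n < x; n++)
        if (v[n] > v[x] && (best < 0 || v[n] < v[best]))
            best = n;
    return best;
}
```

For v = {0, 128, 64, 32, 96}, position 4 (value 96) has its low neighbor at position 2 (value 64) and its high neighbor at position 1 (value 128).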

9.2.6. render_point

“render_point(x0,y0,x1,y1,X)” is used to find the Y value at point X along the line specified by x0, x1, y0 and y1. This function uses an integer algorithm to solve for the point directly without calculating intervening values along the line.

  1)  [dy] = [y1] - [y0]
  2) [adx] = [x1] - [x0]
  3) [ady] = absolute value of [dy]
  4) [err] = [ady] * ([X] - [x0])
  5) [off] = [err] / [adx] using integer division
  6) if ( [dy] is less than zero ) {

       7) [Y] = [y0] - [off]

     } else {

       8) [Y] = [y0] + [off]

     }

  9) done
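
A C transcription of render_point is sketched below. It assumes x0 < x1 and x0 <= X, so the division operates on non-negative values and C's truncating division matches the integer division the steps call for.

```c
#include <stdlib.h>

static int render_point(int x0, int y0, int x1, int y1, int X)
{
    int dy  = y1 - y0;
    int adx = x1 - x0;
    int ady = abs(dy);
    int err = ady * (X - x0);
    int off = err / adx;  /* non-negative operands under the stated assumption */
    return dy < 0 ? y0 - off : y0 + off;
}
```

For the line from (0,0) to (10,5), render_point gives Y = 2 at X = 4; for the line from (0,10) to (10,0), it gives Y = 7 at X = 3.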

9.2.7. render_line

Floor decode type one uses the integer line drawing algorithm of “render_line(x0, y0, x1, y1, v)” to construct an integer floor curve for contiguous piecewise line segments. Integer division has not needed a precise definition elsewhere, but here it must be defined as division that rounds toward zero for both positive and negative numbers.

   1)   [dy] = [y1] - [y0]
   2)  [adx] = [x1] - [x0]
   3)  [ady] = absolute value of [dy]
   4) [base] = [dy] / [adx] using integer division
   5)    [x] = [x0]
   6)    [y] = [y0]
   7)  [err] = 0

   8) if ( [dy] is less than 0 ) {

        9) [sy] = [base] - 1

      } else {

       10) [sy] = [base] + 1

      }

  11) [ady] = [ady] - (absolute value of [base]) * [adx]
  12) vector [v] element [x] = [y]

  13) iterate [x] over the range [x0]+1 ... [x1]-1 {

       14) [err] = [err] + [ady];
       15) if ( [err] >= [adx] ) {

             16) [err] = [err] - [adx]
             17)   [y] = [y] + [sy]

           } else {

             18) [y] = [y] + [base]

           }

       19) vector [v] element [x] = [y]

      }
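
A C sketch of render_line follows, assuming x0 < x1 and a caller-provided vector large enough to index [x1]-1. Since C99, integer division truncates toward zero, which is exactly the rounding rule required above; note that, as in the steps, element [x1] itself is not written.

```c
#include <stdlib.h>

static void render_line(int x0, int y0, int x1, int y1, int *v)
{
    int dy   = y1 - y0;
    int adx  = x1 - x0;
    int ady  = abs(dy);
    int base = dy / adx;  /* C99: truncates toward zero */
    int y    = y0;
    int err  = 0;
    int sy   = dy < 0 ? base - 1 : base + 1;

    ady -= abs(base) * adx;  /* remainder left after the whole-step slope */
    v[x0] = y;

    for (int x = x0 + 1; x < x1; x++) {
        err += ady;
        if (err >= adx) {
            err -= adx;
            y += sy;
        } else {
            y += base;
        }
        v[x] = y;
    }
}
```

Drawing from (0,0) to (4,2) fills elements 0 through 3 with 0, 0, 1, 1; drawing from (0,5) to (2,0) uses base = -5/2 = -2 and fills elements 0 and 1 with 5 and 3.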

10. Tables

10.1. floor1_inverse_dB_table

The vector [floor1_inverse_dB_table] is a 256 element static lookup table consisting of the following values (read left to right then top to bottom):

  1.0649863e-07, 1.1341951e-07, 1.2079015e-07, 1.2863978e-07,
  1.3699951e-07, 1.4590251e-07, 1.5538408e-07, 1.6548181e-07,
  1.7623575e-07, 1.8768855e-07, 1.9988561e-07, 2.1287530e-07,
  2.2670913e-07, 2.4144197e-07, 2.5713223e-07, 2.7384213e-07,
  2.9163793e-07, 3.1059021e-07, 3.3077411e-07, 3.5226968e-07,
  3.7516214e-07, 3.9954229e-07, 4.2550680e-07, 4.5315863e-07,
  4.8260743e-07, 5.1396998e-07, 5.4737065e-07, 5.8294187e-07,
  6.2082472e-07, 6.6116941e-07, 7.0413592e-07, 7.4989464e-07,
  7.9862701e-07, 8.5052630e-07, 9.0579828e-07, 9.6466216e-07,
  1.0273513e-06, 1.0941144e-06, 1.1652161e-06, 1.2409384e-06,
  1.3215816e-06, 1.4074654e-06, 1.4989305e-06, 1.5963394e-06,
  1.7000785e-06, 1.8105592e-06, 1.9282195e-06, 2.0535261e-06,
  2.1869758e-06, 2.3290978e-06, 2.4804557e-06, 2.6416497e-06,
  2.8133190e-06, 2.9961443e-06, 3.1908506e-06, 3.3982101e-06,
  3.6190449e-06, 3.8542308e-06, 4.1047004e-06, 4.3714470e-06,
  4.6555282e-06, 4.9580707e-06, 5.2802740e-06, 5.6234160e-06,
  5.9888572e-06, 6.3780469e-06, 6.7925283e-06, 7.2339451e-06,
  7.7040476e-06, 8.2047000e-06, 8.7378876e-06, 9.3057248e-06,
  9.9104632e-06, 1.0554501e-05, 1.1240392e-05, 1.1970856e-05,
  1.2748789e-05, 1.3577278e-05, 1.4459606e-05, 1.5399272e-05,
  1.6400004e-05, 1.7465768e-05, 1.8600792e-05, 1.9809576e-05,
  2.1096914e-05, 2.2467911e-05, 2.3928002e-05, 2.5482978e-05,
  2.7139006e-05, 2.8902651e-05, 3.0780908e-05, 3.2781225e-05,
  3.4911534e-05, 3.7180282e-05, 3.9596466e-05, 4.2169667e-05,
  4.4910090e-05, 4.7828601e-05, 5.0936773e-05, 5.4246931e-05,
  5.7772202e-05, 6.1526565e-05, 6.5524908e-05, 6.9783085e-05,
  7.4317983e-05, 7.9147585e-05, 8.4291040e-05, 8.9768747e-05,
  9.5602426e-05, 0.00010181521, 0.00010843174, 0.00011547824,
  0.00012298267, 0.00013097477, 0.00013948625, 0.00014855085,
  0.00015820453, 0.00016848555, 0.00017943469, 0.00019109536,
  0.00020351382, 0.00021673929, 0.00023082423, 0.00024582449,
  0.00026179955, 0.00027881276, 0.00029693158, 0.00031622787,
  0.00033677814, 0.00035866388, 0.00038197188, 0.00040679456,
  0.00043323036, 0.00046138411, 0.00049136745, 0.00052329927,
  0.00055730621, 0.00059352311, 0.00063209358, 0.00067317058,
  0.00071691700, 0.00076350630, 0.00081312324, 0.00086596457,
  0.00092223983, 0.00098217216, 0.0010459992,  0.0011139742,
  0.0011863665,  0.0012634633,  0.0013455702,  0.0014330129,
  0.0015261382,  0.0016253153,  0.0017309374,  0.0018434235,
  0.0019632195,  0.0020908006,  0.0022266726,  0.0023713743,
  0.0025254795,  0.0026895994,  0.0028643847,  0.0030505286,
  0.0032487691,  0.0034598925,  0.0036847358,  0.0039241906,
  0.0041792066,  0.0044507950,  0.0047400328,  0.0050480668,
  0.0053761186,  0.0057254891,  0.0060975636,  0.0064938176,
  0.0069158225,  0.0073652516,  0.0078438871,  0.0083536271,
  0.0088964928,  0.009474637,   0.010090352,   0.010746080,
  0.011444421,   0.012188144,   0.012980198,   0.013823725,
  0.014722068,   0.015678791,   0.016697687,   0.017782797,
  0.018938423,   0.020169149,   0.021479854,   0.022875735,
  0.024362330,   0.025945531,   0.027631618,   0.029427276,
  0.031339626,   0.033376252,   0.035545228,   0.037855157,
  0.040315199,   0.042935108,   0.045725273,   0.048696758,
  0.051861348,   0.055231591,   0.058820850,   0.062643361,
  0.066714279,   0.071049749,   0.075666962,   0.080584227,
  0.085821044,   0.091398179,   0.097337747,   0.10366330,
  0.11039993,    0.11757434,    0.12521498,    0.13335215,
  0.14201813,    0.15124727,    0.16107617,    0.17154380,
  0.18269168,    0.19456402,    0.20720788,    0.22067342,
  0.23501402,    0.25028656,    0.26655159,    0.28387361,
  0.30232132,    0.32196786,    0.34289114,    0.36517414,
  0.38890521,    0.41417847,    0.44109412,    0.46975890,
  0.50028648,    0.53279791,    0.56742212,    0.60429640,
  0.64356699,    0.68538959,    0.72993007,    0.77736504,
  0.82788260,    0.88168307,    0.9389798,     1.

A. Embedding Vorbis into an Ogg stream

A.1. Overview

This document describes using Ogg logical and physical transport streams to encapsulate Vorbis compressed audio packet data into file form.

Section 1, “Introduction and Description”, provides an overview of the construction of Vorbis audio packets.

The Ogg bitstream overview and Ogg logical bitstream and framing spec provide detailed descriptions of Ogg transport streams. This specification document assumes a working knowledge of the concepts covered in these named background documents. Please read them first.

A.1.1. Restrictions

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis streams use Ogg transport streams in degenerate, unmultiplexed form only. That is:

A meta-headerless Ogg file encapsulates the Vorbis I packets
The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).
The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link)

This does not mean that it is impossible to multiplex Vorbis with other media types into a multi-stream Ogg file. At the time this document was written, Ogg was becoming a popular container for low-bitrate movies consisting of DivX video and Vorbis audio. However, a ’Vorbis I audio file’ is taken to imply Vorbis audio existing alone within a degenerate Ogg stream. A compliant ’Vorbis audio player’ is not required to implement Ogg support beyond the specific support of Vorbis within a degenerate Ogg stream (naturally, application authors are encouraged to support full multiplexed Ogg handling).

A.1.2. MIME type

The MIME type of Ogg files depends on the context. Specifically, complex multimedia and applications should use application/ogg, while visual media should use video/ogg, and audio should use audio/ogg. Vorbis data encapsulated in Ogg may appear in any of those types. RTP-encapsulated Vorbis should use audio/vorbis + audio/vorbis-config.

A.2. Encapsulation

Ogg encapsulation of a Vorbis packet stream is straightforward.

The first Vorbis packet (the identification header), which uniquely identifies a stream as Vorbis audio, is placed alone in the first page of the logical Ogg stream. This results in a first Ogg page of exactly 58 bytes (a 27-byte page header, a single segment-table byte and the 30-byte identification header) at the very beginning of the logical stream.
This first page is marked ’beginning of stream’ in the page flags.
The second and third Vorbis packets (comment and setup headers) may span one or more pages beginning on the second page of the logical stream. However many pages they span, the third header packet finishes the page on which it ends. The next (first audio) packet must begin on a fresh page.
The granule position of these first pages containing only headers is zero.
The first audio packet of the logical stream begins a fresh Ogg page.
Packets are placed into Ogg pages in order until the end of stream.
The last page is marked ’end of stream’ in the page flags.
Vorbis packets may span page boundaries.
The granule position of pages containing Vorbis audio is in units of PCM audio samples (per channel; a stereo stream’s granule position does not increment at twice the speed of a mono stream).
The granule position of a page represents the end PCM sample position of the last packet completed on that page. The ’last PCM sample’ is the last complete sample returned by decode, not an internal sample awaiting lapping with a subsequent block. A page that is entirely spanned by a single packet (that completes on a subsequent page) has no granule position, and the granule position is set to ’-1’.
Note that the last decoded (fully lapped) PCM sample from a packet is not necessarily the middle sample from that block. If, e.g., the current Vorbis packet encodes a “long block” and the next Vorbis packet encodes a “short block”, the last decodable sample from the current packet will be at position (3*long_block_length/4) - (short_block_length/4).
The granule (PCM) position of the first page need not indicate that the stream started at position zero. Although the granule position belongs to the last completed packet on the page and a valid granule position must be positive, by inference it may indicate that the PCM position of the beginning of audio is positive or negative.
- A positive starting value simply indicates that this stream begins at some positive time offset, potentially within a larger program. This is a common case when connecting to the middle of a broadcast stream.
- A negative value indicates that output samples preceding time zero should be discarded during decoding; this technique is used to allow sample-granularity editing of the stream start time of already-encoded Vorbis streams. The number of samples to be discarded must not exceed the overlap-add span of the first two audio packets.
In both of these cases in which the initial audio PCM starting offset is nonzero, the second finished audio packet must flush the page on which it appears and the third packet must begin a fresh page. This allows the decoder to always be able to perform PCM position adjustments before needing to return any PCM data from synthesis, resulting in correct positioning information without any additional seeking logic.
Note: Failure to do so should, at worst, cause a decoder implementation to return incorrect positioning information for seeking operations at the very beginning of the stream.
A granule position on the final page in a stream that indicates less audio data than the final packet would normally return is used to end the stream on other than even frame boundaries. The difference between the actual available data returned and the declared amount indicates how many trailing samples to discard from the decoding process.
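
The granule-position arithmetic in the list above can be illustrated with a few small C helpers; all names are hypothetical and all positions are in per-channel PCM samples. The general form of `last_complete_sample` assumes that the overlap between two blocks is governed by the smaller of the two block sizes; for a long block followed by a short block it reduces to the (3*long_block_length/4) - (short_block_length/4) formula given above. `samples_completed` is assumed to be the number of PCM samples decode returns for the packets completed on the first audio page.

```c
/* Position, within the current block, of the last fully lapped sample.
 * Assumption: the overlap with the next block is set by the smaller size. */
static long last_complete_sample(long cur_blocksize, long next_blocksize)
{
    long overlap = cur_blocksize < next_blocksize ? cur_blocksize : next_blocksize;
    return 3 * cur_blocksize / 4 - overlap / 4;
}

/* Leading samples to discard when the first audio page's granule position
 * implies that audio begins at a negative PCM position. */
static long leading_discard(long first_granule, long samples_completed)
{
    long start = first_granule - samples_completed;
    return start < 0 ? -start : 0;
}

/* Trailing samples to discard when the final page declares less audio than
 * decode would otherwise return (decoded_end is that untrimmed position). */
static long trailing_discard(long decoded_end, long final_granule)
{
    return decoded_end > final_granule ? decoded_end - final_granule : 0;
}
```

For a 2048-sample long block followed by a 256-sample short block, the last complete sample is at position 1472; two long blocks in a row give 1024, the middle of the block.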

B. Vorbis encapsulation in RTP

Please consult RFC 5215, “RTP Payload Format for Vorbis Encoded Audio”, for a description of how to embed Vorbis audio in an RTP stream.

Colophon

Ogg is a Xiph.Org Foundation effort to protect essential tenets of Internet multimedia from corporate hostage-taking; Open Source is the net’s greatest tool to keep everyone honest. See About the Xiph.Org Foundation for details.

Ogg Vorbis is the first Ogg audio CODEC. Anyone may freely use and distribute the Ogg and Vorbis specification, whether in a private, public or corporate capacity. However, the Xiph.Org Foundation and the Ogg project (xiph.org) reserve the right to set the Ogg Vorbis specification and certify specification compliance.

Xiph.Org’s Vorbis software CODEC implementation is distributed under a BSD-like license. This does not restrict third parties from distributing independent implementations of Vorbis software under other licenses.

Ogg, Vorbis, Xiph.Org Foundation and their logos are trademarks (tm) of the Xiph.Org Foundation. These pages are copyright (C) 1994-2007 Xiph.Org Foundation. All rights reserved.

This document is set using LaTeX.

References

[1] T. Sporer, K. Brandenburg and B. Edler, The use of multirate filter banks for coding of high quality digital audio, http://www.iocon.com/resource/docs/ps/eusipco_corrected.ps.