Ogg Vorbis Documentation

Ogg logical bitstream framing

Ogg bitstreams

The Ogg transport bitstream is designed to provide framing, error protection and seeking structure for higher-level codec streams that consist of raw, unencapsulated data packets, such as the Vorbis audio codec or Theora video codec.

Application example: Vorbis

Vorbis encodes short-time blocks of PCM data into raw packets of bit-packed data. These raw packets may be used directly by transport mechanisms that provide their own framing and packet-separation mechanisms (such as UDP datagrams). For stream-based storage (such as files) and transport (such as TCP streams or pipes), Vorbis uses the Ogg bitstream format to provide framing/sync, sync recapture after error, landmarks during seeking, and enough information to properly separate data back into packets at the original packet boundaries without relying on decoding to find packet boundaries.

Design constraints for Ogg bitstreams

True streaming; we must not need to seek to build a 100% complete bitstream.
Use no more than approximately 1-2% of bitstream bandwidth for packet boundary marking, high-level framing, sync and seeking.
Specification of absolute position within the original sample stream.
Simple mechanism to ease limited editing, such as a simplified concatenation mechanism.
Detection of corruption, recapture after error and direct, random access to data at arbitrary positions in the bitstream.

Logical and Physical Bitstreams

A logical Ogg bitstream is a contiguous stream of sequential pages belonging only to the logical bitstream. A physical Ogg bitstream is constructed from one or more logical Ogg bitstreams (the simplest physical bitstream is simply a single logical bitstream). We describe below the exact formatting of an Ogg logical bitstream. Combining logical bitstreams into more complex physical bitstreams is described in the Ogg bitstream overview. The exact mapping of raw Vorbis packets into a valid Ogg Vorbis physical bitstream is described in the Vorbis I Specification.

Bitstream structure

An Ogg stream is structured by dividing incoming packets into segments of up to 255 bytes and then wrapping a group of contiguous packet segments into a variable length page preceded by a page header. Both the header size and page size are variable; the page header contains sizing information and checksum data to determine header/page size and data integrity.

The bitstream is captured (or recaptured) by looking for the beginning of a page, specifically the capture pattern. Once the capture pattern is found, the decoder verifies page sync and integrity by computing and comparing the checksum. At that point, the decoder can extract the packets themselves.

Packet segmentation

Packets are logically divided into multiple segments before encoding into a page. Note that the segmentation and fragmentation process is a logical one; it's used to compute page header values and the original page data need not be disturbed, even when a packet spans page boundaries.

The raw packet is logically divided into [n] 255-byte segments and a last fractional segment of < 255 bytes. A packet may well consist only of the trailing fractional segment, and a fractional segment may be zero length. The segment sizes, called "lacing values", are then saved and placed into the header segment table.

An example should make the basic concept clear:

raw packet:
  ___________________________________________
 |______________packet data__________________| 753 bytes

lacing values for page header segment table: 255,255,243

We simply add the lacing values for the total size; the last lacing value for a packet is always the value that is less than 255. Note that this encoding both avoids imposing a maximum packet size and imposes minimum overhead on small packets. By contrast, simply using two bytes at the head of every packet would cap the packet size at 32k and penalize small packets (<255 bytes, the typical case) with twice the segmentation overhead. Using the lacing values as suggested, small packets see the minimum possible byte-aligned overhead (1 byte) and large packets, over 512 bytes or so, see a fairly constant ~0.5% overhead on encoding space.
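The overhead figures above can be checked with a little arithmetic. The following is a minimal sketch in Python; the helper names `lacing_count` and `lacing_overhead` are illustrative only and are not part of any Ogg library.

```python
def lacing_count(packet_size: int) -> int:
    """Number of lacing values (one byte each) needed for a packet."""
    # One value per full 255-byte segment, plus one terminating value < 255.
    return packet_size // 255 + 1


def lacing_overhead(packet_size: int) -> float:
    """Lacing bytes as a fraction of the packet size (packet_size > 0)."""
    return lacing_count(packet_size) / packet_size


for size in (100, 254, 255, 753, 4096, 65536):
    print(f"{size:6d} bytes: {lacing_count(size):3d} lacing bytes, "
          f"{100 * lacing_overhead(size):.2f}% overhead")
```

For large packets the ratio approaches 1/255 (about 0.4%), which is consistent with the roughly constant overhead described above.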

Note that a lacing value of 255 implies that a second lacing value follows in the packet, and a value of < 255 marks the end of the packet after that many additional bytes. A packet of 255 bytes (or a multiple of 255 bytes) is terminated by a lacing value of 0:

raw packet:
  _______________________________
 |________packet data____________|          255 bytes

lacing values: 255, 0

Note also that a 'nil' (zero length) packet is not an error; it consists of nothing more than a lacing value of zero in the header.
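The segmentation rule can be sketched in a few lines of Python. The function name `lacing_values` is hypothetical; it merely restates the rule described above for the 753-byte, 255-byte and nil packet cases.

```python
from typing import List


def lacing_values(packet_size: int) -> List[int]:
    """Return the lacing values for a packet of the given size in bytes."""
    if packet_size < 0:
        raise ValueError("packet size cannot be negative")
    # Full 255-byte segments, then one final value below 255 (possibly 0).
    full_segments, remainder = divmod(packet_size, 255)
    return [255] * full_segments + [remainder]


print(lacing_values(753))  # the 753-byte example packet
print(lacing_values(255))  # an exact multiple of 255 ends with a 0
print(lacing_values(0))    # a nil packet is a single lacing value of 0
```

Summing the returned values always gives back the original packet size, which is how a decoder recovers the packet length.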

Packets spanning pages

Packets are not restricted to beginning and ending within a page, although individual segments are, by definition, required to do so. Packets are not restricted to a maximum size, although excessively large packets in the data stream are discouraged; the Ogg bitstream specification strongly recommends a nominal page size of approximately 4-8kB (large packets are foreseen as being useful for initialization data at the beginning of a logical bitstream).

After segmenting a packet, the encoder may decide not to place all the resulting segments into the current page; to do so, the encoder places the lacing values of the segments it wishes to belong to the current page into the current segment table, then finishes the page. The next page is begun with the first value in the segment table belonging to the next packet segment, thus continuing the packet. Data in the packet body must also correspond properly to the lacing values in the spanned pages: the segment data in the first packet corresponding to the lacing values of the first page belong in that page, and packet segments listed in the segment table of the following page must begin the page body of the subsequent page.

The last mechanic to spanning a page boundary is to set the header flag in the new page to indicate that the first lacing value in the segment table continues rather than begins a packet; a header flag of 0x01 is set to indicate a continued packet. Although mandatory, the flag is not actually algorithmically necessary; one could inspect the preceding segment table to determine if the packet is new or continued. Adding the information to the page header flag allows a simpler design (with no overhead) that need only inspect the current page header after frame capture. This also allows faster error recovery in the event that the packet originates in a corrupt preceding page, implying that the previous page's segment table cannot be trusted.

Note that a packet can span an arbitrary number of pages; the above spanning process is repeated for each spanned page boundary. Also, a 'zero termination' on a packet size that is an even multiple of 255 must appear even if the lacing value appears in the next page as a zero-length continuation of the current packet. The header flag should be set to 0x01 to indicate that the packet spanned, even though the span is a nil case as far as data is concerned.
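The spanning rules can be illustrated with a small Python sketch. It assumes a hypothetical encoder policy that simply fills each page up to a fixed number of segments; a real encoder would normally also close pages based on their size in bytes. The names `paginate` and `max_segments` are illustrative only.

```python
from typing import List, Tuple


def paginate(packet_sizes: List[int],
             max_segments: int = 255) -> List[Tuple[bool, List[int]]]:
    """Distribute lacing values over pages.

    Returns one (continued_flag, segment_table) pair per page, where
    continued_flag corresponds to header flag 0x01.
    """
    pages = []
    table: List[int] = []
    continued = False  # does the page being built start mid-packet?
    for size in packet_sizes:
        full, last = divmod(size, 255)
        values = [255] * full + [last]
        for index, value in enumerate(values):
            if len(table) == max_segments:
                # Page is full: finish it and start the next one.
                pages.append((continued, table))
                table = []
                # The new page continues a packet unless this lacing
                # value is the first one of the packet.
                continued = index > 0
            table.append(value)
    if table:
        pages.append((continued, table))
    return pages


# A packet of exactly 255 * 255 bytes fills a whole segment table; its
# terminating zero lands on the next page, which is flagged as continued.
for flag, table in paginate([255 * 255]):
    print(flag, len(table), table[-1])
```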

The encoding looks odd, but is properly optimized for speed and the expected case of the majority of packets being between 50 and 200 bytes (note that it is designed such that packets of wildly different sizes can be handled within the model; placing packet size restrictions on the encoder would have only slightly simplified design in page generation and increased overall encoder complexity).

The main point behind tracking individual packets (and packet segments) is to allow more flexible encoding tricks that require explicit knowledge of packet size. An example is simple bandwidth limiting, implemented by simply truncating packets in the nominal case if the packet is arranged so that the least sensitive portion of the data comes last.

Page header

The headering mechanism is designed to avoid copying and re-assembly of the packet data (i.e., making the packet segmentation process a logical one); the header can be generated directly from incoming packet data. The encoder buffers packet data until it finishes a complete page, at which point it writes the header followed by the buffered packet segments.

capture_pattern

A header begins with a capture pattern that simplifies identifying pages; once the decoder has found the capture pattern it can do a more intensive job of verifying that it has in fact found a page boundary (as opposed to an inadvertent coincidence in the byte stream).

 byte value

  0  0x4f 'O'
  1  0x67 'g'
  2  0x67 'g'
  3  0x53 'S'
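As a rough illustration of capture, the Python sketch below scans a buffer for the four capture bytes. The function name `find_capture_candidates` is hypothetical. Each offset it reports is only a candidate page start; the decoder still has to confirm it by verifying the page checksum.

```python
from typing import Iterator

CAPTURE_PATTERN = bytes([0x4F, 0x67, 0x67, 0x53])  # 'O' 'g' 'g' 'S'


def find_capture_candidates(buf: bytes) -> Iterator[int]:
    """Yield every offset in buf at which the capture pattern occurs."""
    pos = buf.find(CAPTURE_PATTERN)
    while pos != -1:
        yield pos
        pos = buf.find(CAPTURE_PATTERN, pos + 1)


data = b"xxOggSyyyOggS"
print(list(find_capture_candidates(data)))
```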

stream_structure_version

The capture pattern is followed by the stream structure revision:

 byte value

  4  0x00

header_type_flag

The header type flag identifies this page's context in the bitstream:

 byte value

  5  bitflags: 0x01: unset = fresh packet
                     set   = continued packet
               0x02: unset = not first page of logical bitstream
                     set   = first page of logical bitstream (bos)
               0x04: unset = not last page of logical bitstream
                     set   = last page of logical bitstream (eos)
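The three flag bits can be tested with simple masks. The sketch below is illustrative Python; the constant names and the `decode_header_type` helper are assumptions of this example, not names defined by the format.

```python
from typing import Dict

CONTINUED = 0x01  # first segment on the page continues a packet
BOS = 0x02        # first page of the logical bitstream
EOS = 0x04        # last page of the logical bitstream


def decode_header_type(flags: int) -> Dict[str, bool]:
    """Split the header_type_flag byte (header byte 5) into booleans."""
    return {
        "continued": bool(flags & CONTINUED),
        "bos": bool(flags & BOS),
        "eos": bool(flags & EOS),
    }


print(decode_header_type(0x02))
print(decode_header_type(0x05))
```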

absolute granule position

The granule position is packed in the same way the rest of Ogg data is packed: LSb of LSB first. Note that the 'position' data specifies a 'sample' number (e.g., in CD-quality audio a sample is four octets, 16 bits for left and 16 bits for right; in video it would likely be the frame number). It is up to the specific codec in use to define the semantic meaning of the granule position value. The position specified is the total samples encoded after including all packets finished on this page (packets begun on this page but continuing on to the next page do not count). The rationale here is that the position specified in the frame header of the last page tells how long the data coded by the bitstream is. A truncated stream will still return the proper number of samples that can be decoded fully.

A special value of '-1' (in two's complement) indicates that no packets finish on this page.

 byte value

  6  0xXX LSB
  7  0xXX
  8  0xXX
  9  0xXX
 10  0xXX
 11  0xXX
 12  0xXX
 13  0xXX MSB
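Reading the field amounts to decoding eight little-endian bytes as a signed integer. The following Python sketch shows one way to do so; the function name `read_granule_position` and the choice to return `None` for the special value are assumptions of this example.

```python
import struct
from typing import Optional


def read_granule_position(header: bytes) -> Optional[int]:
    """Decode header bytes 6-13 as a signed 64-bit little-endian value.

    Returns None for the special value -1 (no packet finishes on the page).
    """
    (granule,) = struct.unpack_from("<q", header, 6)
    return None if granule == -1 else granule


# Hypothetical header prefix: capture pattern, version, flags, position.
prefix = b"OggS" + bytes([0x00, 0x00]) + (48000).to_bytes(8, "little")
print(read_granule_position(prefix))
```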

stream serial number

Ogg allows for separate logical bitstreams to be mixed at page granularity in a physical bitstream. The most common case would be sequential arrangement, but it is possible to interleave pages for two separate bitstreams to be decoded concurrently. The serial number is the means by which physical pages are associated with a particular logical stream. Each logical stream must have a unique serial number within a physical stream:

 byte value

 14  0xXX LSB
 15  0xXX
 16  0xXX
 17  0xXX MSB
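Separating interleaved pages into logical streams then comes down to grouping pages by this field. A minimal Python sketch follows; the name `demultiplex` is hypothetical, and the input is assumed to be a sequence of complete pages that have already been captured and verified.

```python
import struct
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(pages: Iterable[bytes]) -> Dict[int, List[bytes]]:
    """Group pages by the serial number stored in header bytes 14-17."""
    streams: Dict[int, List[bytes]] = defaultdict(list)
    for page in pages:
        (serial,) = struct.unpack_from("<I", page, 14)
        streams[serial].append(page)  # page order within a stream is kept
    return dict(streams)
```

Pages keep their original relative order inside each group, so each logical stream can be decoded independently afterwards.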

page sequence no

Page counter; lets us know if a page is lost (useful where packets span page boundaries).

 byte value

 18  0xXX LSB
 19  0xXX
 20  0xXX
 21  0xXX MSB
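A decoder might use the counter along the following lines. This Python sketch is illustrative only: the function names are hypothetical, and treating the counter as a 32-bit value that wraps around is an assumption of the example rather than something stated above.

```python
import struct


def page_sequence_number(page: bytes) -> int:
    """Decode header bytes 18-21 as an unsigned little-endian value."""
    (sequence,) = struct.unpack_from("<I", page, 18)
    return sequence


def missing_pages(previous: int, current: int) -> int:
    """Pages lost between two consecutively received pages of one stream.

    Assumes the counter increases by one per page and wraps at 2**32.
    """
    return (current - previous - 1) & 0xFFFFFFFF


print(missing_pages(4, 5))  # consecutive pages, nothing lost
print(missing_pages(4, 7))  # pages 5 and 6 never arrived
```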

page checksum

32-bit CRC value (direct algorithm, initial value and final XOR = 0, generator polynomial = 0x04c11db7). The value is computed over the entire header (with the CRC field in the header set to zero) and then continued over the page. The CRC field is then filled with the computed value.

(A thorough discussion of CRC algorithms can be found in "A Painless Guide to CRC Error Detection Algorithms" by Ross Williams ross@ross.net.)

 byte value

 22  0xXX LSB
 23  0xXX
 24  0xXX
 25  0xXX MSB
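The checksum parameters above translate into a short bit-at-a-time routine. The Python sketch below favours clarity over speed (a table-driven version would be the usual choice in practice); the function names are hypothetical, and `page` is assumed to hold exactly one complete page.

```python
POLYNOMIAL = 0x04C11DB7


def ogg_crc32(data: bytes, crc: int = 0) -> int:
    """Direct (MSB-first) CRC-32 with initial value 0 and no final XOR."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLYNOMIAL) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def page_checksum(page: bytes) -> int:
    """Checksum of a page with the CRC field (bytes 22-25) zeroed."""
    return ogg_crc32(page[:22] + bytes(4) + page[26:])


def verify_page(page: bytes) -> bool:
    """Compare the stored checksum (LSB first) with the computed one."""
    stored = int.from_bytes(page[22:26], "little")
    return stored == page_checksum(page)
```

A decoder would call something like `verify_page` on each candidate page found by the capture pattern and drop sync when it returns false.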

page_segments

The number of segment entries to appear in the segment table. The maximum number of 255 segments (255 bytes each) sets the maximum possible physical page size at 65307 bytes or just under 64kB. Thus we know that a header corrupted so as to destroy sizing/alignment information will not cause a runaway bitstream: we'll read in the page according to the corrupted size information, which is guaranteed to be a reasonable size regardless, notice the checksum mismatch, drop sync and then look for recapture.

 byte value

 26 0x00-0xff (0-255)
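The 65307-byte bound follows directly from the header layout: 27 fixed header bytes (bytes 0 through 26), up to 255 segment table entries, and up to 255 bytes of data per segment. A short Python check of the arithmetic, using illustrative constant names:

```python
FIXED_HEADER_SIZE = 27   # bytes 0-26, up to and including page_segments
MAX_SEGMENTS = 255       # largest value of the page_segments byte
MAX_SEGMENT_SIZE = 255   # largest possible lacing value

MAX_PAGE_SIZE = (FIXED_HEADER_SIZE
                 + MAX_SEGMENTS                      # segment table
                 + MAX_SEGMENTS * MAX_SEGMENT_SIZE)  # page body

print(MAX_PAGE_SIZE, MAX_PAGE_SIZE < 64 * 1024)
```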

segment_table (containing packet lacing values)

The lacing values for each packet segment physically appearing in this page are listed in contiguous order.

 byte value

 27 0x00-0xff (0-255)
 [...]
 n  0x00-0xff (0-255, n=page_segments+26)

Total page size is calculated directly from the known header size and lacing values in the segment table. Packet data segments follow immediately after the header.
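The size calculation and the recovery of packet boundaries can be sketched together in Python. The function name `split_page` is hypothetical. For brevity the sketch ignores the continued-packet flag, so on a continued page the first "packet" it returns is really the tail of a packet begun earlier.

```python
from typing import List, Tuple


def split_page(page: bytes) -> Tuple[int, List[bytes], bytes]:
    """Return (total page size, packets ending on the page, unfinished tail)."""
    page_segments = page[26]
    table = page[27:27 + page_segments]
    header_size = 27 + page_segments
    body_size = sum(table)
    body = page[header_size:header_size + body_size]

    packets: List[bytes] = []
    current = bytearray()
    offset = 0
    for lacing in table:
        current += body[offset:offset + lacing]
        offset += lacing
        if lacing < 255:  # a value below 255 terminates the packet
            packets.append(bytes(current))
            current = bytearray()
    # Anything left in `current` continues on the following page.
    return header_size + body_size, packets, bytes(current)


# Hypothetical page: a 300-byte packet, a nil packet, then the first
# 255 bytes of a packet that spans onto the next page.
header = b"OggS" + bytes(22) + bytes([4]) + bytes([255, 45, 0, 255])
size, packets, tail = split_page(header + b"a" * 300 + b"b" * 255)
print(size, [len(p) for p in packets], len(tail))
```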

Page headers typically impose a flat 0.25-0.5% space overhead assuming nominal ~8k page sizes. The segmentation table needed for exact packet recovery in the streaming layer adds approximately 0.5-1% nominal, assuming expected encoder behavior in 44.1kHz, 128kbps stereo encodings.
