The Ogg transport bitstream is designed to provide framing, error protection and seeking structure for higher-level codec streams that consist of raw, unencapsulated data packets, such as the Vorbis audio codec or Theora video codec.
Vorbis encodes short-time blocks of PCM data into raw packets of bit-packed data. These raw packets may be used directly by transport mechanisms that provide their own framing and packet-separation mechanisms (such as UDP datagrams). For stream-based storage (such as files) and transport (such as TCP streams or pipes), Vorbis uses the Ogg bitstream format to provide framing/sync, sync recapture after error, landmarks during seeking, and enough information to properly separate data back into packets at the original packet boundaries without relying on decoding to find packet boundaries.
A logical Ogg bitstream is a contiguous stream of sequential pages belonging only to the logical bitstream. A physical Ogg bitstream is constructed from one or more logical Ogg bitstreams (the simplest physical bitstream is simply a single logical bitstream). We describe below the exact formatting of an Ogg logical bitstream. Combining logical bitstreams into more complex physical bitstreams is described in the Ogg bitstream overview. The exact mapping of raw Vorbis packets into a valid Ogg Vorbis physical bitstream is described in the Vorbis I Specification.
An Ogg stream is structured by dividing incoming packets into segments of up to 255 bytes and then wrapping a group of contiguous packet segments into a variable-length page preceded by a page header. Both the header size and page size are variable; the page header contains sizing information and checksum data to determine header/page size and data integrity.
The bitstream is captured (or recaptured) by looking for the beginning of a page, specifically the capture pattern. Once the capture pattern is found, the decoder verifies page sync and integrity by computing and comparing the checksum. At that point, the decoder can extract the packets themselves.
Packets are logically divided into multiple segments before encoding into a page. Note that the segmentation and fragmentation process is a logical one; it's used to compute page header values and the original page data need not be disturbed, even when a packet spans page boundaries.
The raw packet is logically divided into [n] 255-byte segments and a last fractional segment of < 255 bytes. A packet may consist only of the trailing fractional segment, and a fractional segment may be zero length. The sizes of these segments, called "lacing values", are then placed into the header segment table.
An example should make the basic concept clear:
  raw packet:
    ___________________________________________
   |______________packet data__________________| 753 bytes

  lacing values for page header segment table: 255,255,243
We simply add the lacing values to get the total packet size; the last lacing value for a packet is always the one that is less than 255. Note that this encoding both avoids imposing a maximum packet size and imposes minimal overhead on small packets. By contrast, simply using two bytes at the head of every packet would impose a maximum packet size of 32k and penalize small packets (< 255 bytes, the typical case) with twice the segmentation overhead. Using the lacing values as suggested, small packets see the minimum possible byte-aligned overhead (1 byte) and large packets, over 512 bytes or so, see a fairly constant ~0.5% overhead on encoding space.
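The segmentation rule can be sketched in a few lines of Python. This is an illustrative sketch, not the reference implementation; the function names `lacing_values` and `segment_table_overhead` are hypothetical. It also applies the rule that a packet whose size is an exact multiple of 255 (including a zero-length packet) ends with a lacing value of 0.

```python
def lacing_values(packet_size: int) -> list:
    """Lacing values for one packet of packet_size bytes.

    Every full 255-byte segment contributes a 255; the packet always ends with
    one value below 255. For sizes that are a multiple of 255 (including 0)
    that final value is 0, i.e. a zero-length fractional segment.
    """
    if packet_size < 0:
        raise ValueError("packet size must be non-negative")
    full, rest = divmod(packet_size, 255)
    return [255] * full + [rest]


def segment_table_overhead(packet_size: int) -> int:
    """Bytes of segment table spent on this packet: one per lacing value."""
    return len(lacing_values(packet_size))


print(lacing_values(753))            # [255, 255, 243]
print(segment_table_overhead(100))   # 1 (a 2-byte length prefix would cost 2)
print(segment_table_overhead(4096))  # 17, about 0.4% of 4096 bytes
```

Summing the returned values always gives back the packet size, which is exactly what a decoder relies on.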
Note that a lacing value of 255 implies that a second lacing value follows in the packet, and a value of < 255 marks the end of the packet after that many additional bytes. A packet of 255 bytes (or a multiple of 255 bytes) is terminated by a lacing value of 0:
  raw packet:
    _______________________________
   |________packet data____________| 255 bytes

  lacing values: 255, 0
Note also that a 'nil' (zero length) packet is not an error; it consists of nothing more than a lacing value of zero in the header.
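Going the other way, a decoder recovers packet sizes by summing lacing values until it reaches one below 255. The following minimal sketch uses a hypothetical function name, `split_segment_table`. It ignores the continued-packet header flag, so the first size it reports may be the tail of a packet begun on an earlier page.

```python
from typing import List, Optional, Tuple


def split_segment_table(segment_table: bytes) -> Tuple[List[int], Optional[int]]:
    """Group one page's lacing values into packet sizes.

    Returns (finished, unfinished): the sizes of packets whose last segment is
    on this page, and the byte count of a trailing packet that continues onto
    the next page (None if the table ends on a packet boundary).
    """
    finished: List[int] = []
    current = 0
    open_packet = False
    for lacing in segment_table:
        current += lacing
        open_packet = True
        if lacing < 255:  # a value below 255 (including 0) ends the packet
            finished.append(current)
            current = 0
            open_packet = False
    return finished, (current if open_packet else None)


print(split_segment_table(bytes([255, 255, 243])))  # ([753], None)
print(split_segment_table(bytes([0, 10, 255])))     # ([0, 10], 255)
```

A nil packet shows up naturally as a size of 0, and a trailing 255 signals that the packet spans into the next page.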
Packets are not restricted to beginning and ending within a page, although individual segments are, by definition, required to do so. Packets are not restricted to a maximum size, although excessively large packets in the data stream are discouraged.
After segmenting a packet, the encoder may decide not to place all the resulting segments into the current page. To do so, the encoder places the lacing values of the segments it wishes to belong to the current page into the current segment table, then finishes the page. The next page begins with the lacing value of the packet's next segment as the first entry in its segment table, thus continuing the packet. The packet data must also correspond properly to the lacing values in the spanned pages: the segment data corresponding to the lacing values of the first page belongs in that page, and the packet segments listed in the segment table of the following page must begin the body of that following page.
The last step in spanning a page boundary is to set the header flag in the new page to indicate that the first lacing value in the segment table continues rather than begins a packet; a header flag of 0x01 is set to indicate a continued packet. Although mandatory, this flag is not actually algorithmically necessary; one could inspect the preceding segment table to determine if the packet is new or continued. Adding the information to the header type flag allows a simpler design (with no overhead) that needs only to inspect the current page header after frame capture. This also allows faster error recovery in the event that the packet originates in a corrupt preceding page, implying that the previous page's segment table cannot be trusted.
Note that a packet can span an arbitrary number of pages; the above spanning process is repeated for each spanned page boundary. Also, the 'zero termination' of a packet whose size is an exact multiple of 255 must appear even if that lacing value falls in the next page as a zero-length continuation of the current packet. The header flag should be set to 0x01 to indicate that the packet spanned, even though the span is a nil case as far as data is concerned.
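The spanning rules can be illustrated with a small Python sketch. It assumes a deliberately simple, hypothetical page-filling policy in which each page takes up to `max_segments` lacing values (1 to 255). Real encoders choose page boundaries by other criteria. The function name `paginate` is also hypothetical. The sketch sets the continued flag (0x01) whenever the previous page ended on a 255, which covers the case where the only thing carried over is a zero-length terminator.

```python
from typing import List, Tuple


def paginate(packet_sizes: List[int], max_segments: int = 255) -> List[Tuple[bool, List[int]]]:
    """Lay out the lacing values of consecutive packets across pages.

    Returns one (continued, segment_table) pair per page, where continued
    corresponds to header flag 0x01: the page's first lacing value continues
    a packet begun on an earlier page.
    """
    lacing: List[int] = []
    for size in packet_sizes:
        full, rest = divmod(size, 255)
        lacing += [255] * full + [rest]

    pages: List[Tuple[bool, List[int]]] = []
    continued = False
    for start in range(0, len(lacing), max_segments):
        table = lacing[start:start + max_segments]
        pages.append((continued, table))
        # The next page continues a packet iff this page ended mid-packet,
        # i.e. its last lacing value is 255.
        continued = table[-1] == 255
    return pages


# A 510-byte packet with room for only two lacing values per page: the
# terminating 0 lands alone on the second page, which is still flagged 0x01.
print(paginate([510], max_segments=2))  # [(False, [255, 255]), (True, [0])]
```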
The encoding looks odd, but is properly optimized for speed and the expected case of the majority of packets being between 50 and 200 bytes (note that it is designed such that packets of wildly different sizes can be handled within the model; placing packet size restrictions on the encoder would have only slightly simplified design in page generation and increased overall encoder complexity).
The main point behind tracking individual packets (and packet segments) is to allow more flexible encoding tricks that require explicit knowledge of packet size. An example is simple bandwidth limiting, implemented by simply truncating packets in the nominal case if the packet is arranged so that the least sensitive portion of the data comes last.
The headering mechanism is designed to avoid copying and re-assembly of the packet data (ie, making the packet segmentation process a logical one); the header can be generated directly from incoming packet data. The encoder buffers packet data until it finishes a complete page, at which point it writes the header followed by the buffered packet segments.
A header begins with a capture pattern that simplifies identifying pages; once the decoder has found the capture pattern it can do a more intensive job of verifying that it has in fact found a page boundary (as opposed to an inadvertent coincidence in the byte stream).
  byte  value

   0    0x4f 'O'
   1    0x67 'g'
   2    0x67 'g'
   3    0x53 'S'
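A capture scan can be sketched in Python as a byte search for "OggS". This sketch also applies one cheap extra filter: a complete 27-byte fixed header, with version byte 4 equal to 0, must be present. The name `next_candidate` is hypothetical. A match is only a candidate; the decoder must still verify the CRC before trusting it.

```python
CAPTURE_PATTERN = b"OggS"  # bytes 0-3: 0x4f 0x67 0x67 0x53
HEADER_LEN = 27            # fixed part of the page header, before the segment table


def next_candidate(buf: bytes, start: int = 0) -> int:
    """Offset of the next plausible page start at or after start, or -1.

    Candidates too close to the end of buf to hold a full fixed header are not
    reported; a streaming reader would keep those bytes for its next read.
    """
    pos = buf.find(CAPTURE_PATTERN, start)
    while pos != -1:
        if pos + HEADER_LEN <= len(buf) and buf[pos + 4] == 0:
            return pos
        pos = buf.find(CAPTURE_PATTERN, pos + 1)
    return -1


print(next_candidate(b"xxOggS" + bytes(23)))  # 2
```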
The capture pattern is followed by the stream structure revision:
  byte  value

   4    0x00
The header type flag identifies this page's context in the bitstream:
  byte  value

   5    bitflags: 0x01: unset = fresh packet
                        set = continued packet
                  0x02: unset = not first page of logical bitstream
                        set = first page of logical bitstream (bos)
                  0x04: unset = not last page of logical bitstream
                        set = last page of logical bitstream (eos)
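Decoding byte 5 is a matter of masking individual bits. The following Python sketch uses a hypothetical helper name (`header_type_flags`) and hypothetical constant names for the three flag bits listed above.

```python
CONTINUED = 0x01  # first lacing value continues a packet from the previous page
BOS = 0x02        # first page of a logical bitstream
EOS = 0x04        # last page of a logical bitstream


def header_type_flags(header: bytes) -> dict:
    """Decode byte 5 of a page header into named booleans."""
    flags = header[5]
    return {
        "continued": bool(flags & CONTINUED),
        "bos": bool(flags & BOS),
        "eos": bool(flags & EOS),
    }


# A page that both begins and ends a logical bitstream (e.g. a one-page stream):
print(header_type_flags(b"OggS\x00\x06"))
# {'continued': False, 'bos': True, 'eos': True}
```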
The absolute granule position follows. It is packed in the same way the rest of Ogg data is packed: LSb of LSB first. Note that the 'position' data specifies a 'sample' number (eg, in audio a CD-quality sample is four octets, 16 bits for left and 16 bits for right; in video it would likely be the frame number). It is up to the specific codec in use to define the semantic meaning of the granule position value. The position specified is the total number of samples encoded after including all packets finished on this page (packets begun on this page but continuing on to the next page do not count). The rationale here is that the position specified in the page header of the last page tells how long the data coded by the bitstream is. A truncated stream will still return the proper number of samples that can be decoded fully.
A special value of '-1' (in two's complement) indicates that no packets finish on this page.
  byte  value

   6    0xXX LSB
   7    0xXX
   8    0xXX
   9    0xXX
  10    0xXX
  11    0xXX
  12    0xXX
  13    0xXX MSB
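Reading the granule position means treating bytes 6 through 13 as a little-endian signed 64-bit integer. The following Python sketch does this with the standard `struct` module (`"<q"`). The function name is hypothetical, and reporting the "no packet finishes here" value -1 as `None` is this sketch's own choice.

```python
import struct
from typing import Optional


def granule_position(header: bytes) -> Optional[int]:
    """Absolute granule position from bytes 6-13 of a page header.

    The field is a little-endian 64-bit two's-complement integer; the special
    value -1 (all bits set) means no packet finishes on this page and is
    reported here as None.
    """
    (value,) = struct.unpack_from("<q", header, 6)
    return None if value == -1 else value


hdr = b"OggS\x00\x00" + (44100).to_bytes(8, "little")
print(granule_position(hdr))                            # 44100
print(granule_position(b"OggS\x00\x00" + b"\xff" * 8))  # None
```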
Ogg allows separate logical bitstreams to be mixed at page granularity in a physical bitstream. The most common case would be sequential arrangement, but it is possible to interleave pages for two separate bitstreams to be decoded concurrently. The stream serial number is the means by which physical pages are associated with a particular logical stream. Each logical stream must have a unique serial number within a physical stream:
  byte  value

  14    0xXX LSB
  15    0xXX
  16    0xXX
  17    0xXX MSB
The page sequence number is a page counter; it lets the decoder know if a page is lost (useful where packets span page boundaries).
  byte  value

  18    0xXX LSB
  19    0xXX
  20    0xXX
  21    0xXX MSB
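Because pages of different logical streams may be interleaved, lost-page detection has to track the sequence number per serial number. The following Python sketch shows one way to do it. The names `find_lost_pages` and `fake_header` are hypothetical, and the sketch does not handle wrap-around of the 32-bit counter.

```python
import struct
from typing import Dict, List, Tuple


def find_lost_pages(headers: List[bytes]) -> List[Tuple[int, int, int]]:
    """Report page sequence gaps, tracked separately for each logical stream.

    Each item must hold at least the first 22 bytes of a page header. Returns
    (serial_number, expected_sequence, received_sequence) per discontinuity.
    """
    last_seen: Dict[int, int] = {}
    gaps: List[Tuple[int, int, int]] = []
    for header in headers:
        # Bytes 14-17: stream serial number; bytes 18-21: page sequence number.
        serial, sequence = struct.unpack_from("<II", header, 14)
        if serial in last_seen and sequence != last_seen[serial] + 1:
            gaps.append((serial, last_seen[serial] + 1, sequence))
        last_seen[serial] = sequence
    return gaps


def fake_header(serial: int, sequence: int) -> bytes:
    # Hypothetical test helper: capture pattern, version, flags, zero granule.
    return b"OggS\x00\x00" + bytes(8) + struct.pack("<II", serial, sequence)


# Two interleaved streams; page 2 of stream 7 never arrived.
pages = [fake_header(7, 0), fake_header(9, 0), fake_header(7, 1),
         fake_header(9, 1), fake_header(7, 3)]
print(find_lost_pages(pages))  # [(7, 2, 3)]
```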
The page checksum is a 32-bit CRC value (direct algorithm, initial value and final XOR = 0, generator polynomial = 0x04c11db7). The value is computed over the entire header (with the CRC field in the header set to zero) and then continued over the page body. The CRC field is then filled with the computed value.
(A thorough discussion of CRC algorithms can be found in "A Painless Guide to CRC Error Detection Algorithms" by Ross Williams, ross@ross.net.)
  byte  value

  22    0xXX LSB
  23    0xXX
  24    0xXX
  25    0xXX MSB
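The checksum parameters above describe a direct (MSB-first, non-reflected) CRC-32 with a zero initial value and no final XOR. The following table-driven Python sketch implements that CRC and the page verification step, treating bytes 22 through 25 as zero while computing. All function names are hypothetical. Python's `zlib.crc32` computes a different, reflected CRC-32 variant with 0xFFFFFFFF initial value and final XOR, so it cannot be used directly.

```python
POLY = 0x04C11DB7


def _make_table() -> list:
    """CRC of every possible leading byte (MSB-first, no reflection)."""
    table = []
    for byte in range(256):
        r = byte << 24
        for _ in range(8):
            if r & 0x80000000:
                r = ((r << 1) & 0xFFFFFFFF) ^ POLY
            else:
                r = (r << 1) & 0xFFFFFFFF
        table.append(r)
    return table


_TABLE = _make_table()


def ogg_crc(data: bytes, crc: int = 0) -> int:
    """Direct CRC-32, initial value 0, no final XOR; crc allows chaining."""
    for b in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[(crc >> 24) ^ b]
    return crc


def page_checksum(page: bytes) -> int:
    """CRC over header and body, with the CRC field (bytes 22-25) as zero."""
    return ogg_crc(page[:22] + bytes(4) + page[26:])


def verify_page(page: bytes) -> bool:
    stored = int.from_bytes(page[22:26], "little")
    return stored == page_checksum(page)


# A one-segment page carrying the 5-byte packet "hello".
page = bytearray(b"OggS\x00\x02" + bytes(20) + b"\x01\x05hello")
page[22:26] = page_checksum(bytes(page)).to_bytes(4, "little")
print(verify_page(bytes(page)))  # True
```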
The number of page segments, i.e. the number of entries in the segment table, follows. The maximum of 255 segments (255 bytes each) sets the maximum possible physical page size at 65307 bytes, just under 64kB. Thus a header corrupted so as to destroy sizing/alignment information will not cause a runaway bitstream: the decoder will read in the page according to the corrupted size information, which is guaranteed to be a reasonable size regardless, notice the checksum mismatch, drop sync and then look for recapture.
  byte  value

  26    0x00-0xff (0-255)
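The 65307-byte bound can be checked by adding up three parts: the fixed 27-byte header (bytes 0 through 26), the largest possible segment table, and the largest possible body.

```latex
\begin{aligned}
\text{fixed header} &= 27 \text{ bytes (bytes 0--26)}\\
\text{segment table} &\le 255 \text{ bytes (one lacing value per segment)}\\
\text{page body} &\le 255 \times 255 = 65025 \text{ bytes}\\
\text{maximum page size} &= 27 + 255 + 65025 = 65307 \text{ bytes} < 65536
\end{aligned}
```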
The lacing values for each packet segment physically appearing in this page are listed in contiguous order.
  byte  value

  27    0x00-0xff (0-255)
  [...]
   n    0x00-0xff (0-255, n=page_segments+26)
Total page size is calculated directly from the known header size and lacing values in the segment table. Packet data segments follow immediately after the header.
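Putting the field layout together, the following Python sketch decodes a complete page header with one `struct` format string (`"<BBqIIIB"` covers bytes 4 through 26) and derives the header and body sizes from the segment table. `PageHeader` and `parse_page_header` are hypothetical names. The sketch does not verify the CRC.

```python
import struct
from typing import NamedTuple


class PageHeader(NamedTuple):
    version: int
    header_type: int
    granule_position: int
    serial_number: int
    sequence_number: int
    checksum: int
    segment_table: bytes
    header_size: int  # 27 fixed bytes + one byte per lacing value
    body_size: int    # sum of the lacing values


def parse_page_header(buf: bytes, offset: int = 0) -> PageHeader:
    """Decode the page header starting at buf[offset] (no CRC check here)."""
    if buf[offset:offset + 4] != b"OggS":
        raise ValueError("capture pattern not found")
    # Bytes 4-26: version, header type, granule position (signed), serial
    # number, page sequence number, CRC, number of page segments.
    (version, header_type, granule, serial, sequence, checksum,
     n_segments) = struct.unpack_from("<BBqIIIB", buf, offset + 4)
    table = bytes(buf[offset + 27:offset + 27 + n_segments])
    if len(table) != n_segments:
        raise ValueError("segment table truncated")
    return PageHeader(version, header_type, granule, serial, sequence,
                      checksum, table, 27 + n_segments, sum(table))


hdr = (b"OggS\x00\x02" + bytes(8) + struct.pack("<III", 1234, 0, 0)
       + bytes([3, 255, 255, 10]))
h = parse_page_header(hdr)
print(h.header_size, h.body_size)  # 30 520
```

The next page (or the next capture search) then starts at `offset + h.header_size + h.body_size`.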
Page headers typically impose a flat 0.25-0.5% space overhead assuming nominal ~8k page sizes. The segmentation table needed for exact packet recovery in the streaming layer adds approximately 0.5-1% nominal, assuming expected encoder behavior in 44.1kHz, 128kbps stereo encodings.