This document serves as a starting point for understanding the design and implementation of the Ogg container format. If you're new to Ogg or merely want a high-level technical overview, start reading here. Other documents linked from the index page give distilled technical descriptions and references of the container mechanisms. This document is intended to aid understanding.
Ogg is intended to be the simplest possible container, concerned only with framing, ordering, and interleave. It can be used as a stream delivery mechanism, for media file storage, or as a building block toward implementing a more complex, non-linear container (for example, see the Skeleton or Annodex/CMML).
The Ogg container is not intended to be a monolithic 'kitchen-sink'. It exists only to frame and deliver in-order stream data and as such is vastly simpler than most other containers. Elementary and multiplexed streams are both constructed entirely from a single building block (an Ogg page) composed of eight fields totalling twenty-seven bytes (the page header), a list of packet lengths (up to 255 bytes), and payload data (up to 65025 bytes). The structure of every page is the same. There are no optional fields or alternate encodings.
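To make the page layout concrete, here is a minimal Python sketch that reads the fixed header and the segment table (the list of packet lengths) of a single page. It assumes the page starts at offset 0 of the buffer and follows the field order and header flag bits (0x01 continued packet, 0x02 beginning of stream, 0x04 end of stream) from the framing specification. The function name `parse_page_header` is hypothetical, and the sketch does not verify the CRC.

```python
import struct

# Fixed part of an Ogg page header: eight little-endian fields, 27 bytes in total:
# capture pattern, version, header type flags, granule position, serial number,
# page sequence number, CRC checksum, number of page segments.
OGG_HEADER = struct.Struct("<4sBBqIIIB")


def parse_page_header(buf):
    """Parse the page at buf[0:]; return (fields, total_page_length)."""
    (capture, version, header_type, granulepos,
     serial, sequence, crc, nsegs) = OGG_HEADER.unpack_from(buf, 0)
    if capture != b"OggS":
        raise ValueError("missing 'OggS' capture pattern")
    # The segment table (one lacing value per segment) follows the fixed header.
    table_start = OGG_HEADER.size
    lacing = list(buf[table_start:table_start + nsegs])
    if len(lacing) != nsegs:
        raise ValueError("truncated segment table")
    fields = {
        "version": version,
        "continued": bool(header_type & 0x01),
        "bos": bool(header_type & 0x02),
        "eos": bool(header_type & 0x04),
        "granulepos": granulepos,
        "serial": serial,
        "sequence": sequence,
        "crc": crc,
        "lacing": lacing,
    }
    # Page length = fixed header + segment table + payload (sum of lacing values).
    return fields, OGG_HEADER.size + nsegs + sum(lacing)
```

With at most 255 lacing values of at most 255 bytes each, the largest possible page is 27 + 255 + 65025 = 65307 bytes, which is the "just under 64kB" bound relied on for capture.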
Stream and media metadata is contained in Ogg and not built into the Ogg container itself. Metadata is thus compartmentalized and layered rather than part of a monolithic design, an especially good idea as no two groups seem able to agree on what a complete or complete-enough metadata set should be. In this way, the container and container implementation are isolated from unnecessary metadata design flux.
The Ogg container is primarily a streaming format, encapsulating chronological, time-linear mixed media into a single delivery stream or file. The design is such that an application can always encode and/or decode all features of a bitstream in one pass with no seeking and minimal buffering. Seeking to provide optimized encoding (such as two-pass encoding) or interactive decoding (such as scrubbing or instant replay) is not disallowed or discouraged; however, no container feature requires nonlinear access of the bitstream.
Ogg is designed to contain any size data payload with bounded, predictable efficiency. Ogg packets have no maximum size and a zero-byte minimum size. There is no restriction on size changes from packet to packet. Variable size packets do not require the use of any optional or additional container features. There is no optimal suggested packet size, though special consideration was paid to make sure packets of 50-200 bytes were no less efficient than larger packet sizes. The original design criterion was a 2% overhead at 50-byte packets, dropping to a maximum working overhead of 1% with larger packets, and a typical working overhead of 0.5-0.7% for most practical uses.
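The overhead figures can be approximated with a little arithmetic. The sketch below is a simplified model, not the design's actual accounting: it assumes equal-size packets, pages packed with as many whole packets as the 255-entry segment table allows, and no packets spanning pages. Under those assumptions, 50-byte packets cost about 2.2% (282 framing bytes per 12750 payload bytes), close to the 2% criterion, while 4000-byte packets cost about 0.45%. The function names are hypothetical.

```python
def lacing_values_needed(packet_size):
    # n // 255 lacing values of 255, plus one terminating value below 255.
    return packet_size // 255 + 1


def page_overhead(packet_size, header_bytes=27, max_segments=255):
    """Framing bytes per payload byte for a stream of equal-size packets.

    Simplified model (an assumption): each page holds as many whole packets
    as its segment table allows, and no packet spans a page boundary.
    """
    if packet_size <= 0:
        raise ValueError("model needs a positive packet size")
    segs = lacing_values_needed(packet_size)
    if segs > max_segments:
        raise ValueError("packet does not fit on one page in this model")
    per_page = max_segments // segs          # whole packets per page
    framing = header_bytes + per_page * segs  # fixed header + segment table
    return framing / (per_page * packet_size)
```

Real muxers usually flush pages well before the segment table is full, so measured overhead also depends on their page-flushing policy.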
Ogg is a byte-aligned container with no context-dependent, optional or variable-length fields. Ogg requires no repacking of codec data. The page structure is written out in-line as packet data is submitted to the streaming abstraction. In addition, it is possible to implement both Ogg mux and demux as MT-hot zero-copy abstractions (as is done in the Tremor sourcebase).
Ogg is designed for efficient and immediate stream capture with high confidence. Although packets have no size limit in Ogg, pages are a maximum of just under 64kB, meaning that any Ogg stream can be captured with confidence after seeing 128kB of data or less (worst case; the typical figure is 6kB) from any random starting point in the stream.
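Capture works by scanning for the `OggS` capture pattern and accepting a candidate only if its CRC checks out, which rejects accidental matches inside payload data. The Python sketch below assumes the Ogg CRC-32 parameters (polynomial 0x04C11DB7, initial value 0, no bit reflection, no final XOR, computed over the whole page with the 4-byte CRC field at offset 22 zeroed). The names `ogg_crc` and `find_page` are hypothetical, and a real demuxer would also handle pages that straddle buffer boundaries.

```python
import struct


def _make_crc_table():
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            # MSB-first (non-reflected) CRC-32 step with polynomial 0x04C11DB7.
            if r & 0x80000000:
                r = ((r << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                r = (r << 1) & 0xFFFFFFFF
        table.append(r)
    return table


CRC_TABLE = _make_crc_table()


def ogg_crc(data):
    crc = 0  # initial value 0, no final XOR
    for b in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ CRC_TABLE[((crc >> 24) ^ b) & 0xFF]
    return crc


def find_page(buf, start=0):
    """Return (offset, length) of the first CRC-verified page at or after start, or None."""
    pos = buf.find(b"OggS", start)
    while pos != -1:
        if pos + 27 <= len(buf):
            nsegs = buf[pos + 26]
            table_end = pos + 27 + nsegs
            if table_end <= len(buf):
                length = 27 + nsegs + sum(buf[pos + 27:table_end])
                if pos + length <= len(buf):
                    page = bytearray(buf[pos:pos + length])
                    stored = struct.unpack_from("<I", page, 22)[0]
                    page[22:26] = b"\x00\x00\x00\x00"  # CRC is computed with its field zeroed
                    if ogg_crc(page) == stored:
                        return pos, length
        # False match or incomplete page: resume scanning one byte later.
        pos = buf.find(b"OggS", pos + 1)
    return None
```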
Ogg implements simple coarse- and fine-grained seeking by design.
Coarse seeking may be performed by simply 'moving the tone arm' to a new position and 'dropping the needle'. Rapid capture with accompanying timecode from any location in an Ogg file is guaranteed by the stream design. From the acquisition of the first timecode, all data needed to play back from that time code forward is ahead of the stream cursor.
Ogg implements full sample-granularity seeking using an interpolated bisection search built on the capture and timecode mechanisms used by coarse seeking. As above, once a search finds the desired timecode, all data needed to play back from that time code forward is ahead of the stream cursor.
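The interpolated bisection search can be sketched as follows. Instead of a real file, the sketch takes two parallel lists of page byte offsets and granule positions as a hypothetical stand-in for "drop the needle at a byte offset and capture the next page". It assumes granule positions are non-decreasing and ignores pages that carry no granule position (-1). Each step guesses a byte offset by linear interpolation between the bracketing pages, then narrows the bracket like an ordinary bisection.

```python
import bisect


def interpolated_seek(offsets, granules, target):
    """Index of the last page whose granule position is <= target, or -1.

    offsets: ascending byte offsets of pages; granules: their (non-decreasing)
    granule positions. Both are a hypothetical in-memory model of the file.
    """
    n = len(offsets)
    if n == 0 or granules[0] > target:
        return -1
    if granules[-1] <= target:
        return n - 1
    lo, hi = 0, n - 1  # invariant: granules[lo] <= target < granules[hi]
    while hi - lo > 1:
        # Interpolate a byte position from the timestamps at the bracket ends.
        frac = (target - granules[lo]) / (granules[hi] - granules[lo])
        guess = offsets[lo] + frac * (offsets[hi] - offsets[lo])
        # 'Drop the needle': capture the first page at or after the guess,
        # clamped strictly inside the bracket so every step makes progress.
        mid = bisect.bisect_left(offsets, guess)
        mid = min(max(mid, lo + 1), hi - 1)
        if granules[mid] <= target:
            lo = mid
        else:
            hi = mid
    return lo
```

When the bitrate is roughly constant, the interpolated guess lands near the answer and the bracket collapses in a few steps; the clamp only matters when the estimate is poor.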
Both coarse and fine seeking use the page structure and sequencing inherent to the Ogg format. All Ogg streams are fully seekable from creation; seekability is unaffected by truncation or missing data, and is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor heuristic.
Seeking without use of an index is a major point of the Ogg design. There are two primary reasons why Ogg transport forgoes an index:
In addition, it must be possible to create an Ogg stream in a single pass. Although an optional index can simply be tacked onto the end of the created stream, some software groups object to end-positioned indexes and claim to be unwilling to support indexes not located at the stream beginning.
All this said, it's become clear that an optional index is a demanded feature. For this reason, Ogg Skeleton now defines a proposed index.
Ogg multiplexes streams by interleaving pages from multiple elementary streams into a multiplexed stream in time order. The multiplexed pages are not altered. Muxing an Ogg AV stream out of separate audio, video and data streams is akin to shuffling several decks of cards together into a single deck; the cards themselves remain unchanged. Demultiplexing is similarly simple (as the cards are marked).
The goal of this design is to make the mux/demux operation as trivial as possible to allow live streaming systems to build and rebuild streams on the fly with minimal CPU usage and no additional storage or latency requirements.
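The card-shuffling analogy maps directly onto a merge. In the sketch below a page is modeled, hypothetically, as a `(timestamp, serial, payload)` tuple; real pages carry a granule position that the codec maps to time. Muxing is a time-ordered merge that never modifies a page, and demuxing routes each page by its serial number.

```python
import heapq
from collections import defaultdict


def mux(*streams):
    """Interleave already-paginated elementary streams in time order.

    Each stream is a time-ordered list of (timestamp, serial, payload) pages;
    pages pass through unchanged, like cards shuffled into one deck.
    """
    return list(heapq.merge(*streams, key=lambda page: page[0]))


def demux(pages):
    """Route each page back to its logical stream by serial number."""
    out = defaultdict(list)
    for page in pages:
        out[page[1]].append(page)
    return dict(out)
```

Because neither direction inspects or rewrites payloads, both run in a single streaming pass with no extra storage beyond one pending page per input stream.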
Ogg streams belong to one of two categories: "Continuous" streams and "Discontinuous" streams.
A stream that provides a gapless, time-continuous media type with a fine-grained timebase is considered to be 'Continuous'. A continuous stream should never be starved of data. Examples of continuous data types include broadcast audio and video.
A stream that delivers data in a potentially irregular pattern or with widely spaced timing gaps is considered to be 'Discontinuous'. A discontinuous stream may be best thought of as data representing scattered events; although they happen in order, they are typically unconnected data often located far apart. One example of a discontinuous stream type is captioning, such as Ogg Kate. Although it's possible to design captions as a continuous stream type, it's most natural to think of captions as widely spaced pieces of text with little happening between.
The fundamental reason for distinction between continuous and discontinuous streams concerns buffering.
A continuous stream is, by definition, gapless. Ogg buffering is based on the simple premise of never allowing an active continuous stream to starve for data during decode; buffering works ahead until all continuous streams in a physical stream have data ready and no further.
Discontinuous stream data is not assumed to be predictable. The buffering design takes discontinuous data 'as it comes' rather than working ahead to look for future discontinuous data for a potentially unbounded period. Thus, the buffering process makes no attempt to fill discontinuous stream buffers; their pages simply 'fall out' of the stream when continuous streams are handled properly.
Buffering requirements in this design need not be explicitly declared or managed in the encoded stream. The decoder simply reads as much data as is necessary to keep all continuous stream types gapless and no more, with discontinuous data processed as it arrives in the continuous data. Buffering is implicitly optimal for the given stream. Because all pages of all data types are stamped with absolute timing information within the stream, inter-stream synchronization timing is always maintained without the need for explicitly declared buffer-ahead hinting.
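The buffering rule can be sketched as a loop that reads pages in physical order until every continuous stream has data reaching the playback time, and stops there. Pages are modeled hypothetically as `(serial, stamp)` pairs already in multiplexed order; discontinuous pages are simply collected as they go by, which is the 'fall out' behavior described above. The name `buffer_for` is illustrative only.

```python
def buffer_for(pages, continuous_serials, play_time):
    """Read pages until every continuous stream has data up to play_time.

    pages: (serial, stamp) pairs in physical-stream order (hypothetical model).
    Returns (number_of_pages_read, discontinuous_pages_seen_along_the_way).
    """
    reached = {s: float("-inf") for s in continuous_serials}
    fallen_out = []
    for count, (serial, stamp) in enumerate(pages, start=1):
        if serial in reached:
            # Continuous pages are stamped by end time: data now reaches `stamp`.
            reached[serial] = max(reached[serial], stamp)
        else:
            # Discontinuous page: handle it as it arrives; never search ahead for it.
            fallen_out.append((serial, stamp))
        if all(t >= play_time for t in reached.values()):
            return count, fallen_out  # every continuous stream is ready: stop reading
    return len(pages), fallen_out  # input ran out before the buffers filled
```

For example, with audio (serial 1) and video (serial 2) pages stamped 1.0, 2.0 and 3.0 and a caption (serial 3) stamped 1.5, buffering to t = 1.5 reads five pages and picks up the caption along the way, with no explicit look-ahead for it.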
Ogg does not replicate codec-specific metadata into the mux layer in an attempt to make the mux and codec layer implementations 'fully separable'. Things like specific timebase, keyframing strategy, frame duration, etc., do not appear in the Ogg container. The mux layer is, instead, expected to query a codec through a centralized interface, left to the implementation, for this data when it is needed.
Though modern design wisdom usually prefers to predict all possible needs of current and future codecs and then embed these dependencies and the required metadata into the container itself, this strategy increases container specification complexity, fragility, and rigidity. The mux and codec code becomes more independent, but the specifications become logically less independent. A codec can't do what a container hasn't already provided for. Novel codecs are harder to support, and you can do fewer useful things with the ones you've already got (e.g., try to make a good splitter without using any codecs; such a splitter is limited to splitting at keyframes only, or else requires yet another new mechanism built into the container layer to mark which frames to skip displaying).
Ogg's design goes the opposite direction, where the specification is to be as simple, easy to understand, and 'proofed' against novel codecs as possible. When an Ogg mux layer requires codec-specific information, it queries the codec (or a codec stub). This trades a more complex implementation for a simpler, more flexible specification.
The Ogg container itself does not define a metadata system for declaring the structure and interrelations between multiple media types in a muxed stream. That is, the Ogg container itself does not specify data like 'which stream is the subtitle stream?' or 'which video stream is the primary angle?'. This metadata still exists, but is stored by the Ogg container rather than being built into the Ogg container itself. Xiph specifies the 'Skeleton' metadata format for Ogg streams, but this decoupling of container and stream structure metadata means it is possible to use Ogg with any metadata specification without altering the container itself, or without stream structure metadata at all.
Every Ogg page is stamped with a 64 bit 'granule position' that serves as an absolute timestamp for mux and seeking. A few nifty little tricks are usually also embedded in the granpos state, but we'll leave those aside for the moment (strictly speaking, they're part of each codec's mapping, not Ogg).
As mentioned above, granule positions are mapped into absolute timestamps by the codec, rather than being a hard timestamp. This allows maximally efficient use of the available 64 bits to address every sample/frame position without approximation while supporting new and previously unknown timebase encodings without needing to extend or update the mux layer. When a codec needs a novel timebase, it simply brings the code for that mapping along with it. This is not a theoretical curiosity; new, wholly novel timebases were deployed with the adoption of both Theora and Dirac. "Rolling INTRA" (keyframeless video) also benefits from novel use of the granule position.
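Here is a sketch of the "query the codec" approach: the mux layer keeps a table from serial number to a codec-supplied granule-to-time function and never interprets granule positions itself. Both mappings below are hypothetical illustrations. One is an audio-style running sample count. The other is a keyframe/offset split in the spirit of Theora's granule position, but it is not the exact mapping of any particular codec. `Fraction` keeps results exact, in line with the requirement that granule positions map to exact times.

```python
from fractions import Fraction


def audio_sample_mapping(rate):
    # Hypothetical audio-style mapping: the granule position is a running sample count.
    return lambda gp: Fraction(gp, rate)


def keyframe_split_mapping(shift, fps_num, fps_den):
    # Hypothetical video-style mapping: the upper bits hold the frame number of
    # the last keyframe, the low `shift` bits count frames since that keyframe.
    mask = (1 << shift) - 1

    def to_time(gp):
        frames = (gp >> shift) + (gp & mask)
        return Fraction(frames * fps_den, fps_num)

    return to_time


# serial number -> granule-to-time function, registered by each codec (or codec stub)
CODEC_MAPPINGS = {}


def granule_to_seconds(serial, granulepos):
    """The mux layer's only view of time: ask the stream's codec mapping."""
    return CODEC_MAPPINGS[serial](granulepos)
```

A new timebase then only requires registering a new mapping function; the mux code is unchanged.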
Ogg codecs place raw compressed data into packets. Packets are octet payloads containing the data needed for a single decompressed unit, e.g., one video frame. Packets have no maximum size and may be zero length. They do not generally have any framing information; strung together, the unframed packets form a logical bitstream of codec data with no internal landmarks.
Packets of raw codec data are not typically internally framed. When they are strung together into a stream without any container to provide framing, they lose their individual boundaries. Seek and capture are not possible within an unframed stream, and for many codecs with variable length payloads and/or early-packet termination (such as Vorbis), it may become impossible to recover the original frame boundaries even if the stream is scanned linearly from beginning to end.
Logical bitstream packets are grouped and framed into Ogg pages along with a unique stream serial number to produce a physical bitstream. An elementary stream is a physical bitstream containing only a single logical bitstream. Each page is a self-contained entity, although a packet may be split and encoded across one or more pages. The page decode mechanism is designed to recognize, verify and handle single pages at a time from the overall bitstream.
The primary purpose of a container is to provide framing for raw packets, marking the packet boundaries so the exact packets can be retrieved for decode later. The container also provides secondary functions such as capture, timestamping, sequencing, stream identification and so on. Not all of these functions are represented in the diagram.
In the Ogg container, pages do not necessarily contain integer numbers of packets. Packets may span across page boundaries or even multiple pages. This is necessary as pages have a maximum possible size in order to provide capture guarantees, but packet size is unbounded.
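How packets become lacing values and span pages can be sketched in a few lines. A packet of n bytes is encoded as n // 255 lacing values of 255 followed by one terminating value below 255, so a packet whose length is a multiple of 255 ends with a 0. A lacing value of 255 at the end of a page means the packet continues on the next page, which sets that page's 'continued' flag. For simplicity, the hypothetical `paginate` fills every page to 255 segments and models pages as dictionaries without headers or CRCs.

```python
def lacing_values(size):
    # n // 255 full segments, then one terminating segment of n % 255 bytes (may be 0).
    return [255] * (size // 255) + [size % 255]


def paginate(packets, max_segments=255):
    """Split packets into pages of at most max_segments lacing values."""
    segments = []  # flattened (lacing_value, chunk) pairs for all packets
    for pkt in packets:
        pos = 0
        for lv in lacing_values(len(pkt)):
            segments.append((lv, pkt[pos:pos + lv]))
            pos += lv
    pages, continued = [], False
    for start in range(0, len(segments), max_segments):
        seg = segments[start:start + max_segments]
        pages.append({
            "continued": continued,
            "lacing": [lv for lv, _ in seg],
            "body": b"".join(chunk for _, chunk in seg),
        })
        # A page ending on a 255 lacing value leaves its last packet unfinished.
        continued = seg[-1][0] == 255
    return pages


def depaginate(pages):
    """Recover the exact packets: a packet ends at any lacing value below 255."""
    packets, current = [], bytearray()
    for page in pages:
        pos = 0
        for lv in page["lacing"]:
            current += page["body"][pos:pos + lv]
            pos += lv
            if lv < 255:
                packets.append(bytes(current))
                current = bytearray()
    return packets
```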
Ogg Bitstream Framing specifies the page format of an Ogg bitstream, the packet coding process and elementary bitstreams in detail.
Multiple logical/elementary bitstreams can be combined into a single multiplexed bitstream by interleaving whole pages from each contributing elementary stream in time order. The result is a single physical stream that multiplexes and frames multiple logical streams. Each logical stream is identified by the unique stream serial number stamped in its pages. A physical stream may include a 'meta-header' (such as the Ogg Skeleton) comprising its own Ogg page at the beginning of the physical stream. A decoder recovers the original logical/elementary bitstreams out of the physical bitstream by taking the pages in order from the physical bitstream and redirecting them into the appropriate logical decoding entity.
Multiple media types are multiplexed into a single Ogg stream by interleaving the pages from each elementary physical stream.
Ogg Bitstream Multiplexing specifies proper multiplexing of an Ogg bitstream in detail.
Multiple Ogg physical bitstreams may be concatenated into a single new stream; this is chaining. The bitstreams do not overlap; the final page of a given logical bitstream is immediately followed by the initial page of the next.
Each logical bitstream in a chain must have a unique serial number within the scope of the full physical bitstream, not only within a particular link or segment of the chain.
Within Ogg, each stream must be declared (by the codec) to be continuous- or discontinuous-time. Most codecs treat all streams they use as either inherently continuous- or discontinuous-time, although this is not a requirement. A codec may, as part of its mapping, choose according to data in the initial header.
Continuous-time pages are stamped by end-time, discontinuous pages are stamped by begin-time. Pages in a multiplexed stream are interleaved in order of the time stamp regardless of stream type. Both continuous and discontinuous logical streams are used to seek within a physical stream; however, only continuous streams are used to determine buffering depth. Because discontinuous streams are stamped by start time, they will always 'fall out' at the proper time when buffering the continuous streams. See 'Examples' for an illustration of the buffering mechanism.
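The stamping rule explains why discontinuous pages arrive in time. In this small hypothetical example, audio pages cover one second each and are stamped by their end time, while a caption starting at 1.5 s is stamped by its begin time. Interleaving by stamp places the caption ahead of the audio page that covers 1.5 s, so a decoder buffering audio for that moment has necessarily already read the caption.

```python
def page_stamp(begin, end, continuous):
    # Continuous-time pages are stamped by end time, discontinuous pages by begin time.
    return end if continuous else begin


def interleave(pages):
    """pages: (name, begin, end, continuous) tuples; sort by stamp (stable for ties)."""
    return sorted(pages, key=lambda p: page_stamp(p[1], p[2], p[3]))


pages = [
    ("audio0", 0.0, 1.0, True),
    ("audio1", 1.0, 2.0, True),
    ("audio2", 2.0, 3.0, True),
    ("caption", 1.5, 2.5, False),
]
order = [name for name, *_ in interleave(pages)]
```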
Multiplexing requirements within Ogg are straightforward. When constructing a single-link (unchained) physical bitstream consisting of multiple elementary streams:
The initial header for each stream appears in sequence, each header on a single page. All initial headers must appear with no intervening data (no auxiliary header pages or packets, no data pages or packets). Order of the initial headers is unspecified. The 'beginning of stream' flag is set on each initial header.
All auxiliary headers for all streams must follow. Order is unspecified. The final auxiliary header of each stream must flush its page.
Data pages for each stream follow, interleaved in time order.
The final page of each stream sets the 'end of stream' flag. Unlike initial pages, terminal pages for the logical bitstreams need not occur contiguously; indeed it may not be possible for them to do so.
Each grouped bitstream must have a unique serial number within the scope of the physical bitstream.
Multiplexed and/or unmultiplexed bitstreams may be chained consecutively. Such a physical bitstream obeys all the rules of both chained and multiplexed streams. Each link, when unchained, must stand on its own as a valid physical bitstream. Chained streams do not mix or interleave; a new segment may not begin until all streams in the preceding segment have terminated.
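The multiplexing and chaining rules lend themselves to a mechanical check. This hypothetical validator looks only at each page's serial number and beginning/end-of-stream flags, in physical order. It checks that the initial headers of a link are contiguous, that serial numbers are never reused anywhere in the chain, that no page appears for a stream that is not active, and that a new link begins only after every stream of the previous one has ended. Rules that need codec knowledge, such as where auxiliary headers end, are outside its scope.

```python
def check_physical_stream(pages):
    """pages: (serial, bos, eos) tuples in physical order (header flags only).

    Returns a list of rule violations; an empty list means no violation found.
    """
    errors = []
    seen = set()        # every serial ever begun: must be unique across the whole chain
    active = set()      # streams begun in the current link and not yet ended
    in_bos_run = False  # still inside the contiguous block of initial headers
    for i, (serial, bos, eos) in enumerate(pages):
        if bos:
            if serial in seen:
                errors.append(f"page {i}: serial {serial} reused")
            if active and not in_bos_run:
                # Streams are still running and data has already started.
                errors.append(f"page {i}: BOS after data or before previous link ended")
            seen.add(serial)
            active.add(serial)
            in_bos_run = True
        else:
            in_bos_run = False
            if serial not in active:
                errors.append(f"page {i}: page for inactive serial {serial}")
        if eos:
            active.discard(serial)
    if active:
        errors.append(f"streams never ended: {sorted(active)}")
    return errors
```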
Each codec is allowed some freedom in deciding how its logical bitstream is encapsulated into an Ogg bitstream (even if it is a trivial mapping, e.g., 'plop the packets in and go'). This is the codec's mapping. Ogg imposes a few mapping requirements on any codec.
The framing specification defines 'beginning of stream' and 'end of stream' page markers via a header flag (it is possible for a stream to consist of a single page). A correct stream always consists of an integer number of pages, an easy requirement given the variable-size nature of pages.
The first page of an elementary Ogg bitstream consists of a single, small 'initial header' packet that must include sufficient information to identify the exact CODEC type. From this initial header, the codec must also be able to determine its timebase and whether or not it is a continuous- or discontinuous-time stream. The initial header must fit on a single page. If a codec makes use of auxiliary headers (for example, Vorbis uses two auxiliary headers), these headers must follow the initial header immediately. The last header finishes its page; data begins on a fresh page.
As an example, Ogg Vorbis places the name and revision of the Vorbis CODEC, the audio rate and the audio quality into this initial header. Vorbis comments and detailed codec setup appear in the larger auxiliary headers.
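As a sketch of how a demuxer might recognize the codec from the initial header packet, the table below lists identification prefixes used by several well-known Ogg mappings. `vorbis_sample_rate` reads the audio rate from a Vorbis identification header (packet type, the string 'vorbis', a 32-bit version, an 8-bit channel count, then the 32-bit little-endian sample rate). The function names are hypothetical, and a real implementation would consult each mapping's specification for the full header layout.

```python
import struct

# Identification prefixes of initial header packets for some well-known mappings.
MAGIC = [
    (b"\x01vorbis", "vorbis"),
    (b"\x80theora", "theora"),
    (b"OpusHead", "opus"),
    (b"Speex   ", "speex"),
    (b"\x7fFLAC", "flac"),
    (b"fishead\x00", "skeleton"),
]


def identify(first_packet):
    """Return the codec name for an initial header packet, or None if unknown."""
    for magic, name in MAGIC:
        if first_packet.startswith(magic):
            return name
    return None


def vorbis_sample_rate(first_packet):
    # type (1) + 'vorbis' (6) + version (4) + channels (1) = 12 bytes before the rate.
    return struct.unpack_from("<I", first_packet, 12)[0]
```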
Granule positions must be translatable to an exact absolute time value. As described above, the mux layer is permitted to query a codec or codec stub plugin to perform this mapping. It is not necessary for an absolute time to be mappable into a single unique granule position value.
Codecs are not required to use a fixed duration-per-packet (for example, Vorbis does not). The mux layer is permitted to query a codec or codec stub plugin for the time duration of a packet.
Although an absolute time need not be translatable to a unique granule position, a codec must be able to determine the unique granule position of the current packet using the granule position of a preceding packet.
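Vorbis illustrates this requirement: packet durations vary with block size, yet each packet's granule position follows from the previous one. A packet contributes a quarter of the previous block size plus a quarter of its own block size in output samples, and the first packet contributes none. The sketch below assumes the block size of each packet is already known (in practice the decoder derives it from the packet's mode and the setup header). `vorbis_granules` is a hypothetical name, and the sketch ignores the end-of-stream trimming a real decoder applies.

```python
def vorbis_granules(blocksizes, start=0):
    """Granule position (PCM sample count) after each packet, given block sizes.

    Each packet after the first yields prev_blocksize // 4 + cur_blocksize // 4
    samples; the first packet only primes the overlap and yields none.
    """
    granules, gp, prev = [], start, None
    for bs in blocksizes:
        if prev is not None:
            gp += prev // 4 + bs // 4
        granules.append(gp)
        prev = bs
    return granules
```

For example, block sizes `[256, 256, 2048, 2048, 256]` give granule positions `[0, 128, 704, 1728, 2304]`: each value depends only on the previous granule position and the two block sizes involved.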
Packets and pages must be arranged in ascending granule-position and time order.
Below, we present an example of a multiplexed and chained bitstream:
In this example, we see pages from five total logical bitstreams multiplexed into a physical bitstream. Note the following characteristics: