This document serves as a starting point for understanding the design and implementation of the Ogg container format. If you're new to Ogg or merely want a high-level technical overview, start reading here. Other documents linked from the index page give distilled technical descriptions and references of the container mechanisms. This document is intended to aid understanding.
Ogg is intended to be a simplest-possible container, concerned only with framing, ordering, and interleave. It can be used as a stream delivery mechanism, for media file storage, or as a building block toward implementing a more complex, non-linear container (for example, see the Skeleton or Annodex/CMML).
The Ogg container is not intended to be a monolithic 'kitchen-sink'. It exists only to frame and deliver in-order stream data and as such is vastly simpler than most other containers. Elementary and multiplexed streams are both constructed entirely from a single building block (an Ogg page) composed of eight fields totalling twenty-seven bytes (the page header), a list of packet lengths (up to 255 bytes), and payload data (up to 65025 bytes). The structure of every page is the same. There are no optional fields or alternate encodings.
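The page layout described above can be sketched with a fixed-size unpack. This is a minimal illustration, not the reference implementation: the field order and little-endian encoding follow the Ogg framing specification, while the function name and the dictionary keys are assumptions made for this example.

```python
import struct

# Fixed 27-byte page header: eight little-endian fields with no padding.
# capture pattern, version, header type flags, granule position,
# stream serial number, page sequence number, checksum, segment count
PAGE_HEADER = struct.Struct("<4sBBqIIIB")

def parse_page_header(data):
    """Split the start of an Ogg page into header fields and packet lengths."""
    (capture, version, flags, granule_position,
     serial, sequence, checksum, segment_count) = PAGE_HEADER.unpack_from(data, 0)
    if capture != b"OggS" or version != 0:
        raise ValueError("not an Ogg page")
    # The list of packet lengths (lacing values) follows the fixed header.
    lacing = data[PAGE_HEADER.size:PAGE_HEADER.size + segment_count]
    if len(lacing) != segment_count:
        raise ValueError("truncated list of packet lengths")
    return {
        "flags": flags,
        "granule_position": granule_position,
        "serial": serial,
        "sequence": sequence,
        "checksum": checksum,
        "lacing": list(lacing),
        "payload_size": sum(lacing),  # at most 255 * 255 = 65025 bytes
    }
```

Because every page has the same structure, this one function is enough to walk a whole stream: the next page starts `27 + segment_count + payload_size` bytes after the current one.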
Stream and media metadata is contained in Ogg and not built into the Ogg container itself. Metadata is thus compartmentalized and layered rather than part of a monolithic design, an especially good idea as no two groups seem able to agree on what a complete or complete-enough metadata set should be. In this way, the container and container implementation are isolated from unnecessary metadata design flux.
The Ogg container is primarily a streaming format, encapsulating chronological, time-linear mixed media into a single delivery stream or file. The design is such that an application can always encode and/or decode all features of a bitstream in one pass with no seeking and minimal buffering. Seeking to provide optimized encoding (such as two-pass encoding) or interactive decoding (such as scrubbing or instant replay) is not disallowed or discouraged; however, no container feature requires nonlinear access of the bitstream.
Ogg is designed to contain any size data payload with bounded, predictable efficiency. Ogg packets have no maximum size and a zero-byte minimum size. There is no restriction on size changes from packet to packet. Variable size packets do not require the use of any optional or additional container features. There is no optimal suggested packet size, though special consideration was paid to make sure 50-200 byte packets were no less efficient than larger packet sizes. The original design criterion was a 2% overhead at 50 byte packets, dropping to a maximum working overhead of 1% with larger packets, and a typical working overhead of 0.5-0.7% for most practical uses.
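A rough way to explore these overhead figures is to count container bytes per page. The sketch below assumes the page layout from the Ogg framing specification (a 27-byte header plus one length byte per 255-byte segment); the function name and the page fill counts in the usage note are assumptions chosen for illustration, not figures from the original design work.

```python
def page_overhead(packet_size, packets_per_page):
    """Return container bytes divided by payload bytes for one page
    filled with equal-size, non-empty packets."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive for a ratio")
    # A packet of n bytes needs n // 255 + 1 entries in the length list.
    segments = (packet_size // 255 + 1) * packets_per_page
    if segments > 255:
        raise ValueError("a page holds at most 255 segments")
    container_bytes = 27 + segments
    payload_bytes = packet_size * packets_per_page
    return container_bytes / payload_bytes
```

For example, `page_overhead(50, 255)` is 282 / 12750, a little over 2%, and `page_overhead(1000, 63)` is 279 / 63000, under 0.5%. Smaller pages raise the ratio because the fixed header is amortized over less payload.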
Ogg is a byte-aligned container with no context-dependent, optional or variable-length fields. Ogg requires no repacking of codec data. The page structure is written out in-line as packet data is submitted to the streaming abstraction. In addition, it is possible to implement both Ogg mux and demux as MT-hot zero-copy abstractions (as is done in the Tremor sourcebase).
Ogg is designed for efficient and immediate stream capture with high confidence. Although packets have no size limit in Ogg, pages are a maximum of just under 64kB, meaning that any Ogg stream can be captured with confidence after seeing 128kB of data or less [worst case; typical figure is 6kB] from any random starting point in the stream.
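Capture from an arbitrary position amounts to scanning forward for the next page header. The following is a simplified sketch: it checks only the "OggS" capture pattern and the version byte, whereas a real decoder also verifies the page checksum before trusting the page. The function name is an assumption for this example.

```python
def find_capture(buffer, start=0):
    """Return the offset of the next plausible page header at or after
    start, or -1 if none is found in the buffer."""
    pos = buffer.find(b"OggS", start)
    while pos != -1:
        # The full 27-byte fixed header must be present and the
        # stream structure version byte must be zero.
        if pos + 27 <= len(buffer) and buffer[pos + 4] == 0:
            return pos
        pos = buffer.find(b"OggS", pos + 1)
    return -1
```

Since a page is just under 64kB at most, a scan that starts in the middle of one page reaches the header of the next within that bound, and seeing that page in full stays within the 128kB worst case quoted above.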
Ogg implements simple coarse- and fine-grained seeking by design.
Coarse seeking may be performed by simply 'moving the tone arm' to a new position and 'dropping the needle'. Rapid capture with accompanying timecode from any location in an Ogg file is guaranteed by the stream design. From the acquisition of the first timecode, all data needed to play back from that timecode forward is ahead of the stream cursor.
Ogg implements full sample-granularity seeking using an interpolated bisection search built on the capture and timecode mechanisms used by coarse seeking. As above, once a search finds the desired timecode, all data needed to play back from that timecode forward is ahead of the stream cursor.
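The search can be sketched over a toy model of a file. This is an illustration under stated assumptions, not the algorithm used by any particular library: the `pages` list of (byte offset, granule position) tuples stands in for a real file, the inner `capture` helper stands in for "seek to a byte position and sync to the next page", and the fallback to a plain midpoint after an unproductive probe is a choice made here to keep convergence fast.

```python
import bisect

def seek_page(pages, target, file_size):
    """Find the first page whose granule position is >= target.

    pages: non-empty list of (byte_offset, granule_position) tuples sorted
    by offset, with granule positions ascending. Returns None if the
    target lies past the end of the stream.
    """
    offsets = [offset for offset, _ in pages]

    def capture(byte_pos):
        # Stand-in for dropping the needle at byte_pos and reading a page.
        i = bisect.bisect_left(offsets, byte_pos)
        return pages[i] if i < len(pages) else None

    lo, lo_gp = 0, 0                     # every page before lo is too early
    hi, hi_gp = file_size, pages[-1][1]  # the answer, if any, starts before hi
    best = None
    use_midpoint = False
    while lo < hi:
        if use_midpoint or hi_gp <= lo_gp:
            guess = (lo + hi) // 2
        else:
            # Interpolate the byte position from the granule positions.
            guess = lo + (hi - lo) * (target - lo_gp) // (hi_gp - lo_gp)
            guess = max(lo, min(guess, hi - 1))
        page = capture(guess)
        if page is None or page[0] >= hi:
            hi = guess                   # no new page starts in [guess, hi)
            use_midpoint = True
            continue
        use_midpoint = False
        if page[1] < target:
            lo, lo_gp = page[0] + 1, page[1]
        else:
            best = page
            hi, hi_gp = page[0], page[1]
    return best
```

No index is consulted at any point: each probe relies only on page capture and the granule position stamped on the captured page.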
Both coarse and fine seeking use the page structure and sequencing inherent to the Ogg format. All Ogg streams are fully seekable from creation; seekability is unaffected by truncation or missing data, and is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor heuristic.
Seeking without use of an index is a major point of the Ogg design. There are two primary reasons why Ogg transport forgoes an index:
In addition, it must be possible to create an Ogg stream in a single pass. Although an optional index can simply be tacked on the end of the created stream, some software groups object to end-positioned indexes and claim to be unwilling to support indexes not located at the stream beginning.
All this said, it's become clear that an optional index is a demanded feature. For this reason, the OggSkeleton now defines a proposed index.
Ogg multiplexes streams by interleaving pages from multiple elementary streams into a multiplexed stream in time order. The multiplexed pages are not altered. Muxing an Ogg AV stream out of separate audio, video and data streams is akin to shuffling several decks of cards together into a single deck; the cards themselves remain unchanged. Demultiplexing is similarly simple (as the cards are marked).
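The card-shuffling analogy maps directly onto a merge of sorted sequences. In this minimal sketch a page is modelled as a hypothetical (timestamp in seconds, serial number, payload) tuple; real pages carry a granule position that the codec maps to time, and real muxing also places header pages first, both of which are left out here.

```python
import heapq
from collections import defaultdict

def mux(*elementary_streams):
    """Interleave whole pages from several streams in time order.
    Each input must already be sorted by timestamp."""
    return list(heapq.merge(*elementary_streams, key=lambda page: page[0]))

def demux(multiplexed_pages):
    """Deal the pages back out by the serial number stamped on each one."""
    streams = defaultdict(list)
    for page in multiplexed_pages:
        streams[page[1]].append(page)
    return dict(streams)
```

Neither function touches a page's contents, which is why demultiplexing recovers the elementary streams exactly as they went in.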
The goal of this design is to make the mux/demux operation as trivial as possible to allow live streaming systems to build and rebuild streams on the fly with minimal CPU usage and no additional storage or latency requirements.
Ogg streams belong to one of two categories: "Continuous" streams and "Discontinuous" streams.
A stream that provides a gapless, time-continuous media type with a fine-grained timebase is considered to be 'Continuous'. A continuous stream should never be starved of data. Examples of continuous data types include broadcast audio and video.
A stream that delivers data in a potentially irregular pattern or with widely spaced timing gaps is considered to be 'Discontinuous'. A discontinuous stream may be best thought of as data representing scattered events; although they happen in order, they are typically unconnected data often located far apart. One example of a discontinuous stream type is captioning, such as Ogg Kate. Although it's possible to design captions as a continuous stream type, it's most natural to think of captions as widely spaced pieces of text with little happening between.
The fundamental reason for the distinction between continuous and discontinuous streams concerns buffering.
A continuous stream is, by definition, gapless. Ogg buffering is based on the simple premise of never allowing an active continuous stream to starve for data during decode; buffering works ahead until all continuous streams in a physical stream have data ready and no further.
Discontinuous stream data is not assumed to be predictable. The buffering design takes discontinuous data 'as it comes' rather than working ahead to look for future discontinuous data for a potentially unbounded period. Thus, the buffering process makes no attempt to fill discontinuous stream buffers; their pages simply 'fall out' of the stream when continuous streams are handled properly.
Buffering requirements in this design need not be explicitly declared or managed in the encoded stream. The decoder simply reads as much data as is necessary to keep all continuous stream types gapless and no more, with discontinuous data processed as it arrives in the continuous data. Buffering is implicitly optimal for the given stream. Because all pages of all data types are stamped with absolute timing information within the stream, inter-stream synchronization timing is always maintained without the need for explicitly declared buffer-ahead hinting.
Ogg does not replicate codec-specific metadata into the mux layer in an attempt to make the mux and codec layer implementations 'fully separable'. Things like specific timebase, keyframing strategy, frame duration, etc, do not appear in the Ogg container. The mux layer is, instead, expected to query a codec through a centralized interface, left to the implementation, for this data when it is needed.
Though modern design wisdom usually prefers to predict all possible needs of current and future codecs then embed these dependencies and the required metadata into the container itself, this strategy increases container specification complexity, fragility, and rigidity. The mux and codec code becomes more independent, but the specifications become logically less independent. A codec can't do what a container hasn't already provided for. Novel codecs are harder to support, and you can do fewer useful things with the ones you've already got (eg, try to make a good splitter without using any codecs. Such a splitter is limited to splitting at keyframes only, or building yet another new mechanism into the container layer to mark what frames to skip displaying).
Ogg's design goes in the opposite direction, where the specification is to be as simple, easy to understand, and 'proofed' against novel codecs as possible. When an Ogg mux layer requires codec-specific information, it queries the codec (or a codec stub). This trades a more complex implementation for a simpler, more flexible specification.
The Ogg container itself does not define a metadata system for declaring the structure and interrelations between multiple media types in a muxed stream. That is, the Ogg container itself does not specify data like 'which stream is the subtitle stream?' or 'which video stream is the primary angle?'. This metadata still exists, but is stored by the Ogg container rather than being built into the Ogg container itself. Xiph specifies the 'Skeleton' metadata format for Ogg streams, but this decoupling of container and stream structure metadata means it is possible to use Ogg with any metadata specification without altering the container itself, or without stream structure metadata at all.
Every Ogg page is stamped with a 64-bit 'granule position' that serves as an absolute timestamp for mux and seeking. A few nifty little tricks are usually also embedded in the granpos state, but we'll leave those aside for the moment (strictly speaking, they're part of each codec's mapping, not Ogg).
As mentioned above, granule positions are mapped into absolute timestamps by the codec, rather than being a hard timestamp. This allows maximally efficient use of the available 64 bits to address every sample/frame position without approximation while supporting new and previously unknown timebase encodings without needing to extend or update the mux layer. When a codec needs a novel timebase, it simply brings the code for that mapping along with it. This is not a theoretical curiosity; new, wholly novel timebases were deployed with the adoption of both Theora and Dirac. "Rolling INTRA" (keyframeless video) also benefits from novel use of the granule position.
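To make the idea of a codec-supplied mapping concrete, here are two hypothetical mappings. The first treats the granule position as a running sample count, in the style of a PCM audio codec. The second splits the 64 bits into a keyframe index and an offset from that keyframe, in the style used by some video codecs. Both are simplified sketches with assumed names and parameters; the exact rules for any real codec are defined by that codec's own mapping specification.

```python
from fractions import Fraction

def sample_count_to_time(granule_position, sample_rate):
    """Granule position counts samples; time is an exact fraction of seconds."""
    return Fraction(granule_position, sample_rate)

def split_granule_to_time(granule_position, shift, frames_per_second):
    """Upper bits hold the index of the last keyframe, the low `shift`
    bits hold the number of frames since that keyframe."""
    keyframe = granule_position >> shift
    offset = granule_position & ((1 << shift) - 1)
    return Fraction(keyframe + offset) / frames_per_second
```

The mux layer needs only the resulting time value. How the 64 bits are laid out stays private to the codec, which is what lets a new codec bring a new timebase without any change to the container.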
Ogg codecs place raw compressed data into packets. Packets are octet payloads containing the data needed for a single decompressed unit, eg, one video frame. Packets have no maximum size and may be zero length. They do not generally have any framing information; strung together, the unframed packets form a logical bitstream of codec data with no internal landmarks.
Packets of raw codec data are not typically internally framed. When they are strung together into a stream without any container to provide framing, they lose their individual boundaries. Seek and capture are not possible within an unframed stream, and for many codecs with variable length payloads and/or early-packet termination (such as Vorbis), it may become impossible to recover the original frame boundaries even if the stream is scanned linearly from beginning to end.
Logical bitstream packets are grouped and framed into Ogg pages along with a unique stream serial number to produce a physical bitstream. An elementary stream is a physical bitstream containing only a single logical bitstream. Each page is a self-contained entity, although a packet may be split and encoded across one or more pages. The page decode mechanism is designed to recognize, verify and handle single pages at a time from the overall bitstream.
The primary purpose of a container is to provide framing for raw packets, marking the packet boundaries so the exact packets can be retrieved for decode later. The container also provides secondary functions such as capture, timestamping, sequencing, stream identification and so on. Not all of these functions are represented in the diagram.
In the Ogg container, pages do not necessarily contain integer numbers of packets. Packets may span across page boundaries or even multiple pages. This is necessary as pages have a maximum possible size in order to provide capture guarantees, but packet size is unbounded.
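How an unbounded packet fits into bounded pages can be sketched with the length encoding from the Ogg framing specification: a packet is cut into 255-byte segments, each segment contributes one byte to the page's list of packet lengths, and a value below 255 marks the end of the packet. The function names and the `free_segments` parameter are assumptions made for this example.

```python
def lacing_values(packet_size):
    """Length-list entries for one packet: a run of 255s, then a final
    value below 255 (possibly 0) that terminates the packet."""
    return [255] * (packet_size // 255) + [packet_size % 255]

def spread_over_pages(packet_size, free_segments=255):
    """Distribute one packet's length entries over consecutive pages.
    free_segments is the room left in the first page's list; every
    following page offers the full 255 entries."""
    remaining = lacing_values(packet_size)
    pages = []
    room = free_segments
    while remaining:
        pages.append(remaining[:room])
        remaining = remaining[room:]
        room = 255
    return pages
```

A 100000-byte packet needs 393 entries, so it occupies one full page and part of a second. The first page's list ends on 255, which is how a decoder knows the packet continues on the next page.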
Ogg Bitstream Framing specifies the page format of an Ogg bitstream, the packet coding process and elementary bitstreams in detail.
Multiple logical/elementary bitstreams can be combined into a single multiplexed bitstream by interleaving whole pages from each contributing elementary stream in time order. The result is a single physical stream that multiplexes and frames multiple logical streams. Each logical stream is identified by the unique stream serial number stamped in its pages. A physical stream may include a 'meta-header' (such as the Ogg Skeleton) comprising its own Ogg page at the beginning of the physical stream. A decoder recovers the original logical/elementary bitstreams out of the physical bitstream by taking the pages in order from the physical bitstream and redirecting them into the appropriate logical decoding entity.
Multiple media types are multiplexed into a single Ogg stream by interleaving the pages from each elementary physical stream.
Ogg Bitstream Multiplexing specifies proper multiplexing of an Ogg bitstream in detail.
Multiple Ogg physical bitstreams may be concatenated into a single new stream; this is chaining. The bitstreams do not overlap; the final page of a given logical bitstream is immediately followed by the initial page of the next.
Each logical bitstream in a chain must have a unique serial number within the scope of the full physical bitstream, not only within a particular link or segment of the chain.
Within Ogg, each stream must be declared (by the codec) to be continuous- or discontinuous-time. Most codecs treat all streams they use as either inherently continuous- or discontinuous-time, although this is not a requirement. A codec may, as part of its mapping, choose according to data in the initial header.
Continuous-time pages are stamped by end-time; discontinuous pages are stamped by begin-time. Pages in a multiplexed stream are interleaved in order of the time stamp regardless of stream type. Both continuous and discontinuous logical streams are used to seek within a physical stream; however, only continuous streams are used to determine buffering depth. Because discontinuous streams are stamped by start time, they will always 'fall out' at the proper time when buffering the continuous streams. See 'Examples' for an illustration of the buffering mechanism.
Multiplexing requirements within Ogg are straightforward. When constructing a single-link (unchained) physical bitstream consisting of multiple elementary streams:
The initial header for each stream appears in sequence, each header on a single page. All initial headers must appear with no intervening data (no auxiliary header pages or packets, no data pages or packets). Order of the initial headers is unspecified. The 'beginning of stream' flag is set on each initial header.
All auxiliary headers for all streams must follow. Order is unspecified. The final auxiliary header of each stream must flush its page.
Data pages for each stream follow, interleaved in time order.
The final page of each stream sets the 'end of stream' flag. Unlike initial pages, terminal pages for the logical bitstreams need not occur contiguously; indeed it may not be possible for them to do so.
Each grouped bitstream must have a unique serial number within the Chris@1: scope of the physical bitstream.
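Several of the requirements above can be checked mechanically from the page headers alone. The sketch below models a page as a hypothetical (serial number, header type flags) tuple and uses the flag values from the Ogg framing specification (0x02 for 'beginning of stream', 0x04 for 'end of stream'). It checks only header grouping, serial uniqueness and stream termination; auxiliary header placement and time ordering need codec knowledge and are not covered.

```python
BOS = 0x02  # 'beginning of stream' flag
EOS = 0x04  # 'end of stream' flag

def check_link(pages):
    """Return a list of problems found in one unchained physical bitstream."""
    problems = []
    seen, ended = set(), set()
    headers_done = False
    for index, (serial, flags) in enumerate(pages):
        if flags & BOS:
            if headers_done:
                problems.append(f"page {index}: initial header after other pages")
            if serial in seen:
                problems.append(f"page {index}: serial {serial} reused")
            seen.add(serial)
        else:
            headers_done = True
            if serial not in seen:
                problems.append(f"page {index}: no initial header for serial {serial}")
            if serial in ended:
                problems.append(f"page {index}: page after end of stream {serial}")
        if flags & EOS:
            ended.add(serial)
    for serial in sorted(seen - ended):
        problems.append(f"serial {serial}: no end of stream page")
    return problems
```

An empty result means the link passes these particular checks, not that the stream is fully valid.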
Multiplexed and/or unmultiplexed bitstreams may be chained consecutively. Such a physical bitstream obeys all the rules of both chained and multiplexed streams. Each link, when unchained, must stand on its own as a valid physical bitstream. Chained streams do not mix or interleave; a new segment may not begin until all streams in the preceding segment have terminated.
Each codec is allowed some freedom in deciding how its logical bitstream is encapsulated into an Ogg bitstream (even if it is a trivial mapping, eg, 'plop the packets in and go'). This is the codec's mapping. Ogg imposes a few mapping requirements on any codec.
The framing specification defines 'beginning of stream' and 'end of stream' page markers via a header flag (it is possible for a stream to consist of a single page). A correct stream always consists of an integer number of pages, an easy requirement given the variable size nature of pages.
The first page of an elementary Ogg bitstream consists of a single, small 'initial header' packet that must include sufficient information to identify the exact codec type. From this initial header, the codec must also be able to determine its timebase and whether or not it is a continuous- or discontinuous-time stream. The initial header must fit on a single page. If a codec makes use of auxiliary headers (for example, Vorbis uses two auxiliary headers), these headers must follow the initial header immediately. The last header finishes its page; data begins on a fresh page.
As an example, Ogg Vorbis places the name and revision of the Vorbis codec, the audio rate and the audio quality into this initial header. Vorbis comments and detailed codec setup appear in the larger auxiliary headers.
Granule positions must be translatable to an exact absolute time value. As described above, the mux layer is permitted to query a codec or codec stub plugin to perform this mapping. It is not necessary for an absolute time to be mappable into a single unique granule position value.
Codecs are not required to use a fixed duration-per-packet (for example, Vorbis does not). The mux layer is permitted to query a codec or codec stub plugin for the time duration of a packet.
Although an absolute time need not be translatable to a unique granule position, a codec must be able to determine the unique granule position of the current packet using the granule position of a preceding packet.
Packets and pages must be arranged in ascending granule-position and time order.
Below, we present an example of a multiplexed and chained bitstream:
In this example, we see pages from five total logical bitstreams multiplexed into a physical bitstream. Note the following characteristics: