Ogg Documentation

Page Multiplexing and Ordering in a Physical Ogg Stream

The low-level mechanisms of an Ogg stream (as described in the Ogg Bitstream Overview) provide means for mixing multiple logical streams and media types into a single linear-chronological stream. This document specifies the high-level arrangement and use of page structure to multiplex multiple streams of mixed media type within a physical Ogg stream.

Design Elements

The design and arrangement of the Ogg container format are governed by several high-level design decisions that form the reasoning behind specific low-level design decisions.

Linear media

The Ogg bitstream is intended to encapsulate chronological, time-linear mixed media into a single delivery stream or file. The design is such that an application can always encode and/or decode a full-featured bitstream in one pass with no seeking and minimal buffering. Seeking to provide optimized encoding (such as two-pass encoding) or interactive decoding (such as scrubbing or instant replay) is not disallowed or discouraged; however, no bitstream feature may require nonlinear operation on the bitstream.

Multiplexing

Ogg bitstreams multiplex multiple logical streams into a single physical stream at the page level. Each page contains an abstract time stamp (the Granule Position) that represents an absolute time landmark within the stream. After the pages representing stream headers (all logical stream headers occur at the beginning of a physical bitstream section before any logical stream data), logical stream data pages are arranged in a physical bitstream in strict non-decreasing order by chronological absolute time as specified by the granule position.
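
The ordering rule for data pages can be illustrated with a minimal Python sketch. The `Page` type and the `multiplex` function are hypothetical names used only for illustration; they are not part of libogg. The sketch assumes each page's granule position has already been mapped to absolute time by its codec, and it ignores header pages and pages with an unset granule position.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Page(NamedTuple):
    """Hypothetical stand-in for an Ogg data page (not a libogg type)."""
    serial: int    # identifies the logical stream the page belongs to
    time: float    # absolute time, in seconds, that the granule position maps to


def multiplex(*streams: Iterable[Page]) -> Iterator[Page]:
    """Interleave per-stream page sequences in non-decreasing time order.

    Each input must already be ordered by time, as the pages of a single
    logical stream are.
    """
    return heapq.merge(*streams, key=lambda page: page.time)


audio = [Page(1, 0.5), Page(1, 1.0), Page(1, 1.5)]
video = [Page(2, 0.4), Page(2, 1.0), Page(2, 1.6)]
muxed = list(multiplex(audio, video))
```

Pages from different streams that map to the same time are emitted in argument order here; that tie-break is an arbitrary choice of this sketch.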

The only exception to arranging pages in non-decreasing time order by granule position is pages that do not set the granule position value. This is a special case that arises when exceptionally large packets span multiple pages; the specifics of handling this special case are described later under 'Continuous and Discontinuous Streams'.

Seeking

Ogg is designed to use an interpolated bisection search to implement exact positional seeking. Interpolated bisection search is a spec-mandated mechanism.

An index may improve objective performance, but it seldom improves subjective performance outside of a few high-latency use cases, and it adds no functionality, as bisection search delivers the same functionality for both one- and two-pass stream types. For these reasons, use of indexes is discouraged, except in cases where an index provides a demonstrable and noticeable performance improvement.

Seek operations are by absolute time; a direct bisection search must find the exact time position requested. Information in the Ogg bitstream is arranged such that all information to be presented for playback from the desired seek point will occur at or after the desired seek point. Seek operations are neither 'fuzzy' nor heuristic.
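
As a rough illustration of an interpolated bisection search, the Python sketch below searches a sorted in-memory list of page times. This is a simplification and an assumption of the example: a real implementation would bisect over byte offsets in the physical stream and resynchronize to a page boundary after each probe. The name `seek_page` is hypothetical.

```python
from typing import Sequence


def seek_page(page_times: Sequence[float], target: float) -> int:
    """Return the index of the first page whose time is >= target.

    Returns len(page_times) if every page lies before the target.
    """
    lo, hi = 0, len(page_times)          # the answer always lies in [lo, hi]
    while lo < hi:
        t_lo, t_hi = page_times[lo], page_times[hi - 1]
        if target <= t_lo:
            return lo                    # the first remaining page already qualifies
        if target > t_hi:
            return hi                    # every remaining page is too early
        # Here t_lo < target <= t_hi, so the division below is safe.
        # Interpolate the probe position instead of taking the midpoint.
        fraction = (target - t_lo) / (t_hi - t_lo)
        guess = lo + int(fraction * (hi - 1 - lo))
        if page_times[guess] < target:
            lo = guess + 1
        else:
            hi = guess
    return lo


page_times = [0.0, 0.5, 1.0, 1.5, 2.0, 4.0]
index = seek_page(page_times, 1.2)
```

The result is exact rather than approximate: the search always returns the same page that a plain bisection would, and the interpolation only affects how many probes are needed.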

Although key frame handling in video appears to be an exception to "all needed playback information lies ahead of a given seek", key frames can still be handled directly within this indexless framework. Seeking to a key frame in video (as well as seeking in other media types with analogous constraints) is handled as two seeks: first a seek to the desired time, which extracts state information that decodes to the time of the last key frame, followed by a second seek directly to the key frame. The location of the previous key frame is embedded as state information in the granulepos; this mechanism is described in more detail later.

Continuous and Discontinuous Streams

Logical streams within a physical Ogg stream belong to one of two categories: "Continuous" streams and "Discontinuous" streams. Although these are discussed in more detail later, the distinction is important to a high-level understanding of how to buffer an Ogg stream.

A stream that provides a gapless, time-continuous media type with a fine-grained timebase is considered to be 'Continuous'. A continuous stream should never be starved of data. Clear examples of continuous data types include broadcast audio and video.

A stream that delivers data in a potentially irregular pattern or with widely spaced timing gaps is considered to be 'Discontinuous'. A discontinuous stream may be best thought of as data representing scattered events; although they happen in order, they are typically unconnected data often located far apart. One possible example of a discontinuous stream type would be captioning. Although it's possible to design captions as a continuous stream type, it's most natural to think of captions as widely spaced pieces of text with little happening between.

The fundamental design distinction between continuous and discontinuous streams concerns buffering.

Buffering

Because a continuous stream is, by definition, gapless, Ogg buffering is based on the simple premise of never allowing any active continuous stream to starve for data during decode; buffering proceeds ahead until all continuous streams in a physical stream have data ready to decode on demand.

Discontinuous stream data may occur on a fairly regular basis, but the timing of, for example, a specific caption is impossible to predict with certainty in most captioning systems. Thus the buffering system should take discontinuous data 'as it comes' rather than working ahead (for a potentially unbounded period) to look for future discontinuous data. As such, discontinuous streams are ignored when managing buffering; their pages simply 'fall out' of the stream when continuous streams are handled properly.

Buffering requirements need not be explicitly declared or managed for the encoded stream; the decoder simply reads as much data as is necessary to keep all continuous stream types gapless (also ensuring discontinuous data arrives in time) and no more, resulting in optimum implicit buffer usage for a given stream. Because all pages of all data types are stamped with absolute timing information within the stream, inter-stream synchronization timing is always explicitly maintained without the need for explicitly declared buffer-ahead hinting.
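
The buffering premise can be sketched in a few lines of Python. Everything here is an assumption made for illustration: pages are modeled as `(serial, payload)` pairs, the set of continuous serial numbers is taken as already known from the codecs, and `read_ahead` is a hypothetical name rather than a libogg function.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterator, Set, Tuple


def read_ahead(pages: Iterator[Tuple[int, str]],
               continuous: Set[int],
               buffers: Dict[int, Deque[str]]) -> bool:
    """Read pages until every continuous stream has at least one page queued.

    Returns False if the physical stream ends before that happens.
    """
    while not all(buffers[serial] for serial in continuous):
        try:
            serial, payload = next(pages)
        except StopIteration:
            return False
        # Discontinuous pages are queued as they arrive; they never drive
        # the decision to keep reading.
        buffers[serial].append(payload)
    return True


pages = iter([(1, "audio0"), (3, "caption0"), (1, "audio1"),
              (2, "video0"), (1, "audio2")])
buffers = defaultdict(deque)
ready = read_ahead(pages, continuous={1, 2}, buffers=buffers)
```

In this example reading stops as soon as the first video page arrives. The caption page is picked up along the way without being searched for, and the final audio page is left unread.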

Further details, mechanisms and reasons for the differing arrangement and behavior of continuous and discontinuous streams are discussed later.

Whole-stream navigation

Ogg is designed so that the simplest navigation operations treat the physical Ogg stream as a whole summary of its streams, rather than navigating each interleaved stream as a separate entity.

First Example: seeking to a desired time position in a multiplexed (or unmultiplexed) Ogg stream can be accomplished through a bisection search on the time position of all pages in the stream (as encoded in the granule position). More powerful searches (such as a key frame-aware seek within video) are also possible with additional search complexity, but similar computational complexity.

Second Example: A bitstream section may consist of three multiplexed streams of differing lengths. The result of multiplexing these streams should be thought of as a single mixed stream with a length equal to the longest of the three component streams. Although it is also possible to think of the multiplexed results as three concurrent streams of different lengths and it is possible to recover the three original streams, it will also become obvious that once multiplexed, it isn't possible to find the internal lengths of the component streams without a linear search of the whole bitstream section. However, it is possible to find the length of the whole bitstream section easily (in near-constant time per section) just as it is for a single-media unmultiplexed stream.

Granule Position

Description

The Granule Position is a signed 64-bit field appearing in the header of every Ogg page. Although the granule position represents absolute time within a logical stream, its value does not necessarily directly encode a simple timestamp. It may represent frames elapsed (as in Vorbis), a simple timestamp, or a more complex bit-division encoding (such as in Theora). The exact encoding of the granule position is up to a specific codec.

The granule position is governed by the following rules:

Example: timestamp

In general, a codec/stream type should choose the simplest granule position encoding that addresses its requirements. The examples here are by no means exhaustive of the possibilities within Ogg.

A simple granule position could encode a timestamp directly. For example, a granule position that encoded milliseconds from beginning of stream would allow a logical stream length of over 100,000,000,000 days before beginning a new logical stream (to avoid the granule position wrapping).
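
The figure quoted above can be checked with a little arithmetic. The sketch assumes the full non-negative range of a signed 64-bit value is available for the millisecond count; the constant names are chosen for this example only.

```python
MAX_GRANULE_POSITION = 2**63 - 1         # largest value of a signed 64-bit field
MS_PER_DAY = 24 * 60 * 60 * 1000         # milliseconds in one day

max_stream_days = MAX_GRANULE_POSITION // MS_PER_DAY
```

The quotient is a little under 107 billion days, consistent with the "over 100,000,000,000 days" stated above.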

Example: framestamp

A simple millisecond timestamp granule encoding might suit many stream types, but millisecond resolution is inappropriate for, e.g., most audio encodings, where exact single-sample resolution is generally a requirement. A millisecond is both too large a granule and often does not represent an integer number of samples.

In the event that audio frames are always encoded as the same number of samples, the granule position could simply be a linear count of frames since beginning of stream. This has the advantages of being exact and efficient. Position in time would simply be [granule_position] * [samples_per_frame] / [samples_per_second].
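
The formula above translates directly into code. The sketch below uses exact rational arithmetic so that no rounding is introduced; the function name and the example figures (1152 samples per frame at 44100 samples per second) are assumptions chosen only for illustration.

```python
from fractions import Fraction


def framestamp_to_seconds(granule_position: int,
                          samples_per_frame: int,
                          samples_per_second: int) -> Fraction:
    """Map a frame-count granule position to absolute time in seconds."""
    return Fraction(granule_position * samples_per_frame, samples_per_second)


# 100 frames of 1152 samples each, at 44100 samples per second.
position = framestamp_to_seconds(100, 1152, 44100)
```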

Example: samplestamp (Vorbis)

Frame counting is insufficient in codecs such as Vorbis where an audio frame [packet] encodes a variable number of samples. In Vorbis's case, the granule position is a count of the number of raw samples from the beginning of stream; the absolute time of a granule position is [granule_position] / [samples_per_second].
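
A small sketch of sample counting with variable-size packets follows. The per-packet sample counts and the sample rate are invented for illustration; in real Vorbis streams the number of samples a packet yields follows from the codec's block sizes and overlap, which this example does not model.

```python
from fractions import Fraction
from itertools import accumulate

SAMPLES_PER_SECOND = 44100                    # assumed sample rate
packet_samples = [128, 1024, 1024, 128]       # hypothetical samples per packet

# The granule position after each packet is the running total of samples.
granule_positions = list(accumulate(packet_samples))

# Absolute time is [granule_position] / [samples_per_second].
end_times = [Fraction(g, SAMPLES_PER_SECOND) for g in granule_positions]
```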

Example: bit-divided framestamp (Theora)

Some video codecs may be able to use the simple framestamp scheme for granule position. However, most modern video codecs introduce at least the following complications:

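1. Frame transitions may fall at a fixed offset within the frame period rather than at its start.
2. Frame rates are often non-integer.
3. Decoding must begin at a key frame.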

The first two points can be handled straightforwardly via the fact that the codec has complete control over mapping granule position to absolute time; non-integer frame rates and offsets can be set in the codec's initial header, and the rest is just arithmetic.

The third point appears trickier at first glance, but it too can be handled through the granule position mapping mechanism. Here we arrange the granule position in such a way that granule positions of key frames are easy to find. Divide the granule position into two fields; the most significant bits are an absolute frame counter, but it is updated only at each key frame. The least significant bits encode the number of frames since the last key frame. In this way, each granule position encodes both the absolute time of the current frame and the absolute time of the last key frame.
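
The two-field layout can be sketched with shifts and masks. The field width `KEYFRAME_SHIFT = 6` is an assumption made for this example, as are the function names; a real codec would declare its own split, for instance in its initial header.

```python
from typing import Tuple

KEYFRAME_SHIFT = 6                             # assumed width of the low field, in bits
OFFSET_MASK = (1 << KEYFRAME_SHIFT) - 1


def pack_granulepos(keyframe_index: int, frames_since_keyframe: int) -> int:
    """Combine the last key frame's index and the offset from it."""
    if not 0 <= frames_since_keyframe <= OFFSET_MASK:
        raise ValueError("offset does not fit in the low field")
    return (keyframe_index << KEYFRAME_SHIFT) | frames_since_keyframe


def unpack_granulepos(granulepos: int) -> Tuple[int, int]:
    """Return (index of the last key frame, frames since that key frame)."""
    return granulepos >> KEYFRAME_SHIFT, granulepos & OFFSET_MASK


def frame_index(granulepos: int) -> int:
    """Absolute index of the current frame."""
    keyframe_index, offset = unpack_granulepos(granulepos)
    return keyframe_index + offset


granulepos = pack_granulepos(120, 7)
```

With these assumed widths, a frame seven frames after the key frame at index 120 packs to 7687, from which both the key frame index (120) and the current frame index (127) can be recovered.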

Seeking to the most recent preceding key frame is then accomplished by first seeking to the original desired point, inspecting the granulepos of the resulting video page, extracting from that granulepos the absolute time of the desired key frame, and then seeking directly to that key frame's page. Of course, it's still possible for an application to ignore key frames and use a simpler seeking algorithm (decode would be unable to present decoded video until the next key frame). Surprisingly many player applications do choose the simpler approach.
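
The two-seek procedure might look like the following sketch. It assumes one video frame per page, a granulepos whose low 6 bits hold the frame count since the last key frame (an invented width), and an in-memory list standing in for the physical stream. The function names are hypothetical, and the standard `bisect` module stands in for a bisection search over the file.

```python
import bisect
from typing import List

KEYFRAME_SHIFT = 6                             # assumed width of the low field, in bits
OFFSET_MASK = (1 << KEYFRAME_SHIFT) - 1


def frame_of(granulepos: int) -> int:
    """Absolute frame index encoded by a bit-divided granulepos."""
    return (granulepos >> KEYFRAME_SHIFT) + (granulepos & OFFSET_MASK)


def keyframe_of(granulepos: int) -> int:
    """Absolute frame index of the last key frame."""
    return granulepos >> KEYFRAME_SHIFT


def seek_to_keyframe(page_granules: List[int], target_frame: int) -> int:
    """Return the index of the page holding the key frame needed for target_frame."""
    # Building this list is linear; it stands in for decoding granule
    # positions on demand while bisecting a real stream.
    frames = [frame_of(g) for g in page_granules]
    first = bisect.bisect_left(frames, target_frame)        # first seek: desired point
    if first == len(frames):
        raise ValueError("target frame lies beyond the end of the stream")
    key = keyframe_of(page_granules[first])                  # inspect the granulepos
    return bisect.bisect_left(frames, key)                   # second seek: the key frame


# Eight frames with key frames at frame 0 and frame 4.
page_granules = [0, 1, 2, 3, 4 << KEYFRAME_SHIFT,
                 (4 << KEYFRAME_SHIFT) | 1, (4 << KEYFRAME_SHIFT) | 2,
                 (4 << KEYFRAME_SHIFT) | 3]
start_page = seek_to_keyframe(page_granules, 6)
```

Seeking to frame 6 lands on the page for frame 4, the preceding key frame, without any index.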

Granule position, packets and pages

Although each packet of data in a logical stream theoretically has a specific granule position, only one granule position is encoded per page. It is possible to encode a logical stream such that each page contains only a single packet (so that granule positions are preserved for each packet); however, a one-to-one packet/page mapping is not intended to be the general case.

Because Ogg functions at the page, not packet, level, this once-per-page time information provides Ogg with the finest-grained time information it can use. Ogg passes this granule positioning data to the codec (along with the packets extracted from a page); it is the responsibility of codecs to track timing information at granularities finer than a single page.

Start-time and end-time positioning

A granule position represents the instantaneous time location between two pages. However, continuous streams and discontinuous streams differ on whether the granulepos represents the end-time of the data on a page or the start-time. Continuous streams are 'end-time' encoded; the granulepos represents the point in time immediately after the last data decoded from a page. Discontinuous streams are 'start-time' encoded; the granulepos represents the point in time of the first data decoded from the page.

An Ogg stream type is declared continuous or discontinuous by its codec. A given codec may support both continuous and discontinuous operation so long as any given logical stream is continuous or discontinuous for its entirety and the codec is able to ascertain (and inform the Ogg layer) which applies after decoding the initial stream header. The majority of codecs will always be continuous (such as Vorbis) or discontinuous (such as Writ).

Start- and end-time encoding do not affect multiplexing sort-order; pages are still sorted by the absolute time a given granulepos maps to regardless of whether that granulepos represents start- or end-time.

Multiplex/Demultiplex Division of Labor

The Ogg multiplex/demultiplex layer provides mechanisms for encoding raw packets into Ogg pages, decoding Ogg pages back into the original codec packets, determining the logical structure of an Ogg stream, and navigating through and synchronizing with an Ogg stream at a desired stream location. Strict multiplex/demultiplex operations are entirely in the Ogg domain and require no intervention from codecs.

Implementation of more complex operations does require codec knowledge, however. Unlike other framing systems, Ogg maintains strict separation between framing and the framed bitstream data; Ogg does not replicate codec-specific information in the page/framing data, nor does Ogg blur the line between framing and stream data/metadata. Because Ogg is fully data-agnostic toward the data it frames, operations which require specifics of bitstream data (such as 'seek to key frame') also require interaction with the codec layer (because, in this example, the Ogg layer is not aware of the concept of key frames). This is different from systems that blur the separation between framing and stream data in order to simplify the separation of code. The Ogg system purposely keeps the distinction in data simple so that later codec innovations are not constrained by framing design.

For this reason, however, complex seeking operations require interaction with the codecs in order to decode the granule position of a given stream type back to absolute time or in order to find 'decodable points' such as key frames in video.

Unsorted Discussion Points

Flushes around key frames? RFC suggestion: repaginating or building a stream this way is nice but not required.

Appendix A: multiplexing examples
