The low-level mechanisms of an Ogg stream (as described in the Ogg Bitstream Overview) provide means for mixing multiple logical streams and media types into a single linear-chronological stream. This document specifies the high-level arrangement and use of page structure to multiplex multiple streams of mixed media type within a physical Ogg stream.
The design and arrangement of the Ogg container format are governed by several high-level design decisions that form the reasoning behind specific low-level design decisions.
The Ogg bitstream is intended to encapsulate chronological, time-linear mixed media into a single delivery stream or file. The design is such that an application can always encode and/or decode a full-featured bitstream in one pass with no seeking and minimal buffering. Seeking to provide optimized encoding (such as two-pass encoding) or interactive decoding (such as scrubbing or instant replay) is not disallowed or discouraged; however, no bitstream feature may require nonlinear operation on the bitstream.
Ogg bitstreams multiplex multiple logical streams into a single physical stream at the page level. Each page contains an abstract time stamp (the Granule Position) that represents an absolute time landmark within the stream. After the pages representing stream headers (all logical stream headers occur at the beginning of a physical bitstream section before any logical stream data), logical stream data pages are arranged in a physical bitstream in strict non-decreasing order by chronological absolute time as specified by the granule position.
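As a rough illustration of the ordering rule above, the following sketch interleaves the data pages of several logical streams by the absolute time their granule positions map to. The `Page` tuple, its `serial` and `time` fields, and the `multiplex` helper are hypothetical names introduced here; header pages and pages without a granule position are left out, and the granule position is assumed to have been converted to seconds by the codec already.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Page(NamedTuple):
    serial: int   # hypothetical identifier of the logical stream
    time: float   # absolute time in seconds, decoded from the granule position


def multiplex(streams: Iterable[Iterable[Page]]) -> Iterator[Page]:
    """Interleave per-stream page sequences in non-decreasing time order.

    Each input sequence must itself already be ordered by time.
    """
    return heapq.merge(*streams, key=lambda page: page.time)


audio = [Page(1, 0.5), Page(1, 1.0), Page(1, 1.5)]
video = [Page(2, 0.4), Page(2, 1.0), Page(2, 1.6)]
muxed = list(multiplex([audio, video]))
```

Pages with equal times keep the order of the input streams, which is consistent with a non-decreasing (rather than strictly increasing) ordering.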
The only exception to arranging pages in non-decreasing time order by granule position is pages that do not set the granule position value. This special case arises when exceptionally large packets span multiple pages; the specifics of handling it are described later under 'Continuous and Discontinuous Streams'.
Ogg is designed to use an interpolated bisection search to implement exact positional seeking. Interpolated bisection search is a spec-mandated mechanism.
An index may improve objective performance, but it seldom improves subjective performance outside of a few high-latency use cases, and it adds no functionality, as bisection search delivers the same functionality for both one- and two-pass stream types. For these reasons, use of indexes is discouraged, except in cases where an index provides a demonstrable and noticeable performance improvement.
Seek operations are by absolute time; a direct bisection search must find the exact time position requested. Information in the Ogg bitstream is arranged such that all information to be presented for playback from the desired seek point will occur at or after the desired seek point. Seek operations are neither 'fuzzy' nor heuristic.
Although key frame handling in video appears to be an exception to "all needed playback information lies ahead of a given seek", key frames can still be handled directly within this indexless framework. Seeking to a key frame in video (as well as seeking in other media types with analogous constraints) is handled as two seeks: first a seek to the desired time, which extracts state information that decodes to the time of the last key frame, followed by a second seek directly to the key frame. The location of the previous key frame is embedded as state information in the granulepos; this mechanism is described in more detail later.
Logical streams within a physical Ogg stream belong to one of two categories, "Continuous" streams and "Discontinuous" streams. Although these are discussed in more detail later, the distinction is important to a high-level understanding of how to buffer an Ogg stream.
A stream that provides a gapless, time-continuous media type with a fine-grained timebase is considered to be 'Continuous'. A continuous stream should never be starved of data. Clear examples of continuous data types include broadcast audio and video.
A stream that delivers data in a potentially irregular pattern or with widely spaced timing gaps is considered to be 'Discontinuous'. A discontinuous stream may be best thought of as data representing scattered events; although they happen in order, they are typically unconnected data often located far apart. One possible example of a discontinuous stream type would be captioning. Although it's possible to design captions as a continuous stream type, it's most natural to think of captions as widely spaced pieces of text with little happening between.
The fundamental design distinction between continuous and discontinuous streams concerns buffering.
Because a continuous stream is, by definition, gapless, Ogg buffering is based on the simple premise of never allowing any active continuous stream to starve for data during decode; buffering proceeds ahead until all continuous streams in a physical stream have data ready to decode on demand.
Discontinuous stream data may occur on a fairly regular basis, but the timing of, for example, a specific caption is impossible to predict with certainty in most captioning systems. Thus the buffering system should take discontinuous data 'as it comes' rather than working ahead (for a potentially unbounded period) to look for future discontinuous data. As such, discontinuous streams are ignored when managing buffering; their pages simply 'fall out' of the stream when continuous streams are handled properly.
Buffering requirements need not be explicitly declared or managed for the encoded stream; the decoder simply reads as much data as is necessary to keep all continuous stream types gapless (also ensuring discontinuous data arrives in time) and no more, resulting in optimum implicit buffer usage for a given stream. Because all pages of all data types are stamped with absolute timing information within the stream, inter-stream synchronization timing is always explicitly maintained without the need for explicitly declared buffer-ahead hinting.
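The buffering premise can be sketched roughly as follows. This is a simplified model, not a description of any particular Ogg library: `fill_buffers`, the `(serial, payload)` page representation, and the `continuous` set are all assumptions made for illustration. The loop reads ahead only until every continuous stream has data queued; discontinuous pages are queued when encountered but never trigger further reading.

```python
from collections import deque
from typing import Deque, Dict, Iterator, Set, Tuple


def fill_buffers(pages: Iterator[Tuple[int, str]],
                 continuous: Set[int],
                 buffers: Dict[int, Deque[str]]) -> None:
    """Read pages until every continuous stream has data buffered.

    `pages` yields hypothetical (serial, payload) pairs in physical
    stream order. Pages of discontinuous streams are queued as they
    come but are ignored when deciding whether to keep reading.
    """
    while any(not buffers.get(serial) for serial in continuous):
        try:
            serial, payload = next(pages)
        except StopIteration:
            break  # end of the physical stream
        buffers.setdefault(serial, deque()).append(payload)


# Streams 1 and 2 are continuous (say, audio and video); 3 is captions.
pages = iter([(1, "a0"), (3, "caption"), (2, "v0"), (1, "a1")])
buffers: Dict[int, Deque[str]] = {}
fill_buffers(pages, continuous={1, 2}, buffers=buffers)
```

In this run the reader stops as soon as the first video page arrives; the caption page is picked up along the way, and the second audio page is left unread until a buffer drains.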
Further details, mechanisms and reasons for the differing arrangement and behavior of continuous and discontinuous streams are discussed later.
Ogg is designed so that the simplest navigation operations treat the physical Ogg stream as a whole summary of its streams, rather than navigating each interleaved stream as a separate entity.
First Example: Seeking to a desired time position in a multiplexed (or unmultiplexed) Ogg stream can be accomplished through a bisection search on the time position of all pages in the stream (as encoded in the granule position). More powerful searches (such as a key frame-aware seek within video) are also possible with additional search complexity, but similar computational complexity.
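A minimal sketch of such a search, under simplifying assumptions: the pages are modeled as an in-memory list of end times in seconds (a real implementation would bisect byte offsets in the physical stream, resynchronize to page boundaries, and ask the codec to map granule positions to time), and the search is a plain rather than interpolated bisection. The `seek_page` name is hypothetical.

```python
import bisect
from typing import Sequence


def seek_page(end_times: Sequence[float], target: float) -> int:
    """Return the index of the first page whose end time is after `target`.

    `end_times` holds the absolute time each page's granule position
    maps to, in non-decreasing order, assuming end-time encoding.
    """
    index = bisect.bisect_right(end_times, target)
    if index == len(end_times):
        raise ValueError("target lies at or beyond the end of the stream")
    return index


end_times = [0.5, 1.0, 1.5, 2.0]
first = seek_page(end_times, 0.0)
boundary = seek_page(end_times, 1.0)
inside = seek_page(end_times, 1.2)
```

Because the granule time here marks the instant just after a page's last data, a target that falls exactly on a page boundary resolves to the following page.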
Second Example: A bitstream section may consist of three multiplexed streams of differing lengths. The result of multiplexing these streams should be thought of as a single mixed stream with a length equal to the longest of the three component streams. Although it is also possible to think of the multiplexed results as three concurrent streams of different lengths and it is possible to recover the three original streams, it will also become obvious that once multiplexed, it isn't possible to find the internal lengths of the component streams without a linear search of the whole bitstream section. However, it is possible to find the length of the whole bitstream section easily (in near-constant time per section) just as it is for a single-media unmultiplexed stream.
The Granule Position is a signed 64-bit field appearing in the header of every Ogg page. Although the granule position represents absolute time within a logical stream, its value does not necessarily directly encode a simple timestamp. It may represent frames elapsed (as in Vorbis), a simple timestamp, or a more complex bit-division encoding (such as in Theora). The exact encoding of the granule position is up to a specific codec.
The granule position is governed by the following rules:
In general, a codec/stream type should choose the simplest granule position encoding that addresses its requirements. The examples here are by no means exhaustive of the possibilities within Ogg.
A simple granule position could encode a timestamp directly. For example, a granule position that encoded milliseconds from beginning of stream would allow a logical stream length of over 100,000,000,000 days before beginning a new logical stream (to avoid the granule position wrapping).
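The figure quoted above can be checked with a few lines of arithmetic. The constant names are illustrative only; the calculation assumes the full positive range of a signed 64-bit granule position is available for the millisecond count.

```python
MAX_GRANULEPOS = 2**63 - 1            # largest value of a signed 64-bit field
MS_PER_DAY = 24 * 60 * 60 * 1000      # 86,400,000 milliseconds per day

# Whole days representable before the granule position would wrap.
max_days = MAX_GRANULEPOS // MS_PER_DAY
```

The result is a little under 1.07e11 days, which is consistent with the "over 100,000,000,000 days" stated in the text.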
A simple millisecond timestamp granule encoding might suit many stream types, but millisecond resolution is inappropriate for, e.g., most audio encodings, where exact single-sample resolution is generally a requirement. A millisecond is too large a granule and often does not represent an integer number of samples.
In the event that audio frames are always encoded as the same number of samples, the granule position could simply be a linear count of frames since beginning of stream. This has the advantages of being exact and efficient. Position in time would simply be [granule_position] * [samples_per_frame] / [samples_per_second].
Frame counting is insufficient in codecs such as Vorbis where an audio frame [packet] encodes a variable number of samples. In Vorbis's case, the granule position is a count of the number of raw samples from the beginning of stream; the absolute time of a granule position is [granule_position] / [samples_per_second].
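The two audio mappings described above can be written out as a short sketch. The function names are hypothetical, and exact rational arithmetic is used here only to keep the example free of rounding; a codec is free to compute time however it sees fit.

```python
from fractions import Fraction


def fixed_frame_time(granule_position: int,
                     samples_per_frame: int,
                     samples_per_second: int) -> Fraction:
    """Time in seconds when the granule position counts fixed-size frames."""
    return Fraction(granule_position * samples_per_frame, samples_per_second)


def sample_count_time(granule_position: int,
                      samples_per_second: int) -> Fraction:
    """Time in seconds when the granule position counts raw samples."""
    return Fraction(granule_position, samples_per_second)


# 100 frames of 1024 samples each at 48 kHz.
t_frames = fixed_frame_time(100, 1024, 48000)

# 22050 raw samples at 44.1 kHz.
t_samples = sample_count_time(22050, 44100)
```

Both mappings are exact to a single sample, which is the property a millisecond timestamp cannot provide for most sample rates.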
Some video codecs may be able to use the simple framestamp scheme for granule position. However, most modern video codecs introduce at least the following complications:
The first two points can be handled straightforwardly via the fact that the codec has complete control over mapping granule position to absolute time; non-integer frame rates and offsets can be set in the codec's initial header, and the rest is just arithmetic.
The third point appears trickier at first glance, but it too can be handled through the granule position mapping mechanism. Here we arrange the granule position in such a way that granule positions of key frames are easy to find. Divide the granule position into two fields: the most significant bits are an absolute frame counter, but it's only updated at each key frame. The least significant bits encode the number of frames since the last key frame. In this way, each granule position encodes both the absolute time of the current frame and the absolute time of the last key frame.
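A minimal sketch of this two-field layout follows. The width of the low field (`shift`) is a hypothetical parameter chosen for the example; in practice the codec would define how the bits are divided, typically in its initial header. The helper names are likewise illustrative.

```python
from typing import Tuple


def pack_granulepos(keyframe_index: int,
                    frames_since_keyframe: int,
                    shift: int) -> int:
    """Combine the last key frame's index and the offset from it."""
    if not 0 <= frames_since_keyframe < (1 << shift):
        raise ValueError("offset does not fit in the low field")
    return (keyframe_index << shift) | frames_since_keyframe


def unpack_granulepos(granulepos: int, shift: int) -> Tuple[int, int]:
    """Split a granule position into (key frame index, offset)."""
    keyframe_index = granulepos >> shift
    frames_since_keyframe = granulepos & ((1 << shift) - 1)
    return keyframe_index, frames_since_keyframe


def frame_index(granulepos: int, shift: int) -> int:
    """Absolute index of the current frame."""
    keyframe_index, offset = unpack_granulepos(granulepos, shift)
    return keyframe_index + offset


# The last key frame was frame 120; the current frame is 7 frames later.
gp = pack_granulepos(120, 7, shift=6)
```

With a 6-bit low field, at most 63 frames can follow a key frame before the next key frame is needed; a codec would size the field to match its maximum key frame interval.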
Seeking to the most recent preceding key frame is then accomplished by first seeking to the original desired point, inspecting the granulepos of the resulting video page, extracting from that granulepos the absolute time of the desired key frame, and then seeking directly to that key frame's page. Of course, it's still possible for an application to ignore key frames and use a simpler seeking algorithm (decode would be unable to present decoded video until the next key frame). Surprisingly many player applications do choose the simpler approach.
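The two-seek procedure can be sketched as below, under several simplifying assumptions that are not part of the specification: every video page holds exactly one frame, a page's granule position identifies that frame, the granule position uses a two-field layout with a hypothetical 6-bit low field (`SHIFT`), and the pages are available as an in-memory list rather than as byte offsets in a file.

```python
from typing import Sequence

SHIFT = 6  # hypothetical width of the "frames since key frame" field


def frame_of(granulepos: int) -> int:
    """Absolute frame index encoded by a two-field granule position."""
    return (granulepos >> SHIFT) + (granulepos & ((1 << SHIFT) - 1))


def first_page_at_or_after(pages: Sequence[int], frame: int) -> int:
    """Bisect for the first page whose frame index is >= `frame`."""
    lo, hi = 0, len(pages)
    while lo < hi:
        mid = (lo + hi) // 2
        if frame_of(pages[mid]) < frame:
            lo = mid + 1
        else:
            hi = mid
    return lo


def seek_to_keyframe(pages: Sequence[int], target_frame: int) -> int:
    """Return the page index of the key frame preceding `target_frame`."""
    landing = first_page_at_or_after(pages, target_frame)  # first seek
    if landing == len(pages):
        raise ValueError("target lies beyond the end of the stream")
    keyframe = pages[landing] >> SHIFT       # read key frame index from granulepos
    return first_page_at_or_after(pages, keyframe)         # second seek


# Eight frames with key frames at frames 0 and 4.
pages = [(0 << SHIFT) | i for i in range(4)] + [(4 << SHIFT) | i for i in range(4)]
start = seek_to_keyframe(pages, 6)
```

Both seeks are ordinary bisections; the only codec-specific step is extracting the key frame index from the granule position found by the first seek.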
Although each packet of data in a logical stream theoretically has a specific granule position, only one granule position is encoded per page. It is possible to encode a logical stream such that each page contains only a single packet (so that granule positions are preserved for each packet); however, a one-to-one packet/page mapping is not intended to be the general case.
Because Ogg functions at the page, not packet, level, this once-per-page time information provides Ogg with the finest-grained time information it can use. Ogg passes this granule positioning data to the codec (along with the packets extracted from a page); it is the responsibility of codecs to track timing information at granularities finer than a single page.
A granule position represents the instantaneous time location between two pages. However, continuous streams and discontinuous streams differ on whether the granulepos represents the end-time of the data on a page or the start-time. Continuous streams are 'end-time' encoded; the granulepos represents the point in time immediately after the last data decoded from a page. Discontinuous streams are 'start-time' encoded; the granulepos represents the point in time of the first data decoded from the page.
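The difference between the two conventions can be illustrated with a small sketch. The `page_interval` helper and its `duration` argument are assumptions made for the example; Ogg itself does not store a page duration, and a codec would have to derive it from the packets on the page.

```python
from typing import Tuple


def page_interval(granule_time: int,
                  duration: int,
                  continuous: bool) -> Tuple[int, int]:
    """Return the (start, end) time covered by the data on a page.

    `granule_time` is the absolute time the page's granulepos maps to,
    in arbitrary integer time units; `duration` is the hypothetical
    length of the page's data in the same units.
    """
    if continuous:
        # End-time encoding: granulepos marks the instant after the data.
        return granule_time - duration, granule_time
    # Start-time encoding: granulepos marks the instant the data begins.
    return granule_time, granule_time + duration


audio_span = page_interval(10, 2, continuous=True)
caption_span = page_interval(10, 2, continuous=False)
```

Two pages with the same granule time therefore cover different intervals depending on the stream category, even though they sort to the same position in the multiplex.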
An Ogg stream type is declared continuous or discontinuous by its codec. A given codec may support both continuous and discontinuous operation so long as any given logical stream is continuous or discontinuous for its entirety and the codec is able to ascertain (and inform the Ogg layer) as to which after decoding the initial stream header. The majority of codecs will always be continuous (such as Vorbis) or discontinuous (such as Writ).
Start- and end-time encoding do not affect multiplexing sort-order; pages are still sorted by the absolute time a given granulepos maps to regardless of whether that granulepos represents start- or end-time.
The Ogg multiplex/demultiplex layer provides mechanisms for encoding raw packets into Ogg pages, decoding Ogg pages back into the original codec packets, determining the logical structure of an Ogg stream, and navigating through and synchronizing with an Ogg stream at a desired stream location. Strict multiplex/demultiplex operations are entirely in the Ogg domain and require no intervention from codecs.
Implementation of more complex operations does require codec knowledge, however. Unlike other framing systems, Ogg maintains strict separation between framing and the framed bitstream data; Ogg does not replicate codec-specific information in the page/framing data, nor does Ogg blur the line between framing and stream data/metadata. Because Ogg is fully data-agnostic toward the data it frames, operations which require specifics of bitstream data (such as 'seek to key frame') also require interaction with the codec layer (because, in this example, the Ogg layer is not aware of the concept of key frames). This is different from systems that blur the separation between framing and stream data in order to simplify the separation of code. The Ogg system purposely keeps the distinction in data simple so that later codec innovations are not constrained by framing design.
For this reason, however, complex seeking operations require interaction with the codecs in order to decode the granule position of a given stream type back to absolute time or in order to find 'decodable points' such as key frames in video.
flushes around key frames? RFC suggestion: repaginating or building a stream this way is nice but not required