annotate src/libogg-1.3.0/doc/ogg-multiplex.html @ 1:05aa0afa9217

Bring in flac, ogg, vorbis
author Chris Cannam
date Tue, 19 Mar 2013 17:37:49 +0000
rev   line source
Chris@1 1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Chris@1 2 <html>
Chris@1 3 <head>
Chris@1 4
Chris@1 5 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
Chris@1 6 <title>Ogg Documentation</title>
Chris@1 7
Chris@1 8 <style type="text/css">
Chris@1 9 body {
Chris@1 10 margin: 0 18px 0 18px;
Chris@1 11 padding-bottom: 30px;
Chris@1 12 font-family: Verdana, Arial, Helvetica, sans-serif;
Chris@1 13 color: #333333;
Chris@1 14 font-size: .8em;
Chris@1 15 }
Chris@1 16
Chris@1 17 a {
Chris@1 18 color: #3366cc;
Chris@1 19 }
Chris@1 20
Chris@1 21 img {
Chris@1 22 border: 0;
Chris@1 23 }
Chris@1 24
Chris@1 25 #xiphlogo {
Chris@1 26 margin: 30px 0 16px 0;
Chris@1 27 }
Chris@1 28
Chris@1 29 #content p {
Chris@1 30 line-height: 1.4;
Chris@1 31 }
Chris@1 32
Chris@1 33 h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {
Chris@1 34 font-weight: bold;
Chris@1 35 color: #ff9900;
Chris@1 36 margin: 1.3em 0 8px 0;
Chris@1 37 }
Chris@1 38
Chris@1 39 h1 {
Chris@1 40 font-size: 1.3em;
Chris@1 41 }
Chris@1 42
Chris@1 43 h2 {
Chris@1 44 font-size: 1.2em;
Chris@1 45 }
Chris@1 46
Chris@1 47 h3 {
Chris@1 48 font-size: 1.1em;
Chris@1 49 }
Chris@1 50
Chris@1 51 li {
Chris@1 52 line-height: 1.4;
Chris@1 53 }
Chris@1 54
Chris@1 55 #copyright {
Chris@1 56 margin-top: 30px;
Chris@1 57 line-height: 1.5em;
Chris@1 58 text-align: center;
Chris@1 59 font-size: .8em;
Chris@1 60 color: #888888;
Chris@1 61 clear: both;
Chris@1 62 }
Chris@1 63 </style>
Chris@1 64
Chris@1 65 </head>
Chris@1 66
Chris@1 67 <body>
Chris@1 68
Chris@1 69 <div id="xiphlogo">
Chris@1 70 <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
Chris@1 71 </div>
Chris@1 72
Chris@1 73 <h1>Page Multiplexing and Ordering in a Physical Ogg Stream</h1>
Chris@1 74
Chris@1 75 <p>The low-level mechanisms of an Ogg stream (as described in the Ogg
Chris@1 76 Bitstream Overview) provide means for mixing multiple logical streams
Chris@1 77 and media types into a single linear-chronological stream. This
Chris@1 78 document specifies the high-level arrangement and use of page
Chris@1 79 structure to multiplex multiple streams of mixed media type within a
Chris@1 80 physical Ogg stream.</p>
Chris@1 81
Chris@1 82 <h2>Design Elements</h2>
Chris@1 83
Chris@1 84 <p>The design and arrangement of the Ogg container format is governed by
Chris@1 85 several high-level design decisions that form the reasoning behind
Chris@1 86 specific low-level design decisions.</p>
Chris@1 87
Chris@1 88 <h3>Linear media</h3>
Chris@1 89
Chris@1 90 <p>The Ogg bitstream is intended to encapsulate chronological,
Chris@1 91 time-linear mixed media into a single delivery stream or file. The
Chris@1 92 design is such that an application can always encode and/or decode a
Chris@1 93 full-featured bitstream in one pass with no seeking and minimal
Chris@1 94 buffering. Seeking to provide optimized encoding (such as two-pass
Chris@1 95 encoding) or interactive decoding (such as scrubbing or instant
Chris@1 96 replay) is neither disallowed nor discouraged; however, no bitstream
Chris@1 97 feature may require nonlinear operation on the bitstream.</p>
Chris@1 98
Chris@1 99 <h3>Multiplexing</h3>
Chris@1 100
Chris@1 101 <p>Ogg bitstreams multiplex multiple logical streams into a single
Chris@1 102 physical stream at the page level. Each page contains an abstract
Chris@1 103 time stamp (the Granule Position) that represents an absolute time
Chris@1 104 landmark within the stream. After the pages representing stream
Chris@1 105 headers (all logical stream headers occur at the beginning of a
Chris@1 106 physical bitstream section before any logical stream data), logical
Chris@1 107 stream data pages are arranged in a physical bitstream in strict
Chris@1 108 non-decreasing order by chronological absolute time as
Chris@1 109 specified by the granule position.</p>
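<p>As a minimal sketch of this ordering rule (not the libogg API), the
following Python merges the data pages of several logical streams by the
absolute time their granule positions map to. The <tt>Page</tt> record, the
<tt>multiplex</tt> helper and the per-stream time-mapping functions are
hypothetical names introduced for illustration; header pages and pages with an
unset granule position are left out.</p>

```python
import heapq
from collections import namedtuple

# Hypothetical page record: stream serial number and granule position.
Page = namedtuple("Page", ["serial", "granulepos"])

def multiplex(streams, to_seconds):
    """Merge per-stream page lists into one list ordered by absolute time.

    streams    -- dict: serial -> list of Page, already in stream order
    to_seconds -- dict: serial -> function mapping granulepos to seconds
    """
    keyed = [
        [(to_seconds[serial](page.granulepos), page) for page in pages]
        for serial, pages in streams.items()
    ]
    # Each input list is already sorted by time, so a k-way merge suffices.
    merged = heapq.merge(*keyed, key=lambda item: item[0])
    return [page for _, page in merged]

audio = [Page(1, 44100), Page(1, 88200), Page(1, 132300)]  # sample counts
video = [Page(2, 25), Page(2, 50)]                         # frame counts
pages = multiplex(
    {1: audio, 2: video},
    {1: lambda g: g / 44100, 2: lambda g: g / 25},
)
order = [(page.serial, page.granulepos) for page in pages]
```

<p>Pages that map to the same absolute time may appear in either order under
the non-decreasing rule; this sketch happens to keep the first-listed stream
first.</p>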
Chris@1 110
Chris@1 111 <p>The only exception to arranging pages in non-decreasing time order
Chris@1 112 by granule position is those pages that do not set the granule
Chris@1 113 position value. This is a special case when exceptionally large
Chris@1 114 packets span multiple pages; the specifics of handling this special
Chris@1 115 case are described later under 'Continuous and Discontinuous
Chris@1 116 Streams'.</p>
Chris@1 117
Chris@1 118 <h3>Seeking</h3>
Chris@1 119
Chris@1 120 <p>Ogg is designed to use an interpolated bisection search to
Chris@1 121 implement exact positional seeking. Interpolated bisection search is
Chris@1 122 a spec-mandated mechanism.</p>
Chris@1 123
Chris@1 124 <p><i>An index may improve objective performance, but it seldom
Chris@1 125 improves subjective performance outside of a few high-latency use
Chris@1 126 cases and adds no additional functionality as bisection search
Chris@1 127 delivers the same functionality for both one- and two-pass stream
Chris@1 128 types. For these reasons, use of indexes is discouraged, except in
Chris@1 129 cases where an index provides demonstrable and noticeable performance
Chris@1 130 improvement.</i></p>
Chris@1 131
Chris@1 132 <p>Seek operations are by absolute time; a direct bisection search must
Chris@1 133 find the exact time position requested. Information in the Ogg
Chris@1 134 bitstream is arranged such that all information to be presented for
Chris@1 135 playback from the desired seek point will occur at or after the
Chris@1 136 desired seek point. Seek operations are neither 'fuzzy' nor
Chris@1 137 heuristic.</p>
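<p>The idea of an interpolated bisection search can be sketched as follows. A
real implementation probes byte offsets in the physical stream and
resynchronizes to the next page boundary; here a plain list of page times stands
in for the stream, and the <tt>seek</tt> function is a hypothetical name, not
part of any Ogg library.</p>

```python
def seek(page_times, target):
    """Return the index of the first page whose time is >= target.

    page_times -- absolute page times in non-decreasing order
                  (a stand-in for pages read from a physical stream)
    Returns len(page_times) if no page reaches the target.
    """
    lo, hi = 0, len(page_times)      # the answer always lies in [lo, hi]
    while lo < hi:
        t_lo, t_hi = page_times[lo], page_times[hi - 1]
        if target <= t_lo:
            return lo
        if target > t_hi:
            return hi
        # Here t_lo < target <= t_hi, so the division below is safe.
        fraction = (target - t_lo) / (t_hi - t_lo)
        probe = lo + int(fraction * (hi - 1 - lo))
        probe = min(max(probe, lo), hi - 1)
        if page_times[probe] < target:
            lo = probe + 1           # target lies after the probed page
        else:
            hi = probe               # probed page is at or after the target
    return lo

times = [0.0, 0.5, 1.0, 1.0, 2.5, 4.0]
```

<p>Because the result is the first page at or after the requested time, the
search is exact rather than approximate, in keeping with the requirement
above.</p>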
Chris@1 138
Chris@1 139 <p><i>Although key frame handling in video appears to be an exception to
Chris@1 140 "all needed playback information lies ahead of a given seek",
Chris@1 141 key frames can still be handled directly within this indexless
Chris@1 142 framework. Seeking to a key frame in video (as well as seeking in other
Chris@1 143 media types with analogous constraints) is handled as two seeks: first
Chris@1 144 a seek to the desired time which extracts state information that
Chris@1 145 decodes to the time of the last key frame, followed by a second seek
Chris@1 146 directly to the key frame. The location of the previous key frame is
Chris@1 147 embedded as state information in the granulepos; this mechanism is
Chris@1 148 described in more detail later.</i></p>
Chris@1 149
Chris@1 150 <h3>Continuous and Discontinuous Streams</h3>
Chris@1 151
Chris@1 152 <p>Logical streams within a physical Ogg stream belong to one of two
Chris@1 153 categories, "Continuous" streams and "Discontinuous" streams.
Chris@1 154 Although these are discussed in more detail later, the distinction is
Chris@1 155 important to a high-level understanding of how to buffer an Ogg
Chris@1 156 stream.</p>
Chris@1 157
Chris@1 158 <p>A stream that provides a gapless, time-continuous media type with a
Chris@1 159 fine-grained timebase is considered to be 'Continuous'. A continuous
Chris@1 160 stream should never be starved of data. Clear examples of continuous
Chris@1 161 data types include broadcast audio and video.</p>
Chris@1 162
Chris@1 163 <p>A stream that delivers data in a potentially irregular pattern or with
Chris@1 164 widely spaced timing gaps is considered to be 'Discontinuous'. A
Chris@1 165 discontinuous stream may be best thought of as data representing
Chris@1 166 scattered events; although they happen in order, they are typically
Chris@1 167 unconnected data often located far apart. One possible example of a
Chris@1 168 discontinuous stream type would be captioning. Although it's
Chris@1 169 possible to design captions as a continuous stream type, it's most
Chris@1 170 natural to think of captions as widely spaced pieces of text with
Chris@1 171 little happening between.</p>
Chris@1 172
Chris@1 173 <p>The fundamental design distinction between continuous and
Chris@1 174 discontinuous streams concerns buffering.</p>
Chris@1 175
Chris@1 176 <h3>Buffering</h3>
Chris@1 177
Chris@1 178 <p>Because a continuous stream is, by definition, gapless, Ogg buffering
Chris@1 179 is based on the simple premise of never allowing any active continuous
Chris@1 180 stream to starve for data during decode; buffering proceeds ahead
Chris@1 181 until all continuous streams in a physical stream have data ready to
Chris@1 182 decode on demand.</p>
Chris@1 183
Chris@1 184 <p>Discontinuous stream data may occur on a fairly regular basis, but the
Chris@1 185 timing of, for example, a specific caption is impossible to predict
Chris@1 186 with certainty in most captioning systems. Thus the buffering system
Chris@1 187 should take discontinuous data 'as it comes' rather than working ahead
Chris@1 188 (for a potentially unbounded period) to look for future discontinuous
Chris@1 189 data. As such, discontinuous streams are ignored when managing
Chris@1 190 buffering; their pages simply 'fall out' of the stream when continuous
Chris@1 191 streams are handled properly.</p>
Chris@1 192
Chris@1 193 <p>Buffering requirements need not be explicitly declared or managed for
Chris@1 194 the encoded stream; the decoder simply reads as much data as is
Chris@1 195 necessary to keep all continuous stream types gapless (also ensuring
Chris@1 196 discontinuous data arrives in time) and no more, resulting in optimum
Chris@1 197 implicit buffer usage for a given stream. Because all pages of all
Chris@1 198 data types are stamped with absolute timing information within the
Chris@1 199 stream, inter-stream synchronization timing is always explicitly
Chris@1 200 maintained without the need for explicitly declared buffer-ahead
Chris@1 201 hinting.</p>
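<p>A minimal sketch of this implicit buffering policy, assuming pages arrive as
hypothetical <tt>(serial, payload)</tt> pairs and that the set of continuous
streams is already known: the reader pulls pages only until every continuous
stream has something buffered, and discontinuous pages are stored as they
happen to arrive.</p>

```python
from collections import deque

def read_ahead(page_iter, continuous, buffers):
    """Read pages until every continuous stream has data buffered.

    page_iter  -- iterator of (serial, payload) pairs in physical order
    continuous -- set of serials belonging to continuous streams
    buffers    -- dict: serial -> deque of payloads, updated in place
    Returns the number of pages read.
    """
    pages_read = 0
    while any(not buffers.get(serial) for serial in continuous):
        page = next(page_iter, None)
        if page is None:             # end of the physical stream
            break
        serial, payload = page
        # Discontinuous pages are buffered too, but never waited for.
        buffers.setdefault(serial, deque()).append(payload)
        pages_read += 1
    return pages_read

stream = iter([
    ("audio", "a0"), ("captions", "c0"), ("audio", "a1"),
    ("video", "v0"), ("audio", "a2"),
])
buffers = {}
count = read_ahead(stream, {"audio", "video"}, buffers)
```

<p>In this run the reader stops after the first video page; the caption page is
picked up along the way without any look-ahead on its behalf.</p>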
Chris@1 202
Chris@1 203 <p>Further details, mechanisms and reasons for the differing arrangement
Chris@1 204 and behavior of continuous and discontinuous streams are discussed
Chris@1 205 later.</p>
Chris@1 206
Chris@1 207 <h3>Whole-stream navigation</h3>
Chris@1 208
Chris@1 209 <p>Ogg is designed so that the simplest navigation operations treat the
Chris@1 210 physical Ogg stream as a whole summary of its streams, rather than
Chris@1 211 navigating each interleaved stream as a separate entity.</p>
Chris@1 212
Chris@1 213 <p>First Example: Seeking to a desired time position in a multiplexed (or
Chris@1 214 unmultiplexed) Ogg stream can be accomplished through a bisection
Chris@1 215 search on time position of all pages in the stream (as encoded in the
Chris@1 216 granule position). More powerful searches (such as a key frame-aware
Chris@1 217 seek within video) are also possible with additional search
Chris@1 218 complexity, but similar computational complexity.</p>
Chris@1 219
Chris@1 220 <p>Second Example: A bitstream section may consist of three multiplexed
Chris@1 221 streams of differing lengths. The result of multiplexing these
Chris@1 222 streams should be thought of as a single mixed stream with a length
Chris@1 223 equal to the longest of the three component streams. Although it is
Chris@1 224 also possible to think of the multiplexed results as three concurrent
Chris@1 225 streams of different lengths and it is possible to recover the three
Chris@1 226 original streams, it will also become obvious that once multiplexed,
Chris@1 227 it isn't possible to find the internal lengths of the component
Chris@1 228 streams without a linear search of the whole bitstream section.
Chris@1 229 However, it is possible to find the length of the whole bitstream
Chris@1 230 section easily (in near-constant time per section) just as it is for a
Chris@1 231 single-media unmultiplexed stream.</p>
Chris@1 232
Chris@1 233 <h2>Granule Position</h2>
Chris@1 234
Chris@1 235 <h3>Description</h3>
Chris@1 236
Chris@1 237 <p>The Granule Position is a signed 64-bit field appearing in the header
Chris@1 238 of every Ogg page. Although the granule position represents absolute
Chris@1 239 time within a logical stream, its value does not necessarily directly
Chris@1 240 encode a simple timestamp. It may represent frames elapsed (as in
Chris@1 241 Vorbis), a simple timestamp, or a more complex bit-division encoding
Chris@1 242 (such as in Theora). The exact encoding of the granule position is up
Chris@1 243 to a specific codec.</p>
Chris@1 244
Chris@1 245 <p>The granule position is governed by the following rules:</p>
Chris@1 246
Chris@1 247 <ul>
Chris@1 248
Chris@1 249 <li>Granule Position must always increase forward or remain equal from
Chris@1 250 page to page, be unset, or be zero for a header page. The absolute
Chris@1 251 time to which any correct sequence of granule position maps must
Chris@1 252 similarly always increase forward or remain equal. <i>(A codec may
Chris@1 253 make use of data, such as a control sequence, that only affects codec
Chris@1 254 working state without producing output data, and thus without advancing
Chris@1 255 granule position or time. Although the packet sequence number increases in
Chris@1 256 this case, the granule position, and thus the time position, do
Chris@1 257 not.)</i></li>
Chris@1 258
Chris@1 259 <li>Granule position may only be unset if there is no packet defining a
Chris@1 260 time boundary on the page (that is, if no packet in a continuous
Chris@1 261 stream ends on the page, or no packet in a discontinuous stream begins
Chris@1 262 on the page). This is discussed in more detail under 'Continuous
Chris@1 263 and Discontinuous Streams'.</li>
Chris@1 264
Chris@1 265 <li>A codec must be able to translate a given granule position value
Chris@1 266 to a unique, deterministic absolute time value through direct
Chris@1 267 calculation. A codec is not required to be able to translate an
Chris@1 268 absolute time value into a unique granule position value.</li>
Chris@1 269
Chris@1 270 <li>Codecs shall choose a granule position definition that gives the
Chris@1 271 codec a means to seek as directly as possible to an immediately
Chris@1 272 decodable point; for example, the bit-divided granule position encoding
Chris@1 273 of Theora allows the codec to seek efficiently to a key frame without using
Chris@1 274 an index. That is, additional information other than absolute time
Chris@1 275 may be encoded into a granule position value so long as the granule
Chris@1 276 position obeys the above points.</li>
Chris@1 277
Chris@1 278 </ul>
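<p>The first rule can be checked mechanically. The sketch below assumes the
usual Ogg convention that an unset granule position is stored as -1; the
function name is hypothetical. It compares raw granule position values only,
so it does not verify the mapping to absolute time, which is codec-specific.</p>

```python
UNSET = -1  # assumed convention: -1 marks a page with no granule position

def granulepos_in_order(granule_positions):
    """Return True if the set granule positions never decrease.

    granule_positions -- values from consecutive pages of ONE logical stream
    Unset values are skipped; header pages are expected to carry zero.
    """
    last = None
    for granulepos in granule_positions:
        if granulepos == UNSET:
            continue                 # no time boundary on this page
        if last is not None and granulepos < last:
            return False             # time would run backwards
        last = granulepos
    return True

good = [0, 0, 1024, UNSET, 1024, 4096]
bad = [0, 2048, UNSET, 1024]
```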
Chris@1 279
Chris@1 280 <h4>Example: timestamp</h4>
Chris@1 281
Chris@1 282 <p>In general, a codec/stream type should choose the simplest granule
Chris@1 283 position encoding that addresses its requirements. The examples here
Chris@1 284 are by no means exhaustive of the possibilities within Ogg.</p>
Chris@1 285
Chris@1 286 <p>A simple granule position could encode a timestamp directly. For
Chris@1 287 example, a granule position that encoded milliseconds from beginning
Chris@1 288 of stream would allow a logical stream length of over 100,000,000,000
Chris@1 289 days before beginning a new logical stream (to avoid the granule
Chris@1 290 position wrapping).</p>
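<p>The figure above follows directly from the width of the field. As a quick
check, assuming the largest positive value of a signed 64-bit granule position
is interpreted as a millisecond count:</p>

```python
MAX_GRANULEPOS = 2**63 - 1            # largest signed 64-bit value
MS_PER_DAY = 24 * 60 * 60 * 1000      # milliseconds in one day

# Whole days representable before the granule position would wrap.
days = MAX_GRANULEPOS // MS_PER_DAY   # on the order of 1.07e11 days
```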
Chris@1 291
Chris@1 292 <h4>Example: framestamp</h4>
Chris@1 293
Chris@1 294 <p>A simple millisecond timestamp granule encoding might suit many stream
Chris@1 295 types, but millisecond resolution is inappropriate for, e.g., most
Chris@1 296 audio encodings, where exact single-sample resolution is generally a
Chris@1 297 requirement. A millisecond is too large a granule, and it often does
Chris@1 298 not represent an integer number of samples.</p>
Chris@1 299
Chris@1 300 <p>In the event that audio frames are always encoded as the same number of
Chris@1 301 samples, the granule position could simply be a linear count of frames
Chris@1 302 since beginning of stream. This has the advantages of being exact and
Chris@1 303 efficient. Position in time would simply be <tt>[granule_position] *
Chris@1 304 [samples_per_frame] / [samples_per_second]</tt>.</p>
Chris@1 305
Chris@1 306 <h4>Example: samplestamp (Vorbis)</h4>
Chris@1 307
Chris@1 308 <p>Frame counting is insufficient in codecs such as Vorbis where an audio
Chris@1 309 frame [packet] encodes a variable number of samples. In Vorbis's
Chris@1 310 case, the granule position is a count of the number of raw samples
Chris@1 311 from the beginning of stream; the absolute time of
Chris@1 312 a granule position is <tt>[granule_position] /
Chris@1 313 [samples_per_second]</tt>.</p>
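<p>Both the framestamp and samplestamp mappings are plain arithmetic. The
following sketch uses exact rational arithmetic so that no precision is lost;
the function names are hypothetical, and the parameters mirror the bracketed
quantities in the formulas above.</p>

```python
from fractions import Fraction

def framestamp_seconds(granule_position, samples_per_frame, samples_per_second):
    """Time of a granule position that counts fixed-size frames."""
    return Fraction(granule_position * samples_per_frame, samples_per_second)

def samplestamp_seconds(granule_position, samples_per_second):
    """Time of a granule position that counts raw samples."""
    return Fraction(granule_position, samples_per_second)

frame_time = framestamp_seconds(100, 1024, 44100)   # 100 frames of 1024 samples
sample_time = samplestamp_seconds(66150, 44100)     # 66150 samples at 44.1 kHz
```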
Chris@1 314
Chris@1 315 <h4>Example: bit-divided framestamp (Theora)</h4>
Chris@1 316
Chris@1 317 <p>Some video codecs may be able to use the simple framestamp scheme for
Chris@1 318 granule position. However, most modern video codecs introduce at
Chris@1 319 least the following complications:</p>
Chris@1 320
Chris@1 321 <ul>
Chris@1 322
Chris@1 323 <li>Video frames are relatively far apart compared to audio samples;
Chris@1 324 for this reason, the point at which a video frame changes to the next
Chris@1 325 frame is usually a strictly defined offset within the frame 'period'.
Chris@1 326 That is, video at 50fps could just as easily define frame transitions at
Chris@1 327 &lt;.015, .035, .055...&gt; as at &lt;.00, .02, .04...&gt;.</li>
Chris@1 328
Chris@1 329 <li>Frame rates often include drop-frames, leap-frames or other
Chris@1 330 rational-but-non-integer timings.</li>
Chris@1 331
Chris@1 332 <li>Decode must begin at a 'key frame' or 'I frame'. Key frames usually
Chris@1 333 occur relatively infrequently.</li>
Chris@1 334
Chris@1 335 </ul>
Chris@1 336
Chris@1 337 <p>The first two points can be handled straightforwardly via the fact
Chris@1 338 that the codec has complete control over mapping granule position to
Chris@1 339 absolute time; non-integer frame rates and offsets can be set in the
Chris@1 340 codec's initial header, and the rest is just arithmetic.</p>
Chris@1 341
Chris@1 342 <p>The third point appears trickier at first glance, but it too can be
Chris@1 343 handled through the granule position mapping mechanism. Here we
Chris@1 344 arrange the granule position in such a way that granule positions of
Chris@1 345 key frames are easy to find. Divide the granule position into two
Chris@1 346 fields; the most-significant bits are an absolute frame counter, but
Chris@1 347 it's only updated at each key frame. The least significant bits encode
Chris@1 348 the number of frames since the last key frame. In this way, each
Chris@1 349 granule position encodes both the absolute time of the current frame
Chris@1 350 and the absolute time of the last key frame.</p>
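<p>A minimal sketch of such a bit-divided layout follows. The <tt>shift</tt>
parameter (the number of low bits reserved for the frames-since-key-frame
count) and all function names are assumptions made for illustration; a real
codec fixes the split in its stream header.</p>

```python
def pack(key_frame_index, frames_since_key_frame, shift):
    """Build a granule position from its two fields."""
    if frames_since_key_frame >= (1 << shift):
        raise ValueError("offset does not fit in the low field")
    return (key_frame_index << shift) | frames_since_key_frame

def unpack(granulepos, shift):
    """Split a granule position into (key frame index, offset)."""
    return granulepos >> shift, granulepos & ((1 << shift) - 1)

def frame_index(granulepos, shift):
    """Absolute index of the current frame."""
    key_frame_index, offset = unpack(granulepos, shift)
    return key_frame_index + offset

SHIFT = 6                              # hypothetical: 6 low bits for the offset
granulepos = pack(120, 7, SHIFT)       # 7 frames after the key frame at 120
next_key = pack(128, 0, SHIFT)         # the following key frame, at frame 128
```

<p>Because the key frame counter occupies the high bits, the packed value still
increases from page to page, as the granule position rules require.</p>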
Chris@1 351
Chris@1 352 <p>Seeking to the most recent preceding key frame is then accomplished by
Chris@1 353 first seeking to the original desired point, inspecting the granulepos
Chris@1 354 of the resulting video page, extracting from that granulepos the
Chris@1 355 absolute time of the desired key frame, and then seeking directly to
Chris@1 356 that key frame's page. Of course, it's still possible for an
Chris@1 357 application to ignore key frames and use a simpler seeking algorithm
Chris@1 358 (decode would be unable to present decoded video until the next
Chris@1 359 key frame). Surprisingly many player applications do choose the
Chris@1 360 simpler approach.</p>
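<p>The two-step seek can be sketched as below, under simplifying assumptions
that are not part of the specification: each page carries exactly one video
frame, the granule position uses a bit-divided layout with a hypothetical
<tt>shift</tt>, and a standard-library bisection over an in-memory list stands
in for seeking within the physical stream.</p>

```python
import bisect

def split(granulepos, shift):
    """Return (key frame index, frames since key frame)."""
    return granulepos >> shift, granulepos & ((1 << shift) - 1)

def seek_to_key_frame(page_granulepos, target_frame, shift):
    """Return the page index of the key frame preceding target_frame.

    page_granulepos -- bit-divided granule positions, one frame per page
    """
    frames = [sum(split(g, shift)) for g in page_granulepos]
    # First seek: locate the page holding the requested frame.
    first = bisect.bisect_left(frames, target_frame)
    if first == len(frames):
        raise ValueError("target frame lies beyond the end of the stream")
    # Read the key frame's position out of that page's granule position.
    key_frame, _ = split(page_granulepos[first], shift)
    # Second seek: go directly to the key frame's page.
    return bisect.bisect_left(frames, key_frame)

SHIFT = 4
# Frames 0..7 with key frames at 0 and 5.
pages = [0, 1, 2, 3, 4, 5 << SHIFT, (5 << SHIFT) | 1, (5 << SHIFT) | 2]
```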
Chris@1 361
Chris@1 362 <h3>Granule position, packets and pages</h3>
Chris@1 363
Chris@1 364 <p>Although each packet of data in a logical stream theoretically has a
Chris@1 365 specific granule position, only one granule position is encoded
Chris@1 366 per page. It is possible to encode a logical stream such that each
Chris@1 367 page contains only a single packet (so that granule positions are
Chris@1 368 preserved for each packet); however, a one-to-one packet/page mapping
Chris@1 369 is not intended to be the general case.</p>
Chris@1 370
Chris@1 371 <p>Because Ogg functions at the page, not packet, level, this
Chris@1 372 once-per-page time information provides Ogg with the finest-grained
Chris@1 373 time information it can use. Ogg passes this granule positioning data
Chris@1 374 to the codec (along with the packets extracted from a page); it is the
Chris@1 375 responsibility of codecs to track timing information at granularities
Chris@1 376 finer than a single page.</p>
Chris@1 377
Chris@1 378 <h3>Start-time and end-time positioning</h3>
Chris@1 379
Chris@1 380 <p>A granule position represents the <em>instantaneous time location
Chris@1 381 between two pages</em>. However, continuous streams and discontinuous
Chris@1 382 streams differ on whether the granulepos represents the end-time of
Chris@1 383 the data on a page or the start-time. Continuous streams are
Chris@1 384 'end-time' encoded; the granulepos represents the point in time
Chris@1 385 immediately after the last data decoded from a page. Discontinuous
Chris@1 386 streams are 'start-time' encoded; the granulepos represents the point
Chris@1 387 in time of the first data decoded from the page.</p>
Chris@1 388
Chris@1 389 <p>An Ogg stream type is declared continuous or discontinuous by its
Chris@1 390 codec. A given codec may support both continuous and discontinuous
Chris@1 391 operation so long as any given logical stream is continuous or
Chris@1 392 discontinuous for its entirety and the codec is able to ascertain (and
Chris@1 393 inform the Ogg layer of) which it is after decoding the initial stream
Chris@1 394 header. The majority of codecs will always be continuous (such as
Chris@1 395 Vorbis) or discontinuous (such as Writ).</p>
Chris@1 396
Chris@1 397 <p>Start- and end-time encoding do not affect multiplexing sort-order;
Chris@1 398 pages are still sorted by the absolute time a given granulepos maps to
Chris@1 399 regardless of whether that granulepos represents start- or
Chris@1 400 end-time.</p>
Chris@1 401
Chris@1 402 <h2>Multiplex/Demultiplex Division of Labor</h2>
Chris@1 403
Chris@1 404 <p>The Ogg multiplex/demultiplex layer provides mechanisms for encoding
Chris@1 405 raw packets into Ogg pages, decoding Ogg pages back into the original
Chris@1 406 codec packets, determining the logical structure of an Ogg stream, and
Chris@1 407 navigating through and synchronizing with an Ogg stream at a desired
Chris@1 408 stream location. Strict multiplex/demultiplex operations are entirely
Chris@1 409 in the Ogg domain and require no intervention from codecs.</p>
Chris@1 410
Chris@1 411 <p>Implementation of more complex operations does require codec
Chris@1 412 knowledge, however. Unlike other framing systems, Ogg maintains
Chris@1 413 strict separation between framing and the framed bitstream data; Ogg
Chris@1 414 does not replicate codec-specific information in the page/framing
Chris@1 415 data, nor does Ogg blur the line between framing and stream
Chris@1 416 data/metadata. Because Ogg is fully data-agnostic toward the data it
Chris@1 417 frames, operations which require specifics of bitstream data (such as
Chris@1 418 'seek to key frame') also require interaction with the codec layer
Chris@1 419 (because, in this example, the Ogg layer is not aware of the concept
Chris@1 420 of key frames). This is different from systems that blur the
Chris@1 421 separation between framing and stream data in order to simplify the
Chris@1 422 separation of code. The Ogg system purposely keeps the distinction in
Chris@1 423 data simple so that later codec innovations are not constrained by
Chris@1 424 framing design.</p>
Chris@1 425
Chris@1 426 <p>For this reason, however, complex seeking operations require
Chris@1 427 interaction with the codecs in order to decode the granule position of
Chris@1 428 a given stream type back to absolute time or in order to find
Chris@1 429 'decodable points' such as key frames in video.</p>
Chris@1 430
Chris@1 431 <h2>Unsorted Discussion Points</h2>
Chris@1 432
Chris@1 433 <p>flushes around key frames? RFC suggestion: repaginating or building a
Chris@1 434 stream this way is nice but not required</p>
Chris@1 435
Chris@1 436 <h2>Appendix A: multiplexing examples</h2>
Chris@1 437
Chris@1 438 <div id="copyright">
Chris@1 439 The Xiph Fish Logo is a
Chris@1 440 trademark (&trade;) of Xiph.Org.<br/>
Chris@1 441
Chris@1 442 These pages &copy; 1994 - 2005 Xiph.Org. All rights reserved.
Chris@1 443 </div>
Chris@1 444
Chris@1 445 </body>
Chris@1 446 </html>