comparison src/libogg-1.3.0/doc/ogg-multiplex.html @ 1:05aa0afa9217
Bring in flac, ogg, vorbis
author | Chris Cannam |
---|---|
date | Tue, 19 Mar 2013 17:37:49 +0000 |
parents | |
children |
0:c7265573341e | 1:05aa0afa9217 |
---|---|
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> | |
2 <html> | |
3 <head> | |
4 | |
5 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/> | |
6 <title>Ogg Documentation</title> | |
7 | |
8 <style type="text/css"> | |
9 body { | |
10 margin: 0 18px 0 18px; | |
11 padding-bottom: 30px; | |
12 font-family: Verdana, Arial, Helvetica, sans-serif; | |
13 color: #333333; | |
14 font-size: .8em; | |
15 } | |
16 | |
17 a { | |
18 color: #3366cc; | |
19 } | |
20 | |
21 img { | |
22 border: 0; | |
23 } | |
24 | |
25 #xiphlogo { | |
26 margin: 30px 0 16px 0; | |
27 } | |
28 | |
29 #content p { | |
30 line-height: 1.4; | |
31 } | |
32 | |
33 h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a { | |
34 font-weight: bold; | |
35 color: #ff9900; | |
36 margin: 1.3em 0 8px 0; | |
37 } | |
38 | |
39 h1 { | |
40 font-size: 1.3em; | |
41 } | |
42 | |
43 h2 { | |
44 font-size: 1.2em; | |
45 } | |
46 | |
47 h3 { | |
48 font-size: 1.1em; | |
49 } | |
50 | |
51 li { | |
52 line-height: 1.4; | |
53 } | |
54 | |
55 #copyright { | |
56 margin-top: 30px; | |
57 line-height: 1.5em; | |
58 text-align: center; | |
59 font-size: .8em; | |
60 color: #888888; | |
61 clear: both; | |
62 } | |
63 </style> | |
64 | |
65 </head> | |
66 | |
67 <body> | |
68 | |
69 <div id="xiphlogo"> | |
70 <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a> | |
71 </div> | |
72 | |
73 <h1>Page Multiplexing and Ordering in a Physical Ogg Stream</h1> | |
74 | |
75 <p>The low-level mechanisms of an Ogg stream (as described in the Ogg | |
76 Bitstream Overview) provide means for mixing multiple logical streams | |
77 and media types into a single linear-chronological stream. This | |
78 document specifies the high-level arrangement and use of page | |
79 structure to multiplex multiple streams of mixed media type within a | |
80 physical Ogg stream.</p> | |
81 | |
82 <h2>Design Elements</h2> | |
83 | |
84 <p>The design and arrangement of the Ogg container format is governed by | |
85 several high-level design decisions that form the reasoning behind | |
86 specific low-level design decisions.</p> | |
87 | |
88 <h3>Linear media</h3> | |
89 | |
90 <p>The Ogg bitstream is intended to encapsulate chronological, | |
91 time-linear mixed media into a single delivery stream or file. The | |
92 design is such that an application can always encode and/or decode a | |
93 full-featured bitstream in one pass with no seeking and minimal | |
94 buffering. Seeking to provide optimized encoding (such as two-pass | |
95 encoding) or interactive decoding (such as scrubbing or instant | |
96 replay) is not disallowed or discouraged; however, no bitstream feature | |
97 may require nonlinear operation on the bitstream.</p> | |
98 | |
99 <h3>Multiplexing</h3> | |
100 | |
101 <p>Ogg bitstreams multiplex multiple logical streams into a single | |
102 physical stream at the page level. Each page contains an abstract | |
103 time stamp (the Granule Position) that represents an absolute time | |
104 landmark within the stream. After the pages representing stream | |
105 headers (all logical stream headers occur at the beginning of a | |
106 physical bitstream section before any logical stream data), logical | |
107 stream data pages are arranged in a physical bitstream in strict | |
108 non-decreasing order by chronological absolute time as | |
109 specified by the granule position.</p> | |
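The page ordering rule above can be sketched as a merge of per-stream page lists by absolute time. This is only an illustration: the `(absolute_time, stream_name, payload)` tuple and the `multiplex` function are hypothetical, header pages are assumed to have been written already, and each input list is assumed to be sorted by time.

```python
import heapq

def multiplex(*logical_streams):
    """Merge per-stream data pages into one physical page order.

    Each page is a hypothetical (absolute_time, stream_name, payload) tuple.
    Every input list is assumed to be sorted by absolute_time already.
    """
    # Merge on the time key only, so the result is non-decreasing in time.
    return list(heapq.merge(*logical_streams, key=lambda page: page[0]))

# Hypothetical data pages for two logical streams.
audio = [(0.00, "audio", b"a0"), (0.50, "audio", b"a1"), (1.00, "audio", b"a2")]
video = [(0.04, "video", b"v0"), (0.50, "video", b"v1")]

physical = multiplex(audio, video)
times = [page[0] for page in physical]
```

Pages with equal times are both legal in either order under a non-decreasing rule; the sketch simply keeps whatever order the merge produces.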
110 | |
111 <p>The only exception to arranging pages in non-decreasing time order | |
112 by granule position is those pages that do not set the granule | |
113 position value. This is a special case when exceptionally large | |
114 packets span multiple pages; the specifics of handling this special | |
115 case are described later under 'Continuous and Discontinuous | |
116 Streams'.</p> | |
117 | |
118 <h3>Seeking</h3> | |
119 | |
120 <p>Ogg is designed to use an interpolated bisection search to | |
121 implement exact positional seeking. Interpolated bisection search is | |
122 a spec-mandated mechanism.</p> | |
123 | |
124 <p><i>An index may improve objective performance, but it seldom | |
125 improves subjective performance outside of a few high-latency use | |
126 cases and adds no additional functionality as bisection search | |
127 delivers the same functionality for both one- and two-pass stream | |
128 types. For these reasons, use of indexes is discouraged, except in | |
129 cases where an index provides demonstrable and noticeable performance | |
130 improvement.</i></p> | |
131 | |
132 <p>Seek operations are by absolute time; a direct bisection search must | |
133 find the exact time position requested. Information in the Ogg | |
134 bitstream is arranged such that all information to be presented for | |
135 playback from the desired seek point will occur at or after the | |
136 desired seek point. Seek operations are neither 'fuzzy' nor | |
137 heuristic.</p> | |
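A minimal sketch of a seek by absolute time, assuming the page times have already been decoded into an in-memory, non-decreasing list. It uses plain bisection from the standard `bisect` module rather than the interpolated variant, and a real implementation would bisect over byte offsets in the physical stream. The function name `seek` is hypothetical.

```python
from bisect import bisect_left

def seek(page_times, target_time):
    """Return the index of the first page whose time is at or after target_time.

    page_times is a hypothetical non-decreasing list of absolute page times.
    The result equals len(page_times) when the target lies past the last page.
    """
    return bisect_left(page_times, target_time)

page_times = [0.0, 0.5, 0.5, 1.0, 1.5]
index = seek(page_times, 0.75)
```

The result is exact rather than approximate: every page from the returned index onward has a time at or after the requested position.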
138 | |
139 <p><i>Although key frame handling in video appears to be an exception to | |
140 "all needed playback information lies ahead of a given seek", | |
141 key frames can still be handled directly within this indexless | |
142 framework. Seeking to a key frame in video (as well as seeking in other | |
143 media types with analogous constraints) is handled as two seeks; first | |
144 a seek to the desired time which extracts state information that | |
145 decodes to the time of the last key frame, followed by a second seek | |
146 directly to the key frame. The location of the previous key frame is | |
147 embedded as state information in the granulepos; this mechanism is | |
148 described in more detail later.</i></p> | |
149 | |
150 <h3>Continuous and Discontinuous Streams</h3> | |
151 | |
152 <p>Logical streams within a physical Ogg stream belong to one of two | |
153 categories, "Continuous" streams and "Discontinuous" streams. | |
154 Although these are discussed in more detail later, the distinction is | |
155 important to a high-level understanding of how to buffer an Ogg | |
156 stream.</p> | |
157 | |
158 <p>A stream that provides a gapless, time-continuous media type with a | |
159 fine-grained timebase is considered to be 'Continuous'. A continuous | |
160 stream should never be starved of data. Clear examples of continuous | |
161 data types include broadcast audio and video.</p> | |
162 | |
163 <p>A stream that delivers data in a potentially irregular pattern or with | |
164 widely spaced timing gaps is considered to be 'Discontinuous'. A | |
165 discontinuous stream may be best thought of as data representing | |
166 scattered events; although they happen in order, they are typically | |
167 unconnected data often located far apart. One possible example of a | |
168 discontinuous stream type would be captioning. Although it's | |
169 possible to design captions as a continuous stream type, it's most | |
170 natural to think of captions as widely spaced pieces of text with | |
171 little happening between.</p> | |
172 | |
173 <p>The fundamental design distinction between continuous and | |
174 discontinuous streams concerns buffering.</p> | |
175 | |
176 <h3>Buffering</h3> | |
177 | |
178 <p>Because a continuous stream is, by definition, gapless, Ogg buffering | |
179 is based on the simple premise of never allowing any active continuous | |
180 stream to starve for data during decode; buffering proceeds ahead | |
181 until all continuous streams in a physical stream have data ready to | |
182 decode on demand.</p> | |
183 | |
184 <p>Discontinuous stream data may occur on a fairly regular basis, but the | |
185 timing of, for example, a specific caption is impossible to predict | |
186 with certainty in most captioning systems. Thus the buffering system | |
187 should take discontinuous data 'as it comes' rather than working ahead | |
188 (for a potentially unbounded period) to look for future discontinuous | |
189 data. As such, discontinuous streams are ignored when managing | |
190 buffering; their pages simply 'fall out' of the stream when continuous | |
191 streams are handled properly.</p> | |
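The buffering premise can be sketched as a read-ahead loop that stops as soon as every continuous stream has data, while discontinuous pages are simply collected as they pass. The `(stream_name, payload)` page shape and the `read_ahead` function are assumptions made for this illustration.

```python
def read_ahead(pages, continuous):
    """Buffer pages until every continuous stream has at least one page.

    pages is an iterator of hypothetical (stream_name, payload) tuples and
    continuous is the set of stream names declared continuous by their codecs.
    Discontinuous pages are buffered as they arrive but never waited for.
    """
    buffered = {}
    waiting = set(continuous)
    if not waiting:
        return buffered
    for stream_name, payload in pages:
        buffered.setdefault(stream_name, []).append(payload)
        waiting.discard(stream_name)
        if not waiting:
            break  # all continuous streams have data; stop reading ahead
    return buffered

pages = iter([("captions", b"c0"), ("audio", b"a0"), ("audio", b"a1"),
              ("video", b"v0"), ("audio", b"a2")])
result = read_ahead(pages, {"audio", "video"})
```

The captions page ends up buffered only because it happened to precede the continuous data; the loop never reads further to look for more captions.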
192 | |
193 <p>Buffering requirements need not be explicitly declared or managed for | |
194 the encoded stream; the decoder simply reads as much data as is | |
195 necessary to keep all continuous stream types gapless (also ensuring | |
196 discontinuous data arrives in time) and no more, resulting in optimum | |
197 implicit buffer usage for a given stream. Because all pages of all | |
198 data types are stamped with absolute timing information within the | |
199 stream, inter-stream synchronization timing is always explicitly | |
200 maintained without the need for explicitly declared buffer-ahead | |
201 hinting.</p> | |
202 | |
203 <p>Further details, mechanisms and reasons for the differing arrangement | |
204 and behavior of continuous and discontinuous streams are discussed | |
205 later.</p> | |
206 | |
207 <h3>Whole-stream navigation</h3> | |
208 | |
209 <p>Ogg is designed so that the simplest navigation operations treat the | |
210 physical Ogg stream as a whole summary of its streams, rather than | |
211 navigating each interleaved stream as a separate entity.</p> | |
212 | |
213 <p>First Example: seeking to a desired time position in a multiplexed (or | |
214 unmultiplexed) Ogg stream can be accomplished through a bisection | |
215 search on time position of all pages in the stream (as encoded in the | |
216 granule position). More powerful searches (such as a key frame-aware | |
217 seek within video) are also possible with additional search | |
218 complexity, but similar computational complexity.</p> | |
219 | |
220 <p>Second Example: A bitstream section may consist of three multiplexed | |
221 streams of differing lengths. The result of multiplexing these | |
222 streams should be thought of as a single mixed stream with a length | |
223 equal to the longest of the three component streams. Although it is | |
224 also possible to think of the multiplexed results as three concurrent | |
225 streams of different lengths and it is possible to recover the three | |
226 original streams, it will also become obvious that once multiplexed, | |
227 it isn't possible to find the internal lengths of the component | |
228 streams without a linear search of the whole bitstream section. | |
229 However, it is possible to find the length of the whole bitstream | |
230 section easily (in near-constant time per section) just as it is for a | |
231 single-media unmultiplexed stream.</p> | |
232 | |
233 <h2>Granule Position</h2> | |
234 | |
235 <h3>Description</h3> | |
236 | |
237 <p>The Granule Position is a signed 64-bit field appearing in the header | |
238 of every Ogg page. Although the granule position represents absolute | |
239 time within a logical stream, its value does not necessarily directly | |
240 encode a simple timestamp. It may represent frames elapsed (as in | |
241 Vorbis), a simple timestamp, or a more complex bit-division encoding | |
242 (such as in Theora). The exact encoding of the granule position is up | |
243 to a specific codec.</p> | |
244 | |
245 <p>The granule position is governed by the following rules:</p> | |
246 | |
247 <ul> | |
248 | |
249 <li>Granule Position must always increase forward or remain equal from | |
250 page to page, be unset, or be zero for a header page. The absolute | |
251 time to which any correct sequence of granule position maps must | |
252 similarly always increase forward or remain equal. <i>(A codec may | |
253 make use of data, such as a control sequence, that only affects codec | |
254 working state without producing data and thus without advancing granule | |
255 position and time. Although the packet sequence number increases in | |
256 this case, the granule position, and thus the time position, do | |
257 not.)</i></li> | |
258 | |
259 <li>Granule position may only be unset if there is no packet defining a | |
260 time boundary on the page (that is, if no packet in a continuous | |
261 stream ends on the page, or no packet in a discontinuous stream begins | |
262 on the page). This is discussed in more detail under Continuous | |
263 and Discontinuous Streams.</li> | |
264 | |
265 <li>A codec must be able to translate a given granule position value | |
266 to a unique, deterministic absolute time value through direct | |
267 calculation. A codec is not required to be able to translate an | |
268 absolute time value into a unique granule position value.</li> | |
269 | |
270 <li>Codecs shall choose a granule position definition that allows the | |
271 codec to seek as directly as possible to an immediately | |
272 decodable point; for example, the bit-divided granule position encoding of | |
273 Theora allows the codec to seek efficiently to a key frame without using | |
274 an index. That is, additional information other than absolute time | |
275 may be encoded into a granule position value so long as the granule | |
276 position obeys the above points.</li> | |
277 | |
278 </ul> | |
279 | |
280 <h4>Example: timestamp</h4> | |
281 | |
282 <p>In general, a codec/stream type should choose the simplest granule | |
283 position encoding that addresses its requirements. The examples here | |
284 are by no means exhaustive of the possibilities within Ogg.</p> | |
285 | |
286 <p>A simple granule position could encode a timestamp directly. For | |
287 example, a granule position that encoded milliseconds from beginning | |
288 of stream would allow a logical stream length of over 100,000,000,000 | |
289 days before beginning a new logical stream (to avoid the granule | |
290 position wrapping).</p> | |
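The stream length quoted above can be checked with a little arithmetic, assuming only the non-negative half of the signed 64-bit field is used for the millisecond count.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# Largest value representable in a signed 64-bit field.
MAX_GRANULEPOS = 2**63 - 1

# Whole days of stream time available before the granule position would wrap.
max_days = MAX_GRANULEPOS // MS_PER_DAY
```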
291 | |
292 <h4>Example: framestamp</h4> | |
293 | |
294 <p>A simple millisecond timestamp granule encoding might suit many stream | |
295 types, but a millisecond resolution is inappropriate for, e.g., most | |
296 audio encodings where exact single-sample resolution is generally a | |
297 requirement. A millisecond is both too large a granule and often does | |
298 not represent an integer number of samples.</p> | |
299 | |
300 <p>In the event that audio frames are always encoded as the same number of | |
301 samples, the granule position could simply be a linear count of frames | |
302 since beginning of stream. This has the advantages of being exact and | |
303 efficient. Position in time would simply be <tt>[granule_position] * | |
304 [samples_per_frame] / [samples_per_second]</tt>.</p> | |
305 | |
306 <h4>Example: samplestamp (Vorbis)</h4> | |
307 | |
308 <p>Frame counting is insufficient in codecs such as Vorbis where an audio | |
309 frame [packet] encodes a variable number of samples. In Vorbis's | |
310 case, the granule position is a count of the number of raw samples | |
311 from the beginning of stream; the absolute time of | |
312 a granule position is <tt>[granule_position] / | |
313 [samples_per_second]</tt>.</p> | |
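The framestamp and samplestamp mappings can be written out directly. This sketch uses exact rational arithmetic to avoid rounding; the function names are hypothetical and the parameters mirror the bracketed terms in the formulas.

```python
from fractions import Fraction

def framestamp_to_seconds(granule_position, samples_per_frame, samples_per_second):
    """Time of a granule position that counts fixed-size frames."""
    return Fraction(granule_position * samples_per_frame, samples_per_second)

def samplestamp_to_seconds(granule_position, samples_per_second):
    """Time of a granule position that counts raw samples (as in Vorbis)."""
    return Fraction(granule_position, samples_per_second)

# Two 1024-sample frames cover the same time as 2048 counted samples.
frame_time = framestamp_to_seconds(2, 1024, 44100)
sample_time = samplestamp_to_seconds(2048, 44100)
```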
314 | |
315 <h4>Example: bit-divided framestamp (Theora)</h4> | |
316 | |
317 <p>Some video codecs may be able to use the simple framestamp scheme for | |
318 granule position. However, most modern video codecs introduce at | |
319 least the following complications:</p> | |
320 | |
321 <ul> | |
322 | |
323 <li>video frames are relatively far apart compared to audio samples; | |
324 for this reason, the point at which a video frame changes to the next | |
325 frame is usually a strictly defined offset within the frame 'period'. | |
326 That is, video at 50fps could just as easily define frame transitions | |
327 &lt;.015, .035, .055...&gt; as at &lt;.00, .02, .04...&gt;.</li> | |
328 | |
329 <li>frame rates often include drop-frames, leap-frames or other | |
330 rational-but-non-integer timings.</li> | |
331 | |
332 <li>Decode must begin at a 'key frame' or 'I frame'. Key frames usually | |
333 occur relatively seldom.</li> | |
334 | |
335 </ul> | |
336 | |
337 <p>The first two points can be handled straightforwardly via the fact | |
338 that the codec has complete control over mapping granule position to | |
339 absolute time; non-integer frame rates and offsets can be set in the | |
340 codec's initial header, and the rest is just arithmetic.</p> | |
341 | |
342 <p>The third point appears trickier at first glance, but it too can be | |
343 handled through the granule position mapping mechanism. Here we | |
344 arrange the granule position in such a way that granule positions of | |
345 key frames are easy to find. Divide the granule position into two | |
346 fields; the most-significant bits are an absolute frame counter, but | |
347 it's only updated at each key frame. The least-significant bits encode | |
348 the number of frames since the last key frame. In this way, each | |
349 granule position encodes both the absolute time of the current frame | |
350 and the absolute time of the last key frame.</p> | |
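A minimal sketch of the two-field split, assuming the width of the low field is a parameter called `shift` that the codec would declare in its header. The function names are hypothetical, and this follows the description in this document rather than the exact Theora bitstream definition.

```python
def pack_granulepos(keyframe_number, frames_since_keyframe, shift):
    """Combine the key frame counter and the offset into one granule position."""
    if not 0 <= frames_since_keyframe < (1 << shift):
        raise ValueError("offset does not fit in the low field")
    return (keyframe_number << shift) | frames_since_keyframe

def unpack_granulepos(granulepos, shift):
    """Split a granule position into (key frame number, frames since key frame)."""
    keyframe_number = granulepos >> shift
    frames_since_keyframe = granulepos & ((1 << shift) - 1)
    return keyframe_number, frames_since_keyframe

def absolute_frame(granulepos, shift):
    """Absolute number of the current frame encoded by the granule position."""
    keyframe_number, frames_since_keyframe = unpack_granulepos(granulepos, shift)
    return keyframe_number + frames_since_keyframe

# Frame 127, whose most recent key frame was frame 120, with a 6-bit low field.
granulepos = pack_granulepos(120, 7, 6)
```

Because the high field only changes at key frames and the low field counts up in between, the packed value still never decreases from frame to frame.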
351 | |
352 <p>Seeking to a most recent preceding key frame is then accomplished by | |
353 first seeking to the original desired point, inspecting the granulepos | |
354 of the resulting video page, extracting from that granulepos the | |
355 absolute time of the desired key frame, and then seeking directly to | |
356 that key frame's page. Of course, it's still possible for an | |
357 application to ignore key frames and use a simpler seeking algorithm | |
358 (decode would be unable to present decoded video until the next | |
359 key frame). Surprisingly many player applications do choose the | |
360 simpler approach.</p> | |
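The two-seek procedure can be sketched over an in-memory list of page granule positions. The assumptions here are loud ones: one frame per page, a hypothetical 6-bit low field (`SHIFT`), the two-field layout described above, and plain bisection standing in for a search over the physical stream.

```python
SHIFT = 6  # hypothetical width of the "frames since key frame" field

def frame_of(granulepos):
    """Absolute frame number encoded in a granule position."""
    return (granulepos >> SHIFT) + (granulepos & ((1 << SHIFT) - 1))

def first_page_at_or_after(page_granulepos, frame):
    """Bisect for the first page whose frame number is >= frame."""
    lo, hi = 0, len(page_granulepos)
    while lo < hi:
        mid = (lo + hi) // 2
        if frame_of(page_granulepos[mid]) < frame:
            lo = mid + 1
        else:
            hi = mid
    return lo

def seek_to_keyframe(page_granulepos, target_frame):
    """Return the page index of the key frame preceding target_frame."""
    # First seek: land on the page for the requested frame.
    index = first_page_at_or_after(page_granulepos, target_frame)
    if index == len(page_granulepos):
        raise ValueError("target frame lies past the end of the stream")
    # The high field of that page names the most recent key frame.
    keyframe = page_granulepos[index] >> SHIFT
    # Second seek: go directly to the key frame's page.
    return first_page_at_or_after(page_granulepos, keyframe)

# Twelve single-frame pages with key frames at frames 0 and 8.
pages = [((0 if f < 8 else 8) << SHIFT) | (f - (0 if f < 8 else 8)) for f in range(12)]
```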
361 | |
362 <h3>Granule position, packets and pages</h3> | |
363 | |
364 <p>Although each packet of data in a logical stream theoretically has a | |
365 specific granule position, only one granule position is encoded | |
366 per page. It is possible to encode a logical stream such that each | |
367 page contains only a single packet (so that granule positions are | |
368 preserved for each packet), however a one-to-one packet/page mapping | |
369 is not intended to be the general case.</p> | |
370 | |
371 <p>Because Ogg functions at the page, not packet, level, this | |
372 once-per-page time information provides Ogg with the finest-grained | |
373 time information it can use. Ogg passes this granule positioning data | |
374 to the codec (along with the packets extracted from a page); it is the | |
375 responsibility of codecs to track timing information at granularities | |
376 finer than a single page.</p> | |
377 | |
378 <h3>Start-time and end-time positioning</h3> | |
379 | |
380 <p>A granule position represents the <em>instantaneous time location | |
381 between two pages</em>. However, continuous streams and discontinuous | |
382 streams differ on whether the granulepos represents the end-time of | |
383 the data on a page or the start-time. Continuous streams are | |
384 'end-time' encoded; the granulepos represents the point in time | |
385 immediately after the last data decoded from a page. Discontinuous | |
386 streams are 'start-time' encoded; the granulepos represents the point | |
387 in time of the first data decoded from the page.</p> | |
388 | |
389 <p>An Ogg stream type is declared continuous or discontinuous by its | |
390 codec. A given codec may support both continuous and discontinuous | |
391 operation so long as any given logical stream is continuous or | |
392 discontinuous for its entirety and the codec is able to ascertain (and | |
393 inform the Ogg layer) which applies after decoding the initial stream | |
394 header. The majority of codecs will always be continuous (such as | |
395 Vorbis) or discontinuous (such as Writ).</p> | |
396 | |
397 <p>Start- and end-time encoding do not affect multiplexing sort-order; | |
398 pages are still sorted by the absolute time a given granulepos maps to | |
399 regardless of whether that granulepos represents start- or | |
400 end-time.</p> | |
401 | |
402 <h2>Multiplex/Demultiplex Division of Labor</h2> | |
403 | |
404 <p>The Ogg multiplex/demultiplex layer provides mechanisms for encoding | |
405 raw packets into Ogg pages, decoding Ogg pages back into the original | |
406 codec packets, determining the logical structure of an Ogg stream, and | |
407 navigating through and synchronizing with an Ogg stream at a desired | |
408 stream location. Strict multiplex/demultiplex operations are entirely | |
409 in the Ogg domain and require no intervention from codecs.</p> | |
410 | |
411 <p>Implementation of more complex operations does require codec | |
412 knowledge, however. Unlike other framing systems, Ogg maintains | |
413 strict separation between framing and the framed bitstream data; Ogg | |
414 does not replicate codec-specific information in the page/framing | |
415 data, nor does Ogg blur the line between framing and stream | |
416 data/metadata. Because Ogg is fully data-agnostic toward the data it | |
417 frames, operations which require specifics of bitstream data (such as | |
418 'seek to key frame') also require interaction with the codec layer | |
419 (because, in this example, the Ogg layer is not aware of the concept | |
420 of key frames). This is different from systems that blur the | |
421 separation between framing and stream data in order to simplify the | |
422 separation of code. The Ogg system purposely keeps the distinction in | |
423 data simple so that later codec innovations are not constrained by | |
424 framing design.</p> | |
425 | |
426 <p>For this reason, however, complex seeking operations require | |
427 interaction with the codecs in order to decode the granule position of | |
428 a given stream type back to absolute time or in order to find | |
429 'decodable points' such as key frames in video.</p> | |
430 | |
431 <h2>Unsorted Discussion Points</h2> | |
432 | |
433 <p>flushes around key frames? RFC suggestion: repaginating or building a | |
434 stream this way is nice but not required</p> | |
435 | |
436 <h2>Appendix A: multiplexing examples</h2> | |
437 | |
438 <div id="copyright"> | |
439 The Xiph Fish Logo is a | |
440 trademark (™) of Xiph.Org.<br/> | |
441 | |
442 These pages © 1994 - 2005 Xiph.Org. All rights reserved. | |
443 </div> | |
444 | |
445 </body> | |
446 </html> |