comparison src/libvorbis-1.3.3/doc/framing.html @ 1:05aa0afa9217
Bring in flac, ogg, vorbis
author | Chris Cannam |
---|---|
date | Tue, 19 Mar 2013 17:37:49 +0000 |
parents | |
children |
0:c7265573341e | 1:05aa0afa9217 |
---|---|
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> | |
2 <html> | |
3 <head> | |
4 | |
5 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/> | |
6 <title>Ogg Vorbis Documentation</title> | |
7 | |
8 <style type="text/css"> | |
9 body { | |
10 margin: 0 18px 0 18px; | |
11 padding-bottom: 30px; | |
12 font-family: Verdana, Arial, Helvetica, sans-serif; | |
13 color: #333333; | |
14 font-size: .8em; | |
15 } | |
16 | |
17 a { | |
18 color: #3366cc; | |
19 } | |
20 | |
21 img { | |
22 border: 0; | |
23 } | |
24 | |
25 #xiphlogo { | |
26 margin: 30px 0 16px 0; | |
27 } | |
28 | |
29 #content p { | |
30 line-height: 1.4; | |
31 } | |
32 | |
33 h1, h1 a, h2, h2 a, h3, h3 a { | |
34 font-weight: bold; | |
35 color: #ff9900; | |
36 margin: 1.3em 0 8px 0; | |
37 } | |
38 | |
39 h1 { | |
40 font-size: 1.3em; | |
41 } | |
42 | |
43 h2 { | |
44 font-size: 1.2em; | |
45 } | |
46 | |
47 h3 { | |
48 font-size: 1.1em; | |
49 } | |
50 | |
51 li { | |
52 line-height: 1.4; | |
53 } | |
54 | |
55 #copyright { | |
56 margin-top: 30px; | |
57 line-height: 1.5em; | |
58 text-align: center; | |
59 font-size: .8em; | |
60 color: #888888; | |
61 clear: both; | |
62 } | |
63 </style> | |
64 | |
65 </head> | |
66 | |
67 <body> | |
68 | |
69 <div id="xiphlogo"> | |
70 <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.Org"/></a> | |
71 </div> | |
72 | |
73 <h1>Ogg logical bitstream framing</h1> | |
74 | |
75 <h2>Ogg bitstreams</h2> | |
76 | |
77 <p>The Ogg transport bitstream is designed to provide framing, error | |
78 protection and seeking structure for higher-level codec streams that | |
79 consist of raw, unencapsulated data packets, such as the Vorbis audio | |
80 codec or Theora video codec.</p> | |
81 | |
82 <h2>Application example: Vorbis</h2> | |
83 | |
84 <p>Vorbis encodes short-time blocks of PCM data into raw packets of | |
85 bit-packed data. These raw packets may be used directly by transport | |
86 mechanisms that provide their own framing and packet-separation | |
87 mechanisms (such as UDP datagrams). For stream based storage (such as | |
88 files) and transport (such as TCP streams or pipes), Vorbis uses the | |
89 Ogg bitstream format to provide framing/sync, sync recapture | |
90 after error, landmarks during seeking, and enough information to | |
91 properly separate data back into packets at the original packet | |
92 boundaries without relying on decoding to find packet boundaries.</p> | |
93 | |
94 <h2>Design constraints for Ogg bitstreams</h2> | |
95 | |
96 <ol> | |
97 <li>True streaming; we must not need to seek to build a 100% | |
98 complete bitstream.</li> | |
99 <li>Use no more than approximately 1-2% of bitstream bandwidth for | |
100 packet boundary marking, high-level framing, sync and seeking.</li> | |
101 <li>Specification of absolute position within the original sample | |
102 stream.</li> | |
103 <li>Simple mechanism to ease limited editing, such as a simplified | |
104 concatenation mechanism.</li> | |
105 <li>Detection of corruption, recapture after error and direct, random | |
106 access to data at arbitrary positions in the bitstream.</li> | |
107 </ol> | |
108 | |
109 <h2>Logical and Physical Bitstreams</h2> | |
110 | |
111 <p>A <em>logical</em> Ogg bitstream is a contiguous stream of | |
112 sequential pages belonging only to the logical bitstream. A | |
113 <em>physical</em> Ogg bitstream is constructed from one or more | |
114 logical Ogg bitstreams (the simplest physical bitstream | |
115 is simply a single logical bitstream). We describe below the exact | |
116 formatting of an Ogg logical bitstream. Combining logical | |
117 bitstreams into more complex physical bitstreams is described in the | |
118 <a href="oggstream.html">Ogg bitstream overview</a>. The exact | |
119 mapping of raw Vorbis packets into a valid Ogg Vorbis physical | |
120 bitstream is described in the Vorbis I Specification.</p> | |
121 | |
122 <h2>Bitstream structure</h2> | |
123 | |
124 <p>An Ogg stream is structured by dividing incoming packets into | |
125 segments of up to 255 bytes and then wrapping a group of contiguous | |
126 packet segments into a variable length page preceded by a page | |
127 header. Both the header size and page size are variable; the page | |
128 header contains sizing information and checksum data to determine | |
129 header/page size and data integrity.</p> | |
130 | |
131 <p>The bitstream is captured (or recaptured) by looking for the beginning | |
132 of a page, specifically the capture pattern. Once the capture pattern | |
133 is found, the decoder verifies page sync and integrity by computing | |
134 and comparing the checksum. At that point, the decoder can extract the | |
135 packets themselves.</p> | |
136 | |
137 <h3>Packet segmentation</h3> | |
138 | |
139 <p>Packets are logically divided into multiple segments before encoding | |
140 into a page. Note that the segmentation and fragmentation process is a | |
141 logical one; it's used to compute page header values and the original | |
142 page data need not be disturbed, even when a packet spans page | |
143 boundaries.</p> | |
144 | |
145 <p>The raw packet is logically divided into [n] 255 byte segments and a | |
146 last fractional segment of < 255 bytes. A packet size may well | |
147 consist only of the trailing fractional segment, and a fractional | |
148 segment may be zero length. The segment lengths, called "lacing values", are | |
149 then saved and placed into the header segment table.</p> | |
150 | |
151 <p>An example should make the basic concept clear:</p> | |
152 | |
153 <pre> | |
154 <tt> | |
155 raw packet: | |
156 ___________________________________________ | |
157 |______________packet data__________________| 753 bytes | |
158 | |
159 lacing values for page header segment table: 255,255,243 | |
160 </tt> | |
161 </pre> | |
162 | |
163 <p>We simply add the lacing values for the total size; the last lacing | |
164 value for a packet is always the value that is less than 255. Note | |
165 that this encoding both avoids imposing a maximum packet size and | |
166 imposes minimum overhead on small packets. By contrast, simply using | |
167 two bytes at the head of every packet would cap the packet size at | |
168 32k and penalize small packets (< 255 bytes, the typical case) with | |
169 twice the segmentation overhead. Using the lacing | |
170 values as suggested, small packets see the minimum possible | |
171 byte-aligned overhead (1 byte) and large packets, over 512 bytes or | |
172 so, see a fairly constant ~.5% overhead on encoding space.</p> | |
173 | |
174 <p>Note that a lacing value of 255 implies that a second lacing value | |
175 follows in the packet, and a value of < 255 marks the end of the | |
176 packet after that many additional bytes. A packet of 255 bytes (or a | |
177 multiple of 255 bytes) is terminated by a lacing value of 0:</p> | |
178 | |
179 <pre><tt> | |
180 raw packet: | |
181 _______________________________ | |
182 |________packet data____________| 255 bytes | |
183 | |
184 lacing values: 255, 0 | |
185 </tt></pre> | |
186 | |
187 <p>Note also that a 'nil' (zero length) packet is not an error; it | |
188 consists of nothing more than a lacing value of zero in the header.</p> | |
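<p>The lacing rules above can be sketched in a few lines of Python. This is
only an illustration, not the libogg implementation; the function names
<tt>lacing_values</tt> and <tt>packet_lengths</tt> are hypothetical and do not
appear in the original text.</p>

```python
def lacing_values(packet_length):
    """Return the lacing values for one packet of packet_length bytes."""
    if packet_length < 0:
        raise ValueError("packet length must be non-negative")
    full_segments, remainder = divmod(packet_length, 255)
    # n full 255-byte segments, then one terminating value < 255.
    # The terminator is 0 when the length is an exact multiple of 255,
    # and a nil packet is just the single value 0.
    return [255] * full_segments + [remainder]


def packet_lengths(lacing):
    """Recover packet lengths from lacing values of complete packets."""
    lengths, current = [], 0
    for value in lacing:
        current += value
        if value < 255:  # a value below 255 ends the packet
            lengths.append(current)
            current = 0
    return lengths


print(lacing_values(753))  # the 753-byte example above
print(lacing_values(255))  # the 255-byte example above
```

<p>Summing the lacing values up to and including the first value below 255
gives back the packet size, which is all <tt>packet_lengths</tt> does.</p>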
189 | |
190 <h3>Packets spanning pages</h3> | |
191 | |
192 <p>Packets are not restricted to beginning and ending within a page, | |
193 although individual segments are, by definition, required to do so. | |
194 Packets are not restricted to a maximum size, although excessively | |
195 large packets in the data stream are discouraged; the Ogg | |
196 bitstream specification strongly recommends nominal page size of | |
197 approximately 4-8kB (large packets are foreseen as being useful for | |
198 initialization data at the beginning of a logical bitstream).</p> | |
199 | |
200 <p>After segmenting a packet, the encoder may decide not to place all the | |
201 resulting segments into the current page; to do so, the encoder places | |
202 the lacing values of the segments it wishes to belong to the current | |
203 page into the current segment table, then finishes the page. The next | |
204 page is begun with the first value in the segment table belonging to | |
205 the next packet segment, thus continuing the packet. Data in the | |
206 packet body must also correspond properly to the lacing values in the | |
207 spanned pages: the segment data corresponding to | |
208 the lacing values of the first page belongs in that page, and packet | |
209 segments listed in the segment table of the following page must begin | |
210 the page body of the subsequent page.</p> | |
211 | |
212 <p>The last step in spanning a page boundary is to set the header | |
213 flag in the new page to indicate that the first lacing value in the | |
214 segment table continues rather than begins a packet; a header flag of | |
215 0x01 is set to indicate a continued packet. Although mandatory, it | |
216 is not actually algorithmically necessary; one could inspect the | |
217 preceding segment table to determine if the packet is new or | |
218 continued. Adding the information to the page header flag allows a | |
219 simpler design (with no overhead) that need only inspect the current | |
220 page header after frame capture. This also allows faster error | |
221 recovery in the event that the packet originates in a corrupt | |
222 preceding page, implying that the previous page's segment table | |
223 cannot be trusted.</p> | |
224 | |
225 <p>Note that a packet can span an arbitrary number of pages; the above | |
226 spanning process is repeated for each spanned page boundary. Also a | |
227 'zero termination' on a packet size that is an exact multiple of 255 | |
228 must appear even if the lacing value appears in the next page as a | |
229 zero-length continuation of the current packet. The header flag | |
230 should be set to 0x01 to indicate that the packet spanned, even though | |
231 the span is a nil case as far as data is concerned.</p> | |
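<p>A minimal sketch of the spanning rule, under two simplifying assumptions
that are not part of the text above: the packet starts at the beginning of an
empty page, and the encoder fills every page to the 255-segment limit (a real
encoder would normally finish pages much earlier). The name
<tt>split_into_pages</tt> is hypothetical.</p>

```python
def split_into_pages(lacing, max_segments=255):
    """Split one packet's lacing values into per-page segment tables.

    Returns a list of (header_type_flag, segment_table) tuples.
    """
    pages = []
    for start in range(0, len(lacing), max_segments):
        table = lacing[start:start + max_segments]
        # 0x01 marks a page whose first lacing value continues a packet
        # begun on an earlier page; the first page starts a fresh packet.
        flag = 0x01 if start > 0 else 0x00
        pages.append((flag, table))
    return pages


# A packet of 255 * 255 bytes fills one segment table exactly, so its
# terminating zero lands on the next page as a nil continuation.
lacing = [255] * 255 + [0]
pages = split_into_pages(lacing)
print(len(pages), pages[1])
```

<p>The second page carries only the lacing value 0 and still has the 0x01 flag
set, which is the nil-span case described above.</p>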
232 | |
233 <p>The encoding looks odd, but is properly optimized for speed and the | |
234 expected case of the majority of packets being between 50 and 200 | |
235 bytes (note that it is designed such that packets of wildly different | |
236 sizes can be handled within the model; placing packet size | |
237 restrictions on the encoder would have only slightly simplified design | |
238 in page generation and increased overall encoder complexity).</p> | |
239 | |
240 <p>The main point behind tracking individual packets (and packet | |
241 segments) is to allow more flexible encoding tricks that require | |
242 explicit knowledge of packet size. An example is simple bandwidth | |
243 limiting, implemented by simply truncating packets in the nominal case | |
244 if the packet is arranged so that the least sensitive portion of the | |
245 data comes last.</p> | |
246 | |
247 <h3>Page header</h3> | |
248 | |
249 <p>The headering mechanism is designed to avoid copying and re-assembly | |
250 of the packet data (ie, making the packet segmentation process a | |
251 logical one); the header can be generated directly from incoming | |
252 packet data. The encoder buffers packet data until it finishes a | |
253 complete page at which point it writes the header followed by the | |
254 buffered packet segments.</p> | |
255 | |
256 <h4>capture_pattern</h4> | |
257 | |
258 <p>A header begins with a capture pattern that simplifies identifying | |
259 pages; once the decoder has found the capture pattern it can do a more | |
260 intensive job of verifying that it has in fact found a page boundary | |
261 (as opposed to an inadvertent coincidence in the byte stream).</p> | |
262 | |
263 <pre><tt> | |
264 byte value | |
265 | |
266 0 0x4f 'O' | |
267 1 0x67 'g' | |
268 2 0x67 'g' | |
269 3 0x53 'S' | |
270 </tt></pre> | |
271 | |
272 <h4>stream_structure_version</h4> | |
273 | |
274 <p>The capture pattern is followed by the stream structure revision:</p> | |
275 | |
276 <pre><tt> | |
277 byte value | |
278 | |
279 4 0x00 | |
280 </tt></pre> | |
281 | |
282 <h4>header_type_flag</h4> | |
283 | |
284 <p>The header type flag identifies this page's context in the bitstream:</p> | |
285 | |
286 <pre><tt> | |
287 byte value | |
288 | |
289 5 bitflags: 0x01: unset = fresh packet | |
290 set = continued packet | |
291 0x02: unset = not first page of logical bitstream | |
292 set = first page of logical bitstream (bos) | |
293 0x04: unset = not last page of logical bitstream | |
294 set = last page of logical bitstream (eos) | |
295 </tt></pre> | |
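<p>As an illustration, the three flag bits listed above could be decoded as
follows. The constant and function names are hypothetical; only the bit values
come from the table.</p>

```python
CONTINUED = 0x01  # first lacing value continues a packet
BOS = 0x02        # first page of the logical bitstream
EOS = 0x04        # last page of the logical bitstream


def describe_header_type(flag):
    """Decode the header_type_flag byte into its three boolean fields."""
    return {
        "continued": bool(flag & CONTINUED),
        "bos": bool(flag & BOS),
        "eos": bool(flag & EOS),
    }


print(describe_header_type(0x05))
```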
296 | |
297 <h4>absolute granule position</h4> | |
298 | |
299 <p>This is packed in the same way the rest of Ogg data is packed: LSb | |
300 of LSB first. Note that the 'position' data specifies a 'sample' | |
301 number (eg, in CD quality audio a sample is four octets, 16 bits for left | |
302 and 16 bits for right; in video it would likely be the frame number. | |
303 It is up to the specific codec in use to define the semantic meaning | |
304 of the granule position value). The position specified is the total | |
305 samples encoded after including all packets finished on this page | |
306 (packets begun on this page but continuing on to the next page do not | |
307 count). The rationale here is that the position specified in the | |
308 frame header of the last page tells how long the data coded by the | |
309 bitstream is. A truncated stream will still return the proper number | |
310 of samples that can be decoded fully.</p> | |
311 | |
312 <p>A special value of '-1' (in two's complement) indicates that no packets | |
313 finish on this page.</p> | |
314 | |
315 <pre><tt> | |
316 byte value | |
317 | |
318 6 0xXX LSB | |
319 7 0xXX | |
320 8 0xXX | |
321 9 0xXX | |
322 10 0xXX | |
323 11 0xXX | |
324 12 0xXX | |
325 13 0xXX MSB | |
326 </tt></pre> | |
327 | |
328 <h4>stream serial number</h4> | |
329 | |
330 <p>Ogg allows for separate logical bitstreams to be mixed at page | |
331 granularity in a physical bitstream. The most common case would be | |
332 sequential arrangement, but it is possible to interleave pages for | |
333 two separate bitstreams to be decoded concurrently. The serial | |
334 number is the means by which physical pages are associated with | |
335 a particular logical stream. Each logical stream must have a unique | |
336 serial number within a physical stream:</p> | |
337 | |
338 <pre><tt> | |
339 byte value | |
340 | |
341 14 0xXX LSB | |
342 15 0xXX | |
343 16 0xXX | |
344 17 0xXX MSB | |
345 </tt></pre> | |
346 | |
347 <h4>page sequence no</h4> | |
348 | |
349 <p>Page counter; lets us know if a page is lost (useful where packets | |
350 span page boundaries).</p> | |
351 | |
352 <pre><tt> | |
353 byte value | |
354 | |
355 18 0xXX LSB | |
356 19 0xXX | |
357 20 0xXX | |
358 21 0xXX MSB | |
359 </tt></pre> | |
360 | |
361 <h4>page checksum</h4> | |
362 | |
363 <p>32 bit CRC value (direct algorithm, initial val and final XOR = 0, | |
364 generator polynomial=0x04c11db7). The value is computed over the | |
365 entire header (with the CRC field in the header set to zero) and then | |
366 continued over the page. The CRC field is then filled with the | |
367 computed value.</p> | |
368 | |
369 <p>(A thorough discussion of CRC algorithms can be found in <a | |
370 href="http://www.ross.net/crc/download/crc_v3.txt">"A | |
371 Painless Guide to CRC Error Detection Algorithms"</a> by Ross | |
372 Williams <a href="mailto:ross@ross.net">ross@ross.net</a>.)</p> | |
373 | |
374 <pre><tt> | |
375 byte value | |
376 | |
377 22 0xXX LSB | |
378 23 0xXX | |
379 24 0xXX | |
380 25 0xXX MSB | |
381 </tt></pre> | |
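<p>The checksum described above can be sketched bit by bit. This is a slow
illustrative version, not the table-driven code a real implementation would
use; it assumes the "direct" algorithm processes each byte most significant
bit first. The names <tt>ogg_crc32</tt> and <tt>page_checksum</tt> are
hypothetical.</p>

```python
def ogg_crc32(data):
    """CRC-32, polynomial 0x04c11db7, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24  # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def page_checksum(page):
    """Checksum of a whole page with the CRC field (bytes 22-25) zeroed."""
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
    return ogg_crc32(zeroed)
```

<p>To verify a page, a decoder would compare <tt>page_checksum(page)</tt> with
the value stored LSB first in bytes 22-25.</p>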
382 | |
383 <h4>page_segments</h4> | |
384 | |
385 <p>The number of segment entries to appear in the segment table. The | |
386 maximum number of 255 segments (255 bytes each) sets the maximum | |
387 possible physical page size at 65307 bytes or just under 64kB (thus | |
388 we know that a header corrupted so as to destroy sizing/alignment | |
389 information will not cause a runaway bitstream. We'll read in the | |
390 page according to the corrupted size information that's guaranteed to | |
391 be a reasonable size regardless, notice the checksum mismatch, drop | |
392 sync and then look for recapture).</p> | |
393 | |
394 <pre><tt> | |
395 byte value | |
396 | |
397 26 0x00-0xff (0-255) | |
398 </tt></pre> | |
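<p>The 65307-byte maximum follows from the header layout: 27 fixed header
bytes (bytes 0-26), up to 255 lacing values, and up to 255 bytes of data per
lacing value. The constant names below are illustrative only.</p>

```python
HEADER_FIXED_BYTES = 27   # bytes 0-26 of the page header
MAX_SEGMENTS = 255        # largest value of page_segments
MAX_SEGMENT_BYTES = 255   # largest lacing value

max_page_size = (HEADER_FIXED_BYTES
                 + MAX_SEGMENTS                       # segment table
                 + MAX_SEGMENTS * MAX_SEGMENT_BYTES)  # page body
print(max_page_size)  # 65307
```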
399 | |
400 <h4>segment_table (containing packet lacing values)</h4> | |
401 | |
402 <p>The lacing values for each packet segment physically appearing in | |
403 this page are listed in contiguous order.</p> | |
404 | |
405 <pre><tt> | |
406 byte value | |
407 | |
408 27 0x00-0xff (0-255) | |
409 [...] | |
410 n 0x00-0xff (0-255, n=page_segments+26) | |
411 </tt></pre> | |
412 | |
413 <p>Total page size is calculated directly from the known header size and | |
414 lacing values in the segment table. Packet data segments follow | |
415 immediately after the header.</p> | |
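<p>Putting the header fields together, a decoder might read a page header
roughly as follows. This is a sketch using the byte offsets given above; the
function name and the dictionary keys are hypothetical, and the granule
position is read as a signed 64-bit value so that the special value -1 is
visible directly.</p>

```python
import struct


def parse_page_header(data):
    """Parse an Ogg page header found at the start of data."""
    if len(data) < 27 or data[0:4] != b"OggS":
        raise ValueError("no capture pattern at start of data")
    # Bytes 4-26, all little-endian: version, header type, granule
    # position, serial number, page sequence number, CRC, page_segments.
    (version, header_type, granule_position, serial_number,
     page_sequence, checksum, page_segments) = struct.unpack_from(
        "<BBqIIIB", data, 4)
    if len(data) < 27 + page_segments:
        raise ValueError("segment table is truncated")
    segment_table = list(data[27:27 + page_segments])
    header_size = 27 + page_segments
    return {
        "version": version,
        "header_type": header_type,
        "granule_position": granule_position,
        "serial_number": serial_number,
        "page_sequence": page_sequence,
        "checksum": checksum,
        "segment_table": segment_table,
        "header_size": header_size,
        # Total page size: header plus the sum of the lacing values.
        "page_size": header_size + sum(segment_table),
    }
```

<p>For a page whose segment table is 255, 255, 243, the header is 30 bytes and
the whole page is 30 + 753 = 783 bytes.</p>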
416 | |
417 <p>Page headers typically impose a flat .25-.5% space overhead assuming | |
418 nominal ~8k page sizes. The segmentation table needed for exact | |
419 packet recovery in the streaming layer adds approximately .5-1% | |
420 nominal assuming expected encoder behavior in the 44.1kHz, 128kbps | |
421 stereo encodings.</p> | |
422 | |
423 <div id="copyright"> | |
424 The Xiph Fish Logo is a | |
425 trademark (™) of Xiph.Org.<br/> | |
426 | |
427 These pages © 1994 - 2005 Xiph.Org. All rights reserved. | |
428 </div> | |
429 | |
430 </body> | |
431 </html> |