annotate src/libvorbis-1.3.3/doc/framing.html @ 1:05aa0afa9217

Bring in flac, ogg, vorbis
author Chris Cannam
date Tue, 19 Mar 2013 17:37:49 +0000
rev   line source
Chris@1 1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Chris@1 2 <html>
Chris@1 3 <head>
Chris@1 4
Chris@1 5 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
Chris@1 6 <title>Ogg Vorbis Documentation</title>
Chris@1 7
Chris@1 8 <style type="text/css">
Chris@1 9 body {
Chris@1 10 margin: 0 18px 0 18px;
Chris@1 11 padding-bottom: 30px;
Chris@1 12 font-family: Verdana, Arial, Helvetica, sans-serif;
Chris@1 13 color: #333333;
Chris@1 14 font-size: .8em;
Chris@1 15 }
Chris@1 16
Chris@1 17 a {
Chris@1 18 color: #3366cc;
Chris@1 19 }
Chris@1 20
Chris@1 21 img {
Chris@1 22 border: 0;
Chris@1 23 }
Chris@1 24
Chris@1 25 #xiphlogo {
Chris@1 26 margin: 30px 0 16px 0;
Chris@1 27 }
Chris@1 28
Chris@1 29 #content p {
Chris@1 30 line-height: 1.4;
Chris@1 31 }
Chris@1 32
Chris@1 33 h1, h1 a, h2, h2 a, h3, h3 a {
Chris@1 34 font-weight: bold;
Chris@1 35 color: #ff9900;
Chris@1 36 margin: 1.3em 0 8px 0;
Chris@1 37 }
Chris@1 38
Chris@1 39 h1 {
Chris@1 40 font-size: 1.3em;
Chris@1 41 }
Chris@1 42
Chris@1 43 h2 {
Chris@1 44 font-size: 1.2em;
Chris@1 45 }
Chris@1 46
Chris@1 47 h3 {
Chris@1 48 font-size: 1.1em;
Chris@1 49 }
Chris@1 50
Chris@1 51 li {
Chris@1 52 line-height: 1.4;
Chris@1 53 }
Chris@1 54
Chris@1 55 #copyright {
Chris@1 56 margin-top: 30px;
Chris@1 57 line-height: 1.5em;
Chris@1 58 text-align: center;
Chris@1 59 font-size: .8em;
Chris@1 60 color: #888888;
Chris@1 61 clear: both;
Chris@1 62 }
Chris@1 63 </style>
Chris@1 64
Chris@1 65 </head>
Chris@1 66
Chris@1 67 <body>
Chris@1 68
Chris@1 69 <div id="xiphlogo">
Chris@1 70 <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.Org"/></a>
Chris@1 71 </div>
Chris@1 72
Chris@1 73 <h1>Ogg logical bitstream framing</h1>
Chris@1 74
Chris@1 75 <h2>Ogg bitstreams</h2>
Chris@1 76
Chris@1 77 <p>The Ogg transport bitstream is designed to provide framing, error
Chris@1 78 protection and seeking structure for higher-level codec streams that
Chris@1 79 consist of raw, unencapsulated data packets, such as the Vorbis audio
Chris@1 80 codec or Theora video codec.</p>
Chris@1 81
Chris@1 82 <h2>Application example: Vorbis</h2>
Chris@1 83
Chris@1 84 <p>Vorbis encodes short-time blocks of PCM data into raw packets of
Chris@1 85 bit-packed data. These raw packets may be used directly by transport
Chris@1 86 mechanisms that provide their own framing and packet-separation
Chris@1 87 mechanisms (such as UDP datagrams). For stream-based storage (such as
Chris@1 88 files) and transport (such as TCP streams or pipes), Vorbis uses the
Chris@1 89 Ogg bitstream format to provide framing/sync, sync recapture
Chris@1 90 after error, landmarks during seeking, and enough information to
Chris@1 91 properly separate data back into packets at the original packet
Chris@1 92 boundaries without relying on decoding to find packet boundaries.</p>
Chris@1 93
Chris@1 94 <h2>Design constraints for Ogg bitstreams</h2>
Chris@1 95
Chris@1 96 <ol>
Chris@1 97 <li>True streaming; we must not need to seek to build a 100%
Chris@1 98 complete bitstream.</li>
Chris@1 99 <li>Use no more than approximately 1-2% of bitstream bandwidth for
Chris@1 100 packet boundary marking, high-level framing, sync and seeking.</li>
Chris@1 101 <li>Specification of absolute position within the original sample
Chris@1 102 stream.</li>
Chris@1 103 <li>Simple mechanism to ease limited editing, such as a simplified
Chris@1 104 concatenation mechanism.</li>
Chris@1 105 <li>Detection of corruption, recapture after error and direct, random
Chris@1 106 access to data at arbitrary positions in the bitstream.</li>
Chris@1 107 </ol>
Chris@1 108
Chris@1 109 <h2>Logical and Physical Bitstreams</h2>
Chris@1 110
Chris@1 111 <p>A <em>logical</em> Ogg bitstream is a contiguous stream of
Chris@1 112 sequential pages belonging only to the logical bitstream. A
Chris@1 113 <em>physical</em> Ogg bitstream is constructed from one or more
Chris@1 114 logical Ogg bitstreams (the simplest physical bitstream
Chris@1 115 is simply a single logical bitstream). We describe below the exact
Chris@1 116 formatting of an Ogg logical bitstream. Combining logical
Chris@1 117 bitstreams into more complex physical bitstreams is described in the
Chris@1 118 <a href="oggstream.html">Ogg bitstream overview</a>. The exact
Chris@1 119 mapping of raw Vorbis packets into a valid Ogg Vorbis physical
Chris@1 120 bitstream is described in the Vorbis I Specification.</p>
Chris@1 121
Chris@1 122 <h2>Bitstream structure</h2>
Chris@1 123
Chris@1 124 <p>An Ogg stream is structured by dividing incoming packets into
Chris@1 125 segments of up to 255 bytes and then wrapping a group of contiguous
Chris@1 126 packet segments into a variable length page preceded by a page
Chris@1 127 header. Both the header size and page size are variable; the page
Chris@1 128 header contains sizing information and checksum data to determine
Chris@1 129 header/page size and data integrity.</p>
Chris@1 130
Chris@1 131 <p>The bitstream is captured (or recaptured) by looking for the beginning
Chris@1 132 of a page, specifically the capture pattern. Once the capture pattern
Chris@1 133 is found, the decoder verifies page sync and integrity by computing
Chris@1 134 and comparing the checksum. At that point, the decoder can extract the
Chris@1 135 packets themselves.</p>
Chris@1 136
Chris@1 137 <h3>Packet segmentation</h3>
Chris@1 138
Chris@1 139 <p>Packets are logically divided into multiple segments before encoding
Chris@1 140 into a page. Note that the segmentation and fragmentation process is a
Chris@1 141 logical one; it's used to compute page header values and the original
Chris@1 142 page data need not be disturbed, even when a packet spans page
Chris@1 143 boundaries.</p>
Chris@1 144
Chris@1 145 <p>The raw packet is logically divided into [n] 255-byte segments and a
Chris@1 146 last fractional segment of &lt; 255 bytes. A packet may well
Chris@1 147 consist only of the trailing fractional segment, and a fractional
Chris@1 148 segment may be zero length. The segment sizes, called "lacing values", are
Chris@1 149 then saved and placed into the header segment table.</p>
Chris@1 150
Chris@1 151 <p>An example should make the basic concept clear:</p>
Chris@1 152
Chris@1 153 <pre>
Chris@1 154 <tt>
Chris@1 155 raw packet:
Chris@1 156 ___________________________________________
Chris@1 157 |______________packet data__________________| 753 bytes
Chris@1 158
Chris@1 159 lacing values for page header segment table: 255,255,243
Chris@1 160 </tt>
Chris@1 161 </pre>
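<p>The segmentation rule can be sketched in a few lines of Python. This is
an illustration only, not code from libogg; the function name
<tt>lacing_values</tt> is an assumption made for the example.</p>

```python
def lacing_values(packet_size):
    """Return the lacing values for a packet of packet_size bytes.

    Hypothetical helper: n full 255-byte segments followed by one
    terminating value below 255 (which may be zero).
    """
    if packet_size < 0:
        raise ValueError("packet size must be non-negative")
    full_segments, remainder = divmod(packet_size, 255)
    return [255] * full_segments + [remainder]


print(lacing_values(753))  # [255, 255, 243]
```

<p>Summing the returned values gives back the original packet size.</p>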
Chris@1 162
Chris@1 163 <p>We simply add the lacing values for the total size; the last lacing
Chris@1 164 value for a packet is always the value that is less than 255. Note
Chris@1 165 that this encoding both avoids imposing a maximum packet size and
Chris@1 166 imposes minimum overhead on small packets. By contrast, simply using
Chris@1 167 two bytes at the head of every packet would impose a max
Chris@1 168 packet size of 32k and penalize small packets (&lt;255, the typical case)
Chris@1 169 with twice the segmentation overhead. Using the lacing
Chris@1 170 values as suggested, small packets see the minimum possible
Chris@1 171 byte-aligned overhead (1 byte) and large packets, over 512 bytes or
Chris@1 172 so, see a fairly constant ~.5% overhead on encoding space.</p>
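<p>As a rough check of these figures, the sketch below computes the
lacing overhead for a single packet, counting one segment-table byte per
lacing value. The helper name <tt>lacing_overhead</tt> is an assumption;
the fixed page header is deliberately left out of the calculation.</p>

```python
def lacing_overhead(packet_size):
    """Fraction of extra bytes spent on lacing values for one packet.

    Hypothetical helper: a packet needs packet_size // 255 + 1 lacing
    values, each stored as one byte in the segment table.
    """
    if packet_size <= 0:
        raise ValueError("packet size must be positive")
    table_bytes = packet_size // 255 + 1
    return table_bytes / packet_size


for size in (100, 753, 4096, 65536):
    print(size, f"{lacing_overhead(size):.2%}")
```

<p>For large packets the ratio approaches 1/255, which is in line with
the approximate figure quoted above.</p>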
Chris@1 173
Chris@1 174 <p>Note that a lacing value of 255 implies that a second lacing value
Chris@1 175 follows in the packet, and a value of &lt; 255 marks the end of the
Chris@1 176 packet after that many additional bytes. A packet of 255 bytes (or a
Chris@1 177 multiple of 255 bytes) is terminated by a lacing value of 0:</p>
Chris@1 178
Chris@1 179 <pre><tt>
Chris@1 180 raw packet:
Chris@1 181 _______________________________
Chris@1 182 |________packet data____________| 255 bytes
Chris@1 183
Chris@1 184 lacing values: 255, 0
Chris@1 185 </tt></pre>
Chris@1 186
Chris@1 187 <p>Note also that a 'nil' (zero length) packet is not an error; it
Chris@1 188 consists of nothing more than a lacing value of zero in the header.</p>
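<p>Going the other way, a decoder can recover packet sizes by summing
lacing values until it meets one below 255. The following is a minimal
sketch under that reading; the name <tt>packet_sizes</tt> and its return
shape are assumptions made for illustration.</p>

```python
def packet_sizes(lacing):
    """Group lacing values into completed packet sizes.

    Returns (sizes, pending). pending is the byte count of a trailing
    packet that has not been terminated yet, or None if every packet in
    the list ended with a value below 255.
    """
    sizes = []
    current = 0
    open_packet = False
    for value in lacing:
        if not 0 <= value <= 255:
            raise ValueError("lacing values are single bytes")
        current += value
        open_packet = True
        if value < 255:  # a value below 255 terminates the packet
            sizes.append(current)
            current = 0
            open_packet = False
    return sizes, (current if open_packet else None)


print(packet_sizes([255, 255, 243]))  # ([753], None)
print(packet_sizes([255, 0]))         # ([255], None)
print(packet_sizes([0]))              # ([0], None)
```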
Chris@1 189
Chris@1 190 <h3>Packets spanning pages</h3>
Chris@1 191
Chris@1 192 <p>Packets are not restricted to beginning and ending within a page,
Chris@1 193 although individual segments are, by definition, required to do so.
Chris@1 194 Packets are not restricted to a maximum size, although excessively
Chris@1 195 large packets in the data stream are discouraged; the Ogg
Chris@1 196 bitstream specification strongly recommends nominal page size of
Chris@1 197 approximately 4-8kB (large packets are foreseen as being useful for
Chris@1 198 initialization data at the beginning of a logical bitstream).</p>
Chris@1 199
Chris@1 200 <p>After segmenting a packet, the encoder may decide not to place all the
Chris@1 201 resulting segments into the current page; to do so, the encoder places
Chris@1 202 the lacing values of the segments it wishes to belong to the current
Chris@1 203 page into the current segment table, then finishes the page. The next
Chris@1 204 page is begun with the first value in the segment table belonging to
Chris@1 205 the next packet segment, thus continuing the packet. Data in the
Chris@1 206 packet body must also correspond properly to the lacing values in the
Chris@1 207 spanned pages: the segment data corresponding to
Chris@1 208 the lacing values of the first page belongs in that page, and packet
Chris@1 209 segments listed in the segment table of the following page must begin
Chris@1 210 the page body of the subsequent page.</p>
Chris@1 211
Chris@1 212 <p>The last mechanic to spanning a page boundary is to set the header
Chris@1 213 flag in the new page to indicate that the first lacing value in the
Chris@1 214 segment table continues rather than begins a packet; a header flag of
Chris@1 215 0x01 is set to indicate a continued packet. Although mandatory, it
Chris@1 216 is not actually algorithmically necessary; one could inspect the
Chris@1 217 preceding segment table to determine if the packet is new or
Chris@1 218 continued. Adding the information to the header_type_flag allows a
Chris@1 219 simpler design (with no overhead) that need only inspect the current
Chris@1 220 page header after frame capture. This also allows faster error
Chris@1 221 recovery in the event that the packet originates in a corrupt
Chris@1 222 preceding page, implying that the previous page's segment table
Chris@1 223 cannot be trusted.</p>
Chris@1 224
Chris@1 225 <p>Note that a packet can span an arbitrary number of pages; the above
Chris@1 226 spanning process is repeated for each spanned page boundary. Also a
Chris@1 227 'zero termination' on a packet size that is an exact multiple of 255
Chris@1 228 must appear even if the lacing value appears in the next page as a
Chris@1 229 zero-length continuation of the current packet. The header flag
Chris@1 230 should be set to 0x01 to indicate that the packet spanned, even though
Chris@1 231 the span is a nil case as far as data is concerned.</p>
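<p>The spanning rules can be illustrated with a small sketch that splits
one packet's lacing values between the current page and the next. The
function name and the <tt>room_in_page</tt> parameter (the number of
segment-table entries still free in the current page) are hypothetical;
a real encoder tracks considerably more state.</p>

```python
CONTINUED_PACKET = 0x01  # header flag value given in the text


def split_across_pages(packet_size, room_in_page):
    """Split one packet's lacing values over two segment tables.

    Returns (current_table, next_table, next_page_flag). The flag is
    0x01 only when the packet really starts in the current page and
    continues in the next one.
    """
    lacing = [255] * (packet_size // 255) + [packet_size % 255]
    current_table = lacing[:room_in_page]
    next_table = lacing[room_in_page:]
    spans = bool(current_table) and bool(next_table)
    return current_table, next_table, (CONTINUED_PACKET if spans else 0x00)


# A 510-byte packet with room for two entries: the terminating zero
# lands in the next page, which is still flagged as a continuation.
print(split_across_pages(510, 2))  # ([255, 255], [0], 1)
```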
Chris@1 232
Chris@1 233 <p>The encoding looks odd, but is properly optimized for speed and the
Chris@1 234 expected case of the majority of packets being between 50 and 200
Chris@1 235 bytes (note that it is designed such that packets of wildly different
Chris@1 236 sizes can be handled within the model; placing packet size
Chris@1 237 restrictions on the encoder would have only slightly simplified design
Chris@1 238 in page generation and increased overall encoder complexity).</p>
Chris@1 239
Chris@1 240 <p>The main point behind tracking individual packets (and packet
Chris@1 241 segments) is to allow more flexible encoding tricks that require
Chris@1 242 explicit knowledge of packet size. An example is simple bandwidth
Chris@1 243 limiting, implemented by simply truncating packets in the nominal case
Chris@1 244 if the packet is arranged so that the least sensitive portion of the
Chris@1 245 data comes last.</p>
Chris@1 246
Chris@1 247 <h3>Page header</h3>
Chris@1 248
Chris@1 249 <p>The headering mechanism is designed to avoid copying and re-assembly
Chris@1 250 of the packet data (ie, making the packet segmentation process a
Chris@1 251 logical one); the header can be generated directly from incoming
Chris@1 252 packet data. The encoder buffers packet data until it finishes a
Chris@1 253 complete page at which point it writes the header followed by the
Chris@1 254 buffered packet segments.</p>
Chris@1 255
Chris@1 256 <h4>capture_pattern</h4>
Chris@1 257
Chris@1 258 <p>A header begins with a capture pattern that simplifies identifying
Chris@1 259 pages; once the decoder has found the capture pattern it can do a more
Chris@1 260 intensive job of verifying that it has in fact found a page boundary
Chris@1 261 (as opposed to an inadvertent coincidence in the byte stream).</p>
Chris@1 262
Chris@1 263 <pre><tt>
Chris@1 264 byte value
Chris@1 265
Chris@1 266 0 0x4f 'O'
Chris@1 267 1 0x67 'g'
Chris@1 268 2 0x67 'g'
Chris@1 269 3 0x53 'S'
Chris@1 270 </tt></pre>
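<p>A first pass at capture might simply list every offset where the four
pattern bytes occur; each hit is only a candidate until the checksum has
been verified. This sketch assumes the data is already in memory as a
<tt>bytes</tt> object, and the function name is illustrative.</p>

```python
CAPTURE_PATTERN = b"OggS"  # 0x4f 0x67 0x67 0x53


def find_capture_candidates(data):
    """Yield every offset in data where the capture pattern occurs."""
    offset = data.find(CAPTURE_PATTERN)
    while offset != -1:
        yield offset
        offset = data.find(CAPTURE_PATTERN, offset + 1)


stream = b"\x00\x01OggS\x00junkOggS\x00"
print(list(find_capture_candidates(stream)))  # [2, 11]
```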
Chris@1 271
Chris@1 272 <h4>stream_structure_version</h4>
Chris@1 273
Chris@1 274 <p>The capture pattern is followed by the stream structure revision:</p>
Chris@1 275
Chris@1 276 <pre><tt>
Chris@1 277 byte value
Chris@1 278
Chris@1 279 4 0x00
Chris@1 280 </tt></pre>
Chris@1 281
Chris@1 282 <h4>header_type_flag</h4>
Chris@1 283
Chris@1 284 <p>The header type flag identifies this page's context in the bitstream:</p>
Chris@1 285
Chris@1 286 <pre><tt>
Chris@1 287 byte value
Chris@1 288
Chris@1 289 5 bitflags: 0x01: unset = fresh packet
Chris@1 290 set = continued packet
Chris@1 291 0x02: unset = not first page of logical bitstream
Chris@1 292 set = first page of logical bitstream (bos)
Chris@1 293 0x04: unset = not last page of logical bitstream
Chris@1 294 set = last page of logical bitstream (eos)
Chris@1 295 </tt></pre>
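<p>Decoding the flag byte is a matter of masking individual bits. The
constant and function names below are assumptions chosen for the
example; only the bit values come from the table above.</p>

```python
CONTINUED = 0x01  # first segment continues a packet from the previous page
BOS = 0x02        # first page of the logical bitstream
EOS = 0x04        # last page of the logical bitstream


def describe_header_type(flag):
    """Return the three header_type_flag bits as booleans."""
    return {
        "continued": bool(flag & CONTINUED),
        "bos": bool(flag & BOS),
        "eos": bool(flag & EOS),
    }


print(describe_header_type(0x02))
# {'continued': False, 'bos': True, 'eos': False}
```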
Chris@1 296
Chris@1 297 <h4>absolute granule position</h4>
Chris@1 298
Chris@1 299 <p>This is packed in the same way the rest of Ogg data is packed: LSb
Chris@1 300 of LSB first. Note that the 'position' data specifies a 'sample'
Chris@1 301 number (eg, in CD quality audio a sample is four octets, 16 bits for left
Chris@1 302 and 16 bits for right; in video it would likely be the frame number).
Chris@1 303 It is up to the specific codec in use to define the semantic meaning
Chris@1 304 of the granule position value. The position specified is the total
Chris@1 305 samples encoded after including all packets finished on this page
Chris@1 306 (packets begun on this page but continuing on to the next page do not
Chris@1 307 count). The rationale here is that the position specified in the
Chris@1 308 frame header of the last page tells how long the data coded by the
Chris@1 309 bitstream is. A truncated stream will still return the proper number
Chris@1 310 of samples that can be decoded fully.</p>
Chris@1 311
Chris@1 312 <p>A special value of '-1' (in two's complement) indicates that no packets
Chris@1 313 finish on this page.</p>
Chris@1 314
Chris@1 315 <pre><tt>
Chris@1 316 byte value
Chris@1 317
Chris@1 318 6 0xXX LSB
Chris@1 319 7 0xXX
Chris@1 320 8 0xXX
Chris@1 321 9 0xXX
Chris@1 322 10 0xXX
Chris@1 323 11 0xXX
Chris@1 324 12 0xXX
Chris@1 325 13 0xXX MSB
Chris@1 326 </tt></pre>
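<p>At the byte level this field reads as a little-endian signed 64-bit
integer. The sketch below uses Python's <tt>struct</tt> module; returning
<tt>None</tt> for the special value -1 is a choice made for this example,
not something the format prescribes.</p>

```python
import struct


def read_granule_position(header):
    """Read header bytes 6..13 as a little-endian signed 64-bit value.

    Returns None when the field holds -1, meaning no packets finish on
    this page.
    """
    (granule,) = struct.unpack_from("<q", header, 6)
    return None if granule == -1 else granule


header = b"OggS\x00\x00" + (48000).to_bytes(8, "little") + bytes(13)
print(read_granule_position(header))  # 48000
```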
Chris@1 327
Chris@1 328 <h4>stream serial number</h4>
Chris@1 329
Chris@1 330 <p>Ogg allows for separate logical bitstreams to be mixed at page
Chris@1 331 granularity in a physical bitstream. The most common case would be
Chris@1 332 sequential arrangement, but it is possible to interleave pages for
Chris@1 333 two separate bitstreams to be decoded concurrently. The serial
Chris@1 334 number is the means by which physical pages are associated with
Chris@1 335 a particular logical stream. Each logical stream must have a unique
Chris@1 336 serial number within a physical stream:</p>
Chris@1 337
Chris@1 338 <pre><tt>
Chris@1 339 byte value
Chris@1 340
Chris@1 341 14 0xXX LSB
Chris@1 342 15 0xXX
Chris@1 343 16 0xXX
Chris@1 344 17 0xXX MSB
Chris@1 345 </tt></pre>
Chris@1 346
Chris@1 347 <h4>page sequence no</h4>
Chris@1 348
Chris@1 349 <p>Page counter; lets us know if a page is lost (useful where packets
Chris@1 350 span page boundaries).</p>
Chris@1 351
Chris@1 352 <pre><tt>
Chris@1 353 byte value
Chris@1 354
Chris@1 355 18 0xXX LSB
Chris@1 356 19 0xXX
Chris@1 357 20 0xXX
Chris@1 358 21 0xXX MSB
Chris@1 359 </tt></pre>
Chris@1 360
Chris@1 361 <h4>page checksum</h4>
Chris@1 362
Chris@1 363 <p>32 bit CRC value (direct algorithm, initial val and final XOR = 0,
Chris@1 364 generator polynomial=0x04c11db7). The value is computed over the
Chris@1 365 entire header (with the CRC field in the header set to zero) and then
Chris@1 366 continued over the page. The CRC field is then filled with the
Chris@1 367 computed value.</p>
Chris@1 368
Chris@1 369 <p>(A thorough discussion of CRC algorithms can be found in <a
Chris@1 370 href="http://www.ross.net/crc/download/crc_v3.txt">"A
Chris@1 371 Painless Guide to CRC Error Detection Algorithms"</a> by Ross
Chris@1 372 Williams <a href="mailto:ross@ross.net">ross@ross.net</a>.)</p>
Chris@1 373
Chris@1 374 <pre><tt>
Chris@1 375 byte value
Chris@1 376
Chris@1 377 22 0xXX LSB
Chris@1 378 23 0xXX
Chris@1 379 24 0xXX
Chris@1 380 25 0xXX MSB
Chris@1 381 </tt></pre>
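<p>A bit-at-a-time sketch of the checksum described above follows. It is
written for clarity rather than speed (production code normally uses a
lookup table), and all function names are assumptions made for the
example.</p>

```python
POLY = 0x04C11DB7  # generator polynomial given in the text


def ogg_crc32(data):
    """Direct (MSB-first) CRC-32 with initial value 0 and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def page_checksum(page):
    """CRC over the whole page with header bytes 22..25 treated as zero."""
    zeroed = bytearray(page)
    zeroed[22:26] = b"\x00\x00\x00\x00"
    return ogg_crc32(zeroed)


def checksum_matches(page):
    """Compare the stored CRC field (LSB first) with the recomputed value."""
    stored = int.from_bytes(page[22:26], "little")
    return stored == page_checksum(page)
```

<p>A decoder would call something like <tt>checksum_matches</tt> on each
candidate page and drop sync when it returns false.</p>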
Chris@1 382
Chris@1 383 <h4>page_segments</h4>
Chris@1 384
Chris@1 385 <p>The number of segment entries to appear in the segment table. The
Chris@1 386 maximum number of 255 segments (255 bytes each) sets the maximum
Chris@1 387 possible physical page size at 65307 bytes or just under 64kB. Thus
Chris@1 388 we know that a header corrupted so as to destroy sizing/alignment
Chris@1 389 information will not cause a runaway bitstream: we'll read in the
Chris@1 390 page according to the corrupted size information, which is guaranteed to
Chris@1 391 be a reasonable size regardless, notice the checksum mismatch, drop
Chris@1 392 sync and then look for recapture.</p>
Chris@1 393
Chris@1 394 <pre><tt>
Chris@1 395 byte value
Chris@1 396
Chris@1 397 26 0x00-0xff (0-255)
Chris@1 398 </tt></pre>
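<p>The 65307-byte ceiling follows from the header layout: 27 fixed header
bytes (offsets 0 through 26), up to 255 segment-table bytes, and up to
255 segments of 255 bytes each. The constant names in this short
calculation are illustrative.</p>

```python
HEADER_FIXED_BYTES = 27   # bytes 0..26, ending with page_segments
MAX_SEGMENTS = 255        # largest value page_segments can hold
MAX_SEGMENT_BYTES = 255   # largest lacing value

max_page_size = (
    HEADER_FIXED_BYTES
    + MAX_SEGMENTS                      # segment table
    + MAX_SEGMENTS * MAX_SEGMENT_BYTES  # page body
)
print(max_page_size)  # 65307
```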
Chris@1 399
Chris@1 400 <h4>segment_table (containing packet lacing values)</h4>
Chris@1 401
Chris@1 402 <p>The lacing values for each packet segment physically appearing in
Chris@1 403 this page are listed in contiguous order.</p>
Chris@1 404
Chris@1 405 <pre><tt>
Chris@1 406 byte value
Chris@1 407
Chris@1 408 27 0x00-0xff (0-255)
Chris@1 409 [...]
Chris@1 410 n 0x00-0xff (0-255, n=page_segments+26)
Chris@1 411 </tt></pre>
Chris@1 412
Chris@1 413 <p>Total page size is calculated directly from the known header size and
Chris@1 414 lacing values in the segment table. Packet data segments follow
Chris@1 415 immediately after the header.</p>
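<p>The size calculation can be sketched as follows, assuming the page is
available as a <tt>bytes</tt> object. The function name
<tt>page_sizes</tt> is hypothetical, and the sketch does not verify the
checksum.</p>

```python
def page_sizes(data, offset=0):
    """Return (header_size, body_size, total_size) for the page at offset."""
    if data[offset:offset + 4] != b"OggS":
        raise ValueError("no capture pattern at offset")
    if len(data) < offset + 27:
        raise ValueError("truncated page header")
    segments = data[offset + 26]  # page_segments
    table = data[offset + 27:offset + 27 + segments]
    if len(table) != segments:
        raise ValueError("truncated segment table")
    header_size = 27 + segments
    body_size = sum(table)        # lacing values give the body length
    return header_size, body_size, header_size + body_size


# One page carrying a single 753-byte packet (lacing 255, 255, 243).
page = b"OggS" + bytes(22) + bytes([3, 255, 255, 243]) + bytes(753)
print(page_sizes(page))  # (30, 753, 783)
```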
Chris@1 416
Chris@1 417 <p>Page headers typically impose a flat .25-.5% space overhead assuming
Chris@1 418 nominal ~8k page sizes. The segmentation table needed for exact
Chris@1 419 packet recovery in the streaming layer adds approximately .5-1%
Chris@1 420 nominal assuming expected encoder behavior in the 44.1kHz, 128kbps
Chris@1 421 stereo encodings.</p>
Chris@1 422
Chris@1 423 <div id="copyright">
Chris@1 424 The Xiph Fish Logo is a
Chris@1 425 trademark (&trade;) of Xiph.Org.<br/>
Chris@1 426
Chris@1 427 These pages &copy; 1994 - 2005 Xiph.Org. All rights reserved.
Chris@1 428 </div>
Chris@1 429
Chris@1 430 </body>
Chris@1 431 </html>