annotate src/libogg-1.3.0/doc/framing.html @ 86:98c1576536ae

Bring in flac, ogg, vorbis
author Chris Cannam <cannam@all-day-breakfast.com>
date Tue, 19 Mar 2013 17:37:49 +0000
cannam@86 1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
cannam@86 2 <html>
cannam@86 3 <head>
cannam@86 4
cannam@86 5 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
cannam@86 6 <title>Ogg Documentation</title>
cannam@86 7
cannam@86 8 <style type="text/css">
cannam@86 9 body {
cannam@86 10 margin: 0 18px 0 18px;
cannam@86 11 padding-bottom: 30px;
cannam@86 12 font-family: Verdana, Arial, Helvetica, sans-serif;
cannam@86 13 color: #333333;
cannam@86 14 font-size: .8em;
cannam@86 15 }
cannam@86 16
cannam@86 17 a {
cannam@86 18 color: #3366cc;
cannam@86 19 }
cannam@86 20
cannam@86 21 img {
cannam@86 22 border: 0;
cannam@86 23 }
cannam@86 24
cannam@86 25 #xiphlogo {
cannam@86 26 margin: 30px 0 16px 0;
cannam@86 27 }
cannam@86 28
cannam@86 29 #content p {
cannam@86 30 line-height: 1.4;
cannam@86 31 }
cannam@86 32
cannam@86 33 h1, h1 a, h2, h2 a, h3, h3 a {
cannam@86 34 font-weight: bold;
cannam@86 35 color: #ff9900;
cannam@86 36 margin: 1.3em 0 8px 0;
cannam@86 37 }
cannam@86 38
cannam@86 39 h1 {
cannam@86 40 font-size: 1.3em;
cannam@86 41 }
cannam@86 42
cannam@86 43 h2 {
cannam@86 44 font-size: 1.2em;
cannam@86 45 }
cannam@86 46
cannam@86 47 h3 {
cannam@86 48 font-size: 1.1em;
cannam@86 49 }
cannam@86 50
cannam@86 51 li {
cannam@86 52 line-height: 1.4;
cannam@86 53 }
cannam@86 54
cannam@86 55 #copyright {
cannam@86 56 margin-top: 30px;
cannam@86 57 line-height: 1.5em;
cannam@86 58 text-align: center;
cannam@86 59 font-size: .8em;
cannam@86 60 color: #888888;
cannam@86 61 clear: both;
cannam@86 62 }
cannam@86 63 </style>
cannam@86 64
cannam@86 65 </head>
cannam@86 66
cannam@86 67 <body>
cannam@86 68
cannam@86 69 <div id="xiphlogo">
cannam@86 70 <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
cannam@86 71 </div>
cannam@86 72
cannam@86 73 <h1>Ogg logical bitstream framing</h1>
cannam@86 74
cannam@86 75 <h2>Ogg bitstreams</h2>
cannam@86 76
cannam@86 77 <p>The Ogg transport bitstream is designed to provide framing, error
cannam@86 78 protection and seeking structure for higher-level codec streams that
cannam@86 79 consist of raw, unencapsulated data packets, such as the Vorbis audio
cannam@86 80 codec or Theora video codec.</p>
cannam@86 81
cannam@86 82 <h2>Application example: Vorbis</h2>
cannam@86 83
cannam@86 84 <p>Vorbis encodes short-time blocks of PCM data into raw packets of
cannam@86 85 bit-packed data. These raw packets may be used directly by transport
cannam@86 86 mechanisms that provide their own framing and packet-separation
cannam@86 87 mechanisms (such as UDP datagrams). For stream based storage (such as
cannam@86 88 files) and transport (such as TCP streams or pipes), Vorbis uses the
cannam@86 89 Ogg bitstream format to provide framing/sync, sync recapture
cannam@86 90 after error, landmarks during seeking, and enough information to
cannam@86 91 properly separate data back into packets at the original packet
cannam@86 92 boundaries without relying on decoding to find packet boundaries.</p>
cannam@86 93
cannam@86 94 <h2>Design constraints for Ogg bitstreams</h2>
cannam@86 95
cannam@86 96 <ol>
cannam@86 97 <li>True streaming; we must not need to seek to build a 100%
cannam@86 98 complete bitstream.</li>
cannam@86 99 <li>Use no more than approximately 1-2% of bitstream bandwidth for
cannam@86 100 packet boundary marking, high-level framing, sync and seeking.</li>
cannam@86 101 <li>Specification of absolute position within the original sample
cannam@86 102 stream.</li>
cannam@86 103 <li>Simple mechanism to ease limited editing, such as a simplified
cannam@86 104 concatenation mechanism.</li>
cannam@86 105 <li>Detection of corruption, recapture after error and direct, random
cannam@86 106 access to data at arbitrary positions in the bitstream.</li>
cannam@86 107 </ol>
cannam@86 108
cannam@86 109 <h2>Logical and Physical Bitstreams</h2>
cannam@86 110
<p>A <em>logical</em> Ogg bitstream is a contiguous stream of
sequential pages belonging only to that logical bitstream. A
<em>physical</em> Ogg bitstream is constructed from one or more
logical Ogg bitstreams (the simplest physical bitstream
is simply a single logical bitstream). We describe below the exact
cannam@86 116 formatting of an Ogg logical bitstream. Combining logical
cannam@86 117 bitstreams into more complex physical bitstreams is described in the
cannam@86 118 <a href="oggstream.html">Ogg bitstream overview</a>. The exact
cannam@86 119 mapping of raw Vorbis packets into a valid Ogg Vorbis physical
cannam@86 120 bitstream is described in the Vorbis I Specification.</p>
cannam@86 121
cannam@86 122 <h2>Bitstream structure</h2>
cannam@86 123
cannam@86 124 <p>An Ogg stream is structured by dividing incoming packets into
cannam@86 125 segments of up to 255 bytes and then wrapping a group of contiguous
cannam@86 126 packet segments into a variable length page preceded by a page
cannam@86 127 header. Both the header size and page size are variable; the page
cannam@86 128 header contains sizing information and checksum data to determine
cannam@86 129 header/page size and data integrity.</p>
cannam@86 130
cannam@86 131 <p>The bitstream is captured (or recaptured) by looking for the beginning
cannam@86 132 of a page, specifically the capture pattern. Once the capture pattern
cannam@86 133 is found, the decoder verifies page sync and integrity by computing
cannam@86 134 and comparing the checksum. At that point, the decoder can extract the
cannam@86 135 packets themselves.</p>
cannam@86 136
cannam@86 137 <h3>Packet segmentation</h3>
cannam@86 138
cannam@86 139 <p>Packets are logically divided into multiple segments before encoding
cannam@86 140 into a page. Note that the segmentation and fragmentation process is a
cannam@86 141 logical one; it's used to compute page header values and the original
cannam@86 142 page data need not be disturbed, even when a packet spans page
cannam@86 143 boundaries.</p>
cannam@86 144
<p>The raw packet is logically divided into [n] 255-byte segments and a
last fractional segment of &lt; 255 bytes. A packet may well consist
only of the trailing fractional segment, and the fractional segment
may be zero length. The segment lengths, called "lacing values", are
then saved and placed into the header segment table.</p>
cannam@86 150
cannam@86 151 <p>An example should make the basic concept clear:</p>
cannam@86 152
cannam@86 153 <pre>
cannam@86 154 <tt>
cannam@86 155 raw packet:
cannam@86 156 ___________________________________________
cannam@86 157 |______________packet data__________________| 753 bytes
cannam@86 158
cannam@86 159 lacing values for page header segment table: 255,255,243
cannam@86 160 </tt>
cannam@86 161 </pre>
cannam@86 162
<p>We simply add the lacing values to get the total size; the last lacing
value for a packet is always the value that is less than 255. Note
that this encoding both avoids imposing a maximum packet size and
imposes minimum overhead on small packets. By contrast, simply using
two bytes at the head of every packet would limit packets to a maximum
size of 32k and would penalize small packets (&lt;255 bytes, the
typical case) with twice the segmentation overhead. Using the lacing
values as suggested, small packets see the minimum possible
byte-aligned overhead (1 byte) and large packets, over 512 bytes or
so, see a fairly constant ~.5% overhead on encoding space.</p>
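<p>As a rough check on these figures, the following sketch counts the
lacing bytes spent on a single packet. The function name
<tt>lacing_overhead</tt> is illustrative only and is not part of
libogg.</p>

<pre>
```python
def lacing_overhead(packet_size):
    """Fraction of extra bytes the segment table spends on one packet."""
    if packet_size == 0:
        raise ValueError("overhead is undefined for a zero-length packet")
    # One lacing value per full 255-byte segment, plus the terminating value.
    lacing_bytes = packet_size // 255 + 1
    return lacing_bytes / packet_size

for size in (100, 753, 4096, 65536):
    print(size, f"{lacing_overhead(size):.2%}")
```
</pre>

<p>For large packets the ratio approaches 1/255, or about 0.4%, which is
in line with the approximate figure quoted above.</p>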
cannam@86 173
cannam@86 174 <p>Note that a lacing value of 255 implies that a second lacing value
cannam@86 175 follows in the packet, and a value of &lt; 255 marks the end of the
cannam@86 176 packet after that many additional bytes. A packet of 255 bytes (or a
cannam@86 177 multiple of 255 bytes) is terminated by a lacing value of 0:</p>
cannam@86 178
cannam@86 179 <pre><tt>
cannam@86 180 raw packet:
cannam@86 181 _______________________________
cannam@86 182 |________packet data____________| 255 bytes
cannam@86 183
cannam@86 184 lacing values: 255, 0
cannam@86 185 </tt></pre>
cannam@86 186
cannam@86 187 <p>Note also that a 'nil' (zero length) packet is not an error; it
cannam@86 188 consists of nothing more than a lacing value of zero in the header.</p>
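<p>The segmentation rule can be sketched in a few lines. This is a
minimal illustration rather than the libogg implementation, and the
name <tt>lacing_values</tt> is hypothetical.</p>

<pre>
```python
def lacing_values(packet_size):
    """Return the segment-table entries for one packet of packet_size bytes."""
    full_segments, remainder = divmod(packet_size, 255)
    # The final value is always below 255, so it may be 0.
    return [255] * full_segments + [remainder]

print(lacing_values(753))  # [255, 255, 243]
print(lacing_values(255))  # [255, 0]
print(lacing_values(0))    # [0]
```
</pre>

<p>The three calls reproduce the cases described above: an ordinary
packet, a packet whose size is a multiple of 255, and a nil packet.</p>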
cannam@86 189
cannam@86 190 <h3>Packets spanning pages</h3>
cannam@86 191
cannam@86 192 <p>Packets are not restricted to beginning and ending within a page,
cannam@86 193 although individual segments are, by definition, required to do so.
cannam@86 194 Packets are not restricted to a maximum size, although excessively
cannam@86 195 large packets in the data stream are discouraged.</p>
cannam@86 196
<p>After segmenting a packet, the encoder may decide not to place all the
resulting segments into the current page. In that case, the encoder places
the lacing values of the segments it wishes to belong to the current
page into the current segment table, then finishes the page. The next
page begins with the lacing value of the next packet segment as the
first entry in its segment table, thus continuing the packet. The data
in the packet body must also correspond properly to the lacing values
in the spanned pages: the segment data corresponding to the lacing
values of the first page belongs in that page, and the packet segments
listed in the segment table of the following page must begin the page
body of that page.</p>
cannam@86 208
<p>The last mechanic to spanning a page boundary is to set the header
flag in the new page to indicate that the first lacing value in the
segment table continues rather than begins a packet; a header flag of
0x01 is set to indicate a continued packet. Although setting the flag
is mandatory, it is not actually algorithmically necessary; one could
inspect the preceding segment table to determine if the packet is new
or continued. Adding the information to the page header flag allows a
simpler design (with no overhead) that need only inspect the current
page header after frame capture. This also allows faster error
recovery in the event that the packet originates in a corrupt
preceding page, implying that the previous page's segment table
cannot be trusted.</p>
cannam@86 221
<p>Note that a packet can span an arbitrary number of pages; the above
spanning process is repeated for each spanned page boundary. Also, a
'zero termination' on a packet whose size is an exact multiple of 255
must appear even if the lacing value appears in the next page as a
zero-length continuation of the current packet. The header flag
should be set to 0x01 to indicate that the packet spanned, even though
the span is a nil case as far as data is concerned.</p>
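<p>The spanning rules can be illustrated with a small sketch. It assumes
the packet starts fresh on the first page, models only the
continued-packet bit of the header flag, and uses a hypothetical helper
name, <tt>span_pages</tt>, that does not exist in libogg.</p>

<pre>
```python
def span_pages(lacing, first_page_room, max_segments=255):
    """Split one packet's lacing values over successive pages.

    first_page_room is the number of free segment-table slots left in the
    current page. Returns a list of (header_type_flag, values) pairs.
    """
    if first_page_room not in range(1, max_segments + 1):
        raise ValueError("first_page_room must be between 1 and max_segments")
    pages = []
    room = first_page_room
    start = 0
    while True:
        chunk = lacing[start:start + room]
        # Every page after the first one continues the packet.
        flag = 0x01 if start > 0 else 0x00
        pages.append((flag, chunk))
        start += len(chunk)
        if start >= len(lacing):
            return pages
        room = max_segments

# A 510-byte packet with only two free slots in the current page:
print(span_pages([255, 255, 0], first_page_room=2))
# [(0, [255, 255]), (1, [0])]
```
</pre>

<p>The example shows the nil case from the paragraph above: the
terminating zero lands alone on the next page, and that page still
carries the 0x01 flag.</p>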
cannam@86 229
cannam@86 230 <p>The encoding looks odd, but is properly optimized for speed and the
cannam@86 231 expected case of the majority of packets being between 50 and 200
cannam@86 232 bytes (note that it is designed such that packets of wildly different
cannam@86 233 sizes can be handled within the model; placing packet size
cannam@86 234 restrictions on the encoder would have only slightly simplified design
cannam@86 235 in page generation and increased overall encoder complexity).</p>
cannam@86 236
<p>The main point behind tracking individual packets (and packet
segments) is to allow more flexible encoding tricks that require
explicit knowledge of packet size. An example is simple bandwidth
limiting, implemented by simply truncating packets in the nominal case
if the packet is arranged so that the least sensitive portion of the
data comes last.</p>
cannam@86 243
<a name="page_header"></a>
cannam@86 245 <h3>Page header</h3>
cannam@86 246
cannam@86 247 <p>The headering mechanism is designed to avoid copying and re-assembly
cannam@86 248 of the packet data (ie, making the packet segmentation process a
cannam@86 249 logical one); the header can be generated directly from incoming
cannam@86 250 packet data. The encoder buffers packet data until it finishes a
cannam@86 251 complete page at which point it writes the header followed by the
cannam@86 252 buffered packet segments.</p>
cannam@86 253
cannam@86 254 <h4>capture_pattern</h4>
cannam@86 255
cannam@86 256 <p>A header begins with a capture pattern that simplifies identifying
cannam@86 257 pages; once the decoder has found the capture pattern it can do a more
cannam@86 258 intensive job of verifying that it has in fact found a page boundary
cannam@86 259 (as opposed to an inadvertent coincidence in the byte stream).</p>
cannam@86 260
cannam@86 261 <pre><tt>
cannam@86 262 byte value
cannam@86 263
cannam@86 264 0 0x4f 'O'
cannam@86 265 1 0x67 'g'
cannam@86 266 2 0x67 'g'
cannam@86 267 3 0x53 'S'
cannam@86 268 </tt></pre>
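<p>A minimal sketch of the first capture step follows. The helper name
<tt>find_capture</tt> is an assumption for illustration; a match is only
a candidate page start and must still be verified with the page
checksum.</p>

<pre>
```python
CAPTURE_PATTERN = b"OggS"

def find_capture(buffer, start=0):
    """Return the offset of the next candidate page start, or -1 if none."""
    return buffer.find(CAPTURE_PATTERN, start)

data = b"\x00garbage" + b"OggS" + b"\x00" * 23
print(find_capture(data))  # 8
```
</pre>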
cannam@86 269
cannam@86 270 <h4>stream_structure_version</h4>
cannam@86 271
cannam@86 272 <p>The capture pattern is followed by the stream structure revision:</p>
cannam@86 273
cannam@86 274 <pre><tt>
cannam@86 275 byte value
cannam@86 276
cannam@86 277 4 0x00
cannam@86 278 </tt></pre>
cannam@86 279
cannam@86 280 <h4>header_type_flag</h4>
cannam@86 281
cannam@86 282 <p>The header type flag identifies this page's context in the bitstream:</p>
cannam@86 283
cannam@86 284 <pre><tt>
cannam@86 285 byte value
cannam@86 286
cannam@86 287 5 bitflags: 0x01: unset = fresh packet
cannam@86 288 set = continued packet
cannam@86 289 0x02: unset = not first page of logical bitstream
cannam@86 290 set = first page of logical bitstream (bos)
cannam@86 291 0x04: unset = not last page of logical bitstream
cannam@86 292 set = last page of logical bitstream (eos)
cannam@86 293 </tt></pre>
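<p>One way to decode this byte is sketched below using Python's
<tt>enum.IntFlag</tt>. The class and function names are illustrative
and are not taken from libogg.</p>

<pre>
```python
from enum import IntFlag

class HeaderType(IntFlag):
    CONTINUED = 0x01  # first lacing value continues a packet
    BOS = 0x02        # first page of the logical bitstream
    EOS = 0x04        # last page of the logical bitstream

def describe(flag_byte):
    """Map header byte 5 to the three documented conditions."""
    flags = HeaderType(flag_byte)
    return {
        "continued": HeaderType.CONTINUED in flags,
        "bos": HeaderType.BOS in flags,
        "eos": HeaderType.EOS in flags,
    }

print(describe(0x05))
# {'continued': True, 'bos': False, 'eos': True}
```
</pre>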
cannam@86 294
cannam@86 295 <h4>absolute granule position</h4>
cannam@86 296
<p>This field is packed in the same way the rest of Ogg data is packed:
LSb of LSB first. Note that the 'position' data specifies a 'sample'
number (eg, in CD-quality audio a sample is four octets, 16 bits for
left and 16 bits for right; in video it would likely be the frame
number). It is up to the specific codec in use to define the semantic
meaning of the granule position value. The position specified is the
total samples encoded after including all packets finished on this page
(packets begun on this page but continuing on to the next page do not
count). The rationale here is that the position specified in the
frame header of the last page tells how long the data coded by the
bitstream is. A truncated stream will still return the proper number
of samples that can be decoded fully.</p>
cannam@86 309
cannam@86 310 <p>A special value of '-1' (in two's complement) indicates that no packets
cannam@86 311 finish on this page.</p>
cannam@86 312
cannam@86 313 <pre><tt>
cannam@86 314 byte value
cannam@86 315
cannam@86 316 6 0xXX LSB
cannam@86 317 7 0xXX
cannam@86 318 8 0xXX
cannam@86 319 9 0xXX
cannam@86 320 10 0xXX
cannam@86 321 11 0xXX
cannam@86 322 12 0xXX
cannam@86 323 13 0xXX MSB
cannam@86 324 </tt></pre>
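<p>Reading the field can be sketched as follows, assuming the header is
available as a bytes-like object starting at the capture pattern. The
names <tt>granule_position</tt> and <tt>NO_PACKET_FINISHES</tt> are
illustrative.</p>

<pre>
```python
NO_PACKET_FINISHES = -1

def granule_position(header):
    """Read bytes 6..13 of a page header as a signed 64-bit integer."""
    return int.from_bytes(header[6:14], byteorder="little", signed=True)

header = bytearray(27)
header[6:14] = (44100).to_bytes(8, "little", signed=True)
print(granule_position(header))  # 44100

header[6:14] = b"\xff" * 8
print(granule_position(header) == NO_PACKET_FINISHES)  # True
```
</pre>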
cannam@86 325
cannam@86 326 <h4>stream serial number</h4>
cannam@86 327
cannam@86 328 <p>Ogg allows for separate logical bitstreams to be mixed at page
cannam@86 329 granularity in a physical bitstream. The most common case would be
cannam@86 330 sequential arrangement, but it is possible to interleave pages for
cannam@86 331 two separate bitstreams to be decoded concurrently. The serial
number is the means by which physical pages are associated with
cannam@86 333 a particular logical stream. Each logical stream must have a unique
cannam@86 334 serial number within a physical stream:</p>
cannam@86 335
cannam@86 336 <pre><tt>
cannam@86 337 byte value
cannam@86 338
cannam@86 339 14 0xXX LSB
cannam@86 340 15 0xXX
cannam@86 341 16 0xXX
cannam@86 342 17 0xXX MSB
cannam@86 343 </tt></pre>
cannam@86 344
cannam@86 345 <h4>page sequence no</h4>
cannam@86 346
cannam@86 347 <p>Page counter; lets us know if a page is lost (useful where packets
cannam@86 348 span page boundaries).</p>
cannam@86 349
cannam@86 350 <pre><tt>
cannam@86 351 byte value
cannam@86 352
cannam@86 353 18 0xXX LSB
cannam@86 354 19 0xXX
cannam@86 355 20 0xXX
cannam@86 356 21 0xXX MSB
cannam@86 357 </tt></pre>
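<p>The sketch below shows how a reader might use the counter to notice
missing pages within a single logical stream. It assumes the counter
increases by one per page and wraps at 32 bits; both helper names are
hypothetical.</p>

<pre>
```python
def page_sequence(header):
    """Read bytes 18..21 of a page header as an unsigned 32-bit counter."""
    return int.from_bytes(header[18:22], "little")

def lost_pages(sequence_numbers):
    """Yield (expected, seen) pairs wherever the page counter skips."""
    expected = None
    for seen in sequence_numbers:
        if expected is not None and seen != expected:
            yield expected, seen
        expected = (seen + 1) % 2**32

print(list(lost_pages([0, 1, 2, 5, 6])))  # [(3, 5)]
```
</pre>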
cannam@86 358
cannam@86 359 <h4>page checksum</h4>
cannam@86 360
cannam@86 361 <p>32 bit CRC value (direct algorithm, initial val and final XOR = 0,
cannam@86 362 generator polynomial=0x04c11db7). The value is computed over the
cannam@86 363 entire header (with the CRC field in the header set to zero) and then
cannam@86 364 continued over the page. The CRC field is then filled with the
cannam@86 365 computed value.</p>
cannam@86 366
cannam@86 367 <p>(A thorough discussion of CRC algorithms can be found in <a
cannam@86 368 href="http://www.ross.net/crc/download/crc_v3.txt">"A
cannam@86 369 Painless Guide to CRC Error Detection Algorithms"</a> by Ross
cannam@86 370 Williams <a href="mailto:ross@ross.net">ross@ross.net</a>.)</p>
cannam@86 371
cannam@86 372 <pre><tt>
cannam@86 373 byte value
cannam@86 374
cannam@86 375 22 0xXX LSB
cannam@86 376 23 0xXX
cannam@86 377 24 0xXX
cannam@86 378 25 0xXX MSB
cannam@86 379 </tt></pre>
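<p>The checksum described above can be sketched in Python as a
table-driven, most-significant-bit-first CRC with a zero initial value
and no final XOR. This is an illustration under those stated
parameters, not the libogg source; all function names are assumptions,
and the arithmetic is written with multiplication and modulo in place
of shifts and masks.</p>

<pre>
```python
POLY = 0x04C11DB7

def _build_table():
    table = []
    for i in range(256):
        r = i * 2**24                  # place the byte in the top 8 bits
        for _ in range(8):
            r *= 2                     # shift left by one bit
            if r >= 2**32:             # a bit fell off the top: reduce
                r = (r - 2**32) ^ POLY
        table.append(r)
    return table

_TABLE = _build_table()

def ogg_crc(data, crc=0):
    """CRC-32 over data: direct algorithm, initial value 0, no final XOR."""
    for byte in data:
        crc = ((crc * 256) % 2**32) ^ _TABLE[(crc >> 24) ^ byte]
    return crc

def page_checksum(page):
    """CRC of a whole page with header bytes 22..25 treated as zero."""
    zeroed = bytes(page[:22]) + b"\x00" * 4 + bytes(page[26:])
    return ogg_crc(zeroed)

def checksum_ok(page):
    """Compare the stored CRC field (LSB first) with a fresh computation."""
    stored = int.from_bytes(page[22:26], "little")
    return stored == page_checksum(page)

# Build a tiny page with one 3-byte segment, then fill in its CRC field.
page = bytearray(27)
page[0:4] = b"OggS"
page[26] = 1
page += bytes([3]) + b"abc"
page[22:26] = page_checksum(page).to_bytes(4, "little")
print(checksum_ok(page))  # True
```
</pre>

<p>Flipping any bit in the header or body after the field has been
filled in should make <tt>checksum_ok</tt> return False, which is the
signal to drop sync and look for recapture.</p>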
cannam@86 380
cannam@86 381 <h4>page_segments</h4>
cannam@86 382
<p>The number of segment entries to appear in the segment table. The
maximum of 255 segments (255 bytes each) sets the maximum
possible physical page size at 65307 bytes, or just under 64kB. Thus
we know that a header corrupted so as to destroy sizing/alignment
information will not cause a runaway bitstream: we'll read in the
page according to the corrupted size information, which is guaranteed
to be a reasonable size regardless, notice the checksum mismatch, drop
sync and then look for recapture.</p>
cannam@86 391
cannam@86 392 <pre><tt>
cannam@86 393 byte value
cannam@86 394
cannam@86 395 26 0x00-0xff (0-255)
cannam@86 396 </tt></pre>
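<p>The 65307-byte figure follows directly from the header layout, as
this short worked calculation shows (the constant names are
illustrative):</p>

<pre>
```python
FIXED_HEADER_BYTES = 27   # bytes 0..26, up to and including page_segments
MAX_SEGMENTS = 255        # largest value page_segments can hold
MAX_SEGMENT_BYTES = 255   # largest lacing value

max_page = FIXED_HEADER_BYTES + MAX_SEGMENTS + MAX_SEGMENTS * MAX_SEGMENT_BYTES
print(max_page)  # 65307
```
</pre>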
cannam@86 397
cannam@86 398 <h4>segment_table (containing packet lacing values)</h4>
cannam@86 399
cannam@86 400 <p>The lacing values for each packet segment physically appearing in
cannam@86 401 this page are listed in contiguous order.</p>
cannam@86 402
cannam@86 403 <pre><tt>
cannam@86 404 byte value
cannam@86 405
cannam@86 406 27 0x00-0xff (0-255)
cannam@86 407 [...]
cannam@86 408 n 0x00-0xff (0-255, n=page_segments+26)
cannam@86 409 </tt></pre>
cannam@86 410
cannam@86 411 <p>Total page size is calculated directly from the known header size and
cannam@86 412 lacing values in the segment table. Packet data segments follow
cannam@86 413 immediately after the header.</p>
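<p>The size calculation and the recovery of packet boundaries can be
sketched as below. The sketch assumes a page that starts at offset 0
of the buffer, ignores the continued-packet flag, and uses hypothetical
helper names that are not part of libogg.</p>

<pre>
```python
def page_layout(page):
    """Return (header_size, body_size, lacing) for a page at offset 0."""
    segments = page[26]
    lacing = list(page[27:27 + segments])
    return 27 + segments, sum(lacing), lacing

def split_body(body, lacing):
    """Cut a page body at packet boundaries.

    Returns (packets, tail): packets that finish on this page, and the
    bytes of a packet that continues on the next page (empty if none).
    """
    packets, current, offset = [], bytearray(), 0
    for value in lacing:
        current += body[offset:offset + value]
        offset += value
        if value != 255:               # a value below 255 ends the packet
            packets.append(bytes(current))
            current = bytearray()
    return packets, bytes(current)

# One 300-byte packet followed by a nil packet.
page = bytearray(27)
page[0:4] = b"OggS"
page[26] = 3
page += bytes([255, 45, 0]) + b"x" * 300

header_size, body_size, lacing = page_layout(page)
body = page[header_size:header_size + body_size]
packets, tail = split_body(body, lacing)
print(header_size, body_size, [len(p) for p in packets], len(tail))
# 30 300 [300, 0] 0
```
</pre>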
cannam@86 414
cannam@86 415 <p>Page headers typically impose a flat .25-.5% space overhead assuming
cannam@86 416 nominal ~8k page sizes. The segmentation table needed for exact
cannam@86 417 packet recovery in the streaming layer adds approximately .5-1%
cannam@86 418 nominal assuming expected encoder behavior in the 44.1kHz, 128kbps
cannam@86 419 stereo encodings.</p>
cannam@86 420
cannam@86 421 <div id="copyright">
cannam@86 422 The Xiph Fish Logo is a
cannam@86 423 trademark (&trade;) of Xiph.Org.<br/>
cannam@86 424
cannam@86 425 These pages &copy; 1994 - 2005 Xiph.Org. All rights reserved.
cannam@86 426 </div>
cannam@86 427
cannam@86 428 </body>
cannam@86 429 </html>