annotate src/libvorbis-1.3.3/doc/framing.html @ 86:98c1576536ae

Bring in flac, ogg, vorbis
author Chris Cannam <cannam@all-day-breakfast.com>
date Tue, 19 Mar 2013 17:37:49 +0000
rev   line source
cannam@86 1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
cannam@86 2 <html>
cannam@86 3 <head>
cannam@86 4
cannam@86 5 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
cannam@86 6 <title>Ogg Vorbis Documentation</title>
cannam@86 7
cannam@86 8 <style type="text/css">
cannam@86 9 body {
cannam@86 10 margin: 0 18px 0 18px;
cannam@86 11 padding-bottom: 30px;
cannam@86 12 font-family: Verdana, Arial, Helvetica, sans-serif;
cannam@86 13 color: #333333;
cannam@86 14 font-size: .8em;
cannam@86 15 }
cannam@86 16
cannam@86 17 a {
cannam@86 18 color: #3366cc;
cannam@86 19 }
cannam@86 20
cannam@86 21 img {
cannam@86 22 border: 0;
cannam@86 23 }
cannam@86 24
cannam@86 25 #xiphlogo {
cannam@86 26 margin: 30px 0 16px 0;
cannam@86 27 }
cannam@86 28
cannam@86 29 #content p {
cannam@86 30 line-height: 1.4;
cannam@86 31 }
cannam@86 32
cannam@86 33 h1, h1 a, h2, h2 a, h3, h3 a {
cannam@86 34 font-weight: bold;
cannam@86 35 color: #ff9900;
cannam@86 36 margin: 1.3em 0 8px 0;
cannam@86 37 }
cannam@86 38
cannam@86 39 h1 {
cannam@86 40 font-size: 1.3em;
cannam@86 41 }
cannam@86 42
cannam@86 43 h2 {
cannam@86 44 font-size: 1.2em;
cannam@86 45 }
cannam@86 46
cannam@86 47 h3 {
cannam@86 48 font-size: 1.1em;
cannam@86 49 }
cannam@86 50
cannam@86 51 li {
cannam@86 52 line-height: 1.4;
cannam@86 53 }
cannam@86 54
cannam@86 55 #copyright {
cannam@86 56 margin-top: 30px;
cannam@86 57 line-height: 1.5em;
cannam@86 58 text-align: center;
cannam@86 59 font-size: .8em;
cannam@86 60 color: #888888;
cannam@86 61 clear: both;
cannam@86 62 }
cannam@86 63 </style>
cannam@86 64
cannam@86 65 </head>
cannam@86 66
cannam@86 67 <body>
cannam@86 68
cannam@86 69 <div id="xiphlogo">
cannam@86 70 <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.Org"/></a>
cannam@86 71 </div>
cannam@86 72
cannam@86 73 <h1>Ogg logical bitstream framing</h1>
cannam@86 74
cannam@86 75 <h2>Ogg bitstreams</h2>
cannam@86 76
cannam@86 77 <p>The Ogg transport bitstream is designed to provide framing, error
cannam@86 78 protection and seeking structure for higher-level codec streams that
cannam@86 79 consist of raw, unencapsulated data packets, such as the Vorbis audio
cannam@86 80 codec or Theora video codec.</p>
cannam@86 81
cannam@86 82 <h2>Application example: Vorbis</h2>
cannam@86 83
cannam@86 84 <p>Vorbis encodes short-time blocks of PCM data into raw packets of
cannam@86 85 bit-packed data. These raw packets may be used directly by transport
cannam@86 86 mechanisms that provide their own framing and packet-separation
cannam@86 87 mechanisms (such as UDP datagrams). For stream based storage (such as
cannam@86 88 files) and transport (such as TCP streams or pipes), Vorbis uses the
cannam@86 89 Ogg bitstream format to provide framing/sync, sync recapture
cannam@86 90 after error, landmarks during seeking, and enough information to
cannam@86 91 properly separate data back into packets at the original packet
cannam@86 92 boundaries without relying on decoding to find packet boundaries.</p>
cannam@86 93
cannam@86 94 <h2>Design constraints for Ogg bitstreams</h2>
cannam@86 95
cannam@86 96 <ol>
cannam@86 97 <li>True streaming; we must not need to seek to build a 100%
cannam@86 98 complete bitstream.</li>
cannam@86 99 <li>Use no more than approximately 1-2% of bitstream bandwidth for
cannam@86 100 packet boundary marking, high-level framing, sync and seeking.</li>
cannam@86 101 <li>Specification of absolute position within the original sample
cannam@86 102 stream.</li>
cannam@86 103 <li>Simple mechanism to ease limited editing, such as a simplified
cannam@86 104 concatenation mechanism.</li>
cannam@86 105 <li>Detection of corruption, recapture after error and direct, random
cannam@86 106 access to data at arbitrary positions in the bitstream.</li>
cannam@86 107 </ol>
cannam@86 108
cannam@86 109 <h2>Logical and Physical Bitstreams</h2>
cannam@86 110
cannam@86 111 <p>A <em>logical</em> Ogg bitstream is a contiguous stream of
cannam@86 112 sequential pages belonging only to the logical bitstream. A
cannam@86 113 <em>physical</em> Ogg bitstream is constructed from one or more
cannam@86 114 logical Ogg bitstreams (the simplest physical bitstream
cannam@86 115 is simply a single logical bitstream). We describe below the exact
cannam@86 116 formatting of an Ogg logical bitstream. Combining logical
cannam@86 117 bitstreams into more complex physical bitstreams is described in the
cannam@86 118 <a href="oggstream.html">Ogg bitstream overview</a>. The exact
cannam@86 119 mapping of raw Vorbis packets into a valid Ogg Vorbis physical
cannam@86 120 bitstream is described in the Vorbis I Specification.</p>
cannam@86 121
cannam@86 122 <h2>Bitstream structure</h2>
cannam@86 123
cannam@86 124 <p>An Ogg stream is structured by dividing incoming packets into
cannam@86 125 segments of up to 255 bytes and then wrapping a group of contiguous
cannam@86 126 packet segments into a variable length page preceded by a page
cannam@86 127 header. Both the header size and page size are variable; the page
cannam@86 128 header contains sizing information and checksum data to determine
cannam@86 129 header/page size and data integrity.</p>
cannam@86 130
cannam@86 131 <p>The bitstream is captured (or recaptured) by looking for the beginning
cannam@86 132 of a page, specifically the capture pattern. Once the capture pattern
cannam@86 133 is found, the decoder verifies page sync and integrity by computing
cannam@86 134 and comparing the checksum. At that point, the decoder can extract the
cannam@86 135 packets themselves.</p>
cannam@86 136
cannam@86 137 <h3>Packet segmentation</h3>
cannam@86 138
cannam@86 139 <p>Packets are logically divided into multiple segments before encoding
cannam@86 140 into a page. Note that the segmentation and fragmentation process is a
cannam@86 141 logical one; it's used to compute page header values and the original
cannam@86 142 page data need not be disturbed, even when a packet spans page
cannam@86 143 boundaries.</p>
cannam@86 144
cannam@86 145 <p>The raw packet is logically divided into [n] 255 byte segments and a
cannam@86 146 last fractional segment of &lt; 255 bytes. A packet may well
cannam@86 147 consist only of the trailing fractional segment, and a fractional
cannam@86 148 segment may be zero length. The segment lengths, called "lacing values", are
cannam@86 149 then saved and placed into the header segment table.</p>
cannam@86 150
cannam@86 151 <p>An example should make the basic concept clear:</p>
cannam@86 152
cannam@86 153 <pre>
cannam@86 154 <tt>
cannam@86 155 raw packet:
cannam@86 156 ___________________________________________
cannam@86 157 |______________packet data__________________| 753 bytes
cannam@86 158
cannam@86 159 lacing values for page header segment table: 255,255,243
cannam@86 160 </tt>
cannam@86 161 </pre>
cannam@86 162
cannam@86 163 <p>We simply add the lacing values for the total size; the last lacing
cannam@86 164 value for a packet is always the value that is less than 255. Note
cannam@86 165 that this encoding both avoids imposing a maximum packet size and
cannam@86 166 imposes minimum overhead on small packets. By contrast, simply using
cannam@86 167 two bytes at the head of every packet would impose a max packet size
cannam@86 168 of 32k and penalize small packets (&lt;255 bytes, the typical case)
cannam@86 169 with twice the segmentation overhead. Using the lacing
cannam@86 170 values as suggested, small packets see the minimum possible
cannam@86 171 byte-aligned overhead (1 byte) and large packets, over 512 bytes or
cannam@86 172 so, see a fairly constant ~.5% overhead on encoding space.</p>
cannam@86 173
cannam@86 174 <p>Note that a lacing value of 255 implies that a second lacing value
cannam@86 175 follows in the packet, and a value of &lt; 255 marks the end of the
cannam@86 176 packet after that many additional bytes. A packet of 255 bytes (or a
cannam@86 177 multiple of 255 bytes) is terminated by a lacing value of 0:</p>
cannam@86 178
cannam@86 179 <pre><tt>
cannam@86 180 raw packet:
cannam@86 181 _______________________________
cannam@86 182 |________packet data____________| 255 bytes
cannam@86 183
cannam@86 184 lacing values: 255, 0
cannam@86 185 </tt></pre>
cannam@86 186
cannam@86 187 <p>Note also that a 'nil' (zero length) packet is not an error; it
cannam@86 188 consists of nothing more than a lacing value of zero in the header.</p>
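<p>The lacing rules above can be sketched in a few lines of Python. This
is an illustrative sketch only, not code from libogg; the function name
<tt>lacing_values</tt> is an assumption made for this example.</p>

```python
def lacing_values(packet_len):
    """Return the lacing values for a packet of packet_len bytes.

    Illustrative helper (hypothetical name): n full 255-byte segments
    followed by one final segment of fewer than 255 bytes, possibly 0.
    """
    if packet_len < 0:
        raise ValueError("packet length must be non-negative")
    full, rest = divmod(packet_len, 255)
    # The final value is always < 255, so it terminates the packet.
    return [255] * full + [rest]
```

<p>Under these rules a 753 byte packet yields 255,255,243, a 255 byte
packet yields 255,0, and a nil packet yields a single 0.</p>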
cannam@86 189
cannam@86 190 <h3>Packets spanning pages</h3>
cannam@86 191
cannam@86 192 <p>Packets are not restricted to beginning and ending within a page,
cannam@86 193 although individual segments are, by definition, required to do so.
cannam@86 194 Packets are not restricted to a maximum size, although excessively
cannam@86 195 large packets in the data stream are discouraged; the Ogg
cannam@86 196 bitstream specification strongly recommends a nominal page size of
cannam@86 197 approximately 4-8kB (large packets are foreseen as being useful for
cannam@86 198 initialization data at the beginning of a logical bitstream).</p>
cannam@86 199
cannam@86 200 <p>After segmenting a packet, the encoder may decide not to place all the
cannam@86 201 resulting segments into the current page; to do so, the encoder places
cannam@86 202 the lacing values of the segments it wishes to belong to the current
cannam@86 203 page into the current segment table, then finishes the page. The next
cannam@86 204 page is begun with the first value in the segment table belonging to
cannam@86 205 the next packet segment, thus continuing the packet (data in the
cannam@86 206 packet body must also correspond properly to the lacing values in the
cannam@86 207 spanned pages. The segment data in the first packet corresponding to
cannam@86 208 the lacing values of the first page belong in that page; packet
cannam@86 209 segments listed in the segment table of the following page must begin
cannam@86 210 the page body of the subsequent page).</p>
cannam@86 211
cannam@86 212 <p>The last mechanic to spanning a page boundary is to set the header
cannam@86 213 flag in the new page to indicate that the first lacing value in the
cannam@86 214 segment table continues rather than begins a packet; a header flag of
cannam@86 215 0x01 is set to indicate a continued packet. Although mandatory, it
cannam@86 216 is not actually algorithmically necessary; one could inspect the
cannam@86 217 preceding segment table to determine if the packet is new or
cannam@86 218 continued. Adding the information to the page header flag allows a
cannam@86 219 simpler design (with no overhead) that need only inspect the current
cannam@86 220 page header after frame capture. This also allows faster error
cannam@86 221 recovery in the event that the packet originates in a corrupt
cannam@86 222 preceding page, implying that the previous page's segment table
cannam@86 223 cannot be trusted.</p>
cannam@86 224
cannam@86 225 <p>Note that a packet can span an arbitrary number of pages; the above
cannam@86 226 spanning process is repeated for each spanned page boundary. Also a
cannam@86 227 'zero termination' on a packet size that is an even multiple of 255
cannam@86 228 must appear even if the lacing value appears in the next page as a
cannam@86 229 zero-length continuation of the current packet. The header flag
cannam@86 230 should be set to 0x01 to indicate that the packet spanned, even though
cannam@86 231 the span is a nil case as far as data is concerned.</p>
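<p>The spanning rules can be illustrated with a small sketch that
distributes lacing values over pages and sets the 0x01 flag on any page
that starts in the middle of a packet. The function name
<tt>paginate</tt> and the <tt>max_segments</tt> parameter are assumptions
made for this example; a real encoder would also apply its own page size
policy and set the bos/eos flags, which this sketch ignores.</p>

```python
def paginate(packet_lens, max_segments=255):
    """Split the lacing values of successive packets into pages.

    Returns a list of (header_flag, segment_table) tuples. A page is
    closed only when its segment table is full (a simplification).
    """
    pages = []
    table = []
    flag = 0x00
    for packet_len in packet_lens:
        full, rest = divmod(packet_len, 255)
        lacing = [255] * full + [rest]
        for i, value in enumerate(lacing):
            if len(table) == max_segments:
                pages.append((flag, table))
                table = []
                # The new page continues a packet unless this is the
                # packet's first lacing value.
                flag = 0x01 if i > 0 else 0x00
            table.append(value)
    if table:
        pages.append((flag, table))
    return pages
```

<p>With <tt>max_segments=2</tt>, a single 510 byte packet produces a
first page holding 255,255 and a second page holding only the
terminating 0, flagged 0x01: the nil continuation case described
above.</p>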
cannam@86 232
cannam@86 233 <p>The encoding looks odd, but is properly optimized for speed and the
cannam@86 234 expected case of the majority of packets being between 50 and 200
cannam@86 235 bytes (note that it is designed such that packets of wildly different
cannam@86 236 sizes can be handled within the model; placing packet size
cannam@86 237 restrictions on the encoder would have only slightly simplified design
cannam@86 238 in page generation and increased overall encoder complexity).</p>
cannam@86 239
cannam@86 240 <p>The main point behind tracking individual packets (and packet
cannam@86 241 segments) is to allow more flexible encoding tricks that require
cannam@86 242 explicit knowledge of packet size. An example is simple bandwidth
cannam@86 243 limiting, implemented by simply truncating packets in the nominal case
cannam@86 244 if the packet is arranged so that the least sensitive portion of the
cannam@86 245 data comes last.</p>
cannam@86 246
cannam@86 247 <h3>Page header</h3>
cannam@86 248
cannam@86 249 <p>The headering mechanism is designed to avoid copying and re-assembly
cannam@86 250 of the packet data (ie, making the packet segmentation process a
cannam@86 251 logical one); the header can be generated directly from incoming
cannam@86 252 packet data. The encoder buffers packet data until it finishes a
cannam@86 253 complete page at which point it writes the header followed by the
cannam@86 254 buffered packet segments.</p>
cannam@86 255
cannam@86 256 <h4>capture_pattern</h4>
cannam@86 257
cannam@86 258 <p>A header begins with a capture pattern that simplifies identifying
cannam@86 259 pages; once the decoder has found the capture pattern it can do a more
cannam@86 260 intensive job of verifying that it has in fact found a page boundary
cannam@86 261 (as opposed to an inadvertent coincidence in the byte stream).</p>
cannam@86 262
cannam@86 263 <pre><tt>
cannam@86 264 byte value
cannam@86 265
cannam@86 266 0 0x4f 'O'
cannam@86 267 1 0x67 'g'
cannam@86 268 2 0x67 'g'
cannam@86 269 3 0x53 'S'
cannam@86 270 </tt></pre>
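<p>A minimal sketch of the first capture step, assuming the stream is
already held in memory as a bytes object. The name
<tt>candidate_page_offsets</tt> is hypothetical. Each offset returned is
only a candidate; the checksum must still be verified before the page is
trusted.</p>

```python
CAPTURE_PATTERN = b"OggS"  # bytes 0x4f 0x67 0x67 0x53

def candidate_page_offsets(buf):
    """Yield every offset in buf at which the capture pattern occurs."""
    pos = buf.find(CAPTURE_PATTERN)
    while pos != -1:
        yield pos
        # Resume one byte later so no occurrence is skipped.
        pos = buf.find(CAPTURE_PATTERN, pos + 1)
```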
cannam@86 271
cannam@86 272 <h4>stream_structure_version</h4>
cannam@86 273
cannam@86 274 <p>The capture pattern is followed by the stream structure revision:</p>
cannam@86 275
cannam@86 276 <pre><tt>
cannam@86 277 byte value
cannam@86 278
cannam@86 279 4 0x00
cannam@86 280 </tt></pre>
cannam@86 281
cannam@86 282 <h4>header_type_flag</h4>
cannam@86 283
cannam@86 284 <p>The header type flag identifies this page's context in the bitstream:</p>
cannam@86 285
cannam@86 286 <pre><tt>
cannam@86 287 byte value
cannam@86 288
cannam@86 289 5 bitflags: 0x01: unset = fresh packet
cannam@86 290 set = continued packet
cannam@86 291 0x02: unset = not first page of logical bitstream
cannam@86 292 set = first page of logical bitstream (bos)
cannam@86 293 0x04: unset = not last page of logical bitstream
cannam@86 294 set = last page of logical bitstream (eos)
cannam@86 295 </tt></pre>
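<p>The three bitflags can be tested with simple masks. The constant and
function names below are assumptions chosen for this sketch.</p>

```python
CONTINUED = 0x01  # first lacing value continues a packet
BOS = 0x02        # first page of logical bitstream
EOS = 0x04        # last page of logical bitstream

def describe_header_type(flag):
    """Decode the header_type_flag byte (byte 5 of the page header)."""
    return {
        "continued": bool(flag & CONTINUED),
        "bos": bool(flag & BOS),
        "eos": bool(flag & EOS),
    }
```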
cannam@86 296
cannam@86 297 <h4>absolute granule position</h4>
cannam@86 298
cannam@86 299 <p>The granule position is packed in the same way the rest of Ogg data
cannam@86 300 is packed: LSb of LSB first. The 'position' data specifies a 'sample'
cannam@86 301 number (eg, in CD quality audio a sample is four octets, 16 bits for left
cannam@86 302 and 16 bits for right; in video it would likely be the frame number).
cannam@86 303 It is up to the specific codec in use to define the semantic meaning
cannam@86 304 of the granule position value. The position specified is the total
cannam@86 305 samples encoded after including all packets finished on this page
cannam@86 306 (packets begun on this page but continuing on to the next page do not
cannam@86 307 count). The rationale here is that the position specified in the
cannam@86 308 frame header of the last page tells how long the data coded by the
cannam@86 309 bitstream is. A truncated stream will still return the proper number
cannam@86 310 of samples that can be decoded fully.</p>
cannam@86 311
cannam@86 312 <p>A special value of '-1' (in two's complement) indicates that no packets
cannam@86 313 finish on this page.</p>
cannam@86 314
cannam@86 315 <pre><tt>
cannam@86 316 byte value
cannam@86 317
cannam@86 318 6 0xXX LSB
cannam@86 319 7 0xXX
cannam@86 320 8 0xXX
cannam@86 321 9 0xXX
cannam@86 322 10 0xXX
cannam@86 323 11 0xXX
cannam@86 324 12 0xXX
cannam@86 325 13 0xXX MSB
cannam@86 326 </tt></pre>
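<p>As a sketch, the eight bytes at offsets 6-13 can be read as a
little-endian signed 64 bit integer. The helper name
<tt>read_granule_position</tt> is hypothetical, and returning
<tt>None</tt> for the special value -1 is a choice made for this example
only.</p>

```python
import struct

def read_granule_position(header):
    """Read the granule position from bytes 6-13 of a page header.

    Returns None when the field holds -1, meaning that no packets
    finish on this page.
    """
    (granule,) = struct.unpack_from("<q", header, 6)
    return None if granule == -1 else granule
```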
cannam@86 327
cannam@86 328 <h4>stream serial number</h4>
cannam@86 329
cannam@86 330 <p>Ogg allows for separate logical bitstreams to be mixed at page
cannam@86 331 granularity in a physical bitstream. The most common case would be
cannam@86 332 sequential arrangement, but it is possible to interleave pages for
cannam@86 333 two separate bitstreams to be decoded concurrently. The serial
cannam@86 334 number is the means by which physical pages are associated with
cannam@86 335 a particular logical stream. Each logical stream must have a unique
cannam@86 336 serial number within a physical stream:</p>
cannam@86 337
cannam@86 338 <pre><tt>
cannam@86 339 byte value
cannam@86 340
cannam@86 341 14 0xXX LSB
cannam@86 342 15 0xXX
cannam@86 343 16 0xXX
cannam@86 344 17 0xXX MSB
cannam@86 345 </tt></pre>
cannam@86 346
cannam@86 347 <h4>page sequence no</h4>
cannam@86 348
cannam@86 349 <p>Page counter; lets us know if a page is lost (useful where packets
cannam@86 350 span page boundaries).</p>
cannam@86 351
cannam@86 352 <pre><tt>
cannam@86 353 byte value
cannam@86 354
cannam@86 355 18 0xXX LSB
cannam@86 356 19 0xXX
cannam@86 357 20 0xXX
cannam@86 358 21 0xXX MSB
cannam@86 359 </tt></pre>
cannam@86 360
cannam@86 361 <h4>page checksum</h4>
cannam@86 362
cannam@86 363 <p>32 bit CRC value (direct algorithm, initial val and final XOR = 0,
cannam@86 364 generator polynomial=0x04c11db7). The value is computed over the
cannam@86 365 entire header (with the CRC field in the header set to zero) and then
cannam@86 366 continued over the page. The CRC field is then filled with the
cannam@86 367 computed value.</p>
cannam@86 368
cannam@86 369 <p>(A thorough discussion of CRC algorithms can be found in <a
cannam@86 370 href="http://www.ross.net/crc/download/crc_v3.txt">"A
cannam@86 371 Painless Guide to CRC Error Detection Algorithms"</a> by Ross
cannam@86 372 Williams <a href="mailto:ross@ross.net">ross@ross.net</a>.)</p>
cannam@86 373
cannam@86 374 <pre><tt>
cannam@86 375 byte value
cannam@86 376
cannam@86 377 22 0xXX LSB
cannam@86 378 23 0xXX
cannam@86 379 24 0xXX
cannam@86 380 25 0xXX MSB
cannam@86 381 </tt></pre>
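<p>A bit-at-a-time sketch of the checksum with the parameters stated
above (direct algorithm, initial value and final XOR of 0, generator
polynomial 0x04c11db7). The function names are assumptions for this
example, and a practical implementation would normally use a lookup
table for speed.</p>

```python
def ogg_crc32(data, crc=0):
    """Direct (MSB-first) CRC-32: poly 0x04c11db7, init 0, no final XOR."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

def page_checksum(page):
    """Checksum a whole page with the CRC field (bytes 22-25) zeroed."""
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
    return ogg_crc32(zeroed)
```

<p>To verify a page, a decoder would compare <tt>page_checksum(page)</tt>
with the value stored LSB first in bytes 22-25.</p>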
cannam@86 382
cannam@86 383 <h4>page_segments</h4>
cannam@86 384
cannam@86 385 <p>The number of segment entries to appear in the segment table. The
cannam@86 386 maximum number of 255 segments (255 bytes each) sets the maximum
cannam@86 387 possible physical page size at 65307 bytes or just under 64kB (thus
cannam@86 388 we know that a header corrupted so as to destroy sizing/alignment
cannam@86 389 information will not cause a runaway bitstream. We'll read in the
cannam@86 390 page according to the corrupted size information that's guaranteed to
cannam@86 391 be a reasonable size regardless, notice the checksum mismatch, drop
cannam@86 392 sync and then look for recapture).</p>
cannam@86 393
cannam@86 394 <pre><tt>
cannam@86 395 byte value
cannam@86 396
cannam@86 397 26 0x00-0xff (0-255)
cannam@86 398 </tt></pre>
cannam@86 399
cannam@86 400 <h4>segment_table (containing packet lacing values)</h4>
cannam@86 401
cannam@86 402 <p>The lacing values for each packet segment physically appearing in
cannam@86 403 this page are listed in contiguous order.</p>
cannam@86 404
cannam@86 405 <pre><tt>
cannam@86 406 byte value
cannam@86 407
cannam@86 408 27 0x00-0xff (0-255)
cannam@86 409 [...]
cannam@86 410 n 0x00-0xff (0-255, n=page_segments+26)
cannam@86 411 </tt></pre>
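<p>Given a segment table and the page body that follows the header,
packet boundaries can be recovered without decoding any packet data. The
sketch below uses the hypothetical name <tt>split_body</tt>. It does not
handle the continued-packet flag itself: the caller is assumed to prepend
any remainder carried over from the previous page.</p>

```python
def split_body(segment_table, body):
    """Split a page body into packets using the lacing values.

    Returns (packets, remainder): the packets completed on this page,
    and the bytes of a trailing packet that continues on the next page
    (empty if the last lacing value is < 255).
    """
    packets = []
    current = bytearray()
    pos = 0
    for value in segment_table:
        current += body[pos:pos + value]
        pos += value
        if value < 255:
            # A lacing value below 255 ends the current packet.
            packets.append(bytes(current))
            current = bytearray()
    return packets, bytes(current)
```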
cannam@86 412
cannam@86 413 <p>Total page size is calculated directly from the known header size and
cannam@86 414 lacing values in the segment table. Packet data segments follow
cannam@86 415 immediately after the header.</p>
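<p>Putting the field layout together, the fixed 27 byte part of the
header can be unpacked in one step and the page size derived from the
segment table. This is a sketch under the layout described above; the
function name <tt>parse_page_header</tt> and the dictionary keys are
assumptions, and no checksum verification is performed here.</p>

```python
import struct

FIXED_HEADER = struct.Struct("<4sBBqIIIB")  # 27 bytes, little-endian

def parse_page_header(buf):
    """Parse a page header at the start of buf and compute page sizes."""
    (capture, version, header_type, granule, serial, sequence,
     checksum, page_segments) = FIXED_HEADER.unpack_from(buf, 0)
    if capture != b"OggS":
        raise ValueError("capture pattern not found")
    segment_table = list(buf[27:27 + page_segments])
    if len(segment_table) != page_segments:
        raise ValueError("truncated segment table")
    header_size = 27 + page_segments
    body_size = sum(segment_table)
    return {
        "version": version,
        "header_type": header_type,
        "granule_position": granule,
        "serial": serial,
        "sequence": sequence,
        "checksum": checksum,
        "segment_table": segment_table,
        "header_size": header_size,
        "body_size": body_size,
        "page_size": header_size + body_size,
    }
```

<p>The largest possible page follows directly from these sizes:
27 + 255 + 255 * 255 = 65307 bytes.</p>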
cannam@86 416
cannam@86 417 <p>Page headers typically impose a flat .25-.5% space overhead assuming
cannam@86 418 nominal ~8k page sizes. The segmentation table needed for exact
cannam@86 419 packet recovery in the streaming layer adds approximately .5-1%
cannam@86 420 nominal assuming expected encoder behavior in the 44.1kHz, 128kbps
cannam@86 421 stereo encodings.</p>
cannam@86 422
cannam@86 423 <div id="copyright">
cannam@86 424 The Xiph Fish Logo is a
cannam@86 425 trademark (&trade;) of Xiph.Org.<br/>
cannam@86 426
cannam@86 427 These pages &copy; 1994 - 2005 Xiph.Org. All rights reserved.
cannam@86 428 </div>
cannam@86 429
cannam@86 430 </body>
cannam@86 431 </html>