annotate src/libogg-1.3.0/doc/rfc3533.txt @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI incompatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents 05aa0afa9217
children
rev   line source
Chris@1 1
Chris@1 2
Chris@1 3
Chris@1 4
Chris@1 5
Chris@1 6
Chris@1 7 Network Working Group S. Pfeiffer
Chris@1 8 Request for Comments: 3533 CSIRO
Chris@1 9 Category: Informational May 2003
Chris@1 10
Chris@1 11
Chris@1 12 The Ogg Encapsulation Format Version 0
Chris@1 13
Chris@1 14 Status of this Memo
Chris@1 15
Chris@1 16 This memo provides information for the Internet community. It does
Chris@1 17 not specify an Internet standard of any kind. Distribution of this
Chris@1 18 memo is unlimited.
Chris@1 19
Chris@1 20 Copyright Notice
Chris@1 21
Chris@1 22 Copyright (C) The Internet Society (2003). All Rights Reserved.
Chris@1 23
Chris@1 24 Abstract
Chris@1 25
Chris@1 26 This document describes the Ogg bitstream format version 0, which is
Chris@1 27 a general, freely-available encapsulation format for media streams.
Chris@1 28 It is able to encapsulate any kind and number of video and audio
Chris@1 29 encoding formats as well as other data streams in a single bitstream.
Chris@1 30
Chris@1 31 Terminology
Chris@1 32
Chris@1 33 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
Chris@1 34 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
Chris@1 35 document are to be interpreted as described in BCP 14, RFC 2119 [2].
Chris@1 36
Chris@1 37 Table of Contents
Chris@1 38
Chris@1 39 1. Introduction
Chris@1 40 2. Definitions
Chris@1 41 3. Requirements for a generic encapsulation format
Chris@1 42 4. The Ogg bitstream format
Chris@1 43 5. The encapsulation process
Chris@1 44 6. The Ogg page format
Chris@1 45 7. Security Considerations
Chris@1 46 8. References
Chris@1 47 A. Glossary of terms and abbreviations
Chris@1 48 B. Acknowledgements
Chris@1 49 Author's Address
Chris@1 50 Full Copyright Statement
Chris@1 51
Chris@1 52
Chris@1 53
Chris@1 54
Chris@1 55
Chris@1 56
Chris@1 57
Chris@1 58 Pfeiffer Informational [Page 1]
Chris@1 59
Chris@1 60 RFC 3533 OGG May 2003
Chris@1 61
Chris@1 62
Chris@1 63 1. Introduction
Chris@1 64
Chris@1 65 The Ogg bitstream format has been developed as a part of a larger
Chris@1 66 project aimed at creating a set of components for the coding and
Chris@1 67 decoding of multimedia content (codecs) which are to be freely
Chris@1 68 available and freely re-implementable, both in software and in
Chris@1 69 hardware for the computing community at large, including the Internet
Chris@1 70 community. It is the intention of the Ogg developers represented by
Chris@1 71 Xiph.Org that it be usable without intellectual property concerns.
Chris@1 72
Chris@1 73 This document describes the Ogg bitstream format and how to use it to
Chris@1 74 encapsulate one or several media bitstreams created by one or several
Chris@1 75 encoders. The Ogg transport bitstream is designed to provide
Chris@1 76 framing, error protection and seeking structure for higher-level
Chris@1 77 codec streams that consist of raw, unencapsulated data packets, such
Chris@1 78 as the Vorbis audio codec or the upcoming Tarkin and Theora video
Chris@1 79 codecs. It is capable of interleaving different binary media and
Chris@1 80 other time-continuous data streams that are prepared by an encoder as
Chris@1 81 a sequence of data packets. Ogg provides enough information to
Chris@1 82 properly separate data back into such encoder created data packets at
Chris@1 83 the original packet boundaries without relying on decoding to find
Chris@1 84 packet boundaries.
Chris@1 85
Chris@1 86 Please note that the MIME type application/ogg has been registered
Chris@1 87 with the IANA [1].
Chris@1 88
Chris@1 89 2. Definitions
Chris@1 90
Chris@1 91 For describing the Ogg encapsulation process, a set of terms will be
Chris@1 92 used whose meaning needs to be well understood. Therefore, some of
Chris@1 93 the most fundamental terms are defined now before we start with the
Chris@1 94 description of the requirements for a generic media stream
Chris@1 95 encapsulation format, the process of encapsulation, and the concrete
Chris@1 96 format of the Ogg bitstream. See the Appendix for a more complete
Chris@1 97 glossary.
Chris@1 98
Chris@1 99 The result of an Ogg encapsulation is called the "Physical (Ogg)
Chris@1 100 Bitstream". It encapsulates one or several encoder-created
Chris@1 101 bitstreams, which are called "Logical Bitstreams". A logical
Chris@1 102 bitstream, provided to the Ogg encapsulation process, has a
Chris@1 103 structure, i.e., it is split up into a sequence of so-called
Chris@1 104 "Packets". The packets are created by the encoder of that logical
Chris@1 105 bitstream and represent meaningful entities for that encoder only
Chris@1 106 (e.g., an uncompressed stream may use video frames as packets). They
Chris@1 107 do not contain boundary information - strung together they appear to
Chris@1 108 be streams of random bytes with no landmarks.
Chris@1 109
Chris@1 119 Please note that the term "packet" is not used in this document to
Chris@1 120 signify entities for transport over a network.
Chris@1 121
Chris@1 122 3. Requirements for a generic encapsulation format
Chris@1 123
Chris@1 124 The design idea behind Ogg was to provide a generic, linear media
Chris@1 125 transport format to enable both file-based storage and stream-based
Chris@1 126 transmission of one or several interleaved media streams independent
Chris@1 127 of the encoding format of the media data. Such an encapsulation
Chris@1 128 format needs to provide:
Chris@1 129
Chris@1 130 o framing for logical bitstreams.
Chris@1 131
Chris@1 132 o interleaving of different logical bitstreams.
Chris@1 133
Chris@1 134 o detection of corruption.
Chris@1 135
Chris@1 136 o recapture after a parsing error.
Chris@1 137
Chris@1 138 o position landmarks for direct random access of arbitrary positions
Chris@1 139 in the bitstream.
Chris@1 140
Chris@1 141 o streaming capability (i.e., no seeking is needed to build a 100%
Chris@1 142 complete bitstream).
Chris@1 143
Chris@1 144 o small overhead (i.e., use no more than approximately 1-2% of
Chris@1 145 bitstream bandwidth for packet boundary marking, high-level
Chris@1 146 framing, sync and seeking).
Chris@1 147
Chris@1 148 o simplicity to enable fast parsing.
Chris@1 149
Chris@1 150 o simple concatenation mechanism of several physical bitstreams.
Chris@1 151
Chris@1 152 All of these design considerations have been taken into account
Chris@1 153 for Ogg. Ogg supports framing and interleaving of logical
Chris@1 154 bitstreams, seeking landmarks, detection of corruption, and stream
Chris@1 155 resynchronisation after a parsing error with no more than
Chris@1 156 approximately 1-2% overhead. It is a generic framework to perform
Chris@1 157 encapsulation of time-continuous bitstreams. It does not know any
Chris@1 158 specifics about the codec data that it encapsulates and is thus
Chris@1 159 independent of any media codec.
Chris@1 160
Chris@1 161 4. The Ogg bitstream format
Chris@1 162
Chris@1 163 A physical Ogg bitstream consists of multiple logical bitstreams
Chris@1 164 interleaved in so-called "Pages". Whole pages are taken in order
Chris@1 165 from multiple logical bitstreams multiplexed at the page level. The
Chris@1 166 logical bitstreams are identified by a unique serial number in the
Chris@1 175 header of each page of the physical bitstream. This unique serial
Chris@1 176 number is created randomly and does not have any connection to the
Chris@1 177 content or encoder of the logical bitstream it represents. Pages of
Chris@1 178 all logical bitstreams are concurrently interleaved, but they need
Chris@1 179 not be in a regular order - they are only required to be consecutive
Chris@1 180 within the logical bitstream. Ogg demultiplexing reconstructs the
Chris@1 181 original logical bitstreams from the physical bitstream by taking the
Chris@1 182 pages in order from the physical bitstream and redirecting them into
Chris@1 183 the appropriate logical decoding entity.
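
The demultiplexing step described above can be sketched as follows. This is a minimal illustration, not a reference implementation: pages are modelled as hypothetical (serial_number, payload) tuples, and the serial numbers and payloads are made up for the example.

```python
from collections import defaultdict


def demultiplex(pages):
    """Route pages to per-stream lists keyed by serial number.

    Pages are taken in physical order, so each logical bitstream
    keeps its own pages in their original consecutive order.
    """
    streams = defaultdict(list)
    for serial, payload in pages:
        streams[serial].append(payload)
    return dict(streams)


# Hypothetical interleaving of two logical bitstreams (made-up serials).
physical = [
    (0x1A, b"a0"),
    (0x2B, b"b0"),
    (0x1A, b"a1"),
    (0x1A, b"a2"),
    (0x2B, b"b1"),
]
logical = demultiplex(physical)
```

Note that the interleaving is irregular (two pages of one stream in a row), which the format permits.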
Chris@1 184
Chris@1 185 Each Ogg page contains only one type of data as it belongs to one
Chris@1 186 logical bitstream only. Pages are of variable size and have a page
Chris@1 187 header containing encapsulation and error recovery information. Each
Chris@1 188 logical bitstream in a physical Ogg bitstream starts with a special
Chris@1 189 start page (bos=beginning of stream) and ends with a special page
Chris@1 190 (eos=end of stream).
Chris@1 191
Chris@1 192 The bos page contains information to uniquely identify the codec type
Chris@1 193 and MAY contain information to set up the decoding process. The bos
Chris@1 194 page SHOULD also contain information about the encoded media - for
Chris@1 195 example, for audio, it should contain the sample rate and number of
Chris@1 196 channels. By convention, the first bytes of the bos page contain
Chris@1 197 magic data that uniquely identifies the required codec. It is the
Chris@1 198 responsibility of anyone fielding a new codec to make sure it is
Chris@1 199 possible to reliably distinguish his/her codec from all other codecs
Chris@1 200 in use. There is no fixed way to detect the end of the codec-
Chris@1 201 identifying marker. The format of the bos page is dependent on the
Chris@1 202 codec and therefore MUST be given in the encapsulation specification
Chris@1 203 of that logical bitstream type. Ogg also allows but does not require
Chris@1 204 secondary header packets after the bos page for logical bitstreams
Chris@1 205 and these must also precede any data packets in any logical
Chris@1 206 bitstream. These subsequent header packets are framed into an
Chris@1 207 integral number of pages, which will not contain any data packets.
Chris@1 208 So, a physical bitstream begins with the bos pages of all logical
Chris@1 209 bitstreams containing one initial header packet per page, followed by
Chris@1 210 the subsidiary header packets of all streams, followed by pages
Chris@1 211 containing data packets.
Chris@1 212
Chris@1 213 The encapsulation specification for one or more logical bitstreams is
Chris@1 214 called a "media mapping". An example of a media mapping is "Ogg
Chris@1 215 Vorbis", which uses the Ogg framework to encapsulate Vorbis-encoded
Chris@1 216 audio data for stream-based storage (such as files) and transport
Chris@1 217 (such as TCP streams or pipes). Ogg Vorbis provides the name and
Chris@1 218 revision of the Vorbis codec, the audio rate and the audio quality on
Chris@1 219 the Ogg Vorbis bos page. It also uses two additional header pages
Chris@1 220 per logical bitstream. The Ogg Vorbis bos page starts with the byte
Chris@1 221 0x01, followed by "vorbis" (a total of 7 bytes of identifier).
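
As an illustration of codec identification by magic bytes, the following sketch checks whether the first packet of a bos page begins with the 7-byte Ogg Vorbis identifier given above. The function name is hypothetical, and a real demultiplexer would typically test several codec markers in turn.

```python
# Byte 0x01 followed by the ASCII string "vorbis" (7 bytes in total).
VORBIS_IDENT = b"\x01vorbis"


def looks_like_vorbis_bos(first_packet: bytes) -> bool:
    """Return True if the packet starts with the Ogg Vorbis identifier."""
    return first_packet.startswith(VORBIS_IDENT)
```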
Chris@1 222
Chris@1 231 Ogg knows two types of multiplexing: concurrent multiplexing (so-
Chris@1 232 called "Grouping") and sequential multiplexing (so-called
Chris@1 233 "Chaining"). Grouping defines how to interleave several logical
Chris@1 234 bitstreams page-wise in the same physical bitstream. Grouping is for
Chris@1 235 example needed for interleaving a video stream with several
Chris@1 236 synchronised audio tracks using different codecs in different logical
Chris@1 237 bitstreams. Chaining on the other hand, is defined to provide a
Chris@1 238 simple mechanism to concatenate physical Ogg bitstreams, as is often
Chris@1 239 needed for streaming applications.
Chris@1 240
Chris@1 241 In grouping, all bos pages of all logical bitstreams MUST appear
Chris@1 242 together at the beginning of the Ogg bitstream. The media mapping
Chris@1 243 specifies the order of the initial pages. For example, the grouping
Chris@1 244 of a specific Ogg video and Ogg audio bitstream may specify that the
Chris@1 245 physical bitstream MUST begin with the bos page of the logical video
Chris@1 246 bitstream, followed by the bos page of the audio bitstream. Unlike
Chris@1 247 bos pages, eos pages for the logical bitstreams need not all occur
Chris@1 248 contiguously. Eos pages may be 'nil' pages, that is, pages
Chris@1 249 containing no content but simply a page header with position
Chris@1 250 information and the eos flag set in the page header. Each grouped
Chris@1 251 logical bitstream MUST have a unique serial number within the scope
Chris@1 252 of the physical bitstream.
Chris@1 253
Chris@1 254 In chaining, complete logical bitstreams are concatenated. The
Chris@1 255 bitstreams do not overlap, i.e., the eos page of a given logical
Chris@1 256 bitstream is immediately followed by the bos page of the next. Each
Chris@1 257 chained logical bitstream MUST have a unique serial number within the
Chris@1 258 scope of the physical bitstream.
Chris@1 259
Chris@1 260 It is possible to consecutively chain groups of concurrently
Chris@1 261 multiplexed bitstreams. The groups, when unchained, MUST stand on
Chris@1 262 their own as a valid concurrently multiplexed bitstream. The
Chris@1 263 following diagram shows a schematic example of such a physical
Chris@1 264 bitstream that obeys all the rules of both grouped and chained
Chris@1 265 multiplexed bitstreams.
Chris@1 266
Chris@1 267                     physical bitstream with pages of
Chris@1 268             different logical bitstreams grouped and chained
Chris@1 269       -------------------------------------------------------------
Chris@1 270       |*A*|*B*|*C*|A|A|C|B|A|B|#A#|C|...|B|C|#B#|#C#|*D*|D|...|#D#|
Chris@1 271       -------------------------------------------------------------
Chris@1 272        bos bos bos             eos           eos eos bos       eos
Chris@1 273
Chris@1 274 In this example, there are two chained physical bitstreams, the first
Chris@1 275 of which is a grouped stream of three logical bitstreams A, B, and C.
Chris@1 276 The second physical bitstream is chained after the end of the grouped
Chris@1 277 bitstream, which ends after the last eos page of all its grouped
Chris@1 278 logical bitstreams. As can be seen, grouped bitstreams begin
Chris@1 287 together - all of the bos pages MUST appear before any data pages.
Chris@1 288 It can also be seen that pages of concurrently multiplexed bitstreams
Chris@1 289 need not conform to a regular order. And it can be seen that a
Chris@1 290 grouped bitstream can end long before the other bitstreams in the
Chris@1 291 group end.
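
The grouping and chaining rules can be checked mechanically. The sketch below is an illustration only: it models each page as a hypothetical (stream, kind) pair, reads the page sequence from the schematic diagram (with the "..." placeholders dropped), and splits it into chained links while rejecting a bos page that appears after data pages within a group.

```python
def parse_diagram(text):
    """Turn '|*A*|A|#A#|' notation into (stream, kind) pairs."""
    kinds = {"*": "bos", "#": "eos"}
    return [
        (token.strip("*#"), kinds.get(token[0], "data"))
        for token in text.split("|")
        if token and token != "..."
    ]


def split_chain_links(pages):
    """Split pages into chained links, checking the grouping rules."""
    links, current, open_streams, seen_data = [], [], set(), False
    for stream, kind in pages:
        if kind == "bos":
            if seen_data:
                raise ValueError("bos page after data pages in a group")
            open_streams.add(stream)
        else:
            if stream not in open_streams:
                raise ValueError("page outside its bos..eos range")
            seen_data = True
            if kind == "eos":
                open_streams.discard(stream)
        current.append((stream, kind))
        # A link ends once every stream that began has reached eos.
        if kind == "eos" and not open_streams:
            links.append(current)
            current, seen_data = [], False
    if current:
        raise ValueError("bitstream ends before all eos pages")
    return links


diagram = "|*A*|*B*|*C*|A|A|C|B|A|B|#A#|C|...|B|C|#B#|#C#|*D*|D|...|#D#|"
links = split_chain_links(parse_diagram(diagram))
```

For the diagram this yields two links: the grouped streams A, B and C, followed by the single stream D.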
Chris@1 292
Chris@1 293 Ogg does not know any specifics about the codec data except that each
Chris@1 294 logical bitstream belongs to a different codec, the data from the
Chris@1 295 codec comes in order and has position markers (so-called "Granule
Chris@1 296 positions"). Ogg does not have a concept of 'time': it only knows
Chris@1 297 about sequentially increasing, unitless position markers. An
Chris@1 298 application can only get temporal information through higher layers
Chris@1 299 which have access to the codec APIs to assign and convert granule
Chris@1 300 positions or time.
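
To make the layering concrete, here is a minimal sketch of how a higher layer might convert a granule position to time. It assumes a hypothetical audio mapping in which the granule position counts PCM samples and the sample rate has been read from the codec headers; other mappings define the conversion differently. The value -1 is treated as "no position", following the special value defined for the granule_position field in the page format section.

```python
from typing import Optional


def granule_to_seconds(granule_position: int,
                       sample_rate: int) -> Optional[float]:
    """Convert a sample-count granule position to seconds.

    Assumes granule positions count PCM samples (a property of the
    media mapping, not of Ogg itself).
    """
    if granule_position == -1:
        return None  # no packet finishes on this page
    return granule_position / sample_rate
```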
Chris@1 301
Chris@1 302 A specific definition of a media mapping using Ogg may put further
Chris@1 303 constraints on its specific use of the Ogg bitstream format. For
Chris@1 304 example, a specific media mapping may require that all the eos pages
Chris@1 305 for all grouped bitstreams need to appear in direct sequence. An
Chris@1 306 example of a media mapping is the specification of "Ogg Vorbis".
Chris@1 307 Another example is the upcoming "Ogg Theora" specification which
Chris@1 308 encapsulates Theora-encoded video data and usually comes multiplexed
Chris@1 309 with a Vorbis stream for an Ogg containing synchronised audio and
Chris@1 310 video. As Ogg does not specify temporal relationships between the
Chris@1 311 encapsulated concurrently multiplexed bitstreams, the temporal
Chris@1 312 synchronisation between the audio and video stream will be specified
Chris@1 313 in this media mapping. To enable streaming, pages from various
Chris@1 314 logical bitstreams will typically be interleaved in chronological
Chris@1 315 order.
Chris@1 316
Chris@1 317 5. The encapsulation process
Chris@1 318
Chris@1 319 The process of multiplexing different logical bitstreams happens at
Chris@1 320 the level of pages as described above. The bitstreams provided by
Chris@1 321 encoders are however handed over to Ogg as so-called "Packets" with
Chris@1 322 packet boundaries dependent on the encoding format. The process of
Chris@1 323 encapsulating packets into pages will be described now.
Chris@1 324
Chris@1 325 From Ogg's perspective, packets can be of any arbitrary size. A
Chris@1 326 specific media mapping will define how to group or break up packets
Chris@1 327 from a specific media encoder. As Ogg pages have a maximum size of
Chris@1 328 about 64 kBytes, sometimes a packet has to be distributed over
Chris@1 329 several pages. To simplify that process, Ogg divides each packet
Chris@1 330 into 255 byte long chunks plus a final shorter chunk. These chunks
Chris@1 331 are called "Ogg Segments". They are only a logical construct and do
Chris@1 332 not have a header for themselves.
Chris@1 333
Chris@1 343 A group of contiguous segments is wrapped into a variable length page
Chris@1 344 preceded by a header. A segment table in the page header lists
Chris@1 345 the "Lacing values" (sizes) of the segments included in the page. A
Chris@1 346 flag in the page header tells whether a page contains a packet
Chris@1 347 continued from a previous page. Note that a lacing value of 255
Chris@1 348 implies that a second lacing value follows in the packet, and a value
Chris@1 349 of less than 255 marks the end of the packet after that many
Chris@1 350 additional bytes. A packet of 255 bytes (or a multiple of 255 bytes)
Chris@1 351 is terminated by a lacing value of 0. Note also that a 'nil' (zero
Chris@1 352 length) packet is not an error; it consists of nothing more than a
Chris@1 353 lacing value of zero in the header.
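
The lacing rules above can be sketched in a few lines. The function names are illustrative only; the second function assumes the lacing values start at a packet boundary and ignores any unfinished packet at the end of the table.

```python
def lacing_values(packet_length: int) -> list:
    """Lacing values for one packet: 255 per full segment, then the rest.

    The final value is always below 255, so a packet whose length is a
    multiple of 255 (including zero) ends with a lacing value of 0.
    """
    full, rest = divmod(packet_length, 255)
    return [255] * full + [rest]


def packet_lengths(segment_table) -> list:
    """Recover the lengths of packets completed within a segment table."""
    lengths, current = [], 0
    for value in segment_table:
        current += value
        if value < 255:  # a value below 255 terminates the packet
            lengths.append(current)
            current = 0
    return lengths
```

For example, a 600-byte packet is laced as 255, 255, 90, and a 255-byte packet as 255, 0.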
Chris@1 354
Chris@1 355 The encoding is optimized for speed and the expected case of the
Chris@1 356 majority of packets being between 50 and 200 bytes large. This is a
Chris@1 357 design justification rather than a recommendation. This encoding
Chris@1 358 avoids imposing a maximum packet size while imposing only minimal
Chris@1 359 overhead on small packets. In contrast, e.g., simply using
Chris@1 360 two bytes at the head of every packet and having a max packet size of
Chris@1 361 32 kBytes would always penalize small packets (< 255 bytes, the
Chris@1 362 typical case) with twice the segmentation overhead. Using the lacing
Chris@1 363 values as suggested, small packets see the minimum possible byte-
Chris@1 364 aligned overhead (1 byte) and large packets (>512 bytes) see a fairly
Chris@1 365 constant ~0.5% overhead on encoding space.
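
The overhead figures can be reproduced with a short calculation. This sketch counts only the lacing bytes per packet and deliberately leaves out the fixed page header, so it illustrates the segmentation cost alone.

```python
def lacing_overhead(packet_length: int) -> float:
    """Lacing bytes divided by payload bytes for a single packet."""
    lacing_bytes = packet_length // 255 + 1
    return lacing_bytes / packet_length


small = lacing_overhead(100)    # one lacing byte for 100 payload bytes
large = lacing_overhead(1000)   # four lacing bytes for 1000 payload bytes
two_byte_prefix = 2 / 100       # fixed two-byte length field, same packet
```

A 100-byte packet costs one lacing byte (1%), half of what a fixed two-byte length field would cost, and a 1000-byte packet costs four lacing bytes (0.4%).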
Chris@1 366
Chris@1 399 The following diagram shows a schematic example of a media mapping
Chris@1 400 using Ogg and grouped logical bitstreams:
Chris@1 401
Chris@1 402           logical bitstream with packet boundaries
Chris@1 403  -----------------------------------------------------------------
Chris@1 404  > |       packet_1             | packet_2         | packet_3 |  <
Chris@1 405  -----------------------------------------------------------------
Chris@1 406
Chris@1 407                      |segmentation (logically only)
Chris@1 408                      v
Chris@1 409
Chris@1 410       packet_1 (5 segments)          packet_2 (4 segs)   p_3 (2 segs)
Chris@1 411      ------------------------------ -------------------- ------------
Chris@1 412  ..  |seg_1|seg_2|seg_3|seg_4|s_5 | |seg_1|seg_2|seg_3|| |seg_1|s_2 | ..
Chris@1 413      ------------------------------ -------------------- ------------
Chris@1 414
Chris@1 415                      | page encapsulation
Chris@1 416                      v
Chris@1 417
Chris@1 418  page_1 (packet_1 data)  page_2 (pket_1 data) page_3 (packet_2 data)
Chris@1 419 ------------------------  ----------------  ------------------------
Chris@1 420 |H|------------------- |  |H|----------- |  |H|------------------- |
Chris@1 421 |D||seg_1|seg_2|seg_3| |  |D|seg_4|s_5 | |  |D||seg_1|seg_2|seg_3| | ...
Chris@1 422 |R|------------------- |  |R|----------- |  |R|------------------- |
Chris@1 423 ------------------------  ----------------  ------------------------
Chris@1 424
Chris@1 425                     |
Chris@1 426 pages of            |
Chris@1 427 other    --------|  |
Chris@1 428 logical         -------
Chris@1 429 bitstreams      | MUX |
Chris@1 430                 -------
Chris@1 431                    |
Chris@1 432                    v
Chris@1 433
Chris@1 434               page_1  page_2          page_3
Chris@1 435       ------  ------  -------  -----  -------
Chris@1 436  ...  ||   |  ||   |  ||    |  ||  |  ||    |  ...
Chris@1 437       ------  ------  -------  -----  -------
Chris@1 438               physical Ogg bitstream
Chris@1 439
Chris@1 440 In this example we take a snapshot of the encapsulation process of
Chris@1 441 one logical bitstream. We can see part of that bitstream's
Chris@1 442 subdivision into packets as provided by the codec. The Ogg
Chris@1 443 encapsulation process chops up the packets into segments. The
Chris@1 444 packets in this example are rather large such that packet_1 is split
Chris@1 445 into 5 segments - 4 segments with 255 bytes and a final smaller one.
Chris@1 446 Packet_2 is split into 4 segments - 3 segments with 255 bytes and a
Chris@1 455 final very small one - and packet_3 is split into two segments. The
Chris@1 456 encapsulation process then creates pages, which are quite small in
Chris@1 457 this example. Page_1 consists of the first three segments of
Chris@1 458 packet_1, page_2 contains the remaining 2 segments from packet_1, and
Chris@1 459 page_3 contains the first three segments of packet_2. Finally, this
Chris@1 460 logical bitstream is multiplexed into a physical Ogg bitstream with
Chris@1 461 pages of other logical bitstreams.
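
The paging in this example can be reproduced with a small sketch. It rests on assumptions chosen to mirror the diagram rather than real encoder behaviour: every packet starts on a fresh page, pages hold at most three segments (the actual limit is 255), and the packet sizes of 1100, 770 and 300 bytes are invented so that they split into 5, 4 and 2 segments.

```python
def paginate(packet_sizes, max_segments=255):
    """Return (continued, segment_table) pairs, one per page.

    Each packet starts on a fresh page here; 'continued' is True when
    the page carries on with a packet begun on the previous page.
    """
    pages = []
    for size in packet_sizes:
        full, rest = divmod(size, 255)
        lacing = [255] * full + [rest]
        for start in range(0, len(lacing), max_segments):
            pages.append((start > 0, lacing[start:start + max_segments]))
    return pages


pages = paginate([1100, 770, 300], max_segments=3)
```

The first three pages correspond to page_1, page_2 and page_3 of the diagram: three full segments of packet_1, its remaining two segments on a continued page, then the first three segments of packet_2.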
Chris@1 462
Chris@1 463 6. The Ogg page format
Chris@1 464
Chris@1 465 A physical Ogg bitstream consists of a sequence of concatenated
Chris@1 466 pages. Pages are of variable size, usually 4-8 kB, maximum 65307
Chris@1 467 bytes. A page header contains all the information needed to
Chris@1 468 demultiplex the logical bitstreams out of the physical bitstream and
Chris@1 469 to perform basic error recovery and landmarks for seeking. Each page
Chris@1 470 is a self-contained entity such that the page decode mechanism can
Chris@1 471 recognize, verify, and handle single pages at a time without
Chris@1 472 requiring the overall bitstream.
Chris@1 473
Chris@1 474 The Ogg page header has the following format:
Chris@1 475
Chris@1 476  0                   1                   2                   3
Chris@1 477  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
Chris@1 478 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Chris@1 479 | capture_pattern: Magic number for page start "OggS"           | 0-3
Chris@1 480 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Chris@1 481 | version       | header_type   | granule_position              | 4-7
Chris@1 482 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Chris@1 483 |                                                               | 8-11
Chris@1 484 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Chris@1 485 |                               | bitstream_serial_number       | 12-15
Chris@1 486 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Chris@1 487 |                               | page_sequence_number          | 16-19
Chris@1 488 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Chris@1 489 |                               | CRC_checksum                  | 20-23
Chris@1 490 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Chris@1 491 |                               |page_segments  | segment_table | 24-27
Chris@1 492 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Chris@1 493 | ...                                                           | 28-
Chris@1 494 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Chris@1 495
Chris@1 496 The LSb (least significant bit) comes first in the Bytes. Fields
Chris@1 497 with more than one byte length are encoded LSB (least significant
Chris@1 498 byte) first.
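
As a sketch of how the header layout and byte order translate into code, the following uses Python's struct module with a little-endian format string. The function and dictionary keys are illustrative names; the flag bit values are the ones given for header_type_flag in the field list of this section, and the granule position is read as a signed integer so that the special value -1 comes out directly.

```python
import struct

# "<" = little-endian, no padding: capture pattern, version, header
# type, granule position, serial, sequence, CRC, segment count.
HEADER = struct.Struct("<4sBBqIIIB")

CONTINUED, BOS, EOS = 0x01, 0x02, 0x04  # header_type_flag bits


def parse_page_header(data: bytes) -> dict:
    """Decode the fixed header fields and the segment table of a page."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 27 bytes for a page header")
    (capture, version, flags, granule,
     serial, sequence, crc, nsegs) = HEADER.unpack_from(data)
    if capture != b"OggS":
        raise ValueError("capture pattern not found")
    table = data[HEADER.size:HEADER.size + nsegs]
    if len(table) < nsegs:
        raise ValueError("truncated segment table")
    return {
        "version": version,
        "continued": bool(flags & CONTINUED),
        "bos": bool(flags & BOS),
        "eos": bool(flags & EOS),
        "granule_position": granule,
        "serial": serial,
        "sequence": sequence,
        "crc": crc,
        "segment_table": list(table),
    }


# Made-up bos page: serial 0x1234, one 7-byte segment (CRC left at 0).
example = HEADER.pack(b"OggS", 0, BOS, 0, 0x1234, 0, 0, 1) + bytes([7])
parsed = parse_page_header(example + b"\x01vorbis")
```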
Chris@1 499
Chris@1 511 The fields in the page header have the following meaning:
Chris@1 512
Chris@1 513 1. capture_pattern: a 4 Byte field that signifies the beginning of a
Chris@1 514 page. It contains the magic numbers:
Chris@1 515
Chris@1 516 0x4f 'O'
Chris@1 517
Chris@1 518 0x67 'g'
Chris@1 519
Chris@1 520 0x67 'g'
Chris@1 521
Chris@1 522 0x53 'S'
Chris@1 523
Chris@1 524 It helps a decoder to find the page boundaries and regain
Chris@1 525 synchronisation after parsing a corrupted stream. Once the
Chris@1 526 capture pattern is found, the decoder verifies page sync and
Chris@1 527 integrity by computing and comparing the checksum.
Chris@1 528
Chris@1 529 2. stream_structure_version: 1 Byte signifying the version number of
Chris@1 530 the Ogg file format used in this stream (this document specifies
Chris@1 531 version 0).
Chris@1 532
Chris@1 533 3. header_type_flag: the bits in this 1 Byte field identify the
Chris@1 534 specific type of this page.
Chris@1 535
Chris@1 536 * bit 0x01
Chris@1 537
Chris@1 538 set: page contains data of a packet continued from the previous
Chris@1 539 page
Chris@1 540
Chris@1 541 unset: page contains a fresh packet
Chris@1 542
Chris@1 543 * bit 0x02
Chris@1 544
Chris@1 545 set: this is the first page of a logical bitstream (bos)
Chris@1 546
Chris@1 547 unset: this page is not a first page
Chris@1 548
Chris@1 549 * bit 0x04
Chris@1 550
Chris@1 551 set: this is the last page of a logical bitstream (eos)
Chris@1 552
Chris@1 553 unset: this page is not a last page
Chris@1 554
Chris@1 555 4. granule_position: an 8 Byte field containing position information.
Chris@1 556 For example, for an audio stream, it MAY contain the total number
Chris@1 557 of PCM samples encoded after including all frames finished on this
Chris@1 558 page. For a video stream it MAY contain the total number of video
Chris@1 567 frames encoded after this page. This is a hint for the decoder
Chris@1 568 and gives it some timing and position information. Its meaning is
Chris@1 569 dependent on the codec for that logical bitstream and specified in
Chris@1 570 a specific media mapping. A special value of -1 (in two's
Chris@1 571 complement) indicates that no packets finish on this page.
Chris@1 572
Chris@1 573 5. bitstream_serial_number: a 4 Byte field containing the unique
Chris@1 574 serial number by which the logical bitstream is identified.
Chris@1 575
Chris@1 576 6. page_sequence_number: a 4 Byte field containing the sequence
Chris@1 577 number of the page so the decoder can identify page loss. This
Chris@1 578 sequence number is increasing on each logical bitstream
Chris@1 579 separately.
Chris@1 580
Chris@1 581 7. CRC_checksum: a 4 Byte field containing a 32 bit CRC checksum of
Chris@1 582 the page (including header with zero CRC field and page content).
Chris@1 583 The generator polynomial is 0x04c11db7.
Chris@1 584
Chris@1 585 8. number_page_segments: 1 Byte giving the number of segment entries
Chris@1 586 encoded in the segment table.
Chris@1 587
Chris@1 588 9. segment_table: number_page_segments Bytes containing the lacing
Chris@1 589 values of all segments in this page. Each Byte contains one
Chris@1 590 lacing value.
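
The checksum in field 7 can be sketched bit by bit as follows. The text above fixes only the generator polynomial; the other CRC parameters used here (initial value 0, most significant bit first, no final XOR) are assumptions that should be checked against a reference implementation. The CRC field is taken to occupy bytes 22 to 25 of the page, which follows from the field sizes listed above.

```python
POLY = 0x04C11DB7
MASK = 0xFFFFFFFF


def ogg_crc(data: bytes) -> int:
    """Bitwise CRC-32: init 0, MSB first, no final XOR (assumed)."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc


def page_crc(page: bytes) -> int:
    """CRC of a whole page with the CRC field (bytes 22-25) zeroed."""
    return ogg_crc(page[:22] + b"\x00\x00\x00\x00" + page[26:])


def verify_page(page: bytes) -> bool:
    """Compare the stored little-endian CRC with the recomputed one."""
    stored = int.from_bytes(page[22:26], "little")
    return stored == page_crc(page)
```

A table-driven variant would be faster in practice; the bitwise form is used here only because it shows the polynomial division directly.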
Chris@1 591
Chris@1 592 The total header size in bytes is given by:
Chris@1 593 header_size = number_page_segments + 27 [Byte]
Chris@1 594
Chris@1 595 The total page size in Bytes is given by:
Chris@1 596 page_size = header_size + sum(lacing_values: 1..number_page_segments)
Chris@1 597 [Byte]
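
The two formulas can be checked with a short calculation. The sketch below evaluates them for a full segment table of 255 lacing values of 255 each, which gives the largest possible page.

```python
FIXED_HEADER_BYTES = 27


def header_size(segment_table) -> int:
    """Fixed header fields plus one byte per lacing value."""
    return FIXED_HEADER_BYTES + len(segment_table)


def page_size(segment_table) -> int:
    """Header size plus the sum of all lacing values on the page."""
    return header_size(segment_table) + sum(segment_table)


largest_page = page_size([255] * 255)
```

The result, 27 + 255 + 255 * 255 = 65307 bytes, agrees with the maximum page size stated at the start of this section.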
Chris@1 598
Chris@1 599 7. Security Considerations
Chris@1 600
Chris@1 601 The Ogg encapsulation format is a container format and only
Chris@1 602 encapsulates content (such as Vorbis-encoded audio). It does not
Chris@1 603 provide for any generic encryption or signing of itself or its
Chris@1 604 contained content bitstreams. However, it encapsulates any kind of
Chris@1 605 content bitstream as long as there is a codec for it, and is thus
Chris@1 606 able to contain encrypted and signed content data. It is also
Chris@1 607 possible to add an external security mechanism that encrypts or signs
Chris@1 608 an Ogg physical bitstream and thus provides content confidentiality
Chris@1 609 and authenticity.
Chris@1 610
Chris@1 611 As Ogg encapsulates binary data, it is possible to include executable
Chris@1 612 content in an Ogg bitstream. This can be an issue with applications
Chris@1 613 that are implemented using the Ogg format, especially when Ogg is
Chris@1 614 used for streaming or file transfer in a networking scenario. As
Chris@1 623 such, Ogg does not pose a threat there. However, an application
Chris@1 624 decoding Ogg and its encapsulated content bitstreams has to ensure
Chris@1 625 correct handling of manipulated bitstreams, of buffer overflows and
Chris@1 626 the like.
Chris@1 627
Chris@1 628 8. References
Chris@1 629
Chris@1 630 [1] Walleij, L., "The application/ogg Media Type", RFC 3534, May
Chris@1 631 2003.
Chris@1 632
Chris@1 633 [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Chris@1 634 Levels", BCP 14, RFC 2119, March 1997.
Chris@1 635
Chris@1 679 Appendix A. Glossary of terms and abbreviations
Chris@1 680
Chris@1 681 bos page: The initial page (beginning of stream) of a logical
Chris@1 682 bitstream which contains information to identify the codec type
Chris@1 683 and other decoding-relevant information.
Chris@1 684
Chris@1 685 chaining (or sequential multiplexing): Concatenation of two or more
Chris@1 686 complete physical Ogg bitstreams.
Chris@1 687
Chris@1 688 eos page: The final page (end of stream) of a logical bitstream.
Chris@1 689
Chris@1 690 granule position: An increasing position number for a specific
Chris@1 691 logical bitstream stored in the page header. Its meaning is
Chris@1 692 dependent on the codec for that logical bitstream and specified in
Chris@1 693 a specific media mapping.
Chris@1 694
Chris@1 695 grouping (or concurrent multiplexing): Interleaving of pages of
Chris@1 696 several logical bitstreams into one complete physical Ogg
Chris@1 697 bitstream under the restriction that all bos pages of all grouped
Chris@1 698 logical bitstreams MUST appear before any data pages.
Chris@1 699
Chris@1 700 lacing value: An entry in the segment table of a page header
Chris@1 701 representing the size of the related segment.
Chris@1 702
Chris@1 703 logical bitstream: A sequence of bits that results from encoding a
Chris@1 704 media stream.
Chris@1 705
Chris@1 706 media mapping: A specific use of the Ogg encapsulation format
Chris@1 707 together with a specific (set of) codec(s).
Chris@1 708
Chris@1 709 (Ogg) packet: A subpart of a logical bitstream that is created by the
Chris@1 710 encoder for that bitstream and represents a meaningful entity for
Chris@1 711 the encoder, but only a sequence of bits to the Ogg encapsulation.
Chris@1 712
Chris@1 713 (Ogg) page: A physical bitstream consists of a sequence of Ogg pages,
Chris@1 714 each containing data of one logical bitstream only. A page usually
Chris@1 715 contains a group of contiguous segments of one packet only, but
Chris@1 716 sometimes packets are too large and need to be split over several
Chris@1 717 pages.
Chris@1 718
Chris@1 719 physical (Ogg) bitstream: The sequence of bits resulting from an Ogg
Chris@1 720 encapsulation of one or several logical bitstreams. It consists
Chris@1 721 of a sequence of pages from the logical bitstreams with the
Chris@1 722 restriction that the pages of one logical bitstream MUST come in
Chris@1 723 their correct temporal order.
Chris@1 724
Chris@1 735 (Ogg) segment: The Ogg encapsulation process splits each packet into
Chris@1 736 chunks of 255 bytes plus a last fractional chunk of less than 255
Chris@1 737 bytes. These chunks are called segments.
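
The segment and lacing value definitions above can be illustrated with a short sketch. It assumes that "less than 255 bytes" includes zero, so that a packet whose length is an exact multiple of 255 ends in an empty segment. The function names are hypothetical and are not part of any Ogg library.

```python
SEGMENT_SIZE = 255  # maximum segment size in bytes


def split_into_segments(packet):
    """Split a packet into 255-byte segments plus a final short one."""
    segments = [packet[i:i + SEGMENT_SIZE]
                for i in range(0, len(packet), SEGMENT_SIZE)]
    # A length that is an exact multiple of 255 (including 0) needs an
    # empty final segment so that the end of the packet is marked.
    if len(packet) % SEGMENT_SIZE == 0:
        segments.append(b"")
    return segments


def lacing_values(packet):
    """Return the segment table entries (one size per segment)."""
    return [len(segment) for segment in split_into_segments(packet)]
```

For a 753-byte packet this yields the lacing values 255, 255, 243. For a 510-byte packet it yields 255, 255, 0, where the trailing 0 tells a decoder that the packet ends there rather than continuing in the next segment.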
Chris@1 738
Chris@1 739 Appendix B. Acknowledgements
Chris@1 740
Chris@1 741 The author gratefully acknowledges the work that Christopher
Chris@1 742 Montgomery and the Xiph.Org foundation have done in defining the Ogg
Chris@1 743 multimedia project and as part of it the open file format described
Chris@1 744 in this document. The author hopes that providing this document to
Chris@1 745 the Internet community will help in promoting the Ogg multimedia
Chris@1 746 project at http://www.xiph.org/. Many thanks also for the many
Chris@1 747 technical and typo corrections that C. Montgomery and the Ogg
Chris@1 748 community provided as feedback to this RFC.
Chris@1 749
Chris@1 750 Author's Address
Chris@1 751
Chris@1 752 Silvia Pfeiffer
Chris@1 753 CSIRO, Australia
Chris@1 754 Locked Bag 17
Chris@1 755 North Ryde, NSW 2113
Chris@1 756 Australia
Chris@1 757
Chris@1 758 Phone: +61 2 9325 3141
Chris@1 759 EMail: Silvia.Pfeiffer@csiro.au
Chris@1 760 URI: http://www.cmis.csiro.au/Silvia.Pfeiffer/
Chris@1 761
Chris@1 791 Full Copyright Statement
Chris@1 792
Chris@1 793 Copyright (C) The Internet Society (2003). All Rights Reserved.
Chris@1 794
Chris@1 795 This document and translations of it may be copied and furnished to
Chris@1 796 others, and derivative works that comment on or otherwise explain it
Chris@1 797 or assist in its implementation may be prepared, copied, published
Chris@1 798 and distributed, in whole or in part, without restriction of any
Chris@1 799 kind, provided that the above copyright notice and this paragraph are
Chris@1 800 included on all such copies and derivative works. However, this
Chris@1 801 document itself may not be modified in any way, such as by removing
Chris@1 802 the copyright notice or references to the Internet Society or other
Chris@1 803 Internet organizations, except as needed for the purpose of
Chris@1 804 developing Internet standards in which case the procedures for
Chris@1 805 copyrights defined in the Internet Standards process must be
Chris@1 806 followed, or as required to translate it into languages other than
Chris@1 807 English.
Chris@1 808
Chris@1 809 The limited permissions granted above are perpetual and will not be
Chris@1 810 revoked by the Internet Society or its successors or assigns.
Chris@1 811
Chris@1 812 This document and the information contained herein is provided on an
Chris@1 813 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
Chris@1 814 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
Chris@1 815 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
Chris@1 816 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
Chris@1 817 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Chris@1 818
Chris@1 819 Acknowledgement
Chris@1 820
Chris@1 821 Funding for the RFC Editor function is currently provided by the
Chris@1 822 Internet Society.
Chris@1 823