Topic:

Sample granularity editing of a Vorbis file; inferred arbitrary sample
length starting offsets / PCM stream lengths

Overview:

Vorbis, like mp3, is a frame-based* audio compression format in which
audio is broken up into discrete short time segments. These segments
are 'atomic'; that is, one must recover the entire short time segment
from the frame packet. There's no way to recover only a part of the PCM
time segment from part of the coded packet without expanding the entire
packet and then discarding a portion of the resulting PCM audio.

* In mp3, the data segment representing a given time period is called
  a 'frame'; the roughly equivalent Vorbis construct is a 'packet'.

Thus, when we edit a Vorbis stream, the finest physical editing
granularity is on these packet boundaries. (The mp3 case is somewhat
more complicated than just snipping on a frame boundary, because time
data can be spread backward or forward over frames. In Vorbis, packets
are all stand-alone.) At the physical packet level, Vorbis is therefore
still limited to streams that contain an integral number of packets.

However, Vorbis streams may still exactly represent and be edited to a
PCM stream of arbitrary length and starting offset without padding the
beginning or end of the decoded stream or requiring that the desired
edit points be packet aligned. Vorbis makes use of Ogg stream
framing, and this framing provides time-stamping data, called a
'granule position'; our starting offset and finished stream length may
be inferred from correct usage of the granule position data.

Time stamping mechanism:

Vorbis packets are bundled into Ogg pages (note that pages do not
necessarily contain integral numbers of packets, but that isn't
important in this discussion. More about Ogg framing can be found in
ogg/doc/framing.html). Each page that contains a packet boundary is
stamped with the absolute sample-granularity offset of the data, that
is, 'complete samples-to-date' up to the last completed packet of that
page. (The same mechanism is used for, e.g., video, where the number
represents complete 2-D frames, and so on.)

(It's possible but rare for a packet to span more than two pages such
that page[s] in the middle have no packet boundary; these pages have
a granule position of '-1'.)

This granule position mechanism in Ogg is used by Vorbis to indicate
when the PCM data intended to be represented in a Vorbis segment begins
a number of samples into the data represented by the first packet[s]
and/or ends before the physical PCM data represented in the last
packet[s].

File length a non-integral number of frames:

A file to be encoded in Vorbis will probably not encode into an
integral number of packets; such a file is encoded with the last
packet containing 'extra'* samples. These samples are not padding; they
will be discarded in decode.

*(For best results, the encoder should use extra samples that preserve
the character of the last frame. Simply setting them to zero will
introduce a 'cliff' that's hard to encode, resulting in spread-frame
noise. Libvorbis extrapolates the last frame past the end of data to
produce the extra samples.
Even simply duplicating the last value is
better than clamping the signal to zero.)

The encoder indicates to the decoder that the file is actually shorter
than all of the samples ('original' + 'extra') by setting the granule
position in the last page to a short value; that is, the last
timestamp is the original length of the file discarding extra samples.
The decoder will see that the number of samples it has decoded in the
last page is too many; it is 'original' + 'extra', whereas the
granulepos says that through the last packet we only have 'original'
number of samples. The decoder then ignores the 'extra' samples.
This behavior is to occur only when the end-of-stream bit is set in
the page (indicating last page of the logical stream).

Note that it is not legal for the granule position of the last page to
indicate that there are more samples in the file than actually exist;
however, implementations should handle such an illegal file gracefully
in the interests of robust programming.

Beginning point not on integral packet boundary:

It is possible that we will want the PCM data represented by a Vorbis
stream to begin at a position later than where the decoded PCM data
really begins after an integral packet boundary, a situation analogous
to the above description where the PCM data does not end at an
integral packet boundary. The easiest example is taking a clip out of
a larger Vorbis stream, and choosing a beginning point of the clip
that is not on a packet boundary; we need to ignore a few samples to
get the desired beginning point.

The process of marking the desired beginning point is similar to
marking an arbitrary ending point.
If the encoder wishes sample zero
to be some location past the actual beginning of data, it associates a
'short' granule position value with the completion of the second*
audio packet. The granule position is associated with the second
packet simply by making sure the second packet completes its page.

*(We associate the short value with the second packet for two reasons.
a) The first packet only primes the overlap/add buffer. No data is
returned before decoding the second packet; this places the decision
information at the point of decision. b) Placing the short value on
the first packet would make the value negative (as the first packet
normally represents position zero); a negative value would break the
requirement that granule positions increase, since the headers have
position values of zero.)

The decoder sees that on the first page that will return
data from the overlap/add queue, we have more samples than the granule
position accounts for, and discards the 'surplus' from the beginning
of the queue.

Note that short granule values (indicating less than the actually
returned amount of data) are not legal in the Vorbis spec outside of
indicating beginning and ending sample positions. However, decoders
should, at minimum, tolerate inadvertent short values elsewhere in the
stream (just as they should tolerate out-of-order/non-increasing
granulepos values, although this too is illegal).

Beginning point at arbitrary positive timestamp (no 'zero' sample):

It's also possible that the granule position of the first page of an
audio stream is a 'long value', that is, a value larger than the
amount of PCM audio decoded.
This implies only that we are starting
playback at some point into the logical stream, a potentially common
occurrence in streaming applications where the decoder may be
connecting into a live stream. The decoder should not treat the long
value specially.

A long value elsewhere in the stream would normally occur only when a
page is lost or out of sequence, as indicated by the page's sequence
number. A long value under any other situation is not legal; however,
a decoder should tolerate both possibilities.
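
Illustrative sketch (not libvorbis code):

The decode-side rules above can be summarized in a few lines of Python.
This is only a rough sketch under simplifying assumptions. The 'Page'
record, its 'decoded' field (the number of PCM samples returned by the
packets that complete on that page) and the 'plan_trims' function are
hypothetical names invented for this illustration; they are not part
of libogg or libvorbis. Mismatched granule positions in the middle of
the stream are simply ignored here, which is one possible way of
tolerating them.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical per-page summary, not a libogg structure."""
    granulepos: int    # samples-to-date through the last completed packet; -1 if none
    decoded: int       # PCM samples returned by packets completing on this page
    eos: bool = False  # end-of-stream flag of the page


def plan_trims(pages):
    """Return one (start_position, drop_front, drop_back) tuple per data page."""
    plans = []
    position = None  # absolute position of the next sample; unknown at first
    for page in pages:
        if page.decoded == 0 or page.granulepos < 0:
            # Header pages, the priming packet, or a page with no packet boundary.
            continue
        drop_front = drop_back = 0
        if position is None:
            # First page that returns data from the overlap/add queue.
            if page.granulepos < page.decoded:
                # Short value: discard the surplus from the beginning.
                drop_front = page.decoded - page.granulepos
                position = 0
            else:
                # Exact or long value: playback starts somewhere into the stream.
                position = page.granulepos - page.decoded
        elif page.eos and page.granulepos < position + page.decoded:
            # Short value on the last page: discard the 'extra' samples.
            drop_back = min(page.decoded, position + page.decoded - page.granulepos)
        plans.append((position, drop_front, drop_back))
        position += page.decoded - drop_front - drop_back
    return plans


if __name__ == "__main__":
    clip = [
        Page(granulepos=0, decoded=0),               # headers / priming
        Page(granulepos=900, decoded=1024),          # short value at the start
        Page(granulepos=1924, decoded=1024),
        Page(granulepos=2500, decoded=1024, eos=True),  # short value at the end
    ]
    print(plan_trims(clip))
    # prints [(0, 124, 0), (900, 0, 0), (1924, 0, 448)]
```

In this made-up clip, 124 samples are dropped from the front of the
first data page and 448 from the end of the last one, so the decoded
stream is exactly 2500 samples long even though the packets themselves
hold 3072.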