annotate src/libvorbis-1.3.3/doc/vorbis-clip.txt @ 72:7b5216b54e42

Update exclusion list
author Chris Cannam
date Fri, 25 Jan 2019 13:49:22 +0000
parents 05aa0afa9217

Topic:

Sample granularity editing of a Vorbis file; inferred arbitrary sample
length starting offsets / PCM stream lengths

Overview:

Vorbis, like mp3, is a frame-based* audio compression format in which
audio is broken up into discrete short time segments. These segments
are 'atomic'; that is, one must recover the entire short time segment
from the frame packet. There's no way to recover only a part of the PCM
time segment from part of the coded packet without expanding the entire
packet and then discarding a portion of the resulting PCM audio.

* In mp3, the data segment representing a given time period is called
a 'frame'; the roughly equivalent Vorbis construct is a 'packet'.

Thus, when we edit a Vorbis stream, the finest physical editing
granularity is on these packet boundaries. (The mp3 case is actually
somewhat more complicated than just snipping on a frame boundary,
because time data can be spread backward or forward over frames. In
Vorbis, packets are all stand-alone.) At the physical packet level,
then, Vorbis is still limited to streams that contain an integral
number of packets.

However, a Vorbis stream may still exactly represent, and be edited
to, a PCM stream of arbitrary length and starting offset, without
padding the beginning or end of the decoded stream and without
requiring that the desired edit points be packet aligned. Vorbis makes
use of Ogg stream framing, and this framing provides time-stamping
data, called a 'granule position'; our starting offset and finished
stream length may be inferred from correct usage of the granule
position data.

Time stamping mechanism:

Vorbis packets are bundled into Ogg pages (note that pages do not
necessarily contain integral numbers of packets, but that isn't
important in this discussion. More about Ogg framing can be found in
ogg/doc/framing.html). Each page that contains a packet boundary is
stamped with the absolute sample-granularity offset of the data, that
is, 'complete samples-to-date' up to the last completed packet of that
page. (The same mechanism is used for, e.g., video, where the number
represents complete 2-D frames, and so on.)

(It's possible but rare for a packet to span more than two pages such
that page[s] in the middle have no packet boundary; these pages have
a granule position of '-1'.)
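
As a minimal sketch of where this timestamp lives, the C fragment below
reads the granule position out of a raw Ogg page header. It relies on
the Ogg page layout (the 'OggS' capture pattern, then the granule
position as a 64-bit little-endian value in header bytes 6 through 13).
The function names page_granulepos and page_has_no_packet_boundary are
hypothetical and are not part of libogg; a real program would more
likely let libogg parse the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a libogg call): read the granule position
   from a raw Ogg page header. Returns 0 on success, or -1 if the
   buffer is too short or lacks the "OggS" capture pattern. */
int page_granulepos(const unsigned char *hdr, size_t len, int64_t *out)
{
    uint64_t v = 0;
    int i;

    assert(out != NULL);
    if (hdr == NULL || len < 14 || memcmp(hdr, "OggS", 4) != 0)
        return -1;

    /* Header bytes 6..13 hold the granule position, least
       significant byte first. */
    for (i = 7; i >= 0; i--)
        v = (v << 8) | hdr[6 + i];

    /* All bits set reads back as -1 on two's-complement targets. */
    *out = (int64_t)v;
    return 0;
}

/* Nonzero if the page completes no packet (granule position -1). */
int page_has_no_packet_boundary(int64_t granulepos)
{
    return granulepos == -1;
}
```

A caller would typically skip pages for which
page_has_no_packet_boundary() is true when looking for timing
information, since they carry no usable timestamp.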

This granule position mechanism in Ogg is used by Vorbis to indicate
when the PCM data intended to be represented in a Vorbis segment begins
a number of samples into the data represented by the first packet[s]
and/or ends before the physical PCM data represented in the last
packet[s].

File length a non-integral number of frames:

A file to be encoded in Vorbis will probably not encode into an
integral number of packets; such a file is encoded with the last
packet containing 'extra'* samples. These samples are not padding; they
will be discarded in decode.

*(For best results, the encoder should use extra samples that preserve
the character of the last frame. Simply setting them to zero will
introduce a 'cliff' that's hard to encode, resulting in spread-frame
noise. Libvorbis extrapolates the last frame past the end of data to
produce the extra samples. Even simply duplicating the last value is
better than clamping the signal to zero.)

The encoder indicates to the decoder that the file is actually shorter
than all of the samples ('original' + 'extra') by setting the granule
position in the last page to a short value; that is, the last
timestamp is the original length of the file, discarding extra samples.
The decoder will see that the number of samples it has decoded through
the last page is too many: it is 'original' + 'extra', whereas the
granulepos says that through the last packet we only have 'original'
number of samples. The decoder then ignores the 'extra' samples.
This behavior is to occur only when the end-of-stream bit is set in
the page (indicating the last page of the logical stream).
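
The end-of-stream rule above can be sketched as a small C function.
This is an illustration under stated assumptions, not the libvorbis
implementation: the name samples_to_emit and its parameters are
hypothetical, and pos_before is assumed to be the absolute sample
position the decoder had reached before the page in question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many of the samples decoded from a page
   should actually be emitted.
     pos_before      absolute sample position reached before this page
     decoded_in_page samples produced by the packets this page completes
     granulepos      the page's granule position
     eos             nonzero if the end-of-stream bit is set */
int64_t samples_to_emit(int64_t pos_before, int64_t decoded_in_page,
                        int64_t granulepos, int eos)
{
    int64_t total = pos_before + decoded_in_page;

    assert(pos_before >= 0 && decoded_in_page >= 0);

    /* Trimming applies only on the end-of-stream page; a granule
       position of -1 means the page carries no timestamp. */
    if (!eos || granulepos < 0)
        return decoded_in_page;

    /* A granulepos at or past the decoded total trims nothing. */
    if (granulepos >= total)
        return decoded_in_page;

    /* Short value: drop the 'extra' samples from the tail. */
    if (granulepos <= pos_before)
        return 0;
    return granulepos - pos_before;
}
```

For example, with 4096 samples already emitted, 1024 decoded from the
final page, and a final granule position of 4500, the function keeps
404 samples and drops the remaining 620. Ignoring a granule position
that runs past the decoded total is just one tolerant choice made for
this sketch.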

Note that it is not legal for the granule position of the last page to
indicate that there are more samples in the file than actually exist;
however, implementations should handle such an illegal file gracefully
in the interests of robust programming.

Beginning point not on integral packet boundary:

It is possible that we will want the PCM data represented by a Vorbis
stream to begin at a position later than where the decoded PCM data
really begins after an integral packet boundary, a situation analogous
to the above description where the PCM data does not end at an
integral packet boundary. The easiest example is taking a clip out of
a larger Vorbis stream, and choosing a beginning point of the clip
that is not on a packet boundary; we need to ignore a few samples to
get the desired beginning point.

The process of marking the desired beginning point is similar to
marking an arbitrary ending point. If the encoder wishes sample zero
to be some location past the actual beginning of data, it associates a
'short' granule position value with the completion of the second*
audio packet. The granule position is associated with the second
packet simply by making sure the second packet completes its page.

*(We associate the short value with the second packet for two reasons.
a) The first packet only primes the overlap/add buffer. No data is
returned before decoding the second packet; this places the decision
information at the point of decision. b) Placing the short value on
the first packet would make the value negative (as the first packet
normally represents position zero); a negative value would break the
requirement that granule positions increase, since the headers have
position values of zero.)

The decoder sees that on the first page that will return data from the
overlap/add queue, we have more samples than the granule position
accounts for, and discards the 'surplus' from the beginning of the
queue.
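
The surplus calculation can be sketched in C as follows. The function
name samples_to_skip and its parameters are hypothetical, and the
sketch assumes the page in question is not also the end-of-stream
page, where a short value would be ambiguous between trimming the
beginning and trimming the end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of samples to discard from the front of
   the overlap/add queue on the first page that returns data.
     decoded_through_page  samples produced by all packets completed
                           through that page
     granulepos            that page's granule position */
int64_t samples_to_skip(int64_t decoded_through_page, int64_t granulepos)
{
    assert(decoded_through_page >= 0);

    /* No timestamp (-1), or a value that accounts for everything
       decoded so far: nothing to discard. */
    if (granulepos < 0 || granulepos >= decoded_through_page)
        return 0;

    /* Short value: the surplus comes off the beginning. */
    return decoded_through_page - granulepos;
}
```

For instance, if the second audio packet yields 1024 samples and its
page is stamped 900, the first 124 samples are dropped and the sample
that follows them becomes sample zero of the stream.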

Note that short granule values (indicating less than the actually
returned amount of data) are not legal in the Vorbis spec outside of
indicating beginning and ending sample positions. However, decoders
should, at minimum, tolerate inadvertent short values elsewhere in the
stream (just as they should tolerate out-of-order/non-increasing
granulepos values, although this too is illegal).

Beginning point at arbitrary positive timestamp (no 'zero' sample):

It's also possible that the granule position of the first page of an
audio stream is a 'long value', that is, a value larger than the
amount of PCM audio decoded. This implies only that we are starting
playback at some point into the logical stream, a potentially common
occurrence in streaming applications where the decoder may be
connecting into a live stream. The decoder should not treat the long
value specially.
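
One way to picture the long-value case is as a plain subtraction: the
position of the first decoded sample is the page's granule position
minus the number of samples decoded through that page. The C sketch
below is only an illustration; first_sample_position and its
parameters are hypothetical names, and the page is assumed to carry a
real timestamp (not -1).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: absolute position of the first sample the
   decoder will emit, given the first timestamped audio page.
     decoded_through_page  samples produced through that page
     granulepos            that page's granule position (>= 0) */
int64_t first_sample_position(int64_t decoded_through_page,
                              int64_t granulepos)
{
    assert(decoded_through_page >= 0 && granulepos >= 0);

    /* Long value: the stream was joined partway in, so the first
       sample sits at a positive offset. */
    if (granulepos > decoded_through_page)
        return granulepos - decoded_through_page;

    /* Exact or short value: playback starts at sample zero. */
    return 0;
}
```

With 1024 samples decoded and a granule position of 481024, playback
begins at sample 480000 of the logical stream; no samples are
discarded and no silence is inserted.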

A long value elsewhere in the stream would normally occur only when a
page is lost or out of sequence, as indicated by the page's sequence
number. A long value under any other situation is not legal; however,
a decoder should tolerate both possibilities.