annotate src/libvorbis-1.3.3/doc/vorbis-clip.txt @ 169:223a55898ab9 tip default

Add null config files
author Chris Cannam <cannam@all-day-breakfast.com>
date Mon, 02 Mar 2020 14:03:47 +0000
parents 98c1576536ae
Topic:

Sample-granularity editing of a Vorbis file; inferring starting offsets
and PCM stream lengths to arbitrary sample precision

Overview:

Vorbis, like mp3, is a frame-based* audio compression format in which
audio is broken up into discrete short time segments. These segments
are 'atomic'; that is, one must recover the entire short time segment
from the frame packet. There is no way to recover only part of the PCM
time segment from part of the coded packet without decoding the entire
packet and then discarding a portion of the resulting PCM audio.

* In mp3, the data segment representing a given time period is called
a 'frame'; the roughly equivalent Vorbis construct is a 'packet'.

Thus, when we edit a Vorbis stream, the finest physical editing
granularity is the packet boundary. (The mp3 case is actually more
complicated than just snipping on a frame boundary, because time data
can be spread backward or forward over frames. In Vorbis, packets are
all stand-alone.) So at the physical packet level, Vorbis is still
limited to streams that contain an integral number of packets.

However, Vorbis streams may still exactly represent, and be edited to,
a PCM stream of arbitrary length and starting offset without padding
the beginning or end of the decoded stream and without requiring that
the desired edit points be packet aligned. Vorbis uses Ogg stream
framing, and this framing provides time-stamping data called the
'granule position'; the starting offset and final stream length can be
inferred from correct use of the granule position data.

Time stamping mechanism:

Vorbis packets are bundled into Ogg pages (note that pages do not
necessarily contain integral numbers of packets, but that isn't
important in this discussion; more about Ogg framing can be found in
ogg/doc/framing.html). Each page that contains a packet boundary is
stamped with the absolute sample-granularity offset of the data, that
is, 'complete samples-to-date' up to the last completed packet of that
page. (The same mechanism is used for, e.g., video, where the number
represents complete 2-D frames, and so on.)

(It's possible but rare for a packet to span more than two pages such
that the page[s] in the middle have no packet boundary; these pages
have a granule position of '-1'.)
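
A minimal sketch of the stamping rule above. It assumes the per-packet
sample counts are already known and that the page layout is described
by a hypothetical array giving, for each page, the index of the last
packet completing on it (or -1 when none does). The function name and
this layout description are illustrative only; they are not libogg or
libvorbis API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stamp each page with 'complete samples-to-date'.
 *   pcm_per_packet[i]  samples contributed by audio packet i
 *   last_complete[p]   index of the last packet completing on page p,
 *                      or -1 if no packet completes on that page
 *   granulepos_out[p]  receives the granule position for page p      */
void assign_page_granulepos(const long *pcm_per_packet, size_t npackets,
                            const long *last_complete, size_t npages,
                            int64_t *granulepos_out)
{
    int64_t total = 0;  /* samples in all packets completed so far */
    size_t next = 0;    /* first packet not yet counted */

    for (size_t p = 0; p < npages; p++) {
        if (last_complete[p] < 0) {
            /* A page in the middle of a long packet: no boundary. */
            granulepos_out[p] = -1;
            continue;
        }
        /* Count every packet that completes on or before this page. */
        while (next < npackets && next <= (size_t)last_complete[p])
            total += pcm_per_packet[next++];
        granulepos_out[p] = total;
    }
}
```

For example, with per-packet counts {0, 1024, 1024, 1024} (the first
audio packet returns no samples, since it only primes overlap/add) and
pages whose last completed packets are 1, none and 3, the three pages
are stamped 1024, -1 and 3072.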

Vorbis uses this granule position mechanism in Ogg to indicate when
the PCM data intended to be represented in a Vorbis segment begins a
number of samples into the data represented by the first packet[s]
and/or ends before the end of the physical PCM data represented in the
last packet[s].

File length a non-integral number of frames:

A file to be encoded in Vorbis will probably not encode into an
integral number of packets; such a file is encoded with the last
packet containing 'extra'* samples. These samples are not padding; they
will be discarded in decode.

*(For best results, the encoder should use extra samples that preserve
the character of the last frame. Simply setting them to zero will
introduce a 'cliff' that's hard to encode, resulting in spread-frame
noise. Libvorbis extrapolates the last frame past the end of data to
produce the extra samples. Even simply duplicating the last value is
better than clamping the signal to zero).
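
The footnote's simplest acceptable fallback, duplicating the last
value, might look like the sketch below. It is not libvorbis's
extrapolation, only the 'hold the last sample' option the footnote
mentions, and it assumes a single channel of float samples in a buffer
already sized to the final packet (both assumptions for illustration;
the function name is hypothetical).

```c
#include <stddef.h>

/* Fill buf[have .. want-1] with 'extra' samples by repeating the last real
 * sample, avoiding the 'cliff' that zero padding would create. If there is
 * no real data at all, zeros are the only option. */
void pad_hold_last(float *buf, size_t have, size_t want)
{
    float last = (have > 0) ? buf[have - 1] : 0.0f;
    for (size_t i = have; i < want; i++)
        buf[i] = last;
}
```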

The encoder indicates to the decoder that the file is actually shorter
than all of the samples ('original' + 'extra') by setting the granule
position in the last page to a short value; that is, the last
timestamp is the original length of the file, discarding the extra
samples. The decoder will see that it has decoded too many samples in
the last page: it has 'original' + 'extra', whereas the granulepos says
that through the last packet there are only 'original' samples. The
decoder then ignores the 'extra' samples. This behavior is to occur
only when the end-of-stream bit is set in the page (indicating the last
page of the logical stream).

Note that it is not legal for the granule position of the last page to
indicate that there are more samples in the file than actually exist;
however, implementations should handle such an illegal file gracefully
in the interests of robust programming.
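
The decoder-side rule for the final page, including graceful handling
of an illegal over-long value, can be sketched as follows. The function
and parameter names are hypothetical; `pos_before` is assumed to be the
absolute granule-domain position the decoder has already reached before
this page's samples.

```c
#include <stdint.h>

/* How many of the samples decoded from this page should be returned.
 *   pos_before   absolute position reached before this page's samples
 *   decoded_here samples decoded from packets completing on this page
 *   granulepos   this page's granule position
 *   eos          nonzero if the page has the end-of-stream flag set   */
int64_t samples_to_return(int64_t pos_before, int64_t decoded_here,
                          int64_t granulepos, int eos)
{
    if (!eos)
        return decoded_here;      /* end trimming applies only at EOS */

    int64_t keep = granulepos - pos_before;
    if (keep < 0)
        keep = 0;                 /* defensive: shorter than data already returned */
    if (keep > decoded_here)
        keep = decoded_here;      /* illegal long value: never invent samples */
    return keep;
}
```

For instance, if 10000 samples have been returned, the final page
decodes 2048 more, and its granule position is 11500, only the first
1500 are returned and the remaining 548 'extra' samples are dropped.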

Beginning point not on integral packet boundary:

It is possible that we want the PCM data represented by a Vorbis
stream to begin at a position later than where the decoded PCM data
really begins after an integral packet boundary, a situation analogous
to the above description where the PCM data does not end at an
integral packet boundary. The easiest example is taking a clip out of
a larger Vorbis stream and choosing a beginning point for the clip
that is not on a packet boundary; we need to ignore a few samples to
get the desired beginning point.
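
One way to plan such a clip is sketched below, assuming the editor
already knows the absolute granule position at the end of each audio
packet (a hypothetical `packet_end` array, with `packet_end[0] == 0`
because the first packet only primes overlap/add). Because each
packet's output overlaps the previous packet, the sketch keeps the
packet before the one containing the start sample as the new priming
packet, and reports how many samples of the following packet's output
must be ignored. The struct and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t  first_packet; /* index of the packet kept to prime overlap/add */
    int64_t skip;         /* samples of the next packet's output to ignore */
} ClipStart;

/* Hypothetical helper. Packet i (i >= 1) returns samples
 * [packet_end[i-1], packet_end[i]). Returns 0 on success, -1 if start is
 * negative or lies at or beyond the end of the last packet. */
int clip_start(const int64_t *packet_end, size_t n, int64_t start,
               ClipStart *out)
{
    if (start < 0)
        return -1;
    for (size_t i = 1; i < n; i++) {
        if (packet_end[i] > start) {   /* packet i returns sample 'start' */
            out->first_packet = i - 1;
            out->skip = start - packet_end[i - 1];
            return 0;
        }
    }
    return -1;
}
```

In the clipped stream, packet `first_packet + 1` becomes the second
audio packet, so its granule position is
`packet_end[first_packet + 1] - start`, a value `skip` samples short of
what that packet actually returns.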

The process of marking the desired beginning point is similar to
marking an arbitrary ending point. If the encoder wishes sample zero
to be some location past the actual beginning of data, it associates a
'short' granule position value with the completion of the second*
audio packet. The granule position is associated with the second
packet simply by making sure the second packet completes its page.

*(We associate the short value with the second packet for two reasons.
a) The first packet only primes the overlap/add buffer. No data is
returned before decoding the second packet; this places the decision
information at the point of decision. b) Placing the short value on
the first packet would make the value negative (as the first packet
normally represents position zero); a negative value would break the
requirement that granule positions increase; the headers have
position values of zero.)
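
The arithmetic behind the short value can be sketched from block
sizes. Per the Vorbis I specification, a packet returns the samples
from the centre of the previous window to the centre of its own window,
i.e. `prev_n/4 + cur_n/4` for block sizes `prev_n` and `cur_n`; since
the first packet returns nothing, that is also the normal granule
position of the second packet. The function names and the -2 error
value are hypothetical.

```c
#include <stdint.h>

/* Samples returned by a packet of block size cur_n that follows a packet
 * of block size prev_n: centre of previous window to centre of this one. */
static int64_t returned_samples(int64_t prev_n, int64_t cur_n)
{
    return prev_n / 4 + cur_n / 4;
}

/* Granule position to stamp on the page completed by the second audio
 * packet so that sample zero lies 'skip' samples into its output.
 * Returns -2 (a hypothetical error value; -1 already means "no packet
 * boundary") if skip is negative or would discard the packet's entire
 * output, in which case dropping a whole packet is the better edit. */
int64_t short_start_granulepos(int64_t first_n, int64_t second_n,
                               int64_t skip)
{
    int64_t normal = returned_samples(first_n, second_n);
    if (skip < 0 || skip >= normal)
        return -2;
    return normal - skip;
}
```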

On the first page that returns data from the overlap/add queue, the
decoder sees that it has more samples than the granule position
accounts for, and discards the 'surplus' from the beginning of the
queue.
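
On the decoding side, the 'surplus' is simply the difference between
what was decoded through the page's last completed packet and what the
granule position accounts for. A minimal sketch, assuming a single
channel of float samples (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trim the front of the first data-bearing page's output.
 *   pcm, n      samples decoded through the page's last completed packet
 *   granulepos  that page's granule position
 * Returns the number of samples left at the start of pcm. */
size_t trim_front(float *pcm, size_t n, int64_t granulepos)
{
    /* -1 (no boundary) or a value covering all the data: nothing to drop.
     * A long value is not a trimming case. */
    if (granulepos < 0 || (uint64_t)granulepos >= n)
        return n;

    size_t keep = (size_t)granulepos;
    size_t surplus = n - keep;
    memmove(pcm, pcm + surplus, keep * sizeof *pcm);
    return keep;
}
```

For example, if the second audio packet returns 1024 samples and the
page it completes carries granule position 924, the first 100 samples
are dropped.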

Note that short granule values (indicating less than the actually
returned amount of data) are not legal in the Vorbis spec outside of
indicating beginning and ending sample positions. However, decoders
should, at minimum, tolerate inadvertent short values elsewhere in the
stream (just as they should tolerate out-of-order/non-increasing
granulepos values, although this too is illegal).

Beginning point at arbitrary positive timestamp (no 'zero' sample):

It's also possible that the granule position of the first page of an
audio stream is a 'long value', that is, a value larger than the
amount of PCM audio decoded. This implies only that we are starting
playback at some point into the logical stream, a potentially common
occurrence in streaming applications where the decoder may be
connecting into a live stream. The decoder should not treat the long
value specially.
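
Not treating a long value specially still leaves the decoder with
useful information: the absolute stream position of the first sample it
returns, e.g. for displaying elapsed time after joining a live stream.
A sketch, assuming `decoded` counts the samples returned through the
first data-bearing page (names hypothetical):

```c
#include <stdint.h>

/* Absolute position of the first sample returned. A long granule position
 * just yields a positive offset; a normal stream yields 0. A short value
 * (negative difference) is the start-trimming case: after the surplus is
 * discarded, the first returned sample is position 0. */
int64_t first_sample_pos(int64_t granulepos, int64_t decoded)
{
    int64_t pos = granulepos - decoded;
    return pos > 0 ? pos : 0;
}

/* The same position in seconds at the stream's sample rate. */
double first_sample_seconds(int64_t granulepos, int64_t decoded, long rate)
{
    return (double)first_sample_pos(granulepos, decoded) / (double)rate;
}
```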

A long value elsewhere in the stream would normally occur only when a
page is lost or out of sequence, as indicated by the page's sequence
number. A long value under any other circumstance is not legal;
however, a decoder should tolerate both possibilities.
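
The tolerance rules for mid-stream values can be summarised as a
classification that a decoder might log rather than abort on. This
sketch assumes the caller tracks the expected position (the previous
granule position plus samples decoded since) and whether the page
sequence number skipped; the enum and function are hypothetical, and
it applies only to pages other than the first and the end-of-stream
page, where short and long values have the legal meanings described
above.

```c
#include <stdint.h>

typedef enum {
    GP_OK,           /* matches the position implied by decoded data */
    GP_NO_BOUNDARY,  /* -1: no packet completes on this page */
    GP_SHORT,        /* smaller than expected (short or non-increasing):
                        illegal here, but tolerated */
    GP_LONG_LOST,    /* larger than expected after a sequence-number gap */
    GP_LONG_ILLEGAL  /* larger than expected with no gap: illegal, tolerated */
} GranuleCheck;

/* granulepos: this page's value; expected: position implied by the data
 * decoded so far; seq_gap: nonzero if the page sequence number is not one
 * more than the previous page's. */
GranuleCheck check_granulepos(int64_t granulepos, int64_t expected,
                              int seq_gap)
{
    if (granulepos == -1)
        return GP_NO_BOUNDARY;
    if (granulepos == expected)
        return GP_OK;
    if (granulepos < expected)
        return GP_SHORT;
    return seq_gap ? GP_LONG_LOST : GP_LONG_ILLEGAL;
}
```

In every case a tolerant decoder keeps playing rather than treating the
stream as fatally broken.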