% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}

\subsection{Overview}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg
bitstream overview} and \href{framing.html}{Ogg logical
bitstream and framing spec} provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these background
documents. Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
\item
A meta-headerless Ogg file encapsulates the Vorbis I packets.

\item
The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).

\item
The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).

\end{itemize}

This is not to say that it is currently impossible to multiplex
Vorbis with other media types into a multi-stream Ogg file. At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DivX video and Vorbis audio.
However, a `Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream. A compliant `Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).
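
As an illustration of the restrictions above, the following C sketch
checks that a sequence of page headers, taken in file order, forms a
chained but unmultiplexed stream: every link opens with a BOS page,
carries a single serial number, and closes with an EOS page. The
\literal{struct page\_info} type and the function name are hypothetical;
only the BOS (0x02) and EOS (0x04) flag values come from the Ogg
framing specification. The sketch assumes the caller has already
parsed each page header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* header_type flag bits from the Ogg framing specification */
#define OGG_FLAG_BOS 0x02
#define OGG_FLAG_EOS 0x04

/* Hypothetical minimal view of an Ogg page header, in file order. */
struct page_info {
    uint32_t serialno;         /* bitstream serial number */
    unsigned char header_type; /* flag byte (continued/BOS/EOS) */
};

/* Returns true if the pages form one or more back-to-back links, each
   carrying a single logical stream: chained, but never multiplexed. */
static bool is_chained_unmultiplexed(const struct page_info *pages, size_t n) {
    assert(pages != NULL || n == 0);
    bool in_link = false;
    uint32_t serial = 0;

    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        const struct page_info *p = &pages[i];
        if (!in_link) {
            /* every link must open with a BOS page */
            if (!(p->header_type & OGG_FLAG_BOS))
                return false;
            in_link = true;
            serial = p->serialno;
        } else if ((p->header_type & OGG_FLAG_BOS) || p->serialno != serial) {
            /* a second logical stream inside the link means multiplexing */
            return false;
        }
        if (p->header_type & OGG_FLAG_EOS)
            in_link = false; /* link closed; the next page may open a new link */
    }
    return !in_link; /* the final link must end with an EOS page */
}
```

This check is deliberately strict: a truncated file whose last link
lacks an EOS page is rejected, whereas a practical player would likely
tolerate it.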

\subsubsection{MIME type}

The MIME type of Ogg files depends on the context. Specifically, complex
multimedia content and applications should use \literal{application/ogg},
visual media should use \literal{video/ogg}, and audio should use
\literal{audio/ogg}. Vorbis data encapsulated in Ogg may appear
in any of those types. RTP-encapsulated Vorbis should use
\literal{audio/vorbis} together with \literal{audio/vorbis-config}.

\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}

\item
The first Vorbis packet (the identification header), which
uniquely identifies a stream as Vorbis audio, is placed alone in the
first page of the logical Ogg stream. This results in a first Ogg
page of exactly 58 bytes (a 27-byte page header, a single lacing
value, and the 30-byte identification header) at the very beginning
of the logical stream.

\item
This first page is marked `beginning of stream' in the page flags.

\item
The second and third Vorbis packets (comment and setup
headers) may span one or more pages beginning on the second page of
the logical stream. However many pages they span, the third header
packet finishes the page on which it ends. The next (first audio) packet
must begin on a fresh page.

\item
The granule position of these first pages containing only headers is zero.

\item
The first audio packet of the logical stream begins a fresh Ogg page.

\item
Packets are placed into Ogg pages in order until the end of stream.

\item
The last page is marked `end of stream' in the page flags.

\item
Vorbis packets may span page boundaries.

\item
The granule position of pages containing Vorbis audio is in units
of PCM audio samples (per channel; a stereo stream's granule position
does not increment at twice the speed of a mono stream).

\item
The granule position of a page represents the end PCM sample
position of the last packet \emph{completed} on that
page. The `last PCM sample' is the last complete sample returned by
decode, not an internal sample awaiting lapping with a
subsequent block. A page that is entirely spanned by a single
packet (that completes on a subsequent page) has no granule
position, and the granule position is set to `-1'.

Note that the last decoded (fully lapped) PCM sample from a packet
is not necessarily the middle sample from that block. If, e.g., the
current Vorbis packet encodes a ``long block'' and the next Vorbis
packet encodes a ``short block'', the last decodable sample from the
current packet will be at position (3*long\_block\_length/4) -
(short\_block\_length/4).

\item
The granule (PCM) position of the first page need not indicate
that the stream started at position zero. Although the granule
position belongs to the last completed packet on the page and a
valid granule position must be positive, by
inference it may indicate that the PCM position of the beginning
of audio is positive or negative.

\begin{itemize}
\item
A positive starting value simply indicates that this stream begins at
some positive time offset, potentially within a larger
program. This is a common case when connecting to the middle
of a broadcast stream.

\item
A negative value indicates that
output samples preceding time zero should be discarded during
decoding; this technique is used to allow sample-granularity
editing of the stream start time of already-encoded Vorbis
streams. The number of samples to be discarded must not exceed
the overlap-add span of the first two audio packets.

\end{itemize}

In both cases in which the initial audio PCM starting
offset is nonzero, the second finished audio packet must flush the
page on which it appears, and the third packet must begin a fresh page.
This ensures that the decoder can always perform PCM position
adjustments before it needs to return any PCM data from synthesis,
resulting in correct positioning information without any additional
seeking logic.

\begin{note}
Failure to do so should, at worst, cause a
decoder implementation to return incorrect positioning information
for seeking operations at the very beginning of the stream.
\end{note}

\item
A granule position on the final page in a stream that indicates
less audio data than the final packet would normally return is used to
end the stream at a sample position that does not fall on a frame
boundary. The difference between the actual available data returned
and the declared amount indicates how many trailing samples to discard
from the decoding process.

\end{itemize}
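
The 58-byte first page and the page flags described in the list above
follow directly from the Ogg page layout. The C sketch below spells out
that arithmetic and reads the flag and granule fields from a raw page
buffer. The macro and function names are hypothetical; the field
offsets (header type at byte 5, 64-bit little-endian granule position
at bytes 6 to 13) and the flag values are those of the Ogg framing
specification. The sketch assumes the caller has already located a
complete page in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sizes from the Ogg framing spec and the Vorbis identification header. */
#define OGG_PAGE_HEADER_BYTES  27 /* "OggS" through the page_segments byte */
#define VORBIS_ID_HEADER_BYTES 30 /* packet type, "vorbis", version, channels,
                                     rate, three bitrates, blocksizes, framing */

/* header_type flag bits (byte 5 of every Ogg page) */
#define OGG_FLAG_CONTINUED 0x01 /* first packet continues from the previous page */
#define OGG_FLAG_BOS       0x02
#define OGG_FLAG_EOS       0x04

/* Size of a page carrying exactly one packet shorter than 255 bytes:
   fixed header, one lacing value, then the packet itself. */
static unsigned single_packet_page_bytes(unsigned packet_len) {
    assert(packet_len < 255); /* longer packets need more lacing values */
    return OGG_PAGE_HEADER_BYTES + 1 + packet_len;
}

/* Granule position: 64-bit little-endian field at byte offset 6.
   The all-ones pattern means "no packet completed on this page" (-1). */
static int64_t page_granule(const unsigned char *page) {
    uint64_t g = 0;
    for (int i = 7; i >= 0; i--)
        g = (g << 8) | page[6 + i];
    if (g == UINT64_MAX)
        return -1;
    return (int64_t)g; /* assumes g < 2^63, true for any realistic stream */
}

static bool page_is_bos(const unsigned char *page) {
    return (page[5] & OGG_FLAG_BOS) != 0;
}

static bool page_is_eos(const unsigned char *page) {
    return (page[5] & OGG_FLAG_EOS) != 0;
}
```

For example, \literal{single\_packet\_page\_bytes(VORBIS\_ID\_HEADER\_BYTES)}
yields the 58 bytes of the first page, and a header-only page reports a
granule position of zero.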
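
The rule for the last decodable sample of a packet can be written as a
one-line calculation. This sketch assumes, as with the Vorbis window
shapes, that the overlap between adjacent blocks is centered at three
quarters of the current block and spans half of the smaller of the two
blocks; for a long block followed by a short block it reduces to
(3*long\_block\_length/4) - (short\_block\_length/4) as stated above, and
for two equal blocks it gives the middle of the current block. The
function name is hypothetical.

```c
#include <assert.h>

/* Position, relative to the first sample of the current block, at which
   fully lapped output ends once the size of the next block is known.
   Assumes the overlap is centered at 3/4 of the current block and is
   half as wide as the smaller of the two blocks. */
static long last_finished_sample(long cur_blocksize, long next_blocksize) {
    assert(cur_blocksize > 0 && next_blocksize > 0);
    long smaller = cur_blocksize < next_blocksize ? cur_blocksize
                                                  : next_blocksize;
    return 3 * cur_blocksize / 4 - smaller / 4;
}
```

With blocksizes of 2048 and 256, a long block followed by a short block
gives 1536 - 64 = 1472, while two long blocks give 1024, the middle of
the block.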
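
The start- and end-trimming rules above reduce to simple arithmetic on
granule positions. The following sketch assumes the caller tracks how
many PCM samples (per channel) the decoder actually returned; the
function names and parameters are hypothetical and are not part of
libvorbis or libvorbisfile.

```c
#include <assert.h>
#include <stdint.h>

/* granule:  granule position of the page on which the first audio
             packets complete.
   returned: PCM samples (per channel) actually returned by the decoder
             for the packets completed on that page.
   The implied PCM position of the first returned sample is the page's
   end position minus what was returned; it may be negative. */
static int64_t implied_start_position(int64_t granule, int64_t returned) {
    assert(returned >= 0);
    return granule - returned;
}

/* A negative start position means that many leading samples precede
   time zero and are discarded. */
static int64_t leading_samples_to_discard(int64_t granule, int64_t returned) {
    int64_t start = implied_start_position(granule, returned);
    return start < 0 ? -start : 0;
}

/* final_granule:       granule position on the final (EOS) page.
   decoded_end_position: PCM position reached after decoding every packet
                        completed on that page (previous page's granule
                        plus the samples returned for this page).
   Any excess over the declared end is trailing data to discard. */
static int64_t trailing_samples_to_discard(int64_t final_granule,
                                           int64_t decoded_end_position) {
    int64_t excess = decoded_end_position - final_granule;
    return excess > 0 ? excess : 0;
}
```

For instance, if the first audio page declares a granule position of
1000 but the decoder returned 1200 samples for it, the first 200
samples precede time zero and are dropped; a final page declaring 10000
when decoding reaches 10300 means the last 300 samples are dropped.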