% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}

\subsection{Overview}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg
bitstream overview} and \href{framing.html}{Ogg logical
bitstream and framing spec} provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named background
documents. Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
 \item
  A meta-headerless Ogg file encapsulates the Vorbis I packets.

 \item
  The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).

 \item
  The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).

\end{itemize}


This does not mean that Vorbis cannot currently be multiplexed
with other media types into a multi-stream Ogg file. At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DivX video and Vorbis audio.
However, a 'Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream. A compliant 'Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).


\subsubsection{MIME type}

The MIME type of Ogg files depends on the context. Specifically, complex
multimedia and applications should use \literal{application/ogg},
while visual media should use \literal{video/ogg}, and audio
\literal{audio/ogg}. Vorbis data encapsulated in Ogg may appear
in any of those types. RTP encapsulated Vorbis should use
\literal{audio/vorbis} + \literal{audio/vorbis-config}.


\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}

\item
  The first Vorbis packet (the identification header), which
  uniquely identifies a stream as Vorbis audio, is placed alone in the
  first page of the logical Ogg stream. This results in a first Ogg
  page of exactly 58 bytes at the very beginning of the logical stream.


\item
  This first page is marked 'beginning of stream' in the page flags.


\item
  The second and third Vorbis packets (comment and setup
  headers) may span one or more pages beginning on the second page of
  the logical stream. However many pages they span, the third header
  packet finishes the page on which it ends. The next (first audio) packet
  must begin on a fresh page.


\item
  The granule position of these first pages containing only headers is zero.


\item
  The first audio packet of the logical stream begins a fresh Ogg page.


\item
  Packets are placed into Ogg pages in order until the end of stream.


\item
  The last page is marked 'end of stream' in the page flags.


\item
  Vorbis packets may span page boundaries.


\item
  The granule position of pages containing Vorbis audio is in units
  of PCM audio samples (per channel; a stereo stream's granule position
  does not increment at twice the speed of a mono stream).


\item
  The granule position of a page represents the end PCM sample
  position of the last packet \emph{completed} on that
  page. The 'last PCM sample' is the last complete sample returned by
  decode, not an internal sample awaiting lapping with a
  subsequent block. A page that is entirely spanned by a single
  packet (that completes on a subsequent page) has no granule
  position; its granule position field is set to '-1'.


  Note that the last decoded (fully lapped) PCM sample from a packet
  is not necessarily the middle sample from that block. If, e.g., the
  current Vorbis packet encodes a "long block" and the next Vorbis
  packet encodes a "short block", the last decodable sample from the
  current packet will be at position (3*long\_block\_length/4) -
  (short\_block\_length/4).


\item
  The granule (PCM) position of the first page need not indicate
  that the stream started at position zero.
  Although the granule
  position belongs to the last completed packet on the page and a
  valid granule position must be positive, by
  inference it may indicate that the PCM position of the beginning
  of audio is positive or negative.


  \begin{itemize}
    \item
      A positive starting value simply indicates that this stream begins at
      some positive time offset, potentially within a larger
      program. This is a common case when connecting to the middle
      of a broadcast stream.

    \item
      A negative value indicates that
      output samples preceding time zero should be discarded during
      decoding; this technique is used to allow sample-granularity
      editing of the stream start time of already-encoded Vorbis
      streams. The number of samples to be discarded must not exceed
      the overlap-add span of the first two audio packets.

  \end{itemize}


  In both of these cases, in which the initial audio PCM starting
  offset is nonzero, the second finished audio packet must flush the
  page on which it appears and the third packet must begin a fresh page.
  This ensures that the decoder can always perform PCM position
  adjustments before needing to return any PCM data from synthesis,
  resulting in correct positioning information without any additional
  seeking logic.


  \begin{note}
    Failure to do so should, at worst, cause a
    decoder implementation to return incorrect positioning information
    for seeking operations at the very beginning of the stream.
  \end{note}


\item
  A granule position on the final page in a stream that indicates
  less audio data than the final packet would normally return is used to
  end the stream on other than even frame boundaries. The difference
  between the actual available data returned and the declared amount
  indicates how many trailing samples to discard from the decoding
  process.

\end{itemize}
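
The page-size and granule-position rules above can be illustrated with a
short, non-normative Python sketch. It is not taken from any Ogg or Vorbis
library; every function and constant name below is hypothetical. The sketch
assumes two sizes that this section does not state: the 27-byte fixed portion
of an Ogg page header (from the Ogg framing layout) and the 30-byte Vorbis I
identification header packet. Under those assumptions it reproduces the
58-byte first page, and it shows how a decoder might infer the initial PCM
offset and the number of leading or trailing samples to discard.

```python
# Illustrative sketch only; names are hypothetical and not part of any
# Ogg or Vorbis library.

OGG_PAGE_HEADER_BYTES = 27    # assumed: fixed portion of an Ogg page header
VORBIS_ID_HEADER_BYTES = 30   # assumed: Vorbis I identification header packet


def single_packet_page_bytes(packet_bytes):
    """Size in bytes of an Ogg page holding exactly one complete packet."""
    # One lacing value per started 255-byte segment; a length that is an
    # exact multiple of 255 still needs a final lacing value below 255.
    lacing_values = packet_bytes // 255 + 1
    if lacing_values > 255:
        raise ValueError("packet does not fit on a single page")
    return OGG_PAGE_HEADER_BYTES + lacing_values + packet_bytes


def initial_pcm_offset(first_granule, samples_decoded):
    """PCM position at which audio starts, inferred from the first audio page.

    first_granule is the granule position of the first audio page that
    carries one; samples_decoded is the number of samples returned by decode
    up to the end of the last packet completed on that page.
    """
    return first_granule - samples_decoded


def leading_samples_to_discard(first_granule, samples_decoded):
    """Samples preceding time zero, which the decoder should drop."""
    return max(0, -initial_pcm_offset(first_granule, samples_decoded))


def trailing_samples_to_discard(previous_granule, samples_decoded, final_granule):
    """Samples to drop when the final page declares less audio than decoded.

    previous_granule is the PCM position reached before the final page;
    samples_decoded is what decode returned for the final page's packets.
    """
    available_end = previous_granule + samples_decoded
    return max(0, available_end - final_granule)


if __name__ == "__main__":
    # 27-byte header + 1 lacing value + 30-byte packet
    print(single_packet_page_bytes(VORBIS_ID_HEADER_BYTES))
    # First audio page declares 1000 samples but decode returned 1088
    print(leading_samples_to_discard(1000, 1088))
    # Final page declares 44800 although decode reached 44100 + 1024
    print(trailing_samples_to_discard(44100, 1024, 44800))
```

The sample counts passed in the usage lines are made-up values chosen only to
exercise the arithmetic. A real decoder would obtain them from its own
synthesis bookkeeping.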