% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}

\subsection{Overview}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg
bitstream overview} and \href{framing.html}{Ogg logical
bitstream and framing spec} provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named background
documents. Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
\item
A meta-headerless Ogg file encapsulates the Vorbis I packets.

\item
The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).

\item
The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).

\end{itemize}

This does not mean that Vorbis cannot currently be multiplexed
with other media types into a multi-stream Ogg file. At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DivX video and Vorbis audio.
However, a 'Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream.
A compliant 'Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).

\subsubsection{MIME type}

The MIME type of Ogg files depends on the context. Specifically, complex
multimedia and applications should use \literal{application/ogg},
while visual media should use \literal{video/ogg}, and audio should use
\literal{audio/ogg}. Vorbis data encapsulated in Ogg may appear
in any of those types. RTP encapsulated Vorbis should use
\literal{audio/vorbis} + \literal{audio/vorbis-config}.

\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}

\item
The first Vorbis packet (the identification header), which
uniquely identifies a stream as Vorbis audio, is placed alone in the
first page of the logical Ogg stream. This results in a first Ogg
page of exactly 58 bytes at the very beginning of the logical stream.

\item
This first page is marked 'beginning of stream' in the page flags.

\item
The second and third Vorbis packets (comment and setup
headers) may span one or more pages beginning on the second page of
the logical stream. However many pages they span, the third header
packet finishes the page on which it ends. The next (first audio) packet
must begin on a fresh page.

\item
The granule position of these first pages containing only headers is zero.

\item
The first audio packet of the logical stream begins a fresh Ogg page.

\item
Packets are placed into Ogg pages in order until the end of stream.

\item
The last page is marked 'end of stream' in the page flags.

\item
Vorbis packets may span page boundaries.

\item
The granule position of pages containing Vorbis audio is in units
of PCM audio samples (per channel; a stereo stream's granule position
does not increment at twice the speed of a mono stream).

\item
The granule position of a page represents the end PCM sample
position of the last packet \emph{completed} on that
page. The 'last PCM sample' is the last complete sample returned by
decode, not an internal sample awaiting lapping with a
subsequent block. A page that is entirely spanned by a single
packet (that completes on a subsequent page) has no granule
position, and the granule position is set to '-1'.

Note that the last decoded (fully lapped) PCM sample from a packet
is not necessarily the middle sample from that block. If, e.g., the
current Vorbis packet encodes a ``long block'' and the next Vorbis
packet encodes a ``short block'', the last decodable sample from the
current packet will be at position (3*long\_block\_length/4) -
(short\_block\_length/4).

\item
The granule (PCM) position of the first page need not indicate
that the stream started at position zero. Although the granule
position belongs to the last completed packet on the page and a
valid granule position must be positive, by
inference it may indicate that the PCM position of the beginning
of audio is positive or negative.

\begin{itemize}
\item
A positive starting value simply indicates that this stream begins at
some positive time offset, potentially within a larger
program. This is a common case when connecting to the middle
of a broadcast stream.

\item
A negative value indicates that
output samples preceding time zero should be discarded during
decoding; this technique is used to allow sample-granularity
editing of the stream start time of already-encoded Vorbis
streams. The number of samples to be discarded must not exceed
the overlap-add span of the first two audio packets.

\end{itemize}

In both of these cases in which the initial audio PCM starting
offset is nonzero, the second finished audio packet must flush the
page on which it appears and the third packet must begin a fresh page.
This allows the decoder to always be able to perform PCM position
adjustments before needing to return any PCM data from synthesis,
resulting in correct positioning information without any additional
seeking logic.

\begin{note}
Failure to do so should, at worst, cause a
decoder implementation to return incorrect positioning information
for seeking operations at the very beginning of the stream.
\end{note}

\item
A granule position on the final page in a stream that indicates
less audio data than the final packet would normally return is used to
end the stream on other than even frame boundaries. The difference
between the actual available data returned and the declared amount
indicates how many trailing samples to discard from the decoding
process.

\end{itemize}
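
As a worked illustration of the granule position rules above, the
following arithmetic uses hypothetical values. The block sizes, granule
positions, sample counts, and the symbols $g$, $n$, and $s$ are
introduced here for the example only and are not mandated by this
specification.

For a current long block of 2048 samples followed by a short block of
256 samples, the last decodable sample of the current packet would lie at
\[
\frac{3 \cdot 2048}{4} - \frac{256}{4} = 1536 - 64 = 1472,
\]
rather than at the middle sample of the block, $2048/2 = 1024$.

Suppose the first page containing audio carries granule position
$g = 724$ and the packets completed on that page decode to $n = 1024$
samples. The inferred PCM position of the beginning of audio is then
\[
s = g - n = 724 - 1024 = -300,
\]
so a decoder would discard the first 300 output samples. With
$g = 45124$ instead, $s = 45124 - 1024 = 44100$, and the stream simply
begins at a positive offset.

Similarly, suppose the last valid granule position before the final page
is $g_{\mathrm{prev}} = 99200$, the packets completed on the final page
decode to $n = 1024$ samples, and the final page declares
$g_{\mathrm{last}} = 100000$. The number of trailing samples to discard
is
\[
(g_{\mathrm{prev}} + n) - g_{\mathrm{last}} = (99200 + 1024) - 100000 = 224.
\]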