annotate toolboxes/mp3readwrite/html/demo_mp3readwrite.html @ 0:e9a9cd732c1e tip

first hg version after svn
author wolffd
date Tue, 10 Feb 2015 15:05:51 +0000 (2015-02-10)
rev   line source
wolffd@0 1
wolffd@0 2 <!DOCTYPE html
wolffd@0 3 PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
wolffd@0 4 <html><head>
wolffd@0 5 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
wolffd@0 6 <!--
wolffd@0 7 This HTML is auto-generated from an M-file.
wolffd@0 8 To make changes, update the M-file and republish this document.
wolffd@0 9 --><title>MP3 reading and writing</title><meta name="generator" content="MATLAB 7.10"><meta name="date" content="2010-04-09"><meta name="m-file" content="demo_mp3readwrite"><style type="text/css">
wolffd@0 10
wolffd@0 11 body {
wolffd@0 12 background-color: white;
wolffd@0 13 margin:10px;
wolffd@0 14 }
wolffd@0 15
wolffd@0 16 h1 {
wolffd@0 17 color: #990000;
wolffd@0 18 font-size: x-large;
wolffd@0 19 }
wolffd@0 20
wolffd@0 21 h2 {
wolffd@0 22 color: #990000;
wolffd@0 23 font-size: medium;
wolffd@0 24 }
wolffd@0 25
wolffd@0 26 /* Make the text shrink to fit narrow windows, but not stretch too far in
wolffd@0 27 wide windows. */
wolffd@0 28 p,h1,h2,div.content div {
wolffd@0 29 max-width: 600px;
wolffd@0 30 /* Hack for IE6 */
wolffd@0 31 width: auto !important; width: 600px;
wolffd@0 32 }
wolffd@0 33
wolffd@0 34 pre.codeinput {
wolffd@0 35 background: #EEEEEE;
wolffd@0 36 padding: 10px;
wolffd@0 37 }
wolffd@0 38 @media print {
wolffd@0 39 pre.codeinput {word-wrap:break-word; width:100%;}
wolffd@0 40 }
wolffd@0 41
wolffd@0 42 span.keyword {color: #0000FF}
wolffd@0 43 span.comment {color: #228B22}
wolffd@0 44 span.string {color: #A020F0}
wolffd@0 45 span.untermstring {color: #B20000}
wolffd@0 46 span.syscmd {color: #B28C00}
wolffd@0 47
wolffd@0 48 pre.codeoutput {
wolffd@0 49 color: #666666;
wolffd@0 50 padding: 10px;
wolffd@0 51 }
wolffd@0 52
wolffd@0 53 pre.error {
wolffd@0 54 color: red;
wolffd@0 55 }
wolffd@0 56
wolffd@0 57 p.footer {
wolffd@0 58 text-align: right;
wolffd@0 59 font-size: xx-small;
wolffd@0 60 font-weight: lighter;
wolffd@0 61 font-style: italic;
wolffd@0 62 color: gray;
wolffd@0 63 }
wolffd@0 64
wolffd@0 65 </style></head><body><div class="content"><h1>MP3 reading and writing</h1><!--introduction--><p>These functions, mp3read and mp3write, aim to exactly duplicate the operation of wavread and wavwrite for accessing soundfiles, except the soundfiles are in MPEG Audio Layer 3 (MP3) compressed format. All the hard work is done by external binaries written by others: mp3info to query the format of existing mp3 files, mpg123 to decode mp3 files, and lame to encode audio files. Binaries for these tools are widely available (and may be included in this distribution).</p><p>These functions were originally developed for access to very large mp3 files (i.e. many hours long), and so avoid creating the entire uncompressed audio stream if possible. mp3read allows you to specify the range of frames you want to read (as a second argument), and mp3read will construct an mpg123 command that skips blocks to decode only the part of the file that is required. This can be much quicker (and require less memory/temporary disk) than decoding the whole file.</p><p>mpg123 also provides for "on the fly" downsampling and conversion to mono, both of which are supported as extra options in mp3read.</p><p>mpg123 can read MP3s across the network. This is supported if the FILE argument is a URL (e.g. beginning 'http://...').</p><p>mp3info sometimes gets the file size wrong (as returned by the mp3read(...'size') syntax). I'm not sure when this happens exactly, but it's probably a result of VBR files. 
In the worst case, figuring the number of samples in such a file requires scanning through the whole file, and mp3info doesn't usually do this.</p><p>For more information, including advice on handling MP4 files, see <a href="http://labrosa.ee.columbia.edu/matlab/mp3read.html">http://labrosa.ee.columbia.edu/matlab/mp3read.html</a></p><!--/introduction--><h2>Contents</h2><div><ul><li><a href="#1">Example usage</a></li><li><a href="#2">Delay, size, and alignment</a></li><li><a href="#3">External binaries</a></li><li><a href="#4">Installation</a></li></ul></div><h2>Example usage<a name="1"></a></h2><p>Here, we read a wav file in, then write it out as an MP3, then read the resulting MP3 back in, and compare it to the original file.</p><pre class="codeinput"><span class="comment">% Read an audio waveform</span>
wolffd@0 66 [d,sr] = wavread(<span class="string">'piano.wav'</span>);
wolffd@0 67 <span class="comment">% Save to mp3 (default settings)</span>
wolffd@0 68 mp3write(d,sr,<span class="string">'piano.mp3'</span>);
wolffd@0 69 <span class="comment">% Read it back again</span>
wolffd@0 70 [d2,sr] = mp3read(<span class="string">'piano.mp3'</span>);
wolffd@0 71 <span class="comment">% mp3 encoding involves some extra padding at each end; we attempt</span>
wolffd@0 72 <span class="comment">% to cut it off at the start, but can't do that at the end, because</span>
wolffd@0 73 <span class="comment">% mp3read doesn't know how long the original was. But we do, so..</span>
wolffd@0 74 <span class="comment">% Chop it down to be the same length as the original</span>
wolffd@0 75 d2 = d2(1:length(d),:);
wolffd@0 76 <span class="comment">% What is the SNR (distortion)?</span>
wolffd@0 77 ddiff = d - d2;
wolffd@0 78 disp([<span class="string">'SNR is '</span>,num2str(10*log10(sum(d(:).^2)/sum(ddiff(:).^2))),<span class="string">' dB'</span>]);
wolffd@0 79 <span class="comment">% Do they look similar?</span>
wolffd@0 80 subplot(211)
wolffd@0 81 specgram(d(:,1),1024,sr);
wolffd@0 82 subplot(212)
wolffd@0 83 plot(1:5000,d(10000+(1:5000),1),1:5000,d2(10000+(1:5000),1));
wolffd@0 84 <span class="comment">% Yes, pretty close</span>
wolffd@0 85 <span class="comment">%</span>
wolffd@0 86 <span class="comment">% NB: lame followed by mpg123 causes a little attenuation; you</span>
wolffd@0 87 <span class="comment">% can get a better match by scaling up the read-back waveform:</span>
wolffd@0 88 ddiff = d - 1.052*d2;
wolffd@0 89 disp([<span class="string">'SNR is '</span>,num2str(10*log10(sum(d(:).^2)/sum(ddiff(:).^2))),<span class="string">' dB'</span>]);
wolffd@0 90 </pre><pre class="codeoutput">Warning: popenw not available, writing temporary file
wolffd@0 91 SNR is 22.632 dB
wolffd@0 92 SNR is 24.8699 dB
wolffd@0 93 </pre><img vspace="5" hspace="5" src="demo_mp3readwrite_01.png" alt=""> <h2>Delay, size, and alignment<a name="2"></a></h2><p>In mid-2006 I noticed that mp3read followed by mp3write followed by mp3read effectively delayed the waveform by 2257 samples (at 44.1 kHz). So I introduced code to discard the first 2257 samples to ensure that the waveforms remained time aligned. As best I could understand, mpg123 (v 0.5.9) was including the "warm-up" samples from the synthesis filterbank, which are more properly discarded.</p><p>Then in late 2009 I noticed that some chord recognition code, which used mp3read to read files which were then segmented on the basis of some hand-marked timings, suddenly started getting much poorer results. It turned out that I had upgraded my version of mpg123 to v 1.9.0, and the warm-up samples had been fixed in this version. So my code was discarding 2257 <b>good</b> samples, and the data was skewed 51 ms early relative to the hand labels.</p><p>Hence, the current version of mp3read does not discard any samples by default -- appropriate for the recent versions of mpg123 included here. But if you know you're running an old, v 0.5.9, mpg123, you should edit the mp3read.m source to set the flag MPG123059 = 1.</p><p>Note also that the 'size' function relies on the number of blocks reported by mp3info. However, many mp3 files include additional information about the size of the file in the so-called Xing header, embedded in the first frame, which can specify that a certain number of samples from start and end should additionally be dropped. mp3info doesn't read that, and there's no way for my code to probe it except by running mpg123. 
Hence, the results of mp3read(fn,'size') may sometimes overestimate the length of the actual vector you'll get if you read the whole file.</p><h2>External binaries<a name="3"></a></h2><p>The m files rely on three external binaries, each of which is available for Linux, Mac OS X, or Windows:</p><p><b>mpg123</b> is a high-performance mp3 decoder. Its home page is <a href="http://www.mpg123.de/">http://www.mpg123.de/</a> .</p><p><b>mp3info</b> is a utility to read technical information on an mp3 file. Its home page is <a href="http://www.ibiblio.org/mp3info/">http://www.ibiblio.org/mp3info/</a> .</p><p><b>lame</b> is an open-source MP3 encoder. Its homepage is <a href="http://lame.sourceforge.net/">http://lame.sourceforge.net/</a> .</p><p>The various authors of these packages are gratefully acknowledged for doing all the hard work to make these Matlab functions possible.</p><h2>Installation<a name="4"></a></h2><p>The two routines, mp3read.m and mp3write.m, will look for their binaries (mpg123 and mp3info for mp3read; lame for mp3write) in the same directory where they are installed. Binaries for different architectures are distinguished by their extension, which is the standard Matlab computer code e.g. ".mac" for Mac PPC OS X, ".glnx86" for i386-linux. The exception is Windows, where the binaries have the extension ".exe".</p><p>Temporary files will be written to (a) a directory taken from the environment variable TMPDIR (b) /tmp if it exists, or (c) the current directory. This can easily be changed by editing the m files.</p><pre class="codeinput"><span class="comment">% Last updated: $Date: 2009/03/15 18:29:58 $</span>
wolffd@0 94 <span class="comment">% Dan Ellis &lt;dpwe@ee.columbia.edu&gt;</span>
wolffd@0 95 </pre><p class="footer"><br>
wolffd@0 96 Published with MATLAB&reg; 7.10<br></p></div><!--
wolffd@0 97 ##### SOURCE BEGIN #####
wolffd@0 98 %% MP3 reading and writing
wolffd@0 99 %
wolffd@0 100 % These functions, mp3read and mp3write, aim to exactly duplicate
wolffd@0 101 % the operation of wavread and wavwrite for accessing soundfiles,
wolffd@0 102 % except the soundfiles are in MPEG Audio Layer 3 (MP3) compressed
wolffd@0 103 % format. All the hard work is done by external binaries written
wolffd@0 104 % by others: mp3info to query the format of existing mp3 files,
wolffd@0 105 % mpg123 to decode mp3 files, and lame to encode audio files.
wolffd@0 106 % Binaries for these tools are widely available (and may be
wolffd@0 107 % included in this distribution).
wolffd@0 108 %
wolffd@0 109 % These functions were originally developed for access to very
wolffd@0 110 % large mp3 files (i.e. many hours long), and so avoid creating
wolffd@0 111 % the entire uncompressed audio stream if possible. mp3read
wolffd@0 112 % allows you to specify the range of frames you want to read
wolffd@0 113 % (as a second argument), and mp3read will construct an mpg123
wolffd@0 114 % command that skips blocks to decode only the part of the file
wolffd@0 115 % that is required. This can be much quicker (and require less
wolffd@0 116 % memory/temporary disk) than decoding the whole file.
wolffd@0 117 %
wolffd@0 118 % mpg123 also provides for "on the fly" downsampling and conversion
wolffd@0 119 % to mono, both of which are supported as extra options in mp3read.
wolffd@0 120 %
wolffd@0 121 % mpg123 can read MP3s across the network. This is supported
wolffd@0 122 % if the FILE argument is a URL (e.g. beginning 'http://...').
wolffd@0 123 %
wolffd@0 124 % mp3info sometimes gets the file size wrong (as returned by the
wolffd@0 125 % mp3read(...'size') syntax). I'm not sure when this happens
wolffd@0 126 % exactly, but it's probably a result of VBR files. In the worst
wolffd@0 127 % case, figuring the number of samples in such a file requires
wolffd@0 128 % scanning through the whole file, and mp3info doesn't usually do
wolffd@0 129 % this.
wolffd@0 130 %
wolffd@0 131 % For more information, including advice on handling MP4 files,
wolffd@0 132 % see http://labrosa.ee.columbia.edu/matlab/mp3read.html
wolffd@0 133
wolffd@0 134 %% Example usage
wolffd@0 135 % Here, we read a wav file in, then write it out as an MP3, then
wolffd@0 136 % read the resulting MP3 back in, and compare it to the original
wolffd@0 137 % file.
wolffd@0 138
wolffd@0 139 % Read an audio waveform
wolffd@0 140 [d,sr] = wavread('piano.wav');
wolffd@0 141 % Save to mp3 (default settings)
wolffd@0 142 mp3write(d,sr,'piano.mp3');
wolffd@0 143 % Read it back again
wolffd@0 144 [d2,sr] = mp3read('piano.mp3');
wolffd@0 145 % mp3 encoding involves some extra padding at each end; we attempt
wolffd@0 146 % to cut it off at the start, but can't do that at the end, because
wolffd@0 147 % mp3read doesn't know how long the original was. But we do, so..
wolffd@0 148 % Chop it down to be the same length as the original
wolffd@0 149 d2 = d2(1:length(d),:);
wolffd@0 150 % What is the SNR (distortion)?
wolffd@0 151 ddiff = d - d2;
wolffd@0 152 disp(['SNR is ',num2str(10*log10(sum(d(:).^2)/sum(ddiff(:).^2))),' dB']);
wolffd@0 153 % Do they look similar?
wolffd@0 154 subplot(211)
wolffd@0 155 specgram(d(:,1),1024,sr);
wolffd@0 156 subplot(212)
wolffd@0 157 plot(1:5000,d(10000+(1:5000),1),1:5000,d2(10000+(1:5000),1));
wolffd@0 158 % Yes, pretty close
wolffd@0 159 %
wolffd@0 160 % NB: lame followed by mpg123 causes a little attenuation; you
wolffd@0 161 % can get a better match by scaling up the read-back waveform:
wolffd@0 162 ddiff = d - 1.052*d2;
wolffd@0 163 disp(['SNR is ',num2str(10*log10(sum(d(:).^2)/sum(ddiff(:).^2))),' dB']);
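%
% A minimal sketch (not from the original demo) of the two other calling
% forms described in the introduction: the 'size' query and a partial read.
% Assumption: mp3read follows wavread here, so that 'size' returns
% [samples channels] and a [first last] range selects the frames to decode.
% Check "help mp3read" before relying on either form.
siz = mp3read('piano.mp3','size');
% The reported size can overestimate the decoded length, so treat it as an
% upper bound and clamp the requested range to it.
last = min(siz(1), 5000);
% Decode only frames 1..last rather than the whole file.
dseg = mp3read('piano.mp3',[1 last]);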
wolffd@0 164
wolffd@0 165 %% Delay, size, and alignment
wolffd@0 166 %
wolffd@0 167 % In mid-2006 I noticed that mp3read followed by mp3write followed by
wolffd@0 168 % mp3read effectively delayed the waveform by 2257 samples (at 44.1
wolffd@0 169 % kHz). So I introduced code to discard the first 2257 samples to ensure
wolffd@0 170 % that the waveforms remained time aligned. As best I could understand,
wolffd@0 171 % mpg123 (v 0.5.9) was including the "warm-up" samples from the
wolffd@0 172 % synthesis filterbank, which are more properly discarded.
wolffd@0 173 %
wolffd@0 174 % Then in late 2009 I noticed that some chord recognition code, which
wolffd@0 175 % used mp3read to read files which were then segmented on the basis of
wolffd@0 176 % some hand-marked timings, suddenly started getting much poorer
wolffd@0 177 % results. It turned out that I had upgraded my version of mpg123 to v
wolffd@0 178 % 1.9.0, and the warm-up samples had been fixed in this version. So my
wolffd@0 179 % code was discarding 2257 *good* samples, and the data was skewed 51 ms
wolffd@0 180 % early relative to the hand labels.
wolffd@0 181 %
wolffd@0 182 % Hence, the current version of mp3read does not
wolffd@0 183 % discard any samples by default REPLACE_WITH_DASH_DASH appropriate for the recent versions
wolffd@0 184 % of mpg123 included here. But if you know you're running an old, v
wolffd@0 185 % 0.5.9, mpg123, you should edit the mp3read.m source to set the flag
wolffd@0 186 % MPG123059 = 1.
wolffd@0 187 %
wolffd@0 188 % Note also that the 'size' function relies on the number of
wolffd@0 189 % blocks reported by mp3info. However, many mp3 files include
wolffd@0 190 % additional information about the size of the file in the
wolffd@0 191 % so-called Xing header, embedded in the first frame, which can
wolffd@0 192 % specify that a certain number of samples from start and end
wolffd@0 193 % should additionally be dropped. mp3info doesn't read that,
wolffd@0 194 % and there's no way for my code to probe it except by running
wolffd@0 195 % mpg123. Hence, the results of mp3read(fn,'size') may sometimes
wolffd@0 196 % overestimate the length of the actual vector you'll get if
wolffd@0 197 % you read the whole file.
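%
% A worked check of the figures above (a sketch added for illustration, not
% part of mp3read). Assumption: a 44100 Hz sampling rate.
mpg123_delay = 2257;                      % warm-up samples quoted above
delay_ms = 1000 * mpg123_delay / 44100;   % about 51.2, matching the 51 ms skew
% With an old mpg123 (v 0.5.9) and the MPG123059 flag left unset, roughly the
% same alignment could be had by trimming by hand after a whole-file read:
% d2 = d2(mpg123_delay+1:end,:);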
wolffd@0 198
wolffd@0 199 %% External binaries
wolffd@0 200 % The m files rely on three external binaries, each of which is
wolffd@0 201 % available for Linux, Mac OS X, or Windows:
wolffd@0 202 %
wolffd@0 203 % *mpg123* is a high-performance mp3 decoder. Its home page is
wolffd@0 204 % http://www.mpg123.de/ .
wolffd@0 205 %
wolffd@0 206 % *mp3info* is a utility to read technical information on an mp3
wolffd@0 207 % file. Its home page is http://www.ibiblio.org/mp3info/ .
wolffd@0 208 %
wolffd@0 209 % *lame* is an open-source MP3 encoder. Its homepage is
wolffd@0 210 % http://lame.sourceforge.net/ .
wolffd@0 211 %
wolffd@0 212 % The various authors of these packages are gratefully acknowledged
wolffd@0 213 % for doing all the hard work to make these Matlab functions possible.
wolffd@0 214
wolffd@0 215 %% Installation
wolffd@0 216 % The two routines, mp3read.m and mp3write.m, will look for their
wolffd@0 217 % binaries (mpg123 and mp3info for mp3read; lame for mp3write) in
wolffd@0 218 % the same directory where they are installed. Binaries for
wolffd@0 219 % different architectures are distinguished by their extension,
wolffd@0 220 % which is the standard Matlab computer code e.g. ".mac" for Mac
wolffd@0 221 % PPC OS X, ".glnx86" for i386-linux. The exception is Windows,
wolffd@0 222 % where the binaries have the extension ".exe".
wolffd@0 223 %
wolffd@0 224 % Temporary files
wolffd@0 225 % will be written to (a) a directory taken from the environment
wolffd@0 226 % variable TMPDIR (b) /tmp if it exists, or (c) the current
wolffd@0 227 % directory. This can easily be changed by editing the m files.
wolffd@0 228
wolffd@0 229 % Last updated: $Date: 2009/03/15 18:29:58 $
wolffd@0 230 % Dan Ellis <dpwe@ee.columbia.edu>
wolffd@0 231
wolffd@0 232 ##### SOURCE END #####
wolffd@0 233 --></body></html>