Tutorial
========

This section covers the fundamentals of developing with *librosa*, including
a package overview, basic and advanced usage, and integration with the *scikit-learn*
package. We will assume basic familiarity with Python and NumPy/SciPy.


Overview
--------

The *librosa* package is structured as a collection of submodules:

- librosa

  - :ref:`librosa.beat <beat>`
      Functions for estimating tempo and detecting beat events.

  - :ref:`librosa.chord <chord>`
      This submodule contains a generic class which implements supervised training
      of Gaussian-emission Hidden Markov Models (HMM) commonly used in chord
      recognition.

  - :ref:`librosa.core <core>`
      Core functionality includes functions to load audio from disk, compute various
      spectrogram representations, and a variety of commonly used tools for
      music analysis. For convenience, all functionality in this submodule is
      directly accessible from the top-level ``librosa.*`` namespace.

  - :ref:`librosa.decompose <decompose>`
      Functions for harmonic-percussive source separation (HPSS) and generic
      spectrogram decomposition using matrix decomposition methods implemented in
      *scikit-learn*.

  - :ref:`librosa.display <display>`
      Visualization and display routines using `matplotlib`.

  - :ref:`librosa.effects <effects>`
      Time-domain audio processing, such as pitch shifting and time stretching.
      This submodule also provides time-domain wrappers for the `decompose`
      submodule.

  - :ref:`librosa.feature <feature>`
      Feature extraction and manipulation. This includes low-level feature
      extraction, such as chromagrams, pseudo-constant-Q (log-frequency) transforms,
      Mel spectrograms, MFCCs, and tuning estimation. Also provided are feature
      manipulation methods, such as delta features, memory embedding, and
      event-synchronous feature alignment.

  - :ref:`librosa.filters <filters>`
      Filter-bank generation (chroma, pseudo-CQT, CQT, etc.). These are primarily
      internal functions used by other parts of *librosa*.

  - :ref:`librosa.onset <onset>`
      Onset detection and onset strength computation.

  - :ref:`librosa.output <output>`
      Text- and wav-file output.

  - :ref:`librosa.segment <segment>`
      Functions useful for structural segmentation, such as recurrence matrix
      construction, time-lag representation, and sequentially constrained
      clustering.

  - :ref:`librosa.util <util>`
      Helper utilities (normalization, padding, centering, etc.).

.. _quickstart:

Quickstart
----------

Before diving into the details, we'll walk through a brief example program:

.. code-block:: python
    :linenos:

    # Beat tracking example
    import librosa

    # 1. Get the file path to the included audio example
    filename = librosa.util.example_audio_file()

    # 2. Load the audio as a waveform `y`
    #    Store the sampling rate as `sr`
    y, sr = librosa.load(filename)

    # 3. Run the default beat tracker, using a hop length of 64 samples
    #    (64 samples at sr=22050 Hz ~= 2.9 ms)
    hop_length = 64
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop_length)

    print('Estimated tempo: %0.2f beats per minute' % tempo)

    # 4. Convert the frame indices of beat events into timestamps
    beat_times = librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)

    print('Saving output to beat_times.csv')
    librosa.output.times_csv('beat_times.csv', beat_times)

The first step of the program::

    filename = librosa.util.example_audio_file()

gets the path to the audio example file included with *librosa*. After this step,
``filename`` will be a string variable containing the path to the example mp3.

The second step::

    y, sr = librosa.load(filename)

loads and decodes the audio as a :term:`time series` ``y``, represented as a one-dimensional
NumPy floating point array. The variable ``sr`` contains the :term:`sampling rate` of
``y``, that is, the number of samples per second of audio. By default, all audio is
mixed to mono and resampled to 22050 Hz at load time. This behavior can be overridden
by supplying additional arguments to ``librosa.load()``.
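
For example, a minimal sketch of overriding those defaults (``sr=None`` and
``mono`` are standard ``librosa.load`` parameters; the full set of options may
vary between versions)::

    # Keep the native sampling rate and channel layout instead of
    # the default mono/22050 Hz conversion
    y_native, sr_native = librosa.load(filename, sr=None, mono=False)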

The next line::

    hop_length = 64

sets the :term:`hop length` for the subsequent analysis. This is the number of samples
to advance between subsequent audio frames. Here, we've set the hop length to 64
samples, which at 22050 Hz comes to ``64.0 / 22050 ~= 2.9ms``.
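
As plain arithmetic, the same relationship gives both the duration of each frame
and the analysis frame rate::

    sr = 22050
    hop_length = 64

    frame_duration = float(hop_length) / sr   # ~0.0029 seconds per frame
    frame_rate = sr / float(hop_length)       # ~344.5 frames per second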

Next, we run the beat tracker using the specified hop length::

    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop_length)

The output of the beat tracker is an estimate of the tempo (in beats per minute),
and an array of frame numbers corresponding to detected beat events.

:term:`Frames <frame>` here correspond to short windows of the signal (``y``), each
separated by ``hop_length`` samples. Since v0.3, *librosa* uses centered frames, so
that the *k*\ th frame is centered around sample ``k * hop_length``.
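
For instance, the sample index at the center of each beat frame is just the frame
number times the hop length; recent versions provide a helper for this (a sketch;
if ``librosa.frames_to_samples`` is unavailable in your version, the direct
multiplication gives the same result)::

    # Sample index of each beat frame's center (== beat_frames * hop_length)
    beat_samples = librosa.frames_to_samples(beat_frames, hop_length=hop_length)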

The next operation converts the frame numbers ``beat_frames`` into timings::

    beat_times = librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)

Now, ``beat_times`` will be an array of timestamps (in seconds) corresponding to
detected beat events.

Finally, we can store the detected beat timestamps as a comma-separated values (CSV)
file::

    librosa.output.times_csv('beat_times.csv', beat_times)

The contents of ``beat_times.csv`` should look like this::

    0.067
    0.514
    0.990
    1.454
    1.910
    ...

This is primarily useful for visualization purposes (e.g., using
`Sonic Visualiser <http://www.sonicvisualiser.org>`_) or evaluation (e.g., using
`mir_eval <https://github.com/craffel/mir_eval>`_).
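
Because the file holds one timestamp per line, it is also easy to load back into
Python with standard NumPy tools (a sketch using ``numpy.loadtxt``, which handles
this single-column layout)::

    import numpy as np

    # Recover the beat timestamps as a 1-d array of floats
    beat_times_loaded = np.loadtxt('beat_times.csv')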


Advanced usage
--------------

Here we'll cover a more advanced example, integrating harmonic-percussive separation,
multiple spectral features, and beat-synchronous feature aggregation.

.. code-block:: python
    :linenos:

    # Feature extraction example

    import numpy as np
    import librosa

    # Load the example clip
    y, sr = librosa.load(librosa.util.example_audio_file())

    # Separate harmonics and percussives into two waveforms
    y_harmonic, y_percussive = librosa.effects.hpss(y)

    # Set the hop length
    hop_length = 64

    # Beat track on the percussive signal
    tempo, beat_frames = librosa.beat.beat_track(y=y_percussive,
                                                 sr=sr,
                                                 hop_length=hop_length)

    # Compute MFCC features from the raw signal
    mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

    # And the first-order differences (delta features)
    mfcc_delta = librosa.feature.delta(mfcc)

    # Stack and synchronize between beat events
    # This time, we'll use the mean value (default) instead of median
    beat_mfcc_delta = librosa.feature.sync(np.vstack([mfcc, mfcc_delta]),
                                           beat_frames)

    # Compute chroma features from the harmonic signal
    chromagram = librosa.feature.chromagram(y=y_harmonic,
                                            sr=sr,
                                            hop_length=hop_length)

    # Aggregate chroma features between beat events
    # We'll use the median value of each feature between beat frames
    beat_chroma = librosa.feature.sync(chromagram,
                                       beat_frames,
                                       aggregate=np.median)

    # Finally, stack all beat-synchronous features together
    beat_features = np.vstack([beat_chroma, beat_mfcc_delta])


This example builds on tools we've already covered in the :ref:`quickstart example
<quickstart>`, so here we'll focus just on the new parts.

The first difference is the use of the :ref:`effects module <effects>` for time-series
harmonic-percussive separation::

    y_harmonic, y_percussive = librosa.effects.hpss(y)

The result of this line is that the time series ``y`` has been separated into two time
series, containing the harmonic (tonal) and percussive (transient) portions of the
signal. Each of ``y_harmonic`` and ``y_percussive`` has the same shape and duration
as ``y``.
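
To hear the separation, each component can be written back to disk (a sketch
assuming ``librosa.output.write_wav``, which mirrors the loader's ``(y, sr)``
convention; check that your version provides it)::

    # Save the separated components for listening
    librosa.output.write_wav('harmonic.wav', y_harmonic, sr)
    librosa.output.write_wav('percussive.wav', y_percussive, sr)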

The motivation for this kind of operation is two-fold: first, percussive elements
tend to be stronger indicators of rhythmic content, and can help provide more stable
beat tracking results; second, percussive elements can pollute tonal feature
representations (such as chroma) by contributing energy across all frequency bands, so
we'd be better off without them.

Next, we introduce the :ref:`feature module <feature>` and extract the Mel-frequency
cepstral coefficients from the raw signal ``y``::

    mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

The output of this function is the matrix ``mfcc``, which is a *numpy.ndarray* of
shape ``(n_mfcc, T)`` (where ``T`` denotes the track duration in frames). Note that we
use the same ``hop_length`` here as in the beat tracker, so the detected ``beat_frames``
values correspond to columns of ``mfcc``.

The first type of feature manipulation we introduce is ``delta``, which computes
(smoothed) first-order differences among columns of its input::

    mfcc_delta = librosa.feature.delta(mfcc)

The resulting matrix ``mfcc_delta`` has the same shape as the input ``mfcc``.
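
Higher-order differences follow the same pattern; for example, second-order
("delta-delta") features can be requested via the ``order`` parameter (assuming
your version of ``librosa.feature.delta`` accepts it)::

    # Second-order differences, same shape as mfcc
    mfcc_delta2 = librosa.feature.delta(mfcc, order=2)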

The second type of feature manipulation is ``sync``, which aggregates columns of its
input between sample indices (e.g., beat frames)::

    beat_mfcc_delta = librosa.feature.sync(np.vstack([mfcc, mfcc_delta]),
                                           beat_frames)

Here, we've vertically stacked the ``mfcc`` and ``mfcc_delta`` matrices together. The
result of this operation is a matrix ``beat_mfcc_delta`` with the same number of rows
as its input, but the number of columns depends on ``beat_frames``. Each column
``beat_mfcc_delta[:, k]`` will be the *average* of input columns between
``beat_frames[k]`` and ``beat_frames[k+1]``. (``beat_frames`` will be expanded to
span the full range ``[0, T]`` so that all data is accounted for.)
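
To make the aggregation concrete, here is a minimal NumPy sketch of the kind of
computation ``sync`` performs (an illustration, not the library's actual
implementation)::

    import numpy as np

    def sync_sketch(data, frames, aggregate=np.mean):
        """Aggregate columns of `data` between boundary `frames`."""
        n = data.shape[1]
        # Expand the boundaries to cover the full range [0, n]
        boundaries = np.concatenate(([0], frames, [n]))
        # Reduce each column span [start, end) to a single column
        cols = [aggregate(data[:, start:end], axis=1)
                for start, end in zip(boundaries[:-1], boundaries[1:])
                if end > start]
        return np.column_stack(cols)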

Next, we compute a chromagram using just the harmonic component::

    chromagram = librosa.feature.chromagram(y=y_harmonic,
                                            sr=sr,
                                            hop_length=hop_length)

After this line, ``chromagram`` will be a *numpy.ndarray* of shape ``(12, T)``, and
each row corresponds to a pitch class (e.g., *C*, *C#*, etc.). Each column of
``chromagram`` is normalized by its peak value, though this behavior can be overridden
by setting the ``norm`` parameter.
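
For instance, normalization can be disabled entirely (a sketch; the accepted
values of ``norm`` depend on your version, though ``None`` conventionally turns
normalization off)::

    # Un-normalized chroma energies (norm=None is an assumed parameter value)
    chromagram_raw = librosa.feature.chromagram(y=y_harmonic,
                                                sr=sr,
                                                hop_length=hop_length,
                                                norm=None)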

Once we have the chromagram and list of beat frames, we again synchronize the chroma
between beat events::

    beat_chroma = librosa.feature.sync(chromagram,
                                       beat_frames,
                                       aggregate=np.median)

This time, we've replaced the default aggregate operation (*average*, as used above
for MFCCs) with the *median*. In general, any statistical summarization function can
be supplied here, including `np.max()`, `np.min()`, `np.std()`, etc.

Finally, all features are vertically stacked again::

    beat_features = np.vstack([beat_chroma, beat_mfcc_delta])

resulting in a feature matrix ``beat_features`` with ``12 + 13 + 13 = 38`` rows and
one column per beat interval.
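
A quick sanity check on the result::

    # 12 chroma + 13 MFCC + 13 delta-MFCC rows; columns = beat intervals
    print(beat_features.shape)   # e.g., (38, number_of_beat_intervals)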


More examples
-------------

More example scripts are provided in the `examples
<https://github.com/bmcfee/librosa/tree/master/examples>`_ folder.
|