Tutorial
========

This section covers the fundamentals of developing with *librosa*, including
a package overview, basic and advanced usage, and integration with the *scikit-learn*
package. We will assume basic familiarity with Python and NumPy/SciPy.


Overview
--------

The *librosa* package is structured as a collection of submodules:

- librosa

  - :ref:`librosa.beat <beat>`
      Functions for estimating tempo and detecting beat events.

  - :ref:`librosa.chord <chord>`
      A generic class implementing supervised training of Gaussian-emission
      hidden Markov models (HMMs), as commonly used in chord recognition.

  - :ref:`librosa.core <core>`
      Core functionality, including functions to load audio from disk, compute various
      spectrogram representations, and a variety of commonly used tools for
      music analysis. For convenience, all functionality in this submodule is
      directly accessible from the top-level ``librosa.*`` namespace (see the
      short sketch following this list).

  - :ref:`librosa.decompose <decompose>`
      Functions for harmonic-percussive source separation (HPSS) and generic
      spectrogram decomposition using matrix decomposition methods implemented in
      *scikit-learn*.

  - :ref:`librosa.display <display>`
      Visualization and display routines using `matplotlib`.

  - :ref:`librosa.effects <effects>`
      Time-domain audio processing, such as pitch shifting and time stretching.
      This submodule also provides time-domain wrappers for the `decompose`
      submodule.

  - :ref:`librosa.feature <feature>`
      Feature extraction and manipulation. This includes low-level feature
      extraction, such as chromagrams, pseudo-constant-Q (log-frequency) transforms,
      Mel spectrograms, MFCCs, and tuning estimation. Also provided are feature
      manipulation methods, such as delta features, memory embedding, and
      event-synchronous feature alignment.

  - :ref:`librosa.filters <filters>`
      Filter-bank generation (chroma, pseudo-CQT, CQT, etc.). These are primarily
      internal functions used by other parts of *librosa*.

  - :ref:`librosa.onset <onset>`
      Onset detection and onset strength computation.

  - :ref:`librosa.output <output>`
      Text- and wav-file output.

  - :ref:`librosa.segment <segment>`
      Functions useful for structural segmentation, such as recurrence matrix
      construction, time-lag representation, and sequentially constrained
      clustering.

  - :ref:`librosa.util <util>`
      Helper utilities (normalization, padding, centering, etc.).

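As a quick illustration of the ``librosa.core`` re-export mentioned above, the
following sketch checks that a top-level name is an alias of its ``librosa.core``
counterpart::

    import librosa

    # Core functions are re-exported at the top level,
    # so both names refer to the same function object
    print(librosa.load is librosa.core.load)   # True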

.. _quickstart:

Quickstart
----------
Before diving into the details, we'll walk through a brief example program.

.. code-block:: python
    :linenos:

    # Beat tracking example
    import librosa

    # 1. Get the file path to the included audio example
    filename = librosa.util.example_audio_file()

    # 2. Load the audio as a waveform `y`
    #    Store the sampling rate as `sr`
    y, sr = librosa.load(filename)

    # 3. Run the default beat tracker, using a hop length of 64 samples
    #    (64 samples at sr=22050 Hz ~= 2.9 ms)
    hop_length = 64
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop_length)

    print('Estimated tempo: %0.2f beats per minute' % tempo)

    # 4. Convert the frame indices of beat events into timestamps
    beat_times = librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)

    print('Saving output to beat_times.csv')
    librosa.output.times_csv('beat_times.csv', beat_times)


The first step of the program::

    filename = librosa.util.example_audio_file()

gets the path to the audio example file included with *librosa*. After this step,
``filename`` will be a string variable containing the path to the example mp3.

The second step::

    y, sr = librosa.load(filename)

loads and decodes the audio as a :term:`time series` ``y``, represented as a one-dimensional
NumPy floating-point array. The variable ``sr`` contains the :term:`sampling rate` of
``y``, that is, the number of samples per second of audio. By default, all audio is
mixed down to mono and resampled to 22050 Hz at load time. This behavior can be overridden
by supplying additional arguments to ``librosa.load()``.
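
For example, to preserve the file's native sampling rate and channel layout, pass
``sr=None`` and ``mono=False`` (both are keyword arguments of ``librosa.load``)::

    # Keep the native sampling rate, and do not down-mix to mono
    y_native, sr_native = librosa.load(filename, sr=None, mono=False)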

The next line::

    hop_length = 64

sets the :term:`hop length` for the subsequent analysis. This is the number of samples to
advance between successive audio frames. Here, we've set the hop length to 64
samples, which at 22050 Hz comes to ``64.0 / 22050 ~= 2.9 ms``.

Next, we run the beat tracker using the specified hop length::

    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop_length)

The output of the beat tracker is an estimate of the tempo (in beats per minute)
and an array of frame numbers corresponding to detected beat events.

:term:`Frames <frame>` here correspond to short windows of the signal (``y``), each
separated by ``hop_length`` samples. Since v0.3, *librosa* uses centered frames, so
that the *k*\ th frame is centered around sample ``k * hop_length``.
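
Because frames are centered and evenly spaced, converting beat frames to sample
positions is a simple multiplication. A minimal sketch in plain NumPy (variable
names here are illustrative)::

    # The k-th frame is centered at sample k * hop_length
    beat_samples = beat_frames * hop_length

    # Dividing by the sampling rate yields timestamps in seconds,
    # which is what librosa.frames_to_time computes below
    beat_times_manual = beat_samples / float(sr)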

The next operation converts the frame numbers ``beat_frames`` into timings::

    beat_times = librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)

Now, ``beat_times`` will be an array of timestamps (in seconds) corresponding to
detected beat events.

Finally, we can store the detected beat timestamps as a comma-separated values (CSV)
file::

    librosa.output.times_csv('beat_times.csv', beat_times)

The contents of ``beat_times.csv`` should look like this::

    0.067
    0.514
    0.990
    1.454
    1.910
    ...

This is primarily useful for visualization purposes (e.g., using
`Sonic Visualiser <http://www.sonicvisualiser.org>`_) or evaluation (e.g., using
`mir_eval <https://github.com/craffel/mir_eval>`_).
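
As a sketch of the evaluation use case, *mir_eval* can score the detected beats
against a reference annotation. This assumes a ground-truth file
``reference_beats.txt`` (one timestamp per line; the filename is hypothetical)::

    import mir_eval

    # Load the reference beat annotations
    reference_beats = mir_eval.io.load_events('reference_beats.txt')

    # Compute the standard suite of beat tracking metrics
    scores = mir_eval.beat.evaluate(reference_beats, beat_times)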


Advanced usage
--------------

Here we'll cover a more advanced example, integrating harmonic-percussive separation,
multiple spectral features, and beat-synchronous feature aggregation.

.. code-block:: python
    :linenos:

    # Feature extraction example

    import numpy as np
    import librosa

    # Load the example clip
    y, sr = librosa.load(librosa.util.example_audio_file())

    # Separate harmonics and percussives into two waveforms
    y_harmonic, y_percussive = librosa.effects.hpss(y)

    # Set the hop length
    hop_length = 64

    # Beat track on the percussive signal
    tempo, beat_frames = librosa.beat.beat_track(y=y_percussive,
                                                 sr=sr,
                                                 hop_length=hop_length)

    # Compute MFCC features from the raw signal
    mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

    # And the first-order differences (delta features)
    mfcc_delta = librosa.feature.delta(mfcc)

    # Stack and synchronize between beat events
    # Here we use the mean value (the default aggregation)
    beat_mfcc_delta = librosa.feature.sync(np.vstack([mfcc, mfcc_delta]),
                                           beat_frames)

    # Compute chroma features from the harmonic signal
    chromagram = librosa.feature.chromagram(y=y_harmonic,
                                            sr=sr,
                                            hop_length=hop_length)

    # Aggregate chroma features between beat events
    # This time, we use the median value of each feature between beat frames
    beat_chroma = librosa.feature.sync(chromagram,
                                       beat_frames,
                                       aggregate=np.median)

    # Finally, stack all beat-synchronous features together
    beat_features = np.vstack([beat_chroma, beat_mfcc_delta])


This example builds on tools we've already covered in the :ref:`quickstart example
<quickstart>`, so here we'll focus just on the new parts.

The first difference is the use of the :ref:`effects module <effects>` for time-series
harmonic-percussive separation::

    y_harmonic, y_percussive = librosa.effects.hpss(y)

The result of this line is that the time series ``y`` has been separated into two time
series, containing the harmonic (tonal) and percussive (transient) portions of the
signal. Each of ``y_harmonic`` and ``y_percussive`` has the same shape and duration
as ``y``.
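
As a quick sanity check (plain NumPy, purely illustrative), both components line up
sample-for-sample with the original signal::

    assert y_harmonic.shape == y.shape
    assert y_percussive.shape == y.shape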

The motivation for this kind of operation is two-fold: first, percussive elements
tend to be stronger indicators of rhythmic content, and can help provide more stable
beat tracking results; second, percussive elements can pollute tonal feature
representations (such as chroma) by contributing energy across all frequency bands, so
we'd be better off without them.

Next, we introduce the :ref:`feature module <feature>` and extract the Mel-frequency
cepstral coefficients from the raw signal ``y``::

    mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

The output of this function is the matrix ``mfcc``, which is a *numpy.ndarray* of
shape ``(n_mfcc, T)`` (where ``T`` denotes the track duration in frames). Note that we
use the same ``hop_length`` here as in the beat tracker, so the detected ``beat_frames``
values correspond to columns of ``mfcc``.
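
Since frames are spaced ``hop_length`` samples apart, the number of columns ``T`` is
approximately ``len(y) / hop_length``. A quick check (illustrative; the exact count
may differ by a frame depending on padding)::

    n_mfcc, T = mfcc.shape

    print(n_mfcc)                    # 13, as requested above
    print(T, len(y) // hop_length)   # approximately equal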

The first type of feature manipulation we introduce is ``delta``, which computes
(smoothed) first-order differences among columns of its input::

    mfcc_delta = librosa.feature.delta(mfcc)

The resulting matrix ``mfcc_delta`` has the same shape as the input ``mfcc``.
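
Higher-order differences can be computed the same way. The sketch below assumes that
``delta`` accepts an ``order`` parameter, as in later versions of *librosa*::

    # Second-order differences ("acceleration" features)
    mfcc_delta2 = librosa.feature.delta(mfcc, order=2)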

The second type of feature manipulation is ``sync``, which aggregates columns of its
input between sample indices (e.g., beat frames)::

    beat_mfcc_delta = librosa.feature.sync(np.vstack([mfcc, mfcc_delta]),
                                           beat_frames)

Here, we've vertically stacked the ``mfcc`` and ``mfcc_delta`` matrices together. The
result of this operation is a matrix ``beat_mfcc_delta`` with the same number of rows
as its input, but the number of columns depends on ``beat_frames``. Each column
``beat_mfcc_delta[:, k]`` will be the *average* of input columns between
``beat_frames[k]`` and ``beat_frames[k+1]``. (``beat_frames`` will be expanded to
span the full range ``[0, T]`` so that all data is accounted for.)
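
To make the aggregation explicit, the following sketch shows an equivalent computation
in plain NumPy (illustrative only; ``sync`` performs the boundary handling internally)::

    features = np.vstack([mfcc, mfcc_delta])

    # Expand the beat frames to span the full range [0, T]
    boundaries = np.unique(np.concatenate([[0], beat_frames, [features.shape[1]]]))

    # Average the input columns within each beat interval
    beat_sync = np.column_stack([features[:, start:end].mean(axis=1)
                                 for start, end in zip(boundaries[:-1], boundaries[1:])])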

Next, we compute a chromagram using just the harmonic component::

    chromagram = librosa.feature.chromagram(y=y_harmonic,
                                            sr=sr,
                                            hop_length=hop_length)

After this line, ``chromagram`` will be a *numpy.ndarray* of shape ``(12, T)``, and
each row corresponds to a pitch class (e.g., *C*, *C#*, etc.). Each column of
``chromagram`` is normalized by its peak value, though this behavior can be overridden
by setting the ``norm`` parameter.
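
For example, the per-column normalization can be disabled by passing ``norm=None``
(a sketch relying on the ``norm`` parameter described above)::

    # Raw (unnormalized) chroma energies
    chromagram_raw = librosa.feature.chromagram(y=y_harmonic,
                                                sr=sr,
                                                hop_length=hop_length,
                                                norm=None)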

Once we have the chromagram and list of beat frames, we again synchronize the chroma
between beat events::

    beat_chroma = librosa.feature.sync(chromagram,
                                       beat_frames,
                                       aggregate=np.median)

This time, we've replaced the default aggregate operation (*average*, as used above
for MFCCs) with the *median*. In general, any statistical summarization function can
be supplied here, including ``np.max()``, ``np.min()``, ``np.std()``, etc.

Finally, all of the features are vertically stacked again::

    beat_features = np.vstack([beat_chroma, beat_mfcc_delta])

resulting in a feature matrix ``beat_features`` of dimension
``(12 + 13 + 13, # beat intervals)``.
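
As a final shape check (purely illustrative)::

    print(beat_features.shape)   # (38, ...): 12 + 13 + 13 rows, one column per beat interval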


More examples
-------------

More example scripts are provided in the `examples
<https://github.com/bmcfee/librosa/tree/master/examples>`_ folder.