Tutorial
========

This section covers the fundamentals of developing with *librosa*, including
a package overview, basic and advanced usage, and integration with the *scikit-learn*
package. We will assume basic familiarity with Python and NumPy/SciPy.


Overview
--------

The *librosa* package is structured as a collection of submodules:

- librosa

  - :ref:`librosa.beat <beat>`
      Functions for estimating tempo and detecting beat events.

  - :ref:`librosa.chord <chord>`
      This submodule contains a generic class which implements supervised training
      of Gaussian-emission Hidden Markov Models (HMM) commonly used in chord
      recognition.

  - :ref:`librosa.core <core>`
      Core functionality includes functions to load audio from disk, compute various
      spectrogram representations, and a variety of commonly used tools for
      music analysis. For convenience, all functionality in this submodule is
      directly accessible from the top-level ``librosa.*`` namespace.

  - :ref:`librosa.decompose <decompose>`
      Functions for harmonic-percussive source separation (HPSS) and generic
      spectrogram decomposition using matrix decomposition methods implemented in
      *scikit-learn*.

  - :ref:`librosa.display <display>`
      Visualization and display routines using `matplotlib`.

  - :ref:`librosa.effects <effects>`
      Time-domain audio processing, such as pitch shifting and time stretching.
      This submodule also provides time-domain wrappers for the `decompose`
      submodule.

  - :ref:`librosa.feature <feature>`
      Feature extraction and manipulation. This includes low-level feature
      extraction, such as chromagrams, pseudo-constant-Q (log-frequency) transforms,
      Mel spectrograms, MFCCs, and tuning estimation. Also provided are feature
      manipulation methods, such as delta features, memory embedding, and
      event-synchronous feature alignment.

  - :ref:`librosa.filters <filters>`
      Filter-bank generation (chroma, pseudo-CQT, CQT, etc.). These are primarily
      internal functions used by other parts of *librosa*.

  - :ref:`librosa.onset <onset>`
      Onset detection and onset strength computation.

  - :ref:`librosa.output <output>`
      Text- and wav-file output.

  - :ref:`librosa.segment <segment>`
      Functions useful for structural segmentation, such as recurrence matrix
      construction, time-lag representation, and sequentially constrained
      clustering.

  - :ref:`librosa.util <util>`
      Helper utilities (normalization, padding, centering, etc.).

.. _quickstart:

Quickstart
----------

Before diving into the details, we'll walk through a brief example program:

.. code-block:: python
    :linenos:

    # Beat tracking example
    import librosa

    # 1. Get the file path to the included audio example
    filename = librosa.util.example_audio_file()

    # 2. Load the audio as a waveform `y`
    #    Store the sampling rate as `sr`
    y, sr = librosa.load(filename)

    # 3. Run the default beat tracker, using a hop length of 64 samples
    #    (64 samples at sr=22050 Hz ~= 2.9 ms)
    hop_length = 64
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop_length)

    print('Estimated tempo: %0.2f beats per minute' % tempo)

    # 4. Convert the frame indices of beat events into timestamps
    beat_times = librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)

    print('Saving output to beat_times.csv')
    librosa.output.times_csv('beat_times.csv', beat_times)

The first step of the program::

    filename = librosa.util.example_audio_file()

gets the path to the audio example file included with *librosa*. After this step,
``filename`` will be a string variable containing the path to the example mp3.

The second step::

    y, sr = librosa.load(filename)

loads and decodes the audio as a :term:`time series` ``y``, represented as a one-dimensional
NumPy floating point array. The variable ``sr`` contains the :term:`sampling rate` of
``y``, that is, the number of samples per second of audio. By default, all audio is
mixed to mono and resampled to 22050 Hz at load time. This behavior can be overridden
by supplying additional arguments to ``librosa.load()``.
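
For example, a minimal sketch of overriding those defaults (``sr=None`` and
``mono`` are standard ``librosa.load`` parameters; the full set of options may
vary between versions)::

    # Keep the native sampling rate and channel layout instead of
    # the default mono/22050 Hz conversion
    y_native, sr_native = librosa.load(filename, sr=None, mono=False)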

The next line::

    hop_length = 64

sets the :term:`hop length` for the subsequent analysis. This is the number of samples
to advance between subsequent audio frames. Here, we've set the hop length to 64
samples, which at 22050 Hz comes to ``64.0 / 22050 ~= 2.9ms``.
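
As plain arithmetic, the same relationship gives both the duration of each frame
and the analysis frame rate::

    sr = 22050
    hop_length = 64

    frame_duration = float(hop_length) / sr   # ~0.0029 seconds per frame
    frame_rate = sr / float(hop_length)       # ~344.5 frames per second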

Next, we run the beat tracker using the specified hop length::

    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop_length)

The output of the beat tracker is an estimate of the tempo (in beats per minute),
and an array of frame numbers corresponding to detected beat events.

:term:`Frames <frame>` here correspond to short windows of the signal (``y``), each
separated by ``hop_length`` samples. Since v0.3, *librosa* uses centered frames, so
that the *k*\ th frame is centered around sample ``k * hop_length``.
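
For instance, the sample index at the center of each beat frame is just the frame
number times the hop length; recent versions provide a helper for this (a sketch;
if ``librosa.frames_to_samples`` is unavailable in your version, the direct
multiplication gives the same result)::

    # Sample index of each beat frame's center (== beat_frames * hop_length)
    beat_samples = librosa.frames_to_samples(beat_frames, hop_length=hop_length)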

The next operation converts the frame numbers ``beat_frames`` into timings::

    beat_times = librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)

Now, ``beat_times`` will be an array of timestamps (in seconds) corresponding to
detected beat events.

Finally, we can store the detected beat timestamps as a comma-separated values (CSV)
file::

    librosa.output.times_csv('beat_times.csv', beat_times)

The contents of ``beat_times.csv`` should look like this::

    0.067
    0.514
    0.990
    1.454
    1.910
    ...

This is primarily useful for visualization purposes (e.g., using
`Sonic Visualiser <http://www.sonicvisualiser.org>`_) or evaluation (e.g., using
`mir_eval <https://github.com/craffel/mir_eval>`_).
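
Because the file holds one timestamp per line, it is also easy to load back into
Python with standard NumPy tools (a sketch using ``numpy.loadtxt``, which handles
this single-column layout)::

    import numpy as np

    # Recover the beat timestamps as a 1-d array of floats
    beat_times_loaded = np.loadtxt('beat_times.csv')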


Advanced usage
--------------

Here we'll cover a more advanced example, integrating harmonic-percussive separation,
multiple spectral features, and beat-synchronous feature aggregation.

.. code-block:: python
    :linenos:

    # Feature extraction example

    import numpy as np
    import librosa

    # Load the example clip
    y, sr = librosa.load(librosa.util.example_audio_file())

    # Separate harmonics and percussives into two waveforms
    y_harmonic, y_percussive = librosa.effects.hpss(y)

    # Set the hop length
    hop_length = 64

    # Beat track on the percussive signal
    tempo, beat_frames = librosa.beat.beat_track(y=y_percussive,
                                                 sr=sr,
                                                 hop_length=hop_length)

    # Compute MFCC features from the raw signal
    mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

    # And the first-order differences (delta features)
    mfcc_delta = librosa.feature.delta(mfcc)

    # Stack and synchronize between beat events
    # This time, we'll use the mean value (default) instead of median
    beat_mfcc_delta = librosa.feature.sync(np.vstack([mfcc, mfcc_delta]),
                                           beat_frames)

    # Compute chroma features from the harmonic signal
    chromagram = librosa.feature.chromagram(y=y_harmonic,
                                            sr=sr,
                                            hop_length=hop_length)

    # Aggregate chroma features between beat events
    # We'll use the median value of each feature between beat frames
    beat_chroma = librosa.feature.sync(chromagram,
                                       beat_frames,
                                       aggregate=np.median)

    # Finally, stack all beat-synchronous features together
    beat_features = np.vstack([beat_chroma, beat_mfcc_delta])


This example builds on tools we've already covered in the :ref:`quickstart example
<quickstart>`, so here we'll focus just on the new parts.

The first difference is the use of the :ref:`effects module <effects>` for time-series
harmonic-percussive separation::

    y_harmonic, y_percussive = librosa.effects.hpss(y)

The result of this line is that the time series ``y`` has been separated into two time
series, containing the harmonic (tonal) and percussive (transient) portions of the
signal. Each of ``y_harmonic`` and ``y_percussive`` has the same shape and duration
as ``y``.
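
To hear the separation, each component can be written back to disk (a sketch
assuming ``librosa.output.write_wav``, which mirrors the loader's ``(y, sr)``
convention; check that your version provides it)::

    # Save the separated components for listening
    librosa.output.write_wav('harmonic.wav', y_harmonic, sr)
    librosa.output.write_wav('percussive.wav', y_percussive, sr)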

The motivation for this kind of operation is two-fold: first, percussive elements
tend to be stronger indicators of rhythmic content, and can help provide more stable
beat tracking results; second, percussive elements can pollute tonal feature
representations (such as chroma) by contributing energy across all frequency bands, so
we'd be better off without them.

Next, we introduce the :ref:`feature module <feature>` and extract the Mel-frequency
cepstral coefficients from the raw signal ``y``::

    mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

The output of this function is the matrix ``mfcc``, which is a *numpy.ndarray* of
shape ``(n_mfcc, T)`` (where ``T`` denotes the track duration in frames). Note that we
use the same ``hop_length`` here as in the beat tracker, so the detected ``beat_frames``
values correspond to columns of ``mfcc``.

The first type of feature manipulation we introduce is ``delta``, which computes
(smoothed) first-order differences among columns of its input::

    mfcc_delta = librosa.feature.delta(mfcc)

The resulting matrix ``mfcc_delta`` has the same shape as the input ``mfcc``.
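
Higher-order differences follow the same pattern; for example, second-order
("delta-delta") features can be requested via the ``order`` parameter (assuming
your version of ``librosa.feature.delta`` accepts it)::

    # Second-order differences, same shape as mfcc
    mfcc_delta2 = librosa.feature.delta(mfcc, order=2)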

The second type of feature manipulation is ``sync``, which aggregates columns of its
input between sample indices (e.g., beat frames)::

    beat_mfcc_delta = librosa.feature.sync(np.vstack([mfcc, mfcc_delta]),
                                           beat_frames)

Here, we've vertically stacked the ``mfcc`` and ``mfcc_delta`` matrices together. The
result of this operation is a matrix ``beat_mfcc_delta`` with the same number of rows
as its input, but the number of columns depends on ``beat_frames``. Each column
``beat_mfcc_delta[:, k]`` will be the *average* of input columns between
``beat_frames[k]`` and ``beat_frames[k+1]``. (``beat_frames`` will be expanded to
span the full range ``[0, T]`` so that all data is accounted for.)
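
To make the aggregation concrete, here is a minimal NumPy sketch of the kind of
computation ``sync`` performs (an illustration, not the library's actual
implementation)::

    import numpy as np

    def sync_sketch(data, frames, aggregate=np.mean):
        """Aggregate columns of `data` between boundary `frames`."""
        n = data.shape[1]
        # Expand the boundaries to cover the full range [0, n]
        boundaries = np.concatenate(([0], frames, [n]))
        # Reduce each column span [start, end) to a single column
        cols = [aggregate(data[:, start:end], axis=1)
                for start, end in zip(boundaries[:-1], boundaries[1:])
                if end > start]
        return np.column_stack(cols)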

Next, we compute a chromagram using just the harmonic component::

    chromagram = librosa.feature.chromagram(y=y_harmonic,
                                            sr=sr,
                                            hop_length=hop_length)

After this line, ``chromagram`` will be a *numpy.ndarray* of shape ``(12, T)``, and
each row corresponds to a pitch class (e.g., *C*, *C#*, etc.). Each column of
``chromagram`` is normalized by its peak value, though this behavior can be overridden
by setting the ``norm`` parameter.
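
For instance, normalization can be disabled entirely (a sketch; the accepted
values of ``norm`` depend on your version, though ``None`` conventionally turns
normalization off)::

    # Un-normalized chroma energies (norm=None is an assumed parameter value)
    chromagram_raw = librosa.feature.chromagram(y=y_harmonic,
                                                sr=sr,
                                                hop_length=hop_length,
                                                norm=None)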

Once we have the chromagram and list of beat frames, we again synchronize the chroma
between beat events::

    beat_chroma = librosa.feature.sync(chromagram,
                                       beat_frames,
                                       aggregate=np.median)

This time, we've replaced the default aggregate operation (*average*, as used above
for MFCCs) with the *median*. In general, any statistical summarization function can
be supplied here, including `np.max()`, `np.min()`, `np.std()`, etc.

Finally, all features are vertically stacked again::

    beat_features = np.vstack([beat_chroma, beat_mfcc_delta])

resulting in a feature matrix ``beat_features`` with ``12 + 13 + 13 = 38`` rows and
one column per beat interval.
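
A quick sanity check on the result::

    # 12 chroma + 13 MFCC + 13 delta-MFCC rows; columns = beat intervals
    print(beat_features.shape)   # e.g., (38, number_of_beat_intervals)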


More examples
-------------

More example scripts are provided in the `examples
<https://github.com/bmcfee/librosa/tree/master/examples>`_ folder.
|