Tutorial
========

This section covers the fundamentals of developing with *librosa*, including
a package overview, basic and advanced usage, and integration with the
*scikit-learn* package. We will assume basic familiarity with Python and
NumPy/SciPy.


Overview
--------

The *librosa* package is structured as a collection of submodules:

- librosa

  - :ref:`librosa.beat <beat>`
      Functions for estimating tempo and detecting beat events.

  - :ref:`librosa.chord <chord>`
      A generic class implementing supervised training of Gaussian-emission
      hidden Markov models (HMMs), as commonly used in chord recognition.

  - :ref:`librosa.core <core>`
      Core functionality, including functions to load audio from disk and
      compute various spectrogram representations, as well as a variety of
      commonly used tools for music analysis. For convenience, all
      functionality in this submodule is directly accessible from the
      top-level ``librosa.*`` namespace.

  - :ref:`librosa.decompose <decompose>`
      Functions for harmonic-percussive source separation (HPSS) and generic
      spectrogram decomposition using matrix decomposition methods
      implemented in *scikit-learn*.

  - :ref:`librosa.display <display>`
      Visualization and display routines using *matplotlib*.

  - :ref:`librosa.effects <effects>`
      Time-domain audio processing, such as pitch shifting and time
      stretching. This submodule also provides time-domain wrappers for the
      ``decompose`` submodule.

  - :ref:`librosa.feature <feature>`
      Feature extraction and manipulation. This includes low-level feature
      extraction, such as chromagrams, pseudo-constant-Q (log-frequency)
      transforms, Mel spectrograms, MFCCs, and tuning estimation. Also
      provided are feature manipulation methods, such as delta features,
      memory embedding, and event-synchronous feature alignment.

  - :ref:`librosa.filters <filters>`
      Filter-bank generation (chroma, pseudo-CQT, CQT, etc.). These are
      primarily internal functions used by other parts of *librosa*.

  - :ref:`librosa.onset <onset>`
      Onset detection and onset strength computation.

  - :ref:`librosa.output <output>`
      Text- and wav-file output.

  - :ref:`librosa.segment <segment>`
      Functions useful for structural segmentation, such as recurrence
      matrix construction, time-lag representation, and sequentially
      constrained clustering.

  - :ref:`librosa.util <util>`
      Helper utilities (normalization, padding, centering, etc.).


.. _quickstart:

Quickstart
----------
Before diving into the details, we'll walk through a brief example program:

.. code-block:: python
    :linenos:

    # Beat tracking example
    import librosa

    # 1. Get the file path to the included audio example
    filename = librosa.util.example_audio_file()

    # 2. Load the audio as a waveform `y`
    #    Store the sampling rate as `sr`
    y, sr = librosa.load(filename)
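
    # Note: load() mixes to mono and resamples to 22050 Hz by default.
    # As a sketch of one common override (not needed for this example),
    # passing `sr=None` would preserve the file's native sampling rate:
    #
    #     y_native, sr_native = librosa.load(filename, sr=None)
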
    # 3. Run the default beat tracker, with a hop length of 64 samples
    #    (64 samples at sr=22050 Hz ~= 2.9 ms)
    hop_length = 64
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop_length)

    print('Estimated tempo: %0.2f beats per minute' % tempo)

    # 4. Convert the frame indices of beat events into timestamps
    beat_times = librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)

    print('Saving output to beat_times.csv')
    librosa.output.times_csv('beat_times.csv', beat_times)


The first step of the program::

    filename = librosa.util.example_audio_file()

gets the path to the audio example file included with *librosa*. After this
step, ``filename`` will be a string variable containing the path to the
example mp3.

The second step::

    y, sr = librosa.load(filename)

loads and decodes the audio as a :term:`time series` ``y``, represented as a
one-dimensional NumPy floating point array. The variable ``sr`` contains the
:term:`sampling rate` of ``y``, that is, the number of samples per second of
audio. By default, all audio is mixed down to mono and resampled to 22050 Hz
at load time. This behavior can be overridden by supplying additional
arguments to ``librosa.load()``.

The next line::

    hop_length = 64

sets the :term:`hop length` for the subsequent analysis. This is the number of
samples to advance between successive audio frames. Here, we've set the hop
length to 64 samples, which at a sampling rate of 22050 Hz comes to
``64.0 / 22050 ~= 2.9ms``.

Next, we run the beat tracker using the specified hop length::

    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop_length)

The output of the beat tracker is an estimate of the tempo (in beats per
minute), and an array of frame numbers corresponding to detected beat events.

:term:`Frames <frame>` here correspond to short windows of the signal (``y``),
each separated by ``hop_length`` samples. Since v0.3, *librosa* uses centered
frames, so that the *k*\ th frame is centered around sample ``k * hop_length``.

The next operation converts the frame numbers ``beat_frames`` into timings::

    beat_times = librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)

Now, ``beat_times`` will be an array of timestamps (in seconds) corresponding
to detected beat events. (The underlying arithmetic is simple; see the short
sketch at the end of this section.)

Finally, we can store the detected beat timestamps as a comma-separated values
(CSV) file::

    librosa.output.times_csv('beat_times.csv', beat_times)

The contents of ``beat_times.csv`` should look like this::

    0.067
    0.514
    0.990
    1.454
    1.910
    ...

This is primarily useful for visualization purposes (e.g., with
`Sonic Visualiser <http://www.sonicvisualiser.org/>`_) or evaluation (e.g.,
with `mir_eval <https://github.com/craffel/mir_eval>`_).
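
Picking up the forward reference above: the conversion performed by
``librosa.frames_to_time()`` is plain arithmetic over centered frames. The
following sketch (reusing ``beat_frames``, ``hop_length``, and ``sr`` from the
example, with NumPy as our own assumption rather than a requirement)
reproduces it by hand:

.. code-block:: python

    import numpy as np

    # The k-th centered frame sits at sample k * hop_length,
    # so its timestamp is (k * hop_length) / sr seconds.
    manual_times = np.asarray(beat_frames) * hop_length / float(sr)

    # This should agree with:
    # librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)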


Advanced usage
--------------

Here we'll cover a more advanced example, integrating harmonic-percussive
separation, multiple spectral features, and beat-synchronous feature
aggregation.

.. code-block:: python
    :linenos:

    # Feature extraction example

    import numpy as np
    import librosa

    # Load the example clip
    y, sr = librosa.load(librosa.util.example_audio_file())

    # Separate harmonics and percussives into two waveforms
    y_harmonic, y_percussive = librosa.effects.hpss(y)

    # Set the hop length
    hop_length = 64

    # Beat track on the percussive signal
    tempo, beat_frames = librosa.beat.beat_track(y=y_percussive,
                                                 sr=sr,
                                                 hop_length=hop_length)

    # Compute MFCC features from the raw signal
    mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

    # And the first-order differences (delta features)
    mfcc_delta = librosa.feature.delta(mfcc)

    # Stack and synchronize between beat events
    # This time, we'll use the mean value (default) instead of median
    beat_mfcc_delta = librosa.feature.sync(np.vstack([mfcc, mfcc_delta]),
                                           beat_frames)

    # Compute chroma features from the harmonic signal
    chromagram = librosa.feature.chromagram(y=y_harmonic,
                                            sr=sr,
                                            hop_length=hop_length)

    # Aggregate chroma features between beat events
    # We'll use the median value of each feature between beat frames
    beat_chroma = librosa.feature.sync(chromagram,
                                       beat_frames,
                                       aggregate=np.median)

    # Finally, stack all beat-synchronous features together
    beat_features = np.vstack([beat_chroma, beat_mfcc_delta])


This example builds on tools we've already covered in the
:ref:`quickstart example <quickstart>`, so here we'll focus just on the new
parts.

The first difference is the use of the :ref:`effects module <effects>` for
time-series harmonic-percussive separation::

    y_harmonic, y_percussive = librosa.effects.hpss(y)

The result of this line is that the time series ``y`` has been separated into
two time series, containing the harmonic (tonal) and percussive (transient)
portions of the signal. Each of ``y_harmonic`` and ``y_percussive`` has the
same shape and duration as ``y``, as the sketch below verifies.

The motivation for this kind of operation is two-fold: first, percussive
elements tend to be stronger indicators of rhythmic content, and can help
provide more stable beat tracking results; second, percussive elements can
pollute tonal feature representations (such as chroma) by contributing energy
across all frequency bands, so we'd be better off without them.
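
Since both components are full-length time series, we can sanity-check the
separation directly. This is a minimal sketch, assuming the variables from the
example above; the approximate-reconstruction check reflects the default
soft-masking behavior and is our assumption, not a documented guarantee:

.. code-block:: python

    # Both components are sample-aligned with the input signal
    assert y_harmonic.shape == y.shape
    assert y_percussive.shape == y.shape

    # With default settings, the two components should (approximately)
    # sum back to the original signal
    print(np.max(np.abs(y - (y_harmonic + y_percussive))))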

Next, we introduce the :ref:`feature module <feature>` and extract the
Mel-frequency cepstral coefficients from the raw signal ``y``::

    mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

The output of this function is the matrix ``mfcc``, which is a *numpy.ndarray*
of size ``(n_mfcc, T)`` (where ``T`` denotes the track duration in frames).
Note that we use the same ``hop_length`` here as in the beat tracker, so the
detected ``beat_frames`` values correspond to columns of ``mfcc``.

The first type of feature manipulation we introduce is ``delta``, which
computes (smoothed) first-order differences among columns of its input::

    mfcc_delta = librosa.feature.delta(mfcc)

The resulting matrix ``mfcc_delta`` has the same shape as the input ``mfcc``.

The second type of feature manipulation is ``sync``, which aggregates columns
of its input between sample indices (e.g., beat frames)::

    beat_mfcc_delta = librosa.feature.sync(np.vstack([mfcc, mfcc_delta]),
                                           beat_frames)

Here, we've vertically stacked the ``mfcc`` and ``mfcc_delta`` matrices
together. The result of this operation is a matrix ``beat_mfcc_delta`` with
the same number of rows as its input, but the number of columns depends on
``beat_frames``. Each column ``beat_mfcc_delta[:, k]`` will be the *average*
of input columns between ``beat_frames[k]`` and ``beat_frames[k+1]``.
(``beat_frames`` will be expanded to span the full range ``[0, T]`` so that
all data is accounted for.)

Next, we compute a chromagram using just the harmonic component::

    chromagram = librosa.feature.chromagram(y=y_harmonic,
                                            sr=sr,
                                            hop_length=hop_length)

After this line, ``chromagram`` will be a *numpy.ndarray* of size ``(12, T)``,
and each row corresponds to a pitch class (e.g., *C*, *C#*, etc.). Each column
of ``chromagram`` is normalized by its peak value, though this behavior can be
overridden by setting the ``norm`` parameter.

Once we have the chromagram and list of beat frames, we again synchronize the
chroma between beat events::

    beat_chroma = librosa.feature.sync(chromagram,
                                       beat_frames,
                                       aggregate=np.median)

This time, we've replaced the default aggregate operation (*average*, as used
above for the MFCCs) with the *median*. In general, any statistical
summarization function can be supplied here, including ``np.max()``,
``np.min()``, ``np.std()``, etc.

Finally, all features are vertically stacked again::

    beat_features = np.vstack([beat_chroma, beat_mfcc_delta])

resulting in a feature matrix ``beat_features`` of size
``(12 + 13 + 13, # beat intervals)``. (A short sketch at the end of this
section demonstrates an alternative aggregation function and checks these
dimensions.)


More examples
-------------

More example scripts are provided in the `examples
<https://github.com/librosa/librosa/tree/master/examples>`_ folder.
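
Picking up the forward reference from the advanced example: the following
minimal sketch (assuming the variables from that example are still in scope)
tries an alternative aggregation function and confirms the stacked feature
dimensions. The variable name ``beat_chroma_max`` is ours, chosen for
illustration:

.. code-block:: python

    # Any statistical reduction over columns can serve as the aggregate;
    # here we take the per-interval maximum instead of the median.
    beat_chroma_max = librosa.feature.sync(chromagram,
                                           beat_frames,
                                           aggregate=np.max)

    # 12 chroma rows stacked on top of 13 MFCC + 13 delta rows
    print(beat_features.shape)  # (38, number of beat intervals)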