Tutorial
========

This section covers the fundamentals of developing with *librosa*, including
a package overview, basic and advanced usage, and integration with the
*scikit-learn* package. We will assume basic familiarity with Python and
NumPy/SciPy.


Overview
--------

The *librosa* package is structured as a collection of submodules:

- librosa

  - :ref:`librosa.beat <beat>`
      Functions for estimating tempo and detecting beat events.

  - :ref:`librosa.chord <chord>`
      A generic class implementing supervised training of Gaussian-emission
      hidden Markov models (HMMs), as commonly used in chord recognition.

  - :ref:`librosa.core <core>`
      Core functionality, including functions to load audio from disk and
      compute various spectrogram representations, as well as a variety of
      commonly used tools for music analysis. For convenience, all
      functionality in this submodule is directly accessible from the
      top-level ``librosa.*`` namespace.

  - :ref:`librosa.decompose <decompose>`
      Functions for harmonic-percussive source separation (HPSS) and generic
      spectrogram decomposition using matrix decomposition methods
      implemented in *scikit-learn*.

  - :ref:`librosa.display <display>`
      Visualization and display routines using *matplotlib*.

  - :ref:`librosa.effects <effects>`
      Time-domain audio processing, such as pitch shifting and time
      stretching. This submodule also provides time-domain wrappers for the
      ``decompose`` submodule.

  - :ref:`librosa.feature <feature>`
      Feature extraction and manipulation. This includes low-level feature
      extraction, such as chromagrams, pseudo-constant-Q (log-frequency)
      transforms, Mel spectrograms, MFCCs, and tuning estimation. Also
      provided are feature manipulation methods, such as delta features,
      memory embedding, and event-synchronous feature alignment.

  - :ref:`librosa.filters <filters>`
      Filter-bank generation (chroma, pseudo-CQT, CQT, etc.). These are
      primarily internal functions used by other parts of *librosa*.

  - :ref:`librosa.onset <onset>`
      Onset detection and onset strength computation.

  - :ref:`librosa.output <output>`
      Text- and wav-file output.

  - :ref:`librosa.segment <segment>`
      Functions useful for structural segmentation, such as recurrence
      matrix construction, time-lag representation, and sequentially
      constrained clustering.

  - :ref:`librosa.util <util>`
      Helper utilities (normalization, padding, centering, etc.).


.. _quickstart:

Quickstart
----------
Before diving into the details, we'll walk through a brief example program:

.. code-block:: python
    :linenos:

    # Beat tracking example
    import librosa

    # 1. Get the file path to the included audio example
    filename = librosa.util.example_audio_file()

    # 2. Load the audio as a waveform `y`
    #    Store the sampling rate as `sr`
    y, sr = librosa.load(filename)
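
    # Note: load() mixes to mono and resamples to 22050 Hz by default.
    # As a sketch of one common override (not needed for this example),
    # passing `sr=None` would preserve the file's native sampling rate:
    #
    #     y_native, sr_native = librosa.load(filename, sr=None)
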
    # 3. Run the default beat tracker, with a hop length of 64 samples
    #    (64 samples at sr=22050 Hz ~= 2.9 ms)
    hop_length = 64
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop_length)

    print('Estimated tempo: %0.2f beats per minute' % tempo)

    # 4. Convert the frame indices of beat events into timestamps
    beat_times = librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)

    print('Saving output to beat_times.csv')
    librosa.output.times_csv('beat_times.csv', beat_times)


The first step of the program::

    filename = librosa.util.example_audio_file()

gets the path to the audio example file included with *librosa*. After this
step, ``filename`` will be a string variable containing the path to the
example mp3.

The second step::

    y, sr = librosa.load(filename)

loads and decodes the audio as a :term:`time series` ``y``, represented as a
one-dimensional NumPy floating point array. The variable ``sr`` contains the
:term:`sampling rate` of ``y``, that is, the number of samples per second of
audio. By default, all audio is mixed down to mono and resampled to 22050 Hz
at load time. This behavior can be overridden by supplying additional
arguments to ``librosa.load()``.

The next line::

    hop_length = 64

sets the :term:`hop length` for the subsequent analysis. This is the number of
samples to advance between successive audio frames. Here, we've set the hop
length to 64 samples, which at a sampling rate of 22050 Hz comes to
``64.0 / 22050 ~= 2.9ms``.

Next, we run the beat tracker using the specified hop length::

    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop_length)

The output of the beat tracker is an estimate of the tempo (in beats per
minute), and an array of frame numbers corresponding to detected beat events.

:term:`Frames <frame>` here correspond to short windows of the signal (``y``),
each separated by ``hop_length`` samples. Since v0.3, *librosa* uses centered
frames, so that the *k*\ th frame is centered around sample ``k * hop_length``.

The next operation converts the frame numbers ``beat_frames`` into timings::

    beat_times = librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)

Now, ``beat_times`` will be an array of timestamps (in seconds) corresponding
to detected beat events. (The underlying arithmetic is simple; see the short
sketch at the end of this section.)

Finally, we can store the detected beat timestamps as a comma-separated values
(CSV) file::

    librosa.output.times_csv('beat_times.csv', beat_times)

The contents of ``beat_times.csv`` should look like this::

    0.067
    0.514
    0.990
    1.454
    1.910
    ...

This is primarily useful for visualization purposes (e.g., with
`Sonic Visualiser <http://www.sonicvisualiser.org/>`_) or evaluation (e.g.,
with `mir_eval <https://github.com/craffel/mir_eval>`_).
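
Picking up the forward reference above: the conversion performed by
``librosa.frames_to_time()`` is plain arithmetic over centered frames. The
following sketch (reusing ``beat_frames``, ``hop_length``, and ``sr`` from the
example, with NumPy as our own assumption rather than a requirement)
reproduces it by hand:

.. code-block:: python

    import numpy as np

    # The k-th centered frame sits at sample k * hop_length,
    # so its timestamp is (k * hop_length) / sr seconds.
    manual_times = np.asarray(beat_frames) * hop_length / float(sr)

    # This should agree with:
    # librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)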


Advanced usage
--------------

Here we'll cover a more advanced example, integrating harmonic-percussive
separation, multiple spectral features, and beat-synchronous feature
aggregation.

.. code-block:: python
    :linenos:

    # Feature extraction example

    import numpy as np
    import librosa

    # Load the example clip
    y, sr = librosa.load(librosa.util.example_audio_file())

    # Separate harmonics and percussives into two waveforms
    y_harmonic, y_percussive = librosa.effects.hpss(y)

    # Set the hop length
    hop_length = 64

    # Beat track on the percussive signal
    tempo, beat_frames = librosa.beat.beat_track(y=y_percussive,
                                                 sr=sr,
                                                 hop_length=hop_length)

    # Compute MFCC features from the raw signal
    mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

    # And the first-order differences (delta features)
    mfcc_delta = librosa.feature.delta(mfcc)

    # Stack and synchronize between beat events
    # This time, we'll use the mean value (default) instead of median
    beat_mfcc_delta = librosa.feature.sync(np.vstack([mfcc, mfcc_delta]),
                                           beat_frames)

    # Compute chroma features from the harmonic signal
    chromagram = librosa.feature.chromagram(y=y_harmonic,
                                            sr=sr,
                                            hop_length=hop_length)

    # Aggregate chroma features between beat events
    # We'll use the median value of each feature between beat frames
    beat_chroma = librosa.feature.sync(chromagram,
                                       beat_frames,
                                       aggregate=np.median)

    # Finally, stack all beat-synchronous features together
    beat_features = np.vstack([beat_chroma, beat_mfcc_delta])


This example builds on tools we've already covered in the
:ref:`quickstart example <quickstart>`, so here we'll focus just on the new
parts.

The first difference is the use of the :ref:`effects module <effects>` for
time-series harmonic-percussive separation::

    y_harmonic, y_percussive = librosa.effects.hpss(y)

The result of this line is that the time series ``y`` has been separated into
two time series, containing the harmonic (tonal) and percussive (transient)
portions of the signal. Each of ``y_harmonic`` and ``y_percussive`` has the
same shape and duration as ``y``, as the sketch below verifies.

The motivation for this kind of operation is two-fold: first, percussive
elements tend to be stronger indicators of rhythmic content, and can help
provide more stable beat tracking results; second, percussive elements can
pollute tonal feature representations (such as chroma) by contributing energy
across all frequency bands, so we'd be better off without them.
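
Since both components are full-length time series, we can sanity-check the
separation directly. This is a minimal sketch, assuming the variables from the
example above; the approximate-reconstruction check reflects the default
soft-masking behavior and is our assumption, not a documented guarantee:

.. code-block:: python

    # Both components are sample-aligned with the input signal
    assert y_harmonic.shape == y.shape
    assert y_percussive.shape == y.shape

    # With default settings, the two components should (approximately)
    # sum back to the original signal
    print(np.max(np.abs(y - (y_harmonic + y_percussive))))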

Next, we introduce the :ref:`feature module <feature>` and extract the
Mel-frequency cepstral coefficients from the raw signal ``y``::

    mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)

The output of this function is the matrix ``mfcc``, which is a *numpy.ndarray*
of size ``(n_mfcc, T)`` (where ``T`` denotes the track duration in frames).
Note that we use the same ``hop_length`` here as in the beat tracker, so the
detected ``beat_frames`` values correspond to columns of ``mfcc``.

The first type of feature manipulation we introduce is ``delta``, which
computes (smoothed) first-order differences among columns of its input::

    mfcc_delta = librosa.feature.delta(mfcc)

The resulting matrix ``mfcc_delta`` has the same shape as the input ``mfcc``.

The second type of feature manipulation is ``sync``, which aggregates columns
of its input between sample indices (e.g., beat frames)::

    beat_mfcc_delta = librosa.feature.sync(np.vstack([mfcc, mfcc_delta]),
                                           beat_frames)

Here, we've vertically stacked the ``mfcc`` and ``mfcc_delta`` matrices
together. The result of this operation is a matrix ``beat_mfcc_delta`` with
the same number of rows as its input, but the number of columns depends on
``beat_frames``. Each column ``beat_mfcc_delta[:, k]`` will be the *average*
of input columns between ``beat_frames[k]`` and ``beat_frames[k+1]``.
(``beat_frames`` will be expanded to span the full range ``[0, T]`` so that
all data is accounted for.)

Next, we compute a chromagram using just the harmonic component::

    chromagram = librosa.feature.chromagram(y=y_harmonic,
                                            sr=sr,
                                            hop_length=hop_length)

After this line, ``chromagram`` will be a *numpy.ndarray* of size ``(12, T)``,
and each row corresponds to a pitch class (e.g., *C*, *C#*, etc.). Each column
of ``chromagram`` is normalized by its peak value, though this behavior can be
overridden by setting the ``norm`` parameter.

Once we have the chromagram and list of beat frames, we again synchronize the
chroma between beat events::

    beat_chroma = librosa.feature.sync(chromagram,
                                       beat_frames,
                                       aggregate=np.median)

This time, we've replaced the default aggregate operation (*average*, as used
above for the MFCCs) with the *median*. In general, any statistical
summarization function can be supplied here, including ``np.max()``,
``np.min()``, ``np.std()``, etc.

Finally, all features are vertically stacked again::

    beat_features = np.vstack([beat_chroma, beat_mfcc_delta])

resulting in a feature matrix ``beat_features`` of size
``(12 + 13 + 13, # beat intervals)``. (A short sketch at the end of this
section demonstrates an alternative aggregation function and checks these
dimensions.)


More examples
-------------

More example scripts are provided in the `examples
<https://github.com/librosa/librosa/tree/master/examples>`_ folder.
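
Picking up the forward reference from the advanced example: the following
minimal sketch (assuming the variables from that example are still in scope)
tries an alternative aggregation function and confirms the stacked feature
dimensions. The variable name ``beat_chroma_max`` is ours, chosen for
illustration:

.. code-block:: python

    # Any statistical reduction over columns can serve as the aggregate;
    # here we take the per-interval maximum instead of the median.
    beat_chroma_max = librosa.feature.sync(chromagram,
                                           beat_frames,
                                           aggregate=np.max)

    # 12 chroma rows stacked on top of 13 MFCC + 13 delta rows
    print(beat_features.shape)  # (38, number of beat intervals)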