Scripts to learn a feature space for world music recordings as described in [1]. The code extracts features capturing rhythmic, timbral, melodic, and harmonic aspects of a collection of sound recordings, and transforms the data using Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), and Linear Discriminant Analysis (LDA). The country of a music recording is used as the ground truth for similarity. The learned feature space is evaluated by classifying recordings by country, and additionally with an outlier detection experiment.

For any questions please contact m.x.panteli{at}gmail.com.

This code is Copyright 2016 Maria Panteli, University of London, except where specified in individual source files.
Distributed under the GNU General Public License: see the file COPYING for details.
If you use this code for academic purposes please cite [1].
(See the CITATION file for a BibTeX reference.)

1) extract_features.py: Extracts the following descriptors: scale transform (rhythm), pitch bihistogram (melody), mel-frequency cepstral coefficients (timbre), and average chroma (harmony). Requires the path of each music recording to be listed in audiolist.txt.
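As a toy illustration of the harmony descriptor, the sketch below averages a pre-computed chromagram over time and normalises the result. The real script computes the chroma itself from audio; the function name and the L2 normalisation here are assumptions made for the example.

```python
import numpy as np

def average_chroma(chromagram):
    """Average a (12, n_frames) chromagram over time, then L2-normalise.

    Toy stand-in for the harmony descriptor in extract_features.py:
    the actual script computes chroma from the audio signal; here a
    pre-computed chromagram is assumed.
    """
    mean = chromagram.mean(axis=1)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean

rng = np.random.default_rng(0)
chroma = rng.random((12, 200))   # 12 pitch classes x 200 frames
vec = average_chroma(chroma)
```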

2) load_dataset.py: Splits the dataset into training, validation, and test sets, loads the pre-computed frame-based features, and applies some post-processing: the scale transform feature is averaged across low and high frequency bands, the pitch bihistogram is represented by an NMF decomposition with 2 components, and MFCCs and average chroma are summarised over 8-second windows with 0.5-second overlap. Outputs the post-processed frame-based features, the frame-based class labels, and the frame-based audio labels. Requires the path of each pre-computed rhythm, melody, timbre, and harmony feature to be listed in csvlist_rhythm.txt, csvlist_melody.txt, csvlist_timbre.txt, and csvlist_harmony.txt, respectively, in the same order as audiolist.txt.
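The window summarisation step can be sketched as below. The hop is left as a parameter because interpreting the 0.5-second figure as the hop size is an assumption, and the mean-only summary and function name are likewise illustrative rather than the script's exact behaviour.

```python
import numpy as np

def summarise_windows(frames, frame_rate, win_sec=8.0, hop_sec=0.5):
    """Summarise frame-based features (n_frames, n_dims) over sliding
    windows by taking the mean per window.

    A sketch of the MFCC/chroma post-processing in load_dataset.py;
    the real script may use richer statistics than the mean.
    """
    win = int(round(win_sec * frame_rate))
    hop = int(round(hop_sec * frame_rate))
    return np.array([frames[s:s + win].mean(axis=0)
                     for s in range(0, len(frames) - win + 1, hop)])

frames = np.arange(60, dtype=float).reshape(20, 3)  # 20 frames, 3 dims
# frame_rate=1 Hz, 8-frame windows, 2-frame hop -> 7 windows
summary = summarise_windows(frames, frame_rate=1, win_sec=8, hop_sec=2)
```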

3) map_space.py: Uses the training set to train the PCA, NMF, and LDA transformers, and the validation set to estimate the optimal number of components. Given the optimal number of components, it transforms the test set and evaluates the transformed space by predicting the country label of the frame-based features. A vote count over the predicted frame-based labels yields the recording-based label.
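The model-selection and voting logic can be sketched independently of the transformers. Here `fit_score` stands in for the expensive step of training a transformer with a given number of components and scoring the validation frames; both function names are hypothetical.

```python
from collections import Counter

def select_n_components(candidates, fit_score):
    """Return the component count with the best validation score.

    `fit_score` abstracts the expensive step in map_space.py: training
    a PCA/NMF/LDA transformer with that many components and measuring
    validation accuracy.
    """
    return max(candidates, key=fit_score)

def vote_recording_label(frame_predictions):
    """Majority vote over frame-level predicted labels to obtain a
    single recording-level label."""
    return Counter(frame_predictions).most_common(1)[0][0]

# dummy score peaking at 5 components, for illustration only
best_n = select_n_components([2, 5, 10], fit_score=lambda n: -abs(n - 5))
label = vote_recording_label(["FR", "DE", "FR", "FR", "DE"])
```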

4) results.py: Outputs the confusion matrix of the classification task, detects outliers, and prints a summary of the number of outliers per country.
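One standard recipe for flagging outliers in a learned feature space is a Mahalanobis-distance threshold. The sketch below uses the sample mean and covariance; it is not necessarily the exact criterion used in results.py, and the function name and threshold value are assumptions.

```python
import numpy as np

def mahalanobis_outliers(X, threshold):
    """Flag rows of X (n_samples, n_dims) whose squared Mahalanobis
    distance from the sample mean exceeds `threshold`.

    A common outlier-detection recipe; the criterion implemented in
    results.py may differ.
    """
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return d2 > threshold

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),   # tight inlier cluster
               np.array([[10.0, 10.0]])])            # one clear outlier
flags = mahalanobis_outliers(X, threshold=20.0)
```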

[1] M. Panteli, E. Benetos, and S. Dixon. Learning a Feature Space for Similarity in World Music. In Proceedings of the 17th International Society for Music Information Retrieval Conference, pages 538-544, 2016.