smacpy - simple-minded audio classifier in python
=================================================

Copyright (c) 2012 Dan Stowell and Queen Mary University of London
(incorporating code Copyright (c) 2009 Gyorgy Fazekas and Queen Mary University of London)
- for licence information see the file named COPYING.

This is a classifier that you can train on a set of labelled audio files, and then it predicts a label for further audio files.
It is designed with two main aims:

1. to provide a baseline against which to test more advanced audio classifiers;
2. to provide a simple code example of a classifier which people are free to build on.

It uses a workflow which was very common before the age of deep learning, and might still be useful for low-complexity audio tasks: take an audio clip as input, convert it frame-by-frame into MFCCs, and model the MFCC "bag of frames" with a GMM.

Requirements
------------
* Python 2.7 or later, or Python 3
* Python modules:
    * numpy
    * [librosa](http://librosa.org/)
    * [scikit-learn](http://scikit-learn.sourceforge.net/)

It has been tested on Python 2.7 and 3.8 (on Ubuntu).


Usage example 1: commandline
-------------
If you invoke the script from the commandline (e.g. "python smacpy.py") it will assume there is a folder called "wavs"
and inside that folder are multiple WAV files, each of which has an underscore in the filename,
and the class label is the text BEFORE the underscore.
It will train a model using the wavs, and then test it on the same wavs (dividing the collection up so it can do a "crossvalidated" test).

To train and test on different folders, you can run it like this:

    python smacpy.py -t trainwavs -T testwavs


Usage example 2: from your own code
-------------
In this hypothetical example we train on four audio files, labelled as either 'usa' or 'uk', and then test on a separate audio file of someone called hubert:

    from smacpy import Smacpy
    model = Smacpy("wavs/training", {'karen01.wav':'usa', 'john01.wav':'uk', 'steve02.wav':'usa', 'joe03.wav':'uk'})
    model.classify('wavs/testing/hubert01.wav')
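
If your files already follow the commandline naming convention (class label BEFORE the underscore), you may prefer to build the filename-to-label dictionary automatically rather than typing it out. The following is a minimal sketch, not part of smacpy itself: the helper name `labels_from_filenames` is hypothetical, and it assumes the label ends at the first underscore and that only files ending in `.wav` should be included.

```python
import os

def labels_from_filenames(folder):
    """Map each WAV filename in `folder` to the text before its first underscore.

    Hypothetical helper (not part of smacpy). Assumes the label ends at the
    FIRST underscore; files without an underscore are skipped.
    """
    labels = {}
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".wav"):
            continue  # ignore anything that is not a WAV file
        if "_" not in name:
            continue  # no underscore, so no label can be inferred
        labels[name] = name.split("_", 1)[0]
    return labels

# Possible usage with the constructor shown above (folder, then label dict):
#   from smacpy import Smacpy
#   model = Smacpy("wavs/training", labels_from_filenames("wavs/training"))
```

The resulting dictionary has the same shape as the hand-written one in the example above, so it can be passed as the second argument to `Smacpy`.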