annotate README.txt.linux @ 63:b084e87b83e4

* Add README files for the various platform packages
* Fix typo in cat file
* Return simpler key names from key detector
* Chromagram and constant Q default to unnormalized
* Permit up to 48 bpo in constant Q
author Chris Cannam <c.cannam@qmul.ac.uk>
date Thu, 07 Feb 2008 10:03:04 +0000
children e7c785094e7b
rev   line source
c@63 1
c@63 2 QM Vamp Plugins
c@63 3 ===============
c@63 4
c@63 5 Vamp audio feature extraction plugins from Queen Mary, University of London.
c@63 6 Version 1.4.
c@63 7
c@63 8 For more information about Vamp plugins, see http://www.vamp-plugins.org/
c@63 9 and http://www.sonicvisualiser.org/ .
c@63 10
c@63 11
c@63 12 To Install
c@63 13 ==========
c@63 14
c@63 15 This package contains plugins compiled for Linux on 32-bit x86
c@63 16 (Intel/AMD) systems using GNU libc v6.
c@63 17
c@63 18 To install them, copy the files
c@63 19
c@63 20 qm-vamp-plugins.so and
c@63 21 qm-vamp-plugins.cat
c@63 22
c@63 23 to one of the directories
c@63 24
c@63 25 /usr/local/lib/vamp/
c@63 26 /usr/lib/vamp/ or
c@63 27 $HOME/vamp/
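
The copy step can be sketched in Python as follows. This is only an illustration: the function name install_plugins is hypothetical and not part of the package, and the commented call assumes the two files are in the current directory and that the per-user directory $HOME/vamp is the chosen destination.

```python
import os
import shutil

# The two files named in the installation instructions above.
PLUGIN_FILES = ["qm-vamp-plugins.so", "qm-vamp-plugins.cat"]

def install_plugins(src_dir, dest_dir):
    """Copy the plugin library and category file from src_dir to dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)  # create the directory if missing
    copied = []
    for name in PLUGIN_FILES:
        target = os.path.join(dest_dir, name)
        shutil.copy(os.path.join(src_dir, name), target)
        copied.append(target)
    return copied

# Example per-user install (the system-wide directories usually need root):
# install_plugins(".", os.path.join(os.path.expanduser("~"), "vamp"))
```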
c@63 28
c@63 29
c@63 30 License
c@63 31 =======
c@63 32
c@63 33 These plugins are provided in binary form only. You may install and
c@63 34 use the plugin binaries without fee for any purpose commercial or
c@63 35 non-commercial. You may redistribute the plugin binaries provided you
c@63 36 do so without fee and you retain this README file with your
c@63 37 distribution. You may not bundle these plugins with a commercial
c@63 38 product or distribute them on commercial terms. If you wish to
c@63 39 arrange commercial licensing terms, please contact the Centre for
c@63 40 Digital Music at Queen Mary, University of London.
c@63 41
c@63 42 Copyright (c) 2006-2008 Queen Mary, University of London. All rights
c@63 43 reserved except as described above.
c@63 44
c@63 45
c@63 46 New In This Release
c@63 47 ===================
c@63 48
c@63 49 This release contains a new plugin to estimate timbral and rhythmic
c@63 50 similarity between multiple audio tracks, a plugin for structural
c@63 51 segmentation of music audio, and a Mel-frequency cepstral coefficients
c@63 52 calculation plugin.
c@63 53
c@63 54 This release also includes significant updates to the existing key
c@63 55 detector, tempo tracker, and chromagram plugins.
c@63 56
c@63 57
c@63 58 Plugins Included
c@63 59 ================
c@63 60
c@63 61 This plugin set includes the following plugins:
c@63 62
c@63 63 * Note onset detector
c@63 64
c@63 65 * Beat tracker and tempo estimator
c@63 66
c@63 67 * Key estimator and tonal change detector
c@63 68
c@63 69 * Segmenter, to divide a track into a consistent sequence of segments
c@63 70
c@63 71 * Timbral and rhythmic similarity between audio tracks
c@63 72
c@63 73 * Chromagram, constant-Q spectrogram, and MFCC calculation plugins
c@63 74
c@63 75 More details about the plugins follow.
c@63 76
c@63 77
c@63 78 Note Onset Detector
c@63 79 -------------------
c@63 80
c@63 81 Identifier: qm-onsetdetector
c@63 82 Authors: Chris Duxbury, Juan Pablo Bello and Christian Landone
c@63 83 Category: Time > Onsets
c@63 84
c@63 85 References: C. Duxbury, J. P. Bello, M. Davies and M. Sandler.
c@63 86 Complex domain Onset Detection for Musical Signals.
c@63 87 In Proceedings of the 6th Conference on Digital Audio
c@63 88 Effects (DAFx-03). London, UK. September 2003.
c@63 89
c@63 90 D. Stowell and M. D. Plumbley.
c@63 91 Adaptive whitening for improved real-time audio onset
c@63 92 detection.
c@63 93 In Proceedings of the International Computer Music
c@63 94 Conference (ICMC'07), August 2007.
c@63 95
c@63 96 D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor.
c@63 97 Drum Source Separation using Percussive Feature
c@63 98 Detection and Spectral Modulation.
c@63 99 In Proceedings of the Irish Signals and Systems Conference (ISSC), 2005.
c@63 100
c@63 101 The Note Onset Detector plugin analyses a single channel of audio and
c@63 102 estimates the locations of note onsets within the music.
c@63 103
c@63 104 It calculates an onset likelihood function for each spectral frame,
c@63 105 and picks peaks in a smoothed version of this function. The plugin is
c@63 106 non-causal, returning all results at the end of processing.
c@63 107
c@63 108 It has three outputs: the note onset positions, the onset detection
c@63 109 function used in estimating onset positions, and a smoothed version of
c@63 110 the detection function that is used in the peak-picking phase.
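
The general idea of picking peaks in a smoothed detection function can be sketched as below. This is a simplified illustration, not the plugin's own algorithm: the moving-average smoother, the threshold rule, and all function and parameter names are assumptions made for the example.

```python
def smooth(values, half_width=2):
    """Centred moving average; the window shrinks at the edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def pick_peaks(values, threshold=0.0):
    """Indices above threshold that exceed the previous value and are
    not less than the next one (first and last frames are skipped)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > threshold
            and values[i] > values[i - 1]
            and values[i] >= values[i + 1]]

def detect_onsets(detection_function, step_seconds, half_width=2, threshold=0.0):
    """Smooth the per-frame detection function, pick peaks, and convert
    frame indices to times in seconds."""
    smoothed = smooth(detection_function, half_width)
    return [i * step_seconds for i in pick_peaks(smoothed, threshold)]
```

Because the peak test looks at the frame after each candidate, a sketch like this also needs the whole function before it can report the final onsets, which is consistent with the non-causal behaviour described above.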
c@63 111
c@63 112
c@63 113 Tempo and Beat Tracker
c@63 114 ----------------------
c@63 115
c@63 116 Identifier: qm-tempotracker
c@63 117 Authors: Matthew Davies and Christian Landone
c@63 118 Category: Time > Tempo
c@63 119
c@63 120 References: M. E. P. Davies and M. D. Plumbley.
c@63 121 Context-dependent beat tracking of musical audio.
c@63 122 In IEEE Transactions on Audio, Speech and Language
c@63 123 Processing. Vol. 15, No. 3, pp. 1009-1020, 2007.
c@63 124
c@63 125 M. E. P. Davies and M. D. Plumbley.
c@63 126 Beat Tracking With A Two State Model.
c@63 127 In Proceedings of the IEEE International Conference
c@63 128 on Acoustics, Speech and Signal Processing (ICASSP 2005),
c@63 129 Vol. 3, pp. 241-244, Philadelphia, USA, March 19-23, 2005.
c@63 130
c@63 131 The Tempo and Beat Tracker plugin analyses a single channel of audio
c@63 132 and estimates the locations of metrical beats and the resulting tempo.
c@63 133
c@63 134 It has three outputs: the beat positions, an ongoing estimate of tempo
c@63 135 where available, and the onset detection function used in estimating
c@63 136 beat positions.
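
The relationship between beat positions and tempo can be illustrated with a short sketch. It is not how the plugin derives its tempo output; it only shows how a host-side script might compute a single tempo figure from the returned beat times. The function name and the choice of the median are assumptions.

```python
from statistics import median

def tempo_from_beats(beat_times):
    """Estimate tempo in beats per minute from beat times in seconds."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if not intervals:
        return None  # fewer than two beats: no tempo available
    # The median inter-beat interval is less sensitive to a few
    # misplaced beats than the mean.
    return 60.0 / median(intervals)
```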
c@63 137
c@63 138
c@63 139 Key Detector
c@63 140 ------------
c@63 141
c@63 142 Identifier: qm-keydetector
c@63 143 Authors: Katy Noland and Christian Landone
c@63 144 Category: Key and Tonality
c@63 145
c@63 146 References: K. Noland and M. Sandler.
c@63 147 Signal Processing Parameters for Tonality Estimation.
c@63 148 In Proceedings of Audio Engineering Society 122nd
c@63 149 Convention, Vienna, 2007.
c@63 150
c@63 151 The Key Detector plugin analyses a single channel of audio and
c@63 152 continuously estimates the key of the music.
c@63 153
c@63 154 It has four outputs: the tonic pitch of the key; a major or minor mode
c@63 155 flag; the key (combining the tonic and major/minor into a single
c@63 156 value); and a key strength plot which reports the degree to which the
c@63 157 chroma vector extracted from each input block correlates to the stored
c@63 158 key profiles for each major and minor key. The key profiles are drawn
c@63 159 from analysis of Book I of the Well-Tempered Clavier by J. S. Bach,
c@63 160 recorded at A=440 equal temperament.
c@63 161
c@63 162 The outputs have the values:
c@63 163
c@63 164 Tonic pitch: C = 1, C#/Db = 2, ..., B = 12
c@63 165
c@63 166 Major/minor mode: major = 0, minor = 1
c@63 167
c@63 168 Key: C major = 1, C#/Db major = 2, ..., B major = 12
c@63 169 C minor = 13, C#/Db minor = 14, ..., B minor = 24
c@63 170
c@63 171 Key Strength Plot: 25 separate bins per feature, separated into 1-12
c@63 172 (major from C) and 14-25 (minor from C). Bin 13 is unused, not
c@63 173 for superstitious reasons but simply so as to delimit the major
c@63 174 and minor areas if they are displayed on a single plot by the
c@63 175 plugin host. Higher bin values show increased correlation with
c@63 176 the key profile for that key.
c@63 177
c@63 178 The outputs are also labelled with pitch or key as text.
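
The numeric encodings above can be decoded with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the pitch spellings between C#/Db and B follow the usual chromatic order, and the plugin's own text labels may be spelled differently. Bin numbers are 1-based, as in the description above.

```python
PITCH_NAMES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
               "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

def tonic_name(tonic):
    """Tonic pitch output: C = 1 ... B = 12."""
    if not 1 <= tonic <= 12:
        raise ValueError("tonic must be in 1..12")
    return PITCH_NAMES[tonic - 1]

def key_name(key):
    """Key output: 1..12 are major keys from C, 13..24 minor keys from C."""
    if not 1 <= key <= 24:
        raise ValueError("key must be in 1..24")
    mode = "major" if key <= 12 else "minor"
    return f"{PITCH_NAMES[(key - 1) % 12]} {mode}"

def strength_bin_key(bin_number):
    """Key strength bin (1..25) to key name; None for the unused bin 13."""
    if not 1 <= bin_number <= 25:
        raise ValueError("bin number must be in 1..25")
    if bin_number == 13:
        return None
    # Bins 14..25 correspond to key values 13..24.
    return key_name(bin_number if bin_number < 13 else bin_number - 1)
```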
c@63 179
c@63 180
c@63 181 Tonal Change
c@63 182 ------------
c@63 183
c@63 184 Identifier: qm-tonalchange
c@63 185 Authors: Chris Harte and Martin Gasser
c@63 186 Category: Key and Tonality
c@63 187
c@63 188 References: C. A. Harte, M. Gasser, and M. Sandler.
c@63 189 Detecting harmonic change in musical audio.
c@63 190 In Proceedings of the 1st ACM workshop on Audio and Music
c@63 191 Computing Multimedia, Santa Barbara, 2006.
c@63 192
c@63 193 C. A. Harte and M. Sandler.
c@63 194 Automatic chord identification using a quantised chromagram.
c@63 195 In Proceedings of the 118th Convention of the Audio
c@63 196 Engineering Society, Barcelona, Spain, May 28-31 2005.
c@63 197
c@63 198 The Tonal Change plugin analyses a single channel of audio, detecting
c@63 199 harmonic changes such as chord boundaries.
c@63 200
c@63 201 It has three outputs: a representation of the musical content in a
c@63 202 six-dimensional tonal space onto which the algorithm maps 12-bin
c@63 203 chroma vectors extracted from the audio; a function representing the
c@63 204 estimated likelihood of a tonal change occurring in each spectral
c@63 205 frame; and the resulting estimated positions of tonal changes.
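
A mapping from a 12-bin chroma vector into a six-dimensional tonal space can be sketched as below. The angles and radii follow the tonal centroid formulation as recalled from the Harte, Gasser and Sandler paper cited above (circles of fifths, minor thirds, and major thirds). Treat them, the assumption that bin 0 is C, and the simple distance function as illustrative; the plugin's implementation may differ.

```python
import math

# Radii and per-pitch-class angles for the three circles
# (fifths, minor thirds, major thirds). Assumed values.
RADII = (1.0, 1.0, 0.5)
ANGLES = (7 * math.pi / 6, 3 * math.pi / 2, 2 * math.pi / 3)

def tonal_centroid(chroma):
    """Map a 12-bin chroma vector to a 6-D point, normalised by its L1 norm."""
    if len(chroma) != 12:
        raise ValueError("expected a 12-bin chroma vector")
    total = sum(abs(c) for c in chroma)
    if total == 0:
        return [0.0] * 6  # silent frame: place it at the origin
    centroid = []
    for radius, angle in zip(RADII, ANGLES):
        s = sum(c * math.sin(l * angle) for l, c in enumerate(chroma))
        k = sum(c * math.cos(l * angle) for l, c in enumerate(chroma))
        centroid.append(radius * s / total)
        centroid.append(radius * k / total)
    return centroid

def centroid_distance(a, b):
    """Euclidean distance between two centroids; larger values suggest
    a bigger harmonic change between the two frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```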
c@63 206
c@63 207
c@63 208 Segmenter
c@63 209 ---------
c@63 210
c@63 211 Identifier: qm-segmenter
c@63 212 Authors: Mark Levy
c@63 213 Category: Classification
c@63 214
c@63 215 References: M. Levy and M. Sandler.
c@63 216 Structural segmentation of musical audio by constrained
c@63 217 clustering.
c@63 218 IEEE Transactions on Audio, Speech, and Language Processing,
c@63 219 February 2008.
c@63 220
c@63 221 The Segmenter plugin divides a single channel of music up into
c@63 222 structurally consistent segments. Its single output contains a
c@63 223 numeric value (the segment type) for each moment at which a new
c@63 224 segment starts.
c@63 225
c@63 226 For music with clearly tonally distinguishable sections such as verse,
c@63 227 chorus, etc., the segments with the same type may be expected to be
c@63 228 similar to one another in some structural sense (e.g. repetitions of
c@63 229 the chorus).
c@63 230
c@63 231 The type of feature used in segmentation can be selected using the
c@63 232 Feature Type parameter. The default Hybrid (Constant-Q) is generally
c@63 233 effective for modern studio recordings, while the Chromatic option may
c@63 234 be preferable for live, acoustic, or older recordings, in which
c@63 235 repeated sections may be less consistent in sound. Also available is
c@63 236 a timbral (MFCC) feature, which is more likely to result in
c@63 237 classification by instrumentation rather than musical content.
c@63 238
c@63 239 Note that this plugin does a substantial amount of processing after
c@63 240 receiving all of the input audio data, before it produces any results.
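
Since the output marks only the start of each segment, a host-side script may want to turn it into explicit intervals. The sketch below assumes the features are available as (start time in seconds, segment type) pairs and that the track duration is known. Both the representation and the function names are illustrative.

```python
def segments_from_features(features, duration):
    """Convert (start_seconds, segment_type) pairs into
    (start, end, segment_type) tuples; the last segment ends at duration."""
    ordered = sorted(features)
    segments = []
    for i, (start, seg_type) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else duration
        segments.append((start, end, seg_type))
    return segments

def group_by_type(segments):
    """Collect the intervals belonging to each segment type."""
    groups = {}
    for start, end, seg_type in segments:
        groups.setdefault(seg_type, []).append((start, end))
    return groups
```

Grouping by type gives, for each type, the list of intervals that the plugin considers structurally similar.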
c@63 241
c@63 242
c@63 243 Similarity
c@63 244 ----------
c@63 245
c@63 246 Identifier: qm-similarity
c@63 247 Authors: Mark Levy, Kurt Jacobson and Chris Cannam
c@63 248 Category: Classification
c@63 249
c@63 250 References: M. Levy and M. Sandler.
c@63 251 Lightweight measures for timbral similarity of musical audio.
c@63 252 In Proceedings of the 1st ACM workshop on Audio and Music
c@63 253 Computing Multimedia, Santa Barbara, 2006.
c@63 254
c@63 255 K. Jacobson.
c@63 256 A Multifaceted Approach to Music Similarity.
c@63 257 In Proceedings of the Seventh International Conference on
c@63 258 Music Information Retrieval (ISMIR), 2006.
c@63 259
c@63 260 The Similarity plugin treats each channel of its audio input as a
c@63 261 separate "track", and estimates how similar the tracks are to one
c@63 262 another using a selectable similarity measure.
c@63 263
c@63 264 The plugin also returns the intermediate data used as a basis of the
c@63 265 similarity measure; it can therefore be used on a single channel of
c@63 266 input (with the resulting intermediate data then being applied in some
c@63 267 other similarity or clustering algorithm, for example) if desired, as
c@63 268 well as with multiple inputs.
c@63 269
c@63 270 The underlying audio features used for the similarity measure can be
c@63 271 selected using the Feature Type parameter. The available features are
c@63 272 Timbre (in which the distance between tracks is a symmetrised
c@63 273 Kullback-Leibler divergence between Gaussian-modelled MFCC means and
c@63 274 variances across each track); Chroma (KL divergence of mean chroma
c@63 275 histogram); Rhythm (cosine distance between "beat spectrum" measures
c@63 276 derived from a short sampled section of the track); and combined
c@63 277 "Timbre and Rhythm" and "Chroma and Rhythm".
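
Two of the distance measures named above can be sketched as follows. The example assumes each track is modelled by a single Gaussian with diagonal covariance (a mean and a variance per coefficient, all variances positive). Function names are hypothetical and any scaling the plugin applies is not reproduced.

```python
import math

def symmetrised_kl(mean_a, var_a, mean_b, var_b):
    """KL(a||b) + KL(b||a) for two diagonal-covariance Gaussians.
    The log-determinant terms cancel in the symmetrised sum."""
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        diff2 = (ma - mb) ** 2
        total += va / vb + vb / va + diff2 * (1.0 / va + 1.0 / vb) - 2.0
    return 0.5 * total

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # convention chosen here for an all-zero vector
    return 1.0 - dot / norm
```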
c@63 278
c@63 279 The plugin has six outputs: a matrix of the distances between input
c@63 280 channels; a vector containing the distances between the first input
c@63 281 channel and each of the input channels; a pair of vectors containing
c@63 282 the indices of the input channels in the order of their similarity to
c@63 283 the first input channel, and the distances between the first input
c@63 284 channel and each of those channels; the means of the underlying
c@63 285 feature bins (MFCCs or chroma); the variances of the underlying
c@63 286 feature bins; and the beat spectra used for the rhythmic feature.
c@63 287
c@63 288 Because Vamp does not have the capability to return features in matrix
c@63 289 form explicitly, the matrix output is returned as a series of vector
c@63 290 features timestamped at one-second intervals. Likewise, the
c@63 291 underlying feature outputs contain one vector feature per input
c@63 292 channel, timestamped at one-second intervals (so the feature for the
c@63 293 first channel is at time 0, and so on). Examining the features that
c@63 294 the plugin actually returns, when run on some test data, may make this
c@63 295 arrangement clearer.
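
Reassembling the distance matrix from the timestamped vector features might look like the sketch below. It assumes the features have been collected as (timestamp in seconds, vector) pairs, with the row for channel n at time n. The representation and function names are illustrative, and the plugin's own sorted outputs may treat the reference channel differently.

```python
def matrix_from_features(features):
    """Rebuild the matrix: the feature at time n seconds becomes row n."""
    rows = {}
    for timestamp, vector in features:
        rows[int(round(timestamp))] = list(vector)
    return [rows[i] for i in sorted(rows)]

def rank_by_similarity(matrix, reference=0):
    """Channel indices ordered by increasing distance from the reference
    channel, together with the corresponding distances."""
    distances = matrix[reference]
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return order, [distances[i] for i in order]
```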
c@63 296
c@63 297 Note that the underlying feature values are only returned if the
c@63 298 relevant feature type is selected. That is, the means and variances
c@63 299 outputs are valid provided the pure rhythm feature is not selected;
c@63 300 the beat spectra output is valid provided rhythm is included in the
c@63 301 selected feature type.
c@63 302
c@63 303
c@63 304 Constant-Q Spectrogram
c@63 305 ----------------------
c@63 306
c@63 307 Identifier: qm-constantq
c@63 308 Authors: Christian Landone
c@63 309 Category: Visualisation
c@63 310
c@63 311 References: J. Brown.
c@63 312 Calculation of a constant Q spectral transform.
c@63 313 Journal of the Acoustical Society of America, 89(1):
c@63 314 425-434, 1991.
c@63 315
c@63 316 The Constant-Q Spectrogram plugin calculates a spectrogram based on a
c@63 317 short-time windowed constant Q spectral transform. This is a
c@63 318 spectrogram in which the ratio of centre frequency to resolution is
c@63 319 constant for each frequency bin. The frequency bins correspond to the
c@63 320 frequencies of "musical notes" rather than being linearly spaced in
c@63 321 frequency as they are for the conventional DFT spectrogram.
c@63 322
c@63 323 The pitch range and the number of frequency bins per octave may be
c@63 324 adjusted using the plugin's parameters. Note that the plugin's
c@63 325 preferred step and block sizes depend on these parameters, and the
c@63 326 plugin will not accept any other block size.
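
The constant ratio of centre frequency to resolution follows from spacing the bins geometrically. The sketch below computes bin centre frequencies and the Q factor for a given number of bins per octave, using Q = 1 / (2^(1/b) - 1) as in Brown's formulation. The function and parameter names are hypothetical and are not the plugin's parameter identifiers.

```python
def q_factor(bins_per_octave):
    """Ratio of centre frequency to bandwidth for geometrically spaced bins."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

def centre_frequencies(min_freq, max_freq, bins_per_octave):
    """Centre frequencies min_freq * 2^(k/b) for k = 0, 1, ... up to max_freq."""
    freqs = []
    k = 0
    while True:
        f = min_freq * 2.0 ** (k / bins_per_octave)
        if f > max_freq:
            break
        freqs.append(f)
        k += 1
    return freqs

def bandwidth(freq, bins_per_octave):
    """Frequency resolution of the bin centred at freq."""
    return freq / q_factor(bins_per_octave)
```

With 12 bins per octave, each bin is one semitone wide and Q is roughly 16.8. Raising the number of bins per octave increases Q and narrows every bin.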
c@63 327
c@63 328
c@63 329 Chromagram
c@63 330 ----------
c@63 331
c@63 332 Identifier: qm-chromagram
c@63 333 Authors: Christian Landone
c@63 334 Category: Visualisation
c@63 335
c@63 336 The Chromagram plugin calculates a constant Q spectral transform (as
c@63 337 above) and then wraps the frequency bin values into a single octave,
c@63 338 with each bin containing the sum of the magnitudes from the
c@63 339 corresponding bin in all octaves. The number of values in each
c@63 340 feature vector returned by the plugin is therefore the same as the
c@63 341 number of bins per octave configured for the underlying constant Q
c@63 342 transform.
c@63 343
c@63 344 The pitch range and the number of frequency bins per octave for the
c@63 345 transform may be adjusted using the plugin's parameters. Note that
c@63 346 the plugin's preferred step and block sizes depend on these
c@63 347 parameters, and the plugin will not accept any other block size.
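
The wrapping step can be sketched in a few lines. The example assumes the constant Q magnitudes for one frame are given as a flat list ordered from the lowest bin upward, with the first bin aligned to the start of an octave. The function name is hypothetical.

```python
def fold_to_chroma(cq_magnitudes, bins_per_octave):
    """Sum constant Q magnitudes across octaves into a single octave."""
    chroma = [0.0] * bins_per_octave
    for k, magnitude in enumerate(cq_magnitudes):
        chroma[k % bins_per_octave] += magnitude
    return chroma
```

The returned list always has bins_per_octave entries, whatever the pitch range of the underlying transform.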
c@63 348
c@63 349
c@63 350 Mel-Frequency Cepstral Coefficients
c@63 351 -----------------------------------
c@63 352
c@63 353 Identifier: qm-mfcc
c@63 354 Authors: Nicolas Chetry and Chris Cannam
c@63 355 Category: Low Level Features
c@63 356
c@63 357 References: B. Logan.
c@63 358 Mel-Frequency Cepstral Coefficients for Music Modeling.
c@63 359 In Proceedings of the First International Symposium on Music
c@63 360 Information Retrieval (ISMIR), 2000.
c@63 361
c@63 362 The Mel-Frequency Cepstral Coefficients plugin calculates MFCCs from a
c@63 363 single channel of audio, returning one MFCC vector from each process
c@63 364 call. It also returns the overall means of the coefficient values
c@63 365 across the length of the audio input, as a separate output at the end
c@63 366 of processing.
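
Producing overall means only at the end of processing implies accumulating the coefficient vectors as they arrive. A minimal sketch of such an accumulator follows. The class name is hypothetical and this is not the plugin's code.

```python
class RunningMean:
    """Accumulates per-coefficient sums so the mean is available at the end."""

    def __init__(self):
        self.count = 0
        self.sums = None

    def add(self, vector):
        """Add one coefficient vector (for example, one per process call)."""
        if self.sums is None:
            self.sums = [0.0] * len(vector)
        if len(vector) != len(self.sums):
            raise ValueError("all vectors must have the same length")
        for i, value in enumerate(vector):
            self.sums[i] += value
        self.count += 1

    def mean(self):
        """Mean of each coefficient over all vectors added so far."""
        if self.count == 0:
            return []
        return [s / self.count for s in self.sums]
```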
c@63 367