Chris Cannam, 2014-10-16 02:06 PM

h1. About Sonic Annotator

{{>toc}}

Sonic Annotator is a batch tool for feature extraction and annotation of audio files. The audio to be processed can be on the local filesystem or available over HTTP or FTP. It will run available "Vamp plugins":http://vamp-plugins.org/ on a wide range of audio file types, and can write the results in a selection of formats.

h2. A Quick Tutorial

To use Sonic Annotator, you need to tell it three things: what audio files to extract features from; what features to extract; and how and where to write the results. You can also optionally tell it to summarise the features.

h3. 1. What audio files to extract features from

Sonic Annotator accepts a list of audio files on the command line. Any argument that is not understood as a supported command-line option will be taken to be the name of an audio file. Any number of files may be listed.

Several common audio file formats are supported, including MP3, Ogg, and a number of PCM formats such as WAV and AIFF. AAC is supported on OS X only, and only if not DRM protected. WMA is not supported.

File paths do not have to be local; you can also provide remote HTTP or FTP URLs for Sonic Annotator to retrieve.

Sonic Annotator also accepts the names of playlist files (with <code>.m3u</code> extension) and will process every file found in the playlist.

A limitation of the current version of Sonic Annotator on Windows is that it requires forward slash ("/") as the path separator instead of backslash ("\"), to avoid writing incorrect URLs into the output in RDF writer mode. For example, @C:/audio/testfile.wav@.
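
If file paths are generated by a script, one way to meet the forward-slash requirement is to normalise them before they reach the command line. The following is a minimal Python sketch using only the standard library; the helper name @to_forward_slashes@ is hypothetical and is not part of Sonic Annotator.

<pre><code class="python">
from pathlib import PureWindowsPath


def to_forward_slashes(path):
    """Return a Windows path written with "/" as the separator."""
    # PureWindowsPath parses backslash-separated paths on any platform.
    return PureWindowsPath(path).as_posix()


print(to_forward_slashes(r"C:\audio\testfile.wav"))  # prints C:/audio/testfile.wav
</code></pre>

The converted string can then be passed to Sonic Annotator as an ordinary file argument.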

Finally, you can provide a local directory path instead of a file, together with the @-r@ (recursive) option, for Sonic Annotator to process every audio file found in that directory or any of its subdirectories.

h3. 2. What features to extract

Sonic Annotator applies "transforms" to its input audio files, where a transform (in this terminology) consists of a Vamp plugin together with a certain set of parameters and a specified execution context including step and block size, sample rate, etc.

To use Sonic Annotator normally, you need to create a file that describes the properties of the transform you want to apply, and then tell Sonic Annotator about it by supplying the transform's filename on the command line with the <code>-t</code> option. There is also a quick way of applying the default configuration of a plugin: see "Default transforms" below.

Transforms are usually described in RDF, following the transform part of the "Vamp plugin ontology":http://purl.org/ontology/vamp/. A transform may use any Vamp plugin that is currently installed and available on the system.

You can obtain a list of available plugin outputs by running Sonic Annotator with the @-l@ or @--list@ option:

<pre>
  $ sonic-annotator -l
  vamp:vamp-example-plugins:amplitudefollower:amplitude
  vamp:vamp-example-plugins:fixedtempo:acf
  vamp:vamp-example-plugins:fixedtempo:detectionfunction
  vamp:vamp-example-plugins:fixedtempo:filtered_acf
  vamp:vamp-example-plugins:fixedtempo:tempo
  vamp:vamp-example-plugins:fixedtempo:candidates
  vamp:vamp-example-plugins:percussiononsets:detectionfunction
  vamp:vamp-example-plugins:percussiononsets:onsets
  vamp:vamp-example-plugins:powerspectrum:powerspectrum
  vamp:vamp-example-plugins:spectralcentroid:linearcentroid
  vamp:vamp-example-plugins:spectralcentroid:logcentroid
  vamp:vamp-example-plugins:zerocrossing:counts
  vamp:vamp-example-plugins:zerocrossing:zerocrossings
  $
</pre>

You can obtain a skeleton transform description for one of these plugin outputs with the @-s@ or @--skeleton@ option:

<pre>
  $ sonic-annotator -s vamp:vamp-example-plugins:fixedtempo:tempo
  @prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .
  @prefix vamp:     <http://purl.org/ontology/vamp/> .
  @prefix :         <#> .

  :transform a vamp:Transform ;
      vamp:plugin <http://vamp-plugins.org/rdf/plugins/vamp-example-plugins#fixedtempo> ;
      vamp:step_size "64"^^xsd:int ;
      vamp:block_size "256"^^xsd:int ;
      vamp:parameter_binding [
          vamp:parameter [ vamp:identifier "maxbpm" ] ;
          vamp:value "190"^^xsd:float ;
      ] ;
      vamp:parameter_binding [
          vamp:parameter [ vamp:identifier "maxdflen" ] ;
          vamp:value "10"^^xsd:float ;
      ] ;
      vamp:parameter_binding [
          vamp:parameter [ vamp:identifier "minbpm" ] ;
          vamp:value "50"^^xsd:float ;
      ] ;
      vamp:output <http://vamp-plugins.org/rdf/plugins/vamp-example-plugins#fixedtempo_output_tempo> .
  $
</pre>

The output of this example is an RDF/Turtle document describing the default settings for the Tempo output of the Fixed Tempo Estimator plugin in the Vamp plugin SDK.

(The exact format of the RDF printed may differ, for example if the plugin's RDF description is not installed and so its "home" URI is not known, but the result should be functionally equivalent to this.)

You can then run this transform by saving the RDF to a file and specifying that file with @-t@ or @--transform@. You will also need to supply at least one writer option (see "How and where to write the results" below for more about those).

<pre>
  $ sonic-annotator -s vamp:vamp-example-plugins:fixedtempo:tempo > test.n3
  $ sonic-annotator -t test.n3 audio.wav -w csv --csv-stdout
  (... logging output on stderr, then ...)
  "audio.wav",0.002902494,5.196916099,68.7916,"68.8 bpm"
  $
</pre>

The single line of output above consists of the audio file name, the timestamp and duration for a single feature, the value of that feature, and the feature's label. The value here is the estimated tempo, in bpm, of the given region of time from that file; the plugin in question performs a single tempo estimation and nothing else.
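
As an illustration of how such a line might be consumed by another program, here is a small Python sketch that splits it into named fields with the standard @csv@ module. It assumes the five-column layout of this particular example (file name, timestamp, duration, one value, label); features with several bin values or without a duration would need a different mapping. The function name is hypothetical.

<pre><code class="python">
import csv
import io


def parse_feature_row(line):
    """Split one CSV writer row (with a leading file name column) into named fields."""
    fields = next(csv.reader(io.StringIO(line)))
    return {
        "file": fields[0],
        "timestamp": float(fields[1]),  # seconds
        "duration": float(fields[2]),   # seconds
        "value": float(fields[3]),      # here: tempo in bpm
        "label": fields[4],
    }


row = parse_feature_row('"audio.wav",0.002902494,5.196916099,68.7916,"68.8 bpm"')
print(row["value"], row["label"])  # prints: 68.7916 68.8 bpm
</code></pre>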

To run more than one transform on the same set of audio files, just put more than one transform description in the same RDF file, or give the @-t@ option more than once with separate transform description files. Remember that if you want to specify more than one transform in the same file, they will need to have distinct URIs (that is, the @:transform@ part of the example above, which may be any arbitrary name, must be distinct for each described transform). You can also put many transform filenames in a list file and use the @-T@ or @--transforms@ option to tell Sonic Annotator to load all of those.

h4. Default transforms

A quicker way to run a single plugin in its default configuration is to use the @-d@ (default) option:

<pre>
  $ sonic-annotator -d vamp:vamp-example-plugins:fixedtempo:tempo audio.wav -w csv --csv-stdout
  (... some log output on stderr, then ...)
  "audio.wav",0.002902494,5.196916099,68.7916,"68.8 bpm"
  $
</pre>

Although handy for experimentation, the @-d@ option is inadvisable in any "production" situation because the plugin configuration is not guaranteed to be the same each time (for example if an updated version of a plugin changes some of its defaults). It's better to save a well-defined transform to a file and refer to that, even if it is just the transform created by the skeleton generator (the @-s@ option).

h4. Transform configuration

The following description applies to transforms expressed in RDF/Turtle format. It is also possible to describe them in an XML format, which is not documented here.

h5. Plugin identifier and output

The plugin itself, and the output to obtain features from, are specified using the @vamp:plugin@ and @vamp:output@ properties. Normally these will take URI values matching those given in the RDF file distributed along with the plugin. The @--skeleton@ option will generate valid URIs for any plugin and output that is currently installed.

h5. Plugin parameters

These associate a parameter ID, as a plain literal string, with a value, which is also a literal but always contains a floating-point number. (Vamp plugin parameters are always numeric: boolean parameters, for example, take the value 1 for on and 0 for off.) An example is shown in the skeleton output above.
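
When transform files are generated programmatically, the parameter bindings can be rendered from a simple mapping. The Python sketch below follows the layout of the skeleton output above; the function name @parameter_bindings@ is hypothetical, and the code only builds text, so it does not check that a parameter actually exists in the plugin.

<pre><code class="python">
def parameter_bindings(params):
    """Render vamp:parameter_binding blocks for a dict of parameter id -> value."""
    blocks = []
    for identifier, value in params.items():
        # Vamp parameters are numeric; float(True) is 1.0 and float(False) is 0.0.
        number = float(value)
        blocks.append(
            "    vamp:parameter_binding [\n"
            f'        vamp:parameter [ vamp:identifier "{identifier}" ] ;\n'
            f'        vamp:value "{number:g}"^^xsd:float ;\n'
            "    ] ;"
        )
    return "\n".join(blocks)


print(parameter_bindings({"maxbpm": 190, "minbpm": 50}))
</code></pre>

The printed fragment is meant to be pasted between the @vamp:plugin@ line and the closing @vamp:output@ line of a transform description.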

h5. Program

A small number of plugins support "programs", a means of setting several parameters at once according to a label. For example, an onset detector might have programs for instruments with "hard" or "soft" onsets, each of which sets a number of separate parameters. You can set these using the @vamp:program@ property.

h5. Sample rate

This is the audio sample rate (in Hz) to which input audio will be resampled before being presented to the plugin. The default is to use the sample rate of the first audio file.

h5. Start time and duration

These are newly supported in Sonic Annotator v1.1. To apply a plugin to only part of an input audio file, you can provide @vamp:start@ and @vamp:duration@ properties specifying the range to use as input. These take rather fiddly @xsd:duration@ format values:

<pre>
:transform0 a vamp:Transform;
	vamp:plugin examples:percussiononsets ;
	vamp:output examples:percussiononsets_output_detectionfunction ;
	vamp:start "PT2.0S"^^xsd:duration ;
	vamp:duration "PT2.0S"^^xsd:duration .
</pre>
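
Because the @xsd:duration@ syntax is easy to get wrong by hand, a script that writes transform files might format the values with a small helper. The following Python sketch expresses everything in seconds, as in the example above; the function name is hypothetical, and the nine-decimal precision is an assumption chosen for illustration.

<pre><code class="python">
def seconds_to_xsd_duration(seconds):
    """Format a non-negative number of seconds as an xsd:duration literal."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Fixed-point formatting avoids exponent notation, which xsd:duration forbids.
    text = f"{seconds:.9f}".rstrip("0")
    if text.endswith("."):
        text += "0"
    return f'"PT{text}S"^^xsd:duration'


print(seconds_to_xsd_duration(2))  # prints "PT2.0S"^^xsd:duration
</code></pre>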

Note that the actual start and end of the audio passed to the plugin will depend on Sonic Annotator's internal processing block size.

There will usually be some additional samples included at the end, as the duration is usually not an exact multiple of the internal processing block size and the actual audio data provided is always rounded up to that.

The start time will be matched exactly if only a single transform is being run, or if all transforms share the same start time. Otherwise there may be some extra samples at the start, because the actual start time snaps back to the nearest earlier block boundary, with blocks counted from the start time of whichever transform began sooner.
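
The rounding at the end can be illustrated with a short calculation. This Python sketch assumes a 44100 Hz sample rate and an internal processing block of 1024 samples; both numbers are assumptions for the sake of the example, since the internal block size is not specified here.

<pre><code class="python">
import math


def padded_end_sample(start_sample, duration_samples, block_size):
    """Return the end sample after rounding the duration up to whole blocks."""
    blocks = math.ceil(duration_samples / block_size)
    return start_sample + blocks * block_size


# 2 s of audio starting at 2 s, at 44100 Hz, with an assumed block of 1024 samples.
end = padded_end_sample(88200, 88200, 1024)
print(end, end - 2 * 88200)  # prints: 177288 888
</code></pre>

In this case 888 extra samples, about 20 ms, would be passed to the plugin after the requested end.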

h5. Step and block size

These are straightforwardly defined using @vamp:step_size@ and @vamp:block_size@ properties.

The block size is the number of audio samples per frame (time or frequency domain) passed to the plugin's process function. If unspecified, this will either take the preferred size requested by the plugin (if any) or 1024.

The step size is the increment in audio samples from one processing frame to the next. For plugins taking time-domain input, this is usually the same as the block size (blocks do not overlap). For those taking frequency-domain input, the default is half the block size, giving a 50% overlap.
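
To make the relationship between the two sizes concrete, the Python sketch below computes a default step for a given block size and lists where successive frames begin. It models only the usual defaults described above (no overlap for time-domain input, 50% overlap for frequency-domain input); a plugin may request something different, and the function names are hypothetical.

<pre><code class="python">
def default_step_size(block_size, frequency_domain):
    """Usual default step: half a block for frequency-domain input, else a full block."""
    return block_size // 2 if frequency_domain else block_size


def frame_starts(total_samples, step_size):
    """Sample offsets at which successive processing frames begin."""
    return list(range(0, total_samples, step_size))


step = default_step_size(1024, frequency_domain=True)
print(step, frame_starts(2048, step))  # prints: 512 [0, 512, 1024, 1536]
</code></pre>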

h5. Window type

For plugins taking frequency-domain input, you can choose the window shape used for time-domain frames prior to the short-time Fourier transform. To do so, supply a @vamp:window_type@ property taking one of the literal values @"rectangular"@, @"bartlett"@, @"hamming"@, @"hanning"@, @"blackman"@, @"gaussian"@, @"parzen"@, @"nuttall"@ or @"blackman-harris"@.

h5. Plugin version

This is newly supported in Sonic Annotator v1.1. To ensure repeatable results, you can specify a particular version of the plugin using the @vamp:plugin_version@ property. If the plugin actually installed is found to have a different version, Sonic Annotator will refuse to use it.

h3. 3. How and where to write the results

Sonic Annotator supports various different output modules (and it is fairly easy for the developer to add new ones). You have to choose at least one output module; use the @-w@ (writer) option to do so. Each module has its own set of parameters which can be adjusted on the command line, as well as its own default rules about where to write the results.

The following writers are currently supported. (Others exist, but are not properly implemented or not supported.)

h3. csv

Writes the results into comma-separated data files.

One file is created for each transform applied to each input audio file, named after the input audio file and transform name with <code>.csv</code> suffix and ":" replaced by "_" throughout, placed in the same directory as the audio file.
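
A script that post-processes the results may want to predict where a given output file will appear. The Python sketch below is a guess at the naming rule, not a statement of it: it assumes the audio file's base name and the transform name are joined with an underscore, and it uses a plugin output identifier as the transform name. Check the result against the files Sonic Annotator actually writes before relying on it.

<pre><code class="python">
from pathlib import Path


def csv_output_path(audio_path, transform_name):
    """Guess the per-transform CSV path for an audio file (naming scheme assumed)."""
    audio = Path(audio_path)
    safe_name = transform_name.replace(":", "_")
    # Assumption: base name and transform name are joined with "_".
    return audio.with_name(f"{audio.stem}_{safe_name}.csv")


print(csv_output_path("music/audio.wav", "vamp:vamp-example-plugins:fixedtempo:tempo"))
</code></pre>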

To instruct Sonic Annotator to place the output files in another location, use <code>--csv-basedir</code> with a directory name.

To write a single file with all data in it, use <code>--csv-one-file</code>.

To write all data to standard output instead of to a file, use <code>--csv-stdout</code>.

Sonic Annotator will not write to an output file that already exists. If you want to make it do this, use <code>--csv-force</code> to overwrite or <code>--csv-append</code> to append to it.

The data generated consists of one line for each result feature, containing the feature timestamp, the feature duration if present, all of the feature's bin values in order, and finally the feature's label if present. If the <code>--csv-one-file</code> or <code>--csv-stdout</code> option is specified, then an additional column will appear before any of the above, containing the name of the audio file from which the feature was extracted, if it differs from that of the previous row. To suppress this additional column, use the <code>--csv-omit-filenames</code> option.
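
Since the file name column is left empty when it repeats, a reader of single-file or standard-output data usually has to carry the last seen name forward. A minimal Python sketch of that step follows; the function name is hypothetical and the sample rows are modelled on the summary output shown later on this page.

<pre><code class="python">
import csv
import io


def fill_file_names(text):
    """Yield rows with the file name column repeated where it was left empty."""
    current = ""
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if row[0]:
            current = row[0]
        yield [current] + row[1:]


sample = (
    '"audio.wav",0.000000000,1.000000000,variance,1723.99,"(variance, continuous-time average)"\n'
    ',1.000000000,1.000000000,variance,1981.75,"(variance, continuous-time average)"\n'
)
for row in fill_file_names(sample):
    print(row[0], row[1], row[4])
</code></pre>

Both printed rows start with @audio.wav@, even though the second input row has an empty first column.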

To make the CSV writer emit the end time instead of the duration (for features with duration), use the <code>--csv-end-times</code> option.

To make the writer always emit an end time or duration, even for features that lack a duration, use the <code>--csv-fill-ends</code> option. The time of the following feature is then used as the end time.

The default column separator is a comma; you can specify a different one with the <code>--csv-separator</code> option.

h3. lab

Writes the results into a tab-separated label file (<code>.lab</code>).

This is equivalent to using the CSV writer with a tab separator and the options <code>--csv-end-times</code> and <code>--csv-omit-filenames</code>.

It supports the <code>--lab-basedir</code>, <code>--lab-one-file</code>, <code>--lab-stdout</code>, <code>--lab-force</code>, <code>--lab-append</code>, and <code>--lab-fill-ends</code> options, which all behave similarly to their CSV writer equivalents.

h3. rdf

Writes the results into RDF/Turtle documents following the "Audio Features ontology":http://purl.org/ontology/af/.

One file is created for each input audio file containing the features extracted by all transforms applied to that file, named after the input audio file with <code>.n3</code> extension, placed in the same directory as the audio file.

To instruct Sonic Annotator to place the output files in another location, use <code>--rdf-basedir</code> with a directory name.

To write a single file with all data (from all input audio files) in it, use <code>--rdf-one-file</code>.

To write one file for each transform applied to each input audio file, named after the input audio file and transform name with <code>.n3</code> suffix and ":" replaced by "_" throughout, use <code>--rdf-many-files</code>.

To write all data to standard output instead of to a file, use <code>--rdf-stdout</code>.

Sonic Annotator will not write to an output file that already exists. If you want to make it do this, use <code>--rdf-force</code> to overwrite or <code>--rdf-append</code> to append to it.

Sonic Annotator will use plugin description RDF if available to enhance its output (for example identifying note onset times as note onset times, if the plugin's RDF says that is what it produces, rather than writing them as plain events). Best results will be obtained if an RDF document is provided with your plugins (for example, <code>vamp-example-plugins.n3</code>) and you have this installed in the same location as the plugins. To override this enhanced output and write plain events for all features, use <code>--rdf-plain</code>.

The output RDF will include an <code>available_as</code> property linking the results to the original audio signal URI. By default, this will point to the URI of the file or resource containing the audio that Sonic Annotator processed, such as the <code>file:///</code> location on disk. To override this, for example to process a local copy of a file while generating RDF that describes a copy of it available on a network, you can use the <code>--rdf-signal-uri</code> option to specify an alternative signal URI.

h3. json

Writes the results into JSON format following JAMS, the JSON Annotated Music Specification. This writer is provisional as of Sonic Annotator v1.1.

h3. midi

Writes the results to MIDI files. All features are written as MIDI notes.

If a feature has at least one value, its first value will be used as the note pitch, and the second value (if present) as the velocity. If a feature has units of Hz, then its pitch will be converted from frequency to an integer value in MIDI range; otherwise it will be written directly.
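
For reference, the conventional mapping from frequency to MIDI note number can be sketched in Python as follows. This is the standard equal-temperament formula with A4 = 440 Hz mapped to note 69; the exact rounding and clamping that Sonic Annotator applies are not described here, so treat those details as assumptions.

<pre><code class="python">
import math


def hz_to_midi_pitch(frequency_hz):
    """Convert a frequency in Hz to the nearest MIDI note number, clamped to 0-127."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    pitch = round(69 + 12 * math.log2(frequency_hz / 440.0))
    return max(0, min(127, pitch))


print(hz_to_midi_pitch(440.0), hz_to_midi_pitch(261.63))  # prints: 69 60
</code></pre>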

Multiple (up to 16) transforms can be written to a single MIDI file, where they will be given separate MIDI channel numbers.

h3. 4. Optionally, how to summarise the features

Sonic Annotator can also calculate and write summaries of features, such as mean and median values.

To obtain a summary as well as the feature results, just use the <code>-S</code> option, naming the type of summary you want (<code>min</code>, <code>max</code>, <code>mean</code>, <code>median</code>, <code>mode</code>, <code>sum</code>, <code>variance</code>, <code>sd</code> or <code>count</code>). You can also tell it to produce only the summary, not the individual features, with <code>--summary-only</code>.

Alternatively, you can specify a summary in a transform description. The following example tells Sonic Annotator to write both the times of note onsets estimated by the simple percussion onset detector example plugin, and the variance of the plugin's onset detection function. (It will only process the audio file and run the plugin once.)

<pre>
  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
  @prefix vamp: <http://purl.org/ontology/vamp/>.
  @prefix examples: <http://vamp-plugins.org/rdf/plugins/vamp-example-plugins#>.
  @prefix : <#>.

  :transform1 a vamp:Transform;
     vamp:plugin examples:percussiononsets ;
     vamp:output examples:percussiononsets_output_onsets .

  :transform2 a vamp:Transform;
     vamp:plugin examples:percussiononsets ;
     vamp:output examples:percussiononsets_output_detectionfunction ;
     vamp:summary_type "variance" .
</pre>

Sonic Annotator can also summarise in segments: if you provide a comma-separated list of times as an argument to the <code>--segments</code> option, it will calculate one summary for each segment bounded by the times you provided. For example,

<pre>
  $ sonic-annotator -d vamp:vamp-example-plugins:percussiononsets:detectionfunction \
    -S variance --summary-only --segments 1,2,3 -w csv --csv-stdout audio.wav
  (... some log output on stderr, then ...)
  "audio.wav",0.000000000,1.000000000,variance,1723.99,"(variance, continuous-time average)"
  ,1.000000000,1.000000000,variance,1981.75,"(variance, continuous-time average)"
  ,2.000000000,1.000000000,variance,1248.79,"(variance, continuous-time average)"
  ,3.000000000,7.031020407,variance,1030.06,"(variance, continuous-time average)"
</pre>

Here the first row contains a summary covering the time period from 0 to 1 second, the second from 1 to 2 seconds, the third from 2 to 3 seconds and the fourth from 3 seconds to the end of the (short) audio file.
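
The way the boundary times map onto the start and duration columns can be reproduced with a few lines of Python. This sketch only models the segment arithmetic, not the summary calculation itself; the function name is hypothetical, and the total length of 10.031020407 seconds is inferred from the last row of the example (3 + 7.031020407).

<pre><code class="python">
def segments(boundaries, total_duration):
    """Return (start, duration) pairs for segments split at the given times."""
    edges = [0.0] + sorted(boundaries) + [total_duration]
    # Pair each edge with the next one; drop empty segments.
    return [(a, b - a) for a, b in zip(edges, edges[1:]) if b > a]


for start, duration in segments([1, 2, 3], 10.031020407):
    print(f"{start:.9f},{duration:.9f}")
</code></pre>

Three boundary times therefore yield four segments, matching the four rows of output above.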