annotate doc/cmdline.rst @ 319:8ed121db987c

author Amine Sehili <amine.sehili@gmail.com>
date Fri, 18 Oct 2019 21:48:12 +0100
rev   line source
amine@32 1 `auditok` Command-line Usage Guide
amine@32 2 ==================================
amine@32 3
amine@32 4 This user guide will go through a few of the most useful operations you can use **auditok** for and present two practical use cases.
amine@32 5
amine@32 6 .. contents:: `Contents`
amine@32 7 :depth: 3
amine@32 8
amine@32 9
amine@32 10 **********************
amine@32 11 Two-figure explanation
amine@32 12 **********************
amine@32 13
amine@35 14 The following two figures illustrate an audio signal (blue) and regions detected as valid audio activities (green rectangles) according to
amine@35 15 a given threshold (red dashed line). They respectively depict the detection result when:
amine@32 16
amine@32 17 1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as acoustic event):
amine@32 18
amine@32 19 .. figure:: figures/figure_1.png
amine@32 20 :align: center
amine@33 21 :alt: Output from a detector that tolerates silence periods up to 300 ms
amine@32 22 :figclass: align-center
amine@33 23 :scale: 40 %
amine@32 24
amine@32 25 2. the detector splits an audio activity into several activities if a silence within the activity lasts longer than 0.2 second:
amine@32 26
amine@32 27 .. figure:: figures/figure_2.png
amine@32 28 :align: center
amine@33 29 :alt: Output from a detector that tolerates silence periods up to 200 ms
amine@32 30 :figclass: align-center
amine@33 31 :scale: 40 %
amine@32 32
amine@35 33 Beyond plotting the signal and detections, you can play back audio activities as they are detected, save them, or run a user command each time an activity is detected,
amine@35 34 optionally passing the file name of the audio activity as an argument to the command.
amine@32 35
amine@32 36 ******************
amine@32 37 Command line usage
amine@32 38 ******************
amine@32 39
amine@32 40 Try the detector with your voice
amine@32 41 ################################
amine@32 42
amine@35 43 The first thing you want to check is perhaps how **auditok** detects your voice. If you have installed `PyAudio` just run (`Ctrl-C` to stop):
amine@32 44
amine@32 45 .. code:: bash
amine@32 46
amine@32 47 auditok
amine@32 48
amine@35 49 This will print the **id**, **start-time** and **end-time** of each detected activity. If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell **auditok** to read data from standard input:
amine@32 50
amine@32 51 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1
amine@32 52
amine@35 53 Note that when data is read from standard input the same audio parameters must be used for both `sox` (or any other data generation/acquisition tool) and **auditok**. The following table summarizes audio parameters.
amine@32 54
amine@32 55
amine@35 56 +-----------------+------------+------------------+-----------------------+
amine@35 57 | Audio parameter | sox option | `auditok` option | `auditok` default     |
amine@35 58 +=================+============+==================+=======================+
amine@35 59 | Sampling rate   | -r         | -r               | 16000                 |
amine@35 60 +-----------------+------------+------------------+-----------------------+
amine@35 61 | Sample width    | -b (bits)  | -w (bytes)       | 2                     |
amine@35 62 +-----------------+------------+------------------+-----------------------+
amine@35 63 | Channels        | -c         | -c               | 1                     |
amine@35 64 +-----------------+------------+------------------+-----------------------+
amine@35 65 | Encoding        | -e         | None             | always signed integer |
amine@35 66 +-----------------+------------+------------------+-----------------------+
amine@32 67
amine@32 68 Because the values used above match the **auditok** defaults listed in this table, the previous command can be shortened to:
amine@32 69
amine@32 70 .. code:: bash
amine@32 71
amine@32 72 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -
amine@32 73
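
The unit mismatch in the table (bits for `sox -b`, bytes for `auditok -w`) is easy to get wrong. The following is a minimal sketch, not part of **auditok**, that derives both option strings from one set of variables. The helper name `bits_to_bytes` and the variable names are illustrative assumptions.

```bash
# Convert a sample width in bits (sox -b) to bytes (auditok -w).
bits_to_bytes() {
    echo $(( $1 / 8 ))
}

# Hypothetical shared settings, chosen here to match the auditok defaults.
RATE=16000
BITS=16
CHANNELS=1

# Build both option strings from the same variables so they cannot drift apart.
SOX_OPTS="-t raw -r $RATE -c $CHANNELS -b $BITS -e signed"
AUDITOK_OPTS="-r $RATE -w $(bits_to_bytes "$BITS") -c $CHANNELS"

# Print the resulting pipeline instead of running it.
echo "rec -q $SOX_OPTS - | auditok -i - $AUDITOK_OPTS"
```

The printed pipeline can be copied to a terminal once `sox` and **auditok** are installed.
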
amine@32 74 Play back detections
amine@32 75 ####################
amine@32 76
amine@32 77 .. code:: bash
amine@32 78
amine@32 79 auditok -E
amine@32 80
amine@35 81 :or:
amine@32 82
amine@32 83 .. code:: bash
amine@32 84
amine@32 85 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E
amine@32 86
amine@35 87 Option `-E` stands for echo, so **auditok** will play back whatever it detects. Using `-E` requires `PyAudio`. If you don't have `PyAudio` and want to play detections with sox, use the `-C` option:
amine@32 88
amine@32 89 .. code:: bash
amine@32 90
amine@32 91 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
amine@32 92
amine@35 93 The `-C` option tells **auditok** to interpret its content as a command that should be run whenever **auditok** detects an audio activity, replacing `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving it the arguments it needs for raw data.
amine@32 94
amine@32 95 `rec` and `play` are just aliases for `sox`.
amine@32 96
amine@32 97 The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only when there is audio activity, saving bandwidth during silence.
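
A command run through `-C` receives a headerless raw file, so anything it needs to know about the audio has to be computed from the file size and the audio parameters. The sketch below assumes the parameters are passed explicitly; the function name `raw_duration_ms` is hypothetical and not part of **auditok**.

```bash
# Print the duration, in milliseconds, of a headerless raw audio file.
# Usage: raw_duration_ms FILE RATE WIDTH_BYTES CHANNELS
raw_duration_ms() {
    # File size in bytes; the arithmetic expansion strips any padding
    # that some wc implementations add around the number.
    size=$(( $(wc -c < "$1") ))
    # bytes / (rate * width * channels) gives seconds; scale to ms first
    # to stay in integer arithmetic.
    echo $(( size * 1000 / ($2 * $3 * $4) ))
}
```

A handler script (say, a hypothetical `on_detection.sh`) could call `raw_duration_ms "$1" 16000 2 1` on the file name it receives in place of `$` and decide whether the detection is worth sending.
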
amine@32 98
amine@32 99 Set detection threshold
amine@32 100 #######################
amine@32 101
amine@32 102 If you notice that there are too many detections, use a higher value for the energy threshold. The current version only implements a `validator` based on an energy threshold; using spectral information is also desirable and might be part of future releases. To change the energy threshold (default: 50), use option `-e`:
amine@32 103
amine@32 104 .. code:: bash
amine@32 105
amine@32 106 auditok -E -e 55
amine@32 107
amine@35 108 :or:
amine@32 109
amine@32 110 .. code:: bash
amine@32 111
amine@32 112 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
amine@32 113
amine@32 114 If, however, you find that the detector is missing some or all of your audio activities, use a lower value for `-e`.
amine@32 115
amine@32 116 Set format for printed detections information
amine@32 117 #############################################
amine@32 118
amine@35 119 By default, **auditok** prints the **id**, **start-time** and **end-time** of each detected activity:
amine@32 120
amine@32 121 .. code:: bash
amine@32 122
amine@32 123 1 1.87 2.67
amine@32 124 2 3.05 3.73
amine@32 125 3 3.97 4.49
amine@32 126 ...
amine@32 127
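Because the default output is one `id start end` triple per line, it is easy to post-process with standard tools. As a small sketch (the function name `total_duration` is an assumption, and the default format shown above is taken at face value), the following sums the length of all detections read from standard input:

```bash
# Sum (end - start) over lines of the form "id start end".
# Lines with fewer than three fields are ignored.
total_duration() {
    awk 'NF >= 3 { total += $3 - $2 } END { printf "%.2f\n", total }'
}
```

It would typically sit at the end of a pipe, for example `auditok -i input.wav | total_duration`.
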
amine@35 128 If you want to customize the output format, use the `--printf` option:
amine@35 129
amine@35 130 .. code:: bash
amine@32 131
amine@32 132 auditok -e 55 --printf "[{id}]: {start} to {end}"
amine@32 133
amine@35 134 :output:
amine@32 135
amine@32 136 .. code:: bash
amine@32 137
amine@32 138 [1]: 0.22 to 0.67
amine@32 139 [2]: 2.81 to 4.18
amine@32 140 [3]: 5.53 to 6.44
amine@32 141 [4]: 7.32 to 7.82
amine@32 142 ...
amine@32 143
amine@32 144 Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds. If you want more detailed time information, use `--time-format`:
amine@32 145
amine@32 146 auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"
amine@32 147
amine@35 148 :output:
amine@32 149
amine@32 150 .. code:: bash
amine@32 151
amine@32 152 [1]: 00:00:01.080 to 00:00:01.760
amine@32 153 [2]: 00:00:02.420 to 00:00:03.440
amine@32 154 [3]: 00:00:04.930 to 00:00:05.570
amine@32 155 [4]: 00:00:05.690 to 00:00:06.020
amine@32 156 [5]: 00:00:07.470 to 00:00:07.980
amine@32 157 ...
amine@32 158
amine@32 159 Valid time directives are: `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.
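
To make the meaning of the four directives concrete, here is a sketch that converts a time in seconds into the `%h:%m:%s.%i` layout used above. It only mirrors the output shown in this guide and is not **auditok**'s own code; the function name `format_time` is hypothetical.

```bash
# Convert a time in seconds (may be fractional) to hh:mm:ss.mmm.
format_time() {
    awk -v t="$1" 'BEGIN {
        ms = int(t * 1000 + 0.5)            # total milliseconds, rounded
        h  = int(ms / 3600000)              # %h: whole hours
        m  = int((ms % 3600000) / 60000)    # %m: remaining minutes
        s  = int((ms % 60000) / 1000)       # %s: remaining seconds
        i  = ms % 1000                      # %i: remaining milliseconds
        printf "%02d:%02d:%02d.%03d\n", h, m, s, i
    }'
}
```

For instance, `format_time 1.08` yields the same string as the start time of the first detection in the output above.
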
amine@32 160
amine@32 161 1st Practical use case example: generate a subtitles template
amine@32 162 #############################################################
amine@32 163
amine@32 164 Using `--printf` and `--time-format`, the following command, given an input audio or video file, generates an **srt** file template that can later be edited with a subtitles editor, reducing the time needed to define when each utterance starts and ends:
amine@32 165
amine@32 166 .. code:: bash
amine@32 167
amine@32 168 auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
amine@32 169
amine@35 170 :output:
amine@32 171
amine@32 172 .. code:: bash
amine@32 173
amine@32 174 1
amine@32 175 00:00:00.730 --> 00:00:01.460
amine@32 176 Put some text here...
amine@32 177
amine@32 178 2
amine@32 179 00:00:02.440 --> 00:00:03.900
amine@32 180 Put some text here...
amine@32 181
amine@32 182 3
amine@32 183 00:00:06.410 --> 00:00:06.970
amine@32 184 Put some text here...
amine@32 185
amine@32 186 4
amine@32 187 00:00:07.260 --> 00:00:08.340
amine@32 188 Put some text here...
amine@32 189
amine@32 190 5
amine@32 191 00:00:09.510 --> 00:00:09.820
amine@32 192 Put some text here...
amine@32 193
amine@32 194
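SubRip files conventionally separate seconds from milliseconds with a comma (`00:00:00,730`), whereas the template above uses a dot. If your subtitles editor is strict about this, a small filter can rewrite the timestamps afterwards. This is a sketch; the function name `srt_commas` is an assumption.

```bash
# Rewrite hh:mm:ss.mmm timestamps as hh:mm:ss,mmm on standard input.
# Text that does not look like a timestamp is passed through unchanged.
srt_commas() {
    sed -E 's/([0-9]{2}:[0-9]{2}:[0-9]{2})\.([0-9]{3})/\1,\2/g'
}
```

The filter can be appended to the command above, for example `auditok ... | srt_commas > template.srt`.
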
amine@33 195 2nd Practical use case example: build a (very) basic voice control application
amine@33 196 ##############################################################################
amine@32 197
amine@32 198 `This repository <https://github.com/amsehili/gspeech-rec>`_ supplies a bash script that can send audio data to Google's
amine@32 199 Speech Recognition service and get its transcription. In the following we will use **auditok** as a lower layer component
amine@32 200 of a voice control application. The basic idea is to tell **auditok** to run, for each detected audio activity, a certain
amine@32 201 number of commands that make up the rest of our voice control application.
amine@32 202
amine@32 203 Assume you have installed **sox** and downloaded the Speech Recognition script. The sequence of commands to run is:
amine@32 204
amine@32 205 1- Convert raw audio data to flac using **sox**:
amine@32 206
amine@32 207 .. code:: bash
amine@32 208
amine@32 209 sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac
amine@32 210
amine@35 211 2- Send flac audio data to Google and get its filtered transcription using `speech-rec.sh <https://github.com/amsehili/gspeech-rec/blob/master/speech-rec.sh>`_ :
amine@32 212
amine@32 213 .. code:: bash
amine@32 214
amine@32 215 speech-rec.sh -i output.flac -r 16000
amine@32 216
amine@35 217 3- Use **grep** to select lines that contain *transcript*:
amine@32 218
amine@32 219 .. code:: bash
amine@32 220
amine@32 221 grep transcript
amine@32 222
amine@32 223
amine@35 224 4- Launch the following script, giving it the transcription as input:
amine@32 225
amine@32 226 .. code:: bash
amine@32 227
amine@32 228 #!/bin/bash
amine@32 229
amine@32 230 read line
amine@32 231
amine@32 232 RES=`echo "$line" | grep -i "open firefox"`
amine@32 233
amine@32 234 if [[ $RES ]]
amine@32 235 then
amine@32 236 echo "Launch command: 'firefox &' ... "
amine@32 237 firefox &
amine@32 238 exit 0
amine@32 239 fi
amine@32 240
amine@32 241 exit 0
amine@32 242
amine@35 243 As you can see, the script handles a single voice command: it runs firefox if the text it receives contains **open firefox**.
amine@32 244 Save the script into a file named voice-control.sh (don't forget to run **chmod u+x voice-control.sh**).
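
The script can be extended to several voice commands by separating the matching from the launching. The following is a sketch of the matching part only; the function name `match_command` and the second phrase (`open terminal`, mapped to `xterm`) are illustrative assumptions, not part of the original script.

```bash
# Map a transcription line to the name of a program to launch.
# Prints an empty line when no known phrase is found.
match_command() {
    # Lower-case the text so matching is case-insensitive, like grep -i.
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$text" in
        *"open firefox"*)  echo "firefox" ;;
        *"open terminal"*) echo "xterm" ;;   # hypothetical second command
        *)                 echo "" ;;
    esac
}
```

Inside voice-control.sh, the line read from standard input would be passed to `match_command`, and the returned program launched in the background only when the result is non-empty.
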
amine@32 245
amine@35 246 Now, thanks to option `-C`, we will use the four instructions with a pipe and tell **auditok** to run them each time it detects
amine@32 247 an audio activity. Try the following command and say *open firefox*:
amine@32 248
amine@32 249
amine@32 250 .. code:: bash
amine@32 251
amine@35 252 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -M 5 -m 3 -n 1 --debug-file file.log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh"
amine@32 253
amine@35 254 Here we used option `-M 5` to limit the amount of audio data read to 5 seconds (**auditok** stops when there is no more data), option `-m 3` to limit each detection to 3 seconds, and
amine@35 255 option `-n 1` to tell **auditok** to only accept tokens of 1 second or more and discard any shorter token.
amine@32 256
amine@35 257 With `--debug-file file.log`, all processing steps are written into file.log with their timestamps, including any run command and the file name the command was given.
amine@32 258
amine@32 259
amine@32 260 Plot signal and detections
amine@32 261 ##########################
amine@32 262
amine@32 263 Use option `-p`. This requires `matplotlib` and `numpy`.
amine@32 264
amine@32 265 .. code:: bash
amine@32 266
amine@32 267 auditok ... -p
amine@32 268
amine@32 269
amine@32 270 Save plot as image or PDF
amine@32 271 #########################
amine@32 272
amine@32 273 .. code:: bash
amine@32 274
amine@32 275 auditok ... --save-image output.png
amine@32 276
amine@32 277 Requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.
amine@32 278
amine@32 279
amine@32 280 Read data from file
amine@32 281 ###################
amine@32 282
amine@32 283 .. code:: bash
amine@32 284
amine@32 285 auditok -i input.wav ...
amine@32 286
amine@32 287 Install `pydub` for other audio formats.
amine@32 288
amine@32 289
amine@32 290 Limit the length of acquired data
amine@32 291 #################################
amine@32 292
amine@32 293 .. code:: bash
amine@32 294
amine@32 295 auditok -M 12 ...
amine@32 296
amine@35 297 Time is in seconds. This is valid for data read from an audio device, stdin or an audio file.
amine@32 298
amine@32 299
amine@32 300 Save the whole acquired audio signal
amine@32 301 ####################################
amine@32 302
amine@32 303 .. code:: bash
amine@32 304
amine@32 305 auditok -O output.wav ...
amine@32 306
amine@32 307 Install `pydub` for other audio formats.
amine@32 308
amine@32 309
amine@32 310 Save each detection into a separate audio file
amine@32 311 ##############################################
amine@32 312
amine@32 313 .. code:: bash
amine@32 314
amine@32 315 auditok -o det_{N}_{start}_{end}.wav ...
amine@32 316
amine@32 317 You can use free text and place `{N}`, `{start}` and `{end}` wherever you want; they will be replaced by the detection number, start time and end time respectively. Another example:
amine@32 318
amine@32 319 .. code:: bash
amine@32 320
amine@32 321 auditok -o {start}-{end}.wav ...
amine@32 322
amine@32 323 Install `pydub` for more audio formats.
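
To preview what a given file name template would produce, the substitution can be imitated in the shell. This is only an illustration of the placeholder mechanism: the function name `expand_name` is hypothetical, and the exact number formatting **auditok** uses in file names is not specified here.

```bash
# Replace {N}, {start} and {end} in a template with the given values.
# Usage: expand_name TEMPLATE N START END
expand_name() {
    printf '%s\n' "$1" |
        sed -e "s/{N}/$2/g" -e "s/{start}/$3/g" -e "s/{end}/$4/g"
}
```

For example, `expand_name 'det_{N}_{start}_{end}.wav' 1 1.87 2.67` shows the name the first template above would give for a detection numbered 1 that spans 1.87 to 2.67 seconds.
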
amine@32 324
amine@32 325
amine@32 326 Setting detection parameters
amine@32 327 ############################
amine@32 328
amine@32 329 Alongside the threshold option `-e` seen so far, a few other options can have a great impact on the detector's behavior. These options are summarized in the following table:
amine@32 330
amine@32 331 +--------+-------------------------------------------------------+---------+------------------+
amine@32 332 | Option | Description                                           | Unit    | Default          |
amine@32 333 +========+=======================================================+=========+==================+
amine@32 334 | `-n`   | Minimum length an accepted audio activity should have | second  | 0.2 (200 ms)     |
amine@32 335 +--------+-------------------------------------------------------+---------+------------------+
amine@32 336 | `-m`   | Maximum length an accepted audio activity should reach| second  | 5.               |
amine@32 337 +--------+-------------------------------------------------------+---------+------------------+
amine@32 338 | `-s`   | Maximum length of a continuous silence period within  | second  | 0.3 (300 ms)     |
amine@32 339 |        | an accepted audio activity                            |         |                  |
amine@32 340 +--------+-------------------------------------------------------+---------+------------------+
amine@32 341 | `-d`   | Drop trailing silence from an accepted audio activity | boolean | False            |
amine@32 342 +--------+-------------------------------------------------------+---------+------------------+
amine@32 343 | `-a`   | Analysis window length (default value should be good) | second  | 0.01 (10 ms)     |
amine@32 344 +--------+-------------------------------------------------------+---------+------------------+
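
All durations in this table are expressed in seconds, but it can help to think of them as a number of analysis windows of length `-a`. The sketch below works under the assumption that durations are counted in whole analysis windows (this guide does not state how **auditok** rounds them); the function name `windows_for` is hypothetical.

```bash
# Number of analysis windows that fit in a duration.
# Usage: windows_for DURATION_SECONDS WINDOW_SECONDS
windows_for() {
    # Round to the nearest integer to absorb floating-point error.
    awk -v d="$1" -v w="$2" 'BEGIN { printf "%d\n", int(d / w + 0.5) }'
}
```

With the default window of 0.01 second, the default `-n 0.2` corresponds to 20 windows and the default `-s 0.3` to 30 windows.
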
amine@32 345
amine@32 346
amine@35 347 Normally, `auditok` keeps the trailing silence of a detected activity. Trailing silence is at most as long as the maximum length of a continuous silence (option `-s`) and can be important for some applications such as speech recognition. If you want to drop trailing silence anyway, use option `-d`. The following two figures show the output of the detector when it keeps the trailing silence and when it drops it, respectively:
amine@35 348
amine@35 349
amine@35 350 .. figure:: figures/figure_3_keep_trailing_silence.png
amine@35 351 :align: center
amine@35 352 :alt: Output from a detector that keeps trailing silence
amine@35 353 :figclass: align-center
amine@35 354 :scale: 40 %
amine@35 355
amine@35 356
amine@35 357 .. code:: bash
amine@35 358
amine@35 359 auditok ... -d
amine@35 360
amine@35 361
amine@35 362 .. figure:: figures/figure_4_drop_trailing_silence.png
amine@35 363 :align: center
amine@35 364 :alt: Output from a detector that drops trailing silence
amine@35 365 :figclass: align-center
amine@35 366 :scale: 40 %
amine@35 367
amine@35 368 You might want to only consider audio activities if they are above a certain duration. The next figure is the result of a detector that only accepts detections of 0.8 second and longer:
amine@35 369
amine@35 370 .. code:: bash
amine@35 371
amine@35 372 auditok ... -n 0.8
amine@35 373
amine@35 374
amine@35 375 .. figure:: figures/figure_5_min_800ms.png
amine@35 376 :align: center
amine@35 377 :alt: Output from a detector that detects activities of 800 ms or longer
amine@35 378 :figclass: align-center
amine@35 379 :scale: 40 %
amine@35 380
amine@35 381
amine@35 382 Finally, it is almost always useful to limit the length of detected audio activities: one does not want an overly long audio event, such as an alarm or a drill, to hog the detector. For illustration purposes, we set the maximum duration to 0.4 second for this detector, so an audio activity is delivered as soon as it reaches 0.4 second:
amine@35 383
amine@35 384 .. code:: bash
amine@35 385
amine@35 386 auditok ... -m 0.4
amine@35 387
amine@35 388
amine@35 389 .. figure:: figures/figure_6_max_400ms.png
amine@35 390 :align: center
amine@35 391 :alt: Output from a detector that delivers audio activities that reach 400 ms
amine@35 392 :figclass: align-center
amine@35 393 :scale: 40 %
amine@35 394
amine@35 395
amine@35 396 Debugging
amine@35 397 #########
amine@35 398
amine@35 399 If you want to print what happens when something is detected, use option `-D`.
amine@35 400
amine@35 401 .. code:: bash
amine@35 402
amine@35 403 auditok ... -D
amine@35 404
amine@35 405
amine@35 406 If you want to save everything into a log file, use `--debug-file file.log`.
amine@35 407
amine@35 408 .. code:: bash
amine@35 409
amine@35 410 auditok ... --debug-file file.log
amine@35 411
amine@35 412
amine@35 413
amine@35 414
amine@32 415 *******
amine@32 416 License
amine@32 417 *******
amine@32 418
amine@35 419 **auditok** is published under the GNU General Public License Version 3.
amine@32 420
amine@32 421 ******
amine@32 422 Author
amine@32 423 ******
amine@32 424 Amine Sehili (<amine.sehili@gmail.com>)