`auditok` Command-line Usage Guide
==================================

:Author: Amine Sehili <amine.sehili@gmail.com>
:Date: Tue, 01 Dec 2015 20:13:02 +0100

This user guide walks through some of the most useful operations you can perform with **auditok** and presents two practical use cases.


.. contents:: `Contents`
   :depth: 3

**********************
Two-figure explanation
**********************

The following two figures illustrate an audio signal (blue) and the regions detected as valid audio activities (green rectangles) according to a given energy threshold (red dashed line). They depict the detection result when:

1. the detector tolerates silences of up to 0.3 seconds (300 ms) within an audio activity (also referred to as an acoustic event):

   .. figure:: figures/figure_1.png
      :align: center
      :alt: Detected audio activities with up to 0.3 seconds of tolerated silence
      :figclass: align-center

2. the detector splits an audio activity into several activities whenever a silence within it lasts more than 0.2 seconds:

   .. figure:: figures/figure_2.png
      :align: center
      :alt: Detected audio activities with up to 0.2 seconds of tolerated silence
      :figclass: align-center

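
The two figures differ only in the longest silence tolerated inside an activity, which corresponds to option `-s` (see *Setting detection parameters* below). The following minimal sketch compares both settings on the same recording. It assumes that `auditok` prints one line per detection (its default output). `count_detections` and `input.wav` are hypothetical names:

.. code:: bash

    #!/bin/bash
    # Count the detections auditok reports for a given maximum silence (-s, seconds).
    # Assumes the default print format: one non-empty line per detected activity.
    count_detections() {
        local file="$1" max_silence="$2"
        auditok -i "$file" -s "$max_silence" | grep -c .
    }

For example, run `count_detections input.wav 0.3` and then `count_detections input.wav 0.2`. With the lower value, any activity containing a pause longer than 0.2 seconds is split, so the second count is typically the higher one.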

******************
Command line usage
******************

Try the detector with your voice
################################

The first thing you may want to check is how well `auditok` detects your voice. If you have installed `PyAudio`, just run (press `Ctrl-C` to stop):

.. code:: bash

    auditok

This prints the **id**, **start time** and **end time** of each detected activity. If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell `auditok` to read data from standard input:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1

Note that when data is read from standard input, `sox` (or any other data generation/acquisition tool) and `auditok` must use the same audio parameters. The following table summarizes these parameters:


+-----------------+------------+----------------+-----------------------+
| Audio parameter | sox option | auditok option | `auditok` default     |
+=================+============+================+=======================+
| Sampling rate   | -r         | -r             | 16000                 |
+-----------------+------------+----------------+-----------------------+
| Sample width    | -b (bits)  | -w (bytes)     | 2                     |
+-----------------+------------+----------------+-----------------------+
| Channels        | -c         | -c             | 1                     |
+-----------------+------------+----------------+-----------------------+
| Encoding        | -e         | none           | always signed integer |
+-----------------+------------+----------------+-----------------------+

Since these values are `auditok`'s defaults, the previous command can be shortened to:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -

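
Raw data carries no header, so `sox` and `auditok` must agree on the parameters in the table above. Note that `sox` expresses sample width in bits while `auditok` expresses it in bytes. The following sketch converts between the two conventions and computes the resulting data rate. The function names are hypothetical helpers and are not part of `auditok`:

.. code:: bash

    #!/bin/bash
    # Convert a sox sample size (-b, in bits) to an auditok sample width (-w, in bytes).
    bits_to_width() {
        echo $(( $1 / 8 ))
    }

    # Bytes of raw audio per second for a given rate (Hz),
    # sample width (bytes) and number of channels.
    bytes_per_second() {
        local rate="$1" width="$2" channels="$3"
        echo $(( rate * width * channels ))
    }

    # Build the matching auditok options from sox-style parameters (rate, bits, channels).
    auditok_opts() {
        local rate="$1" bits="$2" channels="$3"
        printf '%s\n' "-r $rate -w $(bits_to_width "$bits") -c $channels"
    }

For example, `auditok_opts 16000 16 1` prints `-r 16000 -w 2 -c 1`, and `bytes_per_second 16000 2 1` prints `32000`. In other words, one second of audio with the default parameters is 32000 bytes of raw data.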

Play back detections
####################

.. code:: bash

    auditok -E

OR

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E

Option `-E` stands for echo: `auditok` plays back whatever it detects. Using `-E` requires `PyAudio`. If you don't have `PyAudio` and want to play detections with `sox`, use the `-C` option:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

The `-C` option tells `auditok` to interpret its argument as a command to run whenever an audio activity is detected. Before running the command, `auditok` replaces `$` with the name of a temporary file into which the activity has been saved as raw audio. Here we use `play` to play the activity, passing it the arguments it needs to read raw data.

`rec` and `play` are just aliases for `sox`.

The `-C` option can be useful in many other cases. For instance, a command that sends audio data over a network would then run only when there is an audio activity, saving bandwidth during silence.
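
Because the `-C` command only receives the path of a raw file, any executable can play this role. As a hypothetical illustration, `activity-info.sh` below reports the size and duration of each activity; a network sender would plug into the same place. The script name is not part of `auditok`. The sketch assumes the default parameters (16000 Hz, 2-byte samples, mono), i.e. 32000 bytes per second:

.. code:: bash

    #!/bin/bash
    # activity-info.sh (hypothetical): report the size and duration of the raw
    # activity file whose path auditok substitutes for "$" in the -C command.
    # Assumes 16000 Hz, 16-bit (2-byte) samples, mono: 32000 bytes per second.
    BYTES_PER_SECOND=32000

    activity_info() {
        local file="$1" size
        size=$(wc -c < "$file")
        # awk handles the floating-point division; duration printed with 2 decimals
        awk -v b="$size" -v bps="$BYTES_PER_SECOND" \
            'BEGIN { printf "%d bytes, %.2f s\n", b, b / bps }'
    }

    # Run only when executed as a script with a file argument.
    if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
        activity_info "$1"
    fi

After `chmod u+x activity-info.sh`, it could be used as `auditok -C "./activity-info.sh $"`.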

Set detection threshold
#######################

If you notice that there are too many detections, use a higher value for the energy threshold. (The current version only implements a `validator` based on an energy threshold. Using spectral information is also desirable and might be part of future releases.) To change the energy threshold (default: 50), use option `-e`:

.. code:: bash

    auditok -E -e 55

OR

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

If, however, the detector misses some or all of your audio activities, use a lower value for `-e`.
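
A suitable threshold depends on the recording level and the background noise. One way to choose it is to run the detector several times on the same file and compare the number of detections for each value. The following minimal sketch does this. It assumes the default output of one line per detection; `sweep_thresholds` and `input.wav` are hypothetical names:

.. code:: bash

    #!/bin/bash
    # Print "<threshold> <number of detections>" for each energy threshold given.
    # Usage: sweep_thresholds FILE THRESHOLD...
    sweep_thresholds() {
        local file="$1"; shift
        local e
        for e in "$@"; do
            # With the default print format there is one line per detection
            printf '%s %s\n' "$e" "$(auditok -i "$file" -e "$e" | grep -c .)"
        done
    }

For example, `sweep_thresholds input.wav 45 50 55 60` prints one line per threshold. The counts are only a rough guide; listening to the detections (option `-E` or `-C`) remains the best check.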

Set format for printed detections information
#############################################

By default, `auditok` prints the `id`, `start time` and `end time` of each detected activity:

.. code:: bash

    1 1.87 2.67
    2 3.05 3.73
    3 3.97 4.49
    ...

If you want to customize the output format, use the `--printf` option:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}"

Output:

.. code:: bash

    [1]: 0.22 to 0.67
    [2]: 2.81 to 4.18
    [3]: 5.53 to 6.44
    [4]: 7.32 to 7.82
    ...

Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds; if you want more detailed time information, use `--time-format`:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"

Output:

.. code:: bash

    [1]: 00:00:01.080 to 00:00:01.760
    [2]: 00:00:02.420 to 00:00:03.440
    [3]: 00:00:04.930 to 00:00:05.570
    [4]: 00:00:05.690 to 00:00:06.020
    [5]: 00:00:07.470 to 00:00:07.980
    ...

Valid time directives are `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.
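
You may want to check what a given number of seconds looks like in the `%h:%m:%s.%i` layout, or convert times printed in the default format after the fact. A small shell helper can do the same arithmetic. This is a sketch: the function name `to_hmsi` is hypothetical, and it assumes times are rounded to the nearest millisecond:

.. code:: bash

    #!/bin/bash
    # Convert a time in seconds (e.g. 1.08) to hh:mm:ss.mmm (e.g. 00:00:01.080),
    # i.e. the layout produced by --time-format "%h:%m:%s.%i".
    to_hmsi() {
        awk -v t="$1" 'BEGIN {
            ms = int(t * 1000 + 0.5)          # total milliseconds, rounded
            h = int(ms / 3600000)
            m = int((ms % 3600000) / 60000)
            s = int((ms % 60000) / 1000)
            printf "%02d:%02d:%02d.%03d\n", h, m, s, ms % 1000
        }'
    }

For example, `to_hmsi 1.08` prints `00:00:01.080` and `to_hmsi 3723.5` prints `01:02:03.500`.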

1st Practical use case example: generate a subtitles template
#############################################################

Using `--printf` and `--time-format`, the following command, run on an input audio or video file, generates an **srt** file template that can later be edited with a subtitles editor. This reduces the time needed to define when each utterance starts and when it ends:

.. code:: bash

    auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"

Output:

.. code:: bash

    1
    00:00:00.730 --> 00:00:01.460
    Put some text here...

    2
    00:00:02.440 --> 00:00:03.900
    Put some text here...

    3
    00:00:06.410 --> 00:00:06.970
    Put some text here...

    4
    00:00:07.260 --> 00:00:08.340
    Put some text here...

    5
    00:00:09.510 --> 00:00:09.820
    Put some text here...

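
The SubRip (srt) format normally separates seconds from milliseconds with a comma (`00:00:00,730`), and some players are strict about it. The command above produces a dot, so the template can be post-processed with `sed`. In the following sketch, the function name `srt_commas` and the file names are hypothetical:

.. code:: bash

    #!/bin/bash
    # Replace the dot before the milliseconds in "hh:mm:ss.mmm" timestamps with a comma.
    # Lines without a timestamp (ids, placeholder text) are left unchanged.
    srt_commas() {
        sed -E 's/([0-9]{2}:[0-9]{2}:[0-9]{2})\.([0-9]{3})/\1,\2/g'
    }

For example: `auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i" | srt_commas > input.srt`.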

2nd Practical use case example: build a (very) basic voice control application
###############################################################################

`This repository <https://github.com/amsehili/gspeech-rec>`_ supplies a bash script that can send audio data to Google's
Speech Recognition service and get its transcription. In the following, we will use **auditok** as a lower layer component
of a voice control application. The basic idea is to tell **auditok** to run, for each detected audio activity, a certain
number of commands that make up the rest of our voice control application.

Assume you have installed **sox** and downloaded the Speech Recognition script. The sequence of commands to run is:

1. Convert raw audio data to FLAC using **sox**:

   .. code:: bash

       sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac

2. Send the FLAC audio to Google and get its filtered transcription using `speech-rec.sh <https://github.com/amsehili/gspeech-rec/blob/master/speech-rec.sh>`_:

   .. code:: bash

       speech-rec.sh -i output.flac -r 16000

3. Use **grep** to select lines that contain *transcript*:

   .. code:: bash

       grep transcript

4. Launch the following script, giving it the transcription as input:

   .. code:: bash

       #!/bin/bash

       # Read one line of transcription from standard input
       read -r line

       # Launch firefox if the line contains "open firefox" (case-insensitive)
       if echo "$line" | grep -qi "open firefox"
       then
           echo "Launch command: 'firefox &' ... "
           firefox &
           exit 0
       fi

       exit 0

As you can see, the script handles a single voice command: it runs firefox if the text it receives contains **open firefox**.
Save the script into a file named voice-control.sh (don't forget to run **chmod u+x voice-control.sh**).

Now, thanks to option `-C`, we can chain these steps with pipes and tell auditok to run them each time it detects
an audio activity. Try the following command and say *open firefox*. In this command, `-i -` reads audio from standard input,
`-n 1` and `-m 3` keep activities between 1 and 3 seconds long, and `-M 5` stops acquisition after 5 seconds:


.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -M 5 -m 3 -n 1 --debug-file log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh"

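
The script in step 4 recognizes a single command. The following sketch shows how it might handle several phrases with a `case` statement. The phrase *open terminal*, the `xterm` program and the function names `match_command` and `launch_command` are hypothetical additions:

.. code:: bash

    #!/bin/bash
    # Read one transcription line from stdin and print the program to launch,
    # or nothing if no known phrase is found.
    match_command() {
        local line
        read -r line
        # Lower-case the line so that matching is case-insensitive
        line=$(printf '%s' "$line" | tr '[:upper:]' '[:lower:]')
        case "$line" in
            *"open firefox"*)  echo "firefox" ;;
            *"open terminal"*) echo "xterm" ;;   # hypothetical extra command
        esac
    }

    # Launch the given program in the background; do nothing if it is empty.
    launch_command() {
        local cmd="$1"
        [[ -n "$cmd" ]] || return 0
        echo "Launch command: '$cmd &' ... "
        "$cmd" &
    }

To use this in place of voice-control.sh, end the script with `launch_command "$(match_command)"`. New commands then only require a new `case` branch.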



Plot signal and detections
##########################

Use option `-p`. Requires `matplotlib` and `numpy`.

.. code:: bash

    auditok ... -p


Save plot as image or PDF
#########################

.. code:: bash

    auditok ... --save-image output.png

Requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.


Read data from file
###################

.. code:: bash

    auditok -i input.wav ...

Install `pydub` for other audio formats.


Limit the length of acquired data
#################################

.. code:: bash

    auditok -M 12 ...

The value of `-M` is in seconds.


Save the whole acquired audio signal
####################################

.. code:: bash

    auditok -O output.wav ...

Install `pydub` for other audio formats.


Save each detection into a separate audio file
##############################################

.. code:: bash

    auditok -o det_{N}_{start}_{end}.wav ...

You can use free text and place `{N}`, `{start}` and `{end}` wherever you want; they are replaced by the detection number, start time and end time respectively. Another example:

.. code:: bash

    auditok -o {start}-{end}.wav ...

Install `pydub` for more audio formats.
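
Combined with `-i`, the `-o` option makes it easy to split a whole set of recordings in one go. The following sketch assumes that all inputs are WAV files in the current directory. The function name `split_all` and the output naming scheme are only examples:

.. code:: bash

    #!/bin/bash
    # Split every WAV file in the current directory into one file per detection.
    # Outputs are named <input>_det_<N>.wav; {N} is expanded by auditok, not by bash
    # (a lone {N} is never brace-expanded by the shell).
    # Any extra arguments (e.g. -e 55) are passed on to auditok.
    split_all() {
        local f
        for f in *.wav; do
            [ -e "$f" ] || continue          # no .wav file: the glob stays literal
            auditok -i "$f" -o "${f%.wav}_det_{N}.wav" "$@"
        done
    }

For example, `split_all -e 55`. The glob is expanded once when the loop starts, so files written during the run are not processed again. A second run, however, would also pick up the `_det_` files.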


Setting detection parameters
############################

Alongside the threshold option `-e` seen so far, a few other options can have a great impact on the detector's behavior. These options are summarized in the following table:

+--------+-------------------------------------------------------+---------+------------------+
| Option | Description                                           | Unit    | Default          |
+========+=======================================================+=========+==================+
| `-n`   | Minimum length an accepted audio activity should have | second  | 0.2 (200 ms)     |
+--------+-------------------------------------------------------+---------+------------------+
| `-m`   | Maximum length an accepted audio activity can reach   | second  | 5                |
+--------+-------------------------------------------------------+---------+------------------+
| `-s`   | Maximum length of a continuous silence period within  | second  | 0.3 (300 ms)     |
|        | an accepted audio activity                            |         |                  |
+--------+-------------------------------------------------------+---------+------------------+
| `-d`   | Drop trailing silence from an accepted audio activity | boolean | False            |
+--------+-------------------------------------------------------+---------+------------------+
| `-a`   | Analysis window length (default value should be good) | second  | 0.01 (10 ms)     |
+--------+-------------------------------------------------------+---------+------------------+

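
The audio is analyzed in windows of `-a` seconds, so the other durations can be thought of as a number of analysis windows. This is an illustration, not a description of `auditok` internals. Under that assumption, the defaults correspond to 20, 500 and 30 windows for `-n`, `-m` and `-s`. The helper names below are hypothetical:

.. code:: bash

    #!/bin/bash
    # Number of analysis windows covering a duration, both given in seconds.
    # Rounds to the nearest integer to absorb floating-point error (e.g. 0.3 / 0.01).
    to_windows() {
        awk -v d="$1" -v a="$2" 'BEGIN { printf "%d\n", int(d / a + 0.5) }'
    }

    # Show the window counts for the default -n, -m and -s values with -a 0.01.
    show_defaults() {
        local a=0.01
        printf '%s\n' "-n: $(to_windows 0.2 "$a") windows"
        printf '%s\n' "-m: $(to_windows 5 "$a") windows"
        printf '%s\n' "-s: $(to_windows 0.3 "$a") windows"
    }

With these options, a more permissive setting could for instance be tried as `auditok -n 0.5 -m 10 -s 0.2 ...`.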

*******
License
*******

`auditok` is published under the GNU General Public License Version 3.

******
Author
******

Amine Sehili (<amine.sehili@gmail.com>)