`auditok` Command-line Usage Guide
==================================

This user guide goes through a few of the most useful operations you can use **auditok** for and presents two practical use cases.


.. contents:: `Contents`
    :depth: 3


**********************
Two-figure explanation
**********************

The following two figures illustrate an audio signal (blue) and regions detected as valid audio activities (green rectangles) according to a given threshold (red dashed line). They respectively depict the detection result when:

1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as an acoustic event):

.. figure:: figures/figure_1.png
    :align: center
    :alt: Output from a detector that tolerates silence periods up to 300 ms
    :figclass: align-center
    :scale: 40 %

2. the detector splits an audio activity into several activities if the silence within the activity exceeds 0.2 second (200 ms):

.. figure:: figures/figure_2.png
    :align: center
    :alt: Output from a detector that tolerates silence periods up to 200 ms
    :figclass: align-center
    :scale: 40 %


******************
Command line usage
******************

Try the detector with your voice
################################

The first thing you want to check is perhaps how `auditok` detects your voice. If you have installed `PyAudio`, just run (`Ctrl-C` to stop):

.. code:: bash

    auditok

This will print the **id**, **start time** and **end time** of each detected activity. If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell `auditok` to read data from standard input:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1

Note that when data is read from standard input, the same audio parameters must be used for both `sox` (or any other data generation/acquisition tool) and `auditok`. The following table summarizes the audio parameters.


+-----------------+------------+----------------+-----------------------+
| Audio parameter | sox option | auditok option | `auditok` default     |
+=================+============+================+=======================+
| Sampling rate   | -r         | -r             | 16000                 |
+-----------------+------------+----------------+-----------------------+
| Sample width    | -b (bits)  | -w (bytes)     | 2                     |
+-----------------+------------+----------------+-----------------------+
| Channels        | -c         | -c             | 1                     |
+-----------------+------------+----------------+-----------------------+
| Encoding        | -e         | None           | always signed integer |
+-----------------+------------+----------------+-----------------------+

Since the values used above are `auditok`'s defaults, the previous command can be shortened to:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -

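
Because sox expresses the sample width in bits (`-b`) while `auditok` expects bytes (`-w`), one way to keep both tools in agreement is to derive the two command lines from a single set of variables. The following is only a minimal sketch: the variable names (`RATE`, `WIDTH`, `CHANNELS`, `BITS`, `REC_CMD`, `AUDITOK_CMD`) are illustrative assumptions, and only options already listed in the table above are used. It prints the pipeline rather than running it:

.. code:: bash

    #!/bin/bash

    RATE=16000      # sampling rate in Hz (sox -r, auditok -r)
    WIDTH=2         # sample width in bytes (auditok -w)
    CHANNELS=1      # number of channels (sox -c, auditok -c)

    # sox expects the sample width in bits, auditok in bytes
    BITS=$((WIDTH * 8))

    REC_CMD="rec -q -t raw -r $RATE -c $CHANNELS -b $BITS -e signed -"
    AUDITOK_CMD="auditok -i - -r $RATE -w $WIDTH -c $CHANNELS"

    # Print the resulting pipeline instead of executing it
    echo "$REC_CMD | $AUDITOK_CMD"

To start the acquisition, the printed line can be pasted into a shell.
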
Play back detections
####################

.. code:: bash

    auditok -E

OR

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E

Option `-E` stands for echo, so `auditok` plays back whatever it detects. Using `-E` requires `PyAudio`. If you don't have `PyAudio` and want to play detections with sox, use the `-C` option:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

The `-C` option tells `auditok` to interpret its argument as a command that should be run whenever `auditok` detects an audio activity, replacing the `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.

`rec` and `play` are just aliases for `sox`.

The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only if there is an audio activity and saves bandwidth during silence.

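
As an illustration of what a command given to `-C` might look like, here is a hedged sketch of a small handler script. The file name `on-detection.sh` and the function `raw_duration_ms` are hypothetical and not part of `auditok`. The script assumes the default audio parameters (16000 Hz, 2 bytes per sample, 1 channel) and derives the duration of the temporary raw file from its size; a network sender or any other processing could take the place of the final `echo`:

.. code:: bash

    #!/bin/bash
    # on-detection.sh (hypothetical name): called with the raw file as argument

    RATE=16000      # sampling rate in Hz
    WIDTH=2         # sample width in bytes
    CHANNELS=1      # number of channels

    # Print the duration, in milliseconds, of a headerless raw audio file
    raw_duration_ms() {
        local bytes
        bytes=$(wc -c < "$1")
        echo $(( bytes * 1000 / (RATE * WIDTH * CHANNELS) ))
    }

    # Only act when a file name was passed on the command line
    if [ -n "${1:-}" ]; then
        echo "Detected activity: $1 ($(raw_duration_ms "$1") ms)"
    fi

Under these assumptions, the script could be wired in with something like `auditok -C "./on-detection.sh $"`.
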
Set detection threshold
#######################

If you notice that there are too many detections, use a higher value for the energy threshold. The current version only implements a `validator` based on an energy threshold; the use of spectral information is also desirable and might be part of future releases. To change the energy threshold (default: 50), use option `-e`:

.. code:: bash

    auditok -E -e 55

OR

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

If, however, you notice that the detector is missing some or all of your audio activities, use a lower value for `-e`.

Set format for printed detections information
#############################################

By default, `auditok` prints the `id`, `start time` and `end time` of each detected activity:

.. code:: bash

    1 1.87 2.67
    2 3.05 3.73
    3 3.97 4.49
    ...

If you want to customize the output format, use the `--printf` option:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}"

Output:

.. code:: bash

    [1]: 0.22 to 0.67
    [2]: 2.81 to 4.18
    [3]: 5.53 to 6.44
    [4]: 7.32 to 7.82
    ...

Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds. If you want more detailed time information, use `--time-format`:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"

Output:

.. code:: bash

    [1]: 00:00:01.080 to 00:00:01.760
    [2]: 00:00:02.420 to 00:00:03.440
    [3]: 00:00:04.930 to 00:00:05.570
    [4]: 00:00:05.690 to 00:00:06.020
    [5]: 00:00:07.470 to 00:00:07.980
    ...

Valid time directives are: `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.

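
To make the meaning of the directives concrete, the following sketch breaks an absolute time in seconds down into hours, minutes, seconds and milliseconds, the way a format such as `%h:%m:%s.%i` presents it. This is only an illustration written with `awk`; the function name `format_time` is hypothetical and the code is not `auditok`'s own implementation:

.. code:: bash

    #!/bin/bash

    # Convert an absolute time in seconds (e.g. 1.08) to hh:mm:ss.mmm
    format_time() {
        awk -v t="$1" 'BEGIN {
            ms = int(t * 1000 + 0.5)                  # total milliseconds, rounded
            h = int(ms / 3600000); ms -= h * 3600000  # hours
            m = int(ms / 60000);   ms -= m * 60000    # minutes
            s = int(ms / 1000);    ms -= s * 1000     # seconds, rest is milliseconds
            printf "%02d:%02d:%02d.%03d\n", h, m, s, ms
        }'
    }

    format_time 1.08      # prints 00:00:01.080
    format_time 3725.5    # prints 01:02:05.500
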
1st Practical use case example: generate a subtitles template
#############################################################

Using `--printf` and `--time-format`, the following command, used with an input audio or video file, will generate an **srt** file template that can later be edited with a subtitles editor. This reduces the time needed to define when each utterance starts and when it ends:

.. code:: bash

    auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"

Output:

.. code:: bash

    1
    00:00:00.730 --> 00:00:01.460
    Put some text here...

    2
    00:00:02.440 --> 00:00:03.900
    Put some text here...

    3
    00:00:06.410 --> 00:00:06.970
    Put some text here...

    4
    00:00:07.260 --> 00:00:08.340
    Put some text here...

    5
    00:00:09.510 --> 00:00:09.820
    Put some text here...


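
SubRip (**srt**) files conventionally separate seconds from milliseconds with a comma rather than a dot, and some players or editors may be strict about it. If that turns out to matter for your tools, the template can be post-processed. The sketch below is an assumption-laden example, not part of `auditok`: the function name `srt_fix_separator` is hypothetical, and it simply rewrites `hh:mm:ss.mmm` timestamps read from standard input:

.. code:: bash

    #!/bin/bash

    # Replace the dot before milliseconds with a comma in hh:mm:ss.mmm timestamps
    srt_fix_separator() {
        sed -E 's/([0-9]{2}:[0-9]{2}:[0-9]{2})\.([0-9]{3})/\1,\2/g'
    }

    # Demonstration on one entry of the template shown above
    printf '1\n00:00:00.730 --> 00:00:01.460\nPut some text here...\n' | srt_fix_separator

The output of the `auditok` command above could be piped through the same function before being saved to a file.
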
2nd Practical use case example: build a (very) basic voice control application
##############################################################################

`This repository <https://github.com/amsehili/gspeech-rec>`_ supplies a bash script that can send audio data to Google's
Speech Recognition service and get its transcription. In the following we will use **auditok** as a lower layer component
of a voice control application. The basic idea is to tell **auditok** to run, for each detected audio activity, a certain
number of commands that make up the rest of our voice control application.

Assume you have installed **sox** and downloaded the Speech Recognition script. The sequence of commands to run is:

1- Convert raw audio data to flac using **sox**:

.. code:: bash

    sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac

2- Send flac audio to Google and get its filtered transcription using `speech-rec.sh <https://github.com/amsehili/gspeech-rec/blob/master/speech-rec.sh>`_ :

.. code:: bash

    speech-rec.sh -i output.flac -r 16000

3- Use **grep** to select lines that contain *transcript*:

.. code:: bash

    grep transcript


4- Launch the following script, giving it the transcription as input:

.. code:: bash

    #!/bin/bash

    # Read the transcription from standard input
    read -r line

    # Case-insensitive search for the voice command
    RES=$(echo "$line" | grep -i "open firefox")

    if [[ $RES ]]
    then
        echo "Launch command: 'firefox &' ... "
        firefox &
        exit 0
    fi

    exit 0

As you can see, the script handles one single voice command: it runs firefox if the text it receives contains **open firefox**.
Save the script into a file named voice-control.sh (don't forget to run **chmod u+x voice-control.sh**).

Now, thanks to option `-C`, we will chain these instructions and tell auditok to run them every time it detects
an audio activity. Try the following command and say *open firefox*:


.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -M 5 -m 3 -n 1 --debug-file log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh"


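
The script above recognizes a single phrase. One possible way to extend it to several voice commands is a `case` statement that maps the transcription to a program name. The following is only a sketch: the function name `match_command` and the second phrase (*open terminal*, mapped to `xterm`) are hypothetical additions used for illustration:

.. code:: bash

    #!/bin/bash

    # Print the program associated with a transcription (nothing if no match)
    match_command() {
        local text
        # Lowercase the text so that matching is case-insensitive
        text=$(echo "$1" | tr '[:upper:]' '[:lower:]')
        case "$text" in
            *"open firefox"*)  echo "firefox" ;;
            *"open terminal"*) echo "xterm" ;;   # hypothetical extra command
        esac
    }

    # Demonstration with a made-up transcription line
    match_command '"transcript":"Open Firefox"'

In voice-control.sh, the name returned by `match_command "$line"` could then be launched in the background when it is not empty, in place of the single `grep` test.
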
Plot signal and detections
##########################

Use option `-p`. This requires `matplotlib` and `numpy`.

.. code:: bash

    auditok ... -p


Save plot as image or PDF
#########################

.. code:: bash

    auditok ... --save-image output.png

This requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.


Read data from file
###################

.. code:: bash

    auditok -i input.wav ...

Install `pydub` for other audio formats.


Limit the length of acquired data
#################################

.. code:: bash

    auditok -M 12 ...

The value given to `-M` is a time in seconds.


Save the whole acquired audio signal
####################################

.. code:: bash

    auditok -O output.wav ...

Install `pydub` for other audio formats.


Save each detection into a separate audio file
##############################################

.. code:: bash

    auditok -o det_{N}_{start}_{end}.wav ...

You can use free text and place `{N}`, `{start}` and `{end}` wherever you want; they will be replaced by the detection number, start time and end time respectively. Another example:

.. code:: bash

    auditok -o {start}-{end}.wav ...

Install `pydub` for more audio formats.


Setting detection parameters
############################

Alongside the threshold option `-e` seen so far, a few other options can have a great impact on the detector's behavior. These options are summarized in the following table:

+--------+--------------------------------------------------------+---------+------------------+
| Option | Description                                            | Unit    | Default          |
+========+========================================================+=========+==================+
| `-n`   | Minimum length an accepted audio activity should have  | second  | 0.2 (200 ms)     |
+--------+--------------------------------------------------------+---------+------------------+
| `-m`   | Maximum length an accepted audio activity should reach | second  | 5                |
+--------+--------------------------------------------------------+---------+------------------+
| `-s`   | Maximum length of a continuous silence period within   | second  | 0.3 (300 ms)     |
|        | an accepted audio activity                             |         |                  |
+--------+--------------------------------------------------------+---------+------------------+
| `-d`   | Drop trailing silence from an accepted audio activity  | boolean | False            |
+--------+--------------------------------------------------------+---------+------------------+
| `-a`   | Analysis window length (default value should be good)  | second  | 0.01 (10 ms)     |
+--------+--------------------------------------------------------+---------+------------------+


*******
License
*******

`auditok` is published under the GNU General Public License Version 3.

******
Author
******
Amine Sehili (<amine.sehili@gmail.com>)