`auditok` Command-line Usage Guide
==================================

This user guide walks through some of the most useful operations you can perform with **auditok** and presents two practical use cases.


.. contents:: `Contents`
   :depth: 3


**********************
Two-figure explanation
**********************

The following two figures show an audio signal (blue) and the regions detected as valid audio activities (green rectangles) according to a given threshold (red dashed line). They respectively depict the detection result when:

1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as an acoustic event):

.. figure:: figures/figure_1.png
    :align: center
    :alt: alternate text
    :figclass: align-center

2. the detector splits an audio activity into several activities if the silence within the activity exceeds 0.2 second:

.. figure:: figures/figure_2.png
    :align: center
    :alt: alternate text
    :figclass: align-center


******************
Command line usage
******************

Try the detector with your voice
################################

The first thing you may want to check is how `auditok` detects your voice. If you have installed `PyAudio`, just run (`Ctrl-C` to stop):

.. code:: bash

    auditok

This will print the **id**, **start time** and **end time** of each detected activity. If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell `auditok` to read data from standard input:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1

Note that when data is read from standard input, the same audio parameters must be used for both `sox` (or any other data generation/acquisition tool) and `auditok`. The following table summarizes the audio parameters.


+-----------------+------------+----------------+-----------------------+
| Audio parameter | sox option | auditok option | `auditok` default     |
+=================+============+================+=======================+
| Sampling rate   | -r         | -r             | 16000                 |
+-----------------+------------+----------------+-----------------------+
| Sample width    | -b (bits)  | -w (bytes)     | 2                     |
+-----------------+------------+----------------+-----------------------+
| Channels        | -c         | -c             | 1                     |
+-----------------+------------+----------------+-----------------------+
| Encoding        | -e         | None           | always signed integer |
+-----------------+------------+----------------+-----------------------+

Because the values used above match the `auditok` defaults listed in this table, the previous command can be shortened to:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -
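
As a quick sanity check on matching parameters, the sketch below converts a sample width given in bits (what `sox` expects with `-b`) into bytes (what `auditok` expects with `-w`) and computes the byte rate of the resulting raw stream. The function names `bits_to_bytes` and `byte_rate` are illustrative only; they are not part of either tool.

.. code:: bash

    # Convert a sample width in bits (sox -b) to bytes (auditok -w).
    bits_to_bytes() {
        echo $(( $1 / 8 ))
    }

    # Bytes per second of a raw stream: rate * width (in bytes) * channels.
    byte_rate() {
        local rate=$1 width_bytes=$2 channels=$3
        echo $(( rate * width_bytes * channels ))
    }

    width=$(bits_to_bytes 16)
    echo "auditok -w value: $width"
    echo "byte rate: $(byte_rate 16000 "$width" 1) bytes/s"

With the default parameters (16000 Hz, 16 bits, 1 channel) this gives a width of 2 bytes and a stream of 32000 bytes per second.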

Play back detections
####################

.. code:: bash

    auditok -E

OR

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E

Option `-E` stands for echo: `auditok` plays back whatever it detects. Using `-E` requires `PyAudio`. If you don't have `PyAudio` and want to play detections with `sox`, use the `-C` option:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

The `-C` option tells `auditok` to interpret its argument as a command to run whenever it detects an audio activity, replacing `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving it the arguments needed for raw data.

`rec` and `play` are just aliases for `sox`.

The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only when there is audio activity, saving bandwidth during silence.
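
To make the substitution concrete, here is a minimal sketch of how a `-C` template could be expanded. It only mimics the behavior described above: `expand_command` and the file path are hypothetical, and this is not `auditok`'s actual implementation.

.. code:: bash

    # Replace every "$" in a command template with a file name
    # (hypothetical helper, for illustration only).
    expand_command() {
        local template=$1 file=$2
        echo "${template//\$/$file}"
    }

    # Prints the play command with "$" replaced by /tmp/activity.raw
    expand_command 'play -q -t raw -r 16000 -c 1 -b 16 -e signed $' /tmp/activity.raw

Note the single quotes around the template: they keep the shell from treating `$` specially before the helper sees it.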

Set detection threshold
#######################

If you notice that there are too many detections, use a higher value for the energy threshold. (The current version only implements a `validator` based on an energy threshold. Using spectral information is also desirable and might be part of future releases.) To change the energy threshold (default: 50), use option `-e`:

.. code:: bash

    auditok -E -e 55

OR

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

If, however, the detector misses some or all of your audio activities, use a lower value for `-e`.

Set format for printed detections information
#############################################

By default, `auditok` prints the `id`, `start time` and `end time` of each detected activity:

.. code:: bash

    1 1.87 2.67
    2 3.05 3.73
    3 3.97 4.49
    ...

If you want to customize the output format, use the `--printf` option:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}"

Output:

.. code:: bash

    [1]: 0.22 to 0.67
    [2]: 2.81 to 4.18
    [3]: 5.53 to 6.44
    [4]: 7.32 to 7.82
    ...

Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds. If you want more detailed time information, use `--time-format`:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"

Output:

.. code:: bash

    [1]: 00:00:01.080 to 00:00:01.760
    [2]: 00:00:02.420 to 00:00:03.440
    [3]: 00:00:04.930 to 00:00:05.570
    [4]: 00:00:05.690 to 00:00:06.020
    [5]: 00:00:07.470 to 00:00:07.980
    ...

Valid time directives are: `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.
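
The relation between an absolute time in seconds and the broken-down directives can be illustrated with a small sketch that converts a number of seconds into the `%h:%m:%s.%i` layout. `format_time` is a hypothetical helper written for this guide; it is not part of `auditok`.

.. code:: bash

    # Format a time given in seconds as hh:mm:ss.mmm
    # (hypothetical helper, for illustration only).
    format_time() {
        awk -v t="$1" 'BEGIN {
            ms = int(t * 1000 + 0.5)            # total milliseconds, rounded
            h  = int(ms / 3600000)              # whole hours
            m  = int((ms % 3600000) / 60000)    # remaining whole minutes
            s  = int((ms % 60000) / 1000)       # remaining whole seconds
            printf "%02d:%02d:%02d.%03d\n", h, m, s, ms % 1000
        }'
    }

    format_time 1.08      # 00:00:01.080
    format_time 3661.5    # 01:01:01.500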

1st Practical use case example: generate a subtitles template
#############################################################

Using `--printf` and `--time-format`, the following command, run on an input audio or video file, generates an **srt** file template. The template can later be edited with a subtitles editor, which reduces the time needed to define where each utterance starts and ends:

.. code:: bash

    auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"

Output:

.. code:: bash

    1
    00:00:00.730 --> 00:00:01.460
    Put some text here...

    2
    00:00:02.440 --> 00:00:03.900
    Put some text here...

    3
    00:00:06.410 --> 00:00:06.970
    Put some text here...

    4
    00:00:07.260 --> 00:00:08.340
    Put some text here...

    5
    00:00:09.510 --> 00:00:09.820
    Put some text here...
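
Strict SubRip parsers expect a comma, not a dot, before the milliseconds. If your subtitles editor rejects the template, a post-processing step such as the following sketch can convert the timestamps. The sample input stands in for `auditok` output, and `srt_commas` is a hypothetical helper name.

.. code:: bash

    # Turn hh:mm:ss.mmm into hh:mm:ss,mmm on every line read from stdin.
    srt_commas() {
        sed -E 's/([0-9]{2}:[0-9]{2}:[0-9]{2})\.([0-9]{3})/\1,\2/g'
    }

    # Sample template entry piped through the filter.
    printf '1\n00:00:00.730 --> 00:00:01.460\nPut some text here...\n' | srt_commas

Only the timestamps are rewritten; the dots in the placeholder text are left untouched because they do not follow an `hh:mm:ss` pattern.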


2nd Practical use case example: build a (very) basic voice control application
###############################################################################

`This repository <https://github.com/amsehili/gspeech-rec>`_ supplies a bash script that can send audio data to Google's
Speech Recognition service and get its transcription. In the following we will use **auditok** as a lower layer component
of a voice control application. The basic idea is to tell **auditok** to run, for each detected audio activity, a certain
number of commands that make up the rest of our voice control application.

Assume you have installed **sox** and downloaded the Speech Recognition script. The sequence of commands to run is:

1- Convert raw audio data to flac using **sox**:

.. code:: bash

    sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac

2- Send the flac audio to Google and get its filtered transcription using `speech-rec.sh <https://github.com/amsehili/gspeech-rec/blob/master/speech-rec.sh>`_ :

.. code:: bash

    speech-rec.sh -i output.flac -r 16000

3- Use **grep** to select lines that contain *transcript*:

.. code:: bash

    grep transcript


4- Launch the following script, giving it the transcription as input:

.. code:: bash

    #!/bin/bash

    # Read one line of transcription from standard input.
    read -r line

    # Case-insensitive search for the voice command.
    RES=$(echo "$line" | grep -i "open firefox")

    if [[ -n $RES ]]
    then
        echo "Launch command: 'firefox &' ... "
        firefox &
        exit 0
    fi

    exit 0

As you can see, the script handles a single voice command: it runs firefox if the text it receives contains **open firefox**.
Save the script into a file named voice-control.sh (don't forget to run **chmod u+x voice-control.sh**).
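
If you want to recognize more than one phrase, the matching step could be written with a `case` statement instead of a single `grep`. The following is only a sketch: `match_command` is a hypothetical helper, and the second phrase and its `xterm` command are made-up examples.

.. code:: bash

    # Map a transcription to a command name; print nothing if no phrase matches.
    # (Hypothetical helper; the phrase list is an example.)
    match_command() {
        local text
        # Lowercase the text so that matching is case-insensitive.
        text=$(echo "$1" | tr '[:upper:]' '[:lower:]')
        case "$text" in
            *"open firefox"*)  echo "firefox" ;;
            *"open terminal"*) echo "xterm" ;;
        esac
    }

    match_command "transcript: Open Firefox"    # prints: firefox

In voice-control.sh, the result could be stored with `cmd=$(match_command "$line")` and launched only when it is not empty.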

Now, thanks to option `-C`, we can chain these commands and tell `auditok` to run them every time it detects
an audio activity. Try the following command and say *open firefox*:


.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -M 5 -m 3 -n 1 --debug-file log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh"




Plot signal and detections
##########################

Use option `-p`. Requires `matplotlib` and `numpy`.

.. code:: bash

    auditok ... -p


Save plot as image or PDF
#########################

.. code:: bash

    auditok ... --save-image output.png

Requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.


Read data from file
###################

.. code:: bash

    auditok -i input.wav ...

Install `pydub` for other audio formats.


Limit the length of acquired data
#################################

.. code:: bash

    auditok -M 12 ...

Time is in seconds.


Save the whole acquired audio signal
####################################

.. code:: bash

    auditok -O output.wav ...

Install `pydub` for other audio formats.


Save each detection into a separate audio file
##############################################

.. code:: bash

    auditok -o det_{N}_{start}_{end}.wav ...

You can use free text and place `{N}`, `{start}` and `{end}` wherever you want; they will be replaced by the detection number, start time and end time respectively. Another example:

.. code:: bash

    auditok -o {start}-{end}.wav ...

Install `pydub` for more audio formats.


Setting detection parameters
############################

Alongside the threshold option `-e` seen so far, several other options can have a great impact on the detector's behavior. These options are summarized in the following table:

+--------+-------------------------------------------------------+---------+------------------+
| Option | Description                                           | Unit    | Default          |
+========+=======================================================+=========+==================+
| `-n`   | Minimum length an accepted audio activity should have | second  | 0.2 (200 ms)     |
+--------+-------------------------------------------------------+---------+------------------+
| `-m`   | Maximum length an accepted audio activity should reach| second  | 5                |
+--------+-------------------------------------------------------+---------+------------------+
| `-s`   | Maximum length of a continuous silence period within  | second  | 0.3 (300 ms)     |
|        | an accepted audio activity                            |         |                  |
+--------+-------------------------------------------------------+---------+------------------+
| `-d`   | Drop trailing silence from an accepted audio activity | boolean | False            |
+--------+-------------------------------------------------------+---------+------------------+
| `-a`   | Analysis window length (default value should be good) | second  | 0.01 (10 ms)     |
+--------+-------------------------------------------------------+---------+------------------+


*******
License
*******

`auditok` is published under the GNU General Public License Version 3.

******
Author
******
Amine Sehili (<amine.sehili@gmail.com>)