comparison README.md @ 49:809df9157e1a

Merge branch 'master' of https://github.com/amsehili/auditok
author Amine SEHILI <amine.sehili@gmail.com>
date Sun, 06 Mar 2016 14:57:03 +0100
parents 3e939c1049dc
children d4eec2afbe01
48:117856eabb9e 49:809df9157e1a
3 AUDIo TOKenizer 3 AUDIo TOKenizer
4 =============== 4 ===============
5 5
6 `auditok` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy to use API. 6 `auditok` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy to use API.
7 7
8 A more detailed version of this user guide as well as an API tutorial and API reference can be found at [Readthedocs](http://auditok.readthedocs.org/en/latest/) 8 A more detailed version of this user-guide, an API tutorial and API reference can be found at [Readthedocs](http://auditok.readthedocs.org/en/latest/)
9 9
10 - [Two-figure explanation](https://github.com/amsehili/auditok#two-figure-explanation) 10 - [Two-figure explanation](https://github.com/amsehili/auditok#two-figure-explanation)
11 - [Requirements](https://github.com/amsehili/auditok#requirements) 11 - [Requirements](https://github.com/amsehili/auditok#requirements)
12 - [Installation](https://github.com/amsehili/auditok#installation) 12 - [Installation](https://github.com/amsehili/auditok#installation)
13 - [Command line usage](https://github.com/amsehili/auditok#command-line-usage) 13 - [Command line usage](https://github.com/amsehili/auditok#command-line-usage)
14 - [Try the detector with your voice](https://github.com/amsehili/auditok#try-the-detector-with-your-voice) 14 - [Try the detector with your voice](https://github.com/amsehili/auditok#try-the-detector-with-your-voice)
15 - [Play back detections](https://github.com/amsehili/auditok#play-back-detections) 15 - [Play back detections](https://github.com/amsehili/auditok#play-back-detections)
16 - [Set detection threshold](https://github.com/amsehili/auditok#set-detection-threshold) 16 - [Set detection threshold](https://github.com/amsehili/auditok#set-detection-threshold)
17 - [Set format for printed detections information](https://github.com/amsehili/auditok#set-format-for-printed-detections-information) 17 - [Set format for printed detections information](https://github.com/amsehili/auditok#set-format-for-printed-detections-information)
18 - [Practical use case: generate a subtitles template](https://github.com/amsehili/auditok#practical-use-case-generate-a-subtitles-template) 18 - [Plot signal and detections](https://github.com/amsehili/auditok#plot-signal-and-detections)
19 - [Plot signal and detections:](https://github.com/amsehili/auditok#plot-signal-and-detections)
20 - [Save plot as image or PDF](https://github.com/amsehili/auditok#save-plot-as-image-or-pdf) 19 - [Save plot as image or PDF](https://github.com/amsehili/auditok#save-plot-as-image-or-pdf)
21 - [Read data from file](https://github.com/amsehili/auditok#read-data-from-file) 20 - [Read data from file](https://github.com/amsehili/auditok#read-data-from-file)
22 - [Limit the length of acquired/read data](https://github.com/amsehili/auditok#limit-the-length-of-aquired-data) 21 - [Limit the length of acquired/read data](https://github.com/amsehili/auditok#limit-the-length-of-aquired-data)
23 - [Save the whole acquired audio signal](https://github.com/amsehili/auditok#save-the-whole-acquired-audio-signal) 22 - [Save the whole acquired audio signal](https://github.com/amsehili/auditok#save-the-whole-acquired-audio-signal)
24 - [Save each detection into a separate audio file](https://github.com/amsehili/auditok#save-each-detection-into-a-separate-audio-file) 23 - [Save each detection into a separate audio file](https://github.com/amsehili/auditok#save-each-detection-into-a-separate-audio-file)
25 - [Setting detection parameters](https://github.com/amsehili/auditok#setting-detection-parameters) 24 - [Setting detection parameters](https://github.com/amsehili/auditok#setting-detection-parameters)
25 - [Some practical use cases](https://github.com/amsehili/auditok#some-practical-use-cases)
26 - [1st practical use case: generate a subtitles template](https://github.com/amsehili/auditok#1st-practical-use-case-generate-a-subtitles-template)
27 - [2nd Practical use case example: build a (very) basic voice control application](https://github.com/amsehili/auditok#2nd-practical-use-case-example-build-a-very-basic-voice-control-application)
26 - [License](https://github.com/amsehili/auditok#license) 28 - [License](https://github.com/amsehili/auditok#license)
27 - [Author](https://github.com/amsehili/auditok#author) 29 - [Author](https://github.com/amsehili/auditok#author)
28 30
29 Two-figure explanation 31 Two-figure explanation
30 ---------------------- 32 ----------------------
148 [5]: 00:00:07.470 to 00:00:07.980 150 [5]: 00:00:07.470 to 00:00:07.980
149 ... 151 ...
150 152
151 Valid time directives are: `%h` (hours) `%m` (minutes) `%s` (seconds) `%i` (milliseconds). Two other directives, `%S` (default) and `%I` can be used for absolute time in seconds and milliseconds respectively. 153 Valid time directives are: `%h` (hours) `%m` (minutes) `%s` (seconds) `%i` (milliseconds). Two other directives, `%S` (default) and `%I` can be used for absolute time in seconds and milliseconds respectively.
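As an illustration of how the `%h`, `%m`, `%s` and `%i` directives split a single time value, here is a minimal bash sketch. It is not auditok's implementation: the function name `format_time` is hypothetical, and the zero-padded widths are an assumption taken from the sample output above (`00:00:07.470`).

```bash
#!/bin/bash
# Hypothetical helper (not part of auditok): split a time given in
# milliseconds into hours, minutes, seconds and milliseconds, mirroring
# what the format "%h:%m:%s.%i" prints in the example output.
format_time() {
    local total_ms=$1
    local h=$(( total_ms / 3600000 ))            # %h: whole hours
    local m=$(( (total_ms % 3600000) / 60000 ))  # %m: remaining minutes
    local s=$(( (total_ms % 60000) / 1000 ))     # %s: remaining seconds
    local i=$(( total_ms % 1000 ))               # %i: remaining milliseconds
    printf '%02d:%02d:%02d.%03d\n' "$h" "$m" "$s" "$i"
}

format_time 7470   # prints 00:00:07.470
```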
152 154
153 ### Practical use case: generate a subtitles template 155 ### Plot signal and detections
154
155 Using `--printf` and `--time-format`, the following command, used with an input audio or video file, will generate an **srt** file template that can later be edited with a subtitles editor, reducing the time needed to define where each utterance starts and ends:
156
157 auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
158
159 Output:
160
161 1
162 00:00:00.730 --> 00:00:01.460
163 Put some text here...
164
165 2
166 00:00:02.440 --> 00:00:03.900
167 Put some text here...
168
169 3
170 00:00:06.410 --> 00:00:06.970
171 Put some text here...
172
173 4
174 00:00:07.260 --> 00:00:08.340
175 Put some text here...
176
177 5
178 00:00:09.510 --> 00:00:09.820
179 Put some text here...
180
181 ### Plot signal and detections:
182 156
183 Use option `-p`. Requires `matplotlib` and `numpy`. 157 Use option `-p`. Requires `matplotlib` and `numpy`.
184 158
185 auditok ... -p 159 auditok ... -p
186 160
233 | `-s` | Maximum length of a continuous silence period within | second | 0.3 (300 ms) | 207 | `-s` | Maximum length of a continuous silence period within | second | 0.3 (300 ms) |
234 | | an accepted audio activity | | | 208 | | an accepted audio activity | | |
235 | `-d` | Drop trailing silence from an accepted audio activity | boolean | False | 209 | `-d` | Drop trailing silence from an accepted audio activity | boolean | False |
236 | `-a` | Analysis window length (default value should be good) | second | 0.01 (10 ms) | 210 | `-a` | Analysis window length (default value should be good) | second | 0.01 (10 ms) |
237 211
212 Some practical use cases
213 ------------------------
214
215 ### 1st practical use case: generate a subtitles template
216
217 Using `--printf` and `--time-format`, the following command, used with an input audio or video file, will generate an **srt** file template that can later be edited with a subtitles editor, reducing the time needed to define where each utterance starts and ends:
218
219 auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
220
221 Output:
222
223 1
224 00:00:00.730 --> 00:00:01.460
225 Put some text here...
226
227 2
228 00:00:02.440 --> 00:00:03.900
229 Put some text here...
230
231 3
232 00:00:06.410 --> 00:00:06.970
233 Put some text here...
234
235 4
236 00:00:07.260 --> 00:00:08.340
237 Put some text here...
238
239 5
240 00:00:09.510 --> 00:00:09.820
241 Put some text here...
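The SubRip (**srt**) convention puts a comma, not a dot, before the milliseconds, and some subtitle tools may insist on it. If that matters for your editor, the template above can be post-processed. The sketch below is one possible way to do so; the function name `to_srt_commas` is hypothetical and the filter is not part of auditok.

```bash
#!/bin/bash
# Hypothetical post-processing filter (not part of auditok): rewrite
# hh:mm:ss.iii timestamps as hh:mm:ss,iii on every line read from stdin.
# Other dots, such as those in "Put some text here...", are left untouched
# because they are not preceded by a hh:mm:ss pattern.
to_srt_commas() {
    sed -E 's/([0-9]{2}:[0-9]{2}:[0-9]{2})\.([0-9]{3})/\1,\2/g'
}

# Example: feed it one line of the template shown above.
printf '00:00:00.730 --> 00:00:01.460\n' | to_srt_commas
```

In practice the filter would sit at the end of a pipe, after the `auditok ... --printf ... --time-format ...` command.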
242
243 ### 2nd Practical use case example: build a (very) basic voice control application
244
245 [This repository](https://github.com/amsehili/gspeech-rec) supplies a bash script that can send audio data to Google's
246 Speech Recognition service and get its transcription. In the following we will use **auditok** as a lower layer component
247 of a voice control application. The basic idea is to tell **auditok** to run, for each detected audio activity, a certain
248 number of commands that make up the rest of our voice control application.
249
250 Assume you have installed **sox** and downloaded the Speech Recognition script. The sequence of commands to run is:
251
252 1- Convert raw audio data to flac using **sox**:
253
254 sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac
255
256 2- Send flac audio data to Google and get its filtered transcription using [speech-rec.sh](https://github.com/amsehili/gspeech-rec/blob/master/speech-rec.sh):
257
258 speech-rec.sh -i output.flac -r 16000
259
260 3- Use **grep** to select lines that contain *transcript*:
261
262 grep transcript
263
264
265 4- Launch the following script, giving it the transcription as input:
266
267 #!/bin/bash
268
269 read line
270
271 RES=`echo "$line" | grep -i "open firefox"`
272
273 if [[ $RES ]]
274 then
275 echo "Launch command: 'firefox &' ... "
276 firefox &
277 exit 0
278 fi
279
280 exit 0
281
282 As you can see, the script can handle one single voice command. It runs firefox if the text it receives contains **open firefox**.
283 Save the script into a file named voice-control.sh (don't forget to run **chmod u+x voice-control.sh**).
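If you want the script to recognize more than one phrase, a `case` statement is one way to extend it. The following is only a sketch: the function name `match_command`, the phrase *open terminal* and the `xterm` command are hypothetical additions; only *open firefox* comes from the script above.

```bash
#!/bin/bash
# Sketch: map a transcription line to the name of a command to launch.
# Prints the command name and returns 0 on a match, returns 1 otherwise.
match_command() {
    # Lower-case the input so the match is case-insensitive, like `grep -i`.
    local text
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$text" in
        *"open firefox"*)  echo "firefox" ;;
        *"open terminal"*) echo "xterm" ;;   # hypothetical second command
        *)                 return 1 ;;        # no known phrase in the text
    esac
}

# Possible use inside voice-control.sh (commented out so that this sketch
# launches nothing when sourced):
# read -r line
# if cmd=$(match_command "$line"); then "$cmd" & fi
```

Keeping the phrase lookup in a function that only prints a name makes it easy to check the mapping without launching anything.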
284
285 Now, thanks to option `-C`, we will use the four instructions with a pipe and tell **auditok** to run them each time it detects
286 an audio activity. Try the following command and say *open firefox*:
287
288 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -M 5 -m 3 -n 1 --debug-file file.log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh"
289
290 Here we used option `-M 5` to limit the amount of read audio data to 5 seconds (**auditok** stops if there are no more data) and
291 option `-n 1` to tell **auditok** to only accept tokens of 1 second or more and discard any token shorter than 1 second.
292
293 With `--debug-file file.log`, all processing steps are written to file.log with their timestamps, including each command that is run and the file name it was given.
294
238 295
239 License 296 License
240 ------- 297 -------
241 `auditok` is published under the GNU General Public License Version 3. 298 `auditok` is published under the GNU General Public License Version 3.
242 299