comparison README.md @ 49:809df9157e1a
Merge branch 'master' of https://github.com/amsehili/auditok
author | Amine SEHILI <amine.sehili@gmail.com> |
---|---|
date | Sun, 06 Mar 2016 14:57:03 +0100 |
parents | 3e939c1049dc |
children | d4eec2afbe01 |
48:117856eabb9e | 49:809df9157e1a |
---|---|
3 AUDIo TOKenizer | 3 AUDIo TOKenizer |
4 =============== | 4 =============== |
5 | 5 |
6 `auditok` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy to use API. | 6 `auditok` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy to use API. |
7 | 7 |
8 A more detailed version of this user guide as well as an API tutorial and API reference can be found at [Readthedocs](http://auditok.readthedocs.org/en/latest/) | 8 A more detailed version of this user-guide, an API tutorial and API reference can be found at [Readthedocs](http://auditok.readthedocs.org/en/latest/) |
9 | 9 |
10 - [Two-figure explanation](https://github.com/amsehili/auditok#two-figure-explanation) | 10 - [Two-figure explanation](https://github.com/amsehili/auditok#two-figure-explanation) |
11 - [Requirements](https://github.com/amsehili/auditok#requirements) | 11 - [Requirements](https://github.com/amsehili/auditok#requirements) |
12 - [Installation](https://github.com/amsehili/auditok#installation) | 12 - [Installation](https://github.com/amsehili/auditok#installation) |
13 - [Command line usage](https://github.com/amsehili/auditok#command-line-usage) | 13 - [Command line usage](https://github.com/amsehili/auditok#command-line-usage) |
14 - [Try the detector with your voice](https://github.com/amsehili/auditok#try-the-detector-with-your-voice) | 14 - [Try the detector with your voice](https://github.com/amsehili/auditok#try-the-detector-with-your-voice) |
15 - [Play back detections](https://github.com/amsehili/auditok#play-back-detections) | 15 - [Play back detections](https://github.com/amsehili/auditok#play-back-detections) |
16 - [Set detection threshold](https://github.com/amsehili/auditok#set-detection-threshold) | 16 - [Set detection threshold](https://github.com/amsehili/auditok#set-detection-threshold) |
17 - [Set format for printed detections information](https://github.com/amsehili/auditok#set-format-for-printed-detections-information) | 17 - [Set format for printed detections information](https://github.com/amsehili/auditok#set-format-for-printed-detections-information) |
18 - [Practical use case: generate a subtitles template](https://github.com/amsehili/auditok#practical-use-case-generate-a-subtitles-template) | 18 - [Plot signal and detections](https://github.com/amsehili/auditok#plot-signal-and-detections) |
19 - [Plot signal and detections:](https://github.com/amsehili/auditok#plot-signal-and-detections) | |
20 - [Save plot as image or PDF](https://github.com/amsehili/auditok#save-plot-as-image-or-pdf) | 19 - [Save plot as image or PDF](https://github.com/amsehili/auditok#save-plot-as-image-or-pdf) |
21 - [Read data from file](https://github.com/amsehili/auditok#read-data-from-file) | 20 - [Read data from file](https://github.com/amsehili/auditok#read-data-from-file) |
22 - [Limit the length of acquired/read data](https://github.com/amsehili/auditok#limit-the-length-of-aquired-data) | 21 - [Limit the length of acquired/read data](https://github.com/amsehili/auditok#limit-the-length-of-aquired-data) |
23 - [Save the whole acquired audio signal](https://github.com/amsehili/auditok#save-the-whole-acquired-audio-signal) | 22 - [Save the whole acquired audio signal](https://github.com/amsehili/auditok#save-the-whole-acquired-audio-signal) |
24 - [Save each detection into a separate audio file](https://github.com/amsehili/auditok#save-each-detection-into-a-separate-audio-file) | 23 - [Save each detection into a separate audio file](https://github.com/amsehili/auditok#save-each-detection-into-a-separate-audio-file) |
25 - [Setting detection parameters](https://github.com/amsehili/auditok#setting-detection-parameters) | 24 - [Setting detection parameters](https://github.com/amsehili/auditok#setting-detection-parameters) |
25 - [Some practical use cases](https://github.com/amsehili/auditok#some-practical-use-cases) | |
26 - [1st practical use case: generate a subtitles template](https://github.com/amsehili/auditok#1st-practical-use-case-generate-a-subtitles-template) | |
27 - [2nd Practical use case example: build a (very) basic voice control application](https://github.com/amsehili/auditok#2nd-practical-use-case-example-build-a-very-basic-voice-control-application) | |
26 - [License](https://github.com/amsehili/auditok#license) | 28 - [License](https://github.com/amsehili/auditok#license) |
27 - [Author](https://github.com/amsehili/auditok#author) | 29 - [Author](https://github.com/amsehili/auditok#author) |
28 | 30 |
29 Two-figure explanation | 31 Two-figure explanation |
30 ---------------------- | 32 ---------------------- |
148 [5]: 00:00:07.470 to 00:00:07.980 | 150 [5]: 00:00:07.470 to 00:00:07.980 |
149 ... | 151 ... |
150 | 152 |
151 Valid time directives are: `%h` (hours) `%m` (minutes) `%s` (seconds) `%i` (milliseconds). Two other directives, `%S` (default) and `%I` can be used for absolute time in seconds and milliseconds respectively. | 153 Valid time directives are: `%h` (hours) `%m` (minutes) `%s` (seconds) `%i` (milliseconds). Two other directives, `%S` (default) and `%I` can be used for absolute time in seconds and milliseconds respectively. |
152 | 154 |
153 ### Practical use case: generate a subtitles template | 155 ### Plot signal and detections |
154 | |
155 Using `--printf` and `--time-format`, the following command, run on an input audio or video file, generates an **srt** file template that can later be edited with a subtitle editor, reducing the time needed to mark where each utterance starts and ends: | |
156 | |
157 auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i" | |
158 | |
159 Output: | |
160 | |
161 1 | |
162 00:00:00.730 --> 00:00:01.460 | |
163 Put some text here... | |
164 | |
165 2 | |
166 00:00:02.440 --> 00:00:03.900 | |
167 Put some text here... | |
168 | |
169 3 | |
170 00:00:06.410 --> 00:00:06.970 | |
171 Put some text here... | |
172 | |
173 4 | |
174 00:00:07.260 --> 00:00:08.340 | |
175 Put some text here... | |
176 | |
177 5 | |
178 00:00:09.510 --> 00:00:09.820 | |
179 Put some text here... | |
180 | |
181 ### Plot signal and detections: | |
182 | 156 |
183 Use option `-p`. Requires `matplotlib` and `numpy`. | 157 Use option `-p`. Requires `matplotlib` and `numpy`. |
184 | 158 |
185 auditok ... -p | 159 auditok ... -p |
186 | 160 |
233 | `-s` | Maximum length of a continuous silence period within | second | 0.3 (300 ms) | | 207 | `-s` | Maximum length of a continuous silence period within | second | 0.3 (300 ms) | |
234 | | an accepted audio activity | | | | 208 | | an accepted audio activity | | | |
235 | `-d` | Drop trailing silence from an accepted audio activity | boolean | False | | 209 | `-d` | Drop trailing silence from an accepted audio activity | boolean | False | |
236 | `-a` | Analysis window length (default value should be good) | second | 0.01 (10 ms) | | 210 | `-a` | Analysis window length (default value should be good) | second | 0.01 (10 ms) | |
237 | 211 |
212 Some practical use cases | |
213 ------------------------ | |
214 | |
215 ### 1st practical use case: generate a subtitles template | |
216 | |
217 Using `--printf` and `--time-format`, the following command, run on an input audio or video file, generates an **srt** file template that can later be edited with a subtitle editor, reducing the time needed to mark where each utterance starts and ends: | |
218 | |
219 auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i" | |
220 | |
221 Output: | |
222 | |
223 1 | |
224 00:00:00.730 --> 00:00:01.460 | |
225 Put some text here... | |
226 | |
227 2 | |
228 00:00:02.440 --> 00:00:03.900 | |
229 Put some text here... | |
230 | |
231 3 | |
232 00:00:06.410 --> 00:00:06.970 | |
233 Put some text here... | |
234 | |
235 4 | |
236 00:00:07.260 --> 00:00:08.340 | |
237 Put some text here... | |
238 | |
239 5 | |
240 00:00:09.510 --> 00:00:09.820 | |
241 Put some text here... | |
242 | |
243 ### 2nd Practical use case example: build a (very) basic voice control application | |
244 | |
245 [This repository](https://github.com/amsehili/gspeech-rec) supplies a bash script that can send audio data to Google's | |
246 Speech Recognition service and get its transcription. In the following we will use **auditok** as a lower layer component | |
247 of a voice control application. The basic idea is to tell **auditok** to run, for each detected audio activity, a certain | |
248 number of commands that make up the rest of our voice control application. | |
249 | |
250 Assume you have installed **sox** and downloaded the Speech Recognition script. The sequence of commands to run is: | |
251 | |
252 1- Convert raw audio data to flac using **sox**: | |
253 | |
254 sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac | |
255 | |
256 2- Send flac audio data to Google and get its filtered transcription using [speech-rec.sh](https://github.com/amsehili/gspeech-rec/blob/master/speech-rec.sh): | |
257 | |
258 speech-rec.sh -i output.flac -r 16000 | |
259 | |
260 3- Use **grep** to select lines that contain *transcript*: | |
261 | |
262 grep transcript | |
263 | |
264 | |
265 4- Launch the following script, giving it the transcription as input: | |
266 | |
267 #!/bin/bash | |
268 | |
269 read line | |
270 | |
271 RES=`echo "$line" | grep -i "open firefox"` | |
272 | |
273 if [[ $RES ]] | |
274 then | |
275 echo "Launch command: 'firefox &' ... " | |
276 firefox & | |
277 exit 0 | |
278 fi | |
279 | |
280 exit 0 | |
281 | |
282 As you can see, the script handles a single voice command. It runs firefox if the text it receives contains **open firefox**. | |
283 Save the script into a file named voice-control.sh (don't forget to run **chmod u+x voice-control.sh**). | |
284 | |
285 Now, thanks to option `-C`, we will use the four instructions with a pipe and tell **auditok** to run them each time it detects | |
286 an audio activity. Try the following command and say *open firefox*: | |
287 | |
288 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -M 5 -m 3 -n 1 --debug-file file.log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh" | |
289 | |
290 Here we used option `-M 5` to limit the amount of audio data read to 5 seconds (**auditok** stops when there is no more data) and | |
291 option `-n 1` to tell **auditok** to accept only tokens of 1 second or more and discard any shorter token. | |
292 | |
293 With `--debug-file file.log`, all processing steps are written to file.log with their timestamps, including each command that was run and the file name it was given. | |
294 | |
238 | 295 |
239 License | 296 License |
240 ------- | 297 ------- |
241 `auditok` is published under the GNU General Public License Version 3. | 298 `auditok` is published under the GNU General Public License Version 3. |
242 | 299 |
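
The `voice-control.sh` script added in this changeset handles a single phrase. As a rough sketch of how it might be extended to several voice commands, the helper below maps a transcription line to a program name. The function name `match_command`, the phrase "open terminal" and the `xterm` mapping are illustrative assumptions and are not part of auditok or of the speech-rec.sh script.

```sh
#!/bin/sh

# Hypothetical helper (not part of auditok): print the program to launch
# for a transcription line, or return 1 if no known phrase is found.
match_command() {
    # Lowercase the text so matching is case-insensitive, like `grep -i`.
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$text" in
        *"open firefox"*)  echo "firefox" ;;
        *"open terminal"*) echo "xterm" ;;    # assumed example mapping
        *) return 1 ;;                         # nothing recognized
    esac
}
```

In a script used at the end of the pipe, one could read a line from standard input, pass it to `match_command`, and launch the printed program in the background only when the function succeeds. Keeping the phrase-to-program mapping in a single `case` statement means a new voice command costs one extra line.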