auditok: README.md comparison

comparison README.md @ 49:809df9157e1a

Merge branch 'master' of https://github.com/amsehili/auditok

author	Amine SEHILI <amine.sehili@gmail.com>
date	Sun, 06 Mar 2016 14:57:03 +0100
parents	3e939c1049dc
children	d4eec2afbe01


-:117856eabb9e
+:809df9157e1a
 AUDIo TOKenizer
 ===============
 `auditok` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy to use API.
-A more detailed version of this user guide as well as an API tutorial and API reference can be found at [Readthedocs](http://auditok.readthedocs.org/en/latest/)
+A more detailed version of this user guide, an API tutorial and an API reference can be found at [Readthedocs](http://auditok.readthedocs.org/en/latest/)
 - [Two-figure explanation](https://github.com/amsehili/auditok#two-figure-explanation)
 - [Requirements](https://github.com/amsehili/auditok#requirements)
 - [Installation](https://github.com/amsehili/auditok#installation)
 - [Command line usage](https://github.com/amsehili/auditok#command-line-usage)
 - [Try the detector with your voice](https://github.com/amsehili/auditok#try-the-detector-with-your-voice)
 - [Play back detections](https://github.com/amsehili/auditok#play-back-detections)
 - [Set detection threshold](https://github.com/amsehili/auditok#set-detection-threshold)
 - [Set format for printed detections information](https://github.com/amsehili/auditok#set-format-for-printed-detections-information)
-- [Practical use case: generate a subtitles template](https://github.com/amsehili/auditok#practical-use-case-generate-a-subtitles-template)
+- [Plot signal and detections](https://github.com/amsehili/auditok#plot-signal-and-detections)
-- [Plot signal and detections:](https://github.com/amsehili/auditok#plot-signal-and-detections)
 - [Save plot as image or PDF](https://github.com/amsehili/auditok#save-plot-as-image-or-pdf)
 - [Read data from file](https://github.com/amsehili/auditok#read-data-from-file)
 - [Limit the length of acquired/read data](https://github.com/amsehili/auditok#limit-the-length-of-aquired-data)
 - [Save the whole acquired audio signal](https://github.com/amsehili/auditok#save-the-whole-acquired-audio-signal)
 - [Save each detection into a separate audio file](https://github.com/amsehili/auditok#save-each-detection-into-a-separate-audio-file)
 - [Setting detection parameters](https://github.com/amsehili/auditok#setting-detection-parameters)
+- [Some practical use cases](https://github.com/amsehili/auditok#some-practical-use-cases)
+- [1st practical use case: generate a subtitles template](https://github.com/amsehili/auditok#1st-practical-use-case-generate-a-subtitles-template)
+- [2nd practical use case example: build a (very) basic voice control application](https://github.com/amsehili/auditok#2nd-practical-use-case-example-build-a-very-basic-voice-control-application)
 - [License](https://github.com/amsehili/auditok#license)
 - [Author](https://github.com/amsehili/auditok#author)
 Two-figure explanation
 ----------------------
 [5]: 00:00:07.470 to 00:00:07.980
 ...
 Valid time directives are: `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.
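To make the composite directives concrete, here is a small sketch of how a time offset maps to `%h:%m:%s.%i`, zero-padded as in the sample detections shown above (e.g. `00:00:07.470`). The function `format_hmsi` is a hypothetical helper written for illustration only; it is not part of **auditok**, and it takes milliseconds because bash has no floating-point arithmetic.

```shell
#!/bin/bash
# Hypothetical helper (not part of auditok): render an offset given in
# milliseconds the way "%h:%m:%s.%i" appears in this README's examples.
format_hmsi() {
    local total_ms=$1
    local h=$(( total_ms / 3600000 ))              # %h: whole hours
    local m=$(( (total_ms % 3600000) / 60000 ))    # %m: minutes within the hour
    local s=$(( (total_ms % 60000) / 1000 ))       # %s: seconds within the minute
    local i=$(( total_ms % 1000 ))                 # %i: milliseconds within the second
    printf '%02d:%02d:%02d.%03d\n' "$h" "$m" "$s" "$i"
}

format_hmsi 7470      # 00:00:07.470
format_hmsi 3725042   # 01:02:05.042
```

With `%S` or `%I`, the same 7470 ms offset would instead be printed as an absolute value in seconds or milliseconds.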
-### Practical use case: generate a subtitles template
+### Plot signal and detections
-Using `--printf ` and `--time-format`, the following command, used with an input audio or video file, will generate and an **srt** file template that can be later edited with a subtitles editor in a way that reduces the time needed to define when each utterance starts and where it ends:
-auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
-Output:
-1
-00:00:00.730 --> 00:00:01.460
-Put some text here...
-2
-00:00:02.440 --> 00:00:03.900
-Put some text here...
-3
-00:00:06.410 --> 00:00:06.970
-Put some text here...
-4
-00:00:07.260 --> 00:00:08.340
-Put some text here...
-5
-00:00:09.510 --> 00:00:09.820
-Put some text here...
-### Plot signal and detections:
 Use option `-p`. It requires `matplotlib` and `numpy`.
 auditok ...  -p
 | `-s`   | Maximum length of a continuous silence period within  | second  |   0.3 (300 ms)   |
 |        | an accepted audio activity                            |         |                  |
 | `-d`   | Drop trailing silence from an accepted audio activity | boolean |   False          |
 | `-a`   | Analysis window length (default value should be good) | second  |   0.01 (10 ms)   |
+Some practical use cases
+------------------------
+### 1st practical use case: generate a subtitles template
+Using `--printf` and `--time-format`, the following command, used with an input audio or video file, generates an **srt** file template that can later be edited with a subtitles editor, reducing the time needed to define when each utterance starts and when it ends:
+auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
+Output:
+1
+00:00:00.730 --> 00:00:01.460
+Put some text here...
+2
+00:00:02.440 --> 00:00:03.900
+Put some text here...
+3
+00:00:06.410 --> 00:00:06.970
+Put some text here...
+4
+00:00:07.260 --> 00:00:08.340
+Put some text here...
+5
+00:00:09.510 --> 00:00:09.820
+Put some text here...
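One caveat: strict SubRip (**srt**) files write the milliseconds after a comma (`00:00:00,730`), whereas `%h:%m:%s.%i` produces a dot. A minimal sketch of a post-processing filter, assuming the output layout shown above; `to_srt_timestamps` is a hypothetical name and the filter only touches lines of the form `start --> end`.

```shell
#!/bin/bash
# Hypothetical filter (not part of auditok): replace the dot before the
# milliseconds with a comma on "start --> end" lines only; id lines and
# text lines pass through unchanged.
to_srt_timestamps() {
    sed -E 's/^([0-9]{2}:[0-9]{2}:[0-9]{2})\.([0-9]{3}) --> ([0-9]{2}:[0-9]{2}:[0-9]{2})\.([0-9]{3})$/\1,\2 --> \3,\4/'
}

printf '1\n00:00:00.730 --> 00:00:01.460\nPut some text here...\n' | to_srt_timestamps
```

Assuming the detections are printed to standard output, the filter can be appended to the command above, e.g. `auditok ... --time-format "%h:%m:%s.%i" | to_srt_timestamps > subtitles.srt`.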
+### 2nd practical use case example: build a (very) basic voice control application
+[This repository](https://github.com/amsehili/gspeech-rec) supplies a bash script that can send audio data to Google's
+Speech Recognition service and get its transcription. In the following, we use **auditok** as a lower-layer component
+of a voice control application. The basic idea is to tell **auditok** to run, for each detected audio activity, a certain
+number of commands that make up the rest of our voice control application.
+Assume you have installed **sox** and downloaded the Speech Recognition script. The sequence of commands to run is:
+1- Convert raw audio data to flac using **sox**:
+sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac
+2- Send flac audio data to Google and get its filtered transcription using [speech-rec.sh](https://github.com/amsehili/gspeech-rec/blob/master/speech-rec.sh):
+speech-rec.sh -i output.flac -r 16000
+3- Use **grep** to select lines that contain *transcript*:
+grep transcript
+4- Launch the following script, giving it the transcription as input:
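+#!/bin/bash
+# Read one transcription line from standard input
+read -r line
+# Launch firefox if the line contains "open firefox" (case-insensitive)
+if echo "$line" | grep -qi "open firefox"
+then
+    echo "Launch command: 'firefox &' ... "
+    firefox &
+fi
+exit 0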
+As you can see, the script handles a single voice command: it runs firefox if the text it receives contains **open firefox**.
+Save the script into a file named voice-control.sh (don't forget to run **chmod u+x voice-control.sh**).
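Supporting more than one voice command mostly means replacing the single `grep` with a lookup. A minimal sketch, assuming the transcription arrives as one line on standard input as above; `command_for`, `run_voice_command` and the extra phrases and programs (`xterm`, `gedit`) are hypothetical examples, not part of **auditok** or of the gspeech-rec script.

```shell
#!/bin/bash
# Hypothetical extension of voice-control.sh: map several spoken phrases
# to programs with a single case statement.

# Print the program matching a transcription line, or nothing if no phrase matches.
command_for() {
    local text
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')   # case-insensitive, like grep -i
    case "$text" in
        *"open firefox"*)  echo "firefox" ;;
        *"open terminal"*) echo "xterm" ;;   # example program, adjust to taste
        *"open editor"*)   echo "gedit" ;;   # example program, adjust to taste
    esac
}

# Launch the program matching one transcription line, in the background.
run_voice_command() {
    local cmd
    cmd=$(command_for "$1")
    if [[ -n $cmd ]]; then
        echo "Launch command: '$cmd &' ... "
        "$cmd" &
    fi
}
```

With these two functions at the top of voice-control.sh, the rest of the script reduces to `read -r line` followed by `run_voice_command "$line"`. Keeping the phrase-to-program mapping in `command_for` also makes it easy to check without launching anything.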
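+Now, thanks to option `-C`, we combine the four commands into a single command line (with pipes) and tell **auditok** to run it each time it detects
+an audio activity; in that command, `$` stands for the name of the file that holds the detected audio. Try the following command and say *open firefox*: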
+rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -M 5 -m 3 -n 1 --debug-file file.log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh"
+Here we used option `-M 5` to limit the amount of read audio data to 5 seconds (**auditok** stops if there is no more data) and
+option `-n 1` to tell **auditok** to only accept tokens of 1 second or more and discard any shorter token.
+With `--debug-file file.log`, all processing steps are written to file.log with their timestamps, including every command that is run and the name of the file it was given.
 License
 -------
 `auditok` is published under the GNU General Public License Version 3.
