changeset 43:e07ee4da349b
Update README.md
author | Amine SEHILI <amsehili@users.noreply.github.com> |
---|---|
date | Thu, 03 Dec 2015 10:16:48 +0100 |
parents | eb17a4e1dc83 |
children | b9a90be0b5a2 |
files | README.md |
diffstat | 1 files changed, 88 insertions(+), 31 deletions(-) |
--- a/README.md	Thu Dec 03 01:47:57 2015 +0100
+++ b/README.md	Thu Dec 03 10:16:48 2015 +0100
@@ -15,14 +15,16 @@
     - [Play back detections](https://github.com/amsehili/auditok#play-back-detections)
     - [Set detection threshold](https://github.com/amsehili/auditok#set-detection-threshold)
     - [Set format for printed detections information](https://github.com/amsehili/auditok#set-format-for-printed-detections-information)
-    - [Practical use case: generate a subtitles template](https://github.com/amsehili/auditok#practical-use-case-generate-a-subtitles-template)
-    - [Plot signal and detections:](https://github.com/amsehili/auditok#plot-signal-and-detections)
+    - [Plot signal and detections](https://github.com/amsehili/auditok#plot-signal-and-detections)
     - [Save plot as image or PDF](https://github.com/amsehili/auditok#save-plot-as-image-or-pdf)
     - [Read data from file](https://github.com/amsehili/auditok#read-data-from-file)
     - [Limit the length of aquired/read data](https://github.com/amsehili/auditok#limit-the-length-of-aquired-data)
     - [Save the whole acquired audio signal](https://github.com/amsehili/auditok#save-the-whole-acquired-audio-signal)
     - [Save each detection into a separate audio file](https://github.com/amsehili/auditok#save-each-detection-into-a-separate-audio-file)
     - [Setting detection parameters](https://github.com/amsehili/auditok#setting-detection-parameters)
+- [Some practical use cases](https://github.com/amsehili/auditok#some-practical-use-cases)
+    - [1st practical use case: generate a subtitles template](https://github.com/amsehili/auditok#1st-practical-use-case-generate-a-subtitles-template)
+    - [2nd Practical use case example: build a (very) basic voice control application](https://github.com/amsehili/auditok#2nd-Practical-use-case-example-build-a-(very)-basic-voice-control)
 - [License](https://github.com/amsehili/auditok#license)
 - [Author](https://github.com/amsehili/auditok#author)
 
@@ -150,35 +152,7 @@
 
 Valid time directives are: `%h` (hours) `%m` (minutes) `%s` (seconds) `%i` (milliseconds). Two other directives, `%S` (default) and `%I` can be used for absolute time in seconds and milliseconds respectively.
 
-### Practical use case: generate a subtitles template
-
-Using `--printf ` and `--time-format`, the following command, used with an input audio or video file, will generate and an **srt** file template that can be later edited with a subtitles editor in a way that reduces the time needed to define when each utterance starts and where it ends:
-
-    auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
-
-Output:
-
-    1
-    00:00:00.730 --> 00:00:01.460
-    Put some text here...
-
-    2
-    00:00:02.440 --> 00:00:03.900
-    Put some text here...
-
-    3
-    00:00:06.410 --> 00:00:06.970
-    Put some text here...
-
-    4
-    00:00:07.260 --> 00:00:08.340
-    Put some text here...
-
-    5
-    00:00:09.510 --> 00:00:09.820
-    Put some text here...
-
-### Plot signal and detections:
+### Plot signal and detections
 
 use option `-p`. Requires `matplotlib` and `numpy`.
 
@@ -235,6 +209,89 @@
 | `-d` | Drop trailing silence from an accepted audio activity | boolean | False |
 | `-a` | Analysis window length (default value should be good) | second | 0.01 (10 ms) |
 
+Some practical use cases
+------------------------
+
+### 1st practical use case: generate a subtitles template
+
+Using `--printf ` and `--time-format`, the following command, used with an input audio or video file, will generate and an **srt** file template that can be later edited with a subtitles editor in a way that reduces the time needed to define when each utterance starts and where it ends:
+
+    auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
+
+Output:
+
+    1
+    00:00:00.730 --> 00:00:01.460
+    Put some text here...
+
+    2
+    00:00:02.440 --> 00:00:03.900
+    Put some text here...
+
+    3
+    00:00:06.410 --> 00:00:06.970
+    Put some text here...
+
+    4
+    00:00:07.260 --> 00:00:08.340
+    Put some text here...
+
+    5
+    00:00:09.510 --> 00:00:09.820
+    Put some text here...
+
+### 2nd Practical use case example: build a (very) basic voice control application
+
+[This repository](https://github.com/amsehili/gspeech-rec) supplies a bash script the can send audio data to Google's
+Speech Recognition service and get its transcription. In the following we will use **auditok** as a lower layer component
+of a voice control application. The basic idea is to tell **auditok** to run, for each detected audio activity, a certain
+number of commands that make up the rest of our voice control application.
+
+Assume you have installed **sox** and downloaded the Speech Recognition script. The sequence of commands to run is:
+
+1- Convert raw audio data to flac using **sox**:
+
+    sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac
+
+2- Send flac audio data to Google and get its filtered transcription using [speech-rec.sh](https://github.com/amsehili/gspeech-rec/blob/master/speech-rec.sh):
+
+    speech-rec.sh -i output.flac -r 16000
+
+3- Use **grep** to select lines that contain *transcript*:
+
+    grep transcript
+
+
+4- Launch the following script, giving it the transcription as input:
+
+    #!/bin/bash
+
+    read line
+
+    RES=`echo "$line" | grep -i "open firefox"`
+
+    if [[ $RES ]]
+       then
+         echo "Launch command: 'firefox &' ... "
+         firefox &
+         exit 0
+    fi
+
+    exit 0
+
+As you can see, the script can handle one single voice command. It runs firefox if the text it receives contains **open firefox**.
+Save a script into a file named voice-control.sh (don't forget to run a **chmod u+x voice-control.sh**).
+
+Now, thanks to option `-C`, we will use the four instructions with a pipe and tell **auditok** to run them each time it detects
+an audio activity. Try the following command and say *open firefox*:
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -M 5 -m 3 -n 1 --debug-file file.log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh"
+
+Here we used option `-M 5` to limit the amount of read audio data to 5 seconds (**auditok** stops if there are no more data) and
+option `-n 1` to tell **auditok** to only accept tokens of 1 second or more and throw any token shorter than 1 second.
+
+With `--debug-file file.log`, all processing steps are written into file.log with their timestamps, including any run command and the file name the command was given.
+
 License
 -------
 
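
Note on the voice-control script added by this changeset: the README text says the script handles a single voice command. A minimal sketch of how the matching step could be separated from the launch step, so that further commands can be added and the matching can be checked without a microphone, is given below. The function name `command_for_transcript` and the second command (`open terminal` mapped to `xterm`) are assumptions made for illustration; neither appears in auditok or gspeech-rec.

```shell
#!/bin/bash
# Sketch only: command_for_transcript is a hypothetical helper, not part of
# auditok or gspeech-rec. It maps a transcript line to a command name and
# prints that name; it does not launch anything itself.
command_for_transcript() {
    local text
    # Lowercase the input so matching is case-insensitive, like `grep -i`.
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$text" in
        # The one command handled by the README's voice-control.sh.
        *"open firefox"*) echo "firefox" ;;
        # Assumed second command, added purely for illustration.
        *"open terminal"*) echo "xterm" ;;
        # No known command in the transcript: report failure to the caller.
        *) return 1 ;;
    esac
}
```

Under these assumptions, voice-control.sh could pass the line it reads to this function and launch whatever name comes back, doing nothing when the function returns a non-zero status.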