auditok: README.md annotate

annotate README.md @ 322:2cb8e29e1c9c

Update pre-commit-hooks

author	Amine Sehili <amine.sehili@gmail.com>
date	Sat, 19 Oct 2019 23:28:11 +0100
parents	d4eec2afbe01
children	9741b52f194a

rev	line source
amsehili@11	1 [![Build Status](https://travis-ci.org/amsehili/auditok.svg?branch=master)](https://travis-ci.org/amsehili/auditok)
amine@37	2 [![Documentation Status](https://readthedocs.org/projects/auditok/badge/?version=latest)](http://auditok.readthedocs.org/en/latest/?badge=latest)
amsehili@11	3 AUDIo TOKenizer
amine@2	4 ===============
amine@2	5
amsehili@20	6 `auditok` is an Audio Activity Detection tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy-to-use API.
amsehili@20	7
amsehili@45	8 A more detailed version of this user guide, an API tutorial and an API reference can be found at [Readthedocs](http://auditok.readthedocs.org/en/latest/).
amine@35	9
amsehili@25	10 - [Two-figure explanation](https://github.com/amsehili/auditok#two-figure-explanation)
amsehili@25	11 - [Requirements](https://github.com/amsehili/auditok#requirements)
amsehili@25	12 - [Installation](https://github.com/amsehili/auditok#installation)
amsehili@25	13 - [Command line usage](https://github.com/amsehili/auditok#command-line-usage)
amsehili@25	14 - [Try the detector with your voice](https://github.com/amsehili/auditok#try-the-detector-with-your-voice)
amsehili@26	15 - [Play back detections](https://github.com/amsehili/auditok#play-back-detections)
amsehili@26	16 - [Set detection threshold](https://github.com/amsehili/auditok#set-detection-threshold)
amsehili@29	17 - [Set format for printed detections information](https://github.com/amsehili/auditok#set-format-for-printed-detections-information)
amsehili@43	18 - [Plot signal and detections](https://github.com/amsehili/auditok#plot-signal-and-detections)
amsehili@26	19 - [Save plot as image or PDF](https://github.com/amsehili/auditok#save-plot-as-image-or-pdf)
amsehili@26	20 - [Read data from file](https://github.com/amsehili/auditok#read-data-from-file)
amsehili@26	21 - [Limit the length of acquired/read data](https://github.com/amsehili/auditok#limit-the-length-of-aquired-data)
amsehili@26	22 - [Save the whole acquired audio signal](https://github.com/amsehili/auditok#save-the-whole-acquired-audio-signal)
amsehili@26	23 - [Save each detection into a separate audio file](https://github.com/amsehili/auditok#save-each-detection-into-a-separate-audio-file)
amsehili@45	24 - [Setting detection parameters](https://github.com/amsehili/auditok#setting-detection-parameters)
amsehili@43	25 - [Some practical use cases](https://github.com/amsehili/auditok#some-practical-use-cases)
amsehili@43	26 - [1st practical use case: generate a subtitles template](https://github.com/amsehili/auditok#1st-practical-use-case-generate-a-subtitles-template)
amsehili@44	27 - [2nd Practical use case example: build a (very) basic voice control application](https://github.com/amsehili/auditok#2nd-practical-use-case-example-build-a-very-basic-voice-control-application)
amsehili@26	28 - [License](https://github.com/amsehili/auditok#license)
amine@41	29 - [Author](https://github.com/amsehili/auditok#author)
amsehili@25	30
amsehili@25	31 Two-figure explanation
amsehili@25	32 ----------------------
amsehili@25	33 The following two figures illustrate an audio signal (blue) and regions detected as valid audio activities (green rectangles) according to a given threshold (red dashed line). They respectively depict the detection result when:
amsehili@20	34
amsehili@20	35 1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as acoustic event):
amsehili@20	36 ![](doc/figures/figure_1.png)
amsehili@20	37
amsehili@25	38 2. the detector splits an audio activity into several activities whenever a silence within it exceeds 0.2 second (200 ms):
amsehili@20	39 ![](doc/figures/figure_2.png)
amsehili@20	40
amine@35	41 Beyond plotting the signal and detections, you can play back audio activities as they are detected, save them, or run a user command each time there is an activity,
amine@35	42 optionally passing the name of the file that holds the audio activity as an argument to the command.
amine@2	43
amine@2	44 Requirements
amine@2	45 ------------
amine@40	46 `auditok` can be used with standard Python!
amine@40	47
amine@40	48 However, if you want more features, the following packages are needed:
amsehili@20	49 - [pydub](https://github.com/jiaaro/pydub): read audio files of popular audio formats (ogg, mp3, etc.) or extract audio from a video file
amsehili@20	50 - [PyAudio](http://people.csail.mit.edu/hubert/pyaudio/): read audio data from the microphone and play back detections
amine@40	51 - [matplotlib](http://matplotlib.org/): plot audio signal and detections (see figures above)
amine@40	52 - [numpy](http://www.numpy.org): required by matplotlib. Also used for math operations instead of standard Python if available
amsehili@20	53 - Optionally, you can use `sox` or `parecord` for data acquisition and feed `auditok` using a pipe.
amsehili@20	54
amine@2	55
amine@2	56 Installation
amine@2	57 ------------
amine@40	58
amine@40	59 git clone https://github.com/amsehili/auditok.git
amine@40	60 cd auditok
amine@4	61 python setup.py install
amine@2	62
amsehili@25	63 Command line usage
amine@21	64 ------------------
amine@21	65
amsehili@25	66 ### Try the detector with your voice
amsehili@25	67
amine@21	68 The first thing you want to check is perhaps how `auditok` detects your voice. If you have installed `PyAudio` just run (`Ctrl-C` to stop):
amine@21	69
amsehili@25	70 auditok
amine@21	71
amine@35	72 This will print `id`, `start-time` and `end-time` for each detected activity. If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell `auditok` to read data from standard input:
amine@21	73
amsehili@25	74 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1
amsehili@25	75
amsehili@25	76 Note that when data is read from standard input the same audio parameters must be used for both `sox` (or any other data generation/acquisition tool) and `auditok`. The following table summarizes audio parameters.
amine@21	77
amine@35	78 | Audio parameter | sox option | `auditok` option | `auditok` default |
amine@35	79 | --------------- |------------|------------------|-----------------------|
amine@35	80 | Sampling rate | -r | -r | 16000 |
amine@35	81 | Sample width | -b (bits) | -w (bytes) | 2 |
amine@35	82 | Channels | -c | -c | 1 |
amine@35	83 | Encoding | -e | None | always signed integer |
amine@21	84
amsehili@25	85 According to this table, the previous command can be run as:
amine@21	86
amsehili@25	87 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -
amine@21	88
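The relationship between the sox and `auditok` options in the table can be sketched in Python. The helper names below (`bits_to_bytes`, `frame_size`, `read_frames`) are hypothetical and are not part of `auditok`; the 10 ms window is the default analysis window length listed under "Setting detection parameters".

```python
import io

def bits_to_bytes(bits):
    """Convert a sox-style sample size in bits (-b) to a width in bytes (-w)."""
    if bits % 8 != 0:
        raise ValueError("sample size must be a multiple of 8 bits")
    return bits // 8

def frame_size(sampling_rate, sample_width, channels, window=0.01):
    """Number of bytes in one analysis window of raw audio."""
    return round(sampling_rate * window) * sample_width * channels

def read_frames(stream, size):
    """Yield consecutive fixed-size frames from a binary stream, dropping a short tail."""
    while True:
        frame = stream.read(size)
        if len(frame) < size:
            break
        yield frame

# One second of digital silence with the default parameters: 16000 Hz, 2 bytes, mono.
size = frame_size(16000, bits_to_bytes(16), 1)
frames = list(read_frames(io.BytesIO(bytes(32000)), size))
print(size, len(frames))
```

To read from a pipe instead of an in-memory buffer, `sys.stdin.buffer` could be passed as `stream`. If the two sides disagree on any parameter, the frame boundaries no longer line up with the samples, which is why the same values must be given to both tools.
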
mathieu@79	89 ### PyAudio
mathieu@79	90
mathieu@79	91 When capturing input with PyAudio, you may need to adjust the device index with `-I` if multiple input devices are available. Use `lsusb -t` to get the list of USB devices, or use `arecord -l` if you're using a non-USB input device. If you don't know what index to use, just try `0`, `1`, `2` and so on, outputting the audio using `-E` (echo) until you hear the sound.
mathieu@79	92
mathieu@79	93 You may also get an error `[Errno -9981] Input overflowed` from PyAudio. If that's the case, you need a bigger frame buffer.
mathieu@79	94 Use `-F` with 2048 or 4096 (the default is 1024).
mathieu@79	95
amsehili@25	96 ### Play back detections
amine@21	97
amsehili@25	98 auditok -E
amine@21	99
amine@35	100 or
amsehili@25	101
amsehili@25	102 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E
amsehili@25	103
amsehili@25	104 Option `-E` stands for echo, so `auditok` plays back whatever it detects. Using `-E` requires `PyAudio`. If you don't have `PyAudio` and want to play detections with sox, use the `-C` option:
amsehili@25	105
amsehili@25	106 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
amine@21	107
amsehili@25	108 The `-C` option tells `auditok` to interpret its content as a command that should be run whenever `auditok` detects an audio activity, replacing `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.
amsehili@25	109
amsehili@25	110 `rec` and `play` are just aliases for `sox`.
amine@21	111
amine@21	112 The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only if there is an audio activity and saves bandwidth during silence.
amine@21	113
amsehili@25	114 ### Set detection threshold
amsehili@25	115
amsehili@25	116 If you notice that there are too many detections, use a higher value for the energy threshold. The current version only implements a `validator` based on an energy threshold; using spectral information is also desirable and might be part of future releases. To change the energy threshold (default: 50), use option `-e`:
amsehili@25	117
amsehili@25	118 auditok -E -e 55
amsehili@25	119
amine@35	120 or
amsehili@25	121
amsehili@25	122 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
amsehili@25	123
amsehili@26	124 If, however, you find that the detector is missing some or all of your audio activities, use a lower value for `-e`.
amsehili@25	125
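To make the idea of an energy threshold concrete, here is a minimal Python sketch of an energy-based validator. It is not `auditok`'s implementation: the log-energy formula (10 times the base-10 logarithm of the mean squared sample value) is a common definition used here as an assumption, and the names `frame_energy` and `is_valid` are hypothetical.

```python
import math
import struct

def frame_energy(frame):
    """Log energy of a frame of signed 16-bit little-endian samples (assumed formula)."""
    count = len(frame) // 2
    samples = struct.unpack("<%dh" % count, frame[:count * 2])
    mean_square = sum(s * s for s in samples) / max(count, 1)
    # Floor the value so that digital silence does not produce log10(0).
    return 10 * math.log10(max(mean_square, 1e-10))

def is_valid(frame, threshold=50):
    """A frame counts as audio activity when its energy reaches the threshold."""
    return frame_energy(frame) >= threshold

quiet = struct.pack("<4h", 10, -10, 10, -10)          # low-amplitude frame
loud = struct.pack("<4h", 1000, -1000, 1000, -1000)   # high-amplitude frame
print(is_valid(quiet), is_valid(loud))
```

Under this definition, raising the threshold rejects more frames (fewer detections) and lowering it accepts more, which matches the advice given for `-e`.
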
amsehili@29	126 ### Set format for printed detections information
amsehili@25	127
amine@35	128 By default, `auditok` prints the `id`, `start-time` and `end-time` of each detected activity:
amsehili@25	129
amsehili@25	130 1 1.87 2.67
amsehili@25	131 2 3.05 3.73
amsehili@25	132 3 3.97 4.49
amsehili@25	133 ...
amsehili@25	134
amine@35	135 If you want to customize the output format, use the `--printf` option:
amsehili@25	136
amsehili@25	137 auditok -e 55 --printf "[{id}]: {start} to {end}"
amsehili@25	138
amsehili@25	139 Output:
amsehili@25	140
amsehili@25	141 [1]: 0.22 to 0.67
amsehili@25	142 [2]: 2.81 to 4.18
amsehili@25	143 [3]: 5.53 to 6.44
amsehili@25	144 [4]: 7.32 to 7.82
amsehili@25	145 ...
amsehili@25	146
amsehili@28	147 Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds by default. If you want more detailed time information, use `--time-format`:
amsehili@25	148
amsehili@25	149 auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"
amsehili@25	150
amsehili@25	151 Output:
amsehili@25	152
amsehili@25	153 [1]: 00:00:01.080 to 00:00:01.760
amsehili@25	154 [2]: 00:00:02.420 to 00:00:03.440
amsehili@25	155 [3]: 00:00:04.930 to 00:00:05.570
amsehili@25	156 [4]: 00:00:05.690 to 00:00:06.020
amsehili@25	157 [5]: 00:00:07.470 to 00:00:07.980
amsehili@25	158 ...
amsehili@25	159
amsehili@25	160 Valid time directives are: `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.
amsehili@25	161
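The directives can be illustrated with a small Python reimplementation. This is only a sketch of the behaviour described above, not the code `auditok` uses; `format_time` is a hypothetical name, and rendering `%S` with two decimals is an assumption based on the default output shown earlier.

```python
import re

def format_time(seconds, fmt="%S"):
    """Expand the time directives %h, %m, %s, %i, %S and %I in `fmt`."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3600 * 1000)
    minutes, rest = divmod(rest, 60 * 1000)
    secs, millis = divmod(rest, 1000)
    values = {
        "%h": "%02d" % hours,
        "%m": "%02d" % minutes,
        "%s": "%02d" % secs,
        "%i": "%03d" % millis,
        "%S": "%.2f" % seconds,  # assumption: absolute seconds with two decimals
        "%I": "%d" % total_ms,   # absolute time in milliseconds
    }
    # Directives are case-sensitive, so %s and %S are looked up separately.
    return re.sub(r"%[hmsiSI]", lambda match: values[match.group(0)], fmt)

print(format_time(1.08, "%h:%m:%s.%i"))
print(format_time(3725.5, "%h:%m:%s.%i"))
print(format_time(1.08, "%I"))
```
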
amsehili@43	162 ### Plot signal and detections
amine@21	163
amsehili@25	164 Use option `-p`. Requires `matplotlib` and `numpy`.
amine@21	165
amsehili@25	166 auditok ... -p
amsehili@25	167
amsehili@26	168 ### Save plot as image or PDF
amsehili@25	169
amsehili@25	170 auditok ... --save-image output.png
amsehili@25	171
amsehili@25	172 Requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.
amsehili@25	173
amsehili@25	174 ### Read data from file
amine@21	175
amine@21	176 auditok -i input.wav ...
amine@21	177
amine@21	178 Install `pydub` for other audio formats.
amine@21	179
amine@21	180 ### Limit the length of aquired data
amine@21	181
amine@21	182 auditok -M 12 ...
amine@21	183
amine@21	184 Time is in seconds.
amine@21	185
amine@21	186 ### Save the whole acquired audio signal
amine@21	187
amine@21	188 auditok -O output.wav ...
amine@21	189
amine@21	190 Install `pydub` for other audio formats.
amine@21	191
amine@21	192
amine@21	193 ### Save each detection into a separate audio file
amine@21	194
amine@21	195 auditok -o det_{N}_{start}_{end}.wav ...
amine@21	196
amine@35	197 You can use free text and place `{N}`, `{start}` and `{end}` wherever you want. They will be replaced by the detection number, `start-time` and `end-time` respectively. Another example:
amine@21	198
amine@21	199 auditok -o {start}-{end}.wav ...
amine@21	200
amine@21	201 Install `pydub` for more audio formats.
amine@21	202
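The placeholder substitution can be pictured with Python's `str.format`. This is an illustrative sketch only: `detection_filename` is a hypothetical helper, and formatting the times with two decimals is an assumption.

```python
def detection_filename(template, number, start, end):
    """Fill the {N}, {start} and {end} placeholders of an output file template."""
    return template.format(N=number, start="%.2f" % start, end="%.2f" % end)

# Hypothetical detections given as (start, end) pairs in seconds.
detections = [(1.87, 2.67), (3.05, 3.73)]
for number, (start, end) in enumerate(detections, 1):
    print(detection_filename("det_{N}_{start}_{end}.wav", number, start, end))
```

A template that omits `{N}`, such as `{start}-{end}.wav`, works the same way because unused placeholders are simply ignored.
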
amine@2	203
amsehili@26	204 Setting detection parameters
amsehili@26	205 ----------------------------
amsehili@26	206
amsehili@26	207 Alongside the threshold option `-e` seen so far, several other options can have a great impact on the detector's behavior. These options are summarized in the following table:
amsehili@26	208
amsehili@26	209
amsehili@27	210 | Option | Description | Unit | Default |
amsehili@27	211 | -------|-------------------------------------------------------|---------|------------------|
amsehili@27	212 | `-n` | Minimum length an accepted audio activity should have | second | 0.2 (200 ms) |
amsehili@27	213 | `-m` | Maximum length an accepted audio activity should reach| second | 5. |
amsehili@27	214 | `-s` | Maximum length of a continuous silence period within | second | 0.3 (300 ms) |
amsehili@27	215 | | an accepted audio activity | | |
amsehili@27	216 | `-d` | Drop trailing silence from an accepted audio activity | boolean | False |
amsehili@27	217 | `-a` | Analysis window length (default value should be good) | second | 0.01 (10 ms) |
amsehili@26	218
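How these parameters interact can be sketched with a simplified tokenizer that works on one boolean per analysis window (`True` when the window is above the energy threshold). This is an assumption-laden illustration, not `auditok`'s actual algorithm; the function `tokenize` and its argument names are hypothetical and merely mirror `-n`, `-m`, `-s`, `-d` and `-a`.

```python
def tokenize(valid, min_dur=0.2, max_dur=5.0, max_silence=0.3,
             drop_trailing_silence=False, window=0.01):
    """Return (start, end) times in seconds of accepted audio activities."""
    min_len = round(min_dur / window)
    max_len = round(max_dur / window)
    max_sil = round(max_silence / window)
    tokens = []

    def emit(start, end, trailing):
        # `end` is exclusive; optionally strip the silent tail, then check the length.
        if drop_trailing_silence:
            end -= trailing
        if end - start >= min_len:
            tokens.append((round(start * window, 2), round(end * window, 2)))

    start, silence = None, 0
    for i, ok in enumerate(valid):
        if start is None:
            if ok:
                start, silence = i, 0  # first valid window opens an activity
            continue
        silence = 0 if ok else silence + 1
        if silence > max_sil:
            # Too much silence: close the activity, keeping max_sil silent windows.
            emit(start, i, silence - 1)
            start = None
        elif i + 1 - start == max_len:
            # Maximum length reached: force a split here.
            emit(start, i + 1, silence)
            start = None
    if start is not None:
        emit(start, len(valid), silence)
    return tokens

# 0.5 s of activity, 0.2 s of silence, 0.5 s of activity, then 1 s of silence.
signal = [True] * 50 + [False] * 20 + [True] * 50 + [False] * 100
print(tokenize(signal, max_silence=0.3))
print(tokenize(signal, max_silence=0.1))
print(tokenize(signal, max_silence=0.3, drop_trailing_silence=True))
```

With a tolerated silence of 0.3 second the 0.2 second gap is absorbed into a single activity, while a tolerance of 0.1 second splits the signal in two, in the spirit of the two figures at the top of this document.
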
amsehili@43	219 Some practical use cases
amsehili@43	220 ------------------------
amsehili@43	221
amsehili@43	222 ### 1st practical use case: generate a subtitles template
amsehili@43	223
amsehili@43	224 Using `--printf` and `--time-format`, the following command, used with an input audio or video file, will generate an srt file template that can later be edited with a subtitles editor, reducing the time needed to define where each utterance starts and ends:
amsehili@43	225
amsehili@43	226 auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
amsehili@43	227
amsehili@43	228 Output:
amsehili@43	229
amsehili@43	230 1
amsehili@43	231 00:00:00.730 --> 00:00:01.460
amsehili@43	232 Put some text here...
amsehili@43	233
amsehili@43	234 2
amsehili@43	235 00:00:02.440 --> 00:00:03.900
amsehili@43	236 Put some text here...
amsehili@43	237
amsehili@43	238 3
amsehili@43	239 00:00:06.410 --> 00:00:06.970
amsehili@43	240 Put some text here...
amsehili@43	241
amsehili@43	242 4
amsehili@43	243 00:00:07.260 --> 00:00:08.340
amsehili@43	244 Put some text here...
amsehili@43	245
amsehili@43	246 5
amsehili@43	247 00:00:09.510 --> 00:00:09.820
amsehili@43	248 Put some text here...
amsehili@43	249
amsehili@43	250 ### 2nd Practical use case example: build a (very) basic voice control application
amsehili@43	251
amsehili@43	252 [This repository](https://github.com/amsehili/gspeech-rec) supplies a bash script that can send audio data to Google's
amsehili@43	253 Speech Recognition service and get its transcription. In the following we will use auditok as a lower layer component
amsehili@43	254 of a voice control application. The basic idea is to tell auditok to run, for each detected audio activity, a certain
amsehili@43	255 number of commands that make up the rest of our voice control application.
amsehili@43	256
amsehili@43	257 Assume you have installed sox and downloaded the Speech Recognition script. The sequence of commands to run is:
amsehili@43	258
amsehili@43	259 1- Convert raw audio data to flac using sox:
amsehili@43	260
amsehili@43	261 sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac
amsehili@43	262
amsehili@43	263 2- Send flac audio data to Google and get its filtered transcription using [speech-rec.sh](https://github.com/amsehili/gspeech-rec/blob/master/speech-rec.sh):
amsehili@43	264
amsehili@43	265 speech-rec.sh -i output.flac -r 16000
amsehili@43	266
amsehili@43	267 3- Use grep to select lines that contain transcript:
amsehili@43	268
amsehili@43	269 grep transcript
amsehili@43	270
amsehili@43	271
amsehili@43	272 4- Launch the following script, giving it the transcription as input:
amsehili@43	273
amsehili@43	274 #!/bin/bash
amsehili@43	275
amsehili@43	276 read line
amsehili@43	277
amsehili@43	278 RES=`echo "$line" | grep -i "open firefox"`
amsehili@43	279
amsehili@43	280 if [[ $RES ]]
amsehili@43	281 then
amsehili@43	282     echo "Launch command: 'firefox &' ... "
amsehili@43	283     firefox &
amsehili@43	284     exit 0
amsehili@43	285 fi
amsehili@43	286
amsehili@43	287 exit 0
amsehili@43	288
amsehili@43	289 As you can see, the script can handle one single voice command. It runs firefox if the text it receives contains "open firefox".
amsehili@43	290 Save the script into a file named voice-control.sh (don't forget to run `chmod u+x voice-control.sh`).
amsehili@43	291
amsehili@43	292 Now, thanks to option `-C`, we will chain the four instructions with pipes and tell auditok to run them each time it detects
amsehili@43	293 an audio activity. Try the following command and say "open firefox":
amsehili@43	294
amsehili@43	295 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -M 5 -m 3 -n 1 --debug-file file.log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh"
amsehili@43	296
amsehili@43	297 Here we used option `-M 5` to limit the amount of audio data read to 5 seconds (auditok stops when there is no more data) and
amsehili@43	298 option `-n 1` to tell auditok to only accept tokens of 1 second or more and discard any token shorter than 1 second.
amsehili@43	299
amsehili@43	300 With `--debug-file file.log`, all processing steps are written into file.log with their timestamps, including every command that was run and the file name passed to it.
amsehili@43	301
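The matching step of voice-control.sh could also be written in Python, which makes it easier to handle more than one phrase. The sketch below is hypothetical and not part of `auditok` or of the speech recognition script: `COMMANDS`, `match_command` and the sample transcript line are assumptions made for illustration.

```python
# Map each spoken phrase to the command (argument list) it should trigger.
COMMANDS = {
    "open firefox": ["firefox"],
}

def match_command(transcript, commands=COMMANDS):
    """Return the command whose phrase appears in the transcript, or None."""
    text = transcript.lower()  # case-insensitive, like `grep -i`
    for phrase, argv in commands.items():
        if phrase in text:
            return argv
    return None

# Hypothetical transcript line; the real output format of the script may differ.
print(match_command('transcript: "Open Firefox"'))
print(match_command('transcript: "hello"'))
```

A complete script would read the line from `sys.stdin` and, when `match_command` returns a command, start it with `subprocess.Popen`.
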
amsehili@26	302
amine@2	303 License
amine@2	304 -------
amine@2	305 `auditok` is published under the GNU General Public License Version 3.
amine@2	306
amine@2	307 Author
amine@2	308 ------
amine@2	309 Amine Sehili (<amine.sehili@gmail.com>)
amine@21	310
