auditok: README.md annotate

annotate README.md @ 27:25ea38ae87e7

Update README.md

author	Amine SEHILI <amsehili@users.noreply.github.com>
date	Sun, 29 Nov 2015 01:07:29 +0100
parents	6478ac9c1b42
children	ded666b423b7

rev	line source
amsehili@11	1 [![Build Status](https://travis-ci.org/amsehili/auditok.svg?branch=master)](https://travis-ci.org/amsehili/auditok)
amsehili@11	2 AUDIo TOKenizer
amine@2	3 ===============
amine@2	4
amsehili@20	5 `auditok` is an Audio Activity Detection tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and also offers an easy-to-use API.
amsehili@20	6
amsehili@25	7 - [Two-figure explanation](https://github.com/amsehili/auditok#two-figure-explanation)
amsehili@25	8 - [Requirements](https://github.com/amsehili/auditok#requirements)
amsehili@25	9 - [Installation](https://github.com/amsehili/auditok#installation)
amsehili@25	10 - [Command line usage](https://github.com/amsehili/auditok#command-line-usage)
amsehili@25	11 - [Try the detector with your voice](https://github.com/amsehili/auditok#try-the-detector-with-your-voice)
amsehili@26	12 - [Play back detections](https://github.com/amsehili/auditok#play-back-detections)
amsehili@26	13 - [Set detection threshold](https://github.com/amsehili/auditok#set-detection-threshold)
amsehili@26	14 - [Set printed detection information format](https://github.com/amsehili/auditok#set-printed-detection-information-format)
amsehili@26	15 - [Practical use case: generate a subtitles template](https://github.com/amsehili/auditok#practical-use-case-generate-a-subtitles-template)
amsehili@26	16 - [Plot signal and detections](https://github.com/amsehili/auditok#plot-signal-and-detections)
amsehili@26	17 - [Save plot as image or PDF](https://github.com/amsehili/auditok#save-plot-as-image-or-pdf)
amsehili@26	18 - [Read data from file](https://github.com/amsehili/auditok#read-data-from-file)
amsehili@26	19 - [Limit the length of acquired/read data](https://github.com/amsehili/auditok#limit-the-length-of-aquired-data)
amsehili@26	20 - [Save the whole acquired audio signal](https://github.com/amsehili/auditok#save-the-whole-acquired-audio-signal)
amsehili@26	21 - [Save each detection into a separate audio file](https://github.com/amsehili/auditok#save-each-detection-into-a-separate-audio-file)
amsehili@26	22 - [Setting detection parameters](https://github.com/amsehili/auditok#setting-detection-parameters)
amsehili@26	23 - [License](https://github.com/amsehili/auditok#license)
amsehili@26	24 - [Author](https://github.com/amsehili/auditok#author)
amsehili@25	25
amsehili@25	26 Two-figure explanation
amsehili@25	27 ----------------------
amsehili@25	28 The following two figures illustrate an audio signal (blue) and regions detected as valid audio activities (green rectangles) according to a given threshold (red dashed line). They respectively depict the detection result when:
amsehili@20	29
amsehili@20	30 1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as acoustic event):
amsehili@20	31 ![](doc/figures/figure_1.png)
amsehili@20	32
amsehili@25	33 2. the detector splits an audio activity into several activities whenever the silence within it exceeds 0.2 second (200 ms):
amsehili@20	34 ![](doc/figures/figure_2.png)
amsehili@20	35
amine@2	36
amine@2	37 Requirements
amine@2	38 ------------
amsehili@20	39 `auditok` can be used with standard Python!
amsehili@20	40 However, if you want more features, the following packages are needed:
amsehili@20	41 - [pydub](https://github.com/jiaaro/pydub): read audio files of popular audio formats (ogg, mp3, etc.) or extract audio from a video file
amsehili@20	42 - [PyAudio](http://people.csail.mit.edu/hubert/pyaudio/): read audio data from the microphone and play back detections
amine@21	43 - `matplotlib`: plot audio signal and detections (see figures above)
amine@21	44 - `numpy`: required by matplotlib. Also used for math operations instead of standard Python if available
amsehili@20	45 - Optionally, you can use `sox` or `parecord` for data acquisition and feed `auditok` using a pipe.
amsehili@20	46
amine@2	47
amine@2	48 Installation
amine@2	49 ------------
amine@4	50 python setup.py install
amine@2	51
amsehili@25	52 Command line usage
amine@21	53 ------------------
amine@21	54
amsehili@25	55 ### Try the detector with your voice
amsehili@25	56
amine@21	57 The first thing you want to check is perhaps how `auditok` detects your voice. If you have installed `PyAudio`, just run (`Ctrl-C` to stop):
amine@21	58
amsehili@25	59 auditok
amine@21	60
amsehili@25	61 This will print the id, start time and end time of each detected activity. If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell `auditok` to read data from standard input:
amine@21	62
amsehili@25	63 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1
amsehili@25	64
amsehili@25	65 Note that when data is read from standard input the same audio parameters must be used for both `sox` (or any other data generation/acquisition tool) and `auditok`. The following table summarizes audio parameters.
amine@21	66
amsehili@27	67 | Audio parameter | sox option | `auditok` option | `auditok` default |
amsehili@25	68 | --------------- | ---------- | ---------------- | ----------------- |
amsehili@25	69 | Sampling rate | -r | -r | 16000 |
amsehili@25	70 | Sample width | -b (bits) | -w (bytes) | 2 |
amsehili@25	71 | Channels | -c | -c | 1 |
amsehili@25	72 | Encoding | -e | None | always signed integer |
amine@21	73
amsehili@25	74 According to this table, the previous command can be run as:
amine@21	75
amsehili@25	76 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -
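
A raw stream carries no header, so the reader can only interpret the incoming bytes using the parameters it was given. The following is a minimal Python sketch of that relationship, not code from `auditok`: the helper names `bits_to_bytes` and `bytes_per_second` are hypothetical and only illustrate how sox's `-b` (bits) maps to `auditok`'s `-w` (bytes), and how many bytes one second of audio occupies.

```python
def bits_to_bytes(bits):
    """Convert a sample width in bits (sox -b) to bytes (auditok -w)."""
    if bits % 8 != 0:
        raise ValueError("sample width must be a multiple of 8 bits")
    return bits // 8


def bytes_per_second(sampling_rate=16000, sample_width=2, channels=1):
    """Number of raw bytes produced per second of audio.

    Defaults mirror the `auditok` defaults listed in the table above.
    """
    return sampling_rate * sample_width * channels


if __name__ == "__main__":
    width = bits_to_bytes(16)  # sox "-b 16" corresponds to "-w 2"
    print(width, bytes_per_second(16000, width, 1))
```

If the producer and the consumer disagree on any of the three values, the consumer slices the byte stream at the wrong boundaries, which is why the same parameters must be given on both sides of the pipe.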
amine@21	77
amsehili@25	78 ### Play back detections
amine@21	79
amsehili@25	80 auditok -E
amine@21	81
amsehili@25	82 OR
amsehili@25	83
amsehili@25	84 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E
amsehili@25	85
amsehili@25	86 Option `-E` stands for echo, so `auditok` plays back whatever it detects. Using `-E` requires `PyAudio`. If you don't have `PyAudio` and want to play detections with sox, use the `-C` option:
amsehili@25	87
amsehili@25	88 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
amine@21	89
amsehili@25	90 The `-C` option tells `auditok` to interpret its argument as a command that should be run whenever `auditok` detects an audio activity, replacing `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.
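
The `$` substitution described above can be pictured with a short Python sketch. This is only an illustration under assumptions, not `auditok`'s implementation: `build_command` is a hypothetical helper, and the file name passed to it stands in for whatever temporary file `auditok` actually creates.

```python
import shlex


def build_command(template, filename):
    """Replace every `$` in the template with a quoted file name.

    Returns an argument list that could be handed to subprocess.run().
    """
    command = template.replace("$", shlex.quote(filename))
    return shlex.split(command)


if __name__ == "__main__":
    argv = build_command(
        "play -q -t raw -r 16000 -c 1 -b 16 -e signed $",
        "/tmp/detection_1.raw",  # hypothetical temporary file name
    )
    print(argv)
```

Quoting the file name before splitting keeps the command intact even if the temporary path contains spaces.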
amsehili@25	91
amsehili@25	92 `rec` and `play` are just aliases for `sox`.
amine@21	93
amine@21	94 The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only if there is an audio activity and saves bandwidth during silence.
amine@21	95
amsehili@25	96 ### Set detection threshold
amsehili@25	97
amsehili@25	98 If you notice that there are too many detections, use a higher value for the energy threshold. (The current version only implements a `validator` based on an energy threshold; the use of spectral information is also desirable and might be part of future releases.) To change the energy threshold (default: 50), use option `-e`:
amsehili@25	99
amsehili@25	100 auditok -E -e 55
amsehili@25	101
amsehili@25	102 OR
amsehili@25	103
amsehili@25	104 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
amsehili@25	105
amsehili@26	106 If, however, you find that the detector is missing some or all of your audio activities, use a lower value for `-e`.
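
To give an intuition for what an energy threshold does, here is a minimal Python sketch. It assumes the energy of an analysis window is measured as a log energy, `10 * log10(mean of squared samples)`; this README does not state the exact formula `auditok` uses, so both the formula and the names `log_energy` and `is_valid` are assumptions made for illustration.

```python
import math


def log_energy(samples):
    """Log energy of one analysis window of signed integer samples.

    Assumed definition: 10 * log10(mean of squared amplitudes).
    """
    if not samples:
        raise ValueError("empty window")
    energy = sum(s * s for s in samples) / len(samples)
    if energy <= 0:
        return float("-inf")  # pure digital silence
    return 10 * math.log10(energy)


def is_valid(samples, threshold=50):
    """A window counts as audio activity if its log energy reaches the threshold."""
    return log_energy(samples) >= threshold


if __name__ == "__main__":
    loud = [1000] * 160   # 10 ms at 16 kHz, constant amplitude 1000
    quiet = [100] * 160   # same length, amplitude 100
    print(is_valid(loud, 55), is_valid(quiet, 55))
```

Under this assumed definition, raising the threshold from 50 to 55 rejects windows whose mean squared amplitude is up to about three times higher, which is why a small change in `-e` can noticeably change the number of detections.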
amsehili@25	107
amsehili@25	108 ### Set printed detection information format
amsehili@25	109
amsehili@25	110 By default, `auditok` prints the `id`, `start time` and `end time` of each detected activity:
amsehili@25	111
amsehili@25	112 1 1.87 2.67
amsehili@25	113 2 3.05 3.73
amsehili@25	114 3 3.97 4.49
amsehili@25	115 ...
amsehili@25	116
amsehili@25	117 To customize the output format, use the `--printf` option:
amsehili@25	118
amsehili@25	119 auditok -e 55 --printf "[{id}]: {start} to {end}"
amsehili@25	120
amsehili@25	121 Output:
amsehili@25	122
amsehili@25	123 [1]: 0.22 to 0.67
amsehili@25	124 [2]: 2.81 to 4.18
amsehili@25	125 [3]: 5.53 to 6.44
amsehili@25	126 [4]: 7.32 to 7.82
amsehili@25	127 ...
amsehili@25	128
amsehili@25	129 Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds. If you want more detailed time information, use `--time-format`:
amsehili@25	130
amsehili@25	131 auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"
amsehili@25	132
amsehili@25	133 Output:
amsehili@25	134
amsehili@25	135 [1]: 00:00:01.080 to 00:00:01.760
amsehili@25	136 [2]: 00:00:02.420 to 00:00:03.440
amsehili@25	137 [3]: 00:00:04.930 to 00:00:05.570
amsehili@25	138 [4]: 00:00:05.690 to 00:00:06.020
amsehili@25	139 [5]: 00:00:07.470 to 00:00:07.980
amsehili@25	140 ...
amsehili@25	141
amsehili@25	142 Valid time directives are: `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.
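
The directives above can be mimicked with a few lines of Python. This is a sketch of the idea, not `auditok`'s own code: `format_time` is a hypothetical function, and the two-decimal rendering of `%S` is an assumption based on the default output shown earlier.

```python
import re


def format_time(seconds, fmt="%S"):
    """Render a time in seconds using %h, %m, %s, %i, %S and %I directives."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3600 * 1000)
    minutes, rest = divmod(rest, 60 * 1000)
    secs, millis = divmod(rest, 1000)
    values = {
        "%h": "%02d" % hours,
        "%m": "%02d" % minutes,
        "%s": "%02d" % secs,
        "%i": "%03d" % millis,
        "%S": "%.2f" % seconds,  # absolute seconds (assumed two decimals)
        "%I": str(total_ms),     # absolute milliseconds
    }
    # Replace each known directive; any other text is left untouched.
    return re.sub(r"%[hmsiSI]", lambda m: values[m.group(0)], fmt)


if __name__ == "__main__":
    print(format_time(1.08, "%h:%m:%s.%i"))
    print(format_time(1.87))
```

Working from a single rounded millisecond count avoids the small floating-point errors that would otherwise appear when hours, minutes and seconds are derived separately.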
amsehili@25	143
amsehili@26	144 ### Practical use case: generate a subtitles template
amsehili@25	145
amsehili@25	146 Using `--printf` and `--time-format`, the following command, run on an input file, generates an srt file template that can later be edited in a subtitles editor, which reduces the time needed to mark where each utterance starts and ends:
amsehili@25	147
amsehili@25	148 auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
amsehili@25	149
amsehili@25	150 Output:
amsehili@25	151
amsehili@25	152 1
amsehili@25	153 00:00:00.730 --> 00:00:01.460
amsehili@25	154 Put some text here...
amsehili@25	155
amsehili@25	156 2
amsehili@25	157 00:00:02.440 --> 00:00:03.900
amsehili@25	158 Put some text here...
amsehili@25	159
amsehili@25	160 3
amsehili@25	161 00:00:06.410 --> 00:00:06.970
amsehili@25	162 Put some text here...
amsehili@25	163
amsehili@25	164 4
amsehili@25	165 00:00:07.260 --> 00:00:08.340
amsehili@25	166 Put some text here...
amsehili@25	167
amsehili@25	168 5
amsehili@25	169 00:00:09.510 --> 00:00:09.820
amsehili@25	170 Put some text here...
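
The template expansion behind this output can be approximated in Python with `str.format`. The sketch below is illustrative only: `render` is a hypothetical helper, the detection times are taken from the sample output above as already formatted strings, and whether `auditok` itself relies on `str.format` is an assumption.

```python
def render(template, detections):
    """Fill the template once per detection, numbering detections from 1."""
    return "".join(
        template.format(id=number, start=start, end=end)
        for number, (start, end) in enumerate(detections, 1)
    )


if __name__ == "__main__":
    template = "{id}\n{start} --> {end}\nPut some text here...\n\n"
    detections = [
        ("00:00:00.730", "00:00:01.460"),
        ("00:00:02.440", "00:00:03.900"),
    ]
    print(render(template, detections), end="")
```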
amsehili@25	171
amine@21	172 ### Plot signal and detections
amine@21	173
amsehili@25	174 Use option `-p`. Requires `matplotlib` and `numpy`.
amine@21	175
amsehili@25	176 auditok ... -p
amsehili@25	177
amsehili@26	178 ### Save plot as image or PDF
amsehili@25	179
amsehili@25	180 auditok ... --save-image output.png
amsehili@25	181
amsehili@25	182 Requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.
amsehili@25	183
amsehili@25	184 ### Read data from file
amine@21	185
amine@21	186 auditok -i input.wav ...
amine@21	187
amine@21	188 Install `pydub` for other audio formats.
amine@21	189
amine@21	190 ### Limit the length of aquired data
amine@21	191
amine@21	192 auditok -M 12 ...
amine@21	193
amine@21	194 Time is in seconds.
amine@21	195
amine@21	196 ### Save the whole acquired audio signal
amine@21	197
amine@21	198 auditok -O output.wav ...
amine@21	199
amine@21	200 Install `pydub` for other audio formats.
amine@21	201
amine@21	202
amine@21	203 ### Save each detection into a separate audio file
amine@21	204
amine@21	205 auditok -o det_{N}_{start}_{end}.wav ...
amine@21	206
amine@21	207 You can use free text and place `{N}`, `{start}` and `{end}` wherever you want; they will be replaced by the detection number, start time and end time respectively. Another example:
amine@21	208
amine@21	209 auditok -o {start}-{end}.wav ...
amine@21	210
amine@21	211 Install `pydub` for more audio formats.
amine@21	212
amine@2	213
amsehili@26	214 Setting detection parameters
amsehili@26	215 ----------------------------
amsehili@26	216
amsehili@26	217 Alongside the threshold option `-e` seen so far, several other options can have a great impact on the detector's behavior. These options are summarized in the following table:
amsehili@26	218
amsehili@26	219
amsehili@27	220 | Option | Description | Unit | Default |
amsehili@27	221 | -------|-------------------------------------------------------|---------|------------------|
amsehili@27	222 | `-n` | Minimum length an accepted audio activity should have | second | 0.2 (200 ms) |
amsehili@27	223 | `-m` | Maximum length an accepted audio activity should reach| second | 5. |
amsehili@27	224 | `-s` | Maximum length of a continuous silence period within | second | 0.3 (300 ms) |
amsehili@27	225 | | an accepted audio activity | | |
amsehili@27	226 | `-d` | Drop trailing silence from an accepted audio activity | boolean | False |
amsehili@27	227 | `-a` | Analysis window length (default value should be good) | second | 0.01 (10 ms) |
amsehili@26	228
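
How these options interact can be illustrated with a deliberately simplified Python sketch. It is not `auditok`'s actual tokenizer: the functions `to_frames` and `tokenize` are hypothetical, all lengths are expressed in analysis windows (frames) rather than seconds, and the input is assumed to be a list of booleans saying whether each window passed the energy threshold.

```python
def to_frames(seconds, window=0.01):
    """Convert a duration in seconds to a number of analysis windows (-a)."""
    return int(round(seconds / window))


def tokenize(valid, min_length, max_length, max_silence,
             drop_trailing_silence=False):
    """Return (start, end) frame indices (inclusive) of accepted activities.

    min_length, max_length and max_silence play the roles of -n, -m and -s;
    drop_trailing_silence plays the role of -d.
    """
    events = []
    start = None       # first frame of the activity being built
    last_valid = None  # last frame that passed the threshold

    def close(end):
        nonlocal start
        if drop_trailing_silence:
            end = last_valid
        if end - start + 1 >= min_length:  # too-short activities are dropped
            events.append((start, end))
        start = None

    for i, is_valid in enumerate(valid):
        if start is None:
            if not is_valid:
                continue
            start = last_valid = i          # a valid frame opens an activity
        elif is_valid:
            last_valid = i
        elif i - last_valid > max_silence:  # tolerated silence exceeded
            close(i - 1)
            continue
        if i - start + 1 >= max_length:     # cut activities that grow too long
            close(i)
    if start is not None:
        close(len(valid) - 1)
    return events


if __name__ == "__main__":
    frames = [False, True, True, False, False, True, True,
              False, False, False, False, True]
    print(tokenize(frames, min_length=2, max_length=100, max_silence=2))
    print(tokenize(frames, min_length=2, max_length=100, max_silence=1))
```

With a tolerated silence of two frames the sketch reports one activity, while a tolerance of one frame splits the same input in two, which is the effect shown by the two figures at the top of this document.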
amsehili@26	229
amine@2	230 License
amine@2	231 -------
amine@2	232 `auditok` is published under the GNU General Public License Version 3.
amine@2	233
amine@2	234 Author
amine@2	235 ------
amine@2	236 Amine Sehili (<amine.sehili@gmail.com>)
amine@21	237
