annotate README.md @ 40:23dbe3bacdf7

doc update
author Amine Sehili <amine.sehili@gmail.com>
date Thu, 03 Dec 2015 01:21:44 +0100
parents c964768a000a
children ee6c9924df75
rev   line source
amsehili@11 1 [![Build Status](https://travis-ci.org/amsehili/auditok.svg?branch=master)](https://travis-ci.org/amsehili/auditok)
amine@37 2 [![Documentation Status](https://readthedocs.org/projects/auditok/badge/?version=latest)](http://auditok.readthedocs.org/en/latest/?badge=latest)
amsehili@11 3 AUDIo TOKenizer
amine@2 4 ===============
amine@2 5
amsehili@20 6 `auditok` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy-to-use API.
amsehili@20 7
amine@35 8 A more detailed version of this user guide, as well as an API tutorial and API reference, can be found at [Read the Docs](http://auditok.readthedocs.org/en/latest/).
amine@35 9
amsehili@25 10 - [Two-figure explanation](https://github.com/amsehili/auditok#two-figure-explanation)
amsehili@25 11 - [Requirements](https://github.com/amsehili/auditok#requirements)
amsehili@25 12 - [Installation](https://github.com/amsehili/auditok#installation)
amsehili@25 13 - [Command line usage](https://github.com/amsehili/auditok#command-line-usage)
amsehili@25 14 - [Try the detector with your voice](https://github.com/amsehili/auditok#try-the-detector-with-your-voice)
amsehili@26 15 - [Play back detections](https://github.com/amsehili/auditok#play-back-detections)
amsehili@26 16 - [Set detection threshold](https://github.com/amsehili/auditok#set-detection-threshold)
amsehili@29 17 - [Set format for printed detections information](https://github.com/amsehili/auditok#set-format-for-printed-detections-information)
amsehili@26 18 - [Practical use case: generate a subtitles template](https://github.com/amsehili/auditok#practical-use-case-generate-a-subtitles-template)
amsehili@26 19 - [Plot signal and detections](https://github.com/amsehili/auditok#plot-signal-and-detections)
amsehili@26 20 - [Save plot as image or PDF](https://github.com/amsehili/auditok#save-plot-as-image-or-pdf)
amsehili@26 21 - [Read data from file](https://github.com/amsehili/auditok#read-data-from-file)
amsehili@26 22 - [Limit the length of acquired/read data](https://github.com/amsehili/auditok#limit-the-length-of-aquired-data)
amsehili@26 23 - [Save the whole acquired audio signal](https://github.com/amsehili/auditok#save-the-whole-acquired-audio-signal)
amsehili@26 24 - [Save each detection into a separate audio file](https://github.com/amsehili/auditok#save-each-detection-into-a-separate-audio-file)
amsehili@26 25 - [Setting detection parameters](https://github.com/amsehili/auditok#setting-detection-parameters)
amsehili@26 26 - [License](https://github.com/amsehili/auditok#license)
amsehili@26 27 - [Author](https://github.com/amsehili/auditok#author)
amsehili@25 28
amsehili@25 29 Two-figure explanation
amsehili@25 30 ----------------------
amsehili@25 31 The following two figures illustrate an audio signal (blue) and regions detected as valid audio activities (green rectangles) according to a given threshold (red dashed line). They respectively depict the detection result when:
amsehili@20 32
amsehili@20 33 1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as acoustic event):
amsehili@20 34 ![](doc/figures/figure_1.png)
amsehili@20 35
amsehili@25 36 2. the detector splits an audio activity into several activities whenever the silence within it exceeds 0.2 second (200 ms):
amsehili@20 37 ![](doc/figures/figure_2.png)
amsehili@20 38
amine@35 39 Beyond plotting the signal and detections, you can play back audio activities as they are detected, save them, or run a user command each time an activity is detected,
amine@35 40 optionally passing the name of the file that contains the audio activity as an argument to the command.
amine@2 41
amine@2 42 Requirements
amine@2 43 ------------
amine@40 44 `auditok` can be used with standard Python!
amine@40 45
amine@40 46 However, if you want more features, the following packages are needed:
amsehili@20 47 - [pydub](https://github.com/jiaaro/pydub): read audio files of popular audio formats (ogg, mp3, etc.) or extract audio from a video file
amsehili@20 48 - [PyAudio](http://people.csail.mit.edu/hubert/pyaudio/): read audio data from the microphone and play back detections
amine@40 49 - [matplotlib](http://matplotlib.org/): plot audio signal and detections (see figures above)
amine@40 50 - [numpy](http://www.numpy.org): required by matplotlib. Also used for math operations instead of standard Python if available
amsehili@20 51 - Optionally, you can use `sox` or `parecord` for data acquisition and feed `auditok` using a pipe.
amsehili@20 52
amine@2 53
amine@2 54 Installation
amine@2 55 ------------
amine@40 56
amine@40 57 git clone https://github.com/amsehili/auditok.git
amine@40 58 cd auditok
amine@4 59 python setup.py install
amine@2 60
amsehili@25 61 Command line usage
amine@21 62 ------------------
amine@21 63
amsehili@25 64 ### Try the detector with your voice
amsehili@25 65
amine@21 66 The first thing you probably want to check is how `auditok` detects your voice. If you have installed `PyAudio`, just run (`Ctrl-C` to stop):
amine@21 67
amsehili@25 68 auditok
amine@21 69
amine@35 70 This will print `id`, `start-time` and `end-time` for each detected activity. If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell `auditok` to read data from standard input:
amine@21 71
amsehili@25 72 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1
amsehili@25 73
amsehili@25 74 Note that when data is read from standard input the same audio parameters must be used for both `sox` (or any other data generation/acquisition tool) and `auditok`. The following table summarizes audio parameters.
amine@21 75
amine@35 76 | Audio parameter | sox option | `auditok` option | `auditok` default |
amine@35 77 | --------------- |------------|------------------|-----------------------|
amine@35 78 | Sampling rate | -r | -r | 16000 |
amine@35 79 | Sample width | -b (bits) | -w (bytes) | 2 |
amine@35 80 | Channels | -c | -c | 1 |
amine@35 81 | Encoding | -e | None | always signed integer |
amine@21 82
amsehili@25 83 Because the values used above match the `auditok` defaults listed in this table, the previous command can be shortened to:
amine@21 84
amsehili@25 85 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -
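
The relation between the two tools' options can be sketched in Python. This is only an illustration: `build_pipeline` is a hypothetical helper, not part of `auditok`. It derives both command lines from a single set of audio parameters and converts the sample width from bytes (`auditok -w`) to bits (`sox -b`).

```python
def build_pipeline(sampling_rate=16000, sample_width=2, channels=1):
    """Return a `rec | auditok` pipeline where both tools share the same parameters.

    Hypothetical helper for illustration only; `sample_width` is in bytes.
    """
    bits = sample_width * 8  # sox expects bits, auditok expects bytes
    rec = "rec -q -t raw -r {} -c {} -b {} -e signed -".format(
        sampling_rate, channels, bits)
    auditok = "auditok -i - -r {} -w {} -c {}".format(
        sampling_rate, sample_width, channels)
    return rec + " | " + auditok


print(build_pipeline())
```

With the default arguments, the printed pipeline is the fully specified command shown earlier in this section.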
amine@21 86
amsehili@25 87 ### Play back detections
amine@21 88
amsehili@25 89 auditok -E
amine@21 90
amine@35 91 **or**
amsehili@25 92
amsehili@25 93 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E
amsehili@25 94
amsehili@25 95 Option `-E` stands for echo: `auditok` plays back whatever it detects. Using `-E` requires `PyAudio`. If you don't have `PyAudio` and want to play detections with `sox`, use the `-C` option:
amsehili@25 96
amsehili@25 97 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
amine@21 98
amsehili@25 99 The `-C` option tells `auditok` to interpret its argument as a command to run whenever an audio activity is detected, replacing the `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.
amsehili@25 100
amsehili@25 101 `rec` and `play` are just aliases for `sox`.
amine@21 102
amine@21 103 The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only if there is an audio activity and saves bandwidth during silence.
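
The mechanism behind `-C` can be sketched as follows. This is a minimal illustration of the idea (save the activity to a temporary file, substitute `$`, run the command), not `auditok`'s actual implementation; the helper name `run_command_on_activity` is hypothetical.

```python
import os
import shlex
import subprocess
import tempfile


def run_command_on_activity(command_template, raw_audio):
    """Save `raw_audio` (bytes) to a temporary file and run a command on it.

    Every `$` in `command_template` is replaced by the temporary file name.
    Illustrative helper only; not auditok's own code.
    """
    with tempfile.NamedTemporaryFile(suffix=".raw", delete=False) as tmp:
        tmp.write(raw_audio)
        path = tmp.name
    try:
        command = command_template.replace("$", shlex.quote(path))
        return subprocess.run(shlex.split(command), capture_output=True, check=False)
    finally:
        os.remove(path)  # the file is only needed while the command runs
```

Calling it with the template `"play -q -t raw -r 16000 -c 1 -b 16 -e signed $"` would hand the saved activity to `play`, provided `sox` is installed.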
amine@21 104
amsehili@25 105 ### Set detection threshold
amsehili@25 106
amsehili@25 107 If you notice that there are too many detections, use a higher value for the energy threshold. The current version only implements a `validator` based on an energy threshold; using spectral information is also desirable and might be part of future releases. To change the energy threshold (default: 50), use option `-e`:
amsehili@25 108
amsehili@25 109 auditok -E -e 55
amsehili@25 110
amine@35 111 **or**
amsehili@25 112
amsehili@25 113 rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
amsehili@25 114
amsehili@26 115 If, however, you find that the detector is missing some or all of your audio activities, use a lower value for `-e`.
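
To build intuition for what an energy threshold does, here is a rough sketch. It assumes that the energy of an analysis window is measured as the log of its mean squared sample value, in dB; the exact formula `auditok` uses is not given in this document, and `log_energy` and `is_valid` are hypothetical names.

```python
import math
import struct


def log_energy(window):
    """Log energy (dB) of a window of signed 16-bit little-endian samples.

    Assumed formula, for illustration: 10 * log10(mean of squared samples).
    """
    count = len(window) // 2
    samples = struct.unpack("<{}h".format(count), window[:count * 2])
    mean_square = sum(s * s for s in samples) / max(count, 1)
    return 10.0 * math.log10(max(mean_square, 1e-10))  # floor avoids log10(0)


def is_valid(window, energy_threshold=50):
    """A window counts as audio activity when its energy reaches the threshold."""
    return log_energy(window) >= energy_threshold
```

Under this assumption, a window of samples with amplitude 100 has an energy of 40 dB and is rejected at the default threshold, while samples with amplitude 10000 reach 80 dB and are accepted. Raising the threshold rejects more windows; lowering it accepts more.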
amsehili@25 116
amsehili@29 117 ### Set format for printed detections information
amsehili@25 118
amine@35 119 By default, `auditok` prints the `id`, `start-time` and `end-time` of each detected activity:
amsehili@25 120
amsehili@25 121 1 1.87 2.67
amsehili@25 122 2 3.05 3.73
amsehili@25 123 3 3.97 4.49
amsehili@25 124 ...
amsehili@25 125
amine@35 126 To customize the output format, use the `--printf` option:
amsehili@25 127
amsehili@25 128 auditok -e 55 --printf "[{id}]: {start} to {end}"
amsehili@25 129
amsehili@25 130 Output:
amsehili@25 131
amsehili@25 132 [1]: 0.22 to 0.67
amsehili@25 133 [2]: 2.81 to 4.18
amsehili@25 134 [3]: 5.53 to 6.44
amsehili@25 135 [4]: 7.32 to 7.82
amsehili@25 136 ...
amsehili@25 137
amsehili@28 138 Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds by default. For more detailed time information, use `--time-format`:
amsehili@25 139
amsehili@25 140 auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"
amsehili@25 141
amsehili@25 142 Output:
amsehili@25 143
amsehili@25 144 [1]: 00:00:01.080 to 00:00:01.760
amsehili@25 145 [2]: 00:00:02.420 to 00:00:03.440
amsehili@25 146 [3]: 00:00:04.930 to 00:00:05.570
amsehili@25 147 [4]: 00:00:05.690 to 00:00:06.020
amsehili@25 148 [5]: 00:00:07.470 to 00:00:07.980
amsehili@25 149 ...
amsehili@25 150
amsehili@25 151 Valid time directives are: `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.
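
The way `--printf` keywords and `--time-format` directives combine can be sketched in Python. The functions `format_time` and `render` are hypothetical and only mimic the behavior described above; in particular, the two-decimal precision used for `%S` is an assumption based on the sample output, not a documented guarantee.

```python
import re


def format_time(seconds, time_format="%S"):
    """Expand the time directives described above for a time given in seconds."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    secs, millis = divmod(rest, 1000)
    values = {
        "%h": "{:02d}".format(hours),
        "%m": "{:02d}".format(minutes),
        "%s": "{:02d}".format(secs),
        "%i": "{:03d}".format(millis),
        "%S": "{:.2f}".format(seconds),  # absolute seconds (precision assumed)
        "%I": str(total_ms),             # absolute milliseconds
    }
    return re.sub(r"%[hmsiSI]", lambda match: values[match.group(0)], time_format)


def render(template, detection_id, start, end, time_format="%S"):
    """Fill the {id}, {start} and {end} keywords of a --printf style template."""
    return template.format(id=detection_id,
                           start=format_time(start, time_format),
                           end=format_time(end, time_format))


print(render("[{id}]: {start} to {end}", 1, 1.08, 1.76, "%h:%m:%s.%i"))
```

The call above prints `[1]: 00:00:01.080 to 00:00:01.760`, which has the same shape as the first line of the sample output.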
amsehili@25 152
amsehili@26 153 ### Practical use case: generate a subtitles template
amsehili@25 154
amsehili@28 155 Using `--printf` and `--time-format`, the following command, run on an input audio or video file, generates an **srt** file template. The template can later be edited with a subtitle editor, which reduces the time needed to mark where each utterance starts and ends:
amsehili@25 156
amsehili@25 157 auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
amsehili@25 158
amsehili@25 159 Output:
amsehili@25 160
amsehili@25 161 1
amsehili@25 162 00:00:00.730 --> 00:00:01.460
amsehili@25 163 Put some text here...
amsehili@25 164
amsehili@25 165 2
amsehili@25 166 00:00:02.440 --> 00:00:03.900
amsehili@25 167 Put some text here...
amsehili@25 168
amsehili@25 169 3
amsehili@25 170 00:00:06.410 --> 00:00:06.970
amsehili@25 171 Put some text here...
amsehili@25 172
amsehili@25 173 4
amsehili@25 174 00:00:07.260 --> 00:00:08.340
amsehili@25 175 Put some text here...
amsehili@25 176
amsehili@25 177 5
amsehili@25 178 00:00:09.510 --> 00:00:09.820
amsehili@25 179 Put some text here...
amsehili@25 180
amine@21 181 ### Plot signal and detections
amine@21 182
amsehili@25 183 Use option `-p`. Requires `matplotlib` and `numpy`.
amine@21 184
amsehili@25 185 auditok ... -p
amsehili@25 186
amsehili@26 187 ### Save plot as image or PDF
amsehili@25 188
amsehili@25 189 auditok ... --save-image output.png
amsehili@25 190
amsehili@25 191 Requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.
amsehili@25 192
amsehili@25 193 ### Read data from file
amine@21 194
amine@21 195 auditok -i input.wav ...
amine@21 196
amine@21 197 Install `pydub` for other audio formats.
amine@21 198
amine@21 199 ### Limit the length of acquired data
amine@21 200
amine@21 201 auditok -M 12 ...
amine@21 202
amine@21 203 Time is in seconds.
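
As a rough guide to how much data a given duration represents, the arithmetic for raw PCM audio is shown below. `bytes_for_duration` is a hypothetical helper; how `auditok` enforces the limit internally is not described here.

```python
def bytes_for_duration(seconds, sampling_rate=16000, sample_width=2, channels=1):
    """Number of raw PCM bytes that make up `seconds` of audio.

    `sample_width` is in bytes; the defaults match the auditok defaults
    listed in the audio parameters table.
    """
    return int(seconds * sampling_rate * sample_width * channels)


print(bytes_for_duration(12))  # 12 * 16000 * 2 * 1 = 384000
```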
amine@21 204
amine@21 205 ### Save the whole acquired audio signal
amine@21 206
amine@21 207 auditok -O output.wav ...
amine@21 208
amine@21 209 Install `pydub` for other audio formats.
amine@21 210
amine@21 211
amine@21 212 ### Save each detection into a separate audio file
amine@21 213
amine@21 214 auditok -o det_{N}_{start}_{end}.wav ...
amine@21 215
amine@35 216 You can use free text and place `{N}`, `{start}` and `{end}` wherever you want; they will be replaced by the detection number, `start-time` and `end-time` respectively. Another example:
amine@21 217
amine@21 218 auditok -o {start}-{end}.wav ...
amine@21 219
amine@21 220 Install `pydub` for more audio formats.
amine@21 221
amine@2 222
amsehili@26 223 Setting detection parameters
amsehili@26 224 ----------------------------
amsehili@26 225
amsehili@26 226 Alongside the threshold option `-e` seen so far, several other options can have a great impact on the detector's behavior. These options are summarized in the following table:
amsehili@26 227
amsehili@26 228
amsehili@27 229 | Option | Description | Unit | Default |
amsehili@27 230 | -------|-------------|------|---------|
amsehili@27 231 | `-n` | Minimum length an accepted audio activity should have | second | 0.2 (200 ms) |
amsehili@27 232 | `-m` | Maximum length an accepted audio activity should reach | second | 5 |
amsehili@27 233 | `-s` | Maximum length of a continuous silence period within an accepted audio activity | second | 0.3 (300 ms) |
amsehili@27 235 | `-d` | Drop trailing silence from an accepted audio activity | boolean | False |
amsehili@27 236 | `-a` | Analysis window length (default value should be good) | second | 0.01 (10 ms) |
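
How these options interact can be illustrated with a simplified sketch. It is not `auditok`'s actual tokenizer: `tokenize` is a hypothetical function that takes one boolean per analysis window (`True` when the window passes the energy threshold) and expresses all lengths in windows rather than seconds. It also assumes that the minimum length is checked after trailing silence has been dropped.

```python
def tokenize(frames, min_length, max_length, max_silence,
             drop_trailing_silence=False):
    """Group per-window decisions into activities.

    Returns a list of (first, last) window indices, both inclusive.
    Simplified illustration only; lengths are counted in analysis windows.
    """
    detections = []
    start = None  # index of the first window of the current activity
    silence = 0   # consecutive silent windows at the end of the activity

    def close(last, trailing):
        if drop_trailing_silence:
            last -= trailing
        if last - start + 1 >= min_length:
            detections.append((start, last))

    for i, valid in enumerate(frames):
        if start is None:
            if not valid:
                continue  # still in silence, nothing to track
            start, silence = i, 0
        elif valid:
            silence = 0
        else:
            silence += 1
            if silence > max_silence:
                # Too much silence: the activity ended at the previous window.
                close(i - 1, silence - 1)
                start, silence = None, 0
                continue
        if i - start + 1 >= max_length:
            close(i, silence)  # the activity reached its maximum length
            start, silence = None, 0

    if start is not None:
        close(len(frames) - 1, silence)
    return detections
```

With the default 10 ms analysis window, `-s 0.3` corresponds to `max_silence=30` windows and `-n 0.2` to `min_length=20`. Lowering `max_silence` splits an activity at shorter pauses, which is the difference between the two figures at the top of this document.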
amsehili@26 237
amsehili@26 238
amine@2 239 License
amine@2 240 -------
amine@2 241 `auditok` is published under the GNU General Public License Version 3.
amine@2 242
amine@2 243 Author
amine@2 244 ------
amine@2 245 Amine Sehili (<amine.sehili@gmail.com>)
amine@21 246