[Build status](https://travis-ci.org/amsehili/auditok)

AUDIo TOKenizer
===============

`auditok` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command-line program and offers an easy-to-use API.

The following two figures illustrate the detector output when:

1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as an acoustic event):

![Detector output with up to 0.3 second of silence tolerated](https://raw.githubusercontent.com/amsehili/auditok/master/doc/figures/figure_1.png)

2. the detector splits an audio activity into several activities if the silence within it exceeds 0.2 second:

![Detector output with activities split at silences over 0.2 second](https://raw.githubusercontent.com/amsehili/auditok/master/doc/figures/figure_2.png)

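The effect of the tolerated silence can be illustrated with a small sketch. This is not `auditok`'s implementation: the function name `group_frames`, the per-frame boolean flags and the 100 ms frame length are all assumptions made for the example.

```python
def group_frames(valid, max_silence):
    """Group per-frame activity flags into (start, end) events.

    valid: sequence of booleans, one per analysis frame (hypothetical input).
    max_silence: longest run of silent frames tolerated inside one event.
    Returns inclusive (start, end) frame indices; trailing silence is dropped.
    """
    events = []
    start = None       # first frame of the event being built
    last_valid = None  # last active frame seen in that event
    for i, is_valid in enumerate(valid):
        if is_valid:
            if start is None:
                start = i
            last_valid = i
        elif start is not None and i - last_valid > max_silence:
            # The silent run is longer than tolerated: close the event.
            events.append((start, last_valid))
            start = None
    if start is not None:
        events.append((start, last_valid))
    return events


# Assumed 100 ms frames: 0.2 s of activity, 0.3 s of silence, 0.2 s of activity.
flags = [True, True, False, False, False, True, True]
print(group_frames(flags, 3))  # 0.3 s tolerated: [(0, 6)]
print(group_frames(flags, 2))  # 0.2 s tolerated: [(0, 1), (5, 6)]
```

With a tolerance of three frames the silence is absorbed into a single event; with two frames the same signal is split into two events.
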
Requirements
------------
`auditok` can be used with standard Python!
However, if you want more features, the following packages are needed:

- [pydub](https://github.com/jiaaro/pydub): read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file
- [PyAudio](http://people.csail.mit.edu/hubert/pyaudio/): read audio data from the microphone and play back detections
- `matplotlib`: plot audio signal and detections (see figures above)
- `numpy`: required by matplotlib. Also used for math operations instead of standard Python, if available
- Optionally, you can use `sox` or `parecord` for data acquisition and feed `auditok` using a pipe.

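To see which of the optional packages listed above are present, a check along the following lines may help. It is a standalone sketch, not part of `auditok`; the dictionary and function names are made up for the example, and it assumes PyAudio is imported as `pyaudio`.

```python
import importlib.util

# Optional packages from the list above and what each one enables.
OPTIONAL_PACKAGES = {
    "pydub": "audio formats other than wav",
    "pyaudio": "microphone input and playback",
    "matplotlib": "plotting signal and detections",
    "numpy": "required by matplotlib, faster math",
}


def available_features():
    """Map each optional package name to True if it can be imported."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in OPTIONAL_PACKAGES
    }


for name, found in available_features().items():
    status = "found" if found else "missing"
    print("{:<12}{:<9}{}".format(name, status, OPTIONAL_PACKAGES[name]))
```
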
Installation
------------

    python setup.py install

Command line usage
------------------

The first thing you want to check is perhaps how `auditok` detects your voice. If you have installed `PyAudio`, just run (`Ctrl-C` to stop):

    auditok -D -E

Option `-D` means debug, whereas `-E` stands for echo, so `auditok` plays back whatever it detects.

If there are too many detections, use a higher value for the energy threshold. (The current version only implements a `validator` based on an energy threshold; using spectral information is also desirable and might be part of future releases.) To change the energy threshold (default: 45), use option `-e`:

    auditok -D -E -e 55

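As a rough illustration of what an energy-threshold validator does, the sketch below computes a log energy per frame and compares it to the threshold. The exact formula used by `auditok` is not stated here, so the `10 * log10(mean square)` definition, the function names and the 16-bit signed little-endian mono sample format are assumptions.

```python
import math
import struct


def log_energy(frame_bytes):
    """Return 10*log10 of the mean squared amplitude of a frame.

    Assumes 16-bit signed little-endian mono samples.
    """
    count = len(frame_bytes) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack("<{}h".format(count), frame_bytes[:count * 2])
    mean_square = sum(s * s for s in samples) / count
    # The floor avoids log10(0) on pure digital silence.
    return 10.0 * math.log10(max(mean_square, 1e-10))


def is_valid(frame_bytes, energy_threshold=45):
    """A frame counts as activity if its log energy reaches the threshold."""
    return log_energy(frame_bytes) >= energy_threshold


loud = struct.pack("<4h", 1000, -1000, 1000, -1000)  # log energy close to 60
quiet = struct.pack("<4h", 10, -10, 10, -10)         # log energy close to 20
print(is_valid(loud), is_valid(quiet))
```

Under this definition, raising the threshold from 45 to 55 rejects frames whose log energy falls between the two values, which is why a higher value yields fewer detections.
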
If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`):

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -r 16000 -i -

With `-i -`, `auditok` reads data from standard input.

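Reading raw audio from a pipe amounts to consuming fixed-size frames from a byte stream. The sketch below shows one way to do that; it is not how `auditok` reads its input, and the function name, the 10 ms frame length and the choice to drop an incomplete final frame are assumptions. An in-memory buffer stands in for `sys.stdin.buffer` so the example runs without a pipe.

```python
import io


def read_frames(stream, sample_rate=16000, sample_width=2, channels=1,
                frame_ms=10):
    """Yield fixed-size frames of raw PCM bytes from a binary stream."""
    frame_size = sample_rate * frame_ms // 1000 * sample_width * channels
    while True:
        frame = stream.read(frame_size)
        if len(frame) < frame_size:
            break  # end of stream; an incomplete final frame is dropped
        yield frame


# 800 bytes of silence = 400 samples = 25 ms at 16 kHz, 16-bit mono.
fake_stdin = io.BytesIO(b"\x00" * 800)
frames = list(read_frames(fake_stdin))
print(len(frames), len(frames[0]))  # prints: 2 320
```

In a real pipeline the same generator could be fed `sys.stdin.buffer` instead of the `BytesIO` object.
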
`rec` and `play` are just aliases for `sox`. With this setup you won't be able to play back audio detections (`-E` requires `PyAudio`). Fortunately, `auditok` can call any command every time it detects an activity, passing the activity as a file to the user-supplied command:

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

The `-C` option tells `auditok` to interpret its argument as a command that is run whenever `auditok` detects an audio activity, replacing the `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.

The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only if there is an audio activity and saves bandwidth during silence.

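The `$` substitution described above can be sketched as follows. This is only an illustration of the idea, not `auditok`'s code: the function name is hypothetical, deleting the temporary file afterwards is a choice made for this sketch, and it assumes Python 3.7+ on a POSIX-like system. The demo command uses the Python interpreter itself to print the size of the file it receives, so no audio tool is needed.

```python
import os
import shlex
import subprocess
import sys
import tempfile


def run_on_activity(command_template, raw_audio):
    """Save raw_audio to a temporary file and run the command on it.

    Every `$` in the command template is replaced by the file's path.
    Returns the command's standard output as bytes.
    """
    fd, path = tempfile.mkstemp(suffix=".raw")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(raw_audio)
        args = [part.replace("$", path)
                for part in shlex.split(command_template)]
        result = subprocess.run(args, capture_output=True, check=True)
        return result.stdout
    finally:
        os.remove(path)  # cleanup policy chosen for this sketch


template = (shlex.quote(sys.executable)
            + ' -c "import os, sys; print(os.path.getsize(sys.argv[1]))" $')
print(run_on_activity(template, b"\x00\x01\x02\x03").strip())
```
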
### Plot signal and detections

Use option `-p`. Requires `matplotlib` and `numpy`.

### Read data from file

    auditok -i input.wav ...

Install `pydub` for other audio formats.

### Limit the length of acquired data

    auditok -M 12 ...

Time is in seconds.

### Save the whole acquired audio signal

    auditok -O output.wav ...

Install `pydub` for other audio formats.

### Save each detection into a separate audio file

    auditok -o det_{N}_{start}_{end}.wav ...

You can use free text and place `{N}`, `{start}` and `{end}` wherever you want; they will be replaced by the detection number, start time and end time respectively. Another example:

    auditok -o {start}-{end}.wav ...

Install `pydub` for more audio formats.

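The placeholder scheme for `-o` behaves much like Python's `str.format` with named fields. The sketch below mimics it; the function name is hypothetical, and formatting times in seconds with two decimals is an assumption, since the format `auditok` uses for `{start}` and `{end}` is not specified here.

```python
def detection_filename(template, number, start, end):
    """Fill {N}, {start} and {end} in a file name template.

    start and end are times in seconds, written with two decimals
    (an assumption made for this example).
    """
    return template.format(
        N=number,
        start="{:.2f}".format(start),
        end="{:.2f}".format(end),
    )


print(detection_filename("det_{N}_{start}_{end}.wav", 1, 1.2, 3.45))
print(detection_filename("{start}-{end}.wav", 2, 0.0, 10.5))
```
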
Demos
-----
This code reads data from the microphone and plays back whatever it detects.

    python demos/echo.py

`echo.py` accepts two arguments: energy threshold (default=45) and duration in seconds (default=10):

    python demos/echo.py 50 15

If only one argument is given, it will be used for the energy threshold.

Try out this demo with an audio file (no argument is required):

    python demos/audio_tokenize_demo.py

Finally, in this demo `auditok` is used to remove trailing and leading silence from an audio file:

    python demos/audio_trim_demo.py

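The idea behind trimming can be shown without any audio library: keep everything between the first and the last active frame. The sketch below is not the code of `audio_trim_demo.py`; the function name, the list-of-samples frame representation and the amplitude test are assumptions made for illustration.

```python
def trim_silence(frames, is_active):
    """Drop leading and trailing frames for which is_active is false.

    Silent frames between the first and last active frame are kept.
    """
    flags = [bool(is_active(frame)) for frame in frames]
    if not any(flags):
        return []
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    return frames[first:last + 1]


# Hypothetical frames of two samples each; a frame is active if any
# sample exceeds an amplitude of 100.
frames = [[0, 0], [500, -400], [0, 1], [300, 200], [0, 0]]
trimmed = trim_silence(frames, lambda f: max(abs(s) for s in f) > 100)
print(trimmed)  # [[500, -400], [0, 1], [300, 200]]
```
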
Documentation
-------------

Check out this [quick start](https://github.com/amsehili/auditok/blob/master/quickstart.rst) or the [API documentation](http://amsehili.github.io/auditok/pdoc/).

Contribution
------------
Contributions are very much appreciated!

License
-------
`auditok` is published under the GNU General Public License Version 3.

Author
------
Amine Sehili (<amine.sehili@gmail.com>)