.. image:: doc/figures/auditok-logo.png
    :align: center

.. image:: https://travis-ci.org/amsehili/auditok.svg?branch=master
    :target: https://travis-ci.org/amsehili/auditok

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
    :target: http://auditok.readthedocs.org/en/latest/?badge=latest
    :alt: Documentation Status

``auditok`` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command-line program or by calling its API.

A basic version of ``auditok`` runs with standard Python (>=3.4). Without additional dependencies, ``auditok`` can only handle audio files in *wav* or *raw* formats. For more features, install the following packages:

- `pydub <https://github.com/jiaaro/pydub>`_ : read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.

- `pyaudio <http://people.csail.mit.edu/hubert/pyaudio/>`_ : read audio data from the microphone and play back detections.

- `tqdm <https://github.com/tqdm/tqdm>`_ : show a progress bar while playing audio clips.

- `matplotlib <http://matplotlib.org/>`_ : plot the audio signal and detections (see the figure in the ``AudioRegion`` example below).

- `numpy <http://www.numpy.org>`_ : required by matplotlib. Also used, if available, for some math operations instead of standard Python.

Installation
------------

.. code:: bash

    git clone https://github.com/amsehili/auditok.git
    cd auditok
    python setup.py install

Basic example
-------------

.. code:: python

    from auditok import split

    # split returns a generator of AudioRegion objects
    audio_regions = split("audio.wav")
    for region in audio_regions:
        region.play(progress_bar=True)
        filename = region.save("/tmp/region_{meta.start:.3f}.wav")
        print("region saved as: {}".format(filename))

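The ``{meta.start:.3f}`` placeholder in the file name passed to ``save`` looks like standard Python format-string syntax. As a minimal sketch, assuming the template is expanded with something equivalent to ``str.format`` and that ``meta`` exposes a ``start`` attribute in seconds, the expansion would work as follows. The ``SimpleNamespace`` object is a hypothetical stand-in for illustration, not an ``auditok`` type:

.. code:: python

    from types import SimpleNamespace

    # Hypothetical stand-in for a region's metadata (start time in seconds).
    meta = SimpleNamespace(start=1.5)

    template = "/tmp/region_{meta.start:.3f}.wav"
    # ".3f" formats the start time with three decimal places.
    filename = template.format(meta=meta)
    print(filename)

Under these assumptions, each saved region gets a distinct name derived from its start time.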

Example using ``AudioRegion``
-----------------------------

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    regions = region.split_and_plot() # or just region.splitp()

Output figure:

.. image:: doc/figures/example_1.png

Working with AudioRegions
-------------------------

Beyond splitting, there are several useful operations you can perform on ``AudioRegion`` objects.

Concatenate regions
===================

.. code:: python

    from auditok import AudioRegion
    region_1 = AudioRegion.load("audio_1.wav")
    region_2 = AudioRegion.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful if you want to join regions returned by ``split``:

.. code:: python

    from auditok import AudioRegion
    regions = AudioRegion.load("audio.wav").split()
    gapless_region = sum(regions)

Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    region_x3 = region * 3

Make slices of equal size out of a region
=========================================

Divide by a positive integer:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

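The ``+``, ``*``, ``/`` and ``sum`` operations above rely on Python's operator protocol. The following is only an illustrative sketch of that protocol using a hypothetical ``Clip`` class that wraps a list of samples. It is not how ``auditok`` implements ``AudioRegion``, and the way a remainder is distributed when dividing is an assumption:

.. code:: python

    class Clip:
        """Toy stand-in for an audio region: just a list of samples."""

        def __init__(self, samples):
            self.samples = list(samples)

        def __eq__(self, other):
            return isinstance(other, Clip) and self.samples == other.samples

        def __add__(self, other):
            # Concatenation: samples of `other` follow samples of `self`.
            return Clip(self.samples + other.samples)

        def __radd__(self, other):
            # sum() starts from 0, so 0 + clip must return the clip itself.
            if other == 0:
                return self
            return NotImplemented

        def __mul__(self, n):
            # Repetition: the same samples n times in a row.
            return Clip(self.samples * n)

        def __truediv__(self, n):
            # Split into n contiguous slices; the last takes any remainder.
            size = len(self.samples) // n
            bounds = [i * size for i in range(n)] + [len(self.samples)]
            return [Clip(self.samples[a:b]) for a, b in zip(bounds, bounds[1:])]


    clip = Clip(range(10))
    parts = clip / 5
    rebuilt = sum(parts)
    tripled = clip * 3

Here ``rebuilt == clip`` holds because concatenating contiguous slices restores the original sequence, which is the property the ``assert`` in the example above checks.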

Make audio slices of arbitrary size
===================================

Slicing an ``AudioRegion`` is useful in many situations. For example, you can remove a fixed-size portion of audio data from the beginning or the end of a region, or crop a region by an arbitrary amount as a data augmentation strategy.

The most accurate way to slice an ``AudioRegion`` is to use indices that refer directly to raw audio samples. In the following example, assuming that the sampling rate of the audio data is 16000, you can extract a 5-second region from the main region, starting at the 20th second, as follows:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This lets you start and stop at practically any sample within the region. Just as with a ``list``, you can omit ``start``, ``stop``, or both. You can also use negative indices:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

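Since sample-based slicing is described as following ``list`` semantics, the way omitted and negative indices resolve can be checked with Python's built-in ``slice`` object, without loading any audio. This sketch assumes a hypothetical 30-second mono recording at 16000 Hz; ``resolve`` is an illustrative helper, not part of ``auditok``:

.. code:: python

    sampling_rate = 16000
    n_samples = 30 * sampling_rate  # hypothetical 30-second mono recording


    def resolve(start, stop, length):
        """Return the absolute (start, stop) that a slice selects."""
        first, last, _ = slice(start, stop).indices(length)
        return first, last


    # Explicit bounds: seconds 20 to 25.
    explicit = resolve(20 * sampling_rate, 25 * sampling_rate, n_samples)
    # Negative start, omitted stop: the last three seconds.
    tail = resolve(-3 * sampling_rate, None, n_samples)
    # Both omitted: the whole recording.
    whole = resolve(None, None, n_samples)

With these values, ``tail`` covers samples 432000 up to 480000, i.e. the final 48000 samples.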

While slicing by raw samples is accurate, slicing with temporal indices is more intuitive. You can do so through the ``millis`` or ``seconds`` views of ``AudioRegion`` (or their shorter aliases ``ms`` and ``sec``/``s``).

With the ``millis`` view:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    five_second_region = region.millis[5000:10000]

Or with the ``seconds`` view:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    five_second_region = region.seconds[5:10]

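A temporal index ultimately has to be mapped to a sample index, which is why sample-based slicing is the more exact of the two. The sketch below shows one plausible mapping, assuming rounding to the nearest sample; ``millis_to_sample`` is a hypothetical helper, and the rounding ``auditok`` actually applies may differ:

.. code:: python

    def millis_to_sample(millis, sampling_rate):
        """Map a time offset in milliseconds to the nearest sample index."""
        return int(round(millis * sampling_rate / 1000))


    # At 16000 Hz every millisecond is exactly 16 samples.
    start = millis_to_sample(5000, 16000)
    stop = millis_to_sample(10000, 16000)

    # At 44100 Hz one millisecond is 44.1 samples, so rounding is needed.
    one_ms_at_44100 = millis_to_sample(1, 44100)

When the sampling rate is not a multiple of 1000, a millisecond boundary can fall between two samples, so the resulting slice may be off by a fraction of a millisecond.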

Get an array of audio samples
=============================

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    samples = region.samples

If ``numpy`` is installed, this returns a ``numpy.ndarray``: 1D if the audio data is mono, 2D otherwise. If ``numpy`` is not installed, it returns a standard ``array.array`` for mono data and a list of ``array.array`` objects for multichannel data.

Alternatively you can use:

.. code:: python

    import numpy as np
    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    samples = np.asarray(region)

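To make the ``numpy``-free return shapes described above more concrete, the sketch below separates interleaved PCM data into one ``array.array`` per channel using only the standard library. It assumes 16-bit signed samples in native byte order; ``deinterleave`` is a hypothetical helper for illustration and not ``auditok``'s own code:

.. code:: python

    from array import array


    def deinterleave(data, channels):
        """Split interleaved 16-bit PCM bytes into per-channel arrays."""
        samples = array("h", data)  # "h": signed 16-bit integers
        if channels == 1:
            # Mono: a single flat array.
            return samples
        # Multichannel: every `channels`-th sample belongs to one channel.
        return [samples[ch::channels] for ch in range(channels)]


    # Three stereo frames: (1, -1), (2, -2), (3, -3).
    stereo_bytes = array("h", [1, -1, 2, -2, 3, -3]).tobytes()
    left, right = deinterleave(stereo_bytes, channels=2)
    mono = deinterleave(array("h", [7, 8, 9]).tobytes(), channels=1)

In this sketch the mono case yields a single ``array.array`` and the stereo case a list of two, mirroring the mono and multichannel shapes mentioned above.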

License
-------
MIT.