.. image:: doc/figures/auditok-logo.png
    :align: center

.. image:: https://travis-ci.org/amsehili/auditok.svg?branch=master
    :target: https://travis-ci.org/amsehili/auditok

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
    :target: http://auditok.readthedocs.org/en/latest/?badge=latest
    :alt: Documentation Status

``auditok`` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command-line program or by calling its API.

A basic version of ``auditok`` will run with standard Python (>=3.4). Without installing additional dependencies, ``auditok`` can only deal with audio files in *wav* or *raw* formats. If you want more features, the following packages are needed:

- `pydub <https://github.com/jiaaro/pydub>`_ : read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.

- `pyaudio <http://people.csail.mit.edu/hubert/pyaudio/>`_ : read audio data from the microphone and play back detections.

- `tqdm <https://github.com/tqdm/tqdm>`_ : show a progress bar while playing audio clips.

- `matplotlib <http://matplotlib.org/>`_ : plot the audio signal and detections.

- `numpy <http://www.numpy.org>`_ : required by matplotlib. If available, it is also used for some math operations instead of standard Python.

Installation
------------

.. code:: bash

    pip install git+https://github.com/amsehili/auditok

Basic example
-------------

.. code:: python

    from auditok import split

    # split returns a generator of AudioRegion objects
    audio_regions = split("audio.wav")
    for region in audio_regions:
        region.play(progress_bar=True)
        filename = region.save("/tmp/region_{meta.start:.3f}.wav")
        print("region saved as: {}".format(filename))

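To give a rough idea of what splitting on audio activity involves, here is a minimal sketch of a generic energy-threshold detector. It is not ``auditok``'s implementation: the function names ``frame_energy`` and ``detect_activity``, the frame size, and the threshold value are all assumptions chosen for illustration.

.. code:: python

    import math


    def frame_energy(frame):
        """Log energy (in dB) of one frame of integer samples."""
        mean_square = sum(s * s for s in frame) / len(frame)
        # clamp to avoid log10(0) on perfectly silent frames
        return 10 * math.log10(max(mean_square, 1e-10))


    def detect_activity(samples, frame_size, energy_threshold):
        """Return (start, end) sample indices of runs of active frames."""
        events = []
        start = None
        n_frames = len(samples) // frame_size
        for i in range(n_frames):
            frame = samples[i * frame_size:(i + 1) * frame_size]
            active = frame_energy(frame) >= energy_threshold
            if active and start is None:
                start = i * frame_size  # activity begins
            elif not active and start is not None:
                events.append((start, i * frame_size))  # activity ends
                start = None
        if start is not None:
            events.append((start, n_frames * frame_size))
        return events


    # 100 silent samples, 200 loud samples, 100 silent samples
    signal = [0] * 100 + [1000, -1000] * 100 + [0] * 100
    events = detect_activity(signal, frame_size=50, energy_threshold=50)
    print(events)

A practical detector would also enforce constraints such as a minimum event duration and a tolerated amount of silence inside an event; the sketch leaves those out to keep the idea visible.
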
Example using ``AudioRegion``
-----------------------------

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    regions = region.split_and_plot()  # or just region.splitp()

Output figure:

.. image:: doc/figures/example_1.png

Working with AudioRegions
-------------------------

Beyond splitting, there are a few other useful operations you can perform on ``AudioRegion`` objects.

Concatenate regions
===================

.. code:: python

    from auditok import AudioRegion
    region_1 = AudioRegion.load("audio_1.wav")
    region_2 = AudioRegion.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful when you want to join the regions returned by ``split``:

.. code:: python

    from auditok import AudioRegion
    regions = AudioRegion.load("audio.wav").split()
    gapless_region = sum(regions)

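The built-in ``sum`` starts from the integer ``0``, so a class can only be summed without an explicit start value if it accepts ``0 + obj``. The sketch below shows one way this can be done with ``__radd__``. ``Clip`` is a hypothetical stand-in class, not part of ``auditok``, and whether ``AudioRegion`` uses exactly this approach is an assumption.

.. code:: python

    class Clip:
        """Hypothetical stand-in for an audio region: a list of samples."""

        def __init__(self, samples):
            self.samples = list(samples)

        def __add__(self, other):
            # clip + clip: concatenate the samples
            return Clip(self.samples + other.samples)

        def __radd__(self, other):
            # sum() begins with 0 + first_clip, which lands here
            if other == 0:
                return self
            return NotImplemented


    clips = [Clip([1, 2]), Clip([3]), Clip([4, 5])]
    joined = sum(clips)
    print(joined.samples)
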
Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    region_x3 = region * 3

Make slices of equal size out of a region
=========================================

Divide by a positive integer:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

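When the number of samples is not a multiple of the divisor, the slices cannot all have exactly the same length. The following sketch shows one plausible way to distribute the remainder so that the slices still add up to the original. It works on a plain list, ``split_evenly`` is a hypothetical helper, and how ``auditok`` itself handles the remainder is an assumption here.

.. code:: python

    def split_evenly(items, n):
        """Cut items into n consecutive slices whose sizes differ by at most one."""
        if not isinstance(n, int) or n <= 0:
            raise TypeError("n must be a positive integer")
        size, rest = divmod(len(items), n)
        slices = []
        start = 0
        for i in range(n):
            # the first `rest` slices absorb one extra item each
            stop = start + size + (1 if i < rest else 0)
            slices.append(items[start:stop])
            start = stop
        return slices


    samples = list(range(11))
    parts = split_evenly(samples, 3)
    print([len(p) for p in parts])
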
Make audio slices of arbitrary size
===================================

Slicing an ``AudioRegion`` can be useful in many situations. For example, you can remove a fixed-size portion of audio data from the beginning or the end of a region, or crop a region by an arbitrary amount as a data augmentation strategy.

The most accurate way to slice an ``AudioRegion`` is to use indices that refer directly to raw audio samples. In the following example, assuming that the sampling rate of the audio data is 16000, you can extract a 5-second region from the main region, starting from the 20th second, as follows:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This allows you to start and stop at practically any sample within the region. Just as with a ``list``, you can omit ``start``, ``stop``, or both. You can also use negative indices:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    start = -3 * region.sr  # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples is accurate, slicing with temporal indices is more intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of an ``AudioRegion`` (or their shortcut aliases ``ms`` and ``sec``/``s``).

With the ``millis`` view:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    five_second_region = region.millis[5000:10000]

Or with the ``seconds`` view:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    five_second_region = region.seconds[5:10]

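A time-based view can be thought of as a thin wrapper that converts temporal bounds into sample indices before slicing. The sketch below illustrates that idea on a plain list. ``MillisView`` is a hypothetical class written for this example, and the rounding rule it uses is an assumption, not necessarily what ``auditok`` does.

.. code:: python

    class MillisView:
        """Hypothetical view that slices a sample list using millisecond bounds."""

        def __init__(self, samples, sampling_rate):
            self._samples = samples
            self._sr = sampling_rate

        def _to_sample(self, millis):
            # None means the bound was omitted, as in a regular slice
            if millis is None:
                return None
            return round(millis * self._sr / 1000)

        def __getitem__(self, index):
            if not isinstance(index, slice) or index.step is not None:
                raise TypeError("only plain start:stop slices are supported")
            start = self._to_sample(index.start)
            stop = self._to_sample(index.stop)
            return self._samples[start:stop]


    sr = 16000
    samples = list(range(10 * sr))  # stand-in for 10 seconds of mono audio
    view = MillisView(samples, sr)
    chunk = view[5000:5250]  # 250 ms starting at the 5th second
    tail = view[-1000:]  # the last second

Because a millisecond does not always map to a whole number of samples, time-based slicing may be off by a fraction of a sample period, which is why sample indices remain the most accurate option.
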
Get an array of audio samples
=============================

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    samples = region.samples

If ``numpy`` is installed, this returns a ``numpy.ndarray``: 1D if the audio data is mono, 2D otherwise. If ``numpy`` is not installed, it returns a standard ``array.array`` for mono data and a list of ``array.array`` objects for multichannel data.

Alternatively, you can use:

.. code:: python

    import numpy as np
    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    samples = np.asarray(region)

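To make the ``numpy``-free return types described above more concrete, here is a minimal sketch of how raw interleaved PCM bytes could be turned into one ``array.array`` per channel using only the standard library. ``deinterleave`` and its type-code mapping are assumptions made for illustration; the sketch also assumes the bytes are in the machine's native byte order and that the ``"i"`` type code is 4 bytes wide, which holds on common platforms.

.. code:: python

    from array import array


    def deinterleave(data, sample_width, channels):
        """Turn raw interleaved PCM bytes into one array.array per channel."""
        # assumed mapping from sample width in bytes to array type code
        typecode = {1: "b", 2: "h", 4: "i"}[sample_width]
        interleaved = array(typecode, data)
        per_channel = [interleaved[ch::channels] for ch in range(channels)]
        # mono: a single array; multichannel: a list of arrays
        return per_channel[0] if channels == 1 else per_channel


    # two frames of 16-bit stereo audio: (1, -1) then (2, -2)
    raw = array("h", [1, -1, 2, -2]).tobytes()
    left, right = deinterleave(raw, sample_width=2, channels=2)

    mono_raw = array("h", [7, 8, 9]).tobytes()
    mono = deinterleave(mono_raw, sample_width=2, channels=1)
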
License
-------
MIT.