Loading audio data
------------------

From a file
===========

If the first argument of `load` is a string, it should be a path to an audio
file.

.. code:: python

    import auditok
    region = auditok.load("audio.ogg")

If the input file contains raw (headerless) audio data, passing
`audio_format="raw"` and the other audio parameters (`sampling_rate`,
`sample_width` and `channels`) is mandatory. The following example passes the
audio parameters by their short names:

.. code:: python

    region = auditok.load("audio.dat",
                          audio_format="raw",
                          sr=44100,
                          sw=2,
                          ch=1)

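To try the raw-file call without an existing recording, a headerless file can
be generated with the standard library alone. This is a minimal sketch, not
part of auditok: the tone, the temporary location of `audio.dat` and the
signed 16-bit little-endian sample layout are all assumptions made for
illustration.

```python
import math
import os
import struct
import tempfile

sr = 44100  # sampling rate in Hz
sw = 2      # sample width in bytes (16-bit samples)
ch = 1      # mono

# One second of a 440 Hz sine tone, packed as signed 16-bit little-endian
# PCM with no header of any kind (assumed layout for this sketch).
samples = [int(10000 * math.sin(2 * math.pi * 440 * n / sr)) for n in range(sr)]
raw = struct.pack("<{}h".format(len(samples)), *samples)

path = os.path.join(tempfile.gettempdir(), "audio.dat")
with open(path, "wb") as f:
    f.write(raw)

# A raw file holds sr * sw * ch bytes per second of audio.
print(os.path.getsize(path))  # 88200
```

Because nothing in the file records these values, the same `sr`, `sw` and `ch`
have to be supplied again when the file is loaded.
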
From a `bytes` object
=====================

If the first argument is of type `bytes`, it is interpreted as raw audio data:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    data = b"\0" * sr * sw * ch  # one second of silence
    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
    print(region)

output:

.. code:: bash

    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)

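The one-second duration reported for the `bytes` example follows from the
audio parameters: raw PCM data occupies `sr * sw * ch` bytes per second. The
helper below is a hypothetical illustration of that arithmetic and is not an
auditok function.

```python
def raw_duration(data, sr, sw, ch):
    """Duration in seconds of headerless PCM data (hypothetical helper)."""
    bytes_per_second = sr * sw * ch
    return len(data) / bytes_per_second


sr, sw, ch = 16000, 2, 1
data = b"\0" * sr * sw * ch
print(raw_duration(data, sr, sw, ch))  # 1.0
```
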
From the microphone
===================

If the first argument is `None`, `load` will try to read data from the
microphone. Audio parameters, as well as the `max_read` parameter, are
mandatory:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
    print(five_sec_audio)

output:

.. code:: bash

    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)

Skip part of audio data
=======================

If the `skip` parameter is > 0, `load` will skip that many seconds of leading
audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", skip=2)  # skip the first 2 seconds

The `skip` argument must be 0 when reading from the microphone.

Basic split example
-------------------

.. code:: python

    import auditok

    # split returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,         # minimum duration of a valid audio event in seconds
        max_dur=4,           # maximum duration of an event
        max_silence=0.3,     # maximum duration of tolerated continuous silence within an event
        energy_threshold=55  # threshold of detection
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play detection
        # r.play(progress_bar=True)

        # region's metadata can also be used with the `save` method
        # (no need to explicitly specify region's object and `format` arguments)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav

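The `energy_threshold` value is easier to choose with a rough idea of the
scale involved. The sketch below assumes the energy of an analysis window is
measured as `20 * log10(rms)` of its samples. That formula is an assumption
made here for illustration and is not stated in this guide, and `log_energy`
is a hypothetical helper rather than an auditok function.

```python
import math


def log_energy(samples):
    """20 * log10(RMS) of a window of integer samples (assumed measure)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    rms = math.sqrt(mean_square)
    return 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)


loud = [1000, -1000] * 80   # RMS amplitude of 1000
quiet = [100, -100] * 80    # RMS amplitude of 100

energy_threshold = 55
print(log_energy(loud) >= energy_threshold)   # True
print(log_energy(quiet) >= energy_threshold)  # False
```

Under this assumption, raising the threshold by 20 corresponds to requiring a
tenfold increase in amplitude.
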
Split and plot
--------------

Visualize audio signal and detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")  # returns an AudioRegion object
    regions = region.split_and_plot(...)  # or just region.splitp()

output figure:

.. image:: figures/example_1.png

Read and split data from the microphone
---------------------------------------

If the first argument of `split` is `None`, audio data is read from the
microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55  # alias for energy_threshold, default value is 50

    try:
        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True)  # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass

`split` will continue reading audio data until you press ``Ctrl-C``. If you want
to read a specific amount of audio data, pass the desired number of seconds with
the `max_read` argument.

Accessing recorded data after split
-----------------------------------

Using a `Recorder` object, you can access the acquired audio data after
splitting:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55  # alias for energy_threshold, default value is 50

    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)

    try:
        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True)  # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass

    rec.rewind()
    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)

`Recorder` also accepts a `max_read` argument.

Working with AudioRegions
-------------------------

Beyond splitting, there are a couple of interesting operations you can do with
`AudioRegion` objects.

Basic region information
========================

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    len(region)  # number of audio samples in the region, one channel considered
    region.duration  # duration in seconds
    region.sampling_rate  # alias `sr`
    region.sample_width  # alias `sw`
    region.channels  # alias `ch`

Concatenate regions
===================

.. code:: python

    import auditok
    region_1 = auditok.load("audio_1.wav")
    region_2 = auditok.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful if you want to join regions returned by `split`:

.. code:: python

    import auditok
    regions = auditok.load("audio.wav").split()
    gapless_region = sum(regions)

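The built-in `sum` starts from the integer 0, so for `sum(regions)` to work
the region type has to accept `0 + region`. The sketch below shows one way a
class can support this with `__radd__`. `Clip` is a hypothetical stand-in that
only wraps bytes, and whether `AudioRegion` is implemented this way is an
assumption.

```python
class Clip:
    """Hypothetical stand-in for an audio region; it only wraps raw bytes."""

    def __init__(self, data):
        self.data = data

    def __add__(self, other):
        # Concatenation: the second clip's data follows the first clip's data.
        return Clip(self.data + other.data)

    def __radd__(self, other):
        # sum() evaluates 0 + first_item, so treat 0 as "nothing yet".
        if other == 0:
            return self
        return NotImplemented


clips = [Clip(b"ab"), Clip(b"cd"), Clip(b"ef")]
joined = sum(clips)
print(joined.data)  # b'abcdef'
```
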
Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    region_x3 = region * 3

Split one region into N regions of equal size
=============================================

Divide by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

Note that if perfect division is not possible, the last region might be a bit
shorter than the previous N-1 regions.

Slice a region by samples, seconds or milliseconds
==================================================

Slicing an `AudioRegion` is useful in many situations. For example, you can
remove a fixed-size portion of audio data from the beginning or the end of a
region, or crop a region by an arbitrary amount as a data augmentation
strategy.

The most accurate way to slice an `AudioRegion` is to use indices that
directly refer to raw audio samples. In the following example, assuming that the
sampling rate of audio data is 16000, you can extract a 5-second region from
the main region, starting from the 20th second, as follows:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This allows you to start and stop at any audio sample of the region. Just as
with a `list`, you can omit one of `start` and `stop`, or both. You can also
use negative indices:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = -3 * region.sr  # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples is accurate, slicing with temporal indices is more
intuitive. You can do so by accessing the `millis` or `seconds` views of an
`AudioRegion` (or their shortcut aliases: `ms` for `millis`, and `sec` or `s`
for `seconds`).

With the `millis` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.millis[5000:10000]

or with the `seconds` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[5:10]

`seconds` indices can also be floats:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[2.5:7.5]

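Temporal indices ultimately map to sample positions through the sampling rate.
The sketch below illustrates that mapping on a plain list standing in for the
samples of a mono region. `to_sample_index` is a hypothetical helper, and the
rounding rule it uses is an assumption rather than auditok's documented
behaviour.

```python
def to_sample_index(value, sampling_rate, unit="s"):
    """Convert a time offset to a sample index (hypothetical helper)."""
    seconds = value / 1000 if unit == "ms" else value
    return int(round(seconds * sampling_rate))


sr = 16000
samples = list(range(30 * sr))  # stand-in for 30 seconds of mono audio

start = to_sample_index(2.5, sr)             # offset given in seconds
stop = to_sample_index(7500, sr, unit="ms")  # offset given in milliseconds
chunk = samples[start:stop]
print(start, stop, len(chunk) / sr)  # 40000 120000 5.0
```
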
Get arrays of audio samples
===========================

If `numpy` is not installed, the `samples` attribute is a list of audio sample
arrays (standard `array.array` objects), one per channel. If `numpy` is
installed, `samples` is a 2-D `numpy.ndarray` where the first dimension is the
channel and the second is the sample.

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    samples = region.samples

If `numpy` is installed, you can also convert a region to an array directly:

.. code:: python

    import auditok
    import numpy as np
    region = auditok.load("audio.wav")
    samples = np.asarray(region)
    assert len(samples.shape) == 2

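The "one array per channel" layout can be reproduced with the standard library
to see what the data looks like without `numpy`. This is a minimal sketch that
assumes 16-bit samples stored in interleaved order (left, right, left, right,
and so on). `split_channels` is a hypothetical helper, not an auditok
function.

```python
from array import array


def split_channels(data, channels):
    """Return one array of 16-bit samples per channel from interleaved PCM bytes."""
    interleaved = array("h")  # "h" = signed 16-bit, native byte order
    interleaved.frombytes(data)
    # Every `channels`-th sample, starting at offset c, belongs to channel c.
    return [interleaved[c::channels] for c in range(channels)]


# Three stereo frames: left channel 1, 2, 3 and right channel -1, -2, -3.
stereo = array("h", [1, -1, 2, -2, 3, -3]).tobytes()
left, right = split_channels(stereo, 2)
print(left.tolist(), right.tolist())  # [1, 2, 3] [-1, -2, -3]
```
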