Load audio data
---------------

Audio data is loaded with the :func:`load` function, which can read from audio
files, from the microphone, or from raw audio data.

From a file
===========

If the first argument of :func:`load` is a string, it should be a path to an
audio file.

.. code:: python

    import auditok
    region = auditok.load("audio.ogg")

If the input file contains raw (headerless) audio data, passing `audio_format="raw"`
and the other audio parameters (`sampling_rate`, `sample_width` and `channels`) is
mandatory. In the following example we pass the audio parameters using their short
names:

.. code:: python

    import auditok
    region = auditok.load("audio.dat",
                          audio_format="raw",
                          sr=44100,  # alias for `sampling_rate`
                          sw=2,      # alias for `sample_width`
                          ch=1       # alias for `channels`
                          )

From a `bytes` object
=====================

If the first argument is of type `bytes`, it is interpreted as raw audio data:

.. code:: python

    import auditok
    sr = 16000
    sw = 2
    ch = 1
    data = b"\0" * sr * sw * ch
    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
    print(region)
    # alternatively you can use
    # region = auditok.AudioRegion(data, sr, sw, ch)

output:

.. code:: bash

    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)

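The one-second duration reported above follows from the size of the buffer:
raw audio holds `sampling_rate * sample_width * channels` bytes per second.
The following is a minimal standard-library sketch of that arithmetic. The
helper name `raw_duration` is hypothetical and not part of auditok.

```python
def raw_duration(data, sampling_rate, sample_width, channels):
    """Return the duration in seconds of a raw (headerless) PCM buffer."""
    bytes_per_second = sampling_rate * sample_width * channels
    return len(data) / bytes_per_second


sr, sw, ch = 16000, 2, 1
data = b"\0" * sr * sw * ch  # same buffer as in the example above
print("{:.3f}".format(raw_duration(data, sr, sw, ch)))  # prints 1.000
```
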
From the microphone
===================

If the first argument is `None`, :func:`load` will try to read data from the
microphone. Audio parameters, as well as the `max_read` parameter, are mandatory:

.. code:: python

    import auditok
    sr = 16000
    sw = 2
    ch = 1
    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
    print(five_sec_audio)

output:

.. code:: bash

    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)


Skip part of audio data
=======================

If the `skip` parameter is > 0, :func:`load` will skip that many seconds of
leading audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds

This argument must be 0 when reading data from the microphone.


Limit the amount of read audio
==============================

If the `max_read` parameter is > 0, :func:`load` will read at most that many
seconds of audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", max_read=5)
    assert region.duration <= 5

This argument is mandatory when reading data from the microphone.


Basic split example
-------------------

In the following we'll use the :func:`split` function to tokenize an audio file,
requiring that valid audio events be at least 0.2 seconds long, at most 4 seconds
long, and contain at most 0.3 seconds of continuous silence. Limiting the size
of detected events to 4 seconds means that an event of, say, 9.5 seconds will
be returned as two 4-second events plus a third 1.5-second event. Moreover, a
valid event might contain many *silences* as long as none of them exceeds 0.3
seconds.

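The effect of the 4-second limit on a long event comes down to simple
arithmetic. The sketch below is an illustration only: it is not auditok's
detection code, and `chunk_durations` is a hypothetical helper that ignores
silence handling and the minimum duration.

```python
def chunk_durations(event_duration, max_dur):
    """Cut one long event into pieces no longer than `max_dur` seconds."""
    pieces = []
    remaining = event_duration
    while remaining > max_dur:
        pieces.append(max_dur)
        remaining -= max_dur
    if remaining > 0:
        pieces.append(remaining)
    return pieces


print(chunk_durations(9.5, 4))  # prints [4, 4, 1.5]
```
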
:func:`split` returns a generator of :class:`AudioRegion` objects. An :class:`AudioRegion`
can be played, saved, repeated (i.e., multiplied by an integer) and concatenated
with another region (see examples below). Notice that :class:`AudioRegion` objects
returned by :func:`split` have ``start`` and ``end`` information stored in
their metadata that can be accessed like `object.meta.start`.

.. code:: python

    import auditok

    # split returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,     # minimum duration of a valid audio event in seconds
        max_dur=4,       # maximum duration of an event
        max_silence=0.3, # maximum duration of tolerated continuous silence within an event
        energy_threshold=55 # threshold of detection
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play detection
        # r.play(progress_bar=True)

        # region's metadata can also be used with the `save` method
        # (no need to explicitly specify region's object and `format` arguments)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav

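The file name passed to `save` in the example above uses standard
`str.format` replacement fields with attribute access. The sketch below shows
how such a template expands, using a `SimpleNamespace` as a stand-in for a
region's metadata. The stand-in is an assumption for illustration; how auditok
resolves the fields internally may differ.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the `meta` object attached to a region by `split`
meta = SimpleNamespace(start=0.7, end=1.4)

template = "region_{meta.start:.3f}-{meta.end:.3f}.wav"
filename = template.format(meta=meta)
print(filename)  # prints region_0.700-1.400.wav
```
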

Split and plot
--------------

Visualize audio signal and detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav") # returns an AudioRegion object
    regions = region.split_and_plot(...) # or just region.splitp()

output figure:

.. image:: figures/example_1.png


Read and split data from the microphone
---------------------------------------

If the first argument of :func:`split` is `None`, audio data is read from the
microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    try:
        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass


:func:`split` will continue reading audio data until you press ``Ctrl-C``. If
you want to read a specific amount of audio data, pass the desired number of
seconds with the `max_read` argument.


Access recorded data after split
--------------------------------

Using a :class:`Recorder` object you can get hold of the acquired audio data:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)

    try:
        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass

    rec.rewind()
    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
    # alternatively you can use
    full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)


:class:`Recorder` also accepts a `max_read` argument.

Working with AudioRegions
-------------------------

The following are some useful operations you can perform on
:class:`AudioRegion` objects.


Basic region information
========================

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    len(region) # number of audio samples in the region (one channel considered)
    region.duration # duration in seconds
    region.sampling_rate # alias `sr`
    region.sample_width # alias `sw`
    region.channels # alias `ch`


Concatenate regions
===================

.. code:: python

    import auditok
    region_1 = auditok.load("audio_1.wav")
    region_2 = auditok.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful if you want to join regions returned by :func:`split`:

.. code:: python

    import auditok
    regions = auditok.load("audio.wav").split()
    gapless_region = sum(regions)

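Python's built-in `sum` starts from the integer 0, so summing objects only
works if the class accepts `0 + obj`. The sketch below shows this general
Python mechanism with a hypothetical `Clip` class. It is not auditok's code,
only an illustration of why an expression such as `sum(regions)` can work.

```python
class Clip:
    """Toy stand-in for an audio region holding raw bytes (hypothetical)."""

    def __init__(self, data):
        self.data = data

    def __add__(self, other):
        # Concatenate the raw data of two clips
        return Clip(self.data + other.data)

    def __radd__(self, other):
        # `sum` evaluates `0 + first_clip` first; treat 0 as an empty clip
        if other == 0:
            return self
        return NotImplemented


clips = [Clip(b"ab"), Clip(b"cd"), Clip(b"ef")]
joined = sum(clips)
print(joined.data)  # prints b'abcdef'
```
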
Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    region_x3 = region * 3

Split one region into N regions of equal size
=============================================

Divide by a positive integer (this has nothing to do with silence-based
tokenization):

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

Note that if no perfect division is possible, the last region might be a bit
shorter than the previous N-1 regions.

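To see how a shorter last piece can arise, here is a sketch on a plain list.
The chunking rule (ceiling division) and the helper name `divide` are
assumptions made for illustration and are not taken from auditok. With this
rule, exactly `n` pieces come out only when the sequence is much longer than
`n`, which is the usual case for audio.

```python
def divide(samples, n):
    """Cut a sequence into consecutive equal chunks; the last may be shorter."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Ceiling division, so that any remainder ends up in a shorter last chunk
    chunk = max(1, -(-len(samples) // n))
    return [samples[i:i + chunk] for i in range(0, len(samples), chunk)]


parts = divide(list(range(10)), 3)
print([len(p) for p in parts])  # prints [4, 4, 2]
```
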
Slice a region by samples, seconds or milliseconds
==================================================

Slicing an :class:`AudioRegion` can be useful in many situations. You can, for
example, remove a fixed-size portion of audio data from the beginning or from the
end of a region, or crop a region by an arbitrary amount as a data augmentation
strategy.

The most accurate way to slice an `AudioRegion` is to use indices that
directly refer to raw audio samples. In the following example, assuming that the
sampling rate of audio data is 16000, you can extract a 5-second region from
the main region, starting from the 20th second, as follows:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

In practice, this lets you start and stop at any audio sample within the region.
Just as with a `list`, you can omit one of `start` and `stop`, or both. You can
also use negative indices:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples is flexible, slicing with temporal indices is more
intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of an
`AudioRegion` (or their shortcut aliases `ms`, and `sec` or `s`).

With the ``millis`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.millis[5000:10000]

or with the ``seconds`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[5:10]

``seconds`` indices can also be floats:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[2.5:7.5]

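Conceptually, a time-based slice is a sample-based slice whose bounds have
been multiplied by the sampling rate. The following sketch shows that
conversion on a plain list. The rounding rule and the helper name
`seconds_slice` are assumptions for illustration and may differ from what
auditok does internally.

```python
def seconds_slice(samples, sampling_rate, start_s, stop_s):
    """Slice a sequence of samples using bounds given in seconds."""
    start = int(round(start_s * sampling_rate))
    stop = int(round(stop_s * sampling_rate))
    return samples[start:stop]


sr = 16000
samples = list(range(10 * sr))  # 10 seconds of dummy samples
part = seconds_slice(samples, sr, 2.5, 7.5)
print(len(part) / sr)  # prints 5.0
```
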
Get arrays of audio samples
===========================

If `numpy` is not installed, the `samples` attribute is a list of audio sample
arrays (standard `array.array` objects), one per channel. If `numpy` is installed,
`samples` is a 2-D `numpy.ndarray` where the first dimension is the channel
and the second is the sample.

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    samples = region.samples
    assert len(samples) == region.channels

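To make the "one array per channel" layout concrete, the sketch below
deinterleaves raw PCM bytes with the standard library only. It assumes
interleaved channels and signed, native-endian samples. `split_channels` is a
hypothetical helper and does not reproduce auditok's implementation.

```python
from array import array


def split_channels(data, sample_width, channels):
    """Deinterleave raw PCM bytes into one `array.array` per channel."""
    # Signed typecodes used here: 'b' is 1 byte, 'h' is 2 bytes
    typecode = {1: "b", 2: "h"}[sample_width]
    interleaved = array(typecode)
    interleaved.frombytes(data)
    # Assumed layout: samples alternate between channels (L R L R ... for stereo)
    return [interleaved[c::channels] for c in range(channels)]


stereo = array("h", [1, -1, 2, -2, 3, -3]).tobytes()  # 3 frames, 2 channels
left, right = split_channels(stereo, sample_width=2, channels=2)
print(list(left), list(right))  # prints [1, 2, 3] [-1, -2, -3]
```
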

If `numpy` is installed you can use:

.. code:: python

    import auditok
    import numpy as np
    region = auditok.load("audio.wav")
    samples = np.asarray(region)
    assert len(samples.shape) == 2