Load audio data
---------------

Audio data is loaded using the :func:`load` function, which can read from
audio files, capture from the microphone, or accept raw audio data
(as a ``bytes`` object).

From a file
===========

If the first argument of :func:`load` is a string or a ``Path``, it should
refer to an existing audio file.

.. code:: python

    import auditok
    region = auditok.load("audio.ogg")

If the input file contains raw (headerless) audio data, specifying audio
parameters (``sampling_rate``, ``sample_width``, and ``channels``) is required.
Additionally, if the file name does not end with 'raw', you should explicitly
pass ``audio_format="raw"`` to the function.

In the example below, we provide audio parameters using their abbreviated names:

.. code:: python

    region = auditok.load("audio.dat",
                          audio_format="raw",
                          sr=44100, # alias for `sampling_rate`
                          sw=2,     # alias for `sample_width`
                          ch=1      # alias for `channels`
                          )

Alternatively, you can use :class:`AudioRegion` to load audio data:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.dat",
                              audio_format="raw",
                              sr=44100, # alias for `sampling_rate`
                              sw=2,     # alias for `sample_width`
                              ch=1      # alias for `channels`
                              )

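A raw file carries no header, so its duration can only be derived from the
parameters you supply. The standard-library sketch below shows the arithmetic;
``raw_duration`` is a hypothetical helper written for this illustration and is
not part of auditok:

.. code:: python

    def raw_duration(n_bytes, sampling_rate, sample_width, channels):
        """Duration in seconds of headerless PCM data of ``n_bytes`` bytes."""
        bytes_per_second = sampling_rate * sample_width * channels
        return n_bytes / bytes_per_second

    # One second of 44.1 kHz, 16-bit, mono audio occupies 88200 bytes
    print(raw_duration(88200, sampling_rate=44100, sample_width=2, channels=1))  # 1.0

Passing a wrong parameter (for example ``ch=2`` for mono data) would halve the
computed duration, which is why all three values must match the file.
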

From a ``bytes`` object
=======================

If the first argument is of type ``bytes``, it is interpreted as raw audio data:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    data = b"\0" * sr * sw * ch
    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
    print(region)
    # alternatively you can use
    region = auditok.AudioRegion(data, sr, sw, ch)

output:

.. code:: bash

    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)

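The raw data does not have to be silence. As a minimal sketch using only the
standard library, the code below builds one second of a tone as a ``bytes``
object of the size the example above relies on. The 440 Hz frequency, the
amplitude, and the signed little-endian 16-bit layout are assumptions made for
this illustration; check the sample layout against your own data:

.. code:: python

    import math
    import struct

    sr = 16000  # sampling rate in Hz
    sw = 2      # sample width in bytes (16-bit)
    ch = 1      # mono

    # One second of a 440 Hz tone as signed 16-bit integer samples
    samples = [
        int(10000 * math.sin(2 * math.pi * 440 * n / sr))
        for n in range(sr)
    ]
    # Pack the samples as little-endian 16-bit values
    data = struct.pack(f"<{len(samples)}h", *samples)

    # One second of audio occupies sr * sw * ch bytes
    assert len(data) == sr * sw * ch

The resulting ``data`` object can then be used wherever the examples in this
section pass raw audio bytes.
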
From the microphone
===================

If the first argument is ``None``, :func:`load` will attempt to read data from the
microphone. In this case, audio parameters, along with the ``max_read`` parameter,
are required.

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
    print(five_sec_audio)

output:

.. code:: bash

    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)


Skip part of audio data
=======================

If the ``skip`` parameter is greater than 0, :func:`load` will skip that
amount of leading audio data, measured in seconds:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds

This argument must be 0 when reading data from the microphone.


Limit the amount of read audio
==============================

If the ``max_read`` parameter is greater than 0, :func:`load` will read at most
that many seconds of audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", max_read=5)
    assert region.duration <= 5

This argument is required when reading data from the microphone.

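When ``skip`` and ``max_read`` are combined, it helps to know which part of the
input is kept. The sketch below assumes that ``skip`` is applied first and that
``max_read`` is counted from the point reached after skipping; this ordering is
an assumption, not something stated above, and ``read_window`` is a
hypothetical helper that is not part of auditok:

.. code:: python

    def read_window(total_duration, skip=0, max_read=None):
        """Return the (start, end) times in seconds of the audio kept."""
        start = min(skip, total_duration)
        if max_read is None:
            end = total_duration
        else:
            end = min(start + max_read, total_duration)
        return start, end

    print(read_window(10, skip=2, max_read=5))  # (2, 7)
    print(read_window(10, skip=8, max_read=5))  # (8, 10)
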

Basic split example
-------------------

In the following example, we'll use the :func:`split` function to tokenize an
audio file. We'll specify that valid audio events must be at least 0.2 seconds
long, no longer than 4 seconds, and contain no more than 0.3 seconds of continuous
silence. By setting a 4-second limit, an event lasting 9.5 seconds, for instance,
will be returned as two 4-second events plus a final 1.5-second event. Additionally,
a valid event may contain multiple silences, as long as none exceed 0.3 seconds.

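The effect of the maximum duration on a long event can be checked with a few
lines of plain Python. This is only a sketch of the arithmetic described above:
``chunk_event`` is a hypothetical helper, and it ignores the minimum duration
and silence rules that the real tokenizer also applies:

.. code:: python

    def chunk_event(duration, max_dur):
        """Split one event duration into pieces no longer than ``max_dur``."""
        pieces = []
        remaining = duration
        while remaining > max_dur:
            pieces.append(max_dur)
            remaining -= max_dur
        if remaining > 0:
            pieces.append(remaining)
        return pieces

    print(chunk_event(9.5, 4))  # [4, 4, 1.5]
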
:func:`split` returns a generator of :class:`AudioRegion` objects. Each
:class:`AudioRegion` can be played, saved, repeated (multiplied by an integer),
and concatenated with another region (see examples below). Note that
:class:`AudioRegion` objects returned by :func:`split` include ``start`` and ``end``
attributes, which mark the beginning and end of the audio event relative to the
input audio stream.

.. code:: python

    import auditok

    # `split` returns a generator of AudioRegion objects
    audio_events = auditok.split(
        "audio.wav",
        min_dur=0.2,          # Minimum duration of a valid audio event in seconds
        max_dur=4,            # Maximum duration of an event
        max_silence=0.3,      # Maximum tolerated silence duration within an event
        energy_threshold=55   # Detection threshold
    )

    for i, r in enumerate(audio_events):
        # AudioRegions returned by `split` have defined 'start' and 'end' attributes
        print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")

        # Play the audio event
        r.play(progress_bar=True)

        # Save the event with start and end times in the filename
        filename = r.save("event_{start:.3f}-{end:.3f}.wav")
        print(f"event saved as: {filename}")

Example output:

.. code:: bash

    Event 0: 0.700s -- 1.400s
    event saved as: event_0.700-1.400.wav
    Event 1: 3.800s -- 4.500s
    event saved as: event_3.800-4.500.wav
    Event 2: 8.750s -- 9.950s
    event saved as: event_8.750-9.950.wav
    Event 3: 11.700s -- 12.400s
    event saved as: event_11.700-12.400.wav
    Event 4: 15.050s -- 15.850s
    event saved as: event_15.050-15.850.wav

Split and plot
--------------

Visualize the audio signal and detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav") # returns an AudioRegion object
    regions = region.split_and_plot(...) # or just region.splitp()

output figure:

.. image:: figures/example_1.png

Split an audio stream and re-join (glue) audio events with silence
------------------------------------------------------------------

The following code detects audio events within an audio stream, then inserts
1 second of silence between them to create an audio with pauses:

.. code:: python

    # Create a 1-second silent audio region
    # Audio parameters must match the original stream
    from auditok import split, make_silence
    silence = make_silence(duration=1,
                           sampling_rate=16000,
                           sample_width=2,
                           channels=1)
    events = split("audio.wav")
    audio_with_pauses = silence.join(events)

Alternatively, use ``split_and_join_with_silence``:

.. code:: python

    from auditok import split_and_join_with_silence
    audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")

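The gluing step can be pictured with plain byte strings. The sketch below
assumes that joining regions follows the same convention as Python's
``bytes.join``, where the separator is placed between consecutive items only;
that convention is an assumption here, and the byte strings merely stand in for
real audio events:

.. code:: python

    sr, sw, ch = 16000, 2, 1

    # One second of silence: all-zero bytes
    silence = b"\x00" * (sr * sw * ch)

    # Three stand-in "events" of 0.5 s each (arbitrary non-zero content)
    events = [bytes([i + 1]) * (sr * sw * ch // 2) for i in range(3)]

    # Silence goes between events only, not before the first or after the last
    audio_with_pauses = silence.join(events)

    duration = len(audio_with_pauses) / (sr * sw * ch)
    print(duration)  # 3.5

Three 0.5-second events separated by two 1-second pauses give 3.5 seconds in
total.
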

Read audio data from the microphone and perform real-time event detection
-------------------------------------------------------------------------

If the first argument of :func:`split` is ``None``, audio data is read from the
microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    try:
        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass


:func:`split` will continue reading audio data until you press ``Ctrl-C``. To read
a specific amount of audio data, pass the desired number of seconds using the
``max_read`` argument.


Access recorded data after split
--------------------------------

Using a :class:`Recorder` object, you can access audio data read from a file
or from the microphone. With the following code, press ``Ctrl-C`` to stop recording:


.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)
    events = []

    try:
        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True)
            events.append(region)
    except KeyboardInterrupt:
        pass

    rec.rewind()
    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
    # alternatively you can use
    full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)
    full_audio.play(progress_bar=True)


:class:`Recorder` also accepts a ``max_read`` argument.

Working with AudioRegions
-------------------------

In the following sections, we will review several operations
that can be performed with :class:`AudioRegion` objects.

Basic region information
========================

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    len(region) # number of audio samples in the region, one channel considered
    region.duration # duration in seconds
    region.sampling_rate # alias `sr`
    region.sample_width # alias `sw`
    region.channels # alias `ch`

When an audio region is returned by the :func:`split` function, it includes defined
``start`` and ``end`` attributes that refer to the beginning and end of the audio
event relative to the input audio stream.

Concatenate regions
===================

.. code:: python

    import auditok
    region_1 = auditok.load("audio_1.wav")
    region_2 = auditok.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful when you want to join regions returned by the
:func:`split` function:

.. code:: python

    import auditok
    regions = auditok.load("audio.wav").split()
    gapless_region = sum(regions)

Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    region_x3 = region * 3

Split one region into N regions of equal size
=============================================

Divide by a positive integer (this is unrelated to silence-based tokenization!):

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

Note that if an exact split is not possible, the last region may be shorter
than the preceding N-1 regions.

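To see why the last piece can come out shorter, here is a sketch of one simple
division scheme applied to a plain list of samples. The ceiling-based chunk
size is an assumption chosen for this illustration and may differ from the rule
auditok actually uses; ``divide`` is a hypothetical helper:

.. code:: python

    import math

    def divide(samples, n):
        """Split ``samples`` into chunks of equal size; the last may be shorter."""
        size = math.ceil(len(samples) / n)
        return [samples[i:i + size] for i in range(0, len(samples), size)]

    parts = divide(list(range(10)), 3)
    print([len(p) for p in parts])  # [4, 4, 2]

    # Concatenating the parts restores the original sequence
    assert sum(parts, []) == list(range(10))
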
Slice a region by samples, seconds or milliseconds
==================================================

Slicing an :class:`AudioRegion` can be useful in various situations.
For example, you can remove a fixed-length portion of audio data from
the beginning or end of a region, or crop a region by an arbitrary amount
as a data augmentation strategy.

The most accurate way to slice an :class:`AudioRegion` is by using indices
that directly refer to raw audio samples. In the following example, assuming
the audio data has a sampling rate of 16000, you can extract a 5-second
segment from the main region, starting at the 20th second, as follows:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This allows you to start and stop at any audio sample within the region. Similar
to a ``list``, you can omit either ``start`` or ``stop``, or both. Negative
indices are also supported:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples offers flexibility, using temporal indices is
often more intuitive. You can achieve this by accessing the ``millis`` or ``seconds``
*views* of an :class:`AudioRegion` (or using their shortcut aliases ``ms``, ``sec``, or ``s``).

With the ``millis`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.millis[5000:10000]
    # or
    five_second_region = region.ms[5000:10000]

or with the ``seconds`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[5:10]
    # or
    five_second_region = region.sec[5:10]
    # or
    five_second_region = region.s[5:10]

``seconds`` indices can also be floats:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[2.5:7.5]

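Temporal indices ultimately correspond to sample positions. The sketch below
shows that conversion for a sampling rate of 16000; the two helpers are
hypothetical, and rounding to the nearest sample is an assumption about how
fractional positions are handled:

.. code:: python

    def seconds_to_samples(t, sampling_rate):
        """Sample index corresponding to time ``t`` in seconds."""
        return round(t * sampling_rate)

    def millis_to_samples(ms, sampling_rate):
        """Sample index corresponding to time ``ms`` in milliseconds."""
        return round(ms * sampling_rate / 1000)

    sr = 16000
    print(seconds_to_samples(2.5, sr), seconds_to_samples(7.5, sr))   # 40000 120000
    print(millis_to_samples(5000, sr), millis_to_samples(10000, sr))  # 80000 160000
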
Export an ``AudioRegion`` as a ``numpy`` array
==============================================

.. code:: python

    from auditok import load, AudioRegion
    audio = load("audio.wav") # or use `AudioRegion.load("audio.wav")`
    x = audio.numpy()
    assert x.shape[0] == audio.channels
    assert x.shape[1] == len(audio)

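The assertions above imply an array with one row per channel and one column per
sample. As a sketch of that layout using only the standard library, the code
below separates interleaved stereo data into per-channel rows. It assumes
signed little-endian 16-bit samples interleaved frame by frame, and it
illustrates the shape only, not how auditok performs the conversion:

.. code:: python

    import struct

    channels = 2
    # Three stereo frames, interleaved as L0 R0 L1 R1 L2 R2
    data = struct.pack("<6h", 1, -1, 2, -2, 3, -3)

    # Decode all 16-bit samples, then pick every `channels`-th one per row
    samples = struct.unpack(f"<{len(data) // 2}h", data)
    rows = [list(samples[c::channels]) for c in range(channels)]

    print(rows)  # [[1, 2, 3], [-1, -2, -3]]
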