comparison doc/examples.rst @ 432:81bc2375354f

Update documentation
author Amine Sehili <amine.sehili@gmail.com>
date Wed, 30 Oct 2024 17:21:30 +0100
parents bd242e80455f
children 6cf3ea23fadb
431:69160c7aefff 432:81bc2375354f
1 Load audio data 1 Load audio data
2 --------------- 2 ---------------
3 3
4 Audio data is loaded with the :func:`load` function which can read from audio 4 Audio data is loaded using the :func:`load` function, which can read from
5 files, the microphone or use raw audio data. 5 audio files, capture from the microphone, or accept raw audio data
6 (as a ``bytes`` object).
6 7
7 From a file 8 From a file
8 =========== 9 ===========
9 10
10 If the first argument of :func:`load` is a string, it should be a path to an 11 If the first argument of :func:`load` is a string or a `Path`, it should
11 audio file. 12 refer to an existing audio file.
12 13
13 .. code:: python 14 .. code:: python
14 15
15 import auditok 16 import auditok
16 region = auditok.load("audio.ogg") 17 region = auditok.load("audio.ogg")
17 18
18 If input file contains raw (headerless) audio data, passing `audio_format="raw"` 19 If the input file contains raw (headerless) audio data, specifying audio
19 and other audio parameters (`sampling_rate`, `sample_width` and `channels`) is 20 parameters (``sampling_rate``, ``sample_width``, and ``channels``) is required.
20 mandatory. In the following example we pass audio parameters with their short 21 Additionally, if the file name does not end with 'raw', you should explicitly
21 names: 22 pass `audio_format="raw"` to the function.
23
24 In the example below, we provide audio parameters using their abbreviated names:
22 25
23 .. code:: python 26 .. code:: python
24 27
25 region = auditok.load("audio.dat", 28 region = auditok.load("audio.dat",
26 audio_format="raw", 29 audio_format="raw",
27 sr=44100, # alias for `sampling_rate` 30 sr=44100, # alias for `sampling_rate`
28 sw=2 # alias for `sample_width` 31 sw=2, # alias for `sample_width`
29 ch=1 # alias for `channels` 32 ch=1 # alias for `channels`
30 ) 33 )
31 34
35 Alternatively, you can use :class:`AudioRegion` to load audio data:
36
37 .. code:: python
38
39 from auditok import AudioRegion
40 region = AudioRegion.load("audio.dat",
41 audio_format="raw",
42 sr=44100, # alias for `sampling_rate`
43 sw=2, # alias for `sample_width`
44 ch=1 # alias for `channels`
45 )
46
47
32 From a `bytes` object 48 From a `bytes` object
33 ===================== 49 =====================
34 50
35 If the type of the first argument `bytes`, it's interpreted as raw audio data: 51 If the first argument is of type `bytes`, it is interpreted as raw audio data:
36 52
37 .. code:: python 53 .. code:: python
38 54
39 sr = 16000 55 sr = 16000
40 sw = 2 56 sw = 2
41 ch = 1 57 ch = 1
42 data = b"\0" * sr * sw * ch 58 data = b"\0" * sr * sw * ch
43 region = auditok.load(data, sr=sr, sw=sw, ch=ch) 59 region = auditok.load(data, sr=sr, sw=sw, ch=ch)
44 print(region) 60 print(region)
45 # alternatively you can use 61 # alternatively you can use
46 #region = auditok.AudioRegion(data, sr, sw, ch) 62 region = auditok.AudioRegion(data, sr, sw, ch)
47 63
48 output: 64 output:
49 65
50 .. code:: bash 66 .. code:: bash
51 67
52 AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1) 68 AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
53 69
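The one-second duration reported above follows from the size of the raw buffer: each second of audio occupies ``sr * sw * ch`` bytes. A minimal sketch of that arithmetic in plain Python is shown below; ``raw_duration`` is a hypothetical helper written for illustration and is not part of auditok:

```python
def raw_duration(data: bytes, sr: int, sw: int, ch: int) -> float:
    """Return the duration, in seconds, of headerless PCM data.

    Hypothetical helper for illustration only (not an auditok function).
    """
    bytes_per_second = sr * sw * ch  # samples/s * bytes/sample * channels
    return len(data) / bytes_per_second


sr, sw, ch = 16000, 2, 1
data = b"\0" * sr * sw * ch  # exactly one second of silence
duration = raw_duration(data, sr, sw, ch)
print(f"{duration:.3f}")
```
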
54 From the microphone 70 From the microphone
55 =================== 71 ===================
56 72
57 If the first argument is `None`, :func:`load` will try to read data from the 73 If the first argument is `None`, :func:`load` will attempt to read data from the
58 microphone. Audio parameters, as well as the `max_read` parameter are mandatory: 74 microphone. In this case, audio parameters, along with the `max_read` parameter,
59 75 are required.
60 76
61 .. code:: python 77 .. code:: python
62 78
63 sr = 16000 79 sr = 16000
64 sw = 2 80 sw = 2
74 90
75 91
76 Skip part of audio data 92 Skip part of audio data
77 ======================= 93 =======================
78 94
79 If the `skip` parameter is > 0, :func:`load` will skip that amount in seconds 95 If the ``skip`` parameter is greater than 0, :func:`load` will skip that specified
80 of leading audio data: 96 amount of leading audio data, measured in seconds:
81 97
82 .. code:: python 98 .. code:: python
83 99
84 import auditok 100 import auditok
85 region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds 101 region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds
88 104
89 105
90 Limit the amount of read audio 106 Limit the amount of read audio
91 ============================== 107 ==============================
92 108
93 If the `max_read` parameter is > 0, :func:`load` will read at most that amount 109 If the ``max_read`` parameter is > 0, :func:`load` will read at most that amount
94 in seconds of audio data: 110 in seconds of audio data:
95 111
96 .. code:: python 112 .. code:: python
97 113
98 import auditok 114 import auditok
99 region = auditok.load("audio.ogg", max_read=5) 115 region = auditok.load("audio.ogg", max_read=5)
100 assert region.duration <= 5 116 assert region.duration <= 5
101 117
102 This argument is mandatory when reading data from the microphone. 118 This argument is required when reading data from the microphone.
103 119
104 120
105 Basic split example 121 Basic split example
106 ------------------- 122 -------------------
107 123
108 In the following we'll use the :func:`split` function to tokenize an audio file, 124 In the following example, we'll use the :func:`split` function to tokenize an
109 requiring that valid audio events be at least 0.2 second long, at most 4 seconds 125 audio file. We'll specify that valid audio events must be at least 0.2 seconds
110 long and contain a maximum of 0.3 second of continuous silence. Limiting the size 126 long, no longer than 4 seconds, and contain no more than 0.3 seconds of continuous
111 of detected events to 4 seconds means that an event of, say, 9.5 seconds will 127 silence. By setting a 4-second limit, an event lasting 9.5 seconds, for instance,
112 be returned as two 4-second events plus a third 1.5-second event. Moreover, a 128 will be returned as two 4-second events plus a final 1.5-second event. Additionally,
113 valid event might contain many *silences* as far as none of them exceeds 0.3 129 a valid event may contain multiple silences, as long as none exceed 0.3 seconds.
114 second. 130
115 131 :func:`split` returns a generator of :class:`AudioRegion` objects. Each
116 :func:`split` returns a generator of :class:`AudioRegion`. An :class:`AudioRegion` 132 :class:`AudioRegion` can be played, saved, repeated (multiplied by an integer),
117 can be played, saved, repeated (i.e., multiplied by an integer) and concatenated 133 and concatenated with another region (see examples below). Note that
118 with another region (see examples below). Notice that :class:`AudioRegion` objects 134 :class:`AudioRegion` objects returned by :func:`split` include `start` and `end`
119 returned by :func:`split` have a ``start`` a ``stop`` information stored in 135 attributes, which mark the beginning and end of the audio event relative to the
120 their meta data that can be accessed like `object.meta.start`. 136 input audio stream.
121 137
122 .. code:: python 138 .. code:: python
123 139
124 import auditok 140 import auditok
125 141
126 # split returns a generator of AudioRegion objects 142 # `split` returns a generator of AudioRegion objects
127 audio_regions = auditok.split( 143 audio_events = auditok.split(
128 "audio.wav", 144 "audio.wav",
129 min_dur=0.2, # minimum duration of a valid audio event in seconds 145 min_dur=0.2, # Minimum duration of a valid audio event in seconds
130 max_dur=4, # maximum duration of an event 146 max_dur=4, # Maximum duration of an event
131 max_silence=0.3, # maximum duration of tolerated continuous silence within an event 147 max_silence=0.3, # Maximum tolerated silence duration within an event
132 energy_threshold=55 # threshold of detection 148 energy_threshold=55 # Detection threshold
133 ) 149 )
134 150
135 for i, r in enumerate(audio_regions): 151 for i, r in enumerate(audio_events):
136 152 # AudioRegions returned by `split` have defined 'start' and 'end' attributes
137 # Regions returned by `split` have 'start' and 'end' metadata fields 153 print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")
138 print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r)) 154
139 155 # Play the audio event
140 # play detection 156 r.play(progress_bar=True)
141 # r.play(progress_bar=True) 157
142 158 # Save the event with start and end times in the filename
143 # region's metadata can also be used with the `save` method 159 filename = r.save("event_{start:.3f}-{end:.3f}.wav")
144 # (no need to explicitly specify region's object and `format` arguments) 160 print(f"Event saved as: {filename}")
145 filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav") 161
146 print("region saved as: {}".format(filename)) 162 Example output:
147
148 output example:
149 163
150 .. code:: bash 164 .. code:: bash
151 165
152 Region 0: 0.700s -- 1.400s 166 Event 0: 0.700s -- 1.400s
153 region saved as: region_0.700-1.400.wav 167 Event saved as: event_0.700-1.400.wav
154 Region 1: 3.800s -- 4.500s 168 Event 1: 3.800s -- 4.500s
155 region saved as: region_3.800-4.500.wav 169 Event saved as: event_3.800-4.500.wav
156 Region 2: 8.750s -- 9.950s 170 Event 2: 8.750s -- 9.950s
157 region saved as: region_8.750-9.950.wav 171 Event saved as: event_8.750-9.950.wav
158 Region 3: 11.700s -- 12.400s 172 Event 3: 11.700s -- 12.400s
159 region saved as: region_11.700-12.400.wav 173 Event saved as: event_11.700-12.400.wav
160 Region 4: 15.050s -- 15.850s 174 Event 4: 15.050s -- 15.850s
161 region saved as: region_15.050-15.850.wav 175 Event saved as: event_15.050-15.850.wav
162
163 176
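To make the effect of ``max_dur`` concrete, the following plain-Python sketch reproduces only the arithmetic described above, where a 9.5-second event is cut into 4-second pieces. The function ``chunk_durations`` is hypothetical and does not reflect how auditok is implemented; it ignores silence detection and energy thresholds entirely:

```python
def chunk_durations(event_dur, max_dur):
    """Cut one long event into pieces no longer than `max_dur` seconds.

    Hypothetical illustration of the `max_dur` limit, not auditok code.
    """
    chunks = []
    remaining = event_dur
    while remaining > max_dur:
        chunks.append(max_dur)
        remaining -= max_dur
    if remaining > 0:
        chunks.append(remaining)  # final, possibly shorter, piece
    return chunks


print(chunk_durations(9.5, 4))  # [4, 4, 1.5]
```
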
164 Split and plot 177 Split and plot
165 -------------- 178 --------------
166 179
167 Visualize audio signal and detections: 180 Visualize audio signal and detections:
174 187
175 output figure: 188 output figure:
176 189
177 .. image:: figures/example_1.png 190 .. image:: figures/example_1.png
178 191
192 Split an audio stream and re-join (glue) audio events with silence
193 ------------------------------------------------------------------
194
195 The following code detects audio events within an audio stream, then inserts
196 1 second of silence between them to create an audio stream with pauses:
197
198 .. code:: python
199
200 # Create a 1-second silent audio region
201 # Audio parameters must match the original stream
202 from auditok import split, make_silence
203 silence = make_silence(duration=1,
204 sampling_rate=16000,
205 sample_width=2,
206 channels=1)
207 events = split("audio.wav")
208 audio_with_pauses = silence.join(events)
209
210 Alternatively, use ``split_and_join_with_silence``:
211
212 .. code:: python
213
214 from auditok import split_and_join_with_silence
215 audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")
216
179 217
180 Read and split data from the microphone 218 Read and split data from the microphone
181 --------------------------------------- 219 ---------------------------------------
182 220
183 If the first argument of :func:`split` is None, audio data is read from the 221 If the first argument of :func:`split` is ``None``, audio data is read from the
184 microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_): 222 microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):
185 223
186 .. code:: python 224 .. code:: python
187 225
188 import auditok 226 import auditok
198 region.play(progress_bar=True) # progress bar requires `tqdm` 236 region.play(progress_bar=True) # progress bar requires `tqdm`
199 except KeyboardInterrupt: 237 except KeyboardInterrupt:
200 pass 238 pass
201 239
202 240
203 :func:`split` will continue reading audio data until you press ``Ctrl-C``. If 241 :func:`split` will continue reading audio data until you press ``Ctrl-C``. To read
204 you want to read a specific amount of audio data, pass the desired number of 242 a specific amount of audio data, pass the desired number of seconds using the
205 seconds with the `max_read` argument. 243 `max_read` argument.
206 244
207 245
208 Access recorded data after split 246 Access recorded data after split
209 -------------------------------- 247 --------------------------------
210 248
211 Using a :class:`Recorder` object you can get hold of acquired audio data: 249 Using a :class:`Recorder` object, you can access audio data read from a file
250 or from the microphone. With the following code, press ``Ctrl-C`` to stop recording:
212 251
213 252
214 .. code:: python 253 .. code:: python
215 254
216 import auditok 255 import auditok
219 sw = 2 258 sw = 2
220 ch = 1 259 ch = 1
221 eth = 55 # alias for energy_threshold, default value is 50 260 eth = 55 # alias for energy_threshold, default value is 50
222 261
223 rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch) 262 rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)
263 events = []
224 264
225 try: 265 try:
226 for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth): 266 for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
227 print(region) 267 print(region)
228 region.play(progress_bar=True) # progress bar requires `tqdm` 268 region.play(progress_bar=True)
269 events.append(region)
229 except KeyboardInterrupt: 270 except KeyboardInterrupt:
230 pass 271 pass
231 272
232 rec.rewind() 273 rec.rewind()
233 full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch) 274 full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
234 # alternatively you can use 275 # alternatively you can use
235 full_audio = auditok.AudioRegion(rec.data, sr, sw, ch) 276 full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)
277 full_audio.play(progress_bar=True)
236 278
237 279
238 :class:`Recorder` also accepts a `max_read` argument. 280 :class:`Recorder` also accepts a `max_read` argument.
239 281
240 Working with AudioRegions 282 Working with AudioRegions
241 ------------------------- 283 -------------------------
242 284
243 The following are a couple of interesting operations you can do with 285 In the following sections, we will review several operations
244 :class:`AudioRegion` objects. 286 that can be performed with :class:`AudioRegion` objects.
245
246 287
247 Basic region information 288 Basic region information
248 ======================== 289 ========================
249 290
250 .. code:: python 291 .. code:: python
255 region.duration # duration in seconds 296 region.duration # duration in seconds
256 region.sampling_rate # alias `sr` 297 region.sampling_rate # alias `sr`
257 region.sample_width # alias `sw` 298 region.sample_width # alias `sw`
258 region.channels # alias `ch` 299 region.channels # alias `ch`
259 300
301 When an audio region is returned by the :func:`split` function, it includes defined
302 ``start`` and ``end`` attributes that refer to the beginning and end of the audio
303 event relative to the input audio stream.
260 304
261 Concatenate regions 305 Concatenate regions
262 =================== 306 ===================
263 307
264 .. code:: python 308 .. code:: python
266 import auditok 310 import auditok
267 region_1 = auditok.load("audio_1.wav") 311 region_1 = auditok.load("audio_1.wav")
268 region_2 = auditok.load("audio_2.wav") 312 region_2 = auditok.load("audio_2.wav")
269 region_3 = region_1 + region_2 313 region_3 = region_1 + region_2
270 314
271 Particularly useful if you want to join regions returned by :func:`split`: 315 This is particularly useful when you want to join regions returned by the
316 :func:`split` function:
272 317
273 .. code:: python 318 .. code:: python
274 319
275 import auditok 320 import auditok
276 regions = auditok.load("audio.wav").split() 321 regions = auditok.load("audio.wav").split()
288 region_x3 = region * 3 333 region_x3 = region * 3
289 334
290 Split one region into N regions of equal size 335 Split one region into N regions of equal size
291 ============================================= 336 =============================================
292 337
293 Divide by a positive integer (this has nothing to do with silence-based 338 Divide by a positive integer (this is unrelated to silence-based tokenization!):
294 tokenization):
295 339
296 .. code:: python 340 .. code:: python
297 341
298 import auditok 342 import auditok
299 region = auditok.load("audio.wav") 343 region = auditok.load("audio.wav")
300 regions = region / 5 344 regions = region / 5
301 assert sum(regions) == region 345 assert sum(regions) == region
302 346
303 Note that if no perfect division is possible, the last region might be a bit 347 Note that if an exact split is not possible, the last region may be shorter
304 shorter than the previous N-1 regions. 348 than the preceding N-1 regions.
305 349
306 Slice a region by samples, seconds or milliseconds 350 Slice a region by samples, seconds or milliseconds
307 ================================================== 351 ==================================================
308 352
309 Slicing an :class:`AudioRegion` can be interesting in many situations. You can for 353 Slicing an :class:`AudioRegion` can be useful in various situations.
310 example remove a fixed-size portion of audio data from the beginning or from the 354 For example, you can remove a fixed-length portion of audio data from
311 end of a region or crop a region by an arbitrary amount as a data augmentation 355 the beginning or end of a region, or crop a region by an arbitrary amount
312 strategy. 356 as a data augmentation strategy.
313 357
314 The most accurate way to slice an `AudioRegion` is to use indices that 358 The most accurate way to slice an `AudioRegion` is by using indices that
315 directly refer to raw audio samples. In the following example, assuming that the 359 directly refer to raw audio samples. In the following example, assuming
316 sampling rate of audio data is 16000, you can extract a 5-second region from 360 the audio data has a sampling rate of 16000, you can extract a 5-second
317 main region, starting from the 20th second as follows: 361 segment from the main region, starting at the 20th second, as follows:
318 362
319 .. code:: python 363 .. code:: python
320 364
321 import auditok 365 import auditok
322 region = auditok.load("audio.wav") 366 region = auditok.load("audio.wav")
323 start = 20 * 16000 367 start = 20 * 16000
324 stop = 25 * 16000 368 stop = 25 * 16000
325 five_second_region = region[start:stop] 369 five_second_region = region[start:stop]
326 370
327 This allows you to practically start and stop at any audio sample within the region. 371 This allows you to start and stop at any audio sample within the region. Similar
328 Just as with a `list` you can omit one of `start` and `stop`, or both. You can 372 to a ``list``, you can omit either ``start`` or ``stop``, or both. Negative
329 also use negative indices: 373 indices are also supported:
330 374
331 .. code:: python 375 .. code:: python
332 376
333 import auditok 377 import auditok
334 region = auditok.load("audio.wav") 378 region = auditok.load("audio.wav")
335 start = -3 * region.sr # `sr` is an alias of `sampling_rate` 379 start = -3 * region.sr # `sr` is an alias of `sampling_rate`
336 three_last_seconds = region[start:] 380 three_last_seconds = region[start:]
337 381
338 While slicing by raw samples is flexible, slicing with temporal indices is more 382 While slicing by raw samples offers flexibility, using temporal indices is
339 intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of an 383 often more intuitive. You can achieve this by accessing the ``millis`` or ``seconds``
340 `AudioRegion` (or their shortcut alias `ms` and `sec` or `s`). 384 *views* of an :class:`AudioRegion` (or using their shortcut aliases ``ms``, ``sec``, or ``s``).
341 385
342 With the ``millis`` view: 386 With the ``millis`` view:
343 387
344 .. code:: python 388 .. code:: python
345 389
346 import auditok 390 import auditok
347 region = auditok.load("audio.wav") 391 region = auditok.load("audio.wav")
348 five_second_region = region.millis[5000:10000] 392 five_second_region = region.millis[5000:10000]
393 # or
394 five_second_region = region.ms[5000:10000]
349 395
350 or with the ``seconds`` view: 396 or with the ``seconds`` view:
351 397
352 .. code:: python 398 .. code:: python
353 399
354 import auditok 400 import auditok
355 region = auditok.load("audio.wav") 401 region = auditok.load("audio.wav")
356 five_second_region = region.seconds[5:10] 402 five_second_region = region.seconds[5:10]
403 # or
404 five_second_region = region.sec[5:10]
405 # or
406 five_second_region = region.s[5:10]
357 407
358 ``seconds`` indices can also be floats: 408 ``seconds`` indices can also be floats:
359 409
360 .. code:: python 410 .. code:: python
361 411
362 import auditok 412 import auditok
363 region = auditok.load("audio.wav") 413 region = auditok.load("audio.wav")
364 five_second_region = region.seconds[2.5:7.5] 414 five_second_region = region.seconds[2.5:7.5]
365 415
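Conceptually, a temporal slice is a sample slice whose bounds are scaled by the sampling rate. The sketch below shows that conversion on a plain list standing in for the samples of a mono region. ``slice_by_seconds`` is a hypothetical helper, and truncating with ``int`` is an assumption made for illustration; the rounding auditok applies internally may differ:

```python
def slice_by_seconds(samples, sampling_rate, start, stop):
    """Slice a sequence of samples using bounds expressed in seconds.

    Hypothetical illustration only (not an auditok function).
    """
    first = int(start * sampling_rate)  # seconds -> sample index
    last = int(stop * sampling_rate)
    return samples[first:last]


sr = 16000
samples = [0] * (10 * sr)  # 10 seconds of mono silence
segment = slice_by_seconds(samples, sr, 2.5, 7.5)
print(len(segment) / sr)  # 5.0
```
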
366 Get arrays of audio samples 416 Export an ``AudioRegion`` as a ``numpy`` array
367 =========================== 417 ==============================================
368 418
369 If `numpy` is not installed, the `samples` attributes is a list of audio samples 419 .. code:: python
370 arrays (standard `array.array` objects), one per channels. If numpy is installed, 420
371 `samples` is a 2-D `numpy.ndarray` where the fist dimension is the channel 421 from auditok import load, AudioRegion
372 and the second is the the sample. 422 audio = load("audio.wav") # or use `AudioRegion.load("audio.wav")`
373 423 x = audio.numpy()
374 .. code:: python 424 assert x.shape[0] == audio.channels
375 425 assert x.shape[1] == len(audio)
376 import auditok
377 region = auditok.load("audio.wav")
378 samples = region.samples
379 assert len(samples) == region.channels
380
381
382 If `numpy` is installed you can use:
383
384 .. code:: python
385
386 import numpy as np
387 region = auditok.load("audio.wav")
388 samples = np.asarray(region)
389 assert len(samples.shape) == 2