comparison doc/examples.rst @ 432:81bc2375354f
Update documentation
author | Amine Sehili <amine.sehili@gmail.com> |
---|---|
date | Wed, 30 Oct 2024 17:21:30 +0100 |
parents | bd242e80455f |
children | 6cf3ea23fadb |
431:69160c7aefff | 432:81bc2375354f |
---|---|
1 Load audio data | 1 Load audio data |
2 --------------- | 2 --------------- |
3 | 3 |
4 Audio data is loaded with the :func:`load` function which can read from audio | 4 Audio data is loaded using the :func:`load` function, which can read from |
5 files, the microphone or use raw audio data. | 5 audio files, capture from the microphone, or accept raw audio data |
6 (as a ``bytes`` object). | |
6 | 7 |
7 From a file | 8 From a file |
8 =========== | 9 =========== |
9 | 10 |
10 If the first argument of :func:`load` is a string, it should be a path to an | 11 If the first argument of :func:`load` is a string or a `Path`, it should |
11 audio file. | 12 refer to an existing audio file. |
12 | 13 |
13 .. code:: python | 14 .. code:: python |
14 | 15 |
15 import auditok | 16 import auditok |
16 region = auditok.load("audio.ogg") | 17 region = auditok.load("audio.ogg") |
17 | 18 |
18 If input file contains raw (headerless) audio data, passing `audio_format="raw"` | 19 If the input file contains raw (headerless) audio data, specifying audio |
19 and other audio parameters (`sampling_rate`, `sample_width` and `channels`) is | 20 parameters (``sampling_rate``, ``sample_width``, and ``channels``) is required. |
20 mandatory. In the following example we pass audio parameters with their short | 21 Additionally, if the file name does not end with 'raw', you should explicitly |
21 names: | 22 pass `audio_format="raw"` to the function. |
23 | |
24 In the example below, we provide audio parameters using their abbreviated names: | |
22 | 25 |
23 .. code:: python | 26 .. code:: python |
24 | 27 |
25 region = auditok.load("audio.dat", | 28 region = auditok.load("audio.dat", |
26 audio_format="raw", | 29 audio_format="raw", |
27 sr=44100, # alias for `sampling_rate` | 30 sr=44100, # alias for `sampling_rate` |
28 sw=2 # alias for `sample_width` | 31 sw=2, # alias for `sample_width` |
29 ch=1 # alias for `channels` | 32 ch=1 # alias for `channels` |
30 ) | 33 ) |
31 | 34 |
35 Alternatively you can use :class:`AudioRegion` to load audio data: | |
36 | |
37 .. code:: python | |
38 | |
39 from auditok import AudioRegion | |
40 region = AudioRegion.load("audio.dat", | |
41 audio_format="raw", | |
42 sr=44100, # alias for `sampling_rate` | |
43 sw=2, # alias for `sample_width` | |
44 ch=1 # alias for `channels` | |
45 ) | |
46 | |
47 | |
32 From a `bytes` object | 48 From a `bytes` object |
33 ===================== | 49 ===================== |
34 | 50 |
35 If the type of the first argument `bytes`, it's interpreted as raw audio data: | 51 If the first argument is of type `bytes`, it is interpreted as raw audio data: |
36 | 52 |
37 .. code:: python | 53 .. code:: python |
38 | 54 |
39 sr = 16000 | 55 sr = 16000 |
40 sw = 2 | 56 sw = 2 |
41 ch = 1 | 57 ch = 1 |
42 data = b"\0" * sr * sw * ch | 58 data = b"\0" * sr * sw * ch |
43 region = auditok.load(data, sr=sr, sw=sw, ch=ch) | 59 region = auditok.load(data, sr=sr, sw=sw, ch=ch) |
44 print(region) | 60 print(region) |
45 # alternatively you can use | 61 # alternatively you can use |
46 #region = auditok.AudioRegion(data, sr, sw, ch) | 62 region = auditok.AudioRegion(data, sr, sw, ch) |
47 | 63 |
48 output: | 64 output: |
49 | 65 |
50 .. code:: bash | 66 .. code:: bash |
51 | 67 |
52 AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1) | 68 AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1) |
53 | 69 |
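As a rough check on the ``bytes`` example above, the duration of a raw buffer follows from its byte count and the three audio parameters. The helper below is a hypothetical illustration in plain Python (``raw_duration`` is not part of auditok), assuming uncompressed PCM data:

```python
def raw_duration(data: bytes, sr: int, sw: int, ch: int) -> float:
    """Return the duration in seconds of headerless PCM data.

    sr: sampling rate in Hz, sw: sample width in bytes, ch: channel count.
    """
    bytes_per_second = sr * sw * ch
    return len(data) / bytes_per_second


sr, sw, ch = 16000, 2, 1
data = b"\0" * sr * sw * ch  # same buffer as in the example above
duration = raw_duration(data, sr, sw, ch)
print(f"{duration:.3f}")
```

This is why one second of 16 kHz, 16-bit, mono audio takes ``16000 * 2 * 1`` bytes.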
54 From the microphone | 70 From the microphone |
55 =================== | 71 =================== |
56 | 72 |
57 If the first argument is `None`, :func:`load` will try to read data from the | 73 If the first argument is `None`, :func:`load` will attempt to read data from the |
58 microphone. Audio parameters, as well as the `max_read` parameter are mandatory: | 74 microphone. In this case, audio parameters, along with the `max_read` parameter, |
59 | 75 are required. |
60 | 76 |
61 .. code:: python | 77 .. code:: python |
62 | 78 |
63 sr = 16000 | 79 sr = 16000 |
64 sw = 2 | 80 sw = 2 |
74 | 90 |
75 | 91 |
76 Skip part of audio data | 92 Skip part of audio data |
77 ======================= | 93 ======================= |
78 | 94 |
79 If the `skip` parameter is > 0, :func:`load` will skip that amount in seconds | 95 If the ``skip`` parameter is greater than 0, :func:`load` will skip that specified |
80 of leading audio data: | 96 amount of leading audio data, measured in seconds: |
81 | 97 |
82 .. code:: python | 98 .. code:: python |
83 | 99 |
84 import auditok | 100 import auditok |
85 region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds | 101 region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds |
88 | 104 |
89 | 105 |
90 Limit the amount of read audio | 106 Limit the amount of read audio |
91 ============================== | 107 ============================== |
92 | 108 |
93 If the `max_read` parameter is > 0, :func:`load` will read at most that amount | 109 If the ``max_read`` parameter is > 0, :func:`load` will read at most that amount |
94 in seconds of audio data: | 110 in seconds of audio data: |
95 | 111 |
96 .. code:: python | 112 .. code:: python |
97 | 113 |
98 import auditok | 114 import auditok |
99 region = auditok.load("audio.ogg", max_read=5) | 115 region = auditok.load("audio.ogg", max_read=5) |
100 assert region.duration <= 5 | 116 assert region.duration <= 5 |
101 | 117 |
102 This argument is mandatory when reading data from the microphone. | 118 This argument is required when reading data from the microphone. |
103 | 119 |
104 | 120 |
105 Basic split example | 121 Basic split example |
106 ------------------- | 122 ------------------- |
107 | 123 |
108 In the following we'll use the :func:`split` function to tokenize an audio file, | 124 In the following example, we'll use the :func:`split` function to tokenize an |
109 requiring that valid audio events be at least 0.2 second long, at most 4 seconds | 125 audio file. We'll specify that valid audio events must be at least 0.2 seconds |
110 long and contain a maximum of 0.3 second of continuous silence. Limiting the size | 126 long, no longer than 4 seconds, and contain no more than 0.3 seconds of continuous |
111 of detected events to 4 seconds means that an event of, say, 9.5 seconds will | 127 silence. By setting a 4-second limit, an event lasting 9.5 seconds, for instance, |
112 be returned as two 4-second events plus a third 1.5-second event. Moreover, a | 128 will be returned as two 4-second events plus a final 1.5-second event. Additionally, |
113 valid event might contain many *silences* as far as none of them exceeds 0.3 | 129 a valid event may contain multiple silences, as long as none exceed 0.3 seconds. |
114 second. | 130 |
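The effect of the 4-second limit described above can be sketched with simple arithmetic. The function below is a hypothetical plain-Python illustration (``chunk_durations`` is not an auditok function); it only models the ``max_dur`` truncation and ignores ``min_dur``, silence handling, and energy detection:

```python
def chunk_durations(event_dur, max_dur):
    """Cut one long event into consecutive pieces of at most max_dur seconds."""
    pieces = []
    remaining = event_dur
    # Emit full-length pieces while more than max_dur seconds remain
    while remaining > max_dur:
        pieces.append(max_dur)
        remaining -= max_dur
    # Whatever is left forms the final, possibly shorter, piece
    if remaining > 0:
        pieces.append(remaining)
    return pieces


print(chunk_durations(9.5, 4))
```

For a 9.5-second event and a 4-second limit, this yields two 4-second pieces followed by a 1.5-second piece, matching the description in the text.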
115 | 131 :func:`split` returns a generator of :class:`AudioRegion` objects. Each |
116 :func:`split` returns a generator of :class:`AudioRegion`. An :class:`AudioRegion` | 132 :class:`AudioRegion` can be played, saved, repeated (multiplied by an integer), |
117 can be played, saved, repeated (i.e., multiplied by an integer) and concatenated | 133 and concatenated with another region (see examples below). Note that |
118 with another region (see examples below). Notice that :class:`AudioRegion` objects | 134 :class:`AudioRegion` objects returned by :func:`split` include `start` and `end` |
119 returned by :func:`split` have a ``start`` a ``stop`` information stored in | 135 attributes, which mark the beginning and end of the audio event relative to the |
120 their meta data that can be accessed like `object.meta.start`. | 136 input audio stream. |
121 | 137 |
122 .. code:: python | 138 .. code:: python |
123 | 139 |
124 import auditok | 140 import auditok |
125 | 141 |
126 # split returns a generator of AudioRegion objects | 142 # `split` returns a generator of AudioRegion objects |
127 audio_regions = auditok.split( | 143 audio_events = auditok.split( |
128 "audio.wav", | 144 "audio.wav", |
129 min_dur=0.2, # minimum duration of a valid audio event in seconds | 145 min_dur=0.2, # Minimum duration of a valid audio event in seconds |
130 max_dur=4, # maximum duration of an event | 146 max_dur=4, # Maximum duration of an event |
131 max_silence=0.3, # maximum duration of tolerated continuous silence within an event | 147 max_silence=0.3, # Maximum tolerated silence duration within an event |
132 energy_threshold=55 # threshold of detection | 148 energy_threshold=55 # Detection threshold |
133 ) | 149 ) |
134 | 150 |
135 for i, r in enumerate(audio_regions): | 151 for i, r in enumerate(audio_events): |
136 | 152 # AudioRegions returned by `split` have defined 'start' and 'end' attributes |
137 # Regions returned by `split` have 'start' and 'end' metadata fields | 153 print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s") |
138 print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r)) | 154 |
139 | 155 # Play the audio event |
140 # play detection | 156 r.play(progress_bar=True) |
141 # r.play(progress_bar=True) | 157 |
142 | 158 # Save the event with start and end times in the filename |
143 # region's metadata can also be used with the `save` method | 159 filename = r.save("event_{start:.3f}-{end:.3f}.wav") |
144 # (no need to explicitly specify region's object and `format` arguments) | 160 print(f"Event saved as: {filename}") |
145 filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav") | 161 |
146 print("region saved as: {}".format(filename)) | 162 Example output: |
147 | |
148 output example: | |
149 | 163 |
150 .. code:: bash | 164 .. code:: bash |
151 | 165 |
152 Region 0: 0.700s -- 1.400s | 166 Event 0: 0.700s -- 1.400s |
153 region saved as: region_0.700-1.400.wav | 167 Event saved as: event_0.700-1.400.wav |
154 Region 1: 3.800s -- 4.500s | 168 Event 1: 3.800s -- 4.500s |
155 region saved as: region_3.800-4.500.wav | 169 Event saved as: event_3.800-4.500.wav |
156 Region 2: 8.750s -- 9.950s | 170 Event 2: 8.750s -- 9.950s |
157 region saved as: region_8.750-9.950.wav | 171 Event saved as: event_8.750-9.950.wav |
158 Region 3: 11.700s -- 12.400s | 172 Event 3: 11.700s -- 12.400s |
159 region saved as: region_11.700-12.400.wav | 173 Event saved as: event_11.700-12.400.wav |
160 Region 4: 15.050s -- 15.850s | 174 Event 4: 15.050s -- 15.850s |
161 region saved as: region_15.050-15.850.wav | 175 Event saved as: event_15.050-15.850.wav |
162 | |
163 | 176 |
164 Split and plot | 177 Split and plot |
165 -------------- | 178 -------------- |
166 | 179 |
167 Visualize audio signal and detections: | 180 Visualize audio signal and detections: |
174 | 187 |
175 output figure: | 188 output figure: |
176 | 189 |
177 .. image:: figures/example_1.png | 190 .. image:: figures/example_1.png |
178 | 191 |
192 Split an audio stream and re-join (glue) audio events with silence | |
193 ------------------------------------------------------------------ | |
194 | |
195 The following code detects audio events within an audio stream, then inserts | |
196 1 second of silence between them to create an audio stream with pauses: | |
197 | |
198 .. code:: python | |
199 | |
200 # Create a 1-second silent audio region | |
201 # Audio parameters must match the original stream | |
202 from auditok import split, make_silence | |
203 silence = make_silence(duration=1, | |
204 sampling_rate=16000, | |
205 sample_width=2, | |
206 channels=1) | |
207 events = split("audio.wav") | |
208 audio_with_pauses = silence.join(events) | |
209 | |
210 Alternatively, use ``split_and_join_with_silence``: | |
211 | |
212 .. code:: python | |
213 | |
214 from auditok import split_and_join_with_silence | |
215 audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav") | |
216 | |
179 | 217 |
180 Read and split data from the microphone | 218 Read and split data from the microphone |
181 --------------------------------------- | 219 --------------------------------------- |
182 | 220 |
183 If the first argument of :func:`split` is None, audio data is read from the | 221 If the first argument of :func:`split` is ``None``, audio data is read from the |
184 microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_): | 222 microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_): |
185 | 223 |
186 .. code:: python | 224 .. code:: python |
187 | 225 |
188 import auditok | 226 import auditok |
198 region.play(progress_bar=True) # progress bar requires `tqdm` | 236 region.play(progress_bar=True) # progress bar requires `tqdm` |
199 except KeyboardInterrupt: | 237 except KeyboardInterrupt: |
200 pass | 238 pass |
201 | 239 |
202 | 240 |
203 :func:`split` will continue reading audio data until you press ``Ctrl-C``. If | 241 :func:`split` will continue reading audio data until you press ``Ctrl-C``. To read |
204 you want to read a specific amount of audio data, pass the desired number of | 242 a specific amount of audio data, pass the desired number of seconds using the |
205 seconds with the `max_read` argument. | 243 `max_read` argument. |
206 | 244 |
207 | 245 |
208 Access recorded data after split | 246 Access recorded data after split |
209 -------------------------------- | 247 -------------------------------- |
210 | 248 |
211 Using a :class:`Recorder` object you can get hold of acquired audio data: | 249 Using a :class:`Recorder` object, you can access audio data read from a file |
250 or from the microphone. With the following code, press ``Ctrl-C`` to stop recording: | |
212 | 251 |
213 | 252 |
214 .. code:: python | 253 .. code:: python |
215 | 254 |
216 import auditok | 255 import auditok |
219 sw = 2 | 258 sw = 2 |
220 ch = 1 | 259 ch = 1 |
221 eth = 55 # alias for energy_threshold, default value is 50 | 260 eth = 55 # alias for energy_threshold, default value is 50 |
222 | 261 |
223 rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch) | 262 rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch) |
263 events = [] | |
224 | 264 |
225 try: | 265 try: |
226 for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth): | 266 for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth): |
227 print(region) | 267 print(region) |
228 region.play(progress_bar=True) # progress bar requires `tqdm` | 268 region.play(progress_bar=True) |
269 events.append(region) | |
229 except KeyboardInterrupt: | 270 except KeyboardInterrupt: |
230 pass | 271 pass |
231 | 272 |
232 rec.rewind() | 273 rec.rewind() |
233 full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch) | 274 full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch) |
234 # alternatively you can use | 275 # alternatively you can use |
235 full_audio = auditok.AudioRegion(rec.data, sr, sw, ch) | 276 full_audio = auditok.AudioRegion(rec.data, sr, sw, ch) |
277 full_audio.play(progress_bar=True) | |
236 | 278 |
237 | 279 |
238 :class:`Recorder` also accepts a `max_read` argument. | 280 :class:`Recorder` also accepts a `max_read` argument. |
239 | 281 |
240 Working with AudioRegions | 282 Working with AudioRegions |
241 ------------------------- | 283 ------------------------- |
242 | 284 |
243 The following are a couple of interesting operations you can do with | 285 In the following sections, we will review several operations |
244 :class:`AudioRegion` objects. | 286 that can be performed with :class:`AudioRegion` objects. |
245 | |
246 | 287 |
247 Basic region information | 288 Basic region information |
248 ======================== | 289 ======================== |
249 | 290 |
250 .. code:: python | 291 .. code:: python |
255 region.duration # duration in seconds | 296 region.duration # duration in seconds |
256 region.sampling_rate # alias `sr` | 297 region.sampling_rate # alias `sr` |
257 region.sample_width # alias `sw` | 298 region.sample_width # alias `sw` |
258 region.channels # alias `ch` | 299 region.channels # alias `ch` |
259 | 300 |
301 When an audio region is returned by the :func:`split` function, it includes defined | |
302 ``start`` and ``end`` attributes that refer to the beginning and end of the audio | |
303 event relative to the input audio stream. | |
260 | 304 |
261 Concatenate regions | 305 Concatenate regions |
262 =================== | 306 =================== |
263 | 307 |
264 .. code:: python | 308 .. code:: python |
266 import auditok | 310 import auditok |
267 region_1 = auditok.load("audio_1.wav") | 311 region_1 = auditok.load("audio_1.wav") |
268 region_2 = auditok.load("audio_2.wav") | 312 region_2 = auditok.load("audio_2.wav") |
269 region_3 = region_1 + region_2 | 313 region_3 = region_1 + region_2 |
270 | 314 |
271 Particularly useful if you want to join regions returned by :func:`split`: | 315 This is particularly useful when you want to join regions returned by the |
316 :func:`split` function: | |
272 | 317 |
273 .. code:: python | 318 .. code:: python |
274 | 319 |
275 import auditok | 320 import auditok |
276 regions = auditok.load("audio.wav").split() | 321 regions = auditok.load("audio.wav").split() |
288 region_x3 = region * 3 | 333 region_x3 = region * 3 |
289 | 334 |
290 Split one region into N regions of equal size | 335 Split one region into N regions of equal size |
291 ============================================= | 336 ============================================= |
292 | 337 |
293 Divide by a positive integer (this has nothing to do with silence-based | 338 Divide by a positive integer (this is unrelated to silence-based tokenization!): |
294 tokenization): | |
295 | 339 |
296 .. code:: python | 340 .. code:: python |
297 | 341 |
298 import auditok | 342 import auditok |
299 region = auditok.load("audio.wav") | 343 region = auditok.load("audio.wav") |
300 regions = region / 5 | 344 regions = region / 5 |
301 assert sum(regions) == region | 345 assert sum(regions) == region |
302 | 346 |
303 Note that if no perfect division is possible, the last region might be a bit | 347 Note that if an exact split is not possible, the last region may be shorter |
304 shorter than the previous N-1 regions. | 348 than the preceding N-1 regions. |
305 | 349 |
306 Slice a region by samples, seconds or milliseconds | 350 Slice a region by samples, seconds or milliseconds |
307 ================================================== | 351 ================================================== |
308 | 352 |
309 Slicing an :class:`AudioRegion` can be interesting in many situations. You can for | 353 Slicing an :class:`AudioRegion` can be useful in various situations. |
310 example remove a fixed-size portion of audio data from the beginning or from the | 354 For example, you can remove a fixed-length portion of audio data from |
311 end of a region or crop a region by an arbitrary amount as a data augmentation | 355 the beginning or end of a region, or crop a region by an arbitrary amount |
312 strategy. | 356 as a data augmentation strategy. |
313 | 357 |
314 The most accurate way to slice an `AudioRegion` is to use indices that | 358 The most accurate way to slice an `AudioRegion` is by using indices that |
315 directly refer to raw audio samples. In the following example, assuming that the | 359 directly refer to raw audio samples. In the following example, assuming |
316 sampling rate of audio data is 16000, you can extract a 5-second region from | 360 the audio data has a sampling rate of 16000, you can extract a 5-second |
317 main region, starting from the 20th second as follows: | 361 segment from the main region, starting at the 20th second, as follows: |
318 | 362 |
319 .. code:: python | 363 .. code:: python |
320 | 364 |
321 import auditok | 365 import auditok |
322 region = auditok.load("audio.wav") | 366 region = auditok.load("audio.wav") |
323 start = 20 * 16000 | 367 start = 20 * 16000 |
324 stop = 25 * 16000 | 368 stop = 25 * 16000 |
325 five_second_region = region[start:stop] | 369 five_second_region = region[start:stop] |
326 | 370 |
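The sample indices used above are simply times in seconds multiplied by the sampling rate. The sketch below shows that conversion in plain Python, using a list as a stand-in for audio samples; ``seconds_to_samples`` is a hypothetical helper, not an auditok function:

```python
def seconds_to_samples(seconds: float, sampling_rate: int) -> int:
    """Convert a time offset in seconds to a sample index."""
    return int(round(seconds * sampling_rate))


sampling_rate = 16000
# Stand-in for 30 seconds of mono audio: one list item per sample
samples = list(range(30 * sampling_rate))

start = seconds_to_samples(20, sampling_rate)
stop = seconds_to_samples(25, sampling_rate)
five_seconds = samples[start:stop]

# Negative indices count from the end, as with any Python sequence
last_three_seconds = samples[-3 * sampling_rate:]

print(start, stop, len(five_seconds), len(last_three_seconds))
```

Working in sample indices avoids any rounding of time values, which is why it is the most precise way to slice.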
327 This allows you to practically start and stop at any audio sample within the region. | 371 This allows you to start and stop at any audio sample within the region. Similar |
328 Just as with a `list` you can omit one of `start` and `stop`, or both. You can | 372 to a ``list``, you can omit either ``start`` or ``stop``, or both. Negative |
329 also use negative indices: | 373 indices are also supported: |
330 | 374 |
331 .. code:: python | 375 .. code:: python |
332 | 376 |
333 import auditok | 377 import auditok |
334 region = auditok.load("audio.wav") | 378 region = auditok.load("audio.wav") |
335 start = -3 * region.sr # `sr` is an alias of `sampling_rate` | 379 start = -3 * region.sr # `sr` is an alias of `sampling_rate` |
336 three_last_seconds = region[start:] | 380 three_last_seconds = region[start:] |
337 | 381 |
338 While slicing by raw samples is flexible, slicing with temporal indices is more | 382 While slicing by raw samples offers flexibility, using temporal indices is |
339 intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of an | 383 often more intuitive. You can achieve this by accessing the ``millis`` or ``seconds`` |
340 `AudioRegion` (or their shortcut alias `ms` and `sec` or `s`). | 384 *views* of an :class:`AudioRegion` (or using their shortcut aliases ``ms``, ``sec``, or ``s``). |
341 | 385 |
342 With the ``millis`` view: | 386 With the ``millis`` view: |
343 | 387 |
344 .. code:: python | 388 .. code:: python |
345 | 389 |
346 import auditok | 390 import auditok |
347 region = auditok.load("audio.wav") | 391 region = auditok.load("audio.wav") |
348 five_second_region = region.millis[5000:10000] | 392 five_second_region = region.millis[5000:10000] |
393 # or | |
394 five_second_region = region.ms[5000:10000] | |
349 | 395 |
350 or with the ``seconds`` view: | 396 or with the ``seconds`` view: |
351 | 397 |
352 .. code:: python | 398 .. code:: python |
353 | 399 |
354 import auditok | 400 import auditok |
355 region = auditok.load("audio.wav") | 401 region = auditok.load("audio.wav") |
356 five_second_region = region.seconds[5:10] | 402 five_second_region = region.seconds[5:10] |
403 # or | |
404 five_second_region = region.sec[5:10] | |
405 # or | |
406 five_second_region = region.s[5:10] | |
357 | 407 |
358 ``seconds`` indices can also be floats: | 408 ``seconds`` indices can also be floats: |
359 | 409 |
360 .. code:: python | 410 .. code:: python |
361 | 411 |
362 import auditok | 412 import auditok |
363 region = auditok.load("audio.wav") | 413 region = auditok.load("audio.wav") |
364 five_second_region = region.seconds[2.5:7.5] | 414 five_second_region = region.seconds[2.5:7.5] |
365 | 415 |
366 Get arrays of audio samples | 416 Export an ``AudioRegion`` as a ``numpy`` array |
367 =========================== | 417 ============================================== |
368 | 418 |
369 If `numpy` is not installed, the `samples` attributes is a list of audio samples | 419 .. code:: python |
370 arrays (standard `array.array` objects), one per channels. If numpy is installed, | 420 |
371 `samples` is a 2-D `numpy.ndarray` where the fist dimension is the channel | 421 from auditok import load, AudioRegion |
372 and the second is the the sample. | 422 audio = load("audio.wav") # or use `AudioRegion.load("audio.wav")` |
373 | 423 x = audio.numpy() |
374 .. code:: python | 424 assert x.shape[0] == audio.channels |
375 | 425 assert x.shape[1] == len(audio) |
376 import auditok | |
377 region = auditok.load("audio.wav") | |
378 samples = region.samples | |
379 assert len(samples) == region.channels | |
380 | |
381 | |
382 If `numpy` is installed you can use: | |
383 | |
384 .. code:: python | |
385 | |
386 import numpy as np | |
387 region = auditok.load("audio.wav") | |
388 samples = np.asarray(region) | |
389 assert len(samples.shape) == 2 |