annotate doc/examples.rst @ 455:7dae98b84cdd tip master

Merge branch 'master' of https://github.com/amsehili/auditok
author www-data <www-data@c4dm-xenserv-virt2.eecs.qmul.ac.uk>
date Tue, 03 Dec 2024 09:18:01 +0000
parents f9d5eb9387d2
children
rev   line source
amine@387 1 Load audio data
amine@387 2 ---------------
amine@377 3
amine@432 4 Audio data is loaded using the :func:`load` function, which can read from
amine@432 5 audio files, capture from the microphone, or accept raw audio data
amine@432 6 (as a ``bytes`` object).
amine@379 7
amine@377 8 From a file
amine@377 9 ===========
amine@377 10
amine@441 11 If the first argument of :func:`load` is a string or a ``Path``, it should
amine@432 12 refer to an existing audio file.
amine@369 13
amine@369 14 .. code:: python
amine@369 15
amine@377 16 import auditok
amine@377 17 region = auditok.load("audio.ogg")
amine@369 18
amine@432 19 If the input file contains raw (headerless) audio data, specifying audio
amine@432 20 parameters (``sampling_rate``, ``sample_width``, and ``channels``) is required.
amine@432 21 Additionally, if the file name does not end with 'raw', you should explicitly
amine@441 22 pass ``audio_format="raw"`` to the function.
amine@432 23
amine@432 24 In the example below, we provide audio parameters using their abbreviated names:
amine@369 25
amine@369 26 .. code:: python
amine@369 27
amine@377 28 region = auditok.load("audio.dat",
amine@377 29 audio_format="raw",
amine@379 30 sr=44100, # alias for `sampling_rate`
amine@432 31 sw=2, # alias for `sample_width`
amine@379 32 ch=1 # alias for `channels`
amine@379 33 )
amine@377 34
amine@432 35 Alternatively, you can use :class:`AudioRegion` to load audio data:
amine@432 36
amine@432 37 .. code:: python
amine@432 38
amine@432 39 from auditok import AudioRegion
amine@432 40 region = AudioRegion.load("audio.dat",
amine@432 41 audio_format="raw",
amine@432 42 sr=44100, # alias for `sampling_rate`
amine@441 43 sw=2, # alias for `sample_width`
amine@432 44 ch=1 # alias for `channels`
amine@432 45 )
amine@432 46
amine@432 47
amine@441 48 From a ``bytes`` object
amine@441 49 =======================
amine@377 50
amine@441 51 If the first argument is of type ``bytes``, it is interpreted as raw audio data:
amine@377 52
amine@377 53 .. code:: python
amine@377 54
amine@377 55 sr = 16000
amine@377 56 sw = 2
amine@377 57 ch = 1
amine@377 58 data = b"\0" * sr * sw * ch
amine@379 59 region = auditok.load(data, sr=sr, sw=sw, ch=ch)
amine@377 60 print(region)
amine@387 61 # alternatively you can use
amine@432 62 region = auditok.AudioRegion(data, sr, sw, ch)
amine@377 63
amine@377 64 output:
amine@377 65
amine@377 66 .. code:: bash
amine@377 67
amine@377 68 AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
amine@377 69
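The one-second duration reported above follows from the buffer size: raw audio occupies ``sampling_rate * sample_width * channels`` bytes per second. The plain-Python check below makes that arithmetic explicit. It does not call auditok, and ``raw_duration`` is a hypothetical helper named only for this illustration.

```python
def raw_duration(data: bytes, sr: int, sw: int, ch: int) -> float:
    """Duration in seconds of headerless PCM data (hypothetical helper)."""
    bytes_per_second = sr * sw * ch
    return len(data) / bytes_per_second


sr, sw, ch = 16000, 2, 1
data = b"\0" * sr * sw * ch  # same buffer as in the example above
duration = raw_duration(data, sr, sw, ch)
print(f"{duration:.3f}")  # 1.000
```
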
amine@377 70 From the microphone
amine@377 71 ===================
amine@377 72
amine@441 73 If the first argument is ``None``, :func:`load` will attempt to read data from the
amine@441 74 microphone. In this case, audio parameters, along with the ``max_read`` parameter,
amine@432 75 are required.
amine@377 76
amine@377 77 .. code:: python
amine@377 78
amine@377 79 sr = 16000
amine@377 80 sw = 2
amine@377 81 ch = 1
amine@377 82 five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
amine@377 83 print(five_sec_audio)
amine@377 84
amine@377 85 output:
amine@377 86
amine@377 87 .. code:: bash
amine@377 88
amine@377 89 AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)
amine@377 90
amine@377 91
amine@377 92 Skip part of audio data
amine@377 93 =======================
amine@377 94
amine@432 95 If the ``skip`` parameter is greater than 0, :func:`load` will skip that specified
amine@432 96 amount of leading audio data, measured in seconds:
amine@377 97
amine@377 98 .. code:: python
amine@377 99
amine@377 100 import auditok
amine@377 101 region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds
amine@377 102
amine@387 103 This argument must be 0 when reading data from the microphone.
amine@387 104
amine@387 105
amine@387 106 Limit the amount of read audio
amine@387 107 ==============================
amine@387 108
amine@432 109 If the ``max_read`` parameter is greater than 0, :func:`load` will read at most
amine@387 110 that many seconds of audio data:
amine@387 111
amine@387 112 .. code:: python
amine@387 113
amine@387 114 import auditok
amine@387 115 region = auditok.load("audio.ogg", max_read=5)
amine@387 116 assert region.duration <= 5
amine@387 117
amine@432 118 This argument is required when reading data from the microphone.
amine@377 119
amine@377 120
amine@377 121 Basic split example
amine@377 122 -------------------
amine@377 123
amine@432 124 In the following example, we'll use the :func:`split` function to tokenize an
amine@432 125 audio file. We'll specify that valid audio events must be at least 0.2 seconds
amine@432 126 long, no longer than 4 seconds, and contain no more than 0.3 seconds of continuous
amine@432 127 silence. With a 4-second limit, an event lasting 9.5 seconds, for instance,
amine@432 128 will be returned as two 4-second events plus a final 1.5-second event. Additionally,
amine@432 129 a valid event may contain multiple silences, as long as none exceed 0.3 seconds.
amine@379 130
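The effect of the maximum duration on a long event can be sketched with simple arithmetic. The snippet below is only an illustration of the 9.5-second case described above, not auditok's tokenization logic: ``chop`` is a hypothetical helper, and it ignores ``min_dur`` and silence handling entirely.

```python
def chop(duration, max_dur):
    """Cut one long event into pieces no longer than max_dur (illustration only)."""
    pieces = []
    remaining = duration
    while remaining > max_dur:
        pieces.append(max_dur)
        remaining -= max_dur
    if remaining > 0:
        pieces.append(remaining)
    return pieces


print(chop(9.5, 4))  # [4, 4, 1.5]
```
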
amine@432 131 :func:`split` returns a generator of :class:`AudioRegion` objects. Each
amine@432 132 :class:`AudioRegion` can be played, saved, repeated (multiplied by an integer),
amine@432 133 and concatenated with another region (see examples below). Note that
amine@441 134 :class:`AudioRegion` objects returned by :func:`split` include ``start`` and ``end``
amine@432 135 attributes, which mark the beginning and end of the audio event relative to the
amine@432 136 input audio stream.
amine@379 137
amine@377 138 .. code:: python
amine@377 139
amine@377 140 import auditok
amine@377 141
amine@432 142 # `split` returns a generator of AudioRegion objects
amine@432 143 audio_events = auditok.split(
amine@377 144 "audio.wav",
amine@432 145 min_dur=0.2, # Minimum duration of a valid audio event in seconds
amine@432 146 max_dur=4, # Maximum duration of an event
amine@432 147 max_silence=0.3, # Maximum tolerated silence duration within an event
amine@432 148 energy_threshold=55 # Detection threshold
amine@377 149 )
amine@377 150
amine@432 151 for i, r in enumerate(audio_events):
amine@432 152     # AudioRegions returned by `split` have defined 'start' and 'end' attributes
amine@432 153     print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")
amine@377 154
amine@432 155     # Play the audio event
amine@432 156     r.play(progress_bar=True)
amine@377 157
amine@432 158     # Save the event with start and end times in the filename
amine@432 159     filename = r.save("event_{start:.3f}-{end:.3f}.wav")
amine@441 160     print(f"event saved as: {filename}")
amine@377 161
amine@432 162 Example output:
amine@377 163
amine@377 164 .. code:: bash
amine@377 165
amine@432 166 Event 0: 0.700s -- 1.400s
amine@441 167 event saved as: event_0.700-1.400.wav
amine@432 168 Event 1: 3.800s -- 4.500s
amine@441 169 event saved as: event_3.800-4.500.wav
amine@432 170 Event 2: 8.750s -- 9.950s
amine@441 171 event saved as: event_8.750-9.950.wav
amine@432 172 Event 3: 11.700s -- 12.400s
amine@441 173 event saved as: event_11.700-12.400.wav
amine@432 174 Event 4: 15.050s -- 15.850s
amine@441 175 event saved as: event_15.050-15.850.wav
amine@377 176
amine@377 177 Split and plot
amine@377 178 --------------
amine@377 179
amine@377 180 Visualize audio signal and detections:
amine@377 181
amine@377 182 .. code:: python
amine@377 183
amine@377 184 import auditok
amine@377 185 region = auditok.load("audio.wav") # returns an AudioRegion object
amine@377 186 regions = region.split_and_plot(...) # or just region.splitp()
amine@369 187
amine@369 188 output figure:
amine@369 189
amine@369 190 .. image:: figures/example_1.png
amine@369 191
amine@432 192 Split an audio stream and re-join (glue) audio events with silence
amine@432 193 ------------------------------------------------------------------
amine@432 194
amine@432 195 The following code detects audio events within an audio stream, then inserts
amine@432 196 1 second of silence between them to create audio with pauses:
amine@432 197
amine@432 198 .. code:: python
amine@432 199
amine@432 200 # Create a 1-second silent audio region
amine@432 201 # Audio parameters must match the original stream
amine@432 202 from auditok import split, make_silence
amine@432 203 silence = make_silence(duration=1,
amine@432 204 sampling_rate=16000,
amine@432 205 sample_width=2,
amine@432 206 channels=1)
amine@432 207 events = split("audio.wav")
amine@432 208 audio_with_pauses = silence.join(events)
amine@432 209
amine@432 210 Alternatively, use ``split_and_join_with_silence``:
amine@432 211
amine@432 212 .. code:: python
amine@432 213
amine@432 214 from auditok import split_and_join_with_silence
amine@432 215 audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")
amine@432 216
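The expression ``silence.join(events)`` reads like Python's ``bytes.join``: the separator goes between consecutive items, not before the first or after the last. The analogy below uses plain byte buffers rather than auditok objects. The two event buffers are made-up placeholders, and the assumption that auditok's ``join`` follows the same placement rule is based only on the description above.

```python
sr, sw, ch = 16000, 2, 1
bytes_per_second = sr * sw * ch

silence = b"\0" * bytes_per_second  # 1 second of digital silence
events = [
    b"\x01\x00" * 8000,  # placeholder for a 0.5 s event
    b"\x02\x00" * 4000,  # placeholder for a 0.25 s event
]

# bytes.join places the separator between items only
audio_with_pauses = silence.join(events)
total_seconds = len(audio_with_pauses) / bytes_per_second
print(total_seconds)  # 1.75
```
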
amine@377 217
amine@441 218 Read audio data from the microphone and perform real-time event detection
amine@441 219 -------------------------------------------------------------------------
amine@377 220
amine@432 221 If the first argument of :func:`split` is ``None``, audio data is read from the
amine@379 222 microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):
amine@377 223
amine@377 224 .. code:: python
amine@377 225
amine@377 226 import auditok
amine@377 227
amine@377 228 sr = 16000
amine@377 229 sw = 2
amine@377 230 ch = 1
amine@377 231 eth = 55 # alias for energy_threshold, default value is 50
amine@377 232
amine@377 233 try:
amine@377 234     for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
amine@377 235         print(region)
amine@377 236         region.play(progress_bar=True) # progress bar requires `tqdm`
amine@377 237 except KeyboardInterrupt:
amine@377 238     pass
amine@377 239
amine@377 240
amine@432 241 :func:`split` will continue reading audio data until you press ``Ctrl-C``. To read
amine@432 242 a specific amount of audio data, pass the desired number of seconds using the
amine@441 243 ``max_read`` argument.
amine@377 244
amine@377 245
amine@387 246 Access recorded data after split
amine@387 247 --------------------------------
amine@377 248
amine@432 249 Using a :class:`Recorder` object, you can access audio data read from a file
amine@432 250 or from the microphone. With the following code, press ``Ctrl-C`` to stop recording:
amine@377 251
amine@377 252
amine@377 253 .. code:: python
amine@377 254
amine@377 255 import auditok
amine@377 256
amine@377 257 sr = 16000
amine@377 258 sw = 2
amine@377 259 ch = 1
amine@377 260 eth = 55 # alias for energy_threshold, default value is 50
amine@377 261
amine@377 262 rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)
amine@432 263 events = []
amine@377 264
amine@377 265 try:
amine@377 266     for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
amine@377 267         print(region)
amine@432 268         region.play(progress_bar=True)
amine@432 269         events.append(region)
amine@377 270 except KeyboardInterrupt:
amine@377 271     pass
amine@377 272
amine@377 273 rec.rewind()
amine@454 274 full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
amine@379 275 # alternatively you can use
amine@379 276 full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)
amine@432 277 full_audio.play(progress_bar=True)
amine@377 278
amine@377 279
amine@441 280 :class:`Recorder` also accepts a ``max_read`` argument.
amine@377 281
amine@369 282 Working with AudioRegions
amine@369 283 -------------------------
amine@369 284
amine@432 285 In the following sections, we will review several operations
amine@441 286 that can be performed with :class:`AudioRegion` objects.
amine@377 287
amine@377 288 Basic region information
amine@377 289 ========================
amine@377 290
amine@377 291 .. code:: python
amine@377 292
amine@377 293 import auditok
amine@377 294 region = auditok.load("audio.wav")
amine@377 295 len(region) # number of audio samples in the region, one channel considered
amine@377 296 region.duration # duration in seconds
amine@377 297 region.sampling_rate # alias `sr`
amine@377 298 region.sample_width # alias `sw`
amine@377 299 region.channels # alias `ch`
amine@377 300
amine@432 301 When an audio region is returned by the :func:`split` function, it includes defined
amine@432 302 ``start`` and ``end`` attributes that refer to the beginning and end of the audio
amine@432 303 event relative to the input audio stream.
amine@377 304
amine@369 305 Concatenate regions
amine@369 306 ===================
amine@369 307
amine@369 308 .. code:: python
amine@369 309
amine@377 310 import auditok
amine@377 311 region_1 = auditok.load("audio_1.wav")
amine@377 312 region_2 = auditok.load("audio_2.wav")
amine@369 313 region_3 = region_1 + region_2
amine@369 314
amine@432 315 This is particularly useful when you want to join regions returned by the
amine@432 316 :func:`split` function:
amine@369 317
amine@369 318 .. code:: python
amine@369 319
amine@377 320 import auditok
amine@377 321 regions = auditok.load("audio.wav").split()
amine@369 322 gapless_region = sum(regions)
amine@369 323
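Python's built-in ``sum`` starts from the integer ``0``, so it only works on non-numeric objects that define how ``0 + obj`` behaves. The toy class below sketches one common way to support this with ``__add__`` and ``__radd__``. ``Clip`` is a hypothetical stand-in that stores raw bytes; it is not auditok's implementation of :class:`AudioRegion`.

```python
class Clip:
    """Toy stand-in for an audio region that only stores raw bytes."""

    def __init__(self, data: bytes):
        self.data = data

    def __add__(self, other):
        # Concatenation: the result holds both buffers back to back
        return Clip(self.data + other.data)

    def __radd__(self, other):
        # sum() begins with 0 + first_item, so 0 must act as a neutral element
        if other == 0:
            return self
        return NotImplemented


clips = [Clip(b"ab"), Clip(b"cd"), Clip(b"ef")]
joined = sum(clips)
print(joined.data)  # b'abcdef'
```
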
amine@369 324 Repeat a region
amine@369 325 ===============
amine@369 326
amine@369 327 Multiply by a positive integer:
amine@369 328
amine@369 329 .. code:: python
amine@369 330
amine@377 331 import auditok
amine@377 332 region = auditok.load("audio.wav")
amine@369 333 region_x3 = region * 3
amine@369 334
amine@377 335 Split one region into N regions of equal size
amine@377 336 =============================================
amine@369 337
amine@432 338 Divide by a positive integer (this is unrelated to silence-based tokenization!):
amine@369 339
amine@369 340 .. code:: python
amine@369 341
amine@377 342 import auditok
amine@377 343 region = auditok.load("audio.wav")
amine@369 344 regions = region / 5
amine@369 345 assert sum(regions) == region
amine@369 346
amine@432 347 Note that if an exact split is not possible, the last region may be shorter
amine@432 348 than the preceding N-1 regions.
amine@369 349
amine@377 350 Slice a region by samples, seconds or milliseconds
amine@377 351 ==================================================
amine@377 352
amine@432 353 Slicing an :class:`AudioRegion` can be useful in various situations.
amine@432 354 For example, you can remove a fixed-length portion of audio data from
amine@432 355 the beginning or end of a region, or crop a region by an arbitrary amount
amine@432 356 as a data augmentation strategy.
amine@369 357
amine@441 358 The most accurate way to slice an :class:`AudioRegion` is by using indices
amine@441 359 that directly refer to raw audio samples. In the following example, assuming
amine@432 360 the audio data has a sampling rate of 16000, you can extract a 5-second
amine@432 361 segment from the main region, starting at the 20th second, as follows:
amine@369 362
amine@369 363 .. code:: python
amine@369 364
amine@377 365 import auditok
amine@377 366 region = auditok.load("audio.wav")
amine@369 367 start = 20 * 16000
amine@369 368 stop = 25 * 16000
amine@369 369 five_second_region = region[start:stop]
amine@369 370
amine@432 371 This allows you to start and stop at any audio sample within the region. Similar
amine@432 372 to a ``list``, you can omit either ``start`` or ``stop``, or both. Negative
amine@432 373 indices are also supported:
amine@369 374
amine@369 375 .. code:: python
amine@369 376
amine@377 377 import auditok
amine@377 378 region = auditok.load("audio.wav")
amine@369 379 start = -3 * region.sr # `sr` is an alias of `sampling_rate`
amine@369 380 three_last_seconds = region[start:]
amine@369 381
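The index arithmetic in the two examples above can be checked on an ordinary Python list, which follows the same slicing rules. This sketch does not use auditok: the list is a stand-in for 30 seconds of mono audio, and ``seconds_to_sample`` is a hypothetical helper whose rounding to the nearest sample is an assumption.

```python
def seconds_to_sample(t: float, sampling_rate: int) -> int:
    """Convert a time offset in seconds to a sample index (hypothetical helper)."""
    return int(round(t * sampling_rate))


sr = 16000
samples = list(range(30 * sr))  # stand-in for 30 s of mono audio

start = seconds_to_sample(20, sr)
stop = seconds_to_sample(25, sr)
segment = samples[start:stop]
print(start, stop, len(segment) / sr)  # 320000 400000 5.0

# A negative start counts from the end, as with any Python sequence
last_three = samples[-3 * sr:]
print(len(last_three) / sr)  # 3.0
```
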
amine@432 382 While slicing by raw samples offers flexibility, using temporal indices is
amine@432 383 often more intuitive. You can achieve this by accessing the ``millis`` or ``seconds``
amine@432 384 *views* of an :class:`AudioRegion` (or using their shortcut aliases ``ms``, ``sec``, or ``s``).
amine@369 385
amine@379 386 With the ``millis`` view:
amine@369 387
amine@369 388 .. code:: python
amine@369 389
amine@377 390 import auditok
amine@377 391 region = auditok.load("audio.wav")
amine@369 392 five_second_region = region.millis[5000:10000]
amine@432 393 # or
amine@432 394 five_second_region = region.ms[5000:10000]
amine@369 395
amine@379 396 or with the ``seconds`` view:
amine@369 397
amine@369 398 .. code:: python
amine@369 399
amine@377 400 import auditok
amine@377 401 region = auditok.load("audio.wav")
amine@369 402 five_second_region = region.seconds[5:10]
amine@432 403 # or
amine@432 404 five_second_region = region.sec[5:10]
amine@432 405 # or
amine@432 406 five_second_region = region.s[5:10]
amine@369 407
amine@379 408 ``seconds`` indices can also be floats:
amine@369 409
amine@369 410 .. code:: python
amine@369 411
amine@377 412 import auditok
amine@377 413 region = auditok.load("audio.wav")
amine@377 414 five_second_region = region.seconds[2.5:7.5]
amine@377 415
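Conceptually, a temporal view only translates time offsets into sample indices before slicing. The toy class below sketches that idea on a plain list. ``SecondsView`` is hypothetical and is not auditok's implementation; rounding to the nearest sample is an assumption made for the illustration.

```python
class SecondsView:
    """Toy view that slices a sample sequence using times in seconds."""

    def __init__(self, samples, sampling_rate):
        self._samples = samples
        self._sr = sampling_rate

    def _to_index(self, t):
        # None stays None so open-ended slices such as [2.5:] keep working
        return None if t is None else int(round(t * self._sr))

    def __getitem__(self, key):
        if not isinstance(key, slice) or key.step is not None:
            raise TypeError("only slices without a step are supported")
        return self._samples[self._to_index(key.start):self._to_index(key.stop)]


sr = 16000
samples = list(range(10 * sr))  # stand-in for 10 s of mono audio
view = SecondsView(samples, sr)

chunk = view[2.5:7.5]
print(len(chunk) / sr)  # 5.0
```

A milliseconds view would differ only in the conversion factor (dividing the offset by 1000 before multiplying by the sampling rate).
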
amine@432 416 Export an ``AudioRegion`` as a ``numpy`` array
amine@432 417 ==============================================
amine@377 418
amine@377 419 .. code:: python
amine@377 420
amine@432 421 from auditok import load, AudioRegion
amine@432 422 audio = load("audio.wav") # or use `AudioRegion.load("audio.wav")`
amine@432 423 x = audio.numpy()
amine@432 424 assert x.shape[0] == audio.channels
amine@432 425 assert x.shape[1] == len(audio)
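
The two assertions above describe an array with one row per channel and one column per sample. The standard-library sketch below shows how interleaved multichannel samples map to that layout. It assumes 16-bit interleaved PCM, the usual layout for raw audio, and does not show how auditok builds the array or which dtype it returns.

```python
from array import array

ch = 2
# Interleaved 16-bit frames: (L0, R0), (L1, R1), (L2, R2)
interleaved = array("h", [10, -10, 20, -20, 30, -30])

# One row per channel: take every `ch`-th sample starting at the channel index
rows = [list(interleaved[c::ch]) for c in range(ch)]
print(rows)  # [[10, 20, 30], [-10, -20, -30]]

n_samples = len(interleaved) // ch  # samples per channel
```
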