author Amine SEHILI <amsehili@users.noreply.github.com>
date Thu, 30 Mar 2023 10:31:43 +0200
Load audio data
---------------

Audio data is loaded with the :func:`load` function, which can read from audio
files, from the microphone, or from raw audio data.

From a file
===========

If the first argument of :func:`load` is a string, it should be a path to an
audio file.

.. code:: python

    import auditok
    region = auditok.load("audio.ogg")

If the input file contains raw (headerless) audio data, passing `audio_format="raw"`
and the other audio parameters (`sampling_rate`, `sample_width` and `channels`) is
mandatory. In the following example, we pass the audio parameters using their
short names:

.. code:: python

    import auditok
    region = auditok.load("audio.dat",
                          audio_format="raw",
                          sr=44100, # alias for `sampling_rate`
                          sw=2,     # alias for `sample_width`
                          ch=1      # alias for `channels`
                          )
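
If you are not sure which parameters describe a raw file, and a WAV version of
the same recording is available, Python's standard `wave` module can read them
from the WAV header. This is a sketch that assumes uncompressed PCM WAV input.
The helpers `wav_params` and `wav_to_raw` are hypothetical and are not part of
auditok:

.. code:: python

    import wave


    def wav_params(path):
        """Return (sampling_rate, sample_width, channels) read from a WAV header."""
        with wave.open(path, "rb") as wav:
            return wav.getframerate(), wav.getsampwidth(), wav.getnchannels()


    def wav_to_raw(wav_path, raw_path):
        """Copy the PCM frames of a WAV file into a headerless (raw) file."""
        with wave.open(wav_path, "rb") as wav:
            frames = wav.readframes(wav.getnframes())
        with open(raw_path, "wb") as raw:
            raw.write(frames)

For instance, `sr, sw, ch = wav_params("audio.wav")` gives the values to pass as
`sr`, `sw` and `ch` when loading a `.dat` file produced by `wav_to_raw`.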

From a `bytes` object
=====================

If the first argument is of type `bytes`, it is interpreted as raw audio data:

.. code:: python

    import auditok
    sr = 16000
    sw = 2
    ch = 1
    data = b"\0" * sr * sw * ch
    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
    print(region)
    # alternatively you can use
    #region = auditok.AudioRegion(data, sr, sw, ch)

output:

.. code:: bash

    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
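
The example above loads one second of silence. With `sw` bytes per sample and
`ch` channels, one second of data takes `sr * sw * ch` bytes. To experiment with
non-silent data without a file, you can build the bytes yourself. The following
is a minimal sketch using only the standard library. It assumes 16-bit signed
little-endian mono PCM, and `sine_bytes` is a hypothetical helper:

.. code:: python

    import array
    import math
    import sys


    def sine_bytes(freq, duration, sr=16000, amplitude=0.5):
        """Return `duration` seconds of a mono 16-bit sine tone as raw bytes."""
        n_samples = int(duration * sr)
        peak = int(amplitude * 32767)  # stay within the signed 16-bit range
        samples = array.array(
            "h",
            (int(peak * math.sin(2 * math.pi * freq * i / sr)) for i in range(n_samples)),
        )
        if sys.byteorder == "big":
            samples.byteswap()  # array uses native order; the sketch targets little-endian
        return samples.tobytes()

The result can be passed to :func:`load` as above, e.g.
`auditok.load(sine_bytes(440, 1.0), sr=16000, sw=2, ch=1)`.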

From the microphone
===================

If the first argument is `None`, :func:`load` will try to read data from the
microphone. Audio parameters, as well as the `max_read` parameter, are mandatory:

.. code:: python

    import auditok
    sr = 16000
    sw = 2
    ch = 1
    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
    print(five_sec_audio)

output:

.. code:: bash

    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)


Skip part of audio data
=======================

If the `skip` parameter is > 0, :func:`load` skips that many seconds of leading
audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds

This argument must be 0 when reading data from the microphone.


Limit the amount of read audio
==============================

If the `max_read` parameter is > 0, :func:`load` reads at most that many seconds
of audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", max_read=5)
    assert region.duration <= 5

This argument is mandatory when reading data from the microphone.
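
To illustrate what `skip` and `max_read` mean in terms of audio frames, the
sketch below mimics both parameters for a WAV file using Python's standard
`wave` module. `skip` seconds correspond to `int(skip * sr)` leading frames, and
`max_read` seconds correspond to at most `int(max_read * sr)` frames. The helper
`read_wav_segment` is hypothetical and does not describe how auditok implements
these parameters:

.. code:: python

    import wave


    def read_wav_segment(path, skip=0.0, max_read=None):
        """Return raw PCM bytes of a WAV file, skipping `skip` leading seconds
        and reading at most `max_read` seconds (all remaining data if None)."""
        with wave.open(path, "rb") as wav:
            sr = wav.getframerate()
            n_frames = wav.getnframes()
            start = min(int(skip * sr), n_frames)  # first frame to read
            wav.setpos(start)
            remaining = n_frames - start
            if max_read is not None:
                remaining = min(remaining, int(max_read * sr))
            return wav.readframes(remaining)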


Basic split example
-------------------

In the following example, we use the :func:`split` function to tokenize an audio
file, requiring that valid audio events be at least 0.2 second long, at most 4
seconds long, and contain at most 0.3 second of continuous silence. Limiting the
size of detected events to 4 seconds means that an event of, say, 9.5 seconds
will be returned as two 4-second events plus a third 1.5-second event. Moreover,
a valid event might contain many *silences* as long as none of them exceeds 0.3
second.
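
The constraints above can be illustrated with a simplified, frame-based sketch.
It assumes that each fixed-length analysis frame has already been classified as
active or silent (auditok uses an energy threshold for this). It is not auditok's
actual implementation, `split_frames` is a hypothetical name, and, unlike
auditok's defaults, the sketch drops trailing silence when closing an event:

.. code:: python

    def split_frames(flags, frame_dur, min_dur, max_dur, max_silence):
        """Group per-frame activity flags into (start, end) events in seconds.

        flags[i] is True when analysis frame i is above the detection
        threshold; each frame lasts `frame_dur` seconds.
        """
        min_len = round(min_dur / frame_dur)
        max_len = round(max_dur / frame_dur)
        max_sil = round(max_silence / frame_dur)
        events = []

        def emit(start, end):
            # keep the event only if it is long enough
            if end - start >= min_len:
                events.append((round(start * frame_dur, 3), round(end * frame_dur, 3)))

        start = None  # index of the first frame of the current event
        silence_run = 0  # consecutive silent frames inside the current event
        for i, active in enumerate(flags):
            if start is None:
                if not active:
                    continue
                start, silence_run = i, 0  # an active frame opens a new event
            elif active:
                silence_run = 0
            else:
                silence_run += 1
            if silence_run > max_sil:
                # silence too long: close the event right after its last active frame
                emit(start, i - silence_run + 1)
                start = None
            elif i - start + 1 >= max_len:
                # maximum duration reached: cut here; a new event may start next frame
                emit(start, i + 1)
                start = None
        if start is not None:
            emit(start, len(flags) - silence_run)  # drop trailing silence
        return events

With 0.1-second frames, 95 consecutive active frames (a 9.5-second event) come
out as `[(0.0, 4.0), (4.0, 8.0), (8.0, 9.5)]`, which matches the 4 + 4 + 1.5
second split described above.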

:func:`split` returns a generator of :class:`AudioRegion` objects. An
:class:`AudioRegion` can be played, saved, repeated (i.e., multiplied by an
integer) and concatenated with another region (see examples below). Notice that
:class:`AudioRegion` objects returned by :func:`split` have ``start`` and ``end``
information stored in their metadata, which can be accessed as
`region.meta.start` and `region.meta.end`.

.. code:: python

    import auditok

    # split returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,     # minimum duration of a valid audio event in seconds
        max_dur=4,       # maximum duration of an event
        max_silence=0.3, # maximum duration of tolerated continuous silence within an event
        energy_threshold=55 # threshold of detection
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play detection
        # r.play(progress_bar=True)

        # region's metadata can also be used with the `save` method
        # (no need to pass the region object or a `format` argument explicitly)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav


Split and plot
--------------

Visualize the audio signal and detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav") # returns an AudioRegion object
    regions = region.split_and_plot(...) # or just region.splitp()

output figure:

.. image:: figures/example_1.png


Read and split data from the microphone
---------------------------------------

If the first argument of :func:`split` is `None`, audio data is read from the
microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    try:
        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass


:func:`split` will continue reading audio data until you press ``Ctrl-C``. If
you want to read a specific amount of audio data, pass the desired number of
seconds with the `max_read` argument.


Access recorded data after split
--------------------------------

Using a :class:`Recorder` object, you can access the acquired audio data after
splitting:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)

    try:
        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass

    rec.rewind()
    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
    # alternatively you can use
    full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)


:class:`Recorder` also accepts a `max_read` argument.

Working with AudioRegions
-------------------------

The following are a couple of interesting operations you can do with
:class:`AudioRegion` objects.


Basic region information
========================

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    len(region) # number of audio samples in the region, one channel considered
    region.duration # duration in seconds
    region.sampling_rate # alias `sr`
    region.sample_width # alias `sw`
    region.channels # alias `ch`
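
For raw PCM data, these attributes are related. Each sample frame holds one
sample per channel and takes `sw * ch` bytes, so the number of samples per
channel is `len(data) // (sw * ch)`, and the duration is that count divided by
`sr`. The following is a minimal sketch of this arithmetic. `region_info` is a
hypothetical helper, not an auditok function:

.. code:: python

    def region_info(data, sr, sw, ch):
        """Return (samples per channel, duration in seconds) of raw interleaved PCM data."""
        frame_size = sw * ch  # bytes per sample frame (one sample for each channel)
        if len(data) % frame_size:
            raise ValueError("data length must be a multiple of sw * ch")
        n_samples = len(data) // frame_size
        return n_samples, n_samples / sr

For example, `region_info(b"\0" * 64000, 16000, 2, 2)` returns `(16000, 1.0)`,
which corresponds to one second of 16-bit stereo audio.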


Concatenate regions
===================

.. code:: python

    import auditok
    region_1 = auditok.load("audio_1.wav")
    region_2 = auditok.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful when you want to join regions returned by
:func:`split`:

.. code:: python

    import auditok
    regions = auditok.load("audio.wav").split()
    gapless_region = sum(regions)

Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    region_x3 = region * 3
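
`sum(regions)` works because Python's built-in `sum` starts from `0`. The first
addition is therefore `0 + region`, which falls back to the region's `__radd__`
method. The hypothetical `RawRegion` class below shows one way to support
concatenation, `sum` and repetition on raw bytes. It is a sketch, not auditok's
`AudioRegion`, and rejecting regions with different audio parameters is an
assumption of the sketch:

.. code:: python

    class RawRegion:
        """Hypothetical minimal region holding raw PCM bytes and audio parameters."""

        def __init__(self, data, sr, sw, ch):
            self.data, self.sr, self.sw, self.ch = data, sr, sw, ch

        @property
        def duration(self):
            return len(self.data) / (self.sr * self.sw * self.ch)

        def __add__(self, other):
            if not isinstance(other, RawRegion):
                return NotImplemented
            if (self.sr, self.sw, self.ch) != (other.sr, other.sw, other.ch):
                raise ValueError("can only concatenate regions with the same parameters")
            return RawRegion(self.data + other.data, self.sr, self.sw, self.ch)

        def __radd__(self, other):
            # sum() starts from 0, so `0 + region` must return the region itself
            if other == 0:
                return self
            return NotImplemented

        def __mul__(self, n):
            if not isinstance(n, int):
                return NotImplemented
            if n < 1:
                raise ValueError("a region can only be repeated a positive number of times")
            return RawRegion(self.data * n, self.sr, self.sw, self.ch)

Here `sum([a, b])` evaluates `(0 + a) + b`, and `a * 3` repeats the underlying
bytes three times.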

Split one region into N regions of equal size
=============================================

Divide by a positive integer (this has nothing to do with silence-based
tokenization):

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

Note that if no perfect division is possible, the last region might be a bit
shorter than the previous N-1 regions.

Slice a region by samples, seconds or milliseconds
==================================================

Slicing an :class:`AudioRegion` can be useful in many situations. You can, for
example, remove a fixed-size portion of audio data from the beginning or the
end of a region, or crop a region by an arbitrary amount as a data augmentation
strategy.

The most accurate way to slice an `AudioRegion` is to use indices that
directly refer to raw audio samples. In the following example, assuming that the
sampling rate of audio data is 16000, you can extract a 5-second region from the
main region, starting at 20 seconds, as follows:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This allows you to start and stop at any audio sample within the region.
Just as with a `list`, you can omit `start`, `stop`, or both. You can
also use negative indices:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples is flexible, slicing with temporal indices is more
intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of an
`AudioRegion` (or their shortcut aliases: `ms` for ``millis``, and `sec` or `s`
for ``seconds``).

With the ``millis`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.millis[5000:10000]

or with the ``seconds`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[5:10]

``seconds`` indices can also be floats:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[2.5:7.5]
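
Temporal indices map to sample indices through the sampling rate: `t` seconds
correspond to sample `t * sr`, and `t` milliseconds correspond to sample
`t * sr / 1000`. The hypothetical helper below sketches that conversion for plain
slices. It rounds to the nearest sample, which is an assumption; auditok may
round or truncate differently:

.. code:: python

    def time_slice_to_samples(start, stop, sr, unit=1.0):
        """Convert a temporal slice into a sample-index slice.

        `unit` is the duration of one index step in seconds: 1.0 for seconds,
        0.001 for milliseconds. None bounds stay None (open-ended slice).
        """
        def to_index(t):
            return None if t is None else round(t * unit * sr)

        return slice(to_index(start), to_index(stop))

With `sr=16000`, `region.seconds[2.5:7.5]` covers the same span as
`region[40000:120000]` under this conversion.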

Get arrays of audio samples
===========================

If `numpy` is not installed, the `samples` attribute is a list of audio sample
arrays (standard `array.array` objects), one per channel. If `numpy` is installed,
`samples` is a 2-D `numpy.ndarray` where the first dimension is the channel
and the second is the sample.

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    samples = region.samples
    assert len(samples) == region.channels

amine@377 373
amine@377 374 .. code:: python
amine@377 375
amine@377 376 import auditok
amine@377 377 region = auditok.load("audio.wav")
amine@369 378 samples = region.samples
amine@379 379 assert len(samples) == region.channels
amine@369 380
amine@369 381
amine@387 382 If `numpy` is installed you can use:
amine@369 383
amine@369 384 .. code:: python
amine@369 385
amine@369 386 import numpy as np
amine@377 387 region = auditok.load("audio.wav")
amine@369 388 samples = np.asarray(region)
amine@377 389 assert len(samples.shape) == 2