annotate doc/examples.rst @ 377:c6308873f239

Improve documentation, add more examples
author Amine Sehili <amine.sehili@gmail.com>
date Wed, 17 Feb 2021 21:18:05 +0100
Loading audio data
------------------

From a file
===========

If the first argument of `load` is a string, it should be a path to an audio
file.

.. code:: python

    import auditok
    region = auditok.load("audio.ogg")

If the input file contains raw (headerless) audio data, you must pass
`audio_format="raw"` and the other audio parameters (`sampling_rate`,
`sample_width` and `channels`). In the following example, audio parameters are
passed with their short names:

.. code:: python

    import auditok
    region = auditok.load("audio.dat",
                          audio_format="raw",
                          sr=44100,  # alias of sampling_rate
                          sw=2,      # alias of sample_width
                          ch=1)      # alias of channels

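If you need a headerless file to experiment with, you can make one from an
existing WAV file using Python's standard `wave` module. The sketch below is not
part of auditok, and the helper name `wav_to_raw` and the file names are
hypothetical. It copies the PCM frames without the header and returns the
parameters that the raw file no longer carries:

.. code:: python

    import wave


    def wav_to_raw(wav_path, raw_path):
        """Hypothetical helper: strip the WAV header, keep only the PCM frames.

        Returns (sampling_rate, sample_width, channels), which must then be
        passed explicitly when loading the headerless file.
        """
        with wave.open(wav_path, "rb") as wav:
            sr = wav.getframerate()
            sw = wav.getsampwidth()
            ch = wav.getnchannels()
            frames = wav.readframes(wav.getnframes())
        with open(raw_path, "wb") as raw:
            raw.write(frames)
        return sr, sw, ch


    if __name__ == "__main__":
        # Create a tiny 0.5-second silent mono WAV file to convert.
        with wave.open("audio.wav", "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(16000)
            out.writeframes(b"\0\0" * 8000)
        sr, sw, ch = wav_to_raw("audio.wav", "audio.dat")
        print(sr, sw, ch)  # 16000 2 1

The resulting file can then be loaded with
`auditok.load("audio.dat", audio_format="raw", sr=sr, sw=sw, ch=ch)`.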

From a `bytes` object
=====================

If the first argument is of type `bytes`, it is interpreted as raw audio data:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    data = b"\0" * sr * sw * ch  # one second of silence
    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
    print(region)

output:

.. code:: bash

    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)

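Because the data has no header, its duration depends only on its size:
`len(data) / (sr * sw * ch)` seconds. The standard-library sketch below is not
part of auditok, and the helper `sine_pcm16` is hypothetical. It builds a 440 Hz
tone as 16-bit PCM, which is more audible than silence when you experiment with
`load`. It assumes little-endian sample data, as in WAV files:

.. code:: python

    import math
    import sys
    from array import array


    def sine_pcm16(freq, duration, sr, amplitude=0.5):
        """Hypothetical helper: mono 16-bit little-endian PCM bytes of a sine tone."""
        n_samples = int(duration * sr)
        peak = int(amplitude * 32767)
        samples = array("h", (
            int(peak * math.sin(2 * math.pi * freq * i / sr)) for i in range(n_samples)
        ))
        if sys.byteorder == "big":
            samples.byteswap()  # assumption: little-endian data is expected
        return samples.tobytes()


    sr, sw, ch = 16000, 2, 1
    data = sine_pcm16(440, 1.0, sr)
    print(len(data) / (sr * sw * ch))  # 1.0
    # region = auditok.load(data, sr=sr, sw=sw, ch=ch)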

From the microphone
===================

If the first argument is `None`, `load` will try to read data from the
microphone. Audio parameters, as well as the `max_read` parameter (the amount
of audio to read, in seconds), are mandatory:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
    print(five_sec_audio)

output:

.. code:: bash

    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)

Skip part of audio data
=======================

If the `skip` parameter is > 0, `load` skips that many seconds of leading audio
data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", skip=2)  # skip the first 2 seconds

This argument must be 0 when reading from the microphone.

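For headerless data, skipping a number of seconds means dropping a whole number
of frames (one sample per channel) from the front of the byte string. The
hypothetical helper below shows only that arithmetic and is not auditok's code.
It rounds down to a frame boundary so that samples from different channels are
never misaligned:

.. code:: python

    def skip_leading(data, skip, sr, sw, ch):
        """Hypothetical helper: drop the first `skip` seconds of raw PCM data."""
        if skip < 0:
            raise ValueError("skip must be >= 0")
        frame_size = sw * ch          # bytes per frame
        n_frames = int(skip * sr)     # whole frames to drop (rounded down)
        return data[n_frames * frame_size:]


    sr, sw, ch = 16000, 2, 2
    three_seconds = bytes(3 * sr * sw * ch)
    rest = skip_leading(three_seconds, 2, sr, sw, ch)
    print(len(rest) / (sr * sw * ch))  # 1.0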

Basic split example
-------------------

.. code:: python

    import auditok

    # split returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,         # minimum duration of a valid audio event in seconds
        max_dur=4,           # maximum duration of an event
        max_silence=0.3,     # maximum duration of tolerated continuous silence within an event
        energy_threshold=55  # threshold of detection
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play the detection
        # r.play(progress_bar=True)

        # a region's metadata can also be used in the file name passed to `save`
        # (no need to call `format` with the region object yourself)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav

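The placeholders in the file name passed to `save` follow Python's `str.format`
syntax, with attribute access on a `meta` object. The sketch below reproduces
only that formatting step. It uses a stand-in `types.SimpleNamespace`, which is
an assumption for illustration: auditok's own metadata object may be a different
type.

.. code:: python

    from types import SimpleNamespace

    # Stand-in for a region's metadata (assumption: any object exposing
    # `start` and `end` attributes formats the same way).
    meta = SimpleNamespace(start=0.7, end=1.4)
    filename = "region_{meta.start:.3f}-{meta.end:.3f}.wav".format(meta=meta)
    print(filename)  # region_0.700-1.400.wav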

Split and plot
--------------

Visualize the audio signal and the detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")  # returns an AudioRegion object
    regions = region.split_and_plot(...)  # or just region.splitp()

output figure:

.. image:: figures/example_1.png

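Conceptually, detection works on short analysis windows. A window is valid when
its energy exceeds `energy_threshold`. Runs of valid windows form events, silent
gaps of up to `max_silence` are tolerated inside an event, events shorter than
`min_dur` are dropped, and events are cut at `max_dur`. The pure-Python sketch
below applies this simplified logic to a list of per-window flags. It is a
hypothetical illustration, not auditok's actual tokenizer: the function name,
the fixed window size, and the trimming of trailing silence are assumptions.

.. code:: python

    def detect_events(flags, window, min_dur, max_dur, max_silence):
        """Hypothetical sketch: turn per-window activity flags into events.

        flags: one boolean per analysis window of `window` seconds
        (True = window energy above the threshold).
        Returns (start, end) times in seconds; trailing silence is excluded.
        """
        # Convert durations to numbers of windows (round() absorbs float noise).
        min_len = round(min_dur / window)
        max_len = round(max_dur / window)
        max_gap = round(max_silence / window)

        events = []
        start = last_valid = None

        def keep(first, end):
            # Keep the event [first, end) only if it lasts at least min_dur.
            if end - first >= min_len:
                events.append((first * window, end * window))

        for i, active in enumerate(flags):
            if start is None:
                if active:  # a new event starts at the first valid window
                    start = last_valid = i
                continue
            if active:
                last_valid = i
            elif i - last_valid > max_gap:
                # Silence lasted longer than max_silence: close the event.
                keep(start, last_valid + 1)
                start = None
                continue
            if i + 1 - start >= max_len:
                # The event reached max_dur: cut it, dropping trailing silence.
                keep(start, last_valid + 1)
                start = None
        if start is not None:
            keep(start, last_valid + 1)  # event still open at the end of the data
        return events


    # Made-up flags for 0.1 s windows (1 = above threshold, 0 = below).
    flags = [c == "1" for c in "0011110001100000111111111100"]
    for s, e in detect_events(flags, window=0.1, min_dur=0.3, max_dur=0.5, max_silence=0.2):
        print("{:.1f}s -- {:.1f}s".format(s, e))

With these flags, the sketch prints `0.2s -- 0.6s`, `1.6s -- 2.1s` and
`2.1s -- 2.6s`. The 0.2-second burst at 0.9 s is dropped because it is shorter
than `min_dur`, and the 1-second event starting at 1.6 s is cut in two by
`max_dur`.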

Read and split data from the microphone
---------------------------------------

If the first argument of `split` is `None`, audio data is read from the
microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55  # alias for energy_threshold, default value is 50

    try:
        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True)  # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass

`split` will continue reading audio data until you press ``Ctrl-C``. To read a
specific amount of audio data, pass the desired number of seconds with the
`max_read` argument.


Accessing recorded data after split
-----------------------------------

Using a `Recorder` object, you can get hold of the acquired audio data:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55  # alias for energy_threshold, default value is 50

    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)

    try:
        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True)  # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass

    rec.rewind()  # rewind the recorder before accessing its recorded data
    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)

`Recorder` also accepts a `max_read` argument.

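You can illustrate the idea behind a recorder without auditok: a reader keeps a
copy of every block it hands out, so the complete stream is still available
after the consumer stops, for example on ``Ctrl-C``. The class below is a
hypothetical, simplified stand-in. The names `CachingReader`, `read` and `data`
are chosen for illustration, and auditok's `Recorder` has its own interface and
behavior.

.. code:: python

    import io


    class CachingReader:
        """Hypothetical sketch: wrap a byte stream and remember everything read."""

        def __init__(self, stream, block_size):
            self._stream = stream
            self._block_size = block_size
            self._cache = []  # blocks handed out so far

        def read(self):
            """Return the next block of bytes, or None at end of stream."""
            block = self._stream.read(self._block_size)
            if not block:
                return None
            self._cache.append(block)
            return block

        @property
        def data(self):
            """All the bytes read from the underlying stream so far."""
            return b"".join(self._cache)


    reader = CachingReader(io.BytesIO(bytes(range(10))), block_size=4)
    reader.read()
    reader.read()       # the consumer stops here (think Ctrl-C)
    print(reader.data)  # b'\x00\x01\x02\x03\x04\x05\x06\x07'

Storing blocks in a list and joining them only when `data` is requested avoids
re-copying a growing `bytes` object on every read.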

Working with AudioRegions
-------------------------

Beyond splitting, there are several useful operations you can perform on
`AudioRegion` objects.

Basic region information
========================

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    len(region)           # number of audio samples in the region, one channel considered
    region.duration       # duration in seconds
    region.sampling_rate  # alias `sr`
    region.sample_width   # alias `sw`
    region.channels       # alias `ch`

Concatenate regions
===================

.. code:: python

    import auditok
    region_1 = auditok.load("audio_1.wav")
    region_2 = auditok.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful if you want to join regions returned by `split`:

.. code:: python

    import auditok
    regions = auditok.load("audio.wav").split()
    gapless_region = sum(regions)

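At the data level, concatenation joins the raw bytes of regions that share the
same sampling rate, sample width and number of channels. If these parameters
differ, the joined bytes would not form valid audio. The sketch below
illustrates that rule with a hypothetical `RawRegion` tuple. It is not auditok's
implementation and does not show how auditok reports mismatched parameters.

.. code:: python

    from collections import namedtuple

    # Hypothetical stand-in for a region: raw bytes plus audio parameters.
    RawRegion = namedtuple("RawRegion", "data sr sw ch")


    def concat(*regions):
        """Join regions end to end; all of them must share sr, sw and ch."""
        first = regions[0]
        for r in regions[1:]:
            if (r.sr, r.sw, r.ch) != (first.sr, first.sw, first.ch):
                raise ValueError("cannot concatenate regions with different audio parameters")
        return RawRegion(b"".join(r.data for r in regions), first.sr, first.sw, first.ch)


    def duration(region):
        """Duration in seconds of a headerless region."""
        return len(region.data) / (region.sr * region.sw * region.ch)


    a = RawRegion(bytes(16000 * 2), 16000, 2, 1)  # 1 s of 16-bit mono silence
    b = RawRegion(bytes(8000 * 2), 16000, 2, 1)   # 0.5 s of 16-bit mono silence
    print(duration(concat(a, b)))  # 1.5

In this model, repeating a region three times is simply `concat(a, a, a)`.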

Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    region_x3 = region * 3

Split one region into N regions of equal size
=============================================

Divide by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

Note that if perfect division is not possible, the last region might be a bit
shorter than the previous N-1 regions.

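One way to split a number of samples into N nearly equal parts is to use
`divmod` and give the remainder to the first parts, one sample each. With this
approach, the later parts are at most one sample shorter. The functions below
sketch that strategy on sample counts. It is an assumption for illustration and
not necessarily how auditok's `/` operator distributes the remainder.

.. code:: python

    def split_sizes(n_samples, n_parts):
        """Hypothetical helper: sizes of `n_parts` contiguous chunks covering `n_samples`."""
        if not isinstance(n_parts, int) or n_parts <= 0:
            raise ValueError("n_parts must be a positive int")
        base, rest = divmod(n_samples, n_parts)
        # The first `rest` chunks get one extra sample.
        return [base + 1 if i < rest else base for i in range(n_parts)]


    def chunk_bounds(n_samples, n_parts):
        """(start, stop) sample indices of each chunk, usable for slicing."""
        bounds, start = [], 0
        for size in split_sizes(n_samples, n_parts):
            bounds.append((start, start + size))
            start += size
        return bounds


    print(split_sizes(16003, 5))  # [3201, 3201, 3201, 3200, 3200]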

Slice a region by samples, seconds or milliseconds
==================================================

Slicing an `AudioRegion` is useful in many situations. For example, you can
remove a fixed-size portion of audio data from the beginning or the end of a
region, or crop a region by an arbitrary amount as a data augmentation strategy.

The most accurate way to slice an `AudioRegion` is to use indices that
directly refer to raw audio samples. In the following example, assuming that the
sampling rate of audio data is 16000, you can extract a 5-second region from the
main region, starting at the 20-second mark, as follows:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This allows you to start and stop at any audio sample of the region.
Just as with a `list`, you can omit `start`, `stop`, or both. You can
also use negative indices:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = -3 * region.sr  # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples is accurate, slicing with temporal indices is more
intuitive. You can do so by accessing the `millis` or `seconds` views of an
`AudioRegion` (or their shortcut aliases: `ms` for `millis`, and `sec` or `s`
for `seconds`).

With the `millis` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.millis[5000:10000]

or with the `seconds` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[5:10]

`seconds` indices can also be floats:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[2.5:7.5]

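Temporal indices map to sample indices by multiplying by the sampling rate, with
an extra division by 1000 for milliseconds. The hypothetical helper below
sketches that conversion for a `slice` object. Rounding to the nearest sample is
an assumption, since auditok may truncate instead, and any slice step is
ignored.

.. code:: python

    def time_slice_to_samples(sl, sr, unit=1.0):
        """Hypothetical helper: map a time-based slice to a sample-based slice.

        `unit` is the length of one index step in seconds: 1.0 for seconds,
        0.001 for milliseconds. `None` bounds are kept and negative bounds stay
        negative, so open-ended slices keep their meaning. The step is ignored.
        """
        def convert(t):
            return None if t is None else round(t * unit * sr)

        return slice(convert(sl.start), convert(sl.stop))


    sr = 16000
    print(time_slice_to_samples(slice(2.5, 7.5), sr))                 # slice(40000, 120000, None)
    print(time_slice_to_samples(slice(5000, 10000), sr, unit=0.001))  # slice(80000, 160000, None)
    print(time_slice_to_samples(slice(-3, None), sr))                 # slice(-48000, None, None)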

Get arrays of audio samples
===========================

If `numpy` is not installed, the `samples` attribute is a list of audio sample
arrays (standard `array.array` objects), one per channel. If `numpy` is
installed, `samples` is a 2-D `numpy.ndarray` where the first dimension is the
channel and the second is the sample.

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    samples = region.samples

If `numpy` is installed, you can also use:

.. code:: python

    import auditok
    import numpy as np
    region = auditok.load("audio.wav")
    samples = np.asarray(region)
    assert len(samples.shape) == 2

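In raw PCM data, the samples of different channels are interleaved: for stereo
the order is left, right, left, right, and so on. The sketch below uses only the
standard `array` module to turn such data into one array per channel, which is
the layout described above for the case without `numpy`. The helper name is
hypothetical, only 16-bit samples (`sw=2`) are handled, and little-endian input
is assumed.

.. code:: python

    import sys
    from array import array


    def deinterleave_pcm16(data, ch):
        """Hypothetical helper: split interleaved 16-bit PCM bytes into per-channel arrays."""
        samples = array("h", data)
        if sys.byteorder == "big":
            samples.byteswap()  # assume little-endian input, as in WAV files
        # Channel c holds every ch-th sample, starting at offset c.
        return [samples[c::ch] for c in range(ch)]


    # Two stereo frames: (left=1, right=-1), (left=2, right=-2)
    stereo = array("h", [1, -1, 2, -2])
    if sys.byteorder == "big":
        stereo.byteswap()
    left, right = deinterleave_pcm16(stereo.tobytes(), ch=2)
    print(left.tolist(), right.tolist())  # [1, 2] [-1, -2]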