comparison doc/examples.rst @ 377:c6308873f239

Improve documentation, add more examples
author Amine Sehili <amine.sehili@gmail.com>
date Wed, 17 Feb 2021 21:18:05 +0100
parents 0106c4799906
children df2a320e10d5
376:d83cba0f8072 377:c6308873f239
1 Basic example 1 Loading audio data
2 ------------- 2 ------------------
3 3
4 .. code:: python 4 From a file
5 5 ===========
6 from auditok import split 6
7 If the first argument of `load` is a string, it should be a path to an audio
8 file.
9
10 .. code:: python
11
12 import auditok
13 region = auditok.load("audio.ogg")
14
15 If the input file contains raw (headerless) audio data, passing `audio_format="raw"`
16 and the other audio parameters (`sampling_rate`, `sample_width` and `channels`) is
17 mandatory. In the following example we pass audio parameters with their short
18 names:
19
20 .. code:: python
21
22 region = auditok.load("audio.dat",
23 audio_format="raw",
24 sr=44100,
25 sw=2,
26 ch=1)
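Why the parameters are mandatory for raw data can be seen with the standard library alone. The sketch below does not use auditok; it wraps one second of silence in a WAV container with the `wave` module and shows that the header carries the parameters, whereas the raw bytes carry none. The variable names are illustrative only.

```python
import io
import wave

sr, sw, ch = 44100, 2, 1      # sampling rate, sample width in bytes, channels
raw = b"\0" * sr * sw * ch    # one second of headerless 16-bit mono silence

# Write the same bytes into an in-memory WAV file: the header stores sr, sw and ch.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setframerate(sr)
    writer.setsampwidth(sw)
    writer.setnchannels(ch)
    writer.writeframes(raw)

# Reading the WAV back recovers the parameters without being told them.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    wav_params = (reader.getframerate(), reader.getsampwidth(), reader.getnchannels())
    wav_frames = reader.readframes(reader.getnframes())

header_size = len(buffer.getvalue()) - len(raw)  # bytes the raw file does not have
```

A raw file contains only the equivalent of `wav_frames`, so the reader has no way to guess the three parameters.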
27
28 From a `bytes` object
29 =====================
30
31 If the first argument is of type `bytes` it's interpreted as raw audio data:
32
33 .. code:: python
34
35 sr = 16000
36 sw = 2
37 ch = 1
38 data = b"\0" * sr * sw * ch
39 region = load(data, sr=sr, sw=sw, ch=ch)
40 print(region)
41
42 output:
43
44 .. code:: bash
45
46 AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
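The duration reported above follows directly from the size of the data and the audio parameters. A minimal sketch of that arithmetic, with a hypothetical helper `raw_duration` that is not part of auditok:

```python
def raw_duration(data, sampling_rate, sample_width, channels):
    """Return the duration in seconds of headerless PCM data."""
    bytes_per_second = sampling_rate * sample_width * channels
    return len(data) / bytes_per_second


sr, sw, ch = 16000, 2, 1
data = b"\0" * sr * sw * ch            # 32000 bytes of silence
duration = raw_duration(data, sr, sw, ch)
print("duration={:.3f}".format(duration))  # duration=1.000
```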
47
48 From the microphone
49 ===================
50
51 If the first argument is `None`, `load` will try to read data from the microphone.
52 Audio parameters, as well as the `max_read` parameter, are mandatory:
53
54
55 .. code:: python
56
57 sr = 16000
58 sw = 2
59 ch = 1
60 five_sec_audio = load(None, sr=sr, sw=sw, ch=ch, max_read=5)
61 print(five_sec_audio)
62
63 output:
64
65 .. code:: bash
66
67 AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)
68
69
70 Skip part of audio data
71 =======================
72
73 If the `skip` parameter is > 0, `load` will skip that leading amount of audio
74 data:
75
76 .. code:: python
77
78 import auditok
79 region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds
80
81 This argument must be 0 when reading from the microphone.
82
83
84 Basic split example
85 -------------------
86
87 .. code:: python
88
89 import auditok
7 90
8 # split returns a generator of AudioRegion objects 91 # split returns a generator of AudioRegion objects
9 audio_regions = split("audio.wav") 92 audio_regions = auditok.split(
10 for region in audio_regions: 93 "audio.wav",
11 region.play(progress_bar=True) 94 min_dur=0.2, # minimum duration of a valid audio event in seconds
12 filename = region.save("/tmp/region_{meta.start:.3f}.wav") 95 max_dur=4, # maximum duration of an event
96 max_silence=0.3, # maximum duration of tolerated continuous silence within an event
97 energy_threshold=55 # threshold of detection
98 )
99
100 for i, r in enumerate(audio_regions):
101
102 # Regions returned by `split` have 'start' and 'end' metadata fields
103 print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))
104
105 # play detection
106 # r.play(progress_bar=True)
107
108 # region's metadata can also be used with the `save` method
109 # (no need to explicitly specify region's object and `format` arguments)
110 filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
13 print("region saved as: {}".format(filename)) 111 print("region saved as: {}".format(filename))
14 112
15 Example using `AudioRegion` 113 output example:
16 --------------------------- 114
17 115 .. code:: bash
18 .. code:: python 116
19 117 Region 0: 0.700s -- 1.400s
20 from auditok import AudioRegion 118 region saved as: region_0.700-1.400.wav
21 region = AudioRegion.load("audio.wav") 119 Region 1: 3.800s -- 4.500s
22 regions = region.split_and_plot() # or just region.splitp() 120 region saved as: region_3.800-4.500.wav
121 Region 2: 8.750s -- 9.950s
122 region saved as: region_8.750-9.950.wav
123 Region 3: 11.700s -- 12.400s
124 region saved as: region_11.700-12.400.wav
125 Region 4: 15.050s -- 15.850s
126 region saved as: region_15.050-15.850.wav
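The file names in this output can be reproduced with plain `str.format`, which supports attribute access inside placeholders. The sketch below uses a stand-in `meta` object built with `SimpleNamespace`; that `save` expands the template in exactly this way is an assumption, not something stated above.

```python
from types import SimpleNamespace

# Stand-in for a region's metadata; only `start` and `end` are modelled here.
meta = SimpleNamespace(start=0.7, end=1.4)

template = "region_{meta.start:.3f}-{meta.end:.3f}.wav"
filename = template.format(meta=meta)
print(filename)  # region_0.700-1.400.wav
```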
127
128
129 Split and plot
130 --------------
131
132 Visualize audio signal and detections:
133
134 .. code:: python
135
136 import auditok
137 region = auditok.load("audio.wav") # returns an AudioRegion object
138 regions = region.split_and_plot(...) # or just region.splitp()
23 139
24 output figure: 140 output figure:
25 141
26 .. image:: figures/example_1.png 142 .. image:: figures/example_1.png
143
144
145 Read and split data from the microphone
146 ---------------------------------------
147
148 If the first argument of `split` is None, audio data is read from the microphone
149 (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):
150
151 .. code:: python
152
153 import auditok
154
155 sr = 16000
156 sw = 2
157 ch = 1
158 eth = 55 # alias for energy_threshold, default value is 50
159
160 try:
161 for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
162 print(region)
163 region.play(progress_bar=True) # progress bar requires `tqdm`
164 except KeyboardInterrupt:
165 pass
166
167
168 `split` will continue reading audio data until you press ``Ctrl-C``. If you want
169 to read a specific amount of audio data, pass the desired number of seconds with
170 the `max_read` argument.
171
172
173 Accessing recorded data after split
174 -----------------------------------
175
176 Using a `Recorder` object you can get hold of acquired audio:
177
178
179 .. code:: python
180
181 import auditok
182
183 sr = 16000
184 sw = 2
185 ch = 1
186 eth = 55 # alias for energy_threshold, default value is 50
187
188 rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)
189
190 try:
191 for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
192 print(region)
193 region.play(progress_bar=True) # progress bar requires `tqdm`
194 except KeyboardInterrupt:
195 pass
196
197 rec.rewind()
198 full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
199
200
201 `Recorder` also accepts a `max_read` argument.
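The general idea of keeping a copy of everything that is read can be sketched without auditok. `RecordingReader` below is a hypothetical class written for illustration; it is not auditok's `Recorder` and makes no claim about how `Recorder` is implemented.

```python
import io


class RecordingReader:
    """Hypothetical wrapper: forwards reads and keeps a copy of every chunk."""

    def __init__(self, source):
        self._source = source
        self._chunks = []

    def read(self, size):
        chunk = self._source.read(size)
        if chunk:
            self._chunks.append(chunk)
        return chunk

    @property
    def data(self):
        """Everything read so far, as one bytes object."""
        return b"".join(self._chunks)


source = io.BytesIO(bytes(range(10)))
reader = RecordingReader(source)
while reader.read(4):   # a consumer pulls data in small chunks
    pass
recorded = reader.data  # the whole stream is still available afterwards
```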
27 202
28 Working with AudioRegions 203 Working with AudioRegions
29 ------------------------- 204 -------------------------
30 205
31 Beyond splitting, there are a couple of interesting operations you can do with 206 Beyond splitting, there are a couple of interesting operations you can do with
32 `AudioRegion` objects. 207 `AudioRegion` objects.
33 208
209
210 Basic region information
211 ========================
212
213 .. code:: python
214
215 import auditok
216 region = auditok.load("audio.wav")
217 len(region) # number of audio samples in the region, one channel considered
218 region.duration # duration in seconds
219 region.sampling_rate # alias `sr`
220 region.sample_width # alias `sw`
221 region.channels # alias `ch`
222
223
34 Concatenate regions 224 Concatenate regions
35 =================== 225 ===================
36 226
37 .. code:: python 227 .. code:: python
38 228
39 from auditok import AudioRegion 229 import auditok
40 region_1 = AudioRegion.load("audio_1.wav") 230 region_1 = auditok.load("audio_1.wav")
41 region_2 = AudioRegion.load("audio_2.wav") 231 region_2 = auditok.load("audio_2.wav")
42 region_3 = region_1 + region_2 232 region_3 = region_1 + region_2
43 233
44 Particularly useful if you want to join regions returned by ``split``: 234 Particularly useful if you want to join regions returned by `split`:
45 235
46 .. code:: python 236 .. code:: python
47 237
48 from auditok import AudioRegion 238 import auditok
49 regions = AudioRegion.load("audio.wav").split() 239 regions = auditok.load("audio.wav").split()
50 gapless_region = sum(regions) 240 gapless_region = sum(regions)
51 241
52 Repeat a region 242 Repeat a region
53 =============== 243 ===============
54 244
55 Multiply by a positive integer: 245 Multiply by a positive integer:
56 246
57 .. code:: python 247 .. code:: python
58 248
59 from auditok import AudioRegion 249 import auditok
60 region = AudioRegion.load("audio.wav") 250 region = auditok.load("audio.wav")
61 region_x3 = region * 3 251 region_x3 = region * 3
62 252
63 Make slices of equal size out of a region 253 Split one region into N regions of equal size
64 ========================================= 254 =============================================
65 255
66 Divide by a positive integer: 256 Divide by a positive integer:
67 257
68 .. code:: python 258 .. code:: python
69 259
70 from auditok import AudioRegion 260 import auditok
71 region = AudioRegion.load("audio.wav") 261 region = auditok.load("audio.wav")
72 regions = region / 5 262 regions = region / 5
73 assert sum(regions) == region 263 assert sum(regions) == region
74 264
75 Make audio slices of arbitrary size 265 Note that if perfect division is not possible, the last region might be a bit shorter
76 =================================== 266 than the previous N-1 regions.
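When the number of samples is not a multiple of N, the parts cannot all have the same length, yet joining them must still give back the original. The sketch below illustrates this with plain lists. `split_evenly` is a hypothetical helper, and where the leftover samples end up (here, in the leading parts) is an assumption that may differ from what auditok does.

```python
def split_evenly(samples, n):
    """Split `samples` into n contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(samples), n)
    parts, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # leading parts absorb the remainder
        parts.append(samples[start:start + size])
        start += size
    return parts


samples = list(range(10))
parts = split_evenly(samples, 3)
sizes = [len(part) for part in parts]
print(sizes)  # [4, 3, 3]

# Concatenating the parts restores the original sequence.
rejoined = sum(parts, [])
```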
77 267
78 Slicing an ``AudioRegion`` can be interesting in many situations. You can for 268 Slice a region by samples, seconds or milliseconds
79 example remove a fixed-size portion of audio data from the beginning or the end 269 ==================================================
80 of a region or crop a region by an arbitrary amount as a data augmentation 270
271 Slicing an `AudioRegion` can be interesting in many situations. You can for
272 example remove a fixed-size portion of audio data from the beginning or from the
273 end of a region or crop a region by an arbitrary amount as a data augmentation
81 strategy, etc. 274 strategy, etc.
82 275
83 The most accurate way to slice an ``AudioRegion`` is to use indices that 276 The most accurate way to slice an `AudioRegion` is to use indices that
84 directly refer to raw audio samples. In the following example, assuming that the 277 directly refer to raw audio samples. In the following example, assuming that the
85 sampling rate of audio data is 16000, you can extract a 5-second region from 278 sampling rate of audio data is 16000, you can extract a 5-second region from
86 main region, starting from the 20th second as follows: 279 main region, starting from the 20th second as follows:
87 280
88 .. code:: python 281 .. code:: python
89 282
90 from auditok import AudioRegion 283 import auditok
91 region = AudioRegion.load("audio.wav") 284 region = auditok.load("audio.wav")
92 start = 20 * 16000 285 start = 20 * 16000
93 stop = 25 * 16000 286 stop = 25 * 16000
94 five_second_region = region[start:stop] 287 five_second_region = region[start:stop]
95 288
96 This allows you to practically start and stop at any sample within the region. 289 This allows you to practically start and stop at any audio sample of the region.
97 Just as with a `list` you can omit one of `start` and `stop`, or both. You can 290 Just as with a `list` you can omit one of `start` and `stop`, or both. You can
98 also use negative indices: 291 also use negative indices:
99 292
100 .. code:: python 293 .. code:: python
101 294
102 from auditok import AudioRegion 295 import auditok
103 region = AudioRegion.load("audio.wav") 296 region = auditok.load("audio.wav")
104 start = -3 * region.sr # `sr` is an alias of `sampling_rate` 297 start = -3 * region.sr # `sr` is an alias of `sampling_rate`
105 three_last_seconds = region[start:] 298 three_last_seconds = region[start:]
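The comparison with `list` can be made concrete. The sketch below applies the same index arithmetic to a plain list of fake samples, using a deliberately tiny sampling rate so the numbers stay readable; it does not involve auditok.

```python
sr = 4                             # tiny sampling rate, for readability only
samples = list(range(10 * sr))     # 10 "seconds" of fake samples: 0..39

middle = samples[2 * sr:7 * sr]    # from second 2 up to second 7
last_three = samples[-3 * sr:]     # negative start: the last 3 seconds
first_two = samples[:2 * sr]       # omitted start: the first 2 seconds

print(len(middle), len(last_three), len(first_two))  # 20 12 8
```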
106 299
107 While slicing by raw samples is accurate, slicing with temporal indices is more 300 While slicing by raw samples is accurate, slicing with temporal indices is more
108 intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of 301 intuitive. You can do so by accessing the `millis` or `seconds` views of an
109 ``AudioRegion`` (or their shortcut alias ``ms`` and ``sec``/``s``). 302 `AudioRegion` (or their shortcut aliases `ms` and `sec` or `s`).
110 303
111 With the ``millis`` view: 304 With the `millis` view:
112 305
113 .. code:: python 306 .. code:: python
114 307
115 from auditok import AudioRegion 308 import auditok
116 region = AudioRegion.load("audio.wav") 309 region = auditok.load("audio.wav")
117 five_second_region = region.millis[5000:10000] 310 five_second_region = region.millis[5000:10000]
118 311
119 or with the ``seconds`` view: 312 or with the `seconds` view:
120 313
121 .. code:: python 314 .. code:: python
122 315
123 from auditok import AudioRegion 316 import auditok
124 region = AudioRegion.load("audio.wav") 317 region = auditok.load("audio.wav")
125 five_second_region = region.seconds[5:10] 318 five_second_region = region.seconds[5:10]
126 319
127 Get an array of audio samples 320 `seconds` indices can also be floats:
128 ============================= 321
129 322 .. code:: python
130 .. code:: python 323
131 324 import auditok
132 from auditok import AudioRegion 325 region = auditok.load("audio.wav")
133 region = AudioRegion.load("audio.wav") 326 five_second_region = region.seconds[2.5:7.5]
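Temporal indices ultimately have to be mapped to sample indices. A minimal sketch of that conversion follows; the helper names are hypothetical, and rounding to the nearest sample is an assumption rather than documented auditok behaviour.

```python
def seconds_to_samples(seconds, sampling_rate):
    """Convert a time in seconds to the nearest sample index."""
    return int(round(seconds * sampling_rate))


def millis_to_samples(millis, sampling_rate):
    """Convert a time in milliseconds to the nearest sample index."""
    return int(round(millis * sampling_rate / 1000))


sr = 16000
start = seconds_to_samples(2.5, sr)
stop = seconds_to_samples(7.5, sr)
print(start, stop)  # 40000 120000
```

With these indices, a slice over `[2.5:7.5]` seconds corresponds to samples `[40000:120000]`, which is 5 seconds of audio at 16000 Hz.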
327
328 Get arrays of audio samples
329 ===========================
330
331 If `numpy` is not installed, the `samples` attribute is a list of audio sample
332 arrays (standard `array.array` objects), one per channel. If `numpy` is installed,
333 `samples` is a 2-D `numpy.ndarray` where the first dimension is the channel
334 and the second is the sample.
335
336 .. code:: python
337
338 import auditok
339 region = auditok.load("audio.wav")
134 samples = region.samples 340 samples = region.samples
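The "one array per channel" layout can be illustrated with the standard `array` module. The sketch below splits interleaved 16-bit stereo data into per-channel arrays; it shows the layout only and is not auditok's code.

```python
from array import array

# Two frames of interleaved 16-bit stereo data: L0, R0, L1, R1
interleaved = array("h", [1, -1, 2, -2])
raw = interleaved.tobytes()
channels = 2

# Decode the raw bytes, then take every `channels`-th sample for each channel.
flat = array("h")
flat.frombytes(raw)
per_channel = [flat[c::channels] for c in range(channels)]

print([list(channel) for channel in per_channel])  # [[1, 2], [-1, -2]]
```

Indexing `per_channel[c][i]` then reads sample `i` of channel `c`, the same (channel, sample) order as described for the 2-D array.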
135 341
136 If ``numpy`` is installed, this will return a ``numpy.ndarray``. If audio data 342
137 is mono the returned array is 1D, otherwise it's 2D. If ``numpy`` is not 343 If `numpy` is installed you can also use:
138 installed this will return a standard ``array.array`` for mono data, and a list
139 of ``array.array`` for multichannel data.
140
141 Alternatively you can use:
142 344
143 .. code:: python 345 .. code:: python
144 346
145 import numpy as np 347 import numpy as np
146 region = AudioRegion.load("audio.wav") 348 region = auditok.load("audio.wav")
147 samples = np.asarray(region) 349 samples = np.asarray(region)
350 assert len(samples.shape) == 2