comparison doc/examples.rst @ 377:c6308873f239
Improve documentation, add more examples
author | Amine Sehili <amine.sehili@gmail.com> |
---|---|
date | Wed, 17 Feb 2021 21:18:05 +0100 |
parents | 0106c4799906 |
children | df2a320e10d5 |
376:d83cba0f8072 | 377:c6308873f239 |
---|---|
1 Basic example | 1 Loading audio data |
2 ------------- | 2 ------------------ |
3 | 3 |
4 .. code:: python | 4 From a file |
5 | 5 =========== |
6 from auditok import split | 6 |
7 If the first argument of `load` is a string, it should be a path to an audio | |
8 file. | |
9 | |
10 .. code:: python | |
11 | |
12 import auditok | |
13 region = auditok.load("audio.ogg") | |
14 | |
15 If the input file contains raw (headerless) audio data, passing `audio_format="raw"` | |
16 and other audio parameters (`sampling_rate`, `sample_width` and `channels`) is | |
17 mandatory. In the following example we pass audio parameters with their short | |
18 names: | |
19 | |
20 .. code:: python | |
21 | |
22 region = auditok.load("audio.dat", | |
23 audio_format="raw", | |
24 sr=44100, | |
25 sw=2, | |
26 ch=1) | |
27 | |
28 From a `bytes` object | |
29 ===================== | |
30 | |
31 If the first argument is of type `bytes`, it is interpreted as raw audio data: | |
32 | |
33 .. code:: python | |
34 | |
35 sr = 16000 | |
36 sw = 2 | |
37 ch = 1 | |
38 data = b"\0" * sr * sw * ch | |
39 region = auditok.load(data, sr=sr, sw=sw, ch=ch) | |
40 print(region) | |
41 | |
42 output: | |
43 | |
44 .. code:: bash | |
45 | |
46 AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1) | |
47 | |
48 From the microphone | |
49 =================== | |
50 | |
51 If the first argument is `None`, `load` will try to read data from the microphone. | |
52 Audio parameters, as well as the `max_read` parameter, are mandatory: | |
53 | |
54 | |
55 .. code:: python | |
56 | |
57 sr = 16000 | |
58 sw = 2 | |
59 ch = 1 | |
60 five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5) | |
61 print(five_sec_audio) | |
62 | |
63 output: | |
64 | |
65 .. code:: bash | |
66 | |
67 AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1) | |
68 | |
69 | |
70 Skip part of audio data | |
71 ======================= | |
72 | |
73 If the `skip` parameter is greater than 0, `load` skips that many seconds of audio | |
74 data at the beginning of the input: | |
75 | |
76 .. code:: python | |
77 | |
78 import auditok | |
79 region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds | |
80 | |
81 This argument must be 0 when reading from the microphone. | |
82 | |
83 | |
84 Basic split example | |
85 ------------------- | |
86 | |
87 .. code:: python | |
88 | |
89 import auditok | |
7 | 90 |
8 # split returns a generator of AudioRegion objects | 91 # split returns a generator of AudioRegion objects |
9 audio_regions = split("audio.wav") | 92 audio_regions = auditok.split( |
10 for region in audio_regions: | 93 "audio.wav", |
11 region.play(progress_bar=True) | 94 min_dur=0.2, # minimum duration of a valid audio event in seconds |
12 filename = region.save("/tmp/region_{meta.start:.3f}.wav") | 95 max_dur=4, # maximum duration of an event |
96 max_silence=0.3, # maximum duration of tolerated continuous silence within an event | |
97 energy_threshold=55 # threshold of detection | |
98 ) | |
99 | |
100 for i, r in enumerate(audio_regions): | |
101 | |
102 # Regions returned by `split` have 'start' and 'end' metadata fields | |
103 print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r)) | |
104 | |
105 # play detection | |
106 # r.play(progress_bar=True) | |
107 | |
108 # region's metadata can also be used with the `save` method | |
109 # (no need to explicitly specify region's object and `format` arguments) | |
110 filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav") | |
13 print("region saved as: {}".format(filename)) | 111 print("region saved as: {}".format(filename)) |
14 | 112 |
15 Example using `AudioRegion` | 113 output example: |
16 --------------------------- | 114 |
17 | 115 .. code:: bash |
18 .. code:: python | 116 |
19 | 117 Region 0: 0.700s -- 1.400s |
20 from auditok import AudioRegion | 118 region saved as: region_0.700-1.400.wav |
21 region = AudioRegion.load("audio.wav") | 119 Region 1: 3.800s -- 4.500s |
22 regions = region.split_and_plot() # or just region.splitp() | 120 region saved as: region_3.800-4.500.wav |
121 Region 2: 8.750s -- 9.950s | |
122 region saved as: region_8.750-9.950.wav | |
123 Region 3: 11.700s -- 12.400s | |
124 region saved as: region_11.700-12.400.wav | |
125 Region 4: 15.050s -- 15.850s | |
126 region saved as: region_15.050-15.850.wav | |
127 | |
128 | |
129 Split and plot | |
130 -------------- | |
131 | |
132 Visualize audio signal and detections: | |
133 | |
134 .. code:: python | |
135 | |
136 import auditok | |
137 region = auditok.load("audio.wav") # returns an AudioRegion object | |
138 regions = region.split_and_plot(...) # or just region.splitp() | |
23 | 139 |
24 output figure: | 140 output figure: |
25 | 141 |
26 .. image:: figures/example_1.png | 142 .. image:: figures/example_1.png |
143 | |
144 | |
145 Read and split data from the microphone | |
146 --------------------------------------- | |
147 | |
148 If the first argument of `split` is None, audio data is read from the microphone | |
149 (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_): | |
150 | |
151 .. code:: python | |
152 | |
153 import auditok | |
154 | |
155 sr = 16000 | |
156 sw = 2 | |
157 ch = 1 | |
158 eth = 55 # alias for energy_threshold, default value is 50 | |
159 | |
160 try: | |
161 for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth): | |
162 print(region) | |
163 region.play(progress_bar=True) # progress bar requires `tqdm` | |
164 except KeyboardInterrupt: | |
165 pass | |
166 | |
167 | |
168 `split` will continue reading audio data until you press ``Ctrl-C``. If you want | |
169 to read a specific amount of audio data, pass the desired number of seconds with | |
170 the `max_read` argument. | |
171 | |
172 | |
173 Accessing recorded data after split | |
174 ----------------------------------- | |
175 | |
176 Using a `Recorder` object you can get hold of acquired audio: | |
177 | |
178 | |
179 .. code:: python | |
180 | |
181 import auditok | |
182 | |
183 sr = 16000 | |
184 sw = 2 | |
185 ch = 1 | |
186 eth = 55 # alias for energy_threshold, default value is 50 | |
187 | |
188 rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch) | |
189 | |
190 try: | |
191 for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth): | |
192 print(region) | |
193 region.play(progress_bar=True) # progress bar requires `tqdm` | |
194 except KeyboardInterrupt: | |
195 pass | |
196 | |
197 rec.rewind() | |
198 full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch) | |
199 | |
200 | |
201 `Recorder` also accepts a `max_read` argument. | |
27 | 202 |
28 Working with AudioRegions | 203 Working with AudioRegions |
29 ------------------------- | 204 ------------------------- |
30 | 205 |
31 Beyond splitting, there are a couple of interesting operations you can do with | 206 Beyond splitting, there are a couple of interesting operations you can do with |
32 `AudioRegion` objects. | 207 `AudioRegion` objects. |
33 | 208 |
209 | |
210 Basic region information | |
211 ======================== | |
212 | |
213 .. code:: python | |
214 | |
215 import auditok | |
216 region = auditok.load("audio.wav") | |
217 len(region) # number of audio samples in the region, one channel considered | |
218 region.duration # duration in seconds | |
219 region.sampling_rate # alias `sr` | |
220 region.sample_width # alias `sw` | |
221 region.channels # alias `ch` | |
222 | |
223 | |
34 Concatenate regions | 224 Concatenate regions |
35 =================== | 225 =================== |
36 | 226 |
37 .. code:: python | 227 .. code:: python |
38 | 228 |
39 from auditok import AudioRegion | 229 import auditok |
40 region_1 = AudioRegion.load("audio_1.wav") | 230 region_1 = auditok.load("audio_1.wav") |
41 region_2 = AudioRegion.load("audio_2.wav") | 231 region_2 = auditok.load("audio_2.wav") |
42 region_3 = region_1 + region_2 | 232 region_3 = region_1 + region_2 |
43 | 233 |
44 Particularly useful if you want to join regions returned by ``split``: | 234 Particularly useful if you want to join regions returned by `split`: |
45 | 235 |
46 .. code:: python | 236 .. code:: python |
47 | 237 |
48 from auditok import AudioRegion | 238 import auditok |
49 regions = AudioRegion.load("audio.wav").split() | 239 regions = auditok.load("audio.wav").split() |
50 gapless_region = sum(regions) | 240 gapless_region = sum(regions) |
51 | 241 |
52 Repeat a region | 242 Repeat a region |
53 =============== | 243 =============== |
54 | 244 |
55 Multiply by a positive integer: | 245 Multiply by a positive integer: |
56 | 246 |
57 .. code:: python | 247 .. code:: python |
58 | 248 |
59 from auditok import AudioRegion | 249 import auditok |
60 region = AudioRegion.load("audio.wav") | 250 region = auditok.load("audio.wav") |
61 region_x3 = region * 3 | 251 region_x3 = region * 3 |
62 | 252 |
63 Make slices of equal size out of a region | 253 Split one region into N regions of equal size |
64 ========================================= | 254 ============================================= |
65 | 255 |
66 Divide by a positive integer: | 256 Divide by a positive integer: |
67 | 257 |
68 .. code:: python | 258 .. code:: python |
69 | 259 |
70 from auditok import AudioRegion | 260 import auditok |
71 region = AudioRegion.load("audio.wav") | 261 region = auditok.load("audio.wav") |
72 regions = region / 5 | 262 regions = region / 5 |
73 assert sum(regions) == region | 263 assert sum(regions) == region |
74 | 264 |
75 Make audio slices of arbitrary size | 265 Note that if perfect division is not possible, the last region might be a bit shorter |
76 =================================== | 266 than the previous N-1 regions. |
77 | 267 |
78 Slicing an ``AudioRegion`` can be interesting in many situations. You can for | 268 Slice a region by samples, seconds or milliseconds |
79 example remove a fixed-size portion of audio data from the beginning or the end | 269 ================================================== |
80 of a region or crop a region by an arbitrary amount as a data augmentation | 270 |
271 Slicing an `AudioRegion` can be interesting in many situations. You can for | |
272 example remove a fixed-size portion of audio data from the beginning or from the | |
273 end of a region or crop a region by an arbitrary amount as a data augmentation | |
81 strategy, etc. | 274 strategy, etc. |
82 | 275 |
83 The most accurate way to slice an ``AudioRegion`` is to use indices that | 276 The most accurate way to slice an `AudioRegion` is to use indices that |
84 directly refer to raw audio samples. In the following example, assuming that the | 277 directly refer to raw audio samples. In the following example, assuming that the |
85 sampling rate of audio data is 16000, you can extract a 5-second region from | 278 sampling rate of audio data is 16000, you can extract a 5-second region from |
86 main region, starting from the 20th second as follows: | 279 main region, starting from the 20th second as follows: |
87 | 280 |
88 .. code:: python | 281 .. code:: python |
89 | 282 |
90 from auditok import AudioRegion | 283 import auditok |
91 region = AudioRegion.load("audio.wav") | 284 region = auditok.load("audio.wav") |
92 start = 20 * 16000 | 285 start = 20 * 16000 |
93 stop = 25 * 16000 | 286 stop = 25 * 16000 |
94 five_second_region = region[start:stop] | 287 five_second_region = region[start:stop] |
95 | 288 |
96 This allows you to practically start and stop at any sample within the region. | 289 This allows you to practically start and stop at any audio sample of the region. |
97 Just as with a `list` you can omit one of `start` and `stop`, or both. You can | 290 Just as with a `list` you can omit one of `start` and `stop`, or both. You can |
98 also use negative indices: | 291 also use negative indices: |
99 | 292 |
100 .. code:: python | 293 .. code:: python |
101 | 294 |
102 from auditok import AudioRegion | 295 import auditok |
103 region = AudioRegion.load("audio.wav") | 296 region = auditok.load("audio.wav") |
104 start = -3 * region.sr # `sr` is an alias of `sampling_rate` | 297 start = -3 * region.sr # `sr` is an alias of `sampling_rate` |
105 three_last_seconds = region[start:] | 298 three_last_seconds = region[start:] |
106 | 299 |
107 While slicing by raw samples is accurate, slicing with temporal indices is more | 300 While slicing by raw samples is accurate, slicing with temporal indices is more |
108 intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of | 301 intuitive. You can do so by accessing the `millis` or `seconds` views of an |
109 ``AudioRegion`` (or their shortcut alias ``ms`` and ``sec``/``s``). | 302 `AudioRegion` (or their shortcut aliases `ms` and `sec` or `s`). |
110 | 303 |
111 With the ``millis`` view: | 304 With the `millis` view: |
112 | 305 |
113 .. code:: python | 306 .. code:: python |
114 | 307 |
115 from auditok import AudioRegion | 308 import auditok |
116 region = AudioRegion.load("audio.wav") | 309 region = auditok.load("audio.wav") |
117 five_second_region = region.millis[5000:10000] | 310 five_second_region = region.millis[5000:10000] |
118 | 311 |
119 or with the ``seconds`` view: | 312 or with the `seconds` view: |
120 | 313 |
121 .. code:: python | 314 .. code:: python |
122 | 315 |
123 from auditok import AudioRegion | 316 import auditok |
124 region = AudioRegion.load("audio.wav") | 317 region = auditok.load("audio.wav") |
125 five_second_region = region.seconds[5:10] | 318 five_second_region = region.seconds[5:10] |
126 | 319 |
127 Get an array of audio samples | 320 `seconds` indices can also be floats: |
128 ============================= | 321 |
129 | 322 .. code:: python |
130 .. code:: python | 323 |
131 | 324 import auditok |
132 from auditok import AudioRegion | 325 region = auditok.load("audio.wav") |
133 region = AudioRegion.load("audio.wav") | 326 five_second_region = region.seconds[2.5:7.5] |
327 | |
328 Get arrays of audio samples | |
329 =========================== | |
330 | |
331 If `numpy` is not installed, the `samples` attribute is a list of audio sample | |
332 arrays (standard `array.array` objects), one per channel. If `numpy` is installed, | |
333 `samples` is a 2-D `numpy.ndarray` where the first dimension is the channel | |
334 and the second is the sample. | |
335 | |
336 .. code:: python | |
337 | |
338 import auditok | |
339 region = auditok.load("audio.wav") | |
134 samples = region.samples | 340 samples = region.samples |
135 | 341 |
136 If ``numpy`` is installed, this will return a ``numpy.ndarray``. If audio data | 342 |
137 is mono the returned array is 1D, otherwise it's 2D. If ``numpy`` is not | 343 If `numpy` is installed you can also use: |
138 installed this will return a standard ``array.array`` for mono data, and a list | |
139 of ``array.array`` for multichannel data. | |
140 | |
141 Alternatively you can use: | |
142 | 344 |
143 .. code:: python | 345 .. code:: python |
144 | 346 |
145 import numpy as np | 347 import numpy as np |
146 region = AudioRegion.load("audio.wav") | 348 region = auditok.load("audio.wav") |
147 samples = np.asarray(region) | 349 samples = np.asarray(region) |
350 assert len(samples.shape) == 2 |
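
The raw-data and slicing examples in this revision rely on simple arithmetic between bytes, samples and seconds. The sketch below spells that arithmetic out in plain Python. It does not use auditok, and the helper names `raw_duration` and `seconds_to_samples` are hypothetical, introduced only for illustration; it assumes headerless PCM data with a fixed sample width.

```python
def raw_duration(data: bytes, sr: int, sw: int, ch: int) -> float:
    """Duration in seconds of headerless PCM audio.

    sr: sampling rate (samples per second, per channel)
    sw: sample width in bytes
    ch: number of channels
    """
    bytes_per_second = sr * sw * ch
    return len(data) / bytes_per_second


def seconds_to_samples(start_s: float, stop_s: float, sr: int) -> tuple:
    """Convert a time range in seconds to a range of sample indices."""
    return round(start_s * sr), round(stop_s * sr)


sr, sw, ch = 16000, 2, 1

# One second of silence, built the same way as in the `bytes` example
data = b"\0" * sr * sw * ch
print(raw_duration(data, sr, sw, ch))    # 1.0

# Sample indices for a 5-second slice starting at the 20th second
print(seconds_to_samples(20, 25, sr))    # (320000, 400000)

# Float bounds, as in the `seconds[2.5:7.5]` example
print(seconds_to_samples(2.5, 7.5, sr))  # (40000, 120000)
```

The second result matches the `start = 20 * 16000` and `stop = 25 * 16000` values used in the sample-based slicing example.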