auditok: doc/examples.rst annotate

annotate doc/examples.rst @ 377:c6308873f239

Improve documentation, add more examples

author	Amine Sehili <amine.sehili@gmail.com>
date	Wed, 17 Feb 2021 21:18:05 +0100
parents	0106c4799906
children	df2a320e10d5

Loading audio data
------------------

From a file
===========

If the first argument of `load` is a string, it should be a path to an audio
file.

.. code:: python

    import auditok
    region = auditok.load("audio.ogg")

If the input file contains raw (headerless) audio data, passing
`audio_format="raw"` and the other audio parameters (`sampling_rate`,
`sample_width` and `channels`) is mandatory. In the following example we pass
the audio parameters using their short names:

.. code:: python

    import auditok
    region = auditok.load("audio.dat",
                          audio_format="raw",
                          sr=44100,
                          sw=2,
                          ch=1)

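A headerless file carries no description of its own layout, which is why these
parameters have to be supplied by the caller. As a rough illustration in plain
Python (not part of auditok; the helper name `raw_duration` is hypothetical),
the duration of raw PCM data follows directly from its size and the three
parameters:

```python
def raw_duration(n_bytes, sampling_rate, sample_width, channels):
    """Return the duration in seconds of `n_bytes` bytes of raw PCM data."""
    bytes_per_second = sampling_rate * sample_width * channels
    return n_bytes / bytes_per_second


# One second of 16-bit mono audio at 44100 Hz occupies 88200 bytes.
print(raw_duration(88200, sampling_rate=44100, sample_width=2, channels=1))  # 1.0
```

The same bytes interpreted with a different sampling rate, sample width or
channel count would yield a different duration, so the values passed must match
the way the data was recorded.
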
From a `bytes` object
=====================

If the first argument is of type `bytes`, it is interpreted as raw audio data:

.. code:: python

    import auditok
    sr = 16000
    sw = 2
    ch = 1
    data = b"\0" * sr * sw * ch  # one second of silence
    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
    print(region)

output:

.. code:: bash

    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)

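Silence is not very interesting to work with. The sketch below builds one
second of a sine tone as raw bytes using only the standard library. The helper
name `sine_tone` and its default values are hypothetical, and the samples are
assumed to be 16-bit signed integers in the machine's native byte order
(little-endian on most machines):

```python
import array
import math


def sine_tone(frequency, duration, sampling_rate=16000, amplitude=10000):
    """Return `duration` seconds of a sine tone as raw 16-bit mono PCM bytes."""
    n_samples = int(duration * sampling_rate)
    samples = array.array(
        "h",  # signed 16-bit integers
        (
            int(amplitude * math.sin(2 * math.pi * frequency * i / sampling_rate))
            for i in range(n_samples)
        ),
    )
    return samples.tobytes()


data = sine_tone(440, duration=1.0)
print(len(data))  # 32000 bytes: 16000 samples * 2 bytes * 1 channel
```

The resulting `data` could then be passed to `load` with `sr=16000`, `sw=2` and
`ch=1`, as in the example above.
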
From the microphone
===================

If the first argument is `None`, `load` will try to read data from the
microphone. The audio parameters, as well as the `max_read` parameter, are
mandatory:

.. code:: python

    import auditok
    sr = 16000
    sw = 2
    ch = 1
    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
    print(five_sec_audio)

output:

.. code:: bash

    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)


Skip part of audio data
=======================

If the `skip` parameter is greater than 0, `load` will skip that many seconds
of leading audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds

This argument must be 0 when reading from the microphone.


Basic split example
-------------------

.. code:: python

    import auditok

    # split returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,     # minimum duration of a valid audio event in seconds
        max_dur=4,       # maximum duration of an event
        max_silence=0.3, # maximum duration of tolerated continuous silence within an event
        energy_threshold=55 # threshold of detection
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play detection
        # r.play(progress_bar=True)

        # region's metadata can also be used with the `save` method
        # (no need to explicitly specify region's object and `format` arguments)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav


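To give a feel for what `energy_threshold` is compared against, here is a
simplified sketch in plain Python. It assumes the energy of an analysis window
is measured as 20 * log10 of its RMS amplitude. That formula is an assumption
made for illustration and is not stated in this document, so check the auditok
source for the exact computation. The helper names `log_energy` and
`is_valid_window` are hypothetical:

```python
import math


def log_energy(samples):
    """Return 20 * log10(RMS amplitude) of a sequence of integer samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    rms = max(math.sqrt(mean_square), 1e-20)  # avoid log10(0) on pure silence
    return 20 * math.log10(rms)


def is_valid_window(samples, energy_threshold=55):
    """Tell whether a window is loud enough to count as part of an event."""
    return log_energy(samples) >= energy_threshold


quiet = [10] * 160    # RMS amplitude of 10
loud = [1000] * 160   # RMS amplitude of 1000
print("{:.1f} {:.1f}".format(log_energy(quiet), log_energy(loud)))  # 20.0 60.0
print(is_valid_window(quiet), is_valid_window(loud))  # False True
```

Under this assumption, raising the threshold makes detection stricter: only
louder windows are kept as parts of an audio event.
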
Split and plot
--------------

Visualize audio signal and detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav") # returns an AudioRegion object
    regions = region.split_and_plot(...) # or just region.splitp()

output figure:

.. image:: figures/example_1.png


Read and split data from the microphone
---------------------------------------

If the first argument of `split` is `None`, audio data is read from the
microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    try:
        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass


`split` will continue reading audio data until you press ``Ctrl-C``. If you want
to read a specific amount of audio data, pass the desired number of seconds with
the `max_read` argument.


Accessing recorded data after split
-----------------------------------

Using a `Recorder` object you can get hold of the acquired audio:


.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)

    try:
        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass

    rec.rewind()
    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)


`Recorder` also accepts a `max_read` argument.

Working with AudioRegions
-------------------------

Beyond splitting, there are a couple of interesting operations you can do with
`AudioRegion` objects.


Basic region information
========================

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    len(region) # number of audio samples in the region, one channel considered
    region.duration # duration in seconds
    region.sampling_rate # alias `sr`
    region.sample_width # alias `sw`
    region.channels # alias `ch`


Concatenate regions
===================

.. code:: python

    import auditok
    region_1 = auditok.load("audio_1.wav")
    region_2 = auditok.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful if you want to join regions returned by `split`:

.. code:: python

    import auditok
    regions = auditok.load("audio.wav").split()
    gapless_region = sum(regions)

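The built-in `sum` starts from the integer `0`, so a type has to accept `0` on
the left-hand side of `+` for the call above to work. The toy class below
sketches one common way to support that. `Clip` is a hypothetical stand-in
written for this illustration. It is not auditok's code and assumes nothing
about how `AudioRegion` is actually implemented:

```python
class Clip:
    """Toy stand-in for an audio region: just a list of samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __add__(self, other):
        if not isinstance(other, Clip):
            return NotImplemented
        return Clip(self.samples + other.samples)

    def __radd__(self, other):
        # `sum` begins with 0 + first_item, so treat 0 as "nothing to prepend".
        if isinstance(other, int) and other == 0:
            return self
        return NotImplemented


clips = [Clip([1, 2]), Clip([3]), Clip([4, 5])]
joined = sum(clips)
print(joined.samples)  # [1, 2, 3, 4, 5]
```
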
Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    region_x3 = region * 3

Split one region into N regions of equal size
=============================================

Divide by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

Note that if perfect division is not possible, the last region might be a bit
shorter than the previous N-1 regions.

Slice a region by samples, seconds or milliseconds
==================================================

Slicing an `AudioRegion` can be useful in many situations. You can, for
example, remove a fixed-size portion of audio data from the beginning or from
the end of a region, or crop a region by an arbitrary amount as a data
augmentation strategy.

The most accurate way to slice an `AudioRegion` is to use indices that
directly refer to raw audio samples. In the following example, assuming that the
sampling rate of the audio data is 16000, you can extract a 5-second region from
the main region, starting at the 20th second, as follows:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This allows you to start and stop at practically any audio sample of the region.
Just as with a `list`, you can omit one of `start` and `stop`, or both. You can
also use negative indices:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples is accurate, slicing with temporal indices is more
intuitive. You can do so by accessing the `millis` or `seconds` views of an
`AudioRegion` (or their shortcut aliases: `ms` for `millis`, and `sec` or `s`
for `seconds`).

With the `millis` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.millis[5000:10000]

or with the `seconds` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[5:10]

`seconds` indices can also be floats:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[2.5:7.5]

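Conceptually, a temporal index is just a sample index scaled by the sampling
rate. The sketch below shows that arithmetic in plain Python. The helper names
are hypothetical, and truncating with `int` is an assumption made for this
illustration, so the exact rounding used by the `millis` and `seconds` views may
differ:

```python
def seconds_to_samples(seconds, sampling_rate):
    """Convert a time in seconds to a sample index (truncating)."""
    return int(seconds * sampling_rate)


def millis_to_samples(millis, sampling_rate):
    """Convert a time in milliseconds to a sample index (truncating)."""
    return int(millis * sampling_rate / 1000)


sr = 16000
start = seconds_to_samples(2.5, sr)
stop = seconds_to_samples(7.5, sr)
print(start, stop)  # 40000 120000
print(millis_to_samples(5000, sr))  # 80000
```

With these values, `region[start:stop]` would cover the same five seconds as
the float-indexed `seconds` example above, up to rounding.
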
Get arrays of audio samples
===========================

If `numpy` is not installed, the `samples` attribute is a list of audio sample
arrays (standard `array.array` objects), one per channel. If `numpy` is
installed, `samples` is a 2-D `numpy.ndarray` where the first dimension is the
channel and the second is the sample.

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    samples = region.samples


If `numpy` is installed, you can also use:

.. code:: python

    import auditok
    import numpy as np
    region = auditok.load("audio.wav")
    samples = np.asarray(region)
    assert len(samples.shape) == 2

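
To illustrate the "one array per channel" layout without `numpy`, the sketch
below de-interleaves raw bytes by hand using the standard library. It assumes
interleaved 16-bit signed samples in native byte order. The helper name
`split_channels` is hypothetical, and auditok's own conversion may differ:

```python
import array


def split_channels(data, channels):
    """Split interleaved 16-bit PCM bytes into one `array.array` per channel."""
    interleaved = array.array("h")  # signed 16-bit integers
    interleaved.frombytes(data)
    # Sample i of channel c sits at position i * channels + c.
    return [interleaved[c::channels] for c in range(channels)]


# Two stereo frames: (left=1, right=-1) then (left=2, right=-2).
stereo = array.array("h", [1, -1, 2, -2]).tobytes()
left, right = split_channels(stereo, channels=2)
print(list(left), list(right))  # [1, 2] [-1, -2]
```
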
