auditok: doc/examples.rst annotate

annotate doc/examples.rst @ 397:c89c0977db47

Update README.rst

author	Amine SEHILI <amsehili@users.noreply.github.com>
date	Thu, 30 Mar 2023 10:31:43 +0200
parents	bd242e80455f
children	81bc2375354f

Load audio data
---------------

Audio data is loaded with the :func:`load` function, which can read from audio
files, from the microphone, or from raw audio data.

From a file
===========

If the first argument of :func:`load` is a string, it should be a path to an
audio file.

.. code:: python

    import auditok
    region = auditok.load("audio.ogg")

If the input file contains raw (headerless) audio data, passing `audio_format="raw"`
and the other audio parameters (`sampling_rate`, `sample_width` and `channels`) is
mandatory. In the following example we pass audio parameters using their short
names:

.. code:: python

    import auditok
    region = auditok.load("audio.dat",
                          audio_format="raw",
                          sr=44100, # alias for `sampling_rate`
                          sw=2,     # alias for `sample_width`
                          ch=1      # alias for `channels`
                          )

From a `bytes` object
=====================

If the first argument is of type `bytes`, it's interpreted as raw audio data:

.. code:: python

    import auditok
    sr = 16000
    sw = 2
    ch = 1
    data = b"\0" * sr * sw * ch
    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
    print(region)
    # alternatively you can use
    #region = auditok.AudioRegion(data, sr, sw, ch)

output:

.. code:: bash

    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)

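The one-second duration reported above follows from the size of the buffer:
each second of audio takes `sr * sw * ch` bytes. A minimal sketch of that
arithmetic, using only the standard library (it does not call auditok and is
only meant to illustrate the relationship):

.. code:: python

    sr = 16000  # sampling rate, in samples per second
    sw = 2      # sample width, in bytes
    ch = 1      # number of channels
    data = b"\0" * sr * sw * ch

    bytes_per_second = sr * sw * ch
    duration = len(data) / bytes_per_second  # duration in seconds
    print("{:.3f}".format(duration))
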
From the microphone
===================

If the first argument is `None`, :func:`load` will try to read data from the
microphone. Audio parameters, as well as the `max_read` parameter, are mandatory:

.. code:: python

    import auditok
    sr = 16000
    sw = 2
    ch = 1
    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
    print(five_sec_audio)

output:

.. code:: bash

    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)


Skip part of audio data
=======================

If the `skip` parameter is greater than 0, :func:`load` will skip that many
seconds of leading audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds

This argument must be 0 when reading data from the microphone.


Limit the amount of read audio
==============================

If the `max_read` parameter is greater than 0, :func:`load` will read at most
that many seconds of audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", max_read=5)
    assert region.duration <= 5

This argument is mandatory when reading data from the microphone.


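For raw audio, `skip` and `max_read` can be pictured as an offset and a length
expressed in seconds. The sketch below applies that idea to an in-memory buffer
with plain slicing. The helper name `skip_and_limit` is hypothetical, and this
is not how auditok is implemented internally; it only illustrates the
arithmetic:

.. code:: python

    def skip_and_limit(data, sr, sw, ch, skip=0, max_read=None):
        """Drop `skip` seconds from the start of raw audio `data`, then keep
        at most `max_read` seconds (everything if `max_read` is None)."""
        frame_size = sw * ch  # bytes per multi-channel sample
        start = int(skip * sr) * frame_size
        if max_read is None:
            return data[start:]
        stop = start + int(max_read * sr) * frame_size
        return data[start:stop]

    sr, sw, ch = 16000, 2, 1
    ten_seconds = b"\0" * (10 * sr * sw * ch)
    chunk = skip_and_limit(ten_seconds, sr, sw, ch, skip=2, max_read=5)
    print(len(chunk) / (sr * sw * ch))  # duration of the chunk in seconds
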
Basic split example
-------------------

In the following we'll use the :func:`split` function to tokenize an audio file,
requiring that valid audio events be at least 0.2 seconds long, at most 4 seconds
long and contain at most 0.3 seconds of continuous silence. Limiting the size
of detected events to 4 seconds means that an event of, say, 9.5 seconds will
be returned as two 4-second events plus a third 1.5-second event. Moreover, a
valid event may contain many silences as long as none of them exceeds 0.3
seconds.

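The effect of `max_dur` on a long event can be checked with simple arithmetic.
The helper below, `chunk_durations`, is a hypothetical illustration that only
reproduces the durations described above; it ignores `min_dur` and
`max_silence` and does not reflect how :func:`split` is implemented:

.. code:: python

    def chunk_durations(event_dur, max_dur):
        """Cut an event of `event_dur` seconds into pieces of at most
        `max_dur` seconds and return the list of piece durations."""
        chunks = []
        remaining = event_dur
        while remaining > max_dur:
            chunks.append(max_dur)
            remaining -= max_dur
        if remaining > 0:
            chunks.append(remaining)
        return chunks

    print(chunk_durations(9.5, 4))
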
:func:`split` returns a generator of :class:`AudioRegion`. An :class:`AudioRegion`
can be played, saved, repeated (i.e., multiplied by an integer) and concatenated
with another region (see examples below). Notice that :class:`AudioRegion` objects
returned by :func:`split` have ``start`` and ``end`` information stored in
their metadata, which can be accessed like `object.meta.start`.

.. code:: python

    import auditok

    # split returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,        # minimum duration of a valid audio event in seconds
        max_dur=4,          # maximum duration of an event
        max_silence=0.3,    # maximum duration of tolerated continuous silence within an event
        energy_threshold=55 # threshold of detection
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play detection
        # r.play(progress_bar=True)

        # region's metadata can also be used with the `save` method
        # (no need to explicitly specify region's object and `format` arguments)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav


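The file names in the output above come from filling the `{meta.start:.3f}`
and `{meta.end:.3f}` placeholders with each region's metadata. A minimal sketch
of that expansion with Python's standard `str.format`, using a
`SimpleNamespace` as a stand-in for a region's metadata (an assumption made for
illustration; how `save` performs the substitution internally is not shown
here):

.. code:: python

    from types import SimpleNamespace

    # stand-in for the `meta` attribute of a region returned by `split`
    meta = SimpleNamespace(start=0.7, end=1.4)

    template = "region_{meta.start:.3f}-{meta.end:.3f}.wav"
    filename = template.format(meta=meta)
    print(filename)
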
Split and plot
--------------

Visualize the audio signal and detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav") # returns an AudioRegion object
    regions = region.split_and_plot(...) # or just region.splitp()

output figure:

.. image:: figures/example_1.png


Read and split data from the microphone
---------------------------------------

If the first argument of :func:`split` is None, audio data is read from the
microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    try:
        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass


:func:`split` will continue reading audio data until you press ``Ctrl-C``. If
you want to read a specific amount of audio data, pass the desired number of
seconds with the `max_read` argument.


Access recorded data after split
--------------------------------

Using a :class:`Recorder` object you can get hold of acquired audio data:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)

    try:
        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass

    rec.rewind()
    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
    # alternatively you can use
    full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)


:class:`Recorder` also accepts a `max_read` argument.

Working with AudioRegions
-------------------------

The following are a few useful operations you can perform on
:class:`AudioRegion` objects.


Basic region information
========================

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    len(region) # number of audio samples in the region, one channel considered
    region.duration # duration in seconds
    region.sampling_rate # alias `sr`
    region.sample_width # alias `sw`
    region.channels # alias `ch`


Concatenate regions
===================

.. code:: python

    import auditok
    region_1 = auditok.load("audio_1.wav")
    region_2 = auditok.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful if you want to join regions returned by :func:`split`:

.. code:: python

    import auditok
    regions = auditok.load("audio.wav").split()
    gapless_region = sum(regions)

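Python's built-in `sum` starts from the integer `0`, so it only works on
objects that know how to be added to `0`. The sketch below shows the general
pattern with a hypothetical `Clip` class that wraps raw bytes; it is an
illustration of the mechanism and makes no claim about how
:class:`AudioRegion` is written:

.. code:: python

    class Clip:
        """Hypothetical stand-in for an audio region: just wraps raw bytes."""

        def __init__(self, data):
            self.data = data

        def __add__(self, other):
            return Clip(self.data + other.data)

        def __radd__(self, other):
            # `sum` begins with `0 + first_item`, which ends up here
            if other == 0:
                return self
            return NotImplemented

    clips = [Clip(b"ab"), Clip(b"cd"), Clip(b"ef")]
    joined = sum(clips)
    print(joined.data)
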
Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    region_x3 = region * 3

Split one region into N regions of equal size
=============================================

Divide by a positive integer (this has nothing to do with silence-based
tokenization):

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

Note that if no perfect division is possible, the last region might be a bit
shorter than the previous N-1 regions.

Slice a region by samples, seconds or milliseconds
==================================================

Slicing an :class:`AudioRegion` is useful in many situations. For example, you
can remove a fixed-size portion of audio data from the beginning or the end of
a region, or crop a region by an arbitrary amount as a data augmentation
strategy.

The most accurate way to slice an `AudioRegion` is to use indices that
directly refer to raw audio samples. In the following example, assuming that the
sampling rate of audio data is 16000, you can extract a 5-second region from
the main region, starting from the 20th second, as follows:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This lets you start and stop at practically any audio sample within the region.
Just as with a `list`, you can omit one of `start` and `stop`, or both. You can
also use negative indices:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples is flexible, slicing with temporal indices is more
intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of an
`AudioRegion` (or their shortcut aliases `ms` and `sec` or `s`).

With the ``millis`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.millis[5000:10000]

or with the ``seconds`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[5:10]

``seconds`` indices can also be floats:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[2.5:7.5]

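Conceptually, a time-based slice is a sample-based slice whose bounds have been
scaled by the sampling rate. The sketch below shows that conversion on a plain
list standing in for one channel of audio. `slice_seconds` is a hypothetical
helper, and the rounding auditok applies to fractional indices may differ:

.. code:: python

    def slice_seconds(samples, sr, start=None, stop=None):
        """Slice `samples` with bounds given in seconds instead of samples."""
        first = None if start is None else int(start * sr)
        last = None if stop is None else int(stop * sr)
        return samples[first:last]

    sr = 10                     # deliberately tiny sampling rate
    samples = list(range(100))  # 10 seconds of fake audio

    middle = slice_seconds(samples, sr, 2.5, 7.5)  # samples 25 to 74
    tail = slice_seconds(samples, sr, start=-3)    # the last 3 seconds
    print(len(middle) / sr, len(tail) / sr)
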
Get arrays of audio samples
===========================

If `numpy` is not installed, the `samples` attribute is a list of audio sample
arrays (standard `array.array` objects), one per channel. If `numpy` is installed,
`samples` is a 2-D `numpy.ndarray` where the first dimension is the channel
and the second is the sample.

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    samples = region.samples
    assert len(samples) == region.channels


If `numpy` is installed you can use:

.. code:: python

    import auditok
    import numpy as np
    region = auditok.load("audio.wav")
    samples = np.asarray(region)
    assert len(samples.shape) == 2

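
To make the one-array-per-channel layout concrete, here is a sketch that
separates interleaved 16-bit stereo samples into one `array.array` per channel
using only the standard library. It assumes the usual interleaved frame layout
(left, right, left, right, ...) and is not auditok's own code:

.. code:: python

    from array import array

    channels = 2
    # three stereo frames, interleaved as L0, R0, L1, R1, L2, R2
    interleaved = array("h", [1, -1, 2, -2, 3, -3])
    raw = interleaved.tobytes()  # what a raw 16-bit audio buffer would hold

    flat = array("h")
    flat.frombytes(raw)
    # every `channels`-th sample, starting at offset c, belongs to channel c
    samples = [flat[c::channels] for c in range(channels)]

    assert len(samples) == channels
    print(samples[0].tolist(), samples[1].tolist())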
