auditok: doc/examples.rst annotate

annotate doc/examples.rst @ 379:df2a320e10d5

Add command-line guide Update documentation

author	Amine Sehili <amine.sehili@gmail.com>
date	Wed, 17 Feb 2021 19:22:18 +0100
parents	c6308873f239
children	edcc102fb33f

Loading audio data
------------------

Audio data is loaded with the :func:`load` function, which can read from audio
files, from the microphone, or from raw audio data.

From a file
===========

If the first argument of :func:`load` is a string, it should be a path to an audio
file.

.. code:: python

    import auditok
    region = auditok.load("audio.ogg")

If the input file contains raw (headerless) audio data, passing `audio_format="raw"`
and the other audio parameters (`sampling_rate`, `sample_width` and `channels`) is
mandatory. In the following example we pass the audio parameters using their short
names:

.. code:: python

    import auditok
    region = auditok.load("audio.dat",
                          audio_format="raw",
                          sr=44100, # alias for `sampling_rate`
                          sw=2,     # alias for `sample_width`
                          ch=1      # alias for `channels`
                          )

From a `bytes` object
=====================

If the first argument is of type `bytes`, it is interpreted as raw audio data:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    data = b"\0" * sr * sw * ch # one second of silence
    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
    print(region)

output:

.. code:: bash

    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
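
The one-second duration reported above follows directly from the size of the
buffer. The sketch below is plain Python and does not use auditok; the helper
name ``raw_audio_duration`` is hypothetical and only illustrates the arithmetic
for headerless PCM data:

.. code:: python

    def raw_audio_duration(n_bytes, sampling_rate, sample_width, channels):
        """Duration in seconds of `n_bytes` bytes of headerless PCM data."""
        # one second of audio takes sampling_rate * sample_width * channels bytes
        bytes_per_second = sampling_rate * sample_width * channels
        return n_bytes / bytes_per_second

    sr, sw, ch = 16000, 2, 1
    data = b"\0" * sr * sw * ch # same buffer as in the example above
    duration = raw_audio_duration(len(data), sr, sw, ch)
    print("{:.3f}".format(duration)) # 1.000

This is also why the audio parameters are mandatory for raw data: without them
a byte count cannot be turned into a duration.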

From the microphone
===================

If the first argument is `None`, :func:`load` will try to read data from the
microphone. The audio parameters, as well as the `max_read` parameter, are mandatory:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
    print(five_sec_audio)

output:

.. code:: bash

    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)


Skip part of audio data
=======================

If the `skip` parameter is greater than 0, :func:`load` will skip that many seconds
of leading audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds

This argument must be 0 when reading from the microphone.


Basic split example
-------------------

In the following we'll use the :func:`split` function to tokenize an audio file,
requiring that valid audio events be at least 0.2 seconds long, at most 4 seconds
long, and contain at most 0.3 seconds of continuous silence. Limiting the size
of detected events to 4 seconds means that an event of, say, 9.5 seconds will
be returned as two 4-second events plus a third 1.5-second event. Moreover, a
valid event may contain several silences as long as none of them exceeds 0.3
seconds.
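
The arithmetic behind the 9.5-second example can be sketched in plain Python.
This is only an illustration of the ``max_dur`` limit, not auditok's detection
code: the helper ``truncate_event`` is hypothetical and ignores ``min_dur``,
silence handling and energy thresholds.

.. code:: python

    def truncate_event(duration, max_dur):
        """Cut one continuous event into pieces no longer than `max_dur`."""
        pieces = []
        remaining = duration
        while remaining > max_dur:
            pieces.append(max_dur)
            remaining -= max_dur
        if remaining > 0:
            # whatever is left over forms the last, shorter piece
            pieces.append(remaining)
        return pieces

    print(truncate_event(9.5, 4)) # [4, 4, 1.5]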

:func:`split` returns a generator of :class:`AudioRegion` objects. An :class:`AudioRegion`
can be played, saved, repeated (i.e., multiplied by an integer) and concatenated
with another region (see examples below). Note that :class:`AudioRegion` objects
returned by :func:`split` have ``start`` and ``end`` information stored in
their metadata, which can be accessed like `object.meta.start`.

.. code:: python

    import auditok

    # split returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,     # minimum duration of a valid audio event in seconds
        max_dur=4,       # maximum duration of an event
        max_silence=0.3, # maximum duration of tolerated continuous silence within an event
        energy_threshold=55 # threshold of detection
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play detection
        # r.play(progress_bar=True)

        # region's metadata can also be used with the `save` method
        # (no need to explicitly specify region's object and `format` arguments)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav
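
The file name template passed to ``save`` uses standard Python format-string
syntax with attribute access. The sketch below reproduces the first file name
of the output above with plain ``str.format``; the ``SimpleNamespace`` object
is a stand-in for a region's metadata, and how ``save`` expands the template
internally is an assumption here:

.. code:: python

    from types import SimpleNamespace

    # stand-in for the `meta` attribute of a region returned by `split`
    meta = SimpleNamespace(start=0.7, end=1.4)

    template = "region_{meta.start:.3f}-{meta.end:.3f}.wav"
    filename = template.format(meta=meta)
    print(filename) # region_0.700-1.400.wav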


Split and plot
--------------

Visualize audio signal and detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav") # returns an AudioRegion object
    regions = region.split_and_plot(...) # or just region.splitp()

output figure:

.. image:: figures/example_1.png


Read and split data from the microphone
---------------------------------------

If the first argument of :func:`split` is `None`, audio data is read from the
microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    try:
        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass


:func:`split` will continue reading audio data until you press ``Ctrl-C``. If
you want to read a specific amount of audio data, pass the desired number of
seconds with the `max_read` argument.


Accessing recorded data after split
-----------------------------------

Using a :class:`Recorder` object you can get hold of the acquired audio data:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)

    try:
        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass

    rec.rewind()
    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
    # alternatively you can use
    full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)


:class:`Recorder` also accepts a `max_read` argument.

Working with AudioRegions
-------------------------

The following are a few useful operations you can perform on
:class:`AudioRegion` objects.


Basic region information
========================

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    len(region) # number of audio samples in the region, one channel considered
    region.duration # duration in seconds
    region.sampling_rate # alias `sr`
    region.sample_width # alias `sw`
    region.channels # alias `ch`


Concatenate regions
===================

.. code:: python

    import auditok
    region_1 = auditok.load("audio_1.wav")
    region_2 = auditok.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful if you want to join regions returned by :func:`split`:

.. code:: python

    import auditok
    regions = auditok.load("audio.wav").split()
    gapless_region = sum(regions)
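
Python's built-in ``sum`` starts from the integer ``0``, so it only works on
objects that know how to be added to ``0``. The sketch below shows the general
mechanism with a toy class; ``Clip`` is hypothetical and merely stands in for
an audio region, it is not auditok's implementation:

.. code:: python

    class Clip:
        """Toy stand-in for an audio region: it just wraps a bytes buffer."""

        def __init__(self, data):
            self.data = data

        def __add__(self, other):
            # concatenation of two clips
            return Clip(self.data + other.data)

        def __radd__(self, other):
            # sum() first evaluates `0 + clip`, which must return the clip itself
            if other == 0:
                return self
            return NotImplemented

    clips = [Clip(b"ab"), Clip(b"cd"), Clip(b"ef")]
    joined = sum(clips)
    print(joined.data) # b'abcdef'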

Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    region_x3 = region * 3

Split one region into N regions of equal size
=============================================

Divide by a positive integer (this has nothing to do with silence-based
tokenization):

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

Note that if no perfect division is possible, the last region might be a bit
shorter than the previous N-1 regions.
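
To see why the last piece can come out shorter, here is one simple chunking
scheme applied to a plain list of samples. It is only a sketch: the helper
``split_evenly`` is hypothetical, and auditok may place the boundaries
differently.

.. code:: python

    import math

    def split_evenly(samples, n):
        """Cut `samples` into chunks of ceil(len / n) items; the last may be shorter."""
        size = math.ceil(len(samples) / n)
        return [samples[i:i + size] for i in range(0, len(samples), size)]

    samples = list(range(10))
    chunks = split_evenly(samples, 3)
    print([len(c) for c in chunks]) # [4, 4, 2]

    # joining the chunks gives back the original sequence
    assert sum(chunks, []) == samples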

Slice a region by samples, seconds or milliseconds
==================================================

Slicing an :class:`AudioRegion` can be useful in many situations. You can, for
example, remove a fixed-size portion of audio data from the beginning or from the
end of a region, or crop a region by an arbitrary amount as a data augmentation
strategy.

The most accurate way to slice an `AudioRegion` is to use indices that
directly refer to raw audio samples. In the following example, assuming that the
sampling rate of the audio data is 16000, you can extract a 5-second region from
the main region, starting from the 20th second, as follows:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This allows you to start and stop at practically any audio sample within the region.
Just as with a `list`, you can omit one of `start` and `stop`, or both. You can
also use negative indices:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples is flexible, slicing with temporal indices is more
intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of an
`AudioRegion` (or their shorter aliases: `ms` for ``millis``, and `sec` or `s`
for ``seconds``).

With the ``millis`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.millis[5000:10000]

or with the ``seconds`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[5:10]

``seconds`` indices can also be floats:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[2.5:7.5]
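
Conceptually, a time-based slice corresponds to a sample-based slice whose
bounds are the times multiplied by the sampling rate. The sketch below shows
that conversion in plain Python; ``seconds_to_samples`` is a hypothetical
helper, and the exact rounding auditok applies is not specified here:

.. code:: python

    def seconds_to_samples(t, sampling_rate):
        """Convert a time in seconds to the nearest sample index."""
        return int(round(t * sampling_rate))

    sr = 16000 # assumed sampling rate
    start = seconds_to_samples(2.5, sr)
    stop = seconds_to_samples(7.5, sr)
    print(start, stop) # 40000 120000
    print((stop - start) / sr) # 5.0

Under that assumption, ``region.seconds[2.5:7.5]`` would select the same audio
as ``region[40000:120000]`` for data sampled at 16000 Hz.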

Get arrays of audio samples
===========================

If `numpy` is not installed, the `samples` attribute is a list of audio sample
arrays (standard `array.array` objects), one per channel. If `numpy` is installed,
`samples` is a 2-D `numpy.ndarray` where the first dimension is the channel
and the second is the sample.

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    samples = region.samples
    assert len(samples) == region.channels
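
The "one array per channel" layout can be illustrated with the standard
library alone. The sketch below de-interleaves a tiny 16-bit stereo buffer
into per-channel `array.array` objects; it shows the resulting layout under
the assumption of interleaved PCM input and is not auditok's own code:

.. code:: python

    import array

    channels = 2
    # two stereo frames of 16-bit samples, interleaved as L, R, L, R
    raw = array.array("h", [1, -1, 2, -2]).tobytes()

    interleaved = array.array("h")
    interleaved.frombytes(raw)

    # every `channels`-th sample, starting at offset c, belongs to channel c
    samples = [interleaved[c::channels] for c in range(channels)]
    print(samples[0].tolist(), samples[1].tolist()) # [1, 2] [-1, -2]
    assert len(samples) == channels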


If `numpy` is installed, you can also use:

.. code:: python

    import auditok
    import numpy as np
    region = auditok.load("audio.wav")
    samples = np.asarray(region)
    assert len(samples.shape) == 2
