auditok: doc/examples.rst annotate

annotate doc/examples.rst @ 455:7dae98b84cdd tip master

Merge branch 'master' of https://github.com/amsehili/auditok

author	www-data <www-data@c4dm-xenserv-virt2.eecs.qmul.ac.uk>
date	Tue, 03 Dec 2024 09:18:01 +0000
parents	f9d5eb9387d2
children

rev	line source
amine@387	1 Load audio data
amine@387	2 ---------------
amine@377	3
amine@432	4 Audio data is loaded using the :func:`load` function, which can read from
amine@432	5 audio files, capture from the microphone, or accept raw audio data
amine@432	6 (as a ``bytes`` object).
amine@379	7
amine@377	8 From a file
amine@377	9 ===========
amine@377	10
amine@441	11 If the first argument of :func:`load` is a string or a ``Path``, it should
amine@432	12 refer to an existing audio file.
amine@369	13
amine@369	14 .. code:: python
amine@369	15
amine@377	16 import auditok
amine@377	17 region = auditok.load("audio.ogg")
amine@369	18
amine@432	19 If the input file contains raw (headerless) audio data, specifying audio
amine@432	20 parameters (``sampling_rate``, ``sample_width``, and ``channels``) is required.
amine@432	21 Additionally, if the file name does not end with 'raw', you should explicitly
amine@441	22 pass ``audio_format="raw"`` to the function.
amine@432	23
amine@432	24 In the example below, we provide audio parameters using their abbreviated names:
amine@369	25
amine@369	26 .. code:: python
amine@369	27
amine@377	28 region = auditok.load("audio.dat",
amine@377	29                       audio_format="raw",
amine@379	30                       sr=44100, # alias for `sampling_rate`
amine@432	31                       sw=2, # alias for `sample_width`
amine@379	32                       ch=1 # alias for `channels`
amine@379	33                       )
amine@377	34
amine@432	35 Alternatively, you can use :class:`AudioRegion` to load audio data:
amine@432	36
amine@432	37 .. code:: python
amine@432	38
amine@432	39 from auditok import AudioRegion
amine@432	40 region = AudioRegion.load("audio.dat",
amine@432	41                           audio_format="raw",
amine@432	42                           sr=44100, # alias for `sampling_rate`
amine@441	43                           sw=2, # alias for `sample_width`
amine@432	44                           ch=1 # alias for `channels`
amine@432	45                           )
amine@432	46
amine@432	47
amine@441	48 From a ``bytes`` object
amine@441	49 =======================
amine@377	50
amine@441	51 If the first argument is of type ``bytes``, it is interpreted as raw audio data:
amine@377	52
amine@377	53 .. code:: python
amine@377	54
amine@377	55 sr = 16000
amine@377	56 sw = 2
amine@377	57 ch = 1
amine@377	58 data = b"\0" * sr * sw * ch
amine@379	59 region = auditok.load(data, sr=sr, sw=sw, ch=ch)
amine@377	60 print(region)
amine@387	61 # alternatively you can use
amine@432	62 region = auditok.AudioRegion(data, sr, sw, ch)
amine@377	63
amine@377	64 output:
amine@377	65
amine@377	66 .. code:: bash
amine@377	67
amine@377	68 AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
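
The reported duration follows from the size of the raw buffer: the number of
bytes divided by the number of bytes per second
(``sampling_rate * sample_width * channels``). A minimal sketch of that
arithmetic in plain Python, independent of auditok (the helper name
``raw_duration`` is hypothetical):

```python
def raw_duration(data: bytes, sr: int, sw: int, ch: int) -> float:
    """Return the duration in seconds of headerless PCM data."""
    bytes_per_second = sr * sw * ch
    return len(data) / bytes_per_second


sr, sw, ch = 16000, 2, 1
data = b"\0" * sr * sw * ch  # exactly one second of silence
print(f"duration={raw_duration(data, sr, sw, ch):.3f}")
```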
amine@377	69
amine@377	70 From the microphone
amine@377	71 ===================
amine@377	72
amine@441	73 If the first argument is ``None``, :func:`load` will attempt to read data from the
amine@441	74 microphone. In this case, audio parameters, along with the ``max_read`` parameter,
amine@432	75 are required.
amine@377	76
amine@377	77 .. code:: python
amine@377	78
amine@377	79 sr = 16000
amine@377	80 sw = 2
amine@377	81 ch = 1
amine@377	82 five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
amine@377	83 print(five_sec_audio)
amine@377	84
amine@377	85 output:
amine@377	86
amine@377	87 .. code:: bash
amine@377	88
amine@377	89 AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)
amine@377	90
amine@377	91
amine@377	92 Skip part of audio data
amine@377	93 =======================
amine@377	94
amine@432	95 If the ``skip`` parameter is greater than 0, :func:`load` will skip that specified
amine@432	96 amount of leading audio data, measured in seconds:
amine@377	97
amine@377	98 .. code:: python
amine@377	99
amine@377	100 import auditok
amine@377	101 region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds
amine@377	102
amine@387	103 This argument must be 0 when reading data from the microphone.
amine@387	104
amine@387	105
amine@387	106 Limit the amount of read audio
amine@387	107 ==============================
amine@387	108
amine@432	109 If the ``max_read`` parameter is greater than 0, :func:`load` will read at most
amine@387	110 that many seconds of audio data:
amine@387	111
amine@387	112 .. code:: python
amine@387	113
amine@387	114 import auditok
amine@387	115 region = auditok.load("audio.ogg", max_read=5)
amine@387	116 assert region.duration <= 5
amine@387	117
amine@432	118 This argument is required when reading data from the microphone.
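
For raw data, ``skip`` and ``max_read`` can be pictured as selecting a byte
range. The sketch below is an illustration in plain Python, not auditok's
implementation. ``byte_range`` is a hypothetical helper, and it assumes that
``max_read`` is counted from the end of the skipped part:

```python
def byte_range(skip, max_read, sr, sw, ch, total_bytes):
    """Return (start, stop) byte offsets selected by skip/max_read (seconds)."""
    frame_size = sw * ch  # bytes per multi-channel sample
    start = min(round(skip * sr) * frame_size, total_bytes)
    if max_read is None or max_read <= 0:
        stop = total_bytes  # no limit: read to the end
    else:
        stop = min(start + round(max_read * sr) * frame_size, total_bytes)
    return start, stop


# 10 seconds of 16 kHz, 16-bit, mono audio
total = 10 * 16000 * 2 * 1
print(byte_range(2, 5, 16000, 2, 1, total))
```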
amine@377	119
amine@377	120
amine@377	121 Basic split example
amine@377	122 -------------------
amine@377	123
amine@432	124 In the following example, we'll use the :func:`split` function to tokenize an
amine@432	125 audio file. We'll specify that valid audio events must be at least 0.2 seconds
amine@432	126 long, no longer than 4 seconds, and contain no more than 0.3 seconds of continuous
amine@432	127 silence. By setting a 4-second limit, an event lasting 9.5 seconds, for instance,
amine@432	128 will be returned as two 4-second events plus a final 1.5-second event. Additionally,
amine@432	129 a valid event may contain multiple silences, as long as none exceed 0.3 seconds.
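
The effect of the 4-second limit can be checked with simple arithmetic. The
sketch below (plain Python; ``truncate_event`` is a hypothetical helper, not
part of auditok) breaks one continuous stretch of activity into pieces no
longer than ``max_dur``. It deliberately ignores ``min_dur`` and silence
handling:

```python
def truncate_event(duration, max_dur):
    """Split a continuous event into pieces of at most max_dur seconds."""
    pieces = []
    remaining = duration
    while remaining > max_dur:
        pieces.append(max_dur)
        remaining -= max_dur
    if remaining > 0:
        pieces.append(remaining)
    return pieces


pieces = truncate_event(9.5, 4)
print(pieces)  # two 4-second pieces followed by a 1.5-second piece
```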
amine@379	130
amine@432	131 :func:`split` returns a generator of :class:`AudioRegion` objects. Each
amine@432	132 :class:`AudioRegion` can be played, saved, repeated (multiplied by an integer),
amine@432	133 and concatenated with another region (see examples below). Note that
amine@441	134 :class:`AudioRegion` objects returned by :func:`split` include ``start`` and ``end``
amine@432	135 attributes, which mark the beginning and end of the audio event relative to the
amine@432	136 input audio stream.
amine@379	137
amine@377	138 .. code:: python
amine@377	139
amine@377	140 import auditok
amine@377	141
amine@432	142 # `split` returns a generator of AudioRegion objects
amine@432	143 audio_events = auditok.split(
amine@377	144     "audio.wav",
amine@432	145     min_dur=0.2, # Minimum duration of a valid audio event in seconds
amine@432	146     max_dur=4, # Maximum duration of an event
amine@432	147     max_silence=0.3, # Maximum tolerated silence duration within an event
amine@432	148     energy_threshold=55 # Detection threshold
amine@377	149 )
amine@377	150
amine@432	151 for i, r in enumerate(audio_events):
amine@432	152     # AudioRegions returned by `split` have defined 'start' and 'end' attributes
amine@432	153     print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")
amine@377	154
amine@432	155     # Play the audio event
amine@432	156     r.play(progress_bar=True)
amine@377	157
amine@432	158     # Save the event with start and end times in the filename
amine@432	159     filename = r.save("event_{start:.3f}-{end:.3f}.wav")
amine@441	160     print(f"event saved as: {filename}")
amine@377	161
amine@432	162 Example output:
amine@377	163
amine@377	164 .. code:: bash
amine@377	165
amine@432	166 Event 0: 0.700s -- 1.400s
amine@441	167 event saved as: event_0.700-1.400.wav
amine@432	168 Event 1: 3.800s -- 4.500s
amine@441	169 event saved as: event_3.800-4.500.wav
amine@432	170 Event 2: 8.750s -- 9.950s
amine@441	171 event saved as: event_8.750-9.950.wav
amine@432	172 Event 3: 11.700s -- 12.400s
amine@441	173 event saved as: event_11.700-12.400.wav
amine@432	174 Event 4: 15.050s -- 15.850s
amine@441	175 event saved as: event_15.050-15.850.wav
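
The placeholders in the file name passed to ``save`` look like Python format
fields. As a rough illustration using plain ``str.format`` (not a call into
auditok), the first event above would produce the following name:

```python
template = "event_{start:.3f}-{end:.3f}.wav"

# start and end of "Event 0" from the example output
filename = template.format(start=0.7, end=1.4)
print(filename)
```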
amine@377	176
amine@377	177 Split and plot
amine@377	178 --------------
amine@377	179
amine@377	180 Visualize audio signal and detections:
amine@377	181
amine@377	182 .. code:: python
amine@377	183
amine@377	184 import auditok
amine@377	185 region = auditok.load("audio.wav") # returns an AudioRegion object
amine@377	186 regions = region.split_and_plot(...) # or just region.splitp()
amine@369	187
amine@369	188 output figure:
amine@369	189
amine@369	190 .. image:: figures/example_1.png
amine@369	191
amine@432	192 Split an audio stream and re-join (glue) audio events with silence
amine@432	193 ------------------------------------------------------------------
amine@432	194
amine@432	195 The following code detects audio events within an audio stream, then inserts
amine@432	196 1 second of silence between them to create audio with pauses:
amine@432	197
amine@432	198 .. code:: python
amine@432	199
amine@432	200 # Create a 1-second silent audio region
amine@432	201 # Audio parameters must match the original stream
amine@432	202 from auditok import split, make_silence
amine@432	203 silence = make_silence(duration=1,
amine@432	204                        sampling_rate=16000,
amine@432	205                        sample_width=2,
amine@432	206                        channels=1)
amine@432	207 events = split("audio.wav")
amine@432	208 audio_with_pauses = silence.join(events)
amine@432	209
amine@432	210 Alternatively, use ``split_and_join_with_silence``:
amine@432	211
amine@432	212 .. code:: python
amine@432	213
amine@432	214 from auditok import split_and_join_with_silence
amine@432	215 audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")
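
``silence.join(events)`` reads much like ``str.join`` or ``bytes.join``: the
object it is called on is placed between consecutive items. The sketch below
shows the same idea on raw bytes. It is an analogy only, and the byte strings
are made-up stand-ins for detected audio events:

```python
sr, sw, ch = 16000, 2, 1

# One second of silence with the same parameters as the (pretend) stream
silence = b"\0" * (1 * sr * sw * ch)

# Stand-ins for three detected events of different lengths
events = [b"\1" * 3200, b"\2" * 6400, b"\3" * 1600]

audio_with_pauses = silence.join(events)

# Three events are separated by two gaps
expected_size = sum(len(e) for e in events) + 2 * len(silence)
print(len(audio_with_pauses) == expected_size)
```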
amine@432	216
amine@377	217
amine@441	218 Read audio data from the microphone and perform real-time event detection
amine@441	219 -------------------------------------------------------------------------
amine@377	220
amine@432	221 If the first argument of :func:`split` is ``None``, audio data is read from the
amine@379	222 microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):
amine@377	223
amine@377	224 .. code:: python
amine@377	225
amine@377	226 import auditok
amine@377	227
amine@377	228 sr = 16000
amine@377	229 sw = 2
amine@377	230 ch = 1
amine@377	231 eth = 55 # alias for energy_threshold, default value is 50
amine@377	232
amine@377	233 try:
amine@377	234     for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
amine@377	235         print(region)
amine@377	236         region.play(progress_bar=True) # progress bar requires `tqdm`
amine@377	237 except KeyboardInterrupt:
amine@377	238     pass
amine@377	239
amine@377	240
amine@432	241 :func:`split` will continue reading audio data until you press ``Ctrl-C``. To read
amine@432	242 a specific amount of audio data, pass the desired number of seconds using the
amine@441	243 ``max_read`` argument.
amine@377	244
amine@377	245
amine@387	246 Access recorded data after split
amine@387	247 --------------------------------
amine@377	248
amine@432	249 Using a :class:`Recorder` object, you can access audio data read from a file
amine@432	250 or from the microphone. With the following code, press ``Ctrl-C`` to stop recording:
amine@377	251
amine@377	252
amine@377	253 .. code:: python
amine@377	254
amine@377	255 import auditok
amine@377	256
amine@377	257 sr = 16000
amine@377	258 sw = 2
amine@377	259 ch = 1
amine@377	260 eth = 55 # alias for energy_threshold, default value is 50
amine@377	261
amine@377	262 rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)
amine@432	263 events = []
amine@377	264
amine@377	265 try:
amine@377	266     for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
amine@377	267         print(region)
amine@432	268         region.play(progress_bar=True)
amine@432	269         events.append(region)
amine@377	270 except KeyboardInterrupt:
amine@377	271     pass
amine@377	272
amine@377	273 rec.rewind()
amine@454	274 full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
amine@379	275 # alternatively you can use
amine@379	276 full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)
amine@432	277 full_audio.play(progress_bar=True)
amine@377	278
amine@377	279
amine@441	280 :class:`Recorder` also accepts a ``max_read`` argument.
amine@377	281
amine@369	282 Working with AudioRegions
amine@369	283 -------------------------
amine@369	284
amine@432	285 In the following sections, we will review several operations
amine@441	286 that can be performed with :class:`AudioRegion` objects.
amine@377	287
amine@377	288 Basic region information
amine@377	289 ========================
amine@377	290
amine@377	291 .. code:: python
amine@377	292
amine@377	293 import auditok
amine@377	294 region = auditok.load("audio.wav")
amine@377	295 len(region) # number of audio samples in the region, one channel considered
amine@377	296 region.duration # duration in seconds
amine@377	297 region.sampling_rate # alias `sr`
amine@377	298 region.sample_width # alias `sw`
amine@377	299 region.channels # alias `ch`
amine@377	300
amine@432	301 When an audio region is returned by the :func:`split` function, it includes defined
amine@432	302 ``start`` and ``end`` attributes that refer to the beginning and end of the audio
amine@432	303 event relative to the input audio stream.
amine@377	304
amine@369	305 Concatenate regions
amine@369	306 ===================
amine@369	307
amine@369	308 .. code:: python
amine@369	309
amine@377	310 import auditok
amine@377	311 region_1 = auditok.load("audio_1.wav")
amine@377	312 region_2 = auditok.load("audio_2.wav")
amine@369	313 region_3 = region_1 + region_2
amine@369	314
amine@432	315 This is particularly useful when you want to join regions returned by the
amine@432	316 :func:`split` function:
amine@369	317
amine@369	318 .. code:: python
amine@369	319
amine@377	320 import auditok
amine@377	321 regions = auditok.load("audio.wav").split()
amine@369	322 gapless_region = sum(regions)
amine@369	323
amine@369	324 Repeat a region
amine@369	325 ===============
amine@369	326
amine@369	327 Multiply by a positive integer:
amine@369	328
amine@369	329 .. code:: python
amine@369	330
amine@377	331 import auditok
amine@377	332 region = auditok.load("audio.wav")
amine@369	333 region_x3 = region * 3
amine@369	334
amine@377	335 Split one region into N regions of equal size
amine@377	336 =============================================
amine@369	337
amine@432	338 Divide by a positive integer (this is unrelated to silence-based tokenization!):
amine@369	339
amine@369	340 .. code:: python
amine@369	341
amine@377	342 import auditok
amine@377	343 region = auditok.load("audio.wav")
amine@369	344 regions = region / 5
amine@369	345 assert sum(regions) == region
amine@369	346
amine@432	347 Note that if an exact split is not possible, the last region may be shorter
amine@432	348 than the preceding N-1 regions.
amine@369	349
amine@377	350 Slice a region by samples, seconds or milliseconds
amine@377	351 ==================================================
amine@377	352
amine@432	353 Slicing an :class:`AudioRegion` can be useful in various situations.
amine@432	354 For example, you can remove a fixed-length portion of audio data from
amine@432	355 the beginning or end of a region, or crop a region by an arbitrary amount
amine@432	356 as a data augmentation strategy.
amine@369	357
amine@441	358 The most accurate way to slice an :class:`AudioRegion` is by using indices
amine@441	359 that directly refer to raw audio samples. In the following example, assuming
amine@432	360 the audio data has a sampling rate of 16000, you can extract a 5-second
amine@432	361 segment from the main region, starting at the 20th second, as follows:
amine@369	362
amine@369	363 .. code:: python
amine@369	364
amine@377	365 import auditok
amine@377	366 region = auditok.load("audio.wav")
amine@369	367 start = 20 * 16000
amine@369	368 stop = 25 * 16000
amine@369	369 five_second_region = region[start:stop]
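
Sample indices are obtained by multiplying a time in seconds by the sampling
rate. The sketch below uses a plain Python ``range`` as a stand-in for audio
samples (``seconds_to_samples`` is a hypothetical helper, not part of auditok)
to show the arithmetic behind such slices:

```python
def seconds_to_samples(seconds, sampling_rate):
    """Convert a time offset in seconds to a sample index."""
    return round(seconds * sampling_rate)


sr = 16000
samples = range(30 * sr)  # stand-in for 30 seconds of mono audio

start = seconds_to_samples(20, sr)
stop = seconds_to_samples(25, sr)
segment = samples[start:stop]
print(len(segment) / sr)  # length of the slice in seconds

# Negative indices count from the end, as with a list
last_three_seconds = samples[-seconds_to_samples(3, sr):]
print(len(last_three_seconds) / sr)
```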
amine@369	370
amine@432	371 This allows you to start and stop at any audio sample within the region. Similar
amine@432	372 to a ``list``, you can omit either ``start`` or ``stop``, or both. Negative
amine@432	373 indices are also supported:
amine@369	374
amine@369	375 .. code:: python
amine@369	376
amine@377	377 import auditok
amine@377	378 region = auditok.load("audio.wav")
amine@369	379 start = -3 * region.sr # `sr` is an alias of `sampling_rate`
amine@369	380 three_last_seconds = region[start:]
amine@369	381
amine@432	382 While slicing by raw samples offers flexibility, using temporal indices is
amine@432	383 often more intuitive. You can achieve this by accessing the ``millis`` or ``seconds``
amine@432	384 views of an :class:`AudioRegion` (or using their shortcut aliases ``ms``, ``sec``, or ``s``).
amine@369	385
amine@379	386 With the ``millis`` view:
amine@369	387
amine@369	388 .. code:: python
amine@369	389
amine@377	390 import auditok
amine@377	391 region = auditok.load("audio.wav")
amine@369	392 five_second_region = region.millis[5000:10000]
amine@432	393 # or
amine@432	394 five_second_region = region.ms[5000:10000]
amine@369	395
amine@379	396 or with the ``seconds`` view:
amine@369	397
amine@369	398 .. code:: python
amine@369	399
amine@377	400 import auditok
amine@377	401 region = auditok.load("audio.wav")
amine@369	402 five_second_region = region.seconds[5:10]
amine@432	403 # or
amine@432	404 five_second_region = region.sec[5:10]
amine@432	405 # or
amine@432	406 five_second_region = region.s[5:10]
amine@369	407
amine@379	408 ``seconds`` indices can also be floats:
amine@369	409
amine@369	410 .. code:: python
amine@369	411
amine@377	412 import auditok
amine@377	413 region = auditok.load("audio.wav")
amine@377	414 five_second_region = region.seconds[2.5:7.5]
amine@377	415
amine@432	416 Export an ``AudioRegion`` as a ``numpy`` array
amine@432	417 ==============================================
amine@377	418
amine@377	419 .. code:: python
amine@377	420
amine@432	421 from auditok import load, AudioRegion
amine@432	422 audio = load("audio.wav") # or use `AudioRegion.load("audio.wav")`
amine@432	423 x = audio.numpy()
amine@432	424 assert x.shape[0] == audio.channels
amine@432	425 assert x.shape[1] == len(audio)
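
The resulting array has one row per channel and one column per sample. For
interleaved 16-bit PCM, an equivalent layout can be built by hand with NumPy.
This is a sketch of the idea, not necessarily how ``numpy()`` is implemented,
and it assumes a sample width of 2 bytes:

```python
import numpy as np

channels = 2

# Four stereo frames, interleaved as L R L R ...
interleaved = np.array([1, -1, 2, -2, 3, -3, 4, -4], dtype=np.int16)
data = interleaved.tobytes()  # raw bytes, as they would appear in a stream

# One row per frame, then transpose to get one row per channel
x = np.frombuffer(data, dtype=np.int16).reshape(-1, channels).T

print(x.shape)  # (channels, samples per channel)
print(x[0])     # samples of the first channel
```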
