Mercurial > hg > auditok
diff doc/examples.rst @ 377:c6308873f239
Improve documentation, add more examples
author | Amine Sehili <amine.sehili@gmail.com> |
---|---|
date | Wed, 17 Feb 2021 21:18:05 +0100 |
parents | 0106c4799906 |
children | df2a320e10d5 |
line wrap: on
line diff
--- a/doc/examples.rst Fri Feb 05 21:44:08 2021 +0100 +++ b/doc/examples.rst Wed Feb 17 21:18:05 2021 +0100 @@ -1,52 +1,242 @@ -Basic example -------------- +Loading audio data +------------------ + +From a file +=========== + +If the first argument of `load` is a string, it should be a path to an audio +file. .. code:: python - from auditok import split + import auditok + region = auditok.load("audio.ogg") - # split returns a generator of AudioRegion objects - audio_regions = split("audio.wav") - for region in audio_regions: - region.play(progress_bar=True) - filename = region.save("/tmp/region_{meta.start:.3f}.wav") - print("region saved as: {}".format(filename)) - -Example using `AudioRegion` ---------------------------- +If input file contains a raw (headerless) audio data, passing `audio_format="raw"` +and other audio parameters (`sampling_rate`, `sample_width` and `channels`) is +mandatory. In the following example we pass audio parameters with their short +names: .. code:: python - from auditok import AudioRegion - region = AudioRegion.load("audio.wav") - regions = region.split_and_plot() # or just region.splitp() + region = auditok.load("audio.dat", + audio_format="raw", + sr=44100, + sw=2 + ch=1) + +From a `bytes` object +===================== + +If the first argument is of type `bytes` it's interpreted as raw audio data: + +.. code:: python + + sr = 16000 + sw = 2 + ch = 1 + data = b"\0" * sr * sw * ch + load(data, sr=sr, sw=sw, ch=ch) + print(region) + +output: + +.. code:: bash + + AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1) + +From the microphone +=================== + +If the first argument is `None`, `load` will try to read data from the microphone. +Audio parameters, as well as the `max_read` parameter are mandatory: + + +.. code:: python + + sr = 16000 + sw = 2 + ch = 1 + five_sec_audio = load(None, sr=sr, sw=sw, ch=ch, max_read=5) + print(five_sec_audio) + +output: + +.. code:: bash + + AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1) + + +Skip part of audio data +======================= + +If the `skip` parameter is > 0, `load` will skip that leading amount of audio +data: + +.. code:: python + + import auditok + region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds + +This argument must be 0 when reading from the microphone. + + +Basic split example +------------------- + +.. code:: python + + import auditok + + # split returns a generator of AudioRegion objects + audio_regions = auditok.split( + "audio.wav", + min_dur=0.2, # minimum duration of a valid audio event in seconds + max_dur=4, # maximum duration of an event + max_silence=0.3, # maximum duration of tolerated continuous silence within an event + energy_threshold=55 # threshold of detection + ) + + for i, r in enumerate(audio_regions): + + # Regions returned by `split` have 'start' and 'end' metadata fields + print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r)) + + # play detection + # r.play(progress_bar=True) + + # region's metadata can also be used with the `save` method + # (no need to explicitly specify region's object and `format` arguments) + filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav") + print("region saved as: {}".format(filename)) + +output example: + +.. code:: bash + + Region 0: 0.700s -- 1.400s + region saved as: region_0.700-1.400.wav + Region 1: 3.800s -- 4.500s + region saved as: region_3.800-4.500.wav + Region 2: 8.750s -- 9.950s + region saved as: region_8.750-9.950.wav + Region 3: 11.700s -- 12.400s + region saved as: region_11.700-12.400.wav + Region 4: 15.050s -- 15.850s + region saved as: region_15.050-15.850.wav + + +Split and plot +-------------- + +Visualize audio signal and detections: + +.. code:: python + + import auditok + region = auditok.load("audio.wav") # returns an AudioRegion object + regions = region.split_and_plot(...) # or just region.splitp() output figure: .. image:: figures/example_1.png + +Read and split data from the microphone +--------------------------------------- + +If the first argument of `split` is None, audio data is read from the microphone +(requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_): + +.. code:: python + + import auditok + + sr = 16000 + sw = 2 + ch = 1 + eth = 55 # alias for energy_threshold, default value is 50 + + try: + for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth): + print(region) + region.play(progress_bar=True) # progress bar requires `tqdm` + except KeyboardInterrupt: + pass + + +`split` will continue reading audio data until you press ``Ctrl-C``. If you want +to read a specific amount of audio data, pass the desired number of seconds with +the `max_read` argument. + + +Accessing recorded data after split +----------------------------------- + +Using a `Recorder` object you can get hold of acquired audio: + + +.. code:: python + + import auditok + + sr = 16000 + sw = 2 + ch = 1 + eth = 55 # alias for energy_threshold, default value is 50 + + rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch) + + try: + for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth): + print(region) + region.play(progress_bar=True) # progress bar requires `tqdm` + except KeyboardInterrupt: + pass + + rec.rewind() + full_audio = load(rec.data, sr=sr, sw=sw, ch=ch) + + +`Recorder` also accepts a `max_read` argument. + Working with AudioRegions ------------------------- Beyond splitting, there are a couple of interesting operations you can do with `AudioRegion` objects. + +Basic region information +======================== + +.. code:: python + + import auditok + region = auditok.load("audio.wav") + len(region) # number of audio samples int the regions, one channel considered + region.duration # duration in seconds + region.sampling_rate # alias `sr` + region.sample_width # alias `sw` + region.channels # alias `ch` + + Concatenate regions =================== .. code:: python - from auditok import AudioRegion - region_1 = AudioRegion.load("audio_1.wav") - region_2 = AudioRegion.load("audio_2.wav") + import auditok + region_1 = auditok.load("audio_1.wav") + region_2 = auditok.load("audio_2.wav") region_3 = region_1 + region_2 -Particularly useful if you want to join regions returned by ``split``: +Particularly useful if you want to join regions returned by `split`: .. code:: python - from auditok import AudioRegion - regions = AudioRegion.load("audio.wav").split() + import auditok + regions = auditok.load("audio.wav").split() gapless_region = sum(regions) Repeat a region @@ -56,92 +246,105 @@ .. code:: python - from auditok import AudioRegion - region = AudioRegion.load("audio.wav") + import auditok + region = auditok.load("audio.wav") region_x3 = region * 3 -Make slices of equal size out of a region -========================================= +Split one region into N regions of equal size +============================================= Divide by a positive integer: .. code:: python - from auditok import AudioRegion - region = AudioRegion.load("audio.wav") + import auditok + region = auditok.load("audio.wav") regions = regions / 5 assert sum(regions) == region -Make audio slices of arbitrary size -=================================== +Note that if perfect division is possible, the last region might be a bit shorter +than the previous N-1 regions. -Slicing an ``AudioRegion`` can be interesting in many situations. You can for -example remove a fixed-size portion of audio data from the beginning or the end -of a region or crop a region by an arbitrary amount as a data augmentation +Slice a region by samples, seconds or milliseconds +================================================== + +Slicing an `AudioRegion` can be interesting in many situations. You can for +example remove a fixed-size portion of audio data from the beginning or from the +end of a region or crop a region by an arbitrary amount as a data augmentation strategy, etc. -The most accurate way to slice an ``AudioRegion`` is to use indices that +The most accurate way to slice an `AudioRegion` is to use indices that directly refer to raw audio samples. In the following example, assuming that the sampling rate of audio data is 16000, you can extract a 5-second region from main region, starting from the 20th second as follows: .. code:: python - from auditok import AudioRegion - region = AudioRegion.load("audio.wav") + import auditok + region = auditok.load("audio.wav") start = 20 * 16000 stop = 25 * 16000 five_second_region = region[start:stop] -This allows you to practically start and stop at any sample within the region. +This allows you to practically start and stop at any audio sample of the region. Just as with a `list` you can omit one of `start` and `stop`, or both. You can also use negative indices: .. code:: python - from auditok import AudioRegion - region = AudioRegion.load("audio.wav") + import auditok + region = auditok.load("audio.wav") start = -3 * region.sr # `sr` is an alias of `sampling_rate` three_last_seconds = region[start:] While slicing by raw samples is accurate, slicing with temporal indices is more -intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of -``AudioRegion`` (or their shortcut alias ``ms`` and ``sec``/``s``). +intuitive. You can do so by accessing the `millis` or `seconds` views of an +`AudioRegion` (or their shortcut alias `ms` and `sec` or `s`). -With the ``millis`` view: +With the `millis` view: .. code:: python - from auditok import AudioRegion - region = AudioRegion.load("audio.wav") + import auditok + region = auditok.load("audio.wav") five_second_region = region.millis[5000:10000] -or with the ``seconds`` view: +or with the `seconds` view: .. code:: python - from auditok import AudioRegion - region = AudioRegion.load("audio.wav") + import auditok + region = auditok.load("audio.wav") five_second_region = region.seconds[5:10] -Get an array of audio samples -============================= +`seconds` indices can also be floats: .. code:: python - from auditok import AudioRegion - region = AudioRegion.load("audio.wav") + import auditok + region = auditok.load("audio.wav") + five_second_region = region.seconds[2.5:7.5] + +Get arrays of audio samples +=========================== + +If `numpy` is not installed, the `samples` attributes is a list of audio samples +arrays (standard `array.array` objects), one per channels. If numpy is installed, +`samples` is a 2-D `numpy.ndarray` where the fist dimension is the channel +and the second is the the sample. + +.. code:: python + + import auditok + region = auditok.load("audio.wav") samples = region.samples -If ``numpy`` is installed, this will return a ``numpy.ndarray``. If audio data -is mono the returned array is 1D, otherwise it's 2D. If ``numpy`` is not -installed this will return a standard ``array.array`` for mono data, and a list -of ``array.array`` for multichannel data. -Alternatively you can use: +If `numpy` is not installed you can use: .. code:: python import numpy as np - region = AudioRegion.load("audio.wav") + region = auditok.load("audio.wav") samples = np.asarray(region) + assert len(samples.shape) == 2