changeset 377:c6308873f239
Improve documentation, add more examples
| field | value |
|---|---|
| author | Amine Sehili <amine.sehili@gmail.com> |
| date | Wed, 17 Feb 2021 21:18:05 +0100 |
| parents | d83cba0f8072 |
| children | 0860204227de |
| files | README.rst auditok/cmdline.py doc/examples.rst doc/index.rst doc/installation.rst |
| diffstat | 5 files changed, 294 insertions(+), 81 deletions(-) |
--- a/README.rst Fri Feb 05 21:44:08 2021 +0100
+++ b/README.rst Wed Feb 17 21:18:05 2021 +0100
@@ -25,7 +25,7 @@
 packages are needed:
 
 - `pydub <https://github.com/jiaaro/pydub>`_ : read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.
-- `pyaudio <http://people.csail.mit.edu/hubert/pyaudio/>`_ : read audio data from the microphone and play audio back.
+- `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_ : read audio data from the microphone and play audio back.
 - `tqdm <https://github.com/tqdm/tqdm>`_ : show progress bar while playing audio clips.
 - `matplotlib <https://matplotlib.org/stable/index.html>`_ : plot audio signal and detections.
 - `numpy <https://numpy.org/>`_ : required by matplotlib. Also used for some math operations instead of standard python if available.
@@ -36,7 +36,7 @@
 
     sudo pip install auditok
 
-Install with the latest version from github:
+Install with the latest development version from github:
 
 .. code:: bash
 
--- a/auditok/cmdline.py Fri Feb 05 21:44:08 2021 +0100
+++ b/auditok/cmdline.py Wed Feb 17 21:18:05 2021 +0100
@@ -5,13 +5,13 @@
 
 `auditok` is a program that can be used for Audio/Acoustic
 activity detection. It can read audio data from audio files as well
-as from built-in device(s) or standard input.
+as from the microphone or standard input.
 
 @author: Mohamed El Amine SEHILI
 @copyright: 2015-2021 Mohamed El Amine SEHILI
 @license: MIT
 @contact: amine.sehili@gmail.com
-@deffield updated: 03 Jan 2021
+@deffield updated: 17 Feb 2021
 """
 
 import sys
@@ -30,7 +30,7 @@
 __all__ = []
 
 __date__ = "2015-11-23"
-__updated__ = "2021-01-03"
+__updated__ = "2021-02-17"
 
 
 def main(argv=None):
--- a/doc/examples.rst Fri Feb 05 21:44:08 2021 +0100
+++ b/doc/examples.rst Wed Feb 17 21:18:05 2021 +0100
@@ -1,52 +1,242 @@
-Basic example
--------------
+Loading audio data
+------------------
+
+From a file
+===========
+
+If the first argument of `load` is a string, it should be a path to an audio
+file.
 
 .. code:: python
 
-    from auditok import split
+    import auditok
+    region = auditok.load("audio.ogg")
 
-    # split returns a generator of AudioRegion objects
-    audio_regions = split("audio.wav")
-    for region in audio_regions:
-        region.play(progress_bar=True)
-        filename = region.save("/tmp/region_{meta.start:.3f}.wav")
-        print("region saved as: {}".format(filename))
-
-Example using `AudioRegion`
----------------------------
+If the input file contains raw (headerless) audio data, passing `audio_format="raw"`
+and the other audio parameters (`sampling_rate`, `sample_width` and `channels`) is
+mandatory. In the following example we pass audio parameters with their short
+names:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    regions = region.split_and_plot() # or just region.splitp()
+    region = auditok.load("audio.dat",
+                          audio_format="raw",
+                          sr=44100,
+                          sw=2,
+                          ch=1)
+
+From a `bytes` object
+=====================
+
+If the first argument is of type `bytes`, it is interpreted as raw audio data:
+
+.. code:: python
+
+    sr = 16000
+    sw = 2
+    ch = 1
+    data = b"\0" * sr * sw * ch
+    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
+    print(region)
+
+output:
+
+.. code:: bash
+
+    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
+
+From the microphone
+===================
+
+If the first argument is `None`, `load` will try to read data from the microphone.
+Audio parameters, as well as the `max_read` parameter, are mandatory:
+
+
+.. code:: python
+
+    sr = 16000
+    sw = 2
+    ch = 1
+    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
+    print(five_sec_audio)
+
+output:
+
+.. code:: bash
+
+    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)
+
+
+Skip part of audio data
+=======================
+
+If the `skip` parameter is > 0, `load` will skip that leading amount of audio
+data (in seconds):
+
+.. code:: python
+
+    import auditok
+    region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds
+
+This argument must be 0 when reading from the microphone.
+
+
+Basic split example
+-------------------
+
+.. code:: python
+
+    import auditok
+
+    # split returns a generator of AudioRegion objects
+    audio_regions = auditok.split(
+        "audio.wav",
+        min_dur=0.2,        # minimum duration of a valid audio event in seconds
+        max_dur=4,          # maximum duration of an event
+        max_silence=0.3,    # maximum duration of tolerated continuous silence within an event
+        energy_threshold=55 # threshold of detection
+    )
+
+    for i, r in enumerate(audio_regions):
+
+        # Regions returned by `split` have 'start' and 'end' metadata fields
+        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))
+
+        # play detection
+        # r.play(progress_bar=True)
+
+        # region's metadata can also be used with the `save` method
+        # (no need to explicitly specify region's object and `format` arguments)
+        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
+        print("region saved as: {}".format(filename))
+
+output example:
+
+.. code:: bash
+
+    Region 0: 0.700s -- 1.400s
+    region saved as: region_0.700-1.400.wav
+    Region 1: 3.800s -- 4.500s
+    region saved as: region_3.800-4.500.wav
+    Region 2: 8.750s -- 9.950s
+    region saved as: region_8.750-9.950.wav
+    Region 3: 11.700s -- 12.400s
+    region saved as: region_11.700-12.400.wav
+    Region 4: 15.050s -- 15.850s
+    region saved as: region_15.050-15.850.wav
+
+
+Split and plot
+--------------
+
+Visualize audio signal and detections:
+
+.. code:: python
+
+    import auditok
+    region = auditok.load("audio.wav") # returns an AudioRegion object
+    regions = region.split_and_plot(...) # or just region.splitp()
 
 output figure:
 
 .. image:: figures/example_1.png
 
+
+Read and split data from the microphone
+---------------------------------------
+
+If the first argument of `split` is `None`, audio data is read from the microphone
+(requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):
+
+.. code:: python
+
+    import auditok
+
+    sr = 16000
+    sw = 2
+    ch = 1
+    eth = 55 # alias for energy_threshold, default value is 50
+
+    try:
+        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
+            print(region)
+            region.play(progress_bar=True) # progress bar requires `tqdm`
+    except KeyboardInterrupt:
+        pass
+
+
+`split` will continue reading audio data until you press ``Ctrl-C``. If you want
+to read a specific amount of audio data, pass the desired number of seconds with
+the `max_read` argument.
+
+
+Accessing recorded data after split
+-----------------------------------
+
+Using a `Recorder` object you can get hold of the acquired audio:
+
+
+.. code:: python
+
+    import auditok
+
+    sr = 16000
+    sw = 2
+    ch = 1
+    eth = 55 # alias for energy_threshold, default value is 50
+
+    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)
+
+    try:
+        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
+            print(region)
+            region.play(progress_bar=True) # progress bar requires `tqdm`
+    except KeyboardInterrupt:
+        pass
+
+    rec.rewind()
+    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
+
+
+`Recorder` also accepts a `max_read` argument.
+
 
 Working with AudioRegions
 -------------------------
 
 Beyond splitting, there are a couple of interesting operations you can do with
 `AudioRegion` objects.
 
+
+Basic region information
+========================
+
+.. code:: python
+
+    import auditok
+    region = auditok.load("audio.wav")
+    len(region) # number of audio samples in the region, one channel considered
+    region.duration # duration in seconds
+    region.sampling_rate # alias `sr`
+    region.sample_width # alias `sw`
+    region.channels # alias `ch`
+
+
 Concatenate regions
 ===================
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region_1 = AudioRegion.load("audio_1.wav")
-    region_2 = AudioRegion.load("audio_2.wav")
+    import auditok
+    region_1 = auditok.load("audio_1.wav")
+    region_2 = auditok.load("audio_2.wav")
     region_3 = region_1 + region_2
 
-Particularly useful if you want to join regions returned by ``split``:
+Particularly useful if you want to join regions returned by `split`:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    regions = AudioRegion.load("audio.wav").split()
+    import auditok
+    regions = auditok.load("audio.wav").split()
     gapless_region = sum(regions)
 
 Repeat a region
@@ -56,92 +246,105 @@
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     region_x3 = region * 3
 
-Make slices of equal size out of a region
-=========================================
+Split one region into N regions of equal size
+=============================================
 
 Divide by a positive integer:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    regions = regions / 5
+    import auditok
+    region = auditok.load("audio.wav")
+    regions = region / 5
     assert sum(regions) == region
 
-Make audio slices of arbitrary size
-===================================
+Note that if perfect division is not possible, the last region might be a bit
+shorter than the previous N-1 regions.
 
-Slicing an ``AudioRegion`` can be interesting in many situations. You can for
-example remove a fixed-size portion of audio data from the beginning or the end
-of a region or crop a region by an arbitrary amount as a data augmentation
+Slice a region by samples, seconds or milliseconds
+==================================================
+
+Slicing an `AudioRegion` can be interesting in many situations. You can for
+example remove a fixed-size portion of audio data from the beginning or from the
+end of a region or crop a region by an arbitrary amount as a data augmentation
 strategy, etc.
 
-The most accurate way to slice an ``AudioRegion`` is to use indices that
+The most accurate way to slice an `AudioRegion` is to use indices that
 directly refer to raw audio samples. In the following example, assuming that the
 sampling rate of audio data is 16000, you can extract a 5-second region from
 main region, starting from the 20th second as follows:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     start = 20 * 16000
     stop = 25 * 16000
     five_second_region = region[start:stop]
 
-This allows you to practically start and stop at any sample within the region.
+This allows you to practically start and stop at any audio sample of the region.
 Just as with a `list` you can omit one of `start` and `stop`, or both. You can
 also use negative indices:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     start = -3 * region.sr # `sr` is an alias of `sampling_rate`
     three_last_seconds = region[start:]
 
 While slicing by raw samples is accurate, slicing with temporal indices is more
-intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of
-``AudioRegion`` (or their shortcut alias ``ms`` and ``sec``/``s``).
+intuitive. You can do so by accessing the `millis` or `seconds` views of an
+`AudioRegion` (or their shortcut aliases `ms` and `sec` or `s`).
 
-With the ``millis`` view:
+With the `millis` view:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     five_second_region = region.millis[5000:10000]
 
-or with the ``seconds`` view:
+or with the `seconds` view:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     five_second_region = region.seconds[5:10]
 
-Get an array of audio samples
-=============================
+`seconds` indices can also be floats:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
+    five_second_region = region.seconds[2.5:7.5]
+
+Get arrays of audio samples
+===========================
+
+If `numpy` is not installed, the `samples` attribute is a list of audio sample
+arrays (standard `array.array` objects), one per channel. If `numpy` is installed,
+`samples` is a 2-D `numpy.ndarray` where the first dimension is the channel
+and the second is the sample.
+
+.. code:: python
+
+    import auditok
+    region = auditok.load("audio.wav")
     samples = region.samples
 
-If ``numpy`` is installed, this will return a ``numpy.ndarray``. If audio data
-is mono the returned array is 1D, otherwise it's 2D. If ``numpy`` is not
-installed this will return a standard ``array.array`` for mono data, and a list
-of ``array.array`` for multichannel data.
 
-Alternatively you can use:
+If `numpy` is installed, you can also use:
 
 .. code:: python
 
     import numpy as np
-    region = AudioRegion.load("audio.wav")
+    region = auditok.load("audio.wav")
     samples = np.asarray(region)
+    assert len(samples.shape) == 2
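The raw-data examples in the `doc/examples.rst` changes above rely on two conversions: a headerless buffer holds `sr * sw * ch` bytes per second of audio, and a time `t` in seconds maps to sample index `t * sr`. The sketch below spells out that arithmetic in plain Python. `raw_duration` and `seconds_to_sample_slice` are hypothetical helpers, not auditok functions, and rounding to the nearest sample is an assumption of this sketch (auditok's own handling of float `seconds` indices may differ).

```python
def raw_duration(data: bytes, sr: int, sw: int, ch: int) -> float:
    """Duration in seconds of headerless PCM data.

    sr: sampling rate in Hz, sw: sample width in bytes, ch: number of channels.
    One second of audio occupies sr * sw * ch bytes.
    """
    frame_size = sw * ch  # bytes per frame (one sample for every channel)
    if len(data) % frame_size != 0:
        raise ValueError("data length is not a whole number of frames")
    return len(data) / (sr * frame_size)


def seconds_to_sample_slice(start: float, stop: float, sr: int) -> slice:
    """Turn a time interval in seconds into a slice of sample indices.

    Rounds to the nearest sample (an assumption of this sketch).
    """
    return slice(round(start * sr), round(stop * sr))


# One second of silence at 16 kHz, 16-bit, mono (as in the `bytes` example)
sr, sw, ch = 16000, 2, 1
data = b"\0" * sr * sw * ch
print(raw_duration(data, sr, sw, ch))       # 1.0
print(seconds_to_sample_slice(20, 25, sr))  # slice(320000, 400000, None)
```

The second call reproduces the `start = 20 * 16000` and `stop = 25 * 16000` indices used in the sample-based slicing example.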
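The new text on `region / 5` says the parts concatenate back to the original region and that, when the length is not evenly divisible, the last part may be slightly shorter. A minimal sketch of one such split on a plain sequence of samples follows. `split_into_n` is hypothetical, and giving the remainder to the first parts (so part sizes differ by at most one sample) is this sketch's own choice, not necessarily how auditok sizes its sub-regions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_n(samples: Sequence[T], n: int) -> List[Sequence[T]]:
    """Split `samples` into `n` contiguous parts whose sizes differ by at most 1.

    The first `len(samples) % n` parts get one extra sample, so when the
    length is not divisible by `n` the trailing parts are one sample shorter.
    """
    if not isinstance(n, int) or n <= 0:
        raise ValueError("n must be a positive integer")
    base, extra = divmod(len(samples), n)
    parts = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        parts.append(samples[start:start + size])
        start += size
    return parts


samples = list(range(17))
parts = split_into_n(samples, 5)
print([len(p) for p in parts])  # [4, 4, 3, 3, 3]
# Concatenating the parts gives back the original data
assert sum(parts, []) == samples
```

With an evenly divisible length (e.g. 20 samples and `n=5`) every part has the same size.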
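According to the new "Get arrays of audio samples" text, without `numpy` the `samples` attribute is a list of `array.array` objects, one per channel. The sketch below shows how interleaved 16-bit PCM bytes can be turned into that layout with the standard `array` module. `deinterleave` is a hypothetical helper, not auditok's code, and it assumes little-endian, 16-bit signed samples (the usual layout of WAV data).

```python
import array
import struct
import sys
from typing import List


def deinterleave(data: bytes, channels: int) -> List[array.array]:
    """Split interleaved 16-bit little-endian PCM bytes into one array per channel."""
    if channels <= 0:
        raise ValueError("channels must be a positive integer")
    if array.array("h").itemsize != 2:
        raise RuntimeError("this sketch needs a 2-byte 'h' typecode")
    frame_size = 2 * channels  # bytes per frame: one 16-bit sample per channel
    if len(data) % frame_size != 0:
        raise ValueError("data length is not a whole number of frames")
    samples = array.array("h")
    samples.frombytes(data)
    if sys.byteorder == "big":
        # array.array uses the machine's byte order; the input is little-endian
        samples.byteswap()
    # Interleaved layout: [c0, c1, ..., c0, c1, ...]; channel i takes
    # every `channels`-th sample starting at index i.
    return [samples[i::channels] for i in range(channels)]


# Three stereo frames: left channel 1, 2, 3 and right channel -1, -2, -3
data = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
left, right = deinterleave(data, 2)
print(left.tolist(), right.tolist())  # [1, 2, 3] [-1, -2, -3]
```

Extended slicing (`samples[i::channels]`) keeps the result an `array.array`, which matches the "one array per channel" description.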
--- a/doc/index.rst Fri Feb 05 21:44:08 2021 +0100
+++ b/doc/index.rst Wed Feb 17 21:18:05 2021 +0100
@@ -1,11 +1,9 @@
-
-.. autosummary::
-    :toctree: generated/
-
-
+auditok, an AUDIo TOKenization tool
+===================================
 
 .. image:: https://travis-ci.org/amsehili/auditok.svg?branch=master
     :target: https://travis-ci.org/amsehili/auditok
+    :alt: Build Status
 
 .. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
     :target: http://auditok.readthedocs.org/en/latest/?badge=latest
@@ -13,7 +11,7 @@
 
 
 
-`auditok` is an **Audio Activity Detection** tool that can process online data
+``auditok`` is an **Audio Activity Detection** tool that can process online data
 (read from an audio device or from standard input) as well as audio files. It
 can be used as a command line program or by calling its API.
 
--- a/doc/installation.rst Fri Feb 05 21:44:08 2021 +0100
+++ b/doc/installation.rst Wed Feb 17 21:18:05 2021 +0100
@@ -1,22 +1,34 @@
 Installation
 ------------
 
-.. code:: bash
-
-    pip install auditok
-
-
 A basic version of ``auditok`` will run with standard Python (>=3.4). However,
 without installing additional dependencies, ``auditok`` can only deal with audio
 files in *wav* or *raw* formats. if you want more features, the following
 packages are needed:
 
-    - `pydub <https://github.com/jiaaro/pydub>`_ : read audio files in popular
-      audio formats (ogg, mp3, etc.) or extract audio from a video file.
-    - `pyaudio <http://people.csail.mit.edu/hubert/pyaudio/>`_ : read audio data
-      from the microphone and play back detections.
-    - `tqdm <https://github.com/tqdm/tqdm>`_ : show progress bar while playing
-      audio clips.
-    - `matplotlib <http://matplotlib.org/>`_ : plot audio signal and detections.
-    - `numpy <http://www.numpy.org>`_ : required by matplotlib. Also used for
-      some math operations instead of standard python if available.
+- `pydub <https://github.com/jiaaro/pydub>`_ : read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.
+- `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_ : read audio data from the microphone and play audio back.
+- `tqdm <https://github.com/tqdm/tqdm>`_ : show progress bar while playing audio clips.
+- `matplotlib <https://matplotlib.org/stable/index.html>`_ : plot audio signal and detections.
+- `numpy <https://numpy.org/>`_ : required by matplotlib. Also used for some math operations instead of standard python if available.
+
+
+Install the latest stable version with pip:
+
+.. code:: bash
+
+    sudo pip install auditok
+
+Install with the latest development version from github:
+
+.. code:: bash
+
+    pip install git+https://github.com/amsehili/auditok
+
+or
+
+.. code:: bash
+
+    git clone https://github.com/amsehili/auditok.git
+    cd auditok
+    python setup.py install
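Because each optional package in the installation list above unlocks a specific feature, it can help to check which ones are importable before relying on, say, microphone input or plotting. A small sketch using only the standard library (`importlib.util.find_spec`); `optional_features` and `OPTIONAL_PACKAGES` are hypothetical names, not part of auditok, and the feature descriptions are paraphrased from the list.

```python
import importlib.util
from typing import Dict

# Import names of auditok's optional dependencies, with the feature each one
# enables according to the dependency list above
OPTIONAL_PACKAGES = {
    "pydub": "read ogg, mp3, etc. or extract audio from a video file",
    "pyaudio": "read from the microphone and play audio back",
    "tqdm": "show a progress bar while playing audio",
    "matplotlib": "plot audio signal and detections",
    "numpy": "required by matplotlib; used for some math operations",
}


def optional_features() -> Dict[str, bool]:
    """Map each optional package to whether it can be imported.

    find_spec() locates a top-level package without importing it.
    """
    return {
        name: importlib.util.find_spec(name) is not None
        for name in OPTIONAL_PACKAGES
    }


for name, installed in optional_features().items():
    status = "installed" if installed else "missing"
    print("{:<11} {:<9} {}".format(name, status, OPTIONAL_PACKAGES[name]))
```

The printed report depends on the local environment, so no expected output is shown here.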