# HG changeset patch
# User Amine Sehili
# Date 1611263987 -3600
# Node ID 2e26a7c5f30022cf65f64e773238d1b72be95ebf
# Parent  562b7a64b7eae3d1d56e22625f6801d5846d4beb
Update Readme

diff -r 562b7a64b7ea -r 2e26a7c5f300 .travis.yml
--- a/.travis.yml	Thu Jan 21 20:08:14 2021 +0100
+++ b/.travis.yml	Thu Jan 21 22:19:47 2021 +0100
@@ -4,7 +4,7 @@
   - pip install numpy
   - pip install genty
   - pip install pydub
-  - pip install matplotlib==3.2.2
+  - pip install "matplotlib<=3.2.1"
 language: python
 python:
   - "3.4"
diff -r 562b7a64b7ea -r 2e26a7c5f300 README.rst
--- a/README.rst	Thu Jan 21 20:08:14 2021 +0100
+++ b/README.rst	Thu Jan 21 22:19:47 2021 +0100
@@ -1,5 +1,3 @@
-
-
 .. image:: doc/figures/auditok-logo.png
     :align: center

@@ -10,23 +8,32 @@
     :target: http://auditok.readthedocs.org/en/latest/?badge=latest
     :alt: Documentation Status

-``auditok`` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program or by calling its API.
+``auditok`` is an **Audio Activity Detection** tool that can process online data
+(read from an audio device or from standard input) as well as audio files.
+It can be used as a command line program or by calling its API.

-A basic version of ``auditok`` will run with standard Python (>=3.4). Without installing additional dependencies, ``auditok`` can only deal with audio files in *wav* or *raw* formats. if you want more features, the following packages are needed:
+The latest version of the documentation can be found on
+`readthedocs. `_

-- `pydub `_ : read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.
-
-- `pyaudio `_ : read audio data from the microphone and play back detections.
-
-- `tqdm `_ : show progress bar while playing audio clips.
-
-- `matplotlib `_ : plot audio signal and detections.
-
-- `numpy `_ : required by matplotlib. Also used for some math operations instead of standard python if available.

 Installation
 ------------

+A basic version of ``auditok`` will run with standard Python (>=3.4). However,
+without installing additional dependencies, ``auditok`` can only deal with audio
+files in *wav* or *raw* formats. If you want more features, the following
+packages are needed:
+
+    - `pydub `_ : read audio files in popular
+      audio formats (ogg, mp3, etc.) or extract audio from a video file.
+    - `pyaudio `_ : read audio data
+      from the microphone and play audio back.
+    - `tqdm `_ : show a progress bar while playing
+      audio clips.
+    - `matplotlib `_ : plot audio signal and detections.
+    - `numpy `_ : required by matplotlib. Also used for
+      some math operations instead of standard Python if available.
+
 .. code:: bash

     pip install git+https://github.com/amsehili/auditok
@@ -37,136 +44,45 @@
 .. code:: python

-    from auditok import split
+    import auditok

     # split returns a generator of AudioRegion objects
-    audio_regions = split("audio.wav")
+    audio_regions = auditok.split(
+        "audio.wav",
+        min_dur=0.2, # minimum duration of a valid audio event in seconds
+        max_dur=4, # maximum duration of an event
+        max_silence=0.3, # maximum duration of tolerated continuous silence within an event
+        energy_threshold=55 # threshold of detection
+    )
     for region in audio_regions:
         region.play(progress_bar=True)
         filename = region.save("/tmp/region_{meta.start:.3f}.wav")
         print("region saved as: {}".format(filename))

-Example using `AudioRegion`
----------------------------
+Split and plot
+--------------

 .. code:: python

-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav") # returns an AudioRegion object
     regions = region.split_and_plot() # or just region.splitp()

 output figure:

 .. image:: doc/figures/example_1.png

-Working with AudioRegions
--------------------------
+Limitations
+-----------

-Beyond splitting, there are a couple of interesting operations you can do with ``AudioRegion`` objects.
-
-Concatenate regions
-===================
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region_1 = AudioRegion.load("audio_1.wav")
-    region_2 = AudioRegion.load("audio_2.wav")
-    region_3 = region_1 + region_2
-
-Particularly useful if you want to join regions returned by ``split``:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    regions = AudioRegion.load("audio.wav").split()
-    gapless_region = sum(regions)
-
-Repeat a region
-===============
-
-Multiply by a positive integer:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    region_x3 = region * 3
-
-Make slices of equal size out of a region
-=========================================
-
-Divide by a positive integer:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    regions = regions / 5
-    assert sum(regions) == region
-
-Make audio slices of arbitrary size
-===================================
-
-Slicing an ``AudioRegion`` can be interesting in many situations. You can for example remove a fixed-size portion of audio data from the beginning or the end of a region or crop a region by an arbitrary amount as a data augmentation strategy, etc.
-
-The most accurate way to slice an ``AudioRegion`` is to use indices that directly refer to raw audio samples. In the following example, assuming that the sampling rate of audio data is 16000, you can extract a 5-second region from main region, starting from the 20th second as follows:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    start = 20 * 16000
-    stop = 25 * 16000
-    five_second_region = region[start:stop]
-
-This allows you to practically start and stop at any sample within the region. Just as with a `list` you can omit one of `start` and `stop`, or both. You can also use negative indices:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
-    three_last_seconds = region[start:]
-
-While slicing by raw samples is accurate, slicing with temporal indices is more intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of ``AudioRegion`` (or their shortcut alias ``ms`` and ``sec``/``s``).
-
-With the ``millis`` view:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    five_second_region = region.millis[5000:10000]
-
-or with the ``seconds`` view
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    five_second_region = region.seconds[5:10]
-
-Get an array of audio samples
-=============================
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    samples = region.samples
-
-If ``numpy`` is installed, this will return a ``numpy.ndarray``. If audio data is mono the returned array is 1D, otherwise it's 2D. If ``numpy`` is not installed this will return a standard ``array.array`` for mono data, and a list of ``array.array`` for multichannel data.
-
-Alternatively you can use:
-
-.. code:: python
-
-    import numpy as np
-    region = AudioRegion.load("audio.wav")
-    samples = np.asarray(region)
+Currently, the core detection algorithm is based on the energy of the audio signal.
+While this is fast and works very well for audio streams with low background
+noise (e.g., podcasts with few people talking, language lessons, audio recorded
+in a rather quiet environment, etc.), the performance can drop as the level of
+noise increases. Furthermore, the algorithm makes no distinction between speech
+and other kinds of sounds, so you shouldn't use it for Voice Activity Detection
+if your audio data might contain non-speech events.

 License
 -------

 MIT.
-