
--- a/.travis.yml	Thu Jan 21 20:08:14 2021 +0100
+++ b/.travis.yml	Thu Jan 21 22:19:47 2021 +0100
@@ -4,7 +4,7 @@
   - pip install numpy
   - pip install genty
   - pip install pydub
-  - pip install matplotlib==3.2.2
+  - pip install "matplotlib<=3.2.1"
 language: python
 python:
   - "3.4"
--- a/README.rst	Thu Jan 21 20:08:14 2021 +0100
+++ b/README.rst	Thu Jan 21 22:19:47 2021 +0100
@@ -1,5 +1,3 @@
-
-
 .. image:: doc/figures/auditok-logo.png
     :align: center

@@ -10,23 +8,32 @@
     :target: http://auditok.readthedocs.org/en/latest/?badge=latest
     :alt: Documentation Status

-``auditok`` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program or by calling its API.
+``auditok`` is an **Audio Activity Detection** tool that can process online data
+(read from an audio device or from standard input) as well as audio files.
+It can be used as a command line program or by calling its API.

-A basic version of ``auditok`` will run with standard Python (>=3.4). Without installing additional dependencies, ``auditok`` can only deal with audio files in *wav* or *raw* formats. if you want more features, the following packages are needed:
+The latest version of the documentation can be found on
+`readthedocs <http://auditok.readthedocs.org/en/latest/>`_.

-- `pydub <https://github.com/jiaaro/pydub>`_ : read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.
-
-- `pyaudio <http://people.csail.mit.edu/hubert/pyaudio/>`_ : read audio data from the microphone and play back detections.
-
-- `tqdm <https://github.com/tqdm/tqdm>`_ : show progress bar while playing audio clips.
-
-- `matplotlib <http://matplotlib.org/>`_ : plot audio signal and detections.
-
-- `numpy <http://www.numpy.org>`_ : required by matplotlib. Also used for some math operations instead of standard python if available.

 Installation
 ------------

+A basic version of ``auditok`` will run with standard Python (>=3.4). However,
+without installing additional dependencies, ``auditok`` can only deal with audio
+files in *wav* or *raw* formats. If you want more features, the following
+packages are needed:
+
+    - `pydub <https://github.com/jiaaro/pydub>`_ : read audio files in popular
+      audio formats (ogg, mp3, etc.) or extract audio from a video file.
+    - `pyaudio <http://people.csail.mit.edu/hubert/pyaudio/>`_ : read audio data
+      from the microphone and play audio back.
+    - `tqdm <https://github.com/tqdm/tqdm>`_ : show progress bar while playing
+      audio clips.
+    - `matplotlib <http://matplotlib.org/>`_ : plot audio signal and detections.
+    - `numpy <http://www.numpy.org>`_ : required by matplotlib. Also used for
+      some math operations instead of standard Python if available.
+
 .. code:: bash

     pip install git+https://github.com/amsehili/auditok
@@ -37,136 +44,45 @@

 .. code:: python

-    from auditok import split
+    import auditok

     # split returns a generator of AudioRegion objects
-    audio_regions = split("audio.wav")
+    audio_regions = auditok.split(
+        "audio.wav",
+        min_dur=0.2,     # minimum duration of a valid audio event in seconds
+        max_dur=4,       # maximum duration of an event
+        max_silence=0.3, # maximum duration of tolerated continuous silence within an event
+        energy_threshold=55 # threshold of detection
+    )
     for region in audio_regions:
         region.play(progress_bar=True)
         filename = region.save("/tmp/region_{meta.start:.3f}.wav")
         print("region saved as: {}".format(filename))

-Example using `AudioRegion`
----------------------------
+Split and plot
+--------------

 .. code:: python

-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav") # returns an AudioRegion object
     regions = region.split_and_plot() # or just region.splitp()

 output figure:

 .. image:: doc/figures/example_1.png

-Working with AudioRegions
--------------------------
+Limitations
+-----------

-Beyond splitting, there are a couple of interesting operations you can do with ``AudioRegion`` objects.
-
-Concatenate regions
-===================
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region_1 = AudioRegion.load("audio_1.wav")
-    region_2 = AudioRegion.load("audio_2.wav")
-    region_3 = region_1 + region_2
-
-Particularly useful if you want to join regions returned by ``split``:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    regions = AudioRegion.load("audio.wav").split()
-    gapless_region = sum(regions)
-
-Repeat a region
-===============
-
-Multiply by a positive integer:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    region_x3 = region * 3
-
-Make slices of equal size out of a region
-=========================================
-
-Divide by a positive integer:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    regions = regions / 5
-    assert sum(regions) == region
-
-Make audio slices of arbitrary size
-===================================
-
-Slicing an ``AudioRegion`` can be interesting in many situations. You can for example remove a fixed-size portion of audio data from the beginning or the end of a region or crop a region by an arbitrary amount as a data augmentation strategy, etc.
-
-The most accurate way to slice an ``AudioRegion`` is to use indices that directly refer to raw audio samples. In the following example, assuming that the sampling rate of audio data is 16000, you can extract a 5-second region from main region, starting from the 20th second as follows:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    start = 20 * 16000
-    stop = 25 * 16000
-    five_second_region = region[start:stop]
-
-This allows you to practically start and stop at any sample within the region. Just as with a `list` you can omit one of `start` and `stop`, or both. You can also use negative indices:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
-    three_last_seconds = region[start:]
-
-While slicing by raw samples is accurate, slicing with temporal indices is more intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of ``AudioRegion`` (or their shortcut alias ``ms`` and ``sec``/``s``).
-
-With the ``millis`` view:
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    five_second_region = region.millis[5000:10000]
-
-or with the ``seconds`` view
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    five_second_region = region.seconds[5:10]
-
-Get an array of audio samples
-=============================
-
-.. code:: python
-
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    samples = region.samples
-
-If ``numpy`` is installed, this will return a ``numpy.ndarray``. If audio data is mono the returned array is 1D, otherwise it's 2D. If ``numpy`` is not installed this will return a standard ``array.array`` for mono data, and a list of ``array.array`` for multichannel data.
-
-Alternatively you can use:
-
-.. code:: python
-
-    import numpy as np
-    region = AudioRegion.load("audio.wav")
-    samples = np.asarray(region)
+Currently, the core detection algorithm is based on the energy of audio signal.
+While this is fast and works very well for audio streams with low background
+noise (e.g., podcasts with few people talking, language lessons, audio recorded
+in a rather quiet environment, etc.) the performance can drop as the level of
+noise increases. Furthermore, the algorithm makes no distinction between speech
+and other kinds of sounds, so you shouldn't use it for Voice Activity Detection
+if your audio data might contain non-speech events.

 License
 -------
 MIT.
-
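
The "Limitations" paragraph added to the README in this changeset says that detection is based on signal energy. As a rough illustration of why that is fast but sensitive to noise, here is a minimal pure-Python sketch of energy-threshold detection. It is not auditok's implementation: the function names ``frame_energy_db`` and ``detect_events``, the 50 ms frame size, and the log-energy formula are assumptions made for this example. Only the parameter names ``min_dur``, ``max_silence`` and ``energy_threshold`` are borrowed from the ``auditok.split`` call shown in the diff, and ``max_dur`` is left out for brevity.

```python
import math


def frame_energy_db(frame):
    """Return 10 * log10 of the mean squared amplitude of a frame (assumed formula)."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10 * math.log10(max(mean_square, 1e-10))  # floor avoids log10(0)


def detect_events(samples, sampling_rate, frame_dur=0.05,
                  energy_threshold=55, min_dur=0.2, max_silence=0.3):
    """Return (start, end) times in seconds of events above the energy threshold.

    Hypothetical helper for illustration only, not part of auditok.
    """
    frame_len = int(sampling_rate * frame_dur)
    max_silent_frames = round(max_silence / frame_dur)
    events = []
    start = None        # index of the first frame of the current event
    last_active = None  # index of the most recent frame above the threshold

    def close_event(first, last):
        begin = first * frame_dur
        end = (last + 1) * frame_dur
        if end - begin >= min_dur:  # drop events that are too short
            events.append((begin, end))

    for i in range(len(samples) // frame_len):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if frame_energy_db(frame) >= energy_threshold:
            if start is None:
                start = i
            last_active = i
        elif start is not None and i - last_active > max_silent_frames:
            # Silence lasted longer than tolerated: the event is over.
            close_event(start, last_active)
            start = None
    if start is not None:
        close_event(start, last_active)
    return events


# Synthetic signal at 1000 Hz: 1 s of silence, 1 s of a loud square wave, 1 s of silence.
silence = [0] * 1000
loud = [10000 if n % 2 == 0 else -10000 for n in range(1000)]
events = detect_events(silence + loud + silence, sampling_rate=1000)
```

Because the decision depends only on frame energy, any sufficiently loud sound (speech, music, or a door slam) is detected in the same way, and background noise above the threshold keeps an event open. This is the behaviour the Limitations paragraph warns about.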