changeset 349:1076056833c5

Update README.rst
author Amine SEHILI <amsehili@users.noreply.github.com>
date Wed, 22 Jan 2020 23:21:44 +0100
parents 0ae857447ca0
children 4f8e660641d6 3d6e4d8f6903
files README.rst
diffstat 1 files changed, 138 insertions(+), 7 deletions(-)
--- a/README.rst	Mon Nov 11 18:53:44 2019 +0100
+++ b/README.rst	Wed Jan 22 23:21:44 2020 +0100
@@ -1,10 +1,8 @@
+
 
 .. image:: doc/figures/auditok-logo.png
     :align: center
 
-``auditok`` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy to use API.
-
-
 .. image:: https://travis-ci.org/amsehili/auditok.svg?branch=master
     :target: https://travis-ci.org/amsehili/auditok
 
@@ -12,6 +10,29 @@
     :target: http://auditok.readthedocs.org/en/latest/?badge=latest
     :alt: Documentation Status
 
+``auditok`` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program or by calling its API.
+
+A basic version of ``auditok`` will run with standard Python (>=3.4). Without installing additional dependencies, ``auditok`` can only deal with audio files in *wav* or *raw* formats. If you want more features, the following packages are needed:
+
+- `pydub <https://github.com/jiaaro/pydub>`_ : read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.
+
+- `pyaudio <http://people.csail.mit.edu/hubert/pyaudio/>`_ : read audio data from the microphone and play back detections.
+
+- `tqdm <https://github.com/tqdm/tqdm>`_ : show progress bar while playing audio clips.
+
+- `matplotlib <http://matplotlib.org/>`_ : plot audio signal and detections (see figures above).
+
+- `numpy <http://www.numpy.org>`_ : required by matplotlib. Also used for some math operations instead of standard Python if available.
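
To see which of the optional features listed above are available in the current environment, each package can be probed with the standard library. This is a minimal sketch and not part of ``auditok``: the helper name ``available_extras`` is hypothetical, and the import names are assumed to match the package names in the list.

```python
import importlib.util

# Optional packages from the list above, mapped to the feature they enable.
OPTIONAL_PACKAGES = {
    "pydub": "read compressed audio formats and audio from video files",
    "pyaudio": "read from the microphone and play back detections",
    "tqdm": "progress bar while playing audio clips",
    "matplotlib": "plot the audio signal and detections",
    "numpy": "faster math operations, required by matplotlib",
}


def available_extras(packages=OPTIONAL_PACKAGES):
    """Return {package_name: bool} telling whether each package is importable."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }


if __name__ == "__main__":
    for name, found in available_extras().items():
        status = "installed" if found else "missing"
        print("{:<12} {:<10} {}".format(name, status, OPTIONAL_PACKAGES[name]))
```

Probing with ``find_spec`` avoids actually importing heavy packages such as matplotlib just to check for their presence.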
+
+Installation
+------------
+
+.. code:: bash
+
+    git clone https://github.com/amsehili/auditok.git
+    cd auditok
+    python setup.py install
+
 Basic example
 -------------
 
@@ -26,9 +47,8 @@
         filename = region.save("/tmp/region_{meta.start:.3f}.wav")
         print("region saved as: {}".format(filename))
 
-
 Example using `AudioRegion`
---------------------------
+---------------------------
 
 .. code:: python
 
@@ -36,7 +56,118 @@
     region = AudioRegion.load("audio.wav")
     regions = region.split_and_plot() # or just region.splitp()
 
-
-ouptut figure:
+output figure:
 
 .. image:: doc/figures/example_1.png
+
+Working with AudioRegions
+-------------------------
+
+Beyond splitting, there are a couple of interesting operations you can do with ``AudioRegion`` objects.
+
+Concatenate regions
+===================
+
+.. code:: python
+
+    from auditok import AudioRegion
+    region_1 = AudioRegion.load("audio_1.wav")
+    region_2 = AudioRegion.load("audio_2.wav")
+    region_3 = region_1 + region_2
+
+This is particularly useful if you want to join regions returned by ``split``:
+
+.. code:: python
+
+    from auditok import AudioRegion
+    regions = AudioRegion.load("audio.wav").split()
+    gapless_region = sum(regions)
+
+Repeat a region
+===============
+
+Multiply by a positive integer:
+
+.. code:: python
+
+    from auditok import AudioRegion
+    region = AudioRegion.load("audio.wav")
+    region_x3 = region * 3
+
+Make slices of equal size out of a region
+=========================================
+
+Divide by a positive integer:
+
+.. code:: python
+
+    from auditok import AudioRegion
+    region = AudioRegion.load("audio.wav")
+    regions = region / 5
+    assert sum(regions) == region
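
The operator semantics above (``+``, ``sum``, ``*`` and ``/``) can be illustrated without any audio file. The sketch below uses a hypothetical ``Clip`` class that wraps raw bytes; it is not ``auditok``'s implementation. It shows one common way to make ``sum`` work, namely accepting the integer ``0`` that ``sum`` starts from. How ``auditok`` handles a length that is not divisible by the divisor is not stated here, so this sketch simply rejects that case.

```python
class Clip:
    """Hypothetical stand-in for an audio region: wraps raw bytes."""

    def __init__(self, data):
        self.data = bytes(data)

    def __eq__(self, other):
        return isinstance(other, Clip) and self.data == other.data

    def __len__(self):
        return len(self.data)

    def __add__(self, other):
        # Concatenation: clip_1 + clip_2
        if not isinstance(other, Clip):
            return NotImplemented
        return Clip(self.data + other.data)

    def __radd__(self, other):
        # sum() starts from the integer 0, so 0 + clip must return the clip.
        if other == 0:
            return self
        return NotImplemented

    def __mul__(self, n):
        # Repetition: clip * 3
        if not isinstance(n, int) or n <= 0:
            raise TypeError("multiplier must be a positive integer")
        return Clip(self.data * n)

    def __truediv__(self, n):
        # Equal-size slices: clip / 5 returns a list of 5 clips.
        # Assumption: lengths that are not divisible by n are rejected.
        if not isinstance(n, int) or n <= 0:
            raise TypeError("divisor must be a positive integer")
        size, remainder = divmod(len(self.data), n)
        if remainder:
            raise ValueError("length is not divisible by {}".format(n))
        return [Clip(self.data[i * size:(i + 1) * size]) for i in range(n)]


clip = Clip(b"abcdefghij")
parts = clip / 5                 # five clips of two bytes each
assert sum(parts) == clip        # joining the slices restores the original
assert (clip * 3).data == b"abcdefghij" * 3
```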
+
+Make audio slices of arbitrary size
+===================================
+
+Slicing an ``AudioRegion`` can be useful in many situations. For example, you can remove a fixed-size portion of audio data from the beginning or the end of a region, or crop a region by an arbitrary amount as a data augmentation strategy.
+
+The most accurate way to slice an ``AudioRegion`` is to use indices that directly refer to raw audio samples. In the following example, assuming that the sampling rate of audio data is 16000 Hz, you can extract a 5-second region from the main region, starting from the 20th second, as follows:
+
+.. code:: python
+
+    from auditok import AudioRegion
+    region = AudioRegion.load("audio.wav")
+    start = 20 * 16000
+    stop = 25 * 16000
+    five_second_region = region[start:stop]
+
+This lets you start and stop at any sample within the region. Just as with a ``list``, you can omit ``start``, ``stop``, or both. You can also use negative indices:
+
+.. code:: python
+
+    from auditok import AudioRegion
+    region = AudioRegion.load("audio.wav")
+    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
+    three_last_seconds = region[start:]
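
The index arithmetic used in the two examples above can be checked on its own. In this sketch a plain ``range`` stands in for 30 seconds of mono samples at 16000 Hz, and ``seconds_to_samples`` is a hypothetical helper, not an ``auditok`` function.

```python
def seconds_to_samples(seconds, sampling_rate):
    """Convert a time offset in seconds to a sample index (may be negative)."""
    return int(round(seconds * sampling_rate))


sampling_rate = 16000
# Stand-in for 30 seconds of mono audio: one integer per sample.
samples = range(30 * sampling_rate)

start = seconds_to_samples(20, sampling_rate)  # 320000
stop = seconds_to_samples(25, sampling_rate)   # 400000
five_seconds = samples[start:stop]

# A negative start counts back from the end, as with a list.
last_three = samples[seconds_to_samples(-3, sampling_rate):]

print(len(five_seconds) / sampling_rate)  # 5.0
print(len(last_three) / sampling_rate)    # 3.0
```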
+
+While slicing by raw samples is accurate, slicing with temporal indices is more intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of ``AudioRegion`` (or their shortcut aliases ``ms`` and ``sec``/``s``).
+
+With the ``millis`` view:
+
+.. code:: python
+
+    from auditok import AudioRegion
+    region = AudioRegion.load("audio.wav")
+    five_second_region = region.millis[5000:10000]
+
+or with the ``seconds`` view:
+
+.. code:: python
+
+    from auditok import AudioRegion
+    region = AudioRegion.load("audio.wav")
+    five_second_region = region.seconds[5:10]
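
Conceptually, a time-based view only has to translate time offsets into sample indices before slicing. The sketch below shows that idea with a hypothetical ``TimeView`` class over a plain sequence; it is an illustration under that assumption, not how ``auditok`` implements ``millis`` or ``seconds``.

```python
class TimeView:
    """Hypothetical view that slices a sequence with time-based indices."""

    def __init__(self, samples, sampling_rate, units_per_second):
        self._samples = samples
        self._sampling_rate = sampling_rate
        # 1 for a seconds view, 1000 for a milliseconds view.
        self._units_per_second = units_per_second

    def _to_sample(self, t):
        # None means "omitted", exactly as in a regular slice.
        if t is None:
            return None
        return int(round(t * self._sampling_rate / self._units_per_second))

    def __getitem__(self, key):
        if not isinstance(key, slice) or key.step is not None:
            raise TypeError("only [start:stop] slices are supported")
        start = self._to_sample(key.start)
        stop = self._to_sample(key.stop)
        return self._samples[start:stop]


sr = 16000
samples = range(30 * sr)  # stand-in for 30 seconds of mono samples
millis = TimeView(samples, sr, 1000)
seconds = TimeView(samples, sr, 1)

# Both views select the same samples as the raw-index slice.
assert millis[5000:10000] == seconds[5:10] == samples[5 * sr:10 * sr]
```

Because the conversion rounds to the nearest sample, a time-based slice can be off by a fraction of a sample period compared with raw indices, which is why raw-sample slicing is described as the most accurate option.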
+
+Get an array of audio samples
+=============================
+
+.. code:: python
+
+    from auditok import AudioRegion
+    region = AudioRegion.load("audio.wav")
+    samples = region.samples
+
+If ``numpy`` is installed, this will return a ``numpy.ndarray``. If audio data is mono, the returned array is 1D; otherwise it is 2D. If ``numpy`` is not installed, this will return a standard ``array.array`` for mono data, and a list of ``array.array`` for multichannel data.
+
+Alternatively you can use:
+
+.. code:: python
+
+    import numpy as np
+    from auditok import AudioRegion
+    region = AudioRegion.load("audio.wav")
+    samples = np.asarray(region)
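
To make the non-numpy return shapes described above concrete, the following sketch splits interleaved PCM bytes into one ``array.array`` per channel using only the standard library. The function name ``samples_without_numpy`` is hypothetical, the data is assumed to be 16-bit in native byte order, and this is not ``auditok``'s own code.

```python
import array


def samples_without_numpy(raw, channels):
    """Split interleaved 16-bit PCM bytes into array.array objects.

    Returns a single array for mono data and a list of arrays (one per
    channel) for multichannel data. Assumes native byte order; WAV data
    is little-endian, so a big-endian machine would need a byteswap().
    """
    interleaved = array.array("h")
    interleaved.frombytes(raw)
    if channels == 1:
        return interleaved
    # Channel ch owns every `channels`-th sample, starting at offset ch.
    return [interleaved[ch::channels] for ch in range(channels)]


# Three stereo frames: left samples 1, 2, 3 and right samples -1, -2, -3.
stereo = array.array("h", [1, -1, 2, -2, 3, -3]).tobytes()
left, right = samples_without_numpy(stereo, 2)
print(left.tolist(), right.tolist())  # [1, 2, 3] [-1, -2, -3]
```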
+
+License
+-------
+MIT.
+