--- a/auditok/cmdline.py	Wed Feb 17 22:44:23 2021 +0100
+++ b/auditok/cmdline.py	Wed Feb 17 19:22:18 2021 +0100
@@ -11,7 +11,7 @@
 @copyright:  2015-2021 Mohamed El Amine SEHILI
 @license:    MIT
 @contact:    amine.sehili@gmail.com
-@deffield    updated: 17 Feb 2021
+@deffield    updated: 18 Feb 2021
 """

 import sys
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/command_line_usage.rst	Wed Feb 17 19:22:18 2021 +0100
@@ -0,0 +1,193 @@
+``auditok`` can also be used from the command line. For a description of all
+available parameters, type:
+
+
+.. code:: bash
+
+    auditok -h
+
+In the following, we'll go through a few examples that cover most use cases.
+
+
+Read and split audio data online
+--------------------------------
+
+To try ``auditok`` from the command line with your voice, you should either
+install `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_ so that ``auditok``
+can directly read data from the microphone, or record data with an external program
+(e.g., `sox`) and redirect its output to ``auditok``.
+
+Read data from the microphone (`pyaudio` installed):
+
+.. code:: bash
+
+    auditok
+
+This will print the *id*, *start time* and *end time* of each detected audio
+event. Note that we didn't pass any additional arguments to the previous command,
+so ``auditok`` will use default values. The most important arguments are:
+
+
+- ``-n``, ``--min-duration`` : minimum duration of a valid audio event in seconds, default: 0.2
+- ``-m``, ``--max-duration`` : maximum duration of a valid audio event in seconds, default: 5
+- ``-s``, ``--max-silence`` : maximum duration of continuous silence within a valid audio event in seconds, default: 0.3
+- ``-e``, ``--energy-threshold`` : energy threshold for detection, default: 50
+
+
+Read audio data with an external program
+----------------------------------------
+
+If you don't have `pyaudio`, you can use `sox` for data acquisition
+(`sudo apt-get install sox`) and make ``auditok`` read data from standard input:
+
+.. code:: bash
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -r 16000 -w 2 -c 1
+
+Note that when data is read from standard input, the same audio parameters must
+be used for both `sox` (or any other data generation/acquisition tool) and
+``auditok``. The following table summarizes audio parameters.
+
+
++-----------------+------------+------------------+-----------------------+
+| Audio parameter | sox option | `auditok` option | `auditok` default     |
++=================+============+==================+=======================+
+| Sampling rate   | -r         | -r               |                 16000 |
++-----------------+------------+------------------+-----------------------+
+| Sample width    | -b (bits)  | -w (bytes)       |                     2 |
++-----------------+------------+------------------+-----------------------+
+| Channels        | -c         | -c               |                     1 |
++-----------------+------------+------------------+-----------------------+
+| Encoding        | -e         | NA               | always a signed int   |
++-----------------+------------+------------------+-----------------------+
+
+According to this table, the previous command can be run with the default
+parameters as:
+
+.. code:: bash
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -
+
+Play back audio detections
+--------------------------
+
+Use the ``-E`` option (for echo):
+
+.. code:: bash
+
+    auditok -E
+    # or
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -E
+
+The second command works without further arguments because data is recorded with
+``auditok``'s default audio parameters. If one of the parameters is not at its
+default value, you should specify it alongside ``-E``.
+
+
+
+Using ``-E`` requires `pyaudio`. If it's not installed, you can use the ``-C``
+option (which runs an external command with each detected audio event as argument):
+
+.. code:: bash
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -C "play -q {file}"
+
+Using the ``-C`` option, ``auditok`` will save a detected event to a temporary wav
+file, fill the ``{file}`` placeholder with the temporary name and run the
+command. In the above example we used ``-C`` to play audio data with an external
+program but you can use it to run any other command.
+
+
+Print out detection information
+-------------------------------
+
+By default ``auditok`` prints out the **id**, the **start** and the **end** of
+each detected audio event. The latter two values represent the absolute position
+of the event within the input stream (file or microphone) in seconds. The following
+listing is an example output with the default format:
+
+.. code:: bash
+
+    1 1.160 2.390
+    2 3.420 4.330
+    3 5.010 5.720
+    4 7.230 7.800
+
+The format of the output is controlled by the ``--printf`` option. Alongside
+``{id}``, ``{start}`` and ``{end}`` placeholders, you can use ``{duration}`` and
+``{timestamp}`` (system timestamp of detected event) placeholders.
+
+Using the following format for example:
+
+.. code:: bash
+
+    auditok audio.wav  --printf "{id}: [{timestamp}] start:{start}, end:{end}, dur: {duration}"
+
+the output would be something like:
+
+.. code:: bash
+
+    1: [2021/02/17 20:16:02] start:1.160, end:2.390, dur: 1.230
+    2: [2021/02/17 20:16:04] start:3.420, end:4.330, dur: 0.910
+    3: [2021/02/17 20:16:06] start:5.010, end:5.720, dur: 0.710
+    4: [2021/02/17 20:16:08] start:7.230, end:7.800, dur: 0.570
+
+
+The format of ``{timestamp}`` is controlled by ``--timestamp-format`` (default:
+`"%Y/%m/%d %H:%M:%S"`) whereas that of ``{start}``, ``{end}`` and ``{duration}``
+by ``--time-format`` (default: `%S`, absolute number of seconds). A more detailed
+format with ``--time-format`` using `%h` (hours), `%m` (minutes), `%s` (seconds)
+and `%i` (milliseconds) directives is possible (e.g., "%h:%m:%s.%i").
+
+To completely disable printing detection information use ``-q``.
+
+Save detections
+---------------
+
+You can save audio events to disk as they're detected using ``-o`` or
+``--save-detections-as``. To get a unique file name for each event, you can use
+``{id}``, ``{start}``, ``{end}`` and ``{duration}`` placeholders. Example:
+
+
+.. code:: bash
+
+    auditok --save-detections-as "{id}_{start}_{end}.wav"
+
+When using ``{start}``, ``{end}`` and ``{duration}`` placeholders, it's
+recommended that the number of decimals of the corresponding values be limited
+to 3. You can use something like:
+
+.. code:: bash
+
+    auditok -o "{id}_{start:.3f}_{end:.3f}.wav"
+
+
+Save whole audio stream
+-----------------------
+
+When reading audio data from the microphone, you will most likely want to save it
+to disk. For this you can use the ``-O`` or ``--save-stream`` option.
+
+.. code:: bash
+
+    auditok --save-stream "stream.wav"
+
+Note that this also works when you read data from a file on disk.
+
+
+Plot detections
+---------------
+
+Audio signal and detections can be plotted using the ``-p`` or ``--plot`` option.
+You can also save the plot to disk using ``--save-image``. The following example
+does both:
+
+.. code:: bash
+
+    auditok -p --save-image "plot.png" # can also be 'pdf' or another image format
+
+Example output:
+
+.. image:: figures/example_1.png
+
+Plotting requires `matplotlib <https://matplotlib.org/stable/index.html>`_.
--- a/doc/examples.rst	Wed Feb 17 22:44:23 2021 +0100
+++ b/doc/examples.rst	Wed Feb 17 19:22:18 2021 +0100
@@ -1,10 +1,13 @@
 Loading audio data
 ------------------

+Audio data is loaded with the :func:`load` function, which can read from audio
+files or the microphone, or use raw audio data.
+
 From a file
 ===========

-If the first argument of `load` is a string, it should be a path to an audio
+If the first argument of :func:`load` is a string, it should be a path to an audio
 file.

 .. code:: python
@@ -21,14 +24,15 @@

     region = auditok.load("audio.dat",
                           audio_format="raw",
-                          sr=44100,
-                          sw=2
-                          ch=1)
+                          sr=44100, # alias for `sampling_rate`
+                          sw=2,     # alias for `sample_width`
+                          ch=1      # alias for `channels`
+                          )

 From a `bytes` object
 =====================

-If the first argument is of type `bytes` it's interpreted as raw audio data:
+If the type of the first argument is `bytes`, it's interpreted as raw audio data:

 .. code:: python

@@ -36,7 +40,7 @@
     sw = 2
     ch = 1
     data = b"\0" * sr * sw * ch
-    load(data, sr=sr, sw=sw, ch=ch)
+    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
     print(region)

 output:
@@ -48,8 +52,8 @@
 From the microphone
 ===================

-If the first argument is `None`, `load` will try to read data from the microphone.
-Audio parameters, as well as the `max_read` parameter are mandatory:
+If the first argument is `None`, :func:`load` will try to read data from the
+microphone. Audio parameters, as well as the `max_read` parameter, are mandatory:


 .. code:: python
@@ -70,7 +74,7 @@
 Skip part of audio data
 =======================

-If the `skip` parameter is > 0, `load` will skip that leading amount of audio
+If the `skip` parameter is > 0, :func:`load` will skip that leading amount of audio
 data:

 .. code:: python
@@ -84,6 +88,20 @@
 Basic split example
 -------------------

+In the following we'll use the :func:`split` function to tokenize an audio file,
+requiring that valid audio events be at least 0.2 seconds long, at most 4 seconds
+long and contain a maximum of 0.3 seconds of continuous silence. Limiting the size
+of detected events to 4 seconds means that an event of, say, 9.5 seconds will
+be returned as two 4-second events plus a third 1.5-second event. Moreover, a
+valid event might contain many *silences* as long as none of them exceeds 0.3
+seconds.
+
+:func:`split` returns a generator of :class:`AudioRegion`. An :class:`AudioRegion`
+can be played, saved, repeated (i.e., multiplied by an integer) and concatenated
+with another region (see examples below). Notice that :class:`AudioRegion` objects
+returned by :func:`split` have ``start`` and ``stop`` information stored in
+their metadata, which can be accessed like `object.meta.start`.
+
 .. code:: python

     import auditok
@@ -145,8 +163,8 @@
 Read and split data from the microphone
 ---------------------------------------

-If the first argument of `split` is None, audio data is read from the microphone
-(requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):
+If the first argument of :func:`split` is None, audio data is read from the
+microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):

 .. code:: python

@@ -165,15 +183,15 @@
          pass


-`split` will continue reading audio data until you press ``Ctrl-C``. If you want
-to read a specific amount of audio data, pass the desired number of seconds with
-the `max_read` argument.
+:func:`split` will continue reading audio data until you press ``Ctrl-C``. If
+you want to read a specific amount of audio data, pass the desired number of
+seconds with the `max_read` argument.


 Accessing recorded data after split
 -----------------------------------

-Using a `Recorder` object you can get hold of acquired audio:
+Using a :class:`Recorder` object you can get hold of acquired audio data:


 .. code:: python
@@ -196,15 +214,17 @@

     rec.rewind()
     full_audio = load(rec.data, sr=sr, sw=sw, ch=ch)
+    # alternatively you can use
+    full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)


-`Recorder` also accepts a `max_read` argument.
+:class:`Recorder` also accepts a `max_read` argument.

 Working with AudioRegions
 -------------------------

-Beyond splitting, there are a couple of interesting operations you can do with
-`AudioRegion` objects.
+The following are a couple of interesting operations you can do with
+:class:`AudioRegion` objects.


 Basic region information
@@ -231,7 +251,7 @@
     region_2 = auditok.load("audio_2.wav")
     region_3 = region_1 + region_2

-Particularly useful if you want to join regions returned by `split`:
+Particularly useful if you want to join regions returned by :func:`split`:

 .. code:: python

@@ -253,7 +273,8 @@
 Split one region into N regions of equal size
 =============================================

-Divide by a positive integer:
+Divide by a positive integer (this has nothing to do with silence-based
+tokenization):

 .. code:: python

@@ -262,16 +283,16 @@
     regions = regions / 5
     assert sum(regions) == region

-Note that if perfect division is possible, the last region might be a bit shorter
-than the previous N-1 regions.
+Note that if no perfect division is possible, the last region might be a bit
+shorter than the previous N-1 regions.

 Slice a region by samples, seconds or milliseconds
 ==================================================

-Slicing an `AudioRegion` can be interesting in many situations. You can for
+Slicing an :class:`AudioRegion` can be interesting in many situations. You can for
 example remove a fixed-size portion of audio data from the beginning or from the
 end of a region or crop a region by an arbitrary amount as a data augmentation
-strategy, etc.
+strategy.

 The most accurate way to slice an `AudioRegion` is to use indices that
 directly refer to raw audio samples. In the following example, assuming that the
@@ -286,7 +307,7 @@
     stop = 25 * 16000
     five_second_region = region[start:stop]

-This allows you to practically start and stop at any audio sample of the region.
+This allows you to practically start and stop at any audio sample within the region.
 Just as with a `list` you can omit one of `start` and `stop`, or both. You can
 also use negative indices:

@@ -297,11 +318,11 @@
     start = -3 * region.sr # `sr` is an alias of `sampling_rate`
     three_last_seconds = region[start:]

-While slicing by raw samples is accurate, slicing with temporal indices is more
-intuitive. You can do so by accessing the `millis` or `seconds` views of an
+While slicing by raw samples is flexible, slicing with temporal indices is more
+intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of an
 `AudioRegion` (or their shortcut alias `ms` and `sec` or `s`).

-With the `millis` view:
+With the ``millis`` view:

 .. code:: python

@@ -309,7 +330,7 @@
     region = auditok.load("audio.wav")
     five_second_region = region.millis[5000:10000]

-or with the `seconds` view:
+or with the ``seconds`` view:

 .. code:: python

@@ -317,7 +338,7 @@
     region = auditok.load("audio.wav")
     five_second_region = region.seconds[5:10]

-`seconds` indices can also be floats:
+``seconds`` indices can also be floats:

 .. code:: python

@@ -338,6 +359,7 @@
     import auditok
     region = auditok.load("audio.wav")
     samples = region.samples
+    assert len(samples) == region.channels


 If `numpy` is not installed you can use:
--- a/doc/index.rst	Wed Feb 17 22:44:23 2021 +0100
+++ b/doc/index.rst	Wed Feb 17 19:22:18 2021 +0100
@@ -24,6 +24,14 @@
     examples

 .. toctree::
+    :caption: Command-line guide
+    :maxdepth: 3
+
+    command_line_usage
+
+
+
+.. toctree::
     :caption: API Reference
     :maxdepth: 3