changeset 379:df2a320e10d5
Add command-line guide
Update documentation
author | Amine Sehili <amine.sehili@gmail.com> |
---|---|
date | Wed, 17 Feb 2021 19:22:18 +0100 |
parents | 0860204227de |
children | d9748949d940 |
files | auditok/cmdline.py doc/command_line_usage.rst doc/examples.rst doc/index.rst |
diffstat | 4 files changed, 254 insertions(+), 31 deletions(-) |
--- a/auditok/cmdline.py Wed Feb 17 22:44:23 2021 +0100
+++ b/auditok/cmdline.py Wed Feb 17 19:22:18 2021 +0100
@@ -11,7 +11,7 @@
 @copyright: 2015-2021 Mohamed El Amine SEHILI
 @license: MIT
 @contact: amine.sehili@gmail.com
-@deffield updated: 17 Feb 2021
+@deffield updated: 18 Feb 2021
 """
 
 import sys
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/command_line_usage.rst Wed Feb 17 19:22:18 2021 +0100
@@ -0,0 +1,193 @@
+``auditok`` can also be used from the command-line. For more information about
+parameters and their description type:
+
+
+.. code:: bash
+
+    auditok -h
+
+In the following we'll a few examples that covers most use-cases.
+
+
+Read and split audio data online
+--------------------------------
+
+To try ``auditok`` from the command line with you voice, you should either
+install `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_ so that ``auditok``
+can directly read data from the microphone, or record data with an external program
+(e.g., `sox`) and redirect its output to ``auditok``.
+
+Read data from the microphone (`pyaudio` installed):
+
+.. code:: bash
+
+    auditok
+
+This will print the *id*, *start time* and *end time* of each detected audio
+event. Note that we didn't pass any additional arguments to the previous command,
+so ``auditok`` will use default values. The most important arguments are:
+
+
+- ``-n``, ``--min-duration`` : minimum duration of a valid audio event in seconds, default: 0.2
+- ``-m``, ``--max-duration`` : maximum duration of a valid audio event in seconds, default: 5
+- ``-s``, ``--max-silence`` : maximum duration of a consecutive silence within a valid audio event in seconds, default: 0.3
+- ``-e``, ``--energy-threshold`` : energy threshold for detection, default: 50
+
+
+Read audio data with an external program
+----------------------------------------
+
+If you don't have `pyaudio`, you can use `sox` for data acquisition
+(`sudo apt-get install sox`) and make ``auditok`` read data from standard input:
+
+.. code:: bash
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -r 16000 -w 2 -c 1
+
+Note that when data is read from standard input, the same audio parameters must
+be used for both `sox` (or any other data generation/acquisition tool) and
+``auditok``. The following table summarizes audio parameters.
+
+
++-----------------+------------+------------------+-----------------------+
+| Audio parameter | sox option | `auditok` option | `auditok` default     |
++=================+============+==================+=======================+
+| Sampling rate   | -r         | -r               | 16000                 |
++-----------------+------------+------------------+-----------------------+
+| Sample width    | -b (bits)  | -w (bytes)       | 2                     |
++-----------------+------------+------------------+-----------------------+
+| Channels        | -c         | -c               | 1                     |
++-----------------+------------+------------------+-----------------------+
+| Encoding        | -e         | NA               | always a signed int   |
++-----------------+------------+------------------+-----------------------+
+
+According to this table, the previous command can be run with the default
+parameters as:
+
+.. code:: bash
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -
+
+Play back audio detections
+--------------------------
+
+Use the ``-E`` option (for echo):
+
+.. code:: bash
+
+    auditok -E
+    # or
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -E
+
+The second command works without further argument because data is recorded with
+``auditok``'s default audio parameters . If one of the parameters is not at the
+default value you should specify it alongside ``-E``.
+
+
+
+Using ``-E`` requires `pyaudio`, if it's not installed you can use the ``-C``
+(used to run an external command with detected audio event as argument):
+
+.. code:: bash
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -C "play -q {file}"
+
+Using the ``-C`` option, ``auditok`` will save a detected event to a temporary wav
+file, fill the ``{file}`` placeholder with the temporary name and run the
+command. In the above example we used ``-C`` to play audio data with an external
+program but you can use it to run any other command.
+
+
+Print out detection information
+-------------------------------
+
+By default ``auditok`` prints out the **id**, the **start** and the **end** of
+each detected audio event. The latter two values represent the absolute position
+of the event within input stream (file or microphone) in seconds. The following
+listing is an example output with the default format:
+
+.. code:: bash
+
+    1 1.160 2.390
+    2 3.420 4.330
+    3 5.010 5.720
+    4 7.230 7.800
+
+The format of the output is controlled by the ``--printf`` option. Alongside
+``{id}``, ``{start}`` and ``{end}`` placeholders, you can use ``{duration}`` and
+``{timestamp}`` (system timestamp of detected event) placeholders.
+
+Using the following format for example:
+
+.. code:: bash
+
+    auditok audio.wav --printf "{id}: [{timestamp}] start:{start}, end:{end}, dur: {duration}"
+
+the output would be something like:
+
+.. code:: bash
+
+    1: [2021/02/17 20:16:02] start:1.160, end:2.390, dur: 1.230
+    2: [2021/02/17 20:16:04] start:3.420, end:4.330, dur: 0.910
+    3: [2021/02/17 20:16:06] start:5.010, end:5.720, dur: 0.710
+    4: [2021/02/17 20:16:08] start:7.230, end:7.800, dur: 0.570
+
+
+The format of ``{timestamp}`` is controlled by ``--timestamp-format`` (default:
+`"%Y/%m/%d %H:%M:%S"`) whereas that of ``{start}``, ``{end}`` and ``{duration}``
+by ``--time-format`` (default: `%S`, absolute number of seconds). A more detailed
+format with ``--time-format`` using `%h` (hours), `%m` (minutes), `%s` (seconds)
+and `%i` (milliseconds) directives is possible (e.g., "%h:%m:%s.%i).
+
+To completely disable printing detection information use ``-q``.
+
+Save detections
+---------------
+
+You can save audio events to disk as they're detected using ``-o`` or
+``--save-detections-as``. To get a uniq file name for each event, you can use
+``{id}``, ``{start}``, ``{end}`` and ``{duration}`` placeholders. Example:
+
+
+.. code:: bash
+
+    auditok --save-detections-as "{id}_{start}_{end}.wav"
+
+When using ``{start}``, ``{end}`` and ``{duration}`` placeholders, it's
+recommended that the number of decimals of the corresponding values be limited
+to 3. You can use something like:
+
+.. code:: bash
+
+    auditok -o "{id}_{start:.3f}_{end:.3f}.wav"
+
+
+Save whole audio stream
+-----------------------
+
+When reading audio data from the microphone, you most certainly want to save it
+to disk. For this you can use the ``-O`` or ``--save-stream`` option.
+
+.. code:: bash
+
+    auditok --save-stream "stream.wav"
+
+Note this will work even if you read data from another file on disk.
+
+
+Plot detections
+---------------
+
+Audio signal and detections can be plotted using the ``-p`` or ``--plot`` option.
+You can also save plot to disk using ``--save-image``. The following example
+does both:
+
+.. code:: bash
+
+    auditok -p --save-image "plot.png" # can also be 'pdf' or another image format
+
+output example:
+
+.. image:: figures/example_1.png
+
+Plotting requires `matplotlib <https://matplotlib.org/stable/index.html>`_.
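Note on the ``{id}``, ``{start}``, ``{end}`` and ``{duration}`` placeholders documented in doc/command_line_usage.rst: the ``{start:.3f}`` form looks like Python's standard format-spec syntax. The sketch below only illustrates that syntax with plain ``str.format``; it is not auditok's own code, and the ``render`` helper and the sample detections are hypothetical.

```python
# Illustration only: plain str.format, not auditok's internal implementation.
# The detections are made-up (id, start, end) tuples in seconds.
detections = [(1, 1.16, 2.39), (2, 3.42, 4.33)]


def render(template, event_id, start, end):
    """Fill the documented placeholders; `render` is a hypothetical helper."""
    return template.format(
        id=event_id, start=start, end=end, duration=end - start
    )


# Limit start/end to 3 decimals, as the guide recommends for file names.
names = [render("{id}_{start:.3f}_{end:.3f}.wav", *d) for d in detections]
print(names)  # ['1_1.160_2.390.wav', '2_3.420_4.330.wav']
```

Without the ``:.3f`` spec, the full float representation would end up in the file name, which is presumably why the guide recommends limiting the number of decimals.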
--- a/doc/examples.rst Wed Feb 17 22:44:23 2021 +0100
+++ b/doc/examples.rst Wed Feb 17 19:22:18 2021 +0100
@@ -1,10 +1,13 @@
 Loading audio data
 ------------------
 
+Audio data is loaded with the :func:`load` function which can read from audio
+files, the microphone or use raw audio data.
+
 From a file
 ===========
 
-If the first argument of `load` is a string, it should be a path to an audio
+If the first argument of :func:`load` is a string, it should be a path to an audio
 file.
 
 .. code:: python
@@ -21,14 +24,15 @@
 
     region = auditok.load("audio.dat",
                           audio_format="raw",
-                          sr=44100,
-                          sw=2
-                          ch=1)
+                          sr=44100, # alias for `sampling_rate`
+                          sw=2 # alias for `sample_width`
+                          ch=1 # alias for `channels`
+                          )
 
 From a `bytes` object
 =====================
 
-If the first argument is of type `bytes` it's interpreted as raw audio data:
+If the type of the first argument `bytes`, it's interpreted as raw audio data:
 
 .. code:: python
 
@@ -36,7 +40,7 @@
     sw = 2
     ch = 1
     data = b"\0" * sr * sw * ch
-    load(data, sr=sr, sw=sw, ch=ch)
+    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
     print(region)
 
 output:
@@ -48,8 +52,8 @@
 From the microphone
 ===================
 
-If the first argument is `None`, `load` will try to read data from the microphone.
-Audio parameters, as well as the `max_read` parameter are mandatory:
+If the first argument is `None`, :func:`load` will try to read data from the
+microphone. Audio parameters, as well as the `max_read` parameter are mandatory:
 
 .. code:: python
 
@@ -70,7 +74,7 @@
 Skip part of audio data
 =======================
 
-If the `skip` parameter is > 0, `load` will skip that leading amount of audio
+If the `skip` parameter is > 0, :func:`load` will skip that leading amount of audio
 data:
 
 .. code:: python
@@ -84,6 +88,20 @@
 Basic split example
 -------------------
 
+In the following we'll use the :func:`split` function to tokenize an audio file,
+requiring that valid audio events be at least 0.2 second long, at most 4 seconds
+long and contain a maximum of 0.3 second of continuous silence. Limiting the size
+of detected events to 4 seconds means that an event of, say, 9.5 seconds will
+be returned as two 4-second events plus a third 1.5-second event. Moreover, a
+valid event might contain many *silences* as far as none of them exceeds 0.3
+second.
+
+:func:`split` returns a generator of :class:`AudioRegion`. An :class:`AudioRegion`
+can be played, saved, repeated (i.e., multiplied by an integer) and concatenated
+with another region (see examples below). Notice that :class:`AudioRegion` objects
+returned by :func:`split` have a ``start`` a ``stop`` information stored in
+their meta data that can be accessed like `object.meta.start`.
+
 .. code:: python
 
     import auditok
@@ -145,8 +163,8 @@
 Read and split data from the microphone
 ---------------------------------------
 
-If the first argument of `split` is None, audio data is read from the microphone
-(requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):
+If the first argument of :func:`split` is None, audio data is read from the
+microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):
 
 .. code:: python
 
@@ -165,15 +183,15 @@
         pass
 
 
-`split` will continue reading audio data until you press ``Ctrl-C``. If you want
-to read a specific amount of audio data, pass the desired number of seconds with
-the `max_read` argument.
+:func:`split` will continue reading audio data until you press ``Ctrl-C``. If
+you want to read a specific amount of audio data, pass the desired number of
+seconds with the `max_read` argument.
 
 
 Accessing recorded data after split
 -----------------------------------
 
-Using a `Recorder` object you can get hold of acquired audio:
+Using a :class:`Recorder` object you can get hold of acquired audio data:
 
 .. code:: python
 
@@ -196,15 +214,17 @@
 
     rec.rewind()
     full_audio = load(rec.data, sr=sr, sw=sw, ch=ch)
+    # alternatively you can use
+    full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)
 
-`Recorder` also accepts a `max_read` argument.
+:class:`Recorder` also accepts a `max_read` argument.
 
 
 Working with AudioRegions
 -------------------------
 
-Beyond splitting, there are a couple of interesting operations you can do with
-`AudioRegion` objects.
+The following are a couple of interesting operations you can do with
+:class:`AudioRegion` objects.
 
 
 Basic region information
@@ -231,7 +251,7 @@
     region_2 = auditok.load("audio_2.wav")
     region_3 = region_1 + region_2
 
-Particularly useful if you want to join regions returned by `split`:
+Particularly useful if you want to join regions returned by :func:`split`:
 
 .. code:: python
 
@@ -253,7 +273,8 @@
 Split one region into N regions of equal size
 =============================================
 
-Divide by a positive integer:
+Divide by a positive integer (this has nothing to do with silence-based
+tokenization):
 
 .. code:: python
 
@@ -262,16 +283,16 @@
     regions = regions / 5
     assert sum(regions) == region
 
-Note that if perfect division is possible, the last region might be a bit shorter
-than the previous N-1 regions.
+Note that if no perfect division is possible, the last region might be a bit
+shorter than the previous N-1 regions.
 
 Slice a region by samples, seconds or milliseconds
 ==================================================
 
-Slicing an `AudioRegion` can be interesting in many situations. You can for
+Slicing an :class:`AudioRegion` can be interesting in many situations. You can for
 example remove a fixed-size portion of audio data from the beginning or from the
 end of a region or crop a region by an arbitrary amount as a data augmentation
-strategy, etc.
+strategy.
 
 The most accurate way to slice an `AudioRegion` is to use indices that
 directly refer to raw audio samples. In the following example, assuming that the
@@ -286,7 +307,7 @@
     stop = 25 * 16000
     five_second_region = region[start:stop]
 
-This allows you to practically start and stop at any audio sample of the region.
+This allows you to practically start and stop at any audio sample within the region.
 Just as with a `list` you can omit one of `start` and `stop`, or both. You can
 also use negative indices:
 
@@ -297,11 +318,11 @@
     start = -3 * region.sr # `sr` is an alias of `sampling_rate`
     three_last_seconds = region[start:]
 
-While slicing by raw samples is accurate, slicing with temporal indices is more
-intuitive. You can do so by accessing the `millis` or `seconds` views of an
+While slicing by raw samples is flexible, slicing with temporal indices is more
+intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of an
 `AudioRegion` (or their shortcut alias `ms` and `sec` or `s`).
 
-With the `millis` view:
+With the ``millis`` view:
 
 .. code:: python
 
@@ -309,7 +330,7 @@
     region = auditok.load("audio.wav")
     five_second_region = region.millis[5000:10000]
 
-or with the `seconds` view:
+or with the ``seconds`` view:
 
 .. code:: python
 
@@ -317,7 +338,7 @@
     region = auditok.load("audio.wav")
     five_second_region = region.seconds[5:10]
 
-``seconds`` indices can also be floats:
+``seconds`` indices can also be floats:
 
 .. code:: python
 
@@ -338,6 +359,7 @@
     import auditok
     region = auditok.load("audio.wav")
     samples = region.samples
+    assert len(samples) == region.channels
 
 If `numpy` is not installed you can use:
 
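Note on the "Basic split example" paragraph added in doc/examples.rst: the claim that a 9.5-second event with a 4-second maximum comes back as 4 + 4 + 1.5 seconds can be checked with simple arithmetic. The sketch below covers only that arithmetic. It is not auditok's tokenizer, ignores the minimum-duration and silence rules, and ``chunk_durations`` is a hypothetical helper name.

```python
def chunk_durations(event_duration, max_dur):
    """Durations of the pieces when a long event is cut every `max_dur` seconds.

    Arithmetic illustration only (hypothetical helper, not part of auditok).
    """
    if max_dur <= 0:
        raise ValueError("max_dur must be positive")
    pieces = []
    remaining = event_duration
    # Emit full-length pieces while more than max_dur seconds remain.
    while remaining > max_dur:
        pieces.append(max_dur)
        remaining -= max_dur
    # Whatever is left (if anything) forms the last, shorter piece.
    if remaining > 0:
        pieces.append(remaining)
    return pieces


print(chunk_durations(9.5, 4))  # [4, 4, 1.5]
```

In the real tokenizer the trailing piece would presumably still have to satisfy the minimum-duration constraint, which this sketch does not model.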