Command-line guide
==================

``auditok`` can also be used from the command line. For a description of all
available options, type:

.. code:: bash

    auditok -h


.. code::

    usage: auditok [-h] [--version] [-I INT] [-F INT] [-f STRING] [-M FLOAT] [-L] [-O FILE] [-o STRING] [-j FLOAT] [-T STRING] [-u INT/STRING]
               [-a FLOAT] [-n FLOAT] [-m FLOAT] [-s FLOAT] [-d] [-R] [-e FLOAT] [-r INT] [-c INT] [-w INT] [-C STRING] [-E] [-B] [-p]
               [--save-image FILE] [--printf STRING] [--time-format STRING] [--timestamp-format TIMESTAMP_FORMAT] [-q] [-D] [--debug-file FILE]
               [input]

    auditok, an audio tokenization tool.

    options:
    -h, --help            show this help message and exit
    --version, -v         show program's version number and exit
    -q, --quiet           Quiet mode: Do not display any information on the screen.
    -D, --debug           Debug mode: output processing operations to STDOUT.
    --debug-file FILE     Save processing operations to the specified file.

    Input-Output options::
    input                 Input audio or video file. Use '-' for stdin [Default: read from a microphone using PyAudio].
    -I INT, --input-device-index INT
                            Audio device index [Default: None]. Optional and only effective when using PyAudio.
    -F INT, --audio-frame-per-buffer INT
                            Audio frame per buffer [Default: 1024]. Optional and only effective when using PyAudio.
    -f STRING, --input-format STRING
                            Specify the input audio file format. If not provided, the format is inferred from the file extension. If the output file
                            name lacks an extension, the format is guessed from the file header (requires pydub). If neither condition is met, an
                            error is raised.
    -M FLOAT, --max-read FLOAT
                            Maximum data (in seconds) to read from a microphone or a file [Default: read until the end of the file or stream].
    -L, --large-file      Whether the input file should be treated as a large file. If True, data will be read from file on demand, otherwise all
                            audio data is loaded into memory before tokenization.
    -O FILE, --save-stream FILE
                            Save read audio data (from a file or a microphone) to a file. If omitted, no audio data will be saved.
    -o STRING, --save-detections-as STRING
                            Specify the file name format to save detected events. You can use the following placeholders to construct the output
                            file name: {id} (sequential, starting from 1), {start}, {end}, and {duration}. Time placeholders are in seconds.
                            Example: 'Event_{id}{start}-{end}{duration:.3f}.wav'
    -j FLOAT, --join-detections FLOAT
                            Join (glue) detected audio events with a specified duration of silence between them. To be used in combination with the
                            --save-stream / -O option.
    -T STRING, --output-format STRING
                            Specify the audio format for saving detections and/or the main stream. If not provided, the format will be (1) inferred
                            from the file extension or (2) default to raw format.
    -u INT/STRING, --use-channel INT/STRING
                            Specify the audio channel to use for tokenization when the input stream is multi-channel (0 refers to the first
                            channel). By default, this is set to None, meaning all channels are used, capturing any valid audio event from any
                            channel. Alternatively, set this to 'mix' (or 'avg'/'average') to combine all channels into a single averaged channel
                            for tokenization. Regardless of theoption chosen, saved audio events will have the same number of channels as the input
                            stream. [Default: None, use all channels].

    Tokenization options::
    Set audio events' duration and set the threshold for detection.

    -a FLOAT, --analysis-window FLOAT
                            Specify the size of the analysis window in seconds. [Default: 0.01 (10ms)].
    -n FLOAT, --min-duration FLOAT
                            Minimum duration of a valid audio event in seconds. [Default: 0.2].
    -m FLOAT, --max-duration FLOAT
                            Maximum duration of a valid audio event in seconds. [Default: 5].
    -s FLOAT, --max-silence FLOAT
                            Maximum duration of consecutive silence allowed within a valid audio event in seconds. [Default: 0.3]
    -d, --drop-trailing-silence
                            Remove trailing silence from a detection. [Default: trailing silence is retained].
    -R, --strict-min-duration
                            Reject events shorter than --min-duration, even if adjacent to the most recent valid event that reached max-duration.
                            [Default: retain such events].
    -e FLOAT, --energy-threshold FLOAT
                            Set the log energy threshold for detection. [Default: 50]

    Audio parameters::
    Set audio parameters when reading from a headerless file (raw or stdin) or when using custom microphone settings.

    -r INT, --rate INT    Sampling rate of audio data [Default: 16000].
    -c INT, --channels INT
                            Number of channels of audio data [Default: 1].
    -w INT, --width INT   Number of bytes per audio sample [Default: 2].

    Use audio events::
    Use these options to print, play, or plot detected audio events.

    -C STRING, --command STRING
                            Provide a command to execute when an audio event is detected. Use '{file}' as a placeholder for the temporary WAV file
                            containing the event data (e.g., `-C 'du -h {file}'` to display the file size or `-C 'play -q {file}'` to play audio
                            with sox).
    -E, --echo            Immediately play back a detected audio event using pyaudio.
    -B, --progress-bar    Show a progress bar when playing audio.
    -p, --plot            Plot and displays the audio signal along with detections (requires matplotlib).
    --save-image FILE     Save the plotted audio signal and detections as a picture or a PDF file (requires matplotlib).
    --printf STRING       Prints information about each audio event on a new line using the specified format. The format can include text and
                            placeholders: {id} (sequential, starting from 1), {start}, {end}, {duration}, and {timestamp}. The first three time
                            placeholders are in seconds, with formatting controlled by the --time-format argument. {timestamp} represents the system
                            date and time of the event, configurable with the --timestamp-format argument. Example: '[{id}]: {start} -> {end} --
                            {timestamp}'.
    --time-format STRING  Specify the format for printing {start}, {end}, and {duration} placeholders with --printf. [Default: %S]. Accepted
                            formats are : - %S: absolute time in seconds - %I: absolute time in milliseconds - %h, %m, %s, %i: converts time into
                            hours, minutes, seconds, and milliseconds (e.g., %h:%m:%s.%i) and only displays provided fields. Note that %S and %I can
                            only be used independently.
    --timestamp-format TIMESTAMP_FORMAT
                            Specify the format used for printing {timestamp}. Should be a format accepted by the 'datetime' standard module.
                            [Default: '%Y/%m/%d %H:%M:%S'].


Below, we provide several examples covering the most common use cases.


Real-time audio acquisition and event detection
-----------------------------------------------

To try ``auditok`` from the command line with your own voice, you’ll need to
either install `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_ so
that ``auditok`` can read directly from the microphone, or record audio with
an external program (e.g., `sox`) and pipe its output to ``auditok``.

To read data directly from the microphone using the default audio and
tokenization parameters, simply type:

.. code:: bash

    auditok

This prints the **id**, **start time**, and **end time** of each detected
audio event. Since no arguments were passed, ``auditok`` uses its default
values. The most important tokenization arguments are:


- ``-n``, ``--min-duration``: minimum duration of a valid audio event in seconds, default: 0.2
- ``-m``, ``--max-duration``: maximum duration of a valid audio event in seconds, default: 5
- ``-s``, ``--max-silence``: maximum duration of consecutive silence within a valid audio event in seconds, default: 0.3
- ``-e``, ``--energy-threshold``: log energy threshold for detection, default: 50

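If you often tune these values, it can help to spell them out explicitly. The
sketch below is a hypothetical shell helper (``run_auditok`` and the
``MIN_DUR``, ``MAX_DUR``, ``MAX_SIL``, ``ENERGY``, and ``DRY_RUN`` variables
are our own names, not part of ``auditok``) that passes each option listed
above with its default value and lets you override any of them from the
environment:

.. code:: bash

    # Hypothetical helper, not part of auditok: run auditok with the main
    # tokenization options spelled out (default values as documented above).
    # Override a value through the environment, e.g. ENERGY=55.
    # Set DRY_RUN=1 to print the command instead of running it.
    run_auditok() {
        set -- auditok \
            -n "${MIN_DUR:-0.2}" \
            -m "${MAX_DUR:-5}" \
            -s "${MAX_SIL:-0.3}" \
            -e "${ENERGY:-50}" \
            "$@"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "$*"
        else
            "$@"
        fi
    }

For example, ``ENERGY=55 run_auditok`` reads from the microphone with a
threshold of 55 instead of 50, and ``DRY_RUN=1 run_auditok input.wav`` only
prints ``auditok -n 0.2 -m 5 -s 0.3 -e 50 input.wav``.
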

Read audio data with an external program
----------------------------------------

You can use an external program, such as `sox` (``sudo apt-get install sox``),
whose ``rec`` command records audio in real time, and pipe its output to
``auditok``, which reads the data from standard input when the input is ``-``:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -r 16000 -w 2 -c 1

Note that when reading data from standard input, the same audio parameters must
be set for both `sox` (or any other data generation/acquisition tool) and ``auditok``.
The following table summarizes these parameters:

+-----------------+------------+------------------+-----------------------+
| Audio parameter | sox option | `auditok` option | `auditok` default     |
+=================+============+==================+=======================+
| Sampling rate   | -r         | -r               |                 16000 |
+-----------------+------------+------------------+-----------------------+
| Sample width    | -b (bits)  | -w (bytes)       |                     2 |
+-----------------+------------+------------------+-----------------------+
| Channels        | -c         | -c               |                     1 |
+-----------------+------------+------------------+-----------------------+
| Encoding        | -e         | N/A              | always a signed int   |
+-----------------+------------+------------------+-----------------------+

Since these values match ``auditok``'s defaults, the previous command can be
shortened to:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -

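To keep both sides of the pipe consistent, you can define the audio parameters
once and derive the sox sample width in bits from ``auditok``'s width in bytes.
The following is a minimal sketch; the variable and function names
(``RATE``, ``CHANNELS``, ``WIDTH``, ``show_pipeline``, ``run_pipeline``) are
hypothetical:

.. code:: bash

    # Audio parameters shared by sox and auditok (auditok's defaults here).
    RATE=16000      # sampling rate in Hz: sox -r / auditok -r
    CHANNELS=1      # number of channels: sox -c / auditok -c
    WIDTH=2         # bytes per sample: auditok -w (sox -b needs WIDTH * 8 bits)

    # Print the pipeline so it can be checked before running it
    show_pipeline() {
        printf '%s | %s\n' \
            "rec -q -t raw -r $RATE -c $CHANNELS -b $((WIDTH * 8)) -e signed -" \
            "auditok - -r $RATE -w $WIDTH -c $CHANNELS"
    }

    # Record from the default input device and tokenize the stream on the fly;
    # extra arguments are passed on to auditok (e.g. -E)
    run_pipeline() {
        rec -q -t raw -r "$RATE" -c "$CHANNELS" -b "$((WIDTH * 8))" -e signed - |
            auditok - -r "$RATE" -w "$WIDTH" -c "$CHANNELS" "$@"
    }

With these values, ``show_pipeline`` prints the first command of this section;
changing ``WIDTH`` or ``RATE`` updates both sides of the pipe at once.
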

Play back audio detections
--------------------------

Use the ``-E`` (or ``--echo``) option:

.. code:: bash

    auditok -E
    # or
    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -E


Using ``-E`` requires `pyaudio`. If it is not installed, you can use the ``-C``
option instead, which runs an external command for each detected audio event:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -C "play -q {file}"

With ``-C``, ``auditok`` saves each detected event to a temporary WAV file,
replaces the ``{file}`` placeholder with the name of that file, and runs the
command. The example above uses ``-C`` to play audio with an external program
(``play``, which ships with `sox`), but you can use it to run any other
command.

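The command passed to ``-C`` can also be one of your own scripts. As an
illustration, the hypothetical ``event_info`` function below prints the
approximate duration of the WAV file it receives. It assumes a canonical
44-byte PCM WAV header and the default audio parameters (16000 Hz, 1 channel,
2 bytes per sample); adjust these assumptions if your setup differs:

.. code:: bash

    # Hypothetical -C handler: print the approximate duration of a WAV file.
    # Assumes a 44-byte PCM header and 16000 Hz, 1 channel, 2 bytes/sample.
    event_info() {
        size=$(wc -c < "$1")    # file size in bytes
        awk -v size="$size" -v rate=16000 -v channels=1 -v width=2 \
            -v name="$1" 'BEGIN {
                # bytes after the header divided by bytes per second
                printf "%s: %.3f s\n", name, (size - 44) / (rate * channels * width)
            }'
    }

Saved in a script that ends with ``event_info "$1"`` (for example a
hypothetical ``event_info.sh``), it could be used as
``auditok -C 'sh event_info.sh {file}'``.
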

Output detection details
------------------------

By default, ``auditok`` outputs the **id**, **start**, and **end** times for each
detected audio event. The start and end values indicate the beginning and end of
the event within the input stream (file or microphone) in seconds. Below is an
example of the output in the default format:

.. code::

    1 1.160 2.390
    2 3.420 4.330
    3 5.010 5.720
    4 7.230 7.800

The format of the output is controlled by the ``--printf`` option. In addition
to the ``{id}``, ``{start}``, and ``{end}`` placeholders, you can use
``{duration}`` and ``{timestamp}`` (the system date and time of the detected
event).

For example, with the following format:

.. code:: bash

    auditok audio.wav --printf "{id}: [{timestamp}] start:{start}, end:{end}, dur: {duration}"

the output will look like:

.. code::

    1: [2021/02/17 20:16:02] start:1.160, end:2.390, dur: 1.230
    2: [2021/02/17 20:16:04] start:3.420, end:4.330, dur: 0.910
    3: [2021/02/17 20:16:06] start:5.010, end:5.720, dur: 0.710
    4: [2021/02/17 20:16:08] start:7.230, end:7.800, dur: 0.570

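Since ``--printf`` accepts arbitrary text around the placeholders, it can also
produce machine-readable output. The sketch below assumes detections were
printed as comma-separated values with ``--printf "{start},{end},{duration}"``
(a format chosen for illustration) and uses awk to keep only events lasting at
least a given number of seconds; the function name ``long_events`` is
hypothetical:

.. code:: bash

    # Hypothetical filter for lines printed with --printf "{start},{end},{duration}":
    # keep events whose duration (3rd field) is >= $1 seconds, then report
    # how many were kept and their total duration.
    long_events() {
        awk -F, -v min="$1" '
            $3 >= min { print; n++; total += $3 }
            END { printf "%d events, %.3f s\n", n, total }'
    }

For example: ``auditok audio.wav --printf "{start},{end},{duration}" | long_events 0.8``.
With the example detections shown above, this would keep the first two events
(1.230 s and 0.910 s).
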

The format of ``{timestamp}`` is controlled by ``--timestamp-format`` (default:
``"%Y/%m/%d %H:%M:%S"``), whereas the format of ``{start}``, ``{end}``, and
``{duration}`` is controlled by ``--time-format`` (default: ``%S``, absolute
number of seconds). A more detailed format using the ``%h`` (hours), ``%m``
(minutes), ``%s`` (seconds), and ``%i`` (milliseconds) directives is also
possible (e.g., ``"%h:%m:%s.%i"``).

To completely disable printing detection information, use ``-q``.

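To illustrate the breakdown performed by a ``--time-format`` such as
``"%h:%m:%s.%i"``, the hypothetical ``to_hmsi`` function below converts a
number of seconds into hours, minutes, seconds, and milliseconds with awk. It
is only a sketch: zero-padding the fields and rounding to the nearest
millisecond are our assumptions, so check the actual ``auditok`` output for
your version:

.. code:: bash

    # Hypothetical helper: convert seconds ($1) to hh:mm:ss.mmm, similar in
    # spirit to --time-format "%h:%m:%s.%i" (padding/rounding are assumptions).
    to_hmsi() {
        awk -v t="$1" 'BEGIN {
            ms = int(t * 1000 + 0.5)          # total milliseconds, rounded
            h = int(ms / 3600000); ms -= h * 3600000
            m = int(ms / 60000);   ms -= m * 60000
            s = int(ms / 1000);    ms -= s * 1000
            printf "%02d:%02d:%02d.%03d\n", h, m, s, ms
        }'
    }

For example, ``to_hmsi 3725.5`` prints ``01:02:05.500``.
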

Save detections
---------------

You can save audio events to disk as they are detected using ``-o`` or
``--save-detections-as`` followed by a file name with placeholders. To create
a unique file name for each event, you can use the ``{id}``, ``{start}``,
``{end}``, and ``{duration}`` placeholders, as in this example:


.. code:: bash

    auditok --save-detections-as "{id}_{start}_{end}.wav"

When using the ``{start}``, ``{end}``, and ``{duration}`` placeholders, it is
recommended to limit these values to 3 decimal places. You can do this with a
format like:

.. code:: bash

    auditok -o "{id}_{start:.3f}_{end:.3f}.wav"

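Tools such as ``ls`` sort file names as text, so a name starting with ``10_``
is listed before one starting with ``2_``. The hypothetical
``list_detections`` function below lists files saved with the
``"{id}_{start:.3f}_{end:.3f}.wav"`` pattern in detection order by sorting
numerically on the ``{id}`` field:

.. code:: bash

    # Hypothetical helper: list files named "{id}_{start}_{end}.wav" in a
    # directory ($1, default: current directory) in detection order.
    list_detections() {
        # keep files that start with a numeric id, then sort on that id
        ls -1 -- "${1:-.}" | grep -E '^[0-9]+_.*\.wav$' | sort -t _ -k 1,1n
    }

For example, run ``list_detections`` in the directory where the command above
saved its files.
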

Save the full audio stream
--------------------------

When reading audio data from the microphone, you may want to save it to disk.
To do this, use the ``-O`` or ``--save-stream`` option:

.. code:: bash

    auditok --save-stream output.wav

This also works when reading data from a file on disk.

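Saving the stream uses disk space in proportion to the recording time: PCM
audio needs ``rate * channels * width`` bytes per second, i.e.
16000 * 1 * 2 = 32000 bytes per second with the default parameters. The
hypothetical ``stream_size`` function below does this arithmetic (WAV header
bytes are ignored):

.. code:: bash

    # Hypothetical helper: approximate size in bytes of $1 whole seconds of
    # PCM audio. Optional $2, $3, $4 override the rate, channels and width;
    # the defaults match auditok's (16000 Hz, 1 channel, 2 bytes per sample).
    stream_size() {
        rate=${2:-16000}; channels=${3:-1}; width=${4:-2}
        echo $(( $1 * rate * channels * width ))
    }

For example, ``stream_size 3600`` prints ``115200000``, i.e. roughly 115 MB
per hour of audio.
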

Join detected audio events, inserting a silence between them
------------------------------------------------------------

Sometimes, you may want to detect audio events and create a new file containing
only these events, separated by pauses of a specific duration. This is useful
when you want to keep the detected audio intact while shortening or extending
the pauses between events.

To achieve this, use the ``-j`` or ``--join-detections`` option together
with the ``-O`` / ``--save-stream`` option. In the example below, we
read data from ``input.wav`` and save the audio events to ``output.wav``,
separated by 1-second pauses:

.. code:: bash

    auditok input.wav --join-detections 1 -O output.wav

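Assuming a pause is inserted only between consecutive events (not before the
first event or after the last one), the length of the joined file can be
estimated from the default output as the sum of the event durations plus
(number of events - 1) times the pause. The hypothetical ``joined_length``
function below does this with awk:

.. code:: bash

    # Hypothetical estimate of the --join-detections output length.
    # Reads auditok's default "id start end" lines on stdin; $1 is the pause
    # in seconds. Assumes one pause between each pair of consecutive events.
    joined_length() {
        awk -v pause="$1" '
            { total += $3 - $2 }                 # sum of event durations
            END {
                if (NR > 1) total += (NR - 1) * pause
                printf "%.3f\n", total
            }'
    }

For example, ``auditok input.wav | joined_length 1``. With the four example
detections shown in the default output format earlier (3.420 s of audio in
total), this gives 6.420 s.
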

Plot detections
---------------

The audio signal and detections can be plotted using the ``-p`` or ``--plot``
option. You can also save the plot to disk using ``--save-image``. The following
example demonstrates both:

.. code:: bash

    auditok -p --save-image "plot.png" # can also be 'pdf' or another image format

Example output:

.. image:: figures/example_1.png

Plotting requires `matplotlib <https://matplotlib.org/stable/index.html>`_.