Command-line guide
==================

``auditok`` can also be used from the command line. For information
about available parameters and descriptions, type:

.. code:: bash

    auditok -h


.. code::

    usage: auditok [-h] [--version] [-I INT] [-F INT] [-f STRING] [-M FLOAT] [-L] [-O FILE] [-o STRING] [-j FLOAT] [-T STRING] [-u INT/STRING]
                   [-a FLOAT] [-n FLOAT] [-m FLOAT] [-s FLOAT] [-d] [-R] [-e FLOAT] [-r INT] [-c INT] [-w INT] [-C STRING] [-E] [-B] [-p]
                   [--save-image FILE] [--printf STRING] [--time-format STRING] [--timestamp-format TIMESTAMP_FORMAT] [-q] [-D] [--debug-file FILE]
                   [input]

    auditok, an audio tokenization tool.

    options:
      -h, --help            show this help message and exit
      --version, -v         show program's version number and exit
      -q, --quiet           Quiet mode: Do not display any information on the screen.
      -D, --debug           Debug mode: output processing operations to STDOUT.
      --debug-file FILE     Save processing operations to the specified file.

    Input-Output options::
      input                 Input audio or video file. Use '-' for stdin [Default: read from a microphone using PyAudio].
      -I INT, --input-device-index INT
                            Audio device index [Default: None]. Optional and only effective when using PyAudio.
      -F INT, --audio-frame-per-buffer INT
                            Audio frame per buffer [Default: 1024]. Optional and only effective when using PyAudio.
      -f STRING, --input-format STRING
                            Specify the input audio file format. If not provided, the format is inferred from the file extension. If the output file
                            name lacks an extension, the format is guessed from the file header (requires pydub). If neither condition is met, an
                            error is raised.
      -M FLOAT, --max-read FLOAT
                            Maximum data (in seconds) to read from a microphone or a file [Default: read until the end of the file or stream].
      -L, --large-file      Whether the input file should be treated as a large file. If True, data will be read from file on demand, otherwise all
                            audio data is loaded into memory before tokenization.
      -O FILE, --save-stream FILE
                            Save read audio data (from a file or a microphone) to a file. If omitted, no audio data will be saved.
      -o STRING, --save-detections-as STRING
                            Specify the file name format to save detected events. You can use the following placeholders to construct the output
                            file name: {id} (sequential, starting from 1), {start}, {end}, and {duration}. Time placeholders are in seconds.
                            Example: 'Event_{id}{start}-{end}{duration:.3f}.wav'
      -j FLOAT, --join-detections FLOAT
                            Join (glue) detected audio events with a specified duration of silence between them. To be used in combination with the
                            --save-stream / -O option.
      -T STRING, --output-format STRING
                            Specify the audio format for saving detections and/or the main stream. If not provided, the format will be (1) inferred
                            from the file extension or (2) default to raw format.
      -u INT/STRING, --use-channel INT/STRING
                            Specify the audio channel to use for tokenization when the input stream is multi-channel (0 refers to the first
                            channel). By default, this is set to None, meaning all channels are used, capturing any valid audio event from any
                            channel. Alternatively, set this to 'mix' (or 'avg'/'average') to combine all channels into a single averaged channel
                            for tokenization. Regardless of the option chosen, saved audio events will have the same number of channels as the input
                            stream. [Default: None, use all channels].

    Tokenization options::
      Set audio events' duration and set the threshold for detection.

      -a FLOAT, --analysis-window FLOAT
                            Specify the size of the analysis window in seconds. [Default: 0.01 (10ms)].
      -n FLOAT, --min-duration FLOAT
                            Minimum duration of a valid audio event in seconds. [Default: 0.2].
      -m FLOAT, --max-duration FLOAT
                            Maximum duration of a valid audio event in seconds. [Default: 5].
      -s FLOAT, --max-silence FLOAT
                            Maximum duration of consecutive silence allowed within a valid audio event in seconds. [Default: 0.3]
      -d, --drop-trailing-silence
                            Remove trailing silence from a detection. [Default: trailing silence is retained].
      -R, --strict-min-duration
                            Reject events shorter than --min-duration, even if adjacent to the most recent valid event that reached max-duration.
                            [Default: retain such events].
      -e FLOAT, --energy-threshold FLOAT
                            Set the log energy threshold for detection. [Default: 50]

    Audio parameters::
      Set audio parameters when reading from a headerless file (raw or stdin) or when using custom microphone settings.

      -r INT, --rate INT    Sampling rate of audio data [Default: 16000].
      -c INT, --channels INT
                            Number of channels of audio data [Default: 1].
      -w INT, --width INT   Number of bytes per audio sample [Default: 2].

    Use audio events::
      Use these options to print, play, or plot detected audio events.

      -C STRING, --command STRING
                            Provide a command to execute when an audio event is detected. Use '{file}' as a placeholder for the temporary WAV file
                            containing the event data (e.g., `-C 'du -h {file}'` to display the file size or `-C 'play -q {file}'` to play audio
                            with sox).
      -E, --echo            Immediately play back a detected audio event using pyaudio.
      -B, --progress-bar    Show a progress bar when playing audio.
      -p, --plot            Plot and displays the audio signal along with detections (requires matplotlib).
      --save-image FILE     Save the plotted audio signal and detections as a picture or a PDF file (requires matplotlib).
      --printf STRING       Prints information about each audio event on a new line using the specified format. The format can include text and
                            placeholders: {id} (sequential, starting from 1), {start}, {end}, {duration}, and {timestamp}. The first three time
                            placeholders are in seconds, with formatting controlled by the --time-format argument. {timestamp} represents the system
                            date and time of the event, configurable with the --timestamp-format argument. Example: '[{id}]: {start} -> {end} --
                            {timestamp}'.
      --time-format STRING  Specify the format for printing {start}, {end}, and {duration} placeholders with --printf. [Default: %S]. Accepted
                            formats are : - %S: absolute time in seconds - %I: absolute time in milliseconds - %h, %m, %s, %i: converts time into
                            hours, minutes, seconds, and milliseconds (e.g., %h:%m:%s.%i) and only displays provided fields. Note that %S and %I can
                            only be used independently.
      --timestamp-format TIMESTAMP_FORMAT
                            Specify the format used for printing {timestamp}. Should be a format accepted by the 'datetime' standard module.
                            [Default: '%Y/%m/%d %H:%M:%S'].


Below, we provide several examples covering the most common use cases.
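
The ``--timestamp-format`` description in the help output refers to the
``datetime`` standard module. The following is a minimal Python sketch of what
such a format string produces with ``strftime``; it is not ``auditok``'s own
code, and the date used is a hypothetical detection time chosen for illustration:

.. code:: python

    from datetime import datetime

    # Default value quoted in the help text for --timestamp-format.
    DEFAULT_TIMESTAMP_FORMAT = "%Y/%m/%d %H:%M:%S"

    # Hypothetical moment at which an event was detected (assumption).
    moment = datetime(2021, 2, 17, 20, 16, 2)

    # Render the same moment with the default format and a custom one.
    default_stamp = moment.strftime(DEFAULT_TIMESTAMP_FORMAT)
    iso_like_stamp = moment.strftime("%Y-%m-%dT%H:%M:%S")

    print(default_stamp)   # 2021/02/17 20:16:02
    print(iso_like_stamp)  # 2021-02-17T20:16:02

Any directive accepted by ``strftime`` should therefore be usable in the value
passed to ``--timestamp-format``.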


Real-Time audio acquisition and event detection
-----------------------------------------------

To try ``auditok`` from the command line with your own voice, you’ll need to
either install `pyaudio` so
that ``auditok`` can read directly from the microphone, or record audio with
an external program (e.g., `sox`) and redirect its output to ``auditok``.

To read data directly from the microphone and use default parameters for audio
data and tokenization, simply type:

.. code:: bash

    auditok

This will print the **id**, **start time**, and **end time** of each detected
audio event. Because no additional arguments are passed in this command,
``auditok`` uses its default values. The most important
arguments are:


- ``-n``, ``--min-duration``: minimum duration of a valid audio event in seconds, default: 0.2
- ``-m``, ``--max-duration``: maximum duration of a valid audio event in seconds, default: 5
- ``-s``, ``--max-silence``: maximum duration of a continuous silence within a valid audio event in seconds, default: 0.3
- ``-e``, ``--energy-threshold``: energy threshold for detection, default: 50


Read audio data with an external program
----------------------------------------

You can use an external program, such as `sox` (``sudo apt-get install sox``),
to record audio data in real-time, redirect it, and have ``auditok`` read the data
from standard input:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -r 16000 -w 2 -c 1

Note that when reading data from standard input, the same audio parameters must
be set for both `sox` (or any other data generation/acquisition tool) and ``auditok``.
The following table provides a summary of the audio parameters:

+-----------------+------------+------------------+-----------------------+
| Audio parameter | sox option | `auditok` option | `auditok` default     |
+=================+============+==================+=======================+
| Sampling rate   | -r         | -r               | 16000                 |
+-----------------+------------+------------------+-----------------------+
| Sample width    | -b (bits)  | -w (bytes)       | 2                     |
+-----------------+------------+------------------+-----------------------+
| Channels        | -c         | -c               | 1                     |
+-----------------+------------+------------------+-----------------------+
| Encoding        | -e         | NA               | always a signed int   |
+-----------------+------------+------------------+-----------------------+

Because the values used above match the ``auditok`` defaults in the table, the
previous command can be shortened to:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -


Play back audio detections
--------------------------

Use the ``-E`` (or ``--echo``) option:

.. code:: bash

    auditok -E
    # or
    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -E


Using ``-E`` requires `pyaudio`. If it is not installed, you can use the ``-C``
option instead, which runs an external command with the detected audio event as
its argument:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -C "play -q {file}"

With the ``-C`` option, ``auditok`` saves each detected event to a temporary WAV
file, fills the ``{file}`` placeholder with the temporary file name, and runs the
command. The example above uses ``-C`` to play audio data with an external
program, but you can use it to run any other command.


Output detection details
------------------------

By default, ``auditok`` outputs the **id**, **start**, and **end** times for each
detected audio event. The start and end values indicate the beginning and end of
the event within the input stream (file or microphone) in seconds. Below is an
example of the output in the default format:

.. code:: bash

    1 1.160 2.390
    2 3.420 4.330
    3 5.010 5.720
    4 7.230 7.800

The format of the output is controlled by the ``--printf`` option. Alongside the
``{id}``, ``{start}`` and ``{end}`` placeholders, you can use the ``{duration}`` and
``{timestamp}`` (system timestamp of the detected event) placeholders.

For example, with the following format:

.. code:: bash

    auditok audio.wav --printf "{id}: [{timestamp}] start:{start}, end:{end}, dur: {duration}"

the output will look like:

.. code:: bash

    1: [2021/02/17 20:16:02] start:1.160, end:2.390, dur: 1.230
    2: [2021/02/17 20:16:04] start:3.420, end:4.330, dur: 0.910
    3: [2021/02/17 20:16:06] start:5.010, end:5.720, dur: 0.710
    4: [2021/02/17 20:16:08] start:7.230, end:7.800, dur: 0.570


The format of ``{timestamp}`` is controlled by ``--timestamp-format`` (default:
`"%Y/%m/%d %H:%M:%S"`), whereas that of ``{start}``, ``{end}`` and ``{duration}``
is controlled by ``--time-format`` (default: `%S`, absolute number of seconds). A
more detailed format is possible with ``--time-format`` using the `%h` (hours),
`%m` (minutes), `%s` (seconds) and `%i` (milliseconds) directives
(e.g., `"%h:%m:%s.%i"`).

To completely disable printing detection information, use ``-q``.


Save detections
---------------

You can save audio events to disk as they're detected using ``-o`` or
``--save-detections-as`` followed by a file name with placeholders. To create
a unique file name for each event, you can use the ``{id}``, ``{start}``, ``{end}``
and ``{duration}`` placeholders as in this example:


.. code:: bash

    auditok --save-detections-as "{id}_{start}_{end}.wav"

When using ``{start}``, ``{end}``, and ``{duration}`` placeholders, it is
recommended to limit the number of decimal places for these values to 3. You
can do this with a format like:

.. code:: bash

    auditok -o "{id}_{start:.3f}_{end:.3f}.wav"


Save the full audio stream
--------------------------

When reading audio data from the microphone, you may want to save it to disk.
To do this, use the ``-O`` or ``--save-stream`` option:

.. code:: bash

    auditok --save-stream output.wav

Note that this will work even if you read data from a file on disk.


Join detected audio events, inserting a silence between them
------------------------------------------------------------

Sometimes, you may want to detect audio events and create a new file containing
these events with pauses of a specific duration between them. This is useful if
you wish to preserve your original audio data while adjusting the length of pauses
(either shortening or extending them).

To achieve this, use the ``-j`` or ``--join-detections`` option together
with the ``-O`` / ``--save-stream`` option. In the example below, we
read data from ``input.wav`` and save audio events to ``output.wav``, adding
1-second pauses between them:

.. code:: bash

    auditok input.wav --join-detections 1 -O output.wav


Plot detections
---------------

Audio signal and detections can be plotted using the ``-p`` or ``--plot`` option.
You can also save the plot to disk using ``--save-image``. The following example
demonstrates both:

.. code:: bash

    auditok -p --save-image "plot.png" # can also be 'pdf' or another image format

Output example:

.. image:: figures/example_1.png

Plotting requires `matplotlib`.
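
As a closing note on the file name placeholders used with ``-o`` /
``--save-detections-as``, the sketch below illustrates why limiting ``{start}``
and ``{end}`` to 3 decimal places is recommended. It assumes the placeholders
behave like Python's ``str.format`` fields, which the ``{start:.3f}`` syntax
suggests but this guide does not state; the event values are hypothetical:

.. code:: python

    # Hypothetical event; a float sum often carries a long binary tail.
    event = {"id": 1, "start": 0.1 + 0.2, "end": 2.39}

    # Without a format spec, the full float representation ends up in the name.
    raw_name = "{id}_{start}_{end}.wav".format(**event)

    # With ".3f", each time value is rounded to exactly three decimal places.
    short_name = "{id}_{start:.3f}_{end:.3f}.wav".format(**event)

    print(raw_name)    # 1_0.30000000000000004_2.39.wav
    print(short_name)  # 1_0.300_2.390.wav

Fixed-width time fields also make the resulting file names easier to compare
and read at a glance.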