Command-line guide
==================

``auditok`` can also be used from the command line. For information
about available parameters and descriptions, type:

.. code:: bash

    auditok -h


.. code::

    usage: auditok [-h] [--version] [-I INT] [-F INT] [-f STRING] [-M FLOAT] [-L] [-O FILE] [-o STRING] [-j FLOAT] [-T STRING] [-u INT/STRING]
                   [-a FLOAT] [-n FLOAT] [-m FLOAT] [-s FLOAT] [-d] [-R] [-e FLOAT] [-r INT] [-c INT] [-w INT] [-C STRING] [-E] [-B] [-p]
                   [--save-image FILE] [--printf STRING] [--time-format STRING] [--timestamp-format TIMESTAMP_FORMAT] [-q] [-D] [--debug-file FILE]
                   [input]

    auditok, an audio tokenization tool.

    options:
      -h, --help            show this help message and exit
      --version, -v         show program's version number and exit
      -q, --quiet           Quiet mode: Do not display any information on the screen.
      -D, --debug           Debug mode: output processing operations to STDOUT.
      --debug-file FILE     Save processing operations to the specified file.

    Input-Output options::
      input                 Input audio or video file. Use '-' for stdin [Default: read from a microphone using PyAudio].
      -I INT, --input-device-index INT
                            Audio device index [Default: None]. Optional and only effective when using PyAudio.
      -F INT, --audio-frame-per-buffer INT
                            Audio frame per buffer [Default: 1024]. Optional and only effective when using PyAudio.
      -f STRING, --input-format STRING
                            Specify the input audio file format. If not provided, the format is inferred from the file extension. If the output file
                            name lacks an extension, the format is guessed from the file header (requires pydub). If neither condition is met, an
                            error is raised.
      -M FLOAT, --max-read FLOAT
                            Maximum data (in seconds) to read from a microphone or a file [Default: read until the end of the file or stream].
      -L, --large-file      Whether the input file should be treated as a large file. If True, data will be read from file on demand, otherwise all
                            audio data is loaded into memory before tokenization.
      -O FILE, --save-stream FILE
                            Save read audio data (from a file or a microphone) to a file. If omitted, no audio data will be saved.
      -o STRING, --save-detections-as STRING
                            Specify the file name format to save detected events. You can use the following placeholders to construct the output
                            file name: {id} (sequential, starting from 1), {start}, {end}, and {duration}. Time placeholders are in seconds.
                            Example: 'Event_{id}{start}-{end}{duration:.3f}.wav'
      -j FLOAT, --join-detections FLOAT
                            Join (glue) detected audio events with a specified duration of silence between them. To be used in combination with the
                            --save-stream / -O option.
      -T STRING, --output-format STRING
                            Specify the audio format for saving detections and/or the main stream. If not provided, the format will be (1) inferred
                            from the file extension or (2) default to raw format.
      -u INT/STRING, --use-channel INT/STRING
                            Specify the audio channel to use for tokenization when the input stream is multi-channel (0 refers to the first
                            channel). By default, this is set to None, meaning all channels are used, capturing any valid audio event from any
                            channel. Alternatively, set this to 'mix' (or 'avg'/'average') to combine all channels into a single averaged channel
                            for tokenization. Regardless of theoption chosen, saved audio events will have the same number of channels as the input
                            stream. [Default: None, use all channels].

    Tokenization options::
      Set audio events' duration and set the threshold for detection.

      -a FLOAT, --analysis-window FLOAT
                            Specify the size of the analysis window in seconds. [Default: 0.01 (10ms)].
      -n FLOAT, --min-duration FLOAT
                            Minimum duration of a valid audio event in seconds. [Default: 0.2].
      -m FLOAT, --max-duration FLOAT
                            Maximum duration of a valid audio event in seconds. [Default: 5].
      -s FLOAT, --max-silence FLOAT
                            Maximum duration of consecutive silence allowed within a valid audio event in seconds. [Default: 0.3]
      -d, --drop-trailing-silence
                            Remove trailing silence from a detection. [Default: trailing silence is retained].
      -R, --strict-min-duration
                            Reject events shorter than --min-duration, even if adjacent to the most recent valid event that reached max-duration.
                            [Default: retain such events].
      -e FLOAT, --energy-threshold FLOAT
                            Set the log energy threshold for detection. [Default: 50]

    Audio parameters::
      Set audio parameters when reading from a headerless file (raw or stdin) or when using custom microphone settings.

      -r INT, --rate INT    Sampling rate of audio data [Default: 16000].
      -c INT, --channels INT
                            Number of channels of audio data [Default: 1].
      -w INT, --width INT   Number of bytes per audio sample [Default: 2].

    Use audio events::
      Use these options to print, play, or plot detected audio events.

      -C STRING, --command STRING
                            Provide a command to execute when an audio event is detected. Use '{file}' as a placeholder for the temporary WAV file
                            containing the event data (e.g., `-C 'du -h {file}'` to display the file size or `-C 'play -q {file}'` to play audio
                            with sox).
      -E, --echo            Immediately play back a detected audio event using pyaudio.
      -B, --progress-bar    Show a progress bar when playing audio.
      -p, --plot            Plot and displays the audio signal along with detections (requires matplotlib).
      --save-image FILE     Save the plotted audio signal and detections as a picture or a PDF file (requires matplotlib).
      --printf STRING       Prints information about each audio event on a new line using the specified format. The format can include text and
                            placeholders: {id} (sequential, starting from 1), {start}, {end}, {duration}, and {timestamp}. The first three time
                            placeholders are in seconds, with formatting controlled by the --time-format argument. {timestamp} represents the system
                            date and time of the event, configurable with the --timestamp-format argument. Example: '[{id}]: {start} -> {end} --
                            {timestamp}'.
      --time-format STRING  Specify the format for printing {start}, {end}, and {duration} placeholders with --printf. [Default: %S]. Accepted
                            formats are : - %S: absolute time in seconds - %I: absolute time in milliseconds - %h, %m, %s, %i: converts time into
                            hours, minutes, seconds, and milliseconds (e.g., %h:%m:%s.%i) and only displays provided fields. Note that %S and %I can
                            only be used independently.
      --timestamp-format TIMESTAMP_FORMAT
                            Specify the format used for printing {timestamp}. Should be a format accepted by the 'datetime' standard module.
                            [Default: '%Y/%m/%d %H:%M:%S'].


Below, we provide several examples covering the most common use cases.


Real-time audio acquisition and event detection
-----------------------------------------------

To try ``auditok`` from the command line with your own voice, you’ll need to
either install `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_ so
that ``auditok`` can read directly from the microphone, or record audio with
an external program (e.g., ``sox``) and redirect its output to ``auditok``.

To read data directly from the microphone and use default parameters for audio
data and tokenization, simply type:

.. code:: bash

    auditok

This prints the **id**, **start time**, and **end time** of each detected
audio event. Since no arguments were passed, ``auditok`` uses its default
values. The most important arguments are:

- ``-n``, ``--min-duration``: minimum duration of a valid audio event in seconds, default: 0.2
- ``-m``, ``--max-duration``: maximum duration of a valid audio event in seconds, default: 5
- ``-s``, ``--max-silence``: maximum duration of continuous silence within a valid audio event in seconds, default: 0.3
- ``-e``, ``--energy-threshold``: log energy threshold for detection, default: 50

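These options can be combined in a single call. The sketch below wraps
``auditok`` in a shell function with a set of tuning values; the function name
``detect_events`` and the values themselves are illustrative assumptions (for
example, for longer utterances in a noisier room), not recommended settings:

.. code:: bash

    # Illustrative tuning: keep events between 0.5 s and 10 s, tolerate up
    # to 0.4 s of silence inside an event, and raise the energy threshold.
    # Any extra arguments (e.g. an input file) are passed through first.
    detect_events() {
        auditok "$@" \
            --min-duration 0.5 \
            --max-duration 10 \
            --max-silence 0.4 \
            --energy-threshold 55
    }

Calling ``detect_events`` with no argument reads from the microphone, while
``detect_events audio.wav`` processes a file. A higher ``-e`` value makes
detection less sensitive to background noise, at the risk of missing quiet speech.
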

Read audio data with an external program
----------------------------------------

You can use an external program, such as ``sox`` (``sudo apt-get install sox``),
to record audio in real time and pipe it to ``auditok``, which reads the data
from standard input:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -r 16000 -w 2 -c 1

Note that when reading data from standard input, the same audio parameters must
be set for both ``sox`` (or any other data generation/acquisition tool) and ``auditok``.
The following table summarizes these parameters:

+-----------------+------------+------------------+-----------------------+
| Audio parameter | sox option | `auditok` option | `auditok` default     |
+=================+============+==================+=======================+
| Sampling rate   | -r         | -r               | 16000                 |
+-----------------+------------+------------------+-----------------------+
| Sample width    | -b (bits)  | -w (bytes)       | 2                     |
+-----------------+------------+------------------+-----------------------+
| Channels        | -c         | -c               | 1                     |
+-----------------+------------+------------------+-----------------------+
| Encoding        | -e         | NA               | always a signed int   |
+-----------------+------------+------------------+-----------------------+

Since these values match ``auditok``'s defaults, the previous command can be
shortened to:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -

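To keep both programs in sync, you can define the parameters once and derive
``auditok``'s sample width (in bytes) from the bit depth given to ``sox``. The
sketch below is a hypothetical helper, not part of ``auditok``; the variable
names and the ``bits_to_bytes`` and ``record_and_detect`` functions are
illustrative assumptions:

.. code:: bash

    #!/usr/bin/env bash
    # Audio parameters shared by sox and auditok (illustrative values).
    RATE=16000
    CHANNELS=1
    BITS=16  # sox's -b takes bits per sample

    # auditok's -w takes bytes per sample: convert bits to bytes.
    bits_to_bytes() {
        echo $(( $1 / 8 ))
    }

    # Record from the default input device and pipe raw audio to auditok,
    # forwarding any extra arguments (e.g. -E or --printf) to auditok.
    record_and_detect() {
        rec -q -t raw -r "$RATE" -c "$CHANNELS" -b "$BITS" -e signed - |
            auditok - -r "$RATE" -c "$CHANNELS" -w "$(bits_to_bytes "$BITS")" "$@"
    }

For example, ``record_and_detect -E`` plays back each detection, and changing
``RATE`` to ``8000`` updates the sampling rate of both programs at once.
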

Play back audio detections
--------------------------

Use the ``-E`` (or ``--echo``) option:

.. code:: bash

    auditok -E
    # or
    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -E


Using ``-E`` requires ``pyaudio``. If it is not installed, you can use the
``-C`` option instead, which runs an external command on each detected audio event:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -C "play -q {file}"

With ``-C``, ``auditok`` saves each detected event to a temporary WAV file,
replaces the ``{file}`` placeholder with that file's name, and runs the
command. The example above uses ``-C`` to play audio with an external program
(``play`` ships with ``sox``), but any other command can be used.

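Any executable can serve as the command. As an illustration, the hypothetical
script below (saved, say, as ``log_event.sh`` and made executable with
``chmod +x log_event.sh``) appends the path and size in bytes of each event
file it receives to a log file. The script name, the ``EVENT_LOG`` variable,
and the log format are assumptions made for this example:

.. code:: bash

    #!/usr/bin/env bash
    # Hypothetical -C handler: for each file path given as an argument
    # (auditok substitutes the temporary WAV path for {file}), append
    # "<path> <size in bytes>" to $EVENT_LOG (default: events.log).
    log_event() {
        local file size
        for file in "$@"; do
            size=$(wc -c < "$file")
            printf '%s %d\n' "$file" "$size" >> "${EVENT_LOG:-events.log}"
        done
    }

    log_event "$@"

It can then be run with ``auditok -C "./log_event.sh {file}"``. Note that the
logged size includes the WAV header, not only the audio samples.
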

Output detection details
------------------------

By default, ``auditok`` outputs the **id**, **start**, and **end** times for each
detected audio event. The start and end values indicate the beginning and end of
the event within the input stream (file or microphone) in seconds. Below is an
example of the output in the default format:

.. code:: bash

    1 1.160 2.390
    2 3.420 4.330
    3 5.010 5.720
    4 7.230 7.800

The format of the output is controlled by the ``--printf`` option. In addition
to the ``{id}``, ``{start}``, and ``{end}`` placeholders, you can use
``{duration}`` and ``{timestamp}`` (the system date and time of the detected
event).

For example, with the following format:

.. code:: bash

    auditok audio.wav --printf "{id}: [{timestamp}] start:{start}, end:{end}, dur: {duration}"

the output will look like:

.. code:: bash

    1: [2021/02/17 20:16:02] start:1.160, end:2.390, dur: 1.230
    2: [2021/02/17 20:16:04] start:3.420, end:4.330, dur: 0.910
    3: [2021/02/17 20:16:06] start:5.010, end:5.720, dur: 0.710
    4: [2021/02/17 20:16:08] start:7.230, end:7.800, dur: 0.570


The format of ``{timestamp}`` is controlled by ``--timestamp-format`` (default:
``"%Y/%m/%d %H:%M:%S"``), whereas that of ``{start}``, ``{end}``, and
``{duration}`` is controlled by ``--time-format`` (default: ``%S``, absolute
number of seconds). A more detailed ``--time-format`` can be built from the
``%h`` (hours), ``%m`` (minutes), ``%s`` (seconds), and ``%i`` (milliseconds)
directives (e.g., ``"%h:%m:%s.%i"``).

To completely disable printing detection information, use ``-q``.

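Because ``--printf`` controls exactly what is printed for each event, the output
is easy to post-process with standard tools. As a sketch, the hypothetical
``sum_durations`` helper below adds up the values printed by
``--printf "{duration}"`` (in seconds, with the default ``--time-format``),
assuming ``auditok`` writes nothing else to standard output:

.. code:: bash

    # Read one duration (in seconds) per line on stdin and print their sum
    # with three decimal places; prints 0.000 if there are no detections.
    sum_durations() {
        awk '{ total += $1 } END { printf "%.3f\n", total }'
    }

For example, ``auditok audio.wav --printf "{duration}" | sum_durations`` prints
the total duration of detected activity; for the four events shown above, that
would be ``3.420``.
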

Save detections
---------------

You can save audio events to disk as they're detected using ``-o`` or
``--save-detections-as`` followed by a file name with placeholders. To create
a unique file name for each event, you can use the ``{id}``, ``{start}``,
``{end}``, and ``{duration}`` placeholders, as in this example:

.. code:: bash

    auditok --save-detections-as "{id}_{start}_{end}.wav"

When using ``{start}``, ``{end}``, and ``{duration}`` placeholders, it is
recommended to limit the number of decimal places for these values to 3. You
can do this with a format like:

.. code:: bash

    auditok -o "{id}_{start:.3f}_{end:.3f}.wav"

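To split several recordings at once, the same option can be used in a loop. In
this sketch, each ``*.wav`` file in the current directory gets its own folder of
detections. The ``<name>_events`` layout and the helper names are hypothetical,
and each folder is created beforehand with ``mkdir -p`` rather than relying on
``auditok`` to create it:

.. code:: bash

    # Map an input file name to its output folder: speech.wav -> speech_events
    events_dir() {
        printf '%s_events\n' "${1%.*}"
    }

    # Save the detections of every WAV file in the current directory into
    # its own folder, silencing the per-event printout with -q.
    split_all() {
        local f dir
        for f in *.wav; do
            [ -e "$f" ] || continue  # no match: the glob stays literal
            dir=$(events_dir "$f")
            mkdir -p "$dir"
            auditok "$f" -q -o "$dir/{id}_{start:.3f}_{end:.3f}.wav"
        done
    }

Running ``split_all`` in a folder containing ``speech.wav`` would write files
such as ``speech_events/1_1.160_2.390.wav``.
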

Save the full audio stream
--------------------------

When reading audio data from the microphone, you may want to save it to disk.
To do this, use the ``-O`` or ``--save-stream`` option:

.. code:: bash

    auditok --save-stream output.wav

This also works when reading data from a file on disk.

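When recording from the microphone, ``-O`` pairs well with ``-M``
(``--max-read``), which stops reading after a given number of seconds. The
sketch below records a session of limited length into a time-stamped file; the
file naming scheme and the ``session_name`` and ``record_session`` functions are
illustrative assumptions:

.. code:: bash

    # Build a file name such as session_20240301_142530.wav from the current time.
    session_name() {
        date +"session_%Y%m%d_%H%M%S.wav"
    }

    # Read at most $1 seconds (default: 60) from the microphone, print
    # detections as usual, and save the whole stream to a new WAV file.
    record_session() {
        local seconds=${1:-60}
        auditok -M "$seconds" -O "$(session_name)"
    }

For example, ``record_session 300`` records up to five minutes.
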

Join detected audio events, inserting a silence between them
------------------------------------------------------------

Sometimes, you may want to detect audio events and create a new file containing
only these events, separated by pauses of a specific duration. This is useful if
you wish to keep the detected audio intact while adjusting the length of the
pauses (either shortening or extending them).

To achieve this, use the ``-j`` or ``--join-detections`` option together
with the ``-O`` / ``--save-stream`` option. In the example below, we
read data from ``input.wav`` and save audio events to ``output.wav``, adding
1-second pauses between them:

.. code:: bash

    auditok input.wav --join-detections 1 -O output.wav

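Assuming the pause is inserted only between consecutive events, as the option's
description suggests, the output should last roughly the sum of the event
durations plus (number of events minus one) times the pause length. The
hypothetical ``joined_length`` helper below computes this estimate from one
event duration per line (e.g. from ``--printf "{duration}"``); treat the result
as an approximation rather than an exact prediction:

.. code:: bash

    # Estimate the length (in seconds) of a stream saved with
    # --join-detections GAP: the sum of the event durations read from stdin
    # (one per line) plus GAP seconds between each pair of consecutive events.
    joined_length() {
        awk -v gap="$1" '
            { total += $1; n++ }
            END {
                if (n > 1) total += (n - 1) * gap
                printf "%.3f\n", total
            }'
    }

For example, the output of ``auditok input.wav --printf "{duration}" | joined_length 1``
can be compared with the duration reported by ``soxi -D output.wav`` (``soxi``
ships with ``sox``).
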

Plot detections
---------------

The audio signal and detections can be plotted using the ``-p`` or ``--plot`` option.
You can also save the plot to disk using ``--save-image``. The following example
demonstrates both:

.. code:: bash

    auditok -p --save-image "plot.png" # can also be 'pdf' or another image format

Example output:

.. image:: figures/example_1.png

Plotting requires `matplotlib <https://matplotlib.org/stable/index.html>`_.