auditok: doc/command_line_usage.rst annotate

annotate doc/command_line_usage.rst @ 455:7dae98b84cdd tip master

Merge branch 'master' of https://github.com/amsehili/auditok

author	www-data <www-data@c4dm-xenserv-virt2.eecs.qmul.ac.uk>
date	Tue, 03 Dec 2024 09:18:01 +0000
parents	f91576bf2a29
children

Command-line guide
==================

``auditok`` can also be used from the command line. For information
about the available parameters and their descriptions, type:

.. code:: bash

    auditok -h


.. code::

    usage: auditok [-h] [--version] [-I INT] [-F INT] [-f STRING] [-M FLOAT] [-L] [-O FILE] [-o STRING] [-j FLOAT] [-T STRING] [-u INT/STRING]
                   [-a FLOAT] [-n FLOAT] [-m FLOAT] [-s FLOAT] [-d] [-R] [-e FLOAT] [-r INT] [-c INT] [-w INT] [-C STRING] [-E] [-B] [-p]
                   [--save-image FILE] [--printf STRING] [--time-format STRING] [--timestamp-format TIMESTAMP_FORMAT] [-q] [-D] [--debug-file FILE]
                   [input]

    auditok, an audio tokenization tool.

    options:
      -h, --help            show this help message and exit
      --version, -v         show program's version number and exit
      -q, --quiet           Quiet mode: Do not display any information on the screen.
      -D, --debug           Debug mode: output processing operations to STDOUT.
      --debug-file FILE     Save processing operations to the specified file.

    Input-Output options::
      input                 Input audio or video file. Use '-' for stdin [Default: read from a microphone using PyAudio].
      -I INT, --input-device-index INT
                            Audio device index [Default: None]. Optional and only effective when using PyAudio.
      -F INT, --audio-frame-per-buffer INT
                            Audio frame per buffer [Default: 1024]. Optional and only effective when using PyAudio.
      -f STRING, --input-format STRING
                            Specify the input audio file format. If not provided, the format is inferred from the file extension. If the output file
                            name lacks an extension, the format is guessed from the file header (requires pydub). If neither condition is met, an
                            error is raised.
      -M FLOAT, --max-read FLOAT
                            Maximum data (in seconds) to read from a microphone or a file [Default: read until the end of the file or stream].
      -L, --large-file      Whether the input file should be treated as a large file. If True, data will be read from file on demand, otherwise all
                            audio data is loaded into memory before tokenization.
      -O FILE, --save-stream FILE
                            Save read audio data (from a file or a microphone) to a file. If omitted, no audio data will be saved.
      -o STRING, --save-detections-as STRING
                            Specify the file name format to save detected events. You can use the following placeholders to construct the output
                            file name: {id} (sequential, starting from 1), {start}, {end}, and {duration}. Time placeholders are in seconds.
                            Example: 'Event_{id}{start}-{end}{duration:.3f}.wav'
      -j FLOAT, --join-detections FLOAT
                            Join (glue) detected audio events with a specified duration of silence between them. To be used in combination with the
                            --save-stream / -O option.
      -T STRING, --output-format STRING
                            Specify the audio format for saving detections and/or the main stream. If not provided, the format will be (1) inferred
                            from the file extension or (2) default to raw format.
      -u INT/STRING, --use-channel INT/STRING
                            Specify the audio channel to use for tokenization when the input stream is multi-channel (0 refers to the first
                            channel). By default, this is set to None, meaning all channels are used, capturing any valid audio event from any
                            channel. Alternatively, set this to 'mix' (or 'avg'/'average') to combine all channels into a single averaged channel
                            for tokenization. Regardless of theoption chosen, saved audio events will have the same number of channels as the input
                            stream. [Default: None, use all channels].

    Tokenization options::
      Set audio events' duration and set the threshold for detection.

      -a FLOAT, --analysis-window FLOAT
                            Specify the size of the analysis window in seconds. [Default: 0.01 (10ms)].
      -n FLOAT, --min-duration FLOAT
                            Minimum duration of a valid audio event in seconds. [Default: 0.2].
      -m FLOAT, --max-duration FLOAT
                            Maximum duration of a valid audio event in seconds. [Default: 5].
      -s FLOAT, --max-silence FLOAT
                            Maximum duration of consecutive silence allowed within a valid audio event in seconds. [Default: 0.3]
      -d, --drop-trailing-silence
                            Remove trailing silence from a detection. [Default: trailing silence is retained].
      -R, --strict-min-duration
                            Reject events shorter than --min-duration, even if adjacent to the most recent valid event that reached max-duration.
                            [Default: retain such events].
      -e FLOAT, --energy-threshold FLOAT
                            Set the log energy threshold for detection. [Default: 50]

    Audio parameters::
      Set audio parameters when reading from a headerless file (raw or stdin) or when using custom microphone settings.

      -r INT, --rate INT    Sampling rate of audio data [Default: 16000].
      -c INT, --channels INT
                            Number of channels of audio data [Default: 1].
      -w INT, --width INT   Number of bytes per audio sample [Default: 2].

    Use audio events::
      Use these options to print, play, or plot detected audio events.

      -C STRING, --command STRING
                            Provide a command to execute when an audio event is detected. Use '{file}' as a placeholder for the temporary WAV file
                            containing the event data (e.g., `-C 'du -h {file}'` to display the file size or `-C 'play -q {file}'` to play audio
                            with sox).
      -E, --echo            Immediately play back a detected audio event using pyaudio.
      -B, --progress-bar    Show a progress bar when playing audio.
      -p, --plot            Plot and displays the audio signal along with detections (requires matplotlib).
      --save-image FILE     Save the plotted audio signal and detections as a picture or a PDF file (requires matplotlib).
      --printf STRING       Prints information about each audio event on a new line using the specified format. The format can include text and
                            placeholders: {id} (sequential, starting from 1), {start}, {end}, {duration}, and {timestamp}. The first three time
                            placeholders are in seconds, with formatting controlled by the --time-format argument. {timestamp} represents the system
                            date and time of the event, configurable with the --timestamp-format argument. Example: '[{id}]: {start} -> {end} --
                            {timestamp}'.
      --time-format STRING  Specify the format for printing {start}, {end}, and {duration} placeholders with --printf. [Default: %S]. Accepted
                            formats are : - %S: absolute time in seconds - %I: absolute time in milliseconds - %h, %m, %s, %i: converts time into
                            hours, minutes, seconds, and milliseconds (e.g., %h:%m:%s.%i) and only displays provided fields. Note that %S and %I can
                            only be used independently.
      --timestamp-format TIMESTAMP_FORMAT
                            Specify the format used for printing {timestamp}. Should be a format accepted by the 'datetime' standard module.
                            [Default: '%Y/%m/%d %H:%M:%S'].


Below, we provide several examples covering the most common use cases.


Real-time audio acquisition and event detection
-----------------------------------------------

To try ``auditok`` from the command line with your own voice, you’ll need to
either install `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_ so
that ``auditok`` can read directly from the microphone, or record audio with
an external program (e.g., ``sox``) and redirect its output to ``auditok``.

To read data directly from the microphone and use the default parameters for
audio data and tokenization, simply type:

.. code:: bash

    auditok

This prints the id, start time, and end time of each detected audio event.
Since no arguments are passed, ``auditok`` uses its default values. The most
important tokenization arguments are:

- ``-n``, ``--min-duration``: minimum duration of a valid audio event in seconds, default: 0.2
- ``-m``, ``--max-duration``: maximum duration of a valid audio event in seconds, default: 5
- ``-s``, ``--max-silence``: maximum duration of continuous silence within a valid audio event in seconds, default: 0.3
- ``-e``, ``--energy-threshold``: log energy threshold for detection, default: 50

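
To make these values explicit, for example as a starting point for tuning, you
could wrap ``auditok`` in a small shell function. The sketch below is only an
illustration. The function name ``auditok_explicit`` is hypothetical. It
forwards any extra arguments (such as an input file) and appends the default
values of the four options listed above.

.. code:: bash

    # Hypothetical helper (not part of auditok): run auditok with the
    # default tokenization values spelled out so they are easy to tweak.
    auditok_explicit() {
        # -n: min event duration (s), -m: max event duration (s),
        # -s: max silence inside an event (s), -e: log energy threshold
        auditok "$@" -n 0.2 -m 5 -s 0.3 -e 50
    }

With this definition, ``auditok_explicit audio.wav`` should behave like
``auditok audio.wav``. Edit the values in the function to change the detection
behavior.
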

Read audio data with an external program
----------------------------------------

You can use an external program, such as ``sox`` (``sudo apt-get install sox``),
to record audio data in real time, redirect it, and have ``auditok`` read the data
from standard input. The example below uses ``rec``, the recording command
installed with ``sox``:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -r 16000 -w 2 -c 1

Note that when reading data from standard input, the same audio parameters must
be set for both ``sox`` (or any other data generation/acquisition tool) and ``auditok``.
The following table summarizes these audio parameters:

+-----------------+------------+------------------+-----------------------+
| Audio parameter | sox option | `auditok` option | `auditok` default     |
+=================+============+==================+=======================+
| Sampling rate   | -r         | -r               | 16000                 |
+-----------------+------------+------------------+-----------------------+
| Sample width    | -b (bits)  | -w (bytes)       | 2                     |
+-----------------+------------+------------------+-----------------------+
| Channels        | -c         | -c               | 1                     |
+-----------------+------------+------------------+-----------------------+
| Encoding        | -e         | NA               | always a signed int   |
+-----------------+------------+------------------+-----------------------+

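
Because ``sox`` expresses the sample size in bits while ``auditok`` expects a
width in bytes, it can help to derive the ``auditok`` options from the ``sox``
ones. The following sketch uses a hypothetical shell function,
``sox_to_auditok_args``. It takes a sampling rate, a bit depth, and a number of
channels, and prints the matching ``-r``, ``-w``, and ``-c`` options:

.. code:: bash

    # Hypothetical helper (not part of auditok or sox): translate sox-style
    # audio parameters into the matching auditok options.
    sox_to_auditok_args() {
        rate=$1      # sampling rate in Hz (sox -r, auditok -r)
        bits=$2      # bits per sample (sox -b)
        channels=$3  # number of channels (sox -c, auditok -c)
        # auditok's -w is in bytes, so the bit depth must be a whole number of bytes
        if [ $((bits % 8)) -ne 0 ]; then
            echo "bit depth must be a multiple of 8, got $bits" >&2
            return 1
        fi
        echo "-r $rate -w $((bits / 8)) -c $channels"
    }

For example, ``sox_to_auditok_args 16000 16 1`` prints ``-r 16000 -w 2 -c 1``.
These are the options passed to ``auditok`` in the first command of this section.
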
Since these values match ``auditok``'s defaults, the previous command can be shortened to:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -


Play back audio detections
--------------------------

Use the ``-E`` (or ``--echo``) option:

.. code:: bash

    auditok -E
    # or
    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -E


Using ``-E`` requires ``pyaudio``. If it's not installed, you can use the ``-C``
option instead, which runs an external command on each detected audio event:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -C "play -q {file}"

With the ``-C`` option, ``auditok`` saves each detected event to a temporary WAV
file, replaces the ``{file}`` placeholder with that file's name, and runs the
command. In the above example, ``-C`` is used to play the audio with an external
program (``play``, also installed with ``sox``), but you can use it to run any
other command.

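
Since ``{file}`` is replaced with a file path, the command can also be a script
of your own. As a sketch, the hypothetical script below (``log_event.sh``)
appends the name and size in bytes of each temporary event file to a log file.
The default log name ``events.log`` is an arbitrary choice. The script could be
used as ``auditok -C "sh log_event.sh {file}"``.

.. code:: bash

    #!/bin/sh
    # log_event.sh (hypothetical): append "<file name> <size in bytes>"
    # for the temporary WAV file that auditok passes via {file}.
    log_event() {
        file=$1
        log=${2:-events.log}
        # wc -c may pad its output with spaces on some systems; strip them
        size=$(wc -c < "$file" | tr -d ' ')
        echo "$(basename "$file") $size" >> "$log"
    }

    # Only log when a file argument is given (as when called from -C).
    if [ $# -ge 1 ]; then
        log_event "$@"
    fi
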
amine@379	200
amine@441	201 Output detection details
amine@441	202 ------------------------
amine@379	203
amine@432	204 By default, ``auditok`` outputs the id, start, and end times for each
amine@432	205 detected audio event. The start and end values indicate the beginning and end of
amine@432	206 the event within the input stream (file or microphone) in seconds. Below is an
amine@432	207 example of the output in the default format:
amine@379	208
amine@379	209 .. code:: bash
amine@379	210
amine@379	211 1 1.160 2.390
amine@379	212 2 3.420 4.330
amine@379	213 3 5.010 5.720
amine@379	214 4 7.230 7.800
amine@379	215
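
Because each line of the default output has the form ``id start end``, it is
easy to post-process with standard tools. For instance, the hypothetical shell
function below reads that output on standard input and prints the total
duration of the detected events, in seconds. It is a sketch and is not part of
``auditok``.

.. code:: bash

    # Hypothetical helper: sum (end - start) over lines of the form
    # "id start end" and print the total with millisecond precision.
    total_detected_duration() {
        awk '{ total += $3 - $2 } END { printf "%.3f\n", total }'
    }

It could be used as ``auditok audio.wav | total_detected_duration``. Applied to
the four events shown above, it prints ``3.420``.
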
The format of the output is controlled by the ``--printf`` option. Alongside
the ``{id}``, ``{start}``, and ``{end}`` placeholders, you can use ``{duration}``
and ``{timestamp}`` (the system date and time of the detected event).

For example, with the following format:

.. code:: bash

    auditok audio.wav --printf "{id}: [{timestamp}] start:{start}, end:{end}, dur: {duration}"

the output will look like:

.. code:: bash

    1: [2021/02/17 20:16:02] start:1.160, end:2.390, dur: 1.230
    2: [2021/02/17 20:16:04] start:3.420, end:4.330, dur: 0.910
    3: [2021/02/17 20:16:06] start:5.010, end:5.720, dur: 0.710
    4: [2021/02/17 20:16:08] start:7.230, end:7.800, dur: 0.570


The format of ``{timestamp}`` is controlled by ``--timestamp-format`` (default:
``"%Y/%m/%d %H:%M:%S"``), whereas the format of ``{start}``, ``{end}``, and
``{duration}`` is controlled by ``--time-format`` (default: ``%S``, absolute
number of seconds). A more detailed ``--time-format`` can be built with the
``%h`` (hours), ``%m`` (minutes), ``%s`` (seconds), and ``%i`` (milliseconds)
directives (e.g., ``"%h:%m:%s.%i"``).

To completely disable printing detection information, use ``-q``.

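
Because ``--printf`` accepts arbitrary text around the placeholders, it can also
produce machine-readable output. The sketch below defines a hypothetical shell
function, ``detections_to_csv``, that writes a header row followed by one
comma-separated line per event. It assumes that, with ``--printf``, ``auditok``
writes only the formatted lines to standard output. Any additional arguments
(for example ``--time-format "%h:%m:%s.%i"``) are passed through to ``auditok``.

.. code:: bash

    # Hypothetical helper: save detections from INPUT as a CSV file OUTPUT.
    # Usage: detections_to_csv INPUT OUTPUT [extra auditok options...]
    detections_to_csv() {
        input=$1
        output=$2
        shift 2
        echo "id,start,end,duration" > "$output"
        auditok "$input" --printf "{id},{start},{end},{duration}" "$@" >> "$output"
    }
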

Save detections
---------------

You can save audio events to disk as they're detected using ``-o`` or
``--save-detections-as`` followed by a file name with placeholders. To create
a unique file name for each event, you can use the ``{id}``, ``{start}``,
``{end}``, and ``{duration}`` placeholders, as in this example:

.. code:: bash

    auditok --save-detections-as "{id}_{start}_{end}.wav"

When using the ``{start}``, ``{end}``, and ``{duration}`` placeholders, it is
recommended to limit these values to 3 decimal places. You can do this with a
format like:

.. code:: bash

    auditok -o "{id}_{start:.3f}_{end:.3f}.wav"

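
If you use the ``{id}_{start:.3f}_{end:.3f}.wav`` pattern above, you can later
recover the event boundaries from the file names alone. As an illustration, the
hypothetical function below lists the saved files in a directory as
``id start end`` lines, sorted by id. It assumes that the files follow exactly
that naming pattern.

.. code:: bash

    # Hypothetical helper: decode files named "<id>_<start>_<end>.wav"
    # (as produced by -o "{id}_{start:.3f}_{end:.3f}.wav") into "id start end".
    list_saved_events() {
        dir=${1:-.}
        for f in "$dir"/*_*_*.wav; do
            [ -e "$f" ] || continue        # the glob matched nothing
            name=$(basename "$f" .wav)     # e.g. 3_5.010_5.720
            id=${name%%_*}                 # text before the first "_"
            rest=${name#*_}                # e.g. 5.010_5.720
            echo "$id ${rest%_*} ${rest#*_}"
        done | sort -n
    }
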

Save the full audio stream
--------------------------

When reading audio data from the microphone, you may want to save it to disk.
To do this, use the ``-O`` or ``--save-stream`` option:

.. code:: bash

    auditok --save-stream output.wav

This option also works when reading data from a file on disk.

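
You can combine ``-O`` with the other options described in this guide. For
example, the hypothetical shell function below records from the microphone for
at most a given number of seconds (``-M`` / ``--max-read``). It also saves the
whole stream and writes each detected event to its own file. The file name
prefix (``session`` by default) is an arbitrary choice.

.. code:: bash

    # Hypothetical helper: record a session of at most SECONDS seconds.
    # Usage: record_session SECONDS [PREFIX]
    record_session() {
        seconds=$1
        prefix=${2:-session}
        # -M: stop after SECONDS seconds of audio
        # -O: save the full stream; -o: save each event separately
        auditok -M "$seconds" -O "${prefix}.wav" \
            -o "${prefix}_{id}_{start:.3f}_{end:.3f}.wav"
    }

For example, ``record_session 60 meeting`` would save up to one minute of audio
to ``meeting.wav``. It would also save each detected event to a file named like
``meeting_<id>_<start>_<end>.wav``.
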

Join detected audio events, inserting a silence between them
------------------------------------------------------------

Sometimes, you may want to detect audio events and create a new file containing
these events with pauses of a specific duration between them. This is useful if
you wish to preserve your original audio data while adjusting the length of the
pauses (either shortening or extending them).

To achieve this, use the ``-j`` or ``--join-detections`` option together
with the ``-O`` / ``--save-stream`` option. In the example below, we
read data from ``input.wav`` and save the audio events to ``output.wav``, with
1-second pauses between them:

.. code:: bash

    auditok input.wav --join-detections 1 -O output.wav

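
To estimate the duration of the joined file before creating it, you can combine
the default output of ``auditok`` with the requested pause length. The sketch
below gives a rough estimate only. It assumes that ``--join-detections`` inserts
exactly one pause between each pair of consecutive events and none before the
first or after the last event. The function name ``joined_duration`` is
hypothetical.

.. code:: bash

    # Hypothetical helper: estimate the joined duration from lines of the
    # form "id start end" (auditok's default output) and a pause in seconds.
    # ASSUMPTION: (number of events - 1) pauses, none at either end.
    joined_duration() {
        awk -v pause="$1" '
            { total += $3 - $2; n++ }
            END {
                if (n > 1) total += (n - 1) * pause
                printf "%.3f\n", total
            }'
    }

It could be used as ``auditok input.wav | joined_duration 1``. Under that
assumption, the four events shown in the "Output detection details" section
(3.420 seconds in total) would give an estimate of ``6.420`` seconds.
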

Plot detections
---------------

The audio signal and detections can be plotted using the ``-p`` or ``--plot`` option.
You can also save the plot to disk using ``--save-image``. The following example
demonstrates both:

.. code:: bash

    auditok -p --save-image "plot.png" # can also be 'pdf' or another image format

Example output:

.. image:: figures/example_1.png

Plotting requires `matplotlib <https://matplotlib.org/stable/index.html>`_.
