Command-line guide
==================

``auditok`` can also be used from the command line. For information
about available parameters and descriptions, type:

.. code:: bash

    auditok -h

.. code::

    usage: auditok [-h] [--version] [-I INT] [-F INT] [-f STRING] [-M FLOAT] [-L] [-O FILE] [-o STRING] [-j FLOAT] [-T STRING] [-u INT/STRING]
                   [-a FLOAT] [-n FLOAT] [-m FLOAT] [-s FLOAT] [-d] [-R] [-e FLOAT] [-r INT] [-c INT] [-w INT] [-C STRING] [-E] [-B] [-p]
                   [--save-image FILE] [--printf STRING] [--time-format STRING] [--timestamp-format TIMESTAMP_FORMAT] [-q] [-D] [--debug-file FILE]
                   [input]

    auditok, an audio tokenization tool.

    options:
      -h, --help            show this help message and exit
      --version, -v         show program's version number and exit
      -q, --quiet           Quiet mode: Do not display any information on the screen.
      -D, --debug           Debug mode: output processing operations to STDOUT.
      --debug-file FILE     Save processing operations to the specified file.

    Input-Output options::
      input                 Input audio or video file. Use '-' for stdin [Default: read from a microphone using PyAudio].
      -I INT, --input-device-index INT
                            Audio device index [Default: None]. Optional and only effective when using PyAudio.
      -F INT, --audio-frame-per-buffer INT
                            Audio frame per buffer [Default: 1024]. Optional and only effective when using PyAudio.
      -f STRING, --input-format STRING
                            Specify the input audio file format. If not provided, the format is inferred from the file extension. If the output file
                            name lacks an extension, the format is guessed from the file header (requires pydub). If neither condition is met, an
                            error is raised.
      -M FLOAT, --max-read FLOAT
                            Maximum data (in seconds) to read from a microphone or a file [Default: read until the end of the file or stream].
      -L, --large-file      Whether the input file should be treated as a large file. If True, data will be read from file on demand, otherwise all
                            audio data is loaded into memory before tokenization.
      -O FILE, --save-stream FILE
                            Save read audio data (from a file or a microphone) to a file. If omitted, no audio data will be saved.
      -o STRING, --save-detections-as STRING
                            Specify the file name format to save detected events. You can use the following placeholders to construct the output
                            file name: {id} (sequential, starting from 1), {start}, {end}, and {duration}. Time placeholders are in seconds.
                            Example: 'Event_{id}{start}-{end}{duration:.3f}.wav'
      -j FLOAT, --join-detections FLOAT
                            Join (glue) detected audio events with a specified duration of silence between them. To be used in combination with the
                            --save-stream / -O option.
      -T STRING, --output-format STRING
                            Specify the audio format for saving detections and/or the main stream. If not provided, the format will be (1) inferred
                            from the file extension or (2) default to raw format.
      -u INT/STRING, --use-channel INT/STRING
                            Specify the audio channel to use for tokenization when the input stream is multi-channel (0 refers to the first
                            channel). By default, this is set to None, meaning all channels are used, capturing any valid audio event from any
                            channel. Alternatively, set this to 'mix' (or 'avg'/'average') to combine all channels into a single averaged channel
                            for tokenization. Regardless of the option chosen, saved audio events will have the same number of channels as the input
                            stream. [Default: None, use all channels].

    Tokenization options::
      Set audio events' duration and set the threshold for detection.

      -a FLOAT, --analysis-window FLOAT
                            Specify the size of the analysis window in seconds. [Default: 0.01 (10ms)].
      -n FLOAT, --min-duration FLOAT
                            Minimum duration of a valid audio event in seconds. [Default: 0.2].
      -m FLOAT, --max-duration FLOAT
                            Maximum duration of a valid audio event in seconds. [Default: 5].
      -s FLOAT, --max-silence FLOAT
                            Maximum duration of consecutive silence allowed within a valid audio event in seconds. [Default: 0.3]
      -d, --drop-trailing-silence
                            Remove trailing silence from a detection. [Default: trailing silence is retained].
      -R, --strict-min-duration
                            Reject events shorter than --min-duration, even if adjacent to the most recent valid event that reached max-duration.
                            [Default: retain such events].
      -e FLOAT, --energy-threshold FLOAT
                            Set the log energy threshold for detection. [Default: 50]

    Audio parameters::
      Set audio parameters when reading from a headerless file (raw or stdin) or when using custom microphone settings.

      -r INT, --rate INT    Sampling rate of audio data [Default: 16000].
      -c INT, --channels INT
                            Number of channels of audio data [Default: 1].
      -w INT, --width INT   Number of bytes per audio sample [Default: 2].

    Use audio events::
      Use these options to print, play, or plot detected audio events.

      -C STRING, --command STRING
                            Provide a command to execute when an audio event is detected. Use '{file}' as a placeholder for the temporary WAV file
                            containing the event data (e.g., `-C 'du -h {file}'` to display the file size or `-C 'play -q {file}'` to play audio
                            with sox).
      -E, --echo            Immediately play back a detected audio event using pyaudio.
      -B, --progress-bar    Show a progress bar when playing audio.
      -p, --plot            Plot and displays the audio signal along with detections (requires matplotlib).
      --save-image FILE     Save the plotted audio signal and detections as a picture or a PDF file (requires matplotlib).
      --printf STRING       Prints information about each audio event on a new line using the specified format. The format can include text and
                            placeholders: {id} (sequential, starting from 1), {start}, {end}, {duration}, and {timestamp}. The first three time
                            placeholders are in seconds, with formatting controlled by the --time-format argument. {timestamp} represents the system
                            date and time of the event, configurable with the --timestamp-format argument. Example: '[{id}]: {start} -> {end} --
                            {timestamp}'.
      --time-format STRING  Specify the format for printing {start}, {end}, and {duration} placeholders with --printf. [Default: %S]. Accepted
                            formats are : - %S: absolute time in seconds - %I: absolute time in milliseconds - %h, %m, %s, %i: converts time into
                            hours, minutes, seconds, and milliseconds (e.g., %h:%m:%s.%i) and only displays provided fields. Note that %S and %I can
                            only be used independently.
      --timestamp-format TIMESTAMP_FORMAT
                            Specify the format used for printing {timestamp}. Should be a format accepted by the 'datetime' standard module.
                            [Default: '%Y/%m/%d %H:%M:%S'].

Below, we provide several examples covering the most common use cases.


Real-time audio acquisition and event detection
-----------------------------------------------

To try ``auditok`` from the command line with your own voice, you’ll need to
either install `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_ so
that ``auditok`` can read directly from the microphone, or record audio with
an external program (e.g., ``sox``) and redirect its output to ``auditok``.

To read data directly from the microphone and use default parameters for audio
data and tokenization, simply type:

.. code:: bash

    auditok

This will print the **id**, **start time**, and **end time** of each detected
audio event. Because no additional arguments were passed in the previous
command, ``auditok`` uses its default values. The most important tokenization
arguments are:

- ``-n``, ``--min-duration``: minimum duration of a valid audio event in seconds, default: 0.2
- ``-m``, ``--max-duration``: maximum duration of a valid audio event in seconds, default: 5
- ``-s``, ``--max-silence``: maximum duration of continuous silence within a valid audio event in seconds, default: 0.3
- ``-e``, ``--energy-threshold``: energy threshold for detection, default: 50

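The duration arguments are given in seconds, while the analysis window
(``-a``, default 0.01 s) sets the time resolution of the detection. As a rough
illustration, the sketch below expresses the default durations as a number of
analysis windows. The helper ``windows_for`` is hypothetical, and rounding to
the nearest window is an assumption; ``auditok`` may round differently
internally.

```shell
#!/bin/sh
# Express a duration in seconds as a number of analysis windows.
# windows_for is a hypothetical helper; rounding to nearest is assumed.
windows_for() {
    LC_ALL=C awk -v d="$1" -v w="$2" 'BEGIN { printf "%d\n", int(d / w + 0.5) }'
}

window=0.01    # default --analysis-window (10 ms)
echo "min-duration 0.2 s -> $(windows_for 0.2 "$window") windows"
echo "max-duration 5 s   -> $(windows_for 5 "$window") windows"
echo "max-silence 0.3 s  -> $(windows_for 0.3 "$window") windows"
```

A larger analysis window means fewer windows per event, so durations can only
be resolved in coarser steps.
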
Read audio data with an external program
----------------------------------------

You can use an external program, such as ``sox`` (``sudo apt-get install sox``),
to record audio data in real time, redirect it, and have ``auditok`` read the data
from standard input:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -r 16000 -w 2 -c 1

Note that when reading data from standard input, the same audio parameters must
be set for both ``sox`` (or any other data generation/acquisition tool) and ``auditok``.
The following table provides a summary of the audio parameters:

+-----------------+------------+------------------+-----------------------+
| Audio parameter | sox option | `auditok` option | `auditok` default     |
+=================+============+==================+=======================+
| Sampling rate   | -r         | -r               | 16000                 |
+-----------------+------------+------------------+-----------------------+
| Sample width    | -b (bits)  | -w (bytes)       | 2                     |
+-----------------+------------+------------------+-----------------------+
| Channels        | -c         | -c               | 1                     |
+-----------------+------------+------------------+-----------------------+
| Encoding        | -e         | NA               | always a signed int   |
+-----------------+------------+------------------+-----------------------+

Because the values passed to ``rec`` match the ``auditok`` defaults listed in the
table, the previous command can be shortened to:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -

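A raw stream carries no header, so both ends of the pipe must describe the same
byte layout. Note in particular that ``sox`` takes the sample width in bits while
``auditok`` takes it in bytes. The sketch below converts between the two and
computes how many bytes one second of audio occupies. The helper names
``bits_from_width`` and ``bytes_per_second`` are hypothetical and not part of
either tool.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of auditok or sox.

# Convert a sample width in bytes (auditok -w) to bits (sox -b).
bits_from_width() {
    echo $(( $1 * 8 ))
}

# Bytes per second of raw audio: rate * width (bytes) * channels.
bytes_per_second() {
    echo $(( $1 * $2 * $3 ))
}

rate=16000
width=2
channels=1
echo "sox -b value: $(bits_from_width "$width")"
echo "bytes per second: $(bytes_per_second "$rate" "$width" "$channels")"
```

With the default parameters this gives ``-b 16`` for ``sox`` and 32000 bytes per
second of audio.
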
Play back audio detections
--------------------------

Use the ``-E`` (or ``--echo``) option:

.. code:: bash

    auditok -E
    # or
    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -E

Using ``-E`` requires ``pyaudio``. If it is not installed, you can use the ``-C``
option instead, which runs an external command with the detected audio event as
its argument:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok - -C "play -q {file}"

With the ``-C`` option, ``auditok`` saves each detected event to a temporary WAV
file, replaces the ``{file}`` placeholder with the name of that file, and runs the
command. The example above uses ``-C`` to play audio data with an external
program, but you can use it to run any other command.

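As an illustration of running something other than a player, here is a minimal
sketch of a handler that reports the size of the event file. The function name
``report_event`` is hypothetical, and the sketch creates a stand-in file so it
can run without ``auditok``.

```shell
#!/bin/sh
# Hypothetical handler for the -C option: print the size in bytes of the
# temporary file whose name replaces the {file} placeholder.
report_event() {
    file=$1
    size=$(wc -c < "$file" | tr -d ' ')
    echo "event file: $file ($size bytes)"
}

# Stand-in for a detected event so the sketch runs without auditok.
tmp=$(mktemp)
printf 'RIFF' > "$tmp"
report_event "$tmp"
rm -f "$tmp"
```

Saved as a script that ends with ``report_event "$1"`` (for example under the
assumed name ``report_event.sh``), it could be passed to ``auditok`` as
``-C "sh report_event.sh {file}"``.
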
Output detection details
------------------------

By default, ``auditok`` outputs the **id**, **start**, and **end** times for each
detected audio event. The start and end values indicate the beginning and end of
the event within the input stream (file or microphone) in seconds. Below is an
example of the output in the default format:

.. code:: bash

    1 1.160 2.390
    2 3.420 4.330
    3 5.010 5.720
    4 7.230 7.800

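Because the default output consists of plain whitespace-separated columns, it
can be post-processed with standard tools. The sketch below sums the durations
of all detected events with ``awk``. It is fed the sample lines shown above
instead of a live run, and the function name ``total_event_duration`` is
hypothetical.

```shell
#!/bin/sh
# Sum the durations of detected events from "id start end" lines on stdin.
# total_event_duration is a hypothetical helper, for illustration only.
total_event_duration() {
    LC_ALL=C awk '{ total += $3 - $2 }
                  END { printf "%d events, %.3f s\n", NR, total }'
}

# Sample lines copied from the example output; in practice a real run
# would be piped in instead, e.g.: auditok audio.wav | total_event_duration
total_event_duration <<'EOF'
1 1.160 2.390
2 3.420 4.330
3 5.010 5.720
4 7.230 7.800
EOF
```
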
The format of the output is controlled by the ``--printf`` option. Alongside the
``{id}``, ``{start}`` and ``{end}`` placeholders, you can use ``{duration}`` and
``{timestamp}`` (system timestamp of the detected event).

For example, with the following format:

.. code:: bash

    auditok audio.wav --printf "{id}: [{timestamp}] start:{start}, end:{end}, dur: {duration}"

the output will look like:

.. code:: bash

    1: [2021/02/17 20:16:02] start:1.160, end:2.390, dur: 1.230
    2: [2021/02/17 20:16:04] start:3.420, end:4.330, dur: 0.910
    3: [2021/02/17 20:16:06] start:5.010, end:5.720, dur: 0.710
    4: [2021/02/17 20:16:08] start:7.230, end:7.800, dur: 0.570


The format of ``{timestamp}`` is controlled by ``--timestamp-format`` (default:
``"%Y/%m/%d %H:%M:%S"``), whereas the format of ``{start}``, ``{end}`` and
``{duration}`` is controlled by ``--time-format`` (default: ``%S``, absolute number
of seconds). For a more detailed format, ``--time-format`` also accepts the ``%h``
(hours), ``%m`` (minutes), ``%s`` (seconds) and ``%i`` (milliseconds) directives
(e.g., ``"%h:%m:%s.%i"``).

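To make the ``%h``, ``%m``, ``%s`` and ``%i`` directives concrete, the sketch below
splits a time in seconds into hours, minutes, seconds and milliseconds. It only
illustrates the arithmetic: ``split_time`` is a hypothetical helper, and the
zero-padding widths are an assumption that may differ from what ``auditok``
prints.

```shell
#!/bin/sh
# Split a time in seconds into hours, minutes, seconds and milliseconds,
# mirroring the %h, %m, %s and %i directives. Padding widths are assumed.
split_time() {
    LC_ALL=C awk -v t="$1" 'BEGIN {
        ms = int(t * 1000 + 0.5)               # total milliseconds, rounded
        h = int(ms / 3600000); ms -= h * 3600000
        m = int(ms / 60000);   ms -= m * 60000
        s = int(ms / 1000);    ms -= s * 1000
        printf "%02d:%02d:%02d.%03d\n", h, m, s, ms
    }'
}

split_time 7.230     # 00:00:07.230
split_time 3725.5    # 01:02:05.500
```
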
To completely disable printing detection information, use ``-q``.


Save detections
---------------

You can save audio events to disk as they're detected using ``-o`` or
``--save-detections-as`` followed by a file name with placeholders. To create
a unique file name for each event, you can use the ``{id}``, ``{start}``, ``{end}``
and ``{duration}`` placeholders, as in this example:

.. code:: bash

    auditok --save-detections-as "{id}_{start}_{end}.wav"

When using the ``{start}``, ``{end}``, and ``{duration}`` placeholders, it is
recommended to limit the number of decimal places for these values to 3. You
can do this with a format like:

.. code:: bash

    auditok -o "{id}_{start:.3f}_{end:.3f}.wav"

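To show what a three-decimal pattern is expected to produce, the sketch below
builds the file name for one event with the shell's ``printf``, whose ``%.3f``
plays the same role as ``:.3f`` in the placeholder. The helper
``event_file_name`` is hypothetical and is not an ``auditok`` command.

```shell
#!/bin/sh
# Illustration only: build the name that a pattern such as
# "{id}_{start:.3f}_{end:.3f}.wav" is expected to produce for one event.
# event_file_name is a hypothetical helper, not an auditok command.
event_file_name() {
    LC_ALL=C printf '%d_%.3f_%.3f.wav\n' "$1" "$2" "$3"
}

event_file_name 1 1.16 2.39    # 1_1.160_2.390.wav
```

Fixing the number of decimals keeps file names the same length for similar
values, which makes them easier to sort and parse.
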
Save the full audio stream
--------------------------

When reading audio data from the microphone, you may want to save it to disk.
To do this, use the ``-O`` or ``--save-stream`` option:

.. code:: bash

    auditok --save-stream output.wav

Note that this also works when the data is read from a file on disk.


Join detected audio events, inserting a silence between them
------------------------------------------------------------

Sometimes, you may want to detect audio events and create a new file containing
these events with pauses of a specific duration between them. This is useful if
you wish to preserve your original audio data while adjusting the length of pauses
(either shortening or extending them).

To achieve this, use the ``-j`` or ``--join-detections`` option together
with the ``-O`` / ``--save-stream`` option. In the example below, we
read data from ``input.wav`` and save audio events to ``output.wav``, adding
1-second pauses between them:

.. code:: bash

    auditok input.wav --join-detections 1 -O output.wav


Plot detections
---------------

The audio signal and detections can be plotted using the ``-p`` or ``--plot`` option.
You can also save the plot to disk using ``--save-image``. The following example
demonstrates both:

.. code:: bash

    auditok -p --save-image "plot.png" # can also be 'pdf' or another image format

Example output:

.. image:: figures/example_1.png

Plotting requires `matplotlib <https://matplotlib.org/stable/index.html>`_.