`auditok` Command-line Usage Guide
==================================

This guide walks through the most useful operations you can perform with **auditok** and presents two practical use cases.

.. contents:: `Contents`
   :depth: 3


**********************
Two-figure explanation
**********************

The following two figures illustrate an audio signal (blue) and regions detected as valid audio activities (green rectangles) according to
a given threshold (red dashed line). They respectively depict the detection result when:

1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as an acoustic event):

.. figure:: figures/figure_1.png
   :align: center
   :alt: Output from a detector that tolerates silence periods up to 300 ms
   :figclass: align-center
   :scale: 40 %

2. the detector splits an audio activity into several activities if the silence within it exceeds 0.2 second:

.. figure:: figures/figure_2.png
   :align: center
   :alt: Output from a detector that tolerates silence periods up to 200 ms
   :figclass: align-center
   :scale: 40 %

Beyond plotting the signal and detections, you can play back audio activities as they are detected, save them, or run a user command each time there is an activity,
optionally using the file name of the audio activity as an argument for the command.

******************
Command line usage
******************

Try the detector with your voice
################################

The first thing you want to check is perhaps how **auditok** detects your voice.
If you have installed `PyAudio`, just run (`Ctrl-C` to stop):

.. code:: bash

    auditok

This will print the **id**, **start-time** and **end-time** of each detected activity. If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell **auditok** to read data from standard input:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1

Note that when data is read from standard input, the same audio parameters must be used for both `sox` (or any other data generation/acquisition tool) and **auditok**. The following table summarizes the audio parameters.


+-----------------+------------+------------------+-----------------------+
| Audio parameter | sox option | `auditok` option | `auditok` default     |
+=================+============+==================+=======================+
| Sampling rate   | -r         | -r               | 16000                 |
+-----------------+------------+------------------+-----------------------+
| Sample width    | -b (bits)  | -w (bytes)       | 2                     |
+-----------------+------------+------------------+-----------------------+
| Channels        | -c         | -c               | 1                     |
+-----------------+------------+------------------+-----------------------+
| Encoding        | -e         | None             | always signed integer |
+-----------------+------------+------------------+-----------------------+

According to this table, the **auditok** options in the previous command all match the defaults, so the command can be shortened to:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -

Play back detections
####################

.. code:: bash

    auditok -E

:or:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E

Option `-E` stands for echo, so **auditok** will play back whatever it detects. Using `-E` requires `PyAudio`. If you don't have `PyAudio` and want to play detections with sox, use the `-C` option:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

The `-C` option tells **auditok** to interpret its content as a command that should be run whenever **auditok** detects an audio activity, replacing `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.

`rec` and `play` are just aliases for `sox`.

The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only if there is an audio activity and saves bandwidth during silence.

Set detection threshold
#######################

If you notice that there are too many detections, use a higher value for the energy threshold. The current version only implements a `validator` based on an energy threshold; the use of spectral information is also desirable and might be part of future releases. To change the energy threshold (default: 50), use option `-e`:

.. code:: bash

    auditok -E -e 55

:or:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

If, however, you find that the detector is missing some or all of your audio activities, use a lower value for `-e`.
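
One quick way to pick a threshold is to count how many detections a recorded file yields at several candidate values. The sketch below is only an illustration: the helper names `count_detections` and `sweep_thresholds` and the file name `input.wav` are hypothetical, and the helpers rely solely on options `-i` and `-e` and on the default output of one line per detection.

.. code:: bash

    #!/bin/bash

    # Count the detections found in a file at a given energy threshold.
    # Assumes auditok prints exactly one line per detection (its default output).
    count_detections() {
        local file="$1" threshold="$2"
        auditok -i "$file" -e "$threshold" | wc -l | tr -d ' '
    }

    # Print "threshold count" for each candidate threshold.
    sweep_thresholds() {
        local file="$1"
        shift
        local e
        for e in "$@"; do
            printf '%s %s\n' "$e" "$(count_detections "$file" "$e")"
        done
    }

    # Usage (hypothetical file name):
    #   sweep_thresholds input.wav 45 50 55 60

A count that drops sharply between two neighbouring thresholds suggests that the lower one is still picking up background noise.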

Set format for printed detections information
#############################################

By default, **auditok** prints the **id**, **start-time** and **end-time** of each detected activity:

.. code:: bash

    1 1.87 2.67
    2 3.05 3.73
    3 3.97 4.49
    ...

If you want to customize the output format, use the `--printf` option:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}"

:output:

.. code:: bash

    [1]: 0.22 to 0.67
    [2]: 2.81 to 4.18
    [3]: 5.53 to 6.44
    [4]: 7.32 to 7.82
    ...

Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds. If you want more detailed time information, use `--time-format`:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"

:output:

.. code:: bash

    [1]: 00:00:01.080 to 00:00:01.760
    [2]: 00:00:02.420 to 00:00:03.440
    [3]: 00:00:04.930 to 00:00:05.570
    [4]: 00:00:05.690 to 00:00:06.020
    [5]: 00:00:07.470 to 00:00:07.980
    ...

Valid time directives are: `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.
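
Because the default output puts one detection per line with times in seconds, it is easy to post-process with standard tools. As a minimal sketch, the hypothetical helper `detection_durations` below (not part of **auditok**) uses `awk` to turn `id start end` lines into `id duration` lines. It assumes the default output format, that is, no `--printf` or `--time-format` customization.

.. code:: bash

    #!/bin/bash

    # Read "id start end" lines on standard input and print "id duration",
    # with the duration in seconds rounded to two decimals.
    detection_durations() {
        awk '{ printf "%s %.2f\n", $1, $3 - $2 }'
    }

    # Usage (hypothetical file name):
    #   auditok -i input.wav | detection_durations

Fed the first sample line shown earlier in this section, `1 1.87 2.67`, the helper prints `1 0.80`.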

1st Practical use case example: generate a subtitles template
#############################################################

Using `--printf` and `--time-format`, the following command, run on an input audio or video file, generates an **srt** file template. The template can later be edited with a subtitles editor, which reduces the time needed to mark where each utterance starts and ends:

.. code:: bash

    auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"

:output:

.. code:: bash

    1
    00:00:00.730 --> 00:00:01.460
    Put some text here...

    2
    00:00:02.440 --> 00:00:03.900
    Put some text here...

    3
    00:00:06.410 --> 00:00:06.970
    Put some text here...

    4
    00:00:07.260 --> 00:00:08.340
    Put some text here...

    5
    00:00:09.510 --> 00:00:09.820
    Put some text here...


2nd Practical use case example: build a (very) basic voice control application
##############################################################################

`This repository `_ supplies a bash script that can send audio data to Google's
Speech Recognition service and get its transcription. In the following we will use **auditok** as a lower layer component
of a voice control application. The basic idea is to tell **auditok** to run, for each detected audio activity, a certain
number of commands that make up the rest of our voice control application.

Assume you have installed **sox** and downloaded the Speech Recognition script.
The sequence of commands to run is:

1- Convert raw audio data to flac using **sox**:

.. code:: bash

    sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac

2- Send flac audio data to Google and get its filtered transcription using `speech-rec.sh `_ :

.. code:: bash

    speech-rec.sh -i output.flac -r 16000

3- Use **grep** to select lines that contain *transcript*:

.. code:: bash

    grep transcript


4- Launch the following script, giving it the transcription as input:

.. code:: bash

    #!/bin/bash

    # Read one line of transcription from standard input.
    read -r line

    # Case-insensitive search for the voice command.
    RES=$(echo "$line" | grep -i "open firefox")

    if [[ $RES ]]
    then
        echo "Launch command: 'firefox &' ... "
        firefox &
        exit 0
    fi

    exit 0

As you can see, the script handles a single voice command. It runs firefox if the text it receives contains **open firefox**.
Save the script into a file named voice-control.sh (don't forget to run **chmod u+x voice-control.sh**).

Now, thanks to option `-C`, we will chain the four instructions and tell **auditok** to run them each time it detects
an audio activity. Try the following command and say *open firefox*:


.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -M 5 -m 3 -n 1 --debug-file file.log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh"

Here we used option `-M 5` to limit the amount of audio data read to 5 seconds (**auditok** stops when there is no more data),
option `-m 3` to cap each detection at 3 seconds, and option `-n 1` to tell **auditok** to only accept tokens of 1 second or more and discard any shorter token.

With `--debug-file file.log`, all processing steps are written into file.log with their timestamps, including any command that was run and the file name it was given.


Plot signal and detections
##########################

Use option `-p`. Requires `matplotlib` and `numpy`.

.. code:: bash

    auditok ... -p


Save plot as image or PDF
#########################

.. code:: bash

    auditok ... --save-image output.png

Requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.


Read data from file
###################

.. code:: bash

    auditok -i input.wav ...

Install `pydub` for other audio formats.


Limit the length of acquired data
#################################

.. code:: bash

    auditok -M 12 ...

Time is in seconds. This is valid for data read from an audio device, stdin or an audio file.


Save the whole acquired audio signal
####################################

.. code:: bash

    auditok -O output.wav ...

Install `pydub` for other audio formats.


Save each detection into a separate audio file
##############################################

.. code:: bash

    auditok -o det_{N}_{start}_{end}.wav ...

You can use free text and place `{N}`, `{start}` and `{end}` wherever you want; they will be replaced by the detection number, start time and end time respectively. Another example:

.. code:: bash

    auditok -o {start}-{end}.wav ...

Install `pydub` for more audio formats.


Setting detection parameters
############################

Alongside the threshold option `-e` seen so far, a few other options can have a great impact on the detector's behavior. These options are summarized in the following table:

+--------+-------------------------------------------------------+---------+------------------+
| Option | Description                                           | Unit    | Default          |
+========+=======================================================+=========+==================+
| `-n`   | Minimum length of an accepted audio activity          | second  | 0.2 (200 ms)     |
+--------+-------------------------------------------------------+---------+------------------+
| `-m`   | Maximum length of an accepted audio activity          | second  | 5                |
+--------+-------------------------------------------------------+---------+------------------+
| `-s`   | Maximum length of a continuous silence period within  | second  | 0.3 (300 ms)     |
|        | an accepted audio activity                            |         |                  |
+--------+-------------------------------------------------------+---------+------------------+
| `-d`   | Drop trailing silence from an accepted audio activity | boolean | False            |
+--------+-------------------------------------------------------+---------+------------------+
| `-a`   | Analysis window length (default value should be good) | second  | 0.01 (10 ms)     |
+--------+-------------------------------------------------------+---------+------------------+


Normally, `auditok` keeps the trailing silence of a detected activity. Trailing silence is at most as long as the maximum length of a continuous silence (option `-s`) and can be important for some applications such as speech recognition. If you want to drop trailing silence anyway, use option `-d`. The following two figures show the output of the detector when it keeps the trailing silence and when it drops it, respectively:


.. figure:: figures/figure_3_keep_trailing_silence.png
   :align: center
   :alt: Output from a detector that keeps trailing silence
   :figclass: align-center
   :scale: 40 %


.. code:: bash

    auditok ... -d


.. figure:: figures/figure_4_drop_trailing_silence.png
   :align: center
   :alt: Output from a detector that drops trailing silence
   :figclass: align-center
   :scale: 40 %

You might want to consider audio activities only if they are above a certain duration. The next figure is the result of a detector that only accepts detections of 0.8 second and longer:

.. code:: bash

    auditok ... -n 0.8


.. figure:: figures/figure_5_min_800ms.png
   :align: center
   :alt: Output from a detector that detects activities of 800 ms or over
   :figclass: align-center
   :scale: 40 %


Finally, it is almost always useful to limit the length of detected audio activities, so that an overly long audio event such as an alarm or a drill does not hog the detector. For illustration purposes, we set the maximum duration to 0.4 second for this detector, so an audio activity is delivered as soon as it reaches 0.4 second:

.. code:: bash

    auditok ... -m 0.4


.. figure:: figures/figure_6_max_400ms.png
   :align: center
   :alt: Output from a detector that delivers audio activities that reach 400 ms
   :figclass: align-center
   :scale: 40 %


Debugging
#########

If you want to print what happens when something is detected, use option `-D`.

.. code:: bash

    auditok ... -D


If you want to save everything into a log file, use `--debug-file file.log`.

.. code:: bash

    auditok ... --debug-file file.log


*******
License
*******

**auditok** is published under the GNU General Public License Version 3.

******
Author
******
Amine Sehili ()