`auditok` Command-line Usage Guide
==================================

This user guide goes through some of the most useful operations you can use **auditok** for and presents two practical use cases.


.. contents:: `Contents`
    :depth: 3


**********************
Two-figure explanation
**********************

The following two figures illustrate an audio signal (blue) and regions detected as valid audio activities (green rectangles) according to a given threshold (red dashed line). They respectively depict the detection result when:

1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as an acoustic event):

.. figure:: figures/figure_1.png
    :align: center
    :alt: Output from a detector that tolerates silence periods up to 300 ms
    :figclass: align-center
    :scale: 40 %

2. the detector splits an audio activity into several activities if the silence within the activity is longer than 0.2 second (200 ms):

.. figure:: figures/figure_2.png
    :align: center
    :alt: Output from a detector that tolerates silence periods up to 200 ms
    :figclass: align-center
    :scale: 40 %


******************
Command line usage
******************

Try the detector with your voice
################################

The first thing you want to check is perhaps how `auditok` detects your voice. If you have installed `PyAudio` just run (`Ctrl-C` to stop):

.. code:: bash

    auditok

This will print the **id**, **start time** and **end time** of each detected activity.
If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell `auditok` to read data from standard input:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1

Note that when data is read from standard input, the same audio parameters must be used for both `sox` (or any other data generation/acquisition tool) and `auditok`. The following table summarizes audio parameters.


+-----------------+------------+----------------+-----------------------+
| Audio parameter | sox option | auditok option | `auditok` default     |
+=================+============+================+=======================+
| Sampling rate   | -r         | -r             | 16000                 |
+-----------------+------------+----------------+-----------------------+
| Sample width    | -b (bits)  | -w (bytes)     | 2                     |
+-----------------+------------+----------------+-----------------------+
| Channels        | -c         | -c             | 1                     |
+-----------------+------------+----------------+-----------------------+
| Encoding        | -e         | None           | always signed integer |
+-----------------+------------+----------------+-----------------------+

Since the values used above are the `auditok` defaults listed in this table, the previous command can be shortened to:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -

Play back detections
####################

.. code:: bash

    auditok -E

OR

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E

Option `-E` stands for echo, so `auditok` plays back whatever it detects. Using `-E` requires `PyAudio`. If you don't have `PyAudio` and want to play detections with sox, use the `-C` option:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

The `-C` option tells `auditok` to interpret its argument as a command that should be run whenever `auditok` detects an audio activity, replacing the `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.

`rec` and `play` are just aliases for `sox`.

The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only if there is an audio activity and saves bandwidth during silence.

Set detection threshold
#######################

If you notice that there are too many detections, use a higher value for the energy threshold. The current version only implements a `validator` based on an energy threshold; the use of spectral information is also desirable and might be part of future releases. To change the energy threshold (default: 50), use option `-e`:

.. code:: bash

    auditok -E -e 55

OR

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

If, however, you find that the detector is missing some or all of your audio activities, use a lower value for `-e`.

Set format for printed detections information
#############################################

By default, `auditok` prints the `id`, `start time` and `end time` of each detected activity:

.. code:: bash

    1 1.87 2.67
    2 3.05 3.73
    3 3.97 4.49
    ...

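Because the default output is a plain whitespace-separated ``id start end`` line per detection, it is easy to post-process with standard tools. The following is a minimal sketch, not part of `auditok` itself: the helper name ``durations`` is hypothetical, and the input lines are simply the sample output shown above. It computes the length of each detection in seconds:

```shell
#!/bin/sh
# Sketch: compute the duration (end - start) of each detection from lines
# in the default "id start end" format.
# `durations` is a hypothetical helper name, not an auditok feature.
durations() {
    # $1 = id, $2 = start time (s), $3 = end time (s)
    awk '{ printf "%s %.2f\n", $1, $3 - $2 }'
}

# Feed the helper the three sample lines from the output shown above.
printf '%s\n' "1 1.87 2.67" "2 3.05 3.73" "3 3.97 4.49" | durations
```

With the sample lines this prints ``1 0.80``, ``2 0.68`` and ``3 0.52``. The same helper could presumably be fed from a detection run saved to a file, assuming the default output format is left unchanged.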

If you want to personalize the output format, use the `--printf` option:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}"

Output:

.. code:: bash

    [1]: 0.22 to 0.67
    [2]: 2.81 to 4.18
    [3]: 5.53 to 6.44
    [4]: 7.32 to 7.82
    ...

Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds; if you want more detailed time information, use `--time-format`:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"

Output:

.. code:: bash

    [1]: 00:00:01.080 to 00:00:01.760
    [2]: 00:00:02.420 to 00:00:03.440
    [3]: 00:00:04.930 to 00:00:05.570
    [4]: 00:00:05.690 to 00:00:06.020
    [5]: 00:00:07.470 to 00:00:07.980
    ...

Valid time directives are: `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.

1st Practical use case example: generate a subtitles template
#############################################################

Using `--printf` and `--time-format`, the following command, used with an input audio or video file, will generate an **srt** file template that can later be edited with a subtitles editor, reducing the time needed to define when each utterance starts and ends:

.. code:: bash

    auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"

Output:

.. code:: bash

    1
    00:00:00.730 --> 00:00:01.460
    Put some text here...

    2
    00:00:02.440 --> 00:00:03.900
    Put some text here...

    3
    00:00:06.410 --> 00:00:06.970
    Put some text here...

    4
    00:00:07.260 --> 00:00:08.340
    Put some text here...

    5
    00:00:09.510 --> 00:00:09.820
    Put some text here...


2nd Practical use case example: build a (very) basic voice control application
##############################################################################

`This repository `_ supplies a bash script that can send audio data to Google's
Speech Recognition service and get its transcription. In the following we will use **auditok** as a lower layer component
of a voice control application. The basic idea is to tell **auditok** to run, for each detected audio activity, a certain
number of commands that make up the rest of our voice control application.

Assume you have installed **sox** and downloaded the Speech Recognition script. The sequence of commands to run is:

1- Convert raw audio data to flac using **sox**:

.. code:: bash

    sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac

2- Send flac audio to Google and get its filtered transcription using `speech-rec.sh `_ :

.. code:: bash

    speech-rec.sh -i output.flac -r 16000

3- Use **grep** to select lines that contain *transcript*:

.. code:: bash

    grep transcript


4- Launch the following script, giving it the transcription as input:

.. code:: bash

    #!/bin/bash

    # Read one line of transcription from standard input
    read line

    # Case-insensitive search for the voice command
    RES=`echo "$line" | grep -i "open firefox"`

    if [[ $RES ]]
    then
        echo "Launch command: 'firefox &' ... "
        firefox &
        exit 0
    fi

    exit 0

As you can see, the script handles a single voice command. It runs firefox if the text it receives contains **open firefox**.
Save the script into a file named voice-control.sh (don't forget to run **chmod u+x voice-control.sh**).

Now, thanks to option `-C`, we will chain these commands and tell `auditok` to run them each time it detects
an audio activity. Try the following command and say *open firefox*:


.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -M 5 -m 3 -n 1 --debug-file log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh"


Plot signal and detections
##########################

Use option `-p`. Requires `matplotlib` and `numpy`.

.. code:: bash

    auditok ... -p


Save plot as image or PDF
#########################

.. code:: bash

    auditok ... --save-image output.png

Requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.


Read data from file
###################

.. code:: bash

    auditok -i input.wav ...

Install `pydub` for other audio formats.


Limit the length of acquired data
#################################

.. code:: bash

    auditok -M 12 ...

Time is in seconds.


Save the whole acquired audio signal
####################################

.. code:: bash

    auditok -O output.wav ...

Install `pydub` for other audio formats.


Save each detection into a separate audio file
##############################################

.. code:: bash

    auditok -o det_{N}_{start}_{end}.wav ...

You can use free text and place `{N}`, `{start}` and `{end}` wherever you want; they will be replaced by the detection number, start time and end time respectively. Another example:

.. code:: bash

    auditok -o {start}-{end}.wav ...

Install `pydub` for more audio formats.


Setting detection parameters
############################

Alongside the threshold option `-e` seen so far, several other options can have a great impact on the detector's behavior. These options are summarized in the following table:

+--------+---------------------------------------------------------+---------+--------------+
| Option | Description                                             | Unit    | Default      |
+========+=========================================================+=========+==============+
| `-n`   | Minimum length an accepted audio activity should have   | second  | 0.2 (200 ms) |
+--------+---------------------------------------------------------+---------+--------------+
| `-m`   | Maximum length an accepted audio activity should reach  | second  | 5            |
+--------+---------------------------------------------------------+---------+--------------+
| `-s`   | Maximum length of a continuous silence period within    | second  | 0.3 (300 ms) |
|        | an accepted audio activity                              |         |              |
+--------+---------------------------------------------------------+---------+--------------+
| `-d`   | Drop trailing silence from an accepted audio activity   | boolean | False        |
+--------+---------------------------------------------------------+---------+--------------+
| `-a`   | Analysis window length (default value should be good)   | second  | 0.01 (10 ms) |
+--------+---------------------------------------------------------+---------+--------------+


*******
License
*******

`auditok` is published under the GNU General Public License Version 3.

******
Author
******
Amine Sehili ()