auditok: doc/cmdline.rst annotate

annotate doc/cmdline.rst @ 33:d28d94bf6b39

doc update

author	Amine Sehili <amine.sehili@gmail.com>
date	Wed, 02 Dec 2015 11:10:54 +0100
parents	ea905bc19458
children	929c1e7477ac

`auditok` Command-line Usage Guide
==================================

This user guide will go through a few of the most useful operations you can use auditok for and present two practical use cases.


.. contents:: `Contents`
    :depth: 3


**********************
Two-figure explanation
**********************

The following two figures show an audio signal (blue) and the regions detected as valid audio activities (green rectangles) according to a given threshold (red dashed line). They respectively depict the detection result when:

1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as an acoustic event):

.. figure:: figures/figure_1.png
    :align: center
    :alt: Output from a detector that tolerates silence periods up to 300 ms
    :figclass: align-center
    :scale: 40 %

2. the detector splits an audio activity into several activities if the silence within the activity exceeds 0.2 second (200 ms):

.. figure:: figures/figure_2.png
    :align: center
    :alt: Output from a detector that tolerates silence periods up to 200 ms
    :figclass: align-center
    :scale: 40 %


******************
Command line usage
******************

Try the detector with your voice
################################

The first thing you want to check is perhaps how `auditok` detects your voice. If you have installed `PyAudio`, just run (`Ctrl-C` to stop):

.. code:: bash

    auditok

This will print the id, start time and end time of each detected activity. If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell `auditok` to read data from standard input:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1

Note that when data is read from standard input, the same audio parameters must be used for both `sox` (or any other data generation/acquisition tool) and `auditok`. The following table summarizes audio parameters.


+-----------------+------------+----------------+-----------------------+
| Audio parameter | sox option | auditok option | `auditok` default     |
+=================+============+================+=======================+
| Sampling rate   | -r         | -r             | 16000                 |
+-----------------+------------+----------------+-----------------------+
| Sample width    | -b (bits)  | -w (bytes)     | 2                     |
+-----------------+------------+----------------+-----------------------+
| Channels        | -c         | -c             | 1                     |
+-----------------+------------+----------------+-----------------------+
| Encoding        | -e         | None           | always signed integer |
+-----------------+------------+----------------+-----------------------+

According to this table, the previous command relies only on default values on the `auditok` side and can therefore be run as:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -

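Because `sox` expects the sample width in bits while `auditok` expects it in bytes, the two option lists can easily drift apart. The following is a minimal sketch, assuming hypothetical variable names (`RATE`, `CHANNELS`, `WIDTH_BYTES`) and a hypothetical helper `build_pipeline`, that derives both option lists from one set of values and prints the resulting command instead of running it:

```shell
#!/bin/sh
# Hypothetical helper (not part of auditok or sox): derive consistent
# options for both tools from a single set of audio parameters.
RATE=16000        # sampling rate in Hz
CHANNELS=1        # number of channels
WIDTH_BYTES=2     # sample width in bytes, as expected by auditok -w

build_pipeline() {
    # sox wants the sample width in bits, auditok wants it in bytes
    width_bits=$((WIDTH_BYTES * 8))
    sox_opts="-q -t raw -r $RATE -c $CHANNELS -b $width_bits -e signed"
    auditok_opts="-i - -r $RATE -w $WIDTH_BYTES -c $CHANNELS"
    printf '%s\n' "rec $sox_opts - | auditok $auditok_opts"
}

build_pipeline
```

With the values above, the printed command is the same as the explicit `rec ... | auditok -i - -r 16000 -w 2 -c 1` pipeline shown earlier.
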
Play back detections
####################

.. code:: bash

    auditok -E

OR

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E

Option `-E` stands for echo, so `auditok` plays back whatever it detects. Using `-E` requires `PyAudio`. If you don't have `PyAudio` and want to play detections with sox, use the `-C` option:

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

The `-C` option tells `auditok` to interpret its argument as a command that should be run whenever `auditok` detects an audio activity, replacing `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.

`rec` and `play` are just aliases for `sox`.

The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only if there is an audio activity and saves bandwidth during silence.

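To make the `$` substitution concrete, here is a small illustration of how a command template could be expanded before being executed. It is only a sketch: the helper name `expand_command` and the file name `/tmp/activity.raw` are hypothetical, and this is not `auditok`'s actual code.

```shell
#!/bin/sh
# Illustration only: mimic how a "$" placeholder in a command template
# could be replaced by a file name. This is not auditok's actual code.
expand_command() {
    template=$1   # command containing "$" placeholders
    file=$2       # file name to substitute (must not contain "|", "&" or "\")
    printf '%s\n' "$template" | sed 's|\$|'"$file"'|g'
}

# /tmp/activity.raw is a hypothetical temporary file name
expand_command 'play -q -t raw -r 16000 -c 1 -b 16 -e signed $' /tmp/activity.raw
```

The printed line is the `play` command with the placeholder replaced by the file name, which is what would then be run for that detection.
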
Set detection threshold
#######################

If you notice that there are too many detections, use a higher value for the energy threshold. The current version only implements a `validator` based on an energy threshold; the use of spectral information is also desirable and might be part of future releases. To change the energy threshold (default: 50), use option `-e`:

.. code:: bash

    auditok -E -e 55

OR

.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

If, however, you find that the detector is missing some or all of your audio activities, use a lower value for `-e`.

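This guide does not say how the energy value compared with `-e` is computed. As a rough illustration only, the sketch below assumes that the energy of a window is ten times the base-10 logarithm of its mean squared sample value; this is an assumption made for the example, not a statement about `auditok`'s code. The helper name `log_energy` is hypothetical.

```shell
#!/bin/sh
# Assumption: energy = 10 * log10(mean of squared samples).
# Reads one integer sample per line from standard input.
log_energy() {
    awk '{ sum += $1 * $1; n++ }
         END {
             if (n == 0 || sum == 0) { print "-inf"; exit }
             printf "%.2f\n", 10 * log(sum / n) / log(10)
         }'
}

# A signal of constant amplitude 1000 has a mean square of 1e6
printf '%s\n' 1000 -1000 1000 -1000 | log_energy
```

Under this assumption, a window of amplitude 1000 would be accepted with thresholds of 50 or 55, while a window of amplitude 100 would be rejected by both.
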
Set format for printed detections information
#############################################

By default, `auditok` prints the `id`, `start time` and `end time` of each detected activity:

.. code:: bash

    1 1.87 2.67
    2 3.05 3.73
    3 3.97 4.49
    ...

If you want to customize the output format, use the `--printf` option:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}"

Output:

.. code:: bash

    [1]: 0.22 to 0.67
    [2]: 2.81 to 4.18
    [3]: 5.53 to 6.44
    [4]: 7.32 to 7.82
    ...

Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds. If you want more detailed time information, use `--time-format`:

.. code:: bash

    auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"

Output:

.. code:: bash

    [1]: 00:00:01.080 to 00:00:01.760
    [2]: 00:00:02.420 to 00:00:03.440
    [3]: 00:00:04.930 to 00:00:05.570
    [4]: 00:00:05.690 to 00:00:06.020
    [5]: 00:00:07.470 to 00:00:07.980
    ...

Valid time directives are: `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.

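The relationship between an absolute time in seconds and the `%h:%m:%s.%i` layout can be sketched as follows. The helper `format_time` is hypothetical and only reproduces the arithmetic; it is not how `auditok` is implemented.

```shell
#!/bin/sh
# Sketch of the "%h:%m:%s.%i" layout: split a duration given in seconds
# into hours, minutes, seconds and milliseconds.
format_time() {
    awk -v t="$1" 'BEGIN {
        ms = int(t * 1000 + 0.5)          # total milliseconds, rounded
        h = int(ms / 3600000); ms -= h * 3600000
        m = int(ms / 60000);   ms -= m * 60000
        s = int(ms / 1000);    ms -= s * 1000
        printf "%02d:%02d:%02d.%03d\n", h, m, s, ms
    }'
}

format_time 1.08
format_time 3725.5
```

The first call prints `00:00:01.080`, the same value as the start of the first detection in the output above; the second prints `01:02:05.500`.
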
1st Practical use case example: generate a subtitles template
#############################################################

Using `--printf` and `--time-format`, the following command, used with an input audio or video file, generates an srt file template that can later be edited with a subtitles editor. This reduces the time needed to define when each utterance starts and ends:

.. code:: bash

    auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"

Output:

.. code:: bash

    1
    00:00:00.730 --> 00:00:01.460
    Put some text here...

    2
    00:00:02.440 --> 00:00:03.900
    Put some text here...

    3
    00:00:06.410 --> 00:00:06.970
    Put some text here...

    4
    00:00:07.260 --> 00:00:08.340
    Put some text here...

    5
    00:00:09.510 --> 00:00:09.820
    Put some text here...


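SubRip (`.srt`) files conventionally separate seconds from milliseconds with a comma rather than a dot. If your subtitles editor is strict about this, the template can be post-processed. The following is a minimal sketch, assuming the template is piped through standard input; the helper name `srt_fix_separator` is hypothetical. It rewrites only the timing lines and leaves all other text untouched.

```shell
#!/bin/sh
# Replace the "." before milliseconds with "," on SRT timing lines only.
srt_fix_separator() {
    sed -E 's/^([0-9]{2}:[0-9]{2}:[0-9]{2})\.([0-9]{3}) --> ([0-9]{2}:[0-9]{2}:[0-9]{2})\.([0-9]{3})$/\1,\2 --> \3,\4/'
}

printf '%s\n' '1' '00:00:00.730 --> 00:00:01.460' 'Put some text here...' \
    | srt_fix_separator
```

The timing line becomes `00:00:00,730 --> 00:00:01,460`, while the text line, which also contains dots, is left as is.
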
2nd Practical use case example: build a (very) basic voice control application
##############################################################################

`This repository <https://github.com/amsehili/gspeech-rec>`_ supplies a bash script that can send audio data to Google's
Speech Recognition service and get its transcription. In the following, we will use auditok as a lower layer component
of a voice control application. The basic idea is to tell auditok to run, for each detected audio activity, a certain
number of commands that make up the rest of our voice control application.

Assume you have installed sox and downloaded the Speech Recognition script. The sequence of commands to run is:

1. Convert raw audio data to flac using sox:

.. code:: bash

    sox -t raw -r 16000 -c 1 -b 16 -e signed raw_input output.flac

2. Send flac audio to Google and get its filtered transcription using `speech-rec.sh <https://github.com/amsehili/gspeech-rec/blob/master/speech-rec.sh>`_:

.. code:: bash

    speech-rec.sh -i output.flac -r 16000

3. Use grep to select lines that contain a transcript:

.. code:: bash

    grep transcript


4. Launch the following script, giving it the transcription as input:

.. code:: bash

    #!/bin/bash

    # Read one line of transcription from standard input
    read -r line

    # Launch firefox if the line contains "open firefox" (case-insensitive)
    if echo "$line" | grep -qi "open firefox"
    then
        echo "Launch command: 'firefox &' ... "
        firefox &
        exit 0
    fi

    exit 0

As you can see, the script can handle one single voice command. It runs firefox if the text it receives contains open firefox.
Save the script into a file named voice-control.sh (don't forget to run chmod u+x voice-control.sh).

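Handling more than one voice command mostly means mapping several phrases to commands. The following is a minimal sketch of that idea; the function name `command_for`, the second phrase and the `xterm` command are hypothetical examples, and the function prints the selected command rather than running it so that it can be tried safely.

```shell
#!/bin/sh
# Hypothetical extension of voice-control.sh: map several phrases to
# commands. The phrases and commands below are examples only.
command_for() {
    # Lower-case the transcription so that matching is case-insensitive
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $text in
        *"open firefox"*)  echo "firefox" ;;
        *"open terminal"*) echo "xterm" ;;
        *)                 echo "" ;;   # unknown phrase: no command
    esac
}

# Print the selected command instead of running it
command_for 'transcript: Open Firefox'
```

In a real script, the printed command would be launched in the background, as `firefox &` is in the script above.
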
Now, thanks to option `-C`, we will chain these commands and tell auditok to run them every time it detects
an audio activity. Try the following command and say open firefox:


.. code:: bash

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -M 5 -m 3 -n 1 --debug-file log -e 60 -C "sox -t raw -r 16000 -c 1 -b 16 -e signed $ audio.flac ; speech-rec.sh -i audio.flac -r 16000 | grep transcript | ./voice-control.sh"




Plot signal and detections
##########################

Use option `-p`. Requires `matplotlib` and `numpy`.

.. code:: bash

    auditok ... -p


Save plot as image or PDF
#########################

.. code:: bash

    auditok ... --save-image output.png

Requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.


Read data from file
###################

.. code:: bash

    auditok -i input.wav ...

Install `pydub` for other audio formats.


Limit the length of acquired data
#################################

.. code:: bash

    auditok -M 12 ...

Time is in seconds.


Save the whole acquired audio signal
####################################

.. code:: bash

    auditok -O output.wav ...

Install `pydub` for other audio formats.


Save each detection into a separate audio file
##############################################

.. code:: bash

    auditok -o det_{N}_{start}_{end}.wav ...

You can use free text and place `{N}`, `{start}` and `{end}` wherever you want. They will be replaced by the detection number, start time and end time respectively. Another example:

.. code:: bash

    auditok -o {start}-{end}.wav ...

Install `pydub` for more audio formats.


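As an illustration of how such a file name template expands, the sketch below substitutes the three keywords with example values. The helper `expand_name` is hypothetical, and the exact number formatting `auditok` uses in file names may differ from what is shown here.

```shell
#!/bin/sh
# Illustration only: expand {N}, {start} and {end} in a file name template.
# Arguments: template, detection number, start time, end time.
expand_name() {
    printf '%s\n' "$1" \
        | sed -e "s/{N}/$2/g" -e "s/{start}/$3/g" -e "s/{end}/$4/g"
}

expand_name 'det_{N}_{start}_{end}.wav' 3 3.97 4.49
```

With these example values, the template expands to `det_3_3.97_4.49.wav`.
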
Setting detection parameters
############################

Alongside the threshold option `-e` seen so far, several other options can have a great impact on the detector behavior. These options are summarized in the following table:

+--------+-------------------------------------------------------+---------+------------------+
| Option | Description                                           | Unit    | Default          |
+========+=======================================================+=========+==================+
| `-n`   | Minimum length an accepted audio activity should have | second  | 0.2 (200 ms)     |
+--------+-------------------------------------------------------+---------+------------------+
| `-m`   | Maximum length an accepted audio activity should reach| second  | 5                |
+--------+-------------------------------------------------------+---------+------------------+
| `-s`   | Maximum length of a continuous silence period within  | second  | 0.3 (300 ms)     |
|        | an accepted audio activity                            |         |                  |
+--------+-------------------------------------------------------+---------+------------------+
| `-d`   | Drop trailing silence from an accepted audio activity | boolean | False            |
+--------+-------------------------------------------------------+---------+------------------+
| `-a`   | Analysis window length (default value should be good) | second  | 0.01 (10 ms)     |
+--------+-------------------------------------------------------+---------+------------------+


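To get a feel for how the minimum length and the maximum silence interact, here is a deliberately simplified model, not `auditok`'s actual algorithm. It assumes each analysis window has already been classified as valid (`1`) or silent (`0`), always drops trailing silence, and ignores the maximum length. The helper `segment` is hypothetical; it takes the window string, a minimum length and a maximum tolerated silence (both in windows) and prints the first and last window of each accepted activity.

```shell
#!/bin/sh
# Simplified model (not auditok's code): segment a string of per-window
# decisions, "1" = valid, "0" = silence.
# Usage: segment FRAMES MIN_LEN MAX_SILENCE   (lengths in windows)
segment() {
    printf '%s\n' "$1" | awk -v min_len="$2" -v max_sil="$3" '
    {
        n = length($0); start = 0; last = 0; sil = 0
        for (i = 1; i <= n + 1; i++) {
            c = (i <= n) ? substr($0, i, 1) : "end"
            if (c == "1") {
                if (!start) start = i      # a new activity begins
                last = i; sil = 0          # remember last valid window
            } else if (start) {
                sil++
                # close the activity on too long a silence or end of data
                if (sil > max_sil || c == "end") {
                    if (last - start + 1 >= min_len) print start "-" last
                    start = 0; sil = 0
                }
            }
        }
    }'
}

segment 0110011100001 2 2
segment 0110011100001 2 1
```

With a tolerated silence of two windows, the two bursts are merged into one activity (`2-8`); with a tolerated silence of one window, they are reported separately (`2-3` and `6-8`). In both cases the final isolated window is discarded because it is shorter than the minimum length. This mirrors the difference between the two figures at the top of this guide.
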
*******
License
*******

`auditok` is published under the GNU General Public License Version 3.

******
Author
******

Amine Sehili (<amine.sehili@gmail.com>)
