
--- a/README.md	Sun Nov 29 11:52:56 2015 +0100
+++ b/README.md	Sun Nov 29 11:54:36 2015 +0100
@@ -4,12 +4,33 @@

 `auditok` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy to use API.

-The following two figures illustrate the detector output when:
+- [Two-figure explanation](https://github.com/amsehili/auditok#two-figure-explanation)
+- [Requirements](https://github.com/amsehili/auditok#requirements)
+- [Installation](https://github.com/amsehili/auditok#installation)
+- [Command line usage](https://github.com/amsehili/auditok#command-line-usage)
+  - [Try the detector with your voice](https://github.com/amsehili/auditok#try-the-detector-with-your-voice)
+  - [Play back detections](https://github.com/amsehili/auditok#play-back-detections)
+  - [Set detection threshold](https://github.com/amsehili/auditok#set-detection-threshold)
+  - [Set format for printed detections information](https://github.com/amsehili/auditok#set-format-for-printed-detections-information)
+  - [Practical use case: generate a subtitles template](https://github.com/amsehili/auditok#practical-use-case-generate-a-subtitles-template)
+  - [Plot signal and detections:](https://github.com/amsehili/auditok#plot-signal-and-detections)
+  - [Save plot as image or PDF](https://github.com/amsehili/auditok#save-plot-as-image-or-pdf)
+  - [Read data from file](https://github.com/amsehili/auditok#read-data-from-file)
+  - [Limit the length of acquired/read data](https://github.com/amsehili/auditok#limit-the-length-of-aquired-data)
+  - [Save the whole acquired audio signal](https://github.com/amsehili/auditok#save-the-whole-acquired-audio-signal)
+  - [Save each detection into a separate audio file](https://github.com/amsehili/auditok#save-each-detection-into-a-separate-audio-file)
+- [Setting detection parameters](https://github.com/amsehili/auditok#setting-detection-parameters)
+- [License](https://github.com/amsehili/auditok#license)
+- [Author](https://github.com/amsehili/auditok#author)
+
+Two-figure explanation
+----------------------
+The following two figures illustrate an audio signal (blue) and regions detected as valid audio activities (green rectangles) according to a given threshold (red dashed line). They respectively depict the detection result when:

 1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as acoustic event):
 ![](doc/figures/figure_1.png)

-2. the detector splits an audio activity event into many activities if the within silence is over 0.2 second:
+2. the detector splits an audio activity event into many activities if the within-activity silence exceeds 0.2 second:
 ![](doc/figures/figure_2.png)


@@ -28,38 +49,139 @@
 ------------
     python setup.py install

-Command line usage:
+Command line usage
 ------------------

+### Try the detector with your voice
+
 The first thing you want to check is perhaps how `auditok` detects your voice. If you have installed `PyAudio` just run (`Ctrl-C` to stop):

-    auditok -D -E
+    auditok

-Option `-D` means debug, whereas `-E` stands for echo, so `auditok` plays back whatever it detects.
+This will print the **id**, **start time** and **end time** of each detected activity. If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell `auditok` to read data from standard input:

-If there are too many detections, use a higher value for energy threshold (the current version only implements a `validator` based on energy threshold. The use of spectral information is also desirable and might be part of future releases). To change the energy threshold (default: 45), use option `-e`:
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1
+
+Note that when data is read from standard input the same audio parameters must be used for both `sox` (or any other data generation/acquisition tool) and `auditok`. The following table summarizes audio parameters.

-    auditok -D -E -e 55
+| Audio parameter | `sox` option | `auditok` option | `auditok` default  |
+| --------------- |------------|----------------|-----------------------|
+| Sampling rate   |     -r     |       -r       |      16000            |
+| Sample width    |  -b (bits) |     -w (bytes) |      2                |
+| Channels        |  -c        |     -c         |      1                |
+| Encoding        |  -e        |     None       | always signed integer |

-If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`):
+According to this table, the previous command can be run as:

-    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -r 16000 -i -
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -

-With `-i -`,  `auditok` reads data from standard input.
+### Play back detections

-`rec` and `play` are just an alias for `sox`. Doing so you won't be able to play audio detections (`-E` requires `Pyaudio`). Fortunately, `auditok` gives the possibility to call any command every time it detects an activity, passing the activity as a file to the user supplied command:
+    auditok -E

-    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
+OR
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E
+
+Option `-E` stands for echo, so `auditok` plays back whatever it detects. Using `-E` requires `PyAudio`. If you don't have `PyAudio` and want to play detections with `sox`, use the `-C` option:
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

-The `-C` option tells `auditok` to interpret its content as a command that is run whenever `auditok` detects an audio activity, replacing the `$` by a name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.
+The `-C` option tells `auditok` to interpret its content as a command that should be run whenever `auditok` detects an audio activity, replacing the `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.
+
+`rec` and `play` are just aliases for `sox`.

 The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only if there is an audio activity and saves bandwidth during silence.
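
As an illustration of a user-supplied command, here is a minimal sketch of a script that could be given to `-C`. It only reports the duration of each detection, computed from the size of the raw file. The script and its function names are hypothetical (not part of `auditok`), and it assumes the default audio parameters from the table above (16000 Hz, 2 bytes per sample, 1 channel).

```python
import os
import sys

# Assumed audio parameters: auditok's defaults as listed in the table above.
SAMPLING_RATE = 16000  # samples per second (-r)
SAMPLE_WIDTH = 2       # bytes per sample (-w)
CHANNELS = 1           # number of channels (-c)


def raw_duration(num_bytes, rate=SAMPLING_RATE, width=SAMPLE_WIDTH,
                 channels=CHANNELS):
    """Return the duration in seconds of headerless PCM data of this size."""
    return num_bytes / float(rate * width * channels)


def main(paths):
    # Each path is expected to be the temporary file name substituted for `$`.
    for path in paths:
        if os.path.isfile(path):
            seconds = raw_duration(os.path.getsize(path))
            print("{}: {:.2f} s".format(path, seconds))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as `duration.py`, it could be passed to `auditok` as `-C "python duration.py $"`.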

+### Set detection threshold
+
+If you notice too many detections, use a higher value for the energy threshold. The current version only implements a `validator` based on an energy threshold; the use of spectral information is also desirable and might be part of future releases. To change the energy threshold (default: 50), use option `-e`:
+
+    auditok -E -e 55
+
+OR
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
+
+If, however, the detector misses some or all of your audio activities, use a lower value for `-e`.
+
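To give an intuition of what an energy threshold means, the sketch below computes a log energy for one analysis window of signed 16-bit samples and compares it to a threshold. The formula (10 times the base-10 logarithm of the mean squared sample) and the `>=` comparison are assumptions made for illustration; the exact computation used by `auditok` is not stated here, and `log_energy` and `is_valid` are hypothetical names.

```python
import math
import struct


def log_energy(frame):
    """10 * log10 of the mean squared sample of little-endian 16-bit PCM bytes.

    Assumed formula, for illustration only.
    """
    count = len(frame) // 2
    if count == 0:
        raise ValueError("frame must contain at least one 16-bit sample")
    samples = struct.unpack("<{}h".format(count), frame[:count * 2])
    mean_square = sum(s * s for s in samples) / float(count)
    # Avoid log10(0) for a window of pure digital silence.
    return 10 * math.log10(max(mean_square, 1e-10))


def is_valid(frame, threshold=50):
    """True if the window's log energy reaches the threshold (cf. option -e)."""
    return log_energy(frame) >= threshold


# A 10 ms window at 16000 Hz holds 160 samples; use a constant amplitude of 1000.
loud = struct.pack("<160h", *([1000] * 160))
quiet = struct.pack("<160h", *([0] * 160))
print(is_valid(loud), is_valid(quiet))
```

Under this assumed formula, a constant amplitude of 1000 gives a log energy of 60, so the window passes a threshold of 50 or 55 but not one of 65.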
+### Set format for printed detections information
+
+By default, `auditok` prints the `id`, `start time` and `end time` of each detected activity:
+
+    1 1.87 2.67
+    2 3.05 3.73
+    3 3.97 4.49
+    ...
+
+If you want to customize the output format, use the `--printf` option:
+
+    auditok -e 55 --printf "[{id}]: {start} to {end}"
+
+Output:
+
+    [1]: 0.22 to 0.67
+    [2]: 2.81 to 4.18
+    [3]: 5.53 to 6.44
+    [4]: 7.32 to 7.82
+    ...
+
+Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds; for more detailed time information, use `--time-format`:
+
+    auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"
+
+Output:
+
+    [1]: 00:00:01.080 to 00:00:01.760
+    [2]: 00:00:02.420 to 00:00:03.440
+    [3]: 00:00:04.930 to 00:00:05.570
+    [4]: 00:00:05.690 to 00:00:06.020
+    [5]: 00:00:07.470 to 00:00:07.980
+    ...
+
+Valid time directives are: `%h` (hours), `%m` (minutes), `%s` (seconds) and `%i` (milliseconds). Two other directives, `%S` (default) and `%I`, can be used for absolute time in seconds and milliseconds respectively.
+
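If the default output (`id start end`, in seconds) has already been captured, a similar layout can be produced afterwards. The following is a minimal sketch with hypothetical helper names; it mimics the `%h:%m:%s.%i` layout shown above and is not how `auditok` itself formats time.

```python
def format_time(seconds):
    """Format a time in seconds as hh:mm:ss.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3600 * 1000)
    minutes, rest = divmod(rest, 60 * 1000)
    secs, millis = divmod(rest, 1000)
    return "{:02d}:{:02d}:{:02d}.{:03d}".format(hours, minutes, secs, millis)


def parse_detection(line):
    """Parse one default output line 'id start end' into (int, float, float)."""
    ident, start, end = line.split()
    return int(ident), float(start), float(end)


ident, start, end = parse_detection("1 1.87 2.67")
print("[{}]: {} to {}".format(ident, format_time(start), format_time(end)))
# prints: [1]: 00:00:01.870 to 00:00:02.670
```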
+### Practical use case: generate a subtitles template
+
+Using `--printf` and `--time-format`, the following command, used with an input audio or video file, will generate an **srt** file template. The template can later be edited with a subtitles editor, which reduces the time needed to define where each utterance starts and ends:
+
+    auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
+
+Output:
+
+    1
+    00:00:00.730 --> 00:00:01.460
+    Put some text here...
+
+    2
+    00:00:02.440 --> 00:00:03.900
+    Put some text here...
+
+    3
+    00:00:06.410 --> 00:00:06.970
+    Put some text here...
+
+    4
+    00:00:07.260 --> 00:00:08.340
+    Put some text here...
+
+    5
+    00:00:09.510 --> 00:00:09.820
+    Put some text here...
+
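The template above separates seconds and milliseconds with a period, whereas the SubRip (`.srt`) format conventionally uses a comma (`00:00:00,730`). If a tool rejects the period, the timing lines can be rewritten after the fact. This is a minimal post-processing sketch; `to_srt_timing` is a hypothetical helper, not part of `auditok`.

```python
import re

# Matches a cue timing line such as "00:00:00.730 --> 00:00:01.460".
TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})$"
)


def to_srt_timing(line):
    """Replace '.' with ',' before the milliseconds, on timing lines only."""
    return TIMING.sub(r"\1,\2 --> \3,\4", line)


template = [
    "1",
    "00:00:00.730 --> 00:00:01.460",
    "Put some text here...",
]
for line in template:
    print(to_srt_timing(line))
```

Lines that are not timing lines, including the placeholder text with its trailing dots, are left untouched.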
 ### Plot signal and detections:

-use option `-p`. Requires `matplotlib` and `numpy`
+Use option `-p`. Requires `matplotlib` and `numpy`.

-### read data from file
+    auditok ...  -p
+
+### Save plot as image or PDF
+
+    auditok ...  --save-image output.png
+
+Requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.
+
+### Read data from file

     auditok -i input.wav ...

@@ -88,35 +210,22 @@

 Install `pydub` for more audio formats.

-Demos
------
-This code reads data from the microphone and plays back whatever it detects.

-    python demos/echo.py
+Setting detection parameters
+----------------------------

-`echo.py` accepts two arguments: energy threshold (default=45) and duration in seconds (default=10):
+Alongside the threshold option `-e` seen so far, several other options can have a great impact on the detector's behavior. These options are summarized in the following table:

-    python demos/echo.py 50 15

-   If only one argument is given it will be used for energy.
-
-Try out this demo with an audio file (no argument is required):
+| Option | Description                                           | Unit    | Default          |
+| -------|-------------------------------------------------------|---------|------------------|
+| `-n`   | Minimum length an accepted audio activity should have | second  |   0.2 (200 ms)   |
+| `-m`   | Maximum length an accepted audio activity should reach| second  |   5.             |
+| `-s`   | Maximum length of a continuous silence period within an accepted audio activity | second  |   0.3 (300 ms)   |
+| `-d`   | Drop trailing silence from an accepted audio activity | boolean |   False          |
+| `-a`   | Analysis window length (default value should be good) | second  |   0.01 (10 ms)   |
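
To show how these parameters could interact, here is a toy tokenizer over a sequence of per-window decisions (`True` when a window passes the energy threshold). It is a simplified sketch written for this explanation, not `auditok`'s actual implementation: the function name, the use of window counts instead of seconds, and details such as applying the minimum length after trailing silence is dropped are all assumptions.

```python
def tokenize(windows, min_len, max_len, max_silence,
             drop_trailing_silence=False):
    """Return (start, end) window indices (end exclusive) of accepted events.

    min_len, max_len and max_silence are counts of analysis windows and
    loosely correspond to options -n, -m and -s.
    """
    events = []

    def emit(start, end, tail_silence):
        # Optionally strip the silent tail, then apply the minimum length.
        if drop_trailing_silence:
            end -= tail_silence
        if end - start >= min_len:
            events.append((start, end))

    start, silence = None, 0
    for i, valid in enumerate(windows):
        if start is None:
            if not valid:
                continue
            start, silence = i, 0  # first valid window opens an event
        else:
            silence = 0 if valid else silence + 1
        if silence > max_silence:
            # Window i breaks the silence tolerance: the event ends before it.
            emit(start, i, silence - 1)
            start = None
        elif i - start + 1 == max_len:
            # Maximum length reached: the event ends with window i.
            emit(start, i + 1, silence)
            start = None
    if start is not None:
        # Input ended while an event was still open.
        emit(start, len(windows), silence)
    return events


decisions = [bool(int(c)) for c in "011001100010000"]
print(tokenize(decisions, min_len=2, max_len=10, max_silence=2))
print(tokenize(decisions, min_len=2, max_len=10, max_silence=1))
```

With a tolerance of two silent windows the first two bursts are merged into one event, while a tolerance of one window splits them, which mirrors the behavior shown in the two figures at the top.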

-    python demos/audio_tokenize_demo.py
-
-Finally, in this demo `auditok` is used to remove tailing and leading silence from an audio file:
-
-    python demos/audio_trim_demo.py
-
-Documentation
--------------
-
-Check out this [quick start](https://github.com/amsehili/auditok/blob/master/quickstart.rst) or the  [API documentation](http://amsehili.github.io/auditok/pdoc/).
-
-
-Contribution
-------------
-Contributions are very appreciated !

 License
 -------
@@ -126,3 +235,4 @@
 ------
 Amine Sehili (<amine.sehili@gmail.com>)

+