changeset 24:9699fc1478a5
doc update
author | Amine Sehili <amine.sehili@gmail.com> |
---|---|
date | Sun, 29 Nov 2015 11:54:36 +0100 |
parents | 2beb3fb562f3 |
children | e32c7c349af6 |
files | README.md |
diffstat | 1 files changed, 149 insertions(+), 39 deletions(-) |
--- a/README.md Sun Nov 29 11:52:56 2015 +0100
+++ b/README.md Sun Nov 29 11:54:36 2015 +0100
@@ -4,12 +4,33 @@
 `auditok` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy to use API.

-The following two figures illustrate the detector output when:
+- [Two-figure explanation](https://github.com/amsehili/auditok#two-figure-explanation)
+- [Requirements](https://github.com/amsehili/auditok#requirements)
+- [Installation](https://github.com/amsehili/auditok#installation)
+- [Command line usage](https://github.com/amsehili/auditok#command-line-usage)
+    - [Try the detector with your voice](https://github.com/amsehili/auditok#try-the-detector-with-your-voice)
+    - [Play back detections](https://github.com/amsehili/auditok#play-back-detections)
+    - [Set detection threshold](https://github.com/amsehili/auditok#set-detection-threshold)
+    - [Set format for printed detections information](https://github.com/amsehili/auditok#set-format-for-printed-detections-information)
+    - [Practical use case: generate a subtitles template](https://github.com/amsehili/auditok#practical-use-case-generate-a-subtitles-template)
+    - [Plot signal and detections:](https://github.com/amsehili/auditok#plot-signal-and-detections)
+    - [Save plot as image or PDF](https://github.com/amsehili/auditok#save-plot-as-image-or-pdf)
+    - [Read data from file](https://github.com/amsehili/auditok#read-data-from-file)
+    - [Limit the length of acquired/read data](https://github.com/amsehili/auditok#limit-the-length-of-aquired-data)
+    - [Save the whole acquired audio signal](https://github.com/amsehili/auditok#save-the-whole-acquired-audio-signal)
+    - [Save each detection into a separate audio file](https://github.com/amsehili/auditok#save-each-detection-into-a-separate-audio-file)
+- [Setting detection parameters](https://github.com/amsehili/auditok#setting-detection-parameters)
+- [License](https://github.com/amsehili/auditok#license)
+- [Author](https://github.com/amsehili/auditok#author)
+
+Two-figure explanation
+----------------------
+The following two figures illustrate an audio signal (blue) and regions detected as valid audio activities (green rectangles) according to a given threshold (red dashed line). They respectively depict the detection result when:

 1. the detector tolerates phases of silence of up to 0.3 second (300 ms) within an audio activity (also referred to as acoustic event):

-2. the detector splits an audio activity event into many activities if the within silence is over 0.2 second:
+2. the detector splits an audio activity event into many activities if the within activity silence is over 0.2 second:

@@ -28,38 +49,139 @@
 ------------
     python setup.py install

-Command line usage:
+Command line usage
 ------------------
+### Try the detector with your voice
+
 The first thing you want to check is perhaps how `auditok` detects your voice. If you have installed `PyAudio` just run (`Ctrl-C` to stop):

-    auditok -D -E
+    auditok

-Option `-D` means debug, whereas `-E` stands for echo, so `auditok` plays back whatever it detects.
+This will print **id**, **start time** and **end time** for each detected activity. If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`) and tell `auditok` to read data from standard input:

-If there are too many detections, use a higher value for energy threshold (the current version only implements a `validator` based on energy threshold. The use of spectral information is also desirable and might be part of future releases). To change the energy threshold (default: 45), use option `-e`:
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -w 2 -c 1
+
+Note that when data is read from standard input the same audio parameters must be used for both `sox` (or any other data generation/acquisition tool) and `auditok`. The following table summarizes audio parameters.

-    auditok -D -E -e 55
+| Audio parameter | sox option | auditok option | `auditok` default     |
+| --------------- |------------|----------------|-----------------------|
+| Sampling rate   | -r         | -r             | 16000                 |
+| Sample width    | -b (bits)  | -w (bytes)     | 2                     |
+| Channels        | -c         | -c             | 1                     |
+| Encoding        | -e         | None           | always signed integer |

-If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`):
+According to this table, the previous command can be run as:

-    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -r 16000 -i -
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -

-With `-i -`, `auditok` reads data from standard input.
+### Play back detections

-`rec` and `play` are just an alias for `sox`. Doing so you won't be able to play audio detections (`-E` requires `Pyaudio`). Fortunately, `auditok` gives the possibility to call any command every time it detects an activity, passing the activity as a file to the user supplied command:
+    auditok -E

-    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
+OR
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -E
+
+Option `-E` stands for echo, so `auditok` plays back whatever it detects. Using `-E` requires `PyAudio`; if you don't have `PyAudio` and want to play detections with sox, use the `-C` option:
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

-The `-C` option tells `auditok` to interpret its content as a command that is run whenever `auditok` detects an audio activity, replacing the `$` by a name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.
+The `-C` option tells `auditok` to interpret its content as a command that should be run whenever `auditok` detects an audio activity, replacing the `$` by a name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.
+
+`rec` and `play` are just aliases for `sox`.

 The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only if there is an audio activity and saves bandwidth during silence.

+### Set detection threshold
+
+If you notice that there are too many detections, use a higher value for energy threshold (the current version only implements a `validator` based on energy threshold. The use of spectral information is also desirable and might be part of future releases). To change the energy threshold (default: 50), use option `-e`:
+
+    auditok -E -e 55
+
+OR
+
+    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -e 55 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"
+
+If however you figure out that the detector is missing some of or all your audio activities, use a lower value for `-e`.
+
+### Set format for printed detections information
+
+By default, `auditok` prints the `id` `start time` `end time` of each detected activity:
+
+    1 1.87 2.67
+    2 3.05 3.73
+    3 3.97 4.49
+    ...
+
+If you want to personalize the output format, use `--printf` option:
+
+    auditok -e 55 --printf "[{id}]: {start} to {end}"
+
+Output:
+
+    [1]: 0.22 to 0.67
+    [2]: 2.81 to 4.18
+    [3]: 5.53 to 6.44
+    [4]: 7.32 to 7.82
+    ...
+
+Keywords `{id}`, `{start}` and `{end}` can be placed and repeated anywhere in the text. Time is shown in seconds; if you want more detailed time information, use `--time-format`:
+
+    auditok -e 55 --printf "[{id}]: {start} to {end}" --time-format "%h:%m:%s.%i"
+
+Output:
+
+    [1]: 00:00:01.080 to 00:00:01.760
+    [2]: 00:00:02.420 to 00:00:03.440
+    [3]: 00:00:04.930 to 00:00:05.570
+    [4]: 00:00:05.690 to 00:00:06.020
+    [5]: 00:00:07.470 to 00:00:07.980
+    ...
+
+Valid time directives are: `%h` (hours) `%m` (minutes) `%s` (seconds) `%i` (milliseconds). Two other directives, `%S` (default) and `%I` can be used for absolute time in seconds and milliseconds respectively.
+
+### Practical use case: generate a subtitles template
+
+Using `--printf` and `--time-format`, the following command, used with an input audio or video file, will generate an **srt** file template that can be later edited with a subtitles editor in a way that reduces the time needed to define when each utterance starts and where it ends:
+
+    auditok -e 55 -i input.wav -m 10 --printf "{id}\n{start} --> {end}\nPut some text here...\n" --time-format "%h:%m:%s.%i"
+
+Output:
+
+    1
+    00:00:00.730 --> 00:00:01.460
+    Put some text here...
+
+    2
+    00:00:02.440 --> 00:00:03.900
+    Put some text here...
+
+    3
+    00:00:06.410 --> 00:00:06.970
+    Put some text here...
+
+    4
+    00:00:07.260 --> 00:00:08.340
+    Put some text here...
+
+    5
+    00:00:09.510 --> 00:00:09.820
+    Put some text here...
+
 ### Plot signal and detections:

-use option `-p`. Requires `matplotlib` and `numpy`
+use option `-p`. Requires `matplotlib` and `numpy`.

-### read data from file
+    auditok ... -p
+
+### Save plot as image or PDF
+
+    auditok ... --save-image output.png
+
+Requires `matplotlib` and `numpy`. Accepted formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.
+
+### Read data from file

     auditok -i input.wav ...

@@ -88,35 +210,22 @@
 Install `pydub` for more audio formats.

-Demos
------
-This code reads data from the microphone and plays back whatever it detects.

-    python demos/echo.py
+Setting detection parameters
+----------------------------

-`echo.py` accepts two arguments: energy threshold (default=45) and duration in seconds (default=10):
+Alongside the threshold option `-e` seen so far, a couple of other options can have a great impact on the detector behavior. These options are summarized in the following table:

-    python demos/echo.py 50 15
- If only one argument is given it will be used for energy.
-
-Try out this demo with an audio file (no argument is required):
+| Option | Description                                           | Unit    | Default          |
+| -------|-------------------------------------------------------|---------|------------------|
+| `-n`   | Minimum length an accepted audio activity should have | second  | 0.2 (200 ms)     |
+| `-m`   | Maximum length an accepted audio activity should reach| second  | 5.               |
+| `-s`   | Maximum length of a continuous silence period within  | second  | 0.3 (300 ms)     |
+|        | an accepted audio activity                            |         |                  |
+| `-d`   | Drop trailing silence from an accepted audio activity | boolean | False            |
+| `-a`   | Analysis window length (default value should be good) | second  | 0.01 (10 ms)     |

-    python demos/audio_tokenize_demo.py
-
-Finally, in this demo `auditok` is used to remove tailing and leading silence from an audio file:
-
-    python demos/audio_trim_demo.py
-
-Documentation
--------------
-
-Check out this [quick start](https://github.com/amsehili/auditok/blob/master/quickstart.rst) or the [API documentation](http://amsehili.github.io/auditok/pdoc/).
-
-
-Contribution
-------------
-Contributions are very appreciated !

 License
 -------
@@ -126,3 +235,4 @@
 ------
 Amine Sehili (<amine.sehili@gmail.com>)

+
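
The `--time-format` directives documented in this changeset (`%h`, `%m`, `%s`, `%i`, `%S`, `%I`) can be illustrated with a small Python sketch. `format_time` is a hypothetical helper written for this note and is not part of `auditok`; the zero-padding widths and the two-decimal rendering of `%S` are assumptions inferred from the sample outputs in the README, not taken from the tool's source.

```python
def format_time(seconds, fmt="%S"):
    """Render a time in seconds using README-style directives.

    Hypothetical illustration only (not auditok's implementation):
      %h hours, %m minutes, %s seconds, %i milliseconds (clock components)
      %S absolute time in seconds, %I absolute time in milliseconds
    """
    # Work in integer milliseconds to avoid floating-point drift.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)

    values = {
        "%h": f"{hours:02d}",
        "%m": f"{minutes:02d}",
        "%s": f"{secs:02d}",
        "%i": f"{millis:03d}",
        "%S": f"{seconds:.2f}",  # assumed: two decimals, as in "1.87"
        "%I": str(total_ms),
    }
    out = fmt
    # Substituted values contain only digits and dots, so one replacement
    # can never create a new directive for a later one to pick up.
    for directive, value in values.items():
        out = out.replace(directive, value)
    return out


if __name__ == "__main__":
    print(format_time(1.08, "%h:%m:%s.%i"))
    print(format_time(1.87))
```

With the `"%h:%m:%s.%i"` pattern, a start time of 1.08 seconds renders in the same `hh:mm:ss.mmm` shape as the sample detections listed in the README.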