[![Build Status](https://travis-ci.org/amsehili/auditok.svg?branch=master)](https://travis-ci.org/amsehili/auditok)

AUDIo TOKenizer
===============

`auditok` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy-to-use API.

The following two figures illustrate the detector output when:

1. the detector tolerates silences of up to 0.3 seconds (300 ms) within an audio activity (also referred to as an acoustic event):
![](doc/figures/figure_1.png)

2. the detector splits an audio activity into several activities whenever the silence within it exceeds 0.2 seconds:
![](doc/figures/figure_2.png)


Requirements
------------
`auditok` can be used with standard Python!
However, if you want more features, the following packages are needed:

- [pydub](https://github.com/jiaaro/pydub): read audio files of popular audio formats (ogg, mp3, etc.) or extract audio from a video file
- [PyAudio](http://people.csail.mit.edu/hubert/pyaudio/): read audio data from the microphone and play back detections
- `matplotlib`: plot audio signal and detections (see figures above)
- `numpy`: required by `matplotlib`. Also used for math operations instead of standard Python if available
- Optionally, you can use `sox` or `parecord` for data acquisition and feed `auditok` using a pipe.


Installation
------------
    python setup.py install

Command line usage
------------------

The first thing you want to check is perhaps how `auditok` detects your voice.
If you have installed `PyAudio`, just run (`Ctrl-C` to stop):

    auditok -D -E

Option `-D` means debug, whereas `-E` stands for echo, so `auditok` plays back whatever it detects.

If there are too many detections, use a higher energy threshold. The current version only implements a `validator` based on an energy threshold; the use of spectral information is also desirable and might be part of future releases. To change the energy threshold (default: 45), use option `-e`:

    auditok -D -E -e 55

If you don't have `PyAudio`, you can use `sox` for data acquisition (`sudo apt-get install sox`):

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -r 16000 -i -

With `-i -`, `auditok` reads data from standard input.

`rec` and `play` are just aliases for `sox`. With this setup, you won't be able to play audio detections (`-E` requires `PyAudio`). Fortunately, `auditok` can call any command every time it detects an activity, passing the activity as a file to the user-supplied command:

    rec -q -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i - -r 16000 -C "play -q -t raw -r 16000 -c 1 -b 16 -e signed $"

The `-C` option tells `auditok` to interpret its content as a command that is run whenever `auditok` detects an audio activity, replacing the `$` with the name of a temporary file into which the activity is saved as raw audio. Here we use `play` to play the activity, giving the necessary `play` arguments for raw data.

The `-C` option can be useful in many cases. Imagine a command that sends audio data over a network only if there is an audio activity and saves bandwidth during silence.

### Plot signal and detections

Use option `-p`.
Requires `matplotlib` and `numpy`.

### Read data from a file

    auditok -i input.wav ...

Install `pydub` for other audio formats.

### Limit the length of acquired data

    auditok -M 12 ...

Time is in seconds.

### Save the whole acquired audio signal

    auditok -O output.wav ...

Install `pydub` for other audio formats.

### Save each detection into a separate audio file

    auditok -o det_{N}_{start}_{end}.wav ...

You can use any free text and place `{N}`, `{start}` and `{end}` wherever you want; they will be replaced by the detection number, start time and end time respectively. Another example:

    auditok -o {start}-{end}.wav ...

Install `pydub` for more audio formats.

Demos
-----
This code reads data from the microphone and plays back whatever it detects.

    python demos/echo.py

`echo.py` accepts two arguments: energy threshold (default=45) and duration in seconds (default=10):

    python demos/echo.py 50 15

If only one argument is given, it will be used as the energy threshold.

Try out this demo with an audio file (no argument is required):

    python demos/audio_tokenize_demo.py

Finally, in this demo `auditok` is used to remove trailing and leading silence from an audio file:

    python demos/audio_trim_demo.py

Documentation
-------------

Check out this [quick start](https://github.com/amsehili/auditok/blob/master/quickstart.rst) or the [API documentation](http://amsehili.github.io/auditok/pdoc/).
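
To make the silence tolerance shown in the two figures at the top of this README more concrete, here is a minimal pure-Python sketch of energy-threshold detection. It is not `auditok`'s implementation or API: the function name `tokenize`, the per-frame energy list and the frame-index output are illustrative assumptions, and the threshold value 45 is only borrowed from the default mentioned above.

```python
def tokenize(energies, threshold=45, max_silence=3):
    """Return (start, end) frame-index pairs of detected activities.

    energies    -- per-frame energy values (hypothetical input format)
    threshold   -- a frame is active if its energy is >= threshold
    max_silence -- longest run of silent frames tolerated inside one activity
    """
    events = []
    start = None        # first frame of the activity being built, if any
    last_active = None  # most recent active frame of that activity
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_active = i
        elif start is not None and i - last_active > max_silence:
            # Silence lasted longer than tolerated: close the activity
            # at its last active frame (trailing silence is dropped).
            events.append((start, last_active))
            start = None
    if start is not None:
        # The stream ended while an activity was still open.
        events.append((start, last_active))
    return events


# Made-up energies; the frame duration is left unspecified on purpose.
frames = [10, 60, 70, 10, 10, 65, 10, 10, 10, 10, 80, 75, 10]
print(tokenize(frames, threshold=45, max_silence=2))
print(tokenize(frames, threshold=45, max_silence=1))
```

With `max_silence=2` the two-frame gap is tolerated and the example prints `[(1, 5), (10, 11)]`; lowering it to 1 splits the first activity and prints `[(1, 2), (5, 5), (10, 11)]`.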

Contribution
------------
Contributions are very much appreciated!

License
-------
`auditok` is published under the GNU General Public License Version 3.

Author
------
Amine Sehili