.. image:: doc/figures/auditok-logo.png
    :align: center
    :alt: Build status

.. image:: https://travis-ci.org/amsehili/auditok.svg?branch=master
    :target: https://travis-ci.org/amsehili/auditok

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
    :target: http://auditok.readthedocs.org/en/latest/?badge=latest
    :alt: Documentation status

``auditok`` is an **Audio Activity Detection** tool that can process online data
(read from an audio device or from standard input) as well as audio files.
It can be used as a command-line program or by calling its API.

The latest version of the documentation can be found on
`readthedocs <http://auditok.readthedocs.org/en/latest/>`_.


Installation
------------

A basic version of ``auditok`` will run with standard Python (>=3.4). However,
without installing additional dependencies, ``auditok`` can only deal with audio
files in *wav* or *raw* formats. If you want more features, the following
packages are needed:

- ``pydub``: read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.
- ``pyaudio``: read audio data from the microphone and play audio back.
- ``tqdm``: show a progress bar while playing audio clips.
- ``matplotlib``: plot the audio signal and detections.
- ``numpy``: required by matplotlib. Also used for some math operations instead of standard Python if available.

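
To see which of the optional packages listed above are available in the
current environment, a small check along these lines may help. This is only a
sketch: the helper name ``find_missing_packages`` is hypothetical, and it
assumes each package is imported under the same name as it is listed here.

.. code:: python

    import importlib.util

    # Optional packages from the list above, mapped to the feature each enables.
    # Assumption: the import name is the same as the listed package name.
    OPTIONAL_PACKAGES = {
        "pydub": "read popular audio formats (ogg, mp3, etc.)",
        "pyaudio": "read from the microphone and play audio back",
        "tqdm": "progress bar while playing audio clips",
        "matplotlib": "plot audio signal and detections",
        "numpy": "required by matplotlib, used for some math operations",
    }


    def find_missing_packages(packages=OPTIONAL_PACKAGES):
        """Return the names of the packages that cannot be found for import."""
        return [name for name in packages if importlib.util.find_spec(name) is None]


    if __name__ == "__main__":
        for name in find_missing_packages():
            print("{}: not installed ({})".format(name, OPTIONAL_PACKAGES[name]))

Any package reported as missing only disables the corresponding feature;
*wav* and *raw* files can still be processed without it.
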
Install the latest stable version with pip:


.. code:: bash

    sudo pip install auditok


Install the latest development version from GitHub:

.. code:: bash

    pip install git+https://github.com/amsehili/auditok

or

.. code:: bash

    git clone https://github.com/amsehili/auditok.git
    cd auditok
    python setup.py install


Basic example
-------------

.. code:: python

    import auditok

    # split returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,     # minimum duration of a valid audio event in seconds
        max_dur=4,       # maximum duration of an event
        max_silence=0.3, # maximum duration of tolerated continuous silence within an event
        energy_threshold=55 # threshold of detection
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play detection
        # r.play(progress_bar=True)

        # region's metadata can also be used with the `save` method
        # (no need to explicitly specify region's object and `format` arguments)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

Output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav


Split and plot
--------------

Visualize the audio signal and detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav") # returns an AudioRegion object
    regions = region.split_and_plot(...) # or just region.splitp()

Output figure:

.. image:: doc/figures/example_1.png


Limitations
-----------

Currently, the core detection algorithm is based on the energy of the audio
signal. While this is fast and works very well for audio streams with low
background noise (e.g., podcasts with few people talking, language lessons,
audio recorded in a rather quiet environment, etc.), the performance can drop
as the level of noise increases. Furthermore, the algorithm makes no
distinction between speech and other kinds of sounds, so you shouldn't use it
for Voice Activity Detection if your audio data also contains non-speech events.

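
To illustrate why an energy-based detector cannot tell speech from other
sounds, here is a minimal sketch of a per-frame energy test. It is not
``auditok``'s actual implementation: the function names, the 16-bit mono PCM
input, and the log-energy formula are assumptions made for the example, and the
default threshold merely echoes the ``energy_threshold=55`` value used in the
basic example.

.. code:: python

    import math
    import struct


    def frame_log_energy(frame_bytes):
        """Log energy (in dB) of a frame of 16-bit little-endian mono PCM samples."""
        n_samples = len(frame_bytes) // 2
        samples = struct.unpack("<{}h".format(n_samples), frame_bytes[: n_samples * 2])
        mean_square = sum(s * s for s in samples) / max(n_samples, 1)
        # A small floor avoids log10(0) on digital silence
        return 10 * math.log10(max(mean_square, 1e-10))


    def is_active(frame_bytes, energy_threshold=55):
        """Label a frame as activity when its log energy reaches the threshold."""
        return frame_log_energy(frame_bytes) >= energy_threshold


    if __name__ == "__main__":
        silence = struct.pack("<4h", 0, 0, 0, 0)
        loud = struct.pack("<4h", 10000, -10000, 10000, -10000)
        print(is_active(silence), is_active(loud))

The decision depends only on how loud a frame is, so a door slam or music at
the same level would be labeled as activity exactly like speech.
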
License
-------

MIT.