.. image:: doc/figures/auditok-logo.png
    :align: center

.. image:: https://travis-ci.org/amsehili/auditok.svg?branch=master
    :target: https://travis-ci.org/amsehili/auditok

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
    :target: http://auditok.readthedocs.org/en/latest/?badge=latest
    :alt: Documentation Status

``auditok`` is an **Audio Activity Detection** tool that can process online data
(read from an audio device or from standard input) as well as audio files.
It can be used as a command-line program or through its API.

The latest version of the documentation can be found on
`readthedocs <http://auditok.readthedocs.org/en/latest/>`_.


Installation
------------

A basic version of ``auditok`` runs with standard Python (>=3.4). However,
without additional dependencies, ``auditok`` can only handle audio files in
*wav* or *raw* format. If you want more features, the following packages are
needed:

- ``pydub``: read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.
- ``pyaudio``: read audio data from the microphone and play audio back.
- ``tqdm``: show a progress bar while playing audio clips.
- ``matplotlib``: plot the audio signal and detections.
- ``numpy``: required by matplotlib. Also used, if available, for some math operations instead of standard Python.

Install the latest stable version with pip:

.. code:: bash

    sudo pip install auditok

Install the latest development version from GitHub:

.. code:: bash

    pip install git+https://github.com/amsehili/auditok

or

.. code:: bash

    git clone https://github.com/amsehili/auditok.git
    cd auditok
    python setup.py install


Basic example
-------------

.. code:: python

    import auditok

    # split returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,         # minimum duration of a valid audio event in seconds
        max_dur=4,           # maximum duration of an event
        max_silence=0.3,     # maximum duration of tolerated continuous silence within an event
        energy_threshold=55  # threshold of detection
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play detection
        # r.play(progress_bar=True)

        # region's metadata can also be used with the `save` method
        # (no need to explicitly specify region's object and `format` arguments)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

Output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav


Split and plot
--------------

Visualize the audio signal and detections:

.. code:: python

    import auditok

    region = auditok.load("audio.wav")    # returns an AudioRegion object
    regions = region.split_and_plot(...)  # or just region.splitp()

Output figure:

.. image:: doc/figures/example_1.png


Limitations
-----------

Currently, the core detection algorithm is based on the energy of the audio
signal. While this is fast and works very well for audio streams with low
background noise (e.g., podcasts with few people talking, language lessons,
audio recorded in a rather quiet environment, etc.), performance can drop as
the level of noise increases. Furthermore, the algorithm makes no distinction
between speech and other kinds of sounds, so you shouldn't use it for Voice
Activity Detection if your audio data also contain non-speech events.

License
-------

MIT.