annotate README.rst @ 396:c565749e0b04

Merge branch 'master' of https://github.com/amsehili/auditok
author www-data <www-data@c4dm-xenserv-virt2.eecs.qmul.ac.uk>
date Mon, 08 Aug 2022 23:17:57 +0100
parents a31d4d38c112
children c801276ddf11
.. image:: doc/figures/auditok-logo.png
    :align: center
    :alt: auditok logo

.. image:: https://travis-ci.org/amsehili/auditok.svg?branch=master
    :target: https://travis-ci.org/amsehili/auditok
    :alt: Build status

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
    :target: http://auditok.readthedocs.org/en/latest/?badge=latest
    :alt: Documentation status

``auditok`` is an **Audio Activity Detection** tool that can process online data
(read from an audio device or from standard input) as well as audio files.
It can be used as a command-line program or by calling its API.

The latest version of the documentation can be found on
`Read the Docs <https://auditok.readthedocs.io/en/latest/>`_.


Installation
------------

A basic version of ``auditok`` will run with standard Python (>=3.4). However,
without installing additional dependencies, ``auditok`` can only deal with audio
files in *wav* or *raw* formats. If you want more features, the following
packages are needed:

- `pydub <https://github.com/jiaaro/pydub>`_: read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.
- `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_: read audio data from the microphone and play audio back.
- `tqdm <https://github.com/tqdm/tqdm>`_: show a progress bar while playing audio clips.
- `matplotlib <https://matplotlib.org/stable/index.html>`_: plot the audio signal and detections.
- `numpy <https://numpy.org/>`_: required by matplotlib. If available, it is also used for some math operations instead of standard Python.

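To see which of the optional packages listed above are already importable in
the current environment, a small check along these lines can help. This is only
a sketch based on the standard library: ``check_optional_packages`` and
``OPTIONAL_PACKAGES`` are hypothetical names used for illustration and are not
part of ``auditok``.

.. code:: python

    from importlib.util import find_spec

    # Import names of the optional packages and the feature each one enables
    # (taken from the list above; the dictionary itself is illustrative only).
    OPTIONAL_PACKAGES = {
        "pydub": "read compressed audio formats and audio from video files",
        "pyaudio": "read from the microphone and play audio back",
        "tqdm": "progress bar while playing audio clips",
        "matplotlib": "plot the audio signal and detections",
        "numpy": "required by matplotlib, faster math operations",
    }


    def check_optional_packages(packages=OPTIONAL_PACKAGES):
        """Return a dict mapping each package name to True if it can be imported."""
        return {name: find_spec(name) is not None for name in packages}


    if __name__ == "__main__":
        for name, available in check_optional_packages().items():
            status = "found" if available else "missing"
            print("{:<12} {:<8} {}".format(name, status, OPTIONAL_PACKAGES[name]))

Packages reported as missing can then be installed with pip as needed.
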
Install the latest stable version with pip:

.. code:: bash

    sudo pip install auditok

Install the latest development version from GitHub:

.. code:: bash

    pip install git+https://github.com/amsehili/auditok

or

.. code:: bash

    git clone https://github.com/amsehili/auditok.git
    cd auditok
    python setup.py install


Basic example
-------------

.. code:: python

    import auditok

    # split returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,         # minimum duration of a valid audio event in seconds
        max_dur=4,           # maximum duration of an event
        max_silence=0.3,     # maximum duration of tolerated continuous silence within an event
        energy_threshold=55  # threshold of detection
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play detection
        # r.play(progress_bar=True)

        # region's metadata can also be used with the `save` method
        # (no need to explicitly specify region's object and `format` arguments)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

Output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav


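The basic example expects an ``audio.wav`` file. If no recording is at hand, a
synthetic one can be generated with the standard library alone. The sketch
below writes tone bursts separated by digital silence; ``write_test_wav`` and
all of its parameters are hypothetical helpers for illustration, not part of
``auditok``, and the burst times are arbitrary.

.. code:: python

    import math
    import struct
    import wave


    def write_test_wav(path="audio.wav", sample_rate=16000,
                       bursts=((0.7, 1.4), (3.8, 4.5)), total_dur=6.0,
                       freq=440.0, amplitude=0.5):
        """Write a mono 16-bit WAV file with sine bursts and silence in between.

        `bursts` is a sequence of (start, end) times in seconds.
        """
        frames = bytearray()
        for n in range(int(total_dur * sample_rate)):
            t = n / sample_rate
            active = any(start <= t < end for start, end in bursts)
            value = amplitude * math.sin(2 * math.pi * freq * t) if active else 0.0
            # 16-bit signed little-endian PCM sample
            frames += struct.pack("<h", int(value * 32767))

        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)   # mono
            wav.setsampwidth(2)   # 2 bytes per sample
            wav.setframerate(sample_rate)
            wav.writeframes(bytes(frames))
        return path


    if __name__ == "__main__":
        print("wrote", write_test_wav())

Running the basic example on such a file should report regions close to the
burst boundaries, although the exact values depend on the analysis window used
by the detector.
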
Split and plot
--------------

Visualize audio signal and detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav") # returns an AudioRegion object
    regions = region.split_and_plot(...) # or just region.splitp()

Output figure:

.. image:: doc/figures/example_1.png


Limitations
-----------

Currently, the core detection algorithm is based on the energy of the audio
signal. While this is fast and works very well for audio streams with low
background noise (e.g., podcasts with few people talking, language lessons,
audio recorded in a rather quiet environment, etc.), the performance can drop
as the level of noise increases. Furthermore, the algorithm makes no distinction
between speech and other kinds of sounds, so you shouldn't use it for Voice
Activity Detection if your audio data also contain non-speech events.

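To make the idea of energy-based detection concrete, here is a deliberately
simplified sketch in plain Python. It is not the implementation used by
``auditok``: the function names, the fixed 50 ms analysis window and the
log-energy formula are assumptions made for illustration, and ``max_dur`` is
not handled. Each window is marked active when its log energy reaches the
threshold, and active windows are merged into events while the silence between
them stays within ``max_silence``.

.. code:: python

    import math


    def window_energy_db(samples):
        """Log energy of one window: 20 * log10 of its RMS amplitude."""
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        return 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)


    def detect_events(samples, sample_rate, window=0.05, energy_threshold=55,
                      min_dur=0.2, max_silence=0.3):
        """Return a list of (start, end) times in seconds of detected events."""
        win = int(window * sample_rate)        # samples per analysis window
        max_gap = round(max_silence / window)  # tolerated silent windows in a row
        n_windows = len(samples) // win
        events, start, last_active = [], None, None

        for i in range(n_windows + 1):
            if i < n_windows:
                chunk = samples[i * win:(i + 1) * win]
                active = window_energy_db(chunk) >= energy_threshold
            else:
                active = False  # extra pass: flush an event still open at the end

            if active:
                if start is None:
                    start = i
                last_active = i
            elif start is not None and (i - last_active > max_gap or i == n_windows):
                begin, end = start * window, (last_active + 1) * window
                if end - begin >= min_dur:  # drop events that are too short
                    events.append((round(begin, 3), round(end, 3)))
                start = None
        return events

Because the decision relies on energy alone, a door slam and a spoken word of
similar loudness are treated alike, and steady background noise above the
threshold keeps every window active, which is how the limitations described
above arise.
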
License
-------

MIT.