.. image:: doc/figures/auditok-logo.png
    :align: center

.. image:: https://travis-ci.org/amsehili/auditok.svg?branch=master
    :target: https://travis-ci.org/amsehili/auditok
    :alt: Build status

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
    :target: http://auditok.readthedocs.org/en/latest/?badge=latest
    :alt: Documentation status

``auditok`` is an **Audio Activity Detection** tool that can process online data
(read from an audio device or from standard input) as well as audio files.
It can be used as a command-line program or by calling its API.

The latest version of the documentation can be found on
`readthedocs <https://auditok.readthedocs.io/en/latest/>`_.


Installation
------------

A basic version of ``auditok`` will run with standard Python (>=3.4). However,
without installing additional dependencies, ``auditok`` can only deal with audio
files in *wav* or *raw* formats. If you want more features, the following
packages are needed:

- `pydub <https://github.com/jiaaro/pydub>`_ : read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file (see the example after this list).
- `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_ : read audio data from the microphone and play audio back.
- `tqdm <https://github.com/tqdm/tqdm>`_ : show a progress bar while playing audio clips.
- `matplotlib <https://matplotlib.org/stable/index.html>`_ : plot the audio signal and detections.
- `numpy <https://numpy.org/>`_ : required by matplotlib. Also used for some math operations instead of standard Python if available.
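
For instance, with ``pydub`` installed (and a decoding backend such as ``ffmpeg``
available on your system), compressed formats can be passed to ``auditok``
directly. A minimal sketch, assuming a local ``audio.ogg`` file:

.. code:: python

    import auditok

    # with pydub installed, non-wav/raw formats are decoded transparently
    region = auditok.load("audio.ogg")
    region.save("audio_copy.wav")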

Install the latest stable version with pip:

.. code:: bash

    sudo pip install auditok

Install the latest development version from GitHub:

.. code:: bash

    pip install git+https://github.com/amsehili/auditok

or clone the repository and install from source:

.. code:: bash

    git clone https://github.com/amsehili/auditok.git
    cd auditok
    python setup.py install


Basic example
-------------

.. code:: python

    import auditok

    # `split` returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,       # minimum duration of a valid audio event in seconds
        max_dur=4,         # maximum duration of an event
        max_silence=0.3,   # maximum duration of tolerated continuous silence within an event
        energy_threshold=55  # detection threshold
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play detection
        # r.play(progress_bar=True)

        # region metadata can also be used with the `save` method
        # (no need to explicitly pass the region's metadata or the `format` argument)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav
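
``split`` can also consume online data. A minimal sketch, assuming ``pyaudio``
is installed and that, as documented, ``input=None`` reads from the microphone
and ``max_read`` limits how many seconds of audio are read (treat the exact
keyword names as assumptions if you are on a different version):

.. code:: python

    import auditok

    # read up to 10 seconds from the default microphone and split on the fly
    audio_regions = auditok.split(
        input=None,
        max_read=10,
        min_dur=0.2,
        max_dur=4,
        max_silence=0.3,
        energy_threshold=55
    )

    for r in audio_regions:
        # `event_...` is just an illustrative filename pattern
        r.save("event_{meta.start:.3f}.wav")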


Split and plot
--------------

Visualize the audio signal and detections:

.. code:: python

    import auditok

    region = auditok.load("audio.wav")   # returns an AudioRegion object
    regions = region.split_and_plot(...) # or just region.splitp()

output figure:

.. image:: doc/figures/example_1.png
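
``split_and_plot`` accepts the same detection parameters as ``split``, so the
basic example above can be reproduced with a plot. A sketch, assuming the
keyword arguments are simply forwarded to ``split`` (requires matplotlib and
numpy):

.. code:: python

    import auditok

    region = auditok.load("audio.wav")
    # same detection parameters as in the basic example
    regions = region.split_and_plot(
        min_dur=0.2,
        max_dur=4,
        max_silence=0.3,
        energy_threshold=55
    )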


Limitations
-----------

Currently, the core detection algorithm is based on the energy of the audio
signal. While this is fast and works very well for audio streams with low
background noise (e.g., podcasts with few speakers, language lessons, or audio
recorded in a rather quiet environment), performance can drop as the noise
level increases. Furthermore, the algorithm makes no distinction between speech
and other kinds of sounds, so you shouldn't use it for Voice Activity Detection
if your audio data also contain non-speech events.
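
To illustrate why, here is a minimal, self-contained sketch of energy-based
detection (not ``auditok``'s actual implementation): a frame is considered
active when its energy, in dB, exceeds a fixed threshold, so a loud noise
floor or a non-speech sound can trigger a detection just as speech does.

.. code:: python

    import math

    def is_active(frame, energy_threshold=55):
        """Return True if the frame's energy in dB exceeds the threshold.

        `frame` is a sequence of signed 16-bit samples. Illustrative only:
        auditok's real implementation differs in scaling and windowing.
        """
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        return rms > 0 and 20 * math.log10(rms) >= energy_threshold

    quiet_background = [50] * 160    # ~34 dB: below a 55 dB threshold
    loud_background = [2000] * 160   # ~66 dB: detected, although it is not speech
    print(is_active(quiet_background), is_active(loud_background))  # False True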


License
-------

MIT.