.. image:: doc/figures/auditok-logo.png
    :align: center

.. image:: https://github.com/amsehili/auditok/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/amsehili/auditok/actions/workflows/ci.yml/
    :alt: Build Status

.. image:: https://codecov.io/github/amsehili/auditok/graph/badge.svg?token=0rwAqYBdkf
    :target: https://codecov.io/github/amsehili/auditok

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
    :target: http://auditok.readthedocs.org/en/latest/?badge=latest
    :alt: Documentation Status

``auditok`` is an **Audio Activity Detection** tool that processes online data
(from an audio device or standard input) and audio files. It can be used via the
command line or through its API.

Full documentation is available on `Read the Docs <https://auditok.readthedocs.io/en/latest/>`_.

Installation
------------

``auditok`` requires Python 3.7 or higher.

To install the latest stable version, use pip:

.. code:: bash

    pip install auditok

To install the latest development version from GitHub:

.. code:: bash

    pip install git+https://github.com/amsehili/auditok

Alternatively, clone the repository and install it manually:

.. code:: bash

    git clone https://github.com/amsehili/auditok.git
    cd auditok
    pip install .

Basic example
-------------

Here's a simple example of using ``auditok`` to detect audio events:

.. code:: python

    import auditok

    # `split` returns a generator of AudioRegion objects
    audio_events = auditok.split(
        "audio.wav",
        min_dur=0.2,         # Minimum duration of a valid audio event in seconds
        max_dur=4,           # Maximum duration of an event in seconds
        max_silence=0.3,     # Maximum tolerated silence duration within an event
        energy_threshold=55  # Detection threshold
    )

    for i, r in enumerate(audio_events):
        # AudioRegions returned by `split` have defined 'start' and 'end' attributes
        print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")

        # Play the audio event
        r.play(progress_bar=True)

        # Save the event with start and end times in the filename
        filename = r.save("event_{start:.3f}-{end:.3f}.wav")
        print(f"Event saved as: {filename}")

Example output:

.. code:: text

    Event 0: 0.700s -- 1.400s
    Event saved as: event_0.700-1.400.wav
    Event 1: 3.800s -- 4.500s
    Event saved as: event_3.800-4.500.wav
    Event 2: 8.750s -- 9.950s
    Event saved as: event_8.750-9.950.wav
    Event 3: 11.700s -- 12.400s
    Event saved as: event_11.700-12.400.wav
    Event 4: 15.050s -- 15.850s
    Event saved as: event_15.050-15.850.wav
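
The filename passed to ``r.save`` is a template. Assuming ``auditok`` fills the
``{start}`` and ``{end}`` placeholders with the event's times in seconds using
standard ``str.format`` rules (the example output above is consistent with this),
the saved names can be reproduced in plain Python:

.. code:: python

    # Plain-Python illustration of the filename template (no auditok needed).
    # The start/end values come from "Event 0" and "Event 3" in the output above.
    template = "event_{start:.3f}-{end:.3f}.wav"

    first = template.format(start=0.7, end=1.4)
    fourth = template.format(start=11.7, end=12.4)
    print(first)   # event_0.700-1.400.wav
    print(fourth)  # event_11.700-12.400.wav

The ``.3f`` format spec keeps three decimal places, so saved names distinguish
events down to the millisecond.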

Split and plot
--------------

Visualize the audio signal with detected events:

.. code:: python

    import auditok

    region = auditok.load("audio.wav")  # Returns an AudioRegion object
    regions = region.split_and_plot(...)  # Or simply use `region.splitp()`

Example output:

.. image:: doc/figures/example_1.png

Split an audio stream and re-join (glue) audio events with silence
------------------------------------------------------------------

The following code detects audio events within an audio stream, then inserts
1 second of silence between them to create an audio region with pauses:

.. code:: python

    from auditok import split, make_silence

    # Create a 1-second silent audio region
    # Audio parameters must match those of the original stream
    silence = make_silence(duration=1,
                           sampling_rate=16000,
                           sample_width=2,
                           channels=1)
    events = split("audio.wav")
    audio_with_pauses = silence.join(events)

Alternatively, use ``split_and_join_with_silence``:

.. code:: python

    from auditok import split_and_join_with_silence

    audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")
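
Conceptually, ``silence.join(events)`` behaves like ``str.join``: the silence is
placed between consecutive events, not before the first or after the last. At
16 kHz with 2-byte samples and one channel, 1 second of silence is
16000 × 2 × 1 = 32000 bytes of raw PCM data, which is why its audio parameters
must match those of the stream. The standard-library sketch below illustrates
this on raw bytes; the helper names are hypothetical and this is not how
``auditok`` implements ``join``:

.. code:: python

    def make_silence_bytes(duration, sampling_rate=16000, sample_width=2, channels=1):
        """Hypothetical helper: raw PCM silence lasting `duration` seconds.

        All-zero bytes encode silence for signed PCM (e.g. 16-bit samples);
        unsigned 8-bit WAV data would use 0x80 instead.
        """
        n_samples = round(duration * sampling_rate)
        return b"\x00" * (n_samples * sample_width * channels)


    def join_with_silence(events, silence):
        """Hypothetical helper: concatenate events, with `silence` only between them."""
        return silence.join(events)


    silence = make_silence_bytes(1)
    # Two made-up 0.5 s events: 8000 samples x 2 bytes = 16000 bytes each
    events = [b"\x01\x00" * 8000, b"\x02\x00" * 8000]
    joined = join_with_silence(events, silence)
    print(len(silence))  # 32000
    print(len(joined))   # 64000 = 16000 + 32000 + 16000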

Export an ``AudioRegion`` as a ``numpy`` array
----------------------------------------------

.. code:: python

    from auditok import load, AudioRegion

    audio = load("audio.wav")  # or use `AudioRegion.load("audio.wav")`
    x = audio.numpy()
    # `x` has one row per channel and one column per sample
    assert x.shape[0] == audio.channels
    assert x.shape[1] == len(audio)
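
In the array returned by ``numpy()``, rows are channels and columns are samples,
as the assertions above check. Raw WAV data, by contrast, stores multi-channel
samples interleaved (left, right, left, right, ...). The following
standard-library sketch, which assumes signed 16-bit little-endian PCM, shows how
interleaved data maps onto that ``(channels, samples)`` layout; it is an
illustration, not ``auditok``'s internal code:

.. code:: python

    import struct


    def deinterleave(raw, channels):
        """Split interleaved signed 16-bit little-endian PCM into one list per channel."""
        n_samples = len(raw) // 2  # 2 bytes per sample
        samples = struct.unpack(f"<{n_samples}h", raw)
        # Channel c owns every `channels`-th sample, starting at offset c
        return [list(samples[c::channels]) for c in range(channels)]


    # Three stereo frames: (left, right) = (1, -1), (2, -2), (3, -3)
    raw = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
    rows = deinterleave(raw, channels=2)
    print(rows)                     # [[1, 2, 3], [-1, -2, -3]]
    print(len(rows), len(rows[0]))  # 2 3, i.e. (channels, samples)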

Limitations
-----------

The detection algorithm is based on audio signal energy. While it performs well
in low-noise environments (e.g., podcasts, language lessons, or quiet recordings),
performance may drop in noisy settings. Additionally, the algorithm does not
distinguish between speech and other sounds, so it is not suitable for Voice
Activity Detection in multi-sound environments.
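
The following sketch makes "based on audio signal energy" concrete and shows why
any sufficiently loud sound, speech or not, becomes an event. It is a simplified,
standard-library-only illustration, not ``auditok``'s implementation: it assumes
signed 16-bit mono samples, an assumed 50 ms analysis window, and window energy
measured as 20·log10 of the RMS amplitude. Windows at or above the threshold are
active; active runs separated by at most ``max_silence`` seconds are merged, and
events shorter than ``min_dur`` are dropped (``max_dur`` is not handled here):

.. code:: python

    import math


    def window_energy(samples):
        """Log energy of one window: 20 * log10(RMS), with RMS floored to avoid log(0)."""
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        return 20 * math.log10(max(rms, 1e-10))


    def detect_events(samples, sampling_rate, window=0.05, energy_threshold=50,
                      min_dur=0.2, max_silence=0.3):
        """Hypothetical helper: (start, end) times in seconds of energy-based events."""
        win = int(window * sampling_rate)  # samples per analysis window
        n_windows = len(samples) // win    # an incomplete last window is ignored
        active = [
            window_energy(samples[i * win:(i + 1) * win]) >= energy_threshold
            for i in range(n_windows)
        ]

        max_gap = round(max_silence / window)  # silent windows tolerated inside an event
        min_windows = round(min_dur / window)  # shortest accepted event, in windows
        runs, start, last_active = [], None, None
        for i, is_active in enumerate(active):
            if is_active:
                if start is None:
                    start = i
                last_active = i
            elif start is not None and i - last_active > max_gap:
                runs.append((start, last_active + 1))  # end excludes trailing silence
                start = None
        if start is not None:
            runs.append((start, last_active + 1))

        return [
            (round(s * window, 3), round(e * window, 3))
            for s, e in runs
            if e - s >= min_windows
        ]


    # Synthetic signal sampled at 1000 Hz: silence = 0, loud segments = +/-1000
    def loud(n):
        return [1000 if k % 2 == 0 else -1000 for k in range(n)]


    signal = [0] * 500 + loud(500) + [0] * 200 + loud(300) + [0] * 1000 + loud(100) + [0] * 500
    print(detect_events(signal, sampling_rate=1000))  # [(0.5, 1.5)]

The 0.2 s pause is shorter than ``max_silence``, so the first two loud segments
merge into one event, while the final 0.1 s burst is discarded by ``min_dur``.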

License
-------

MIT.