.. image:: doc/figures/auditok-logo.png
   :align: center

.. image:: https://github.com/amsehili/auditok/actions/workflows/ci.yml/badge.svg
   :target: https://github.com/amsehili/auditok/actions/workflows/ci.yml/
   :alt: Build Status

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
   :target: http://auditok.readthedocs.org/en/latest/?badge=latest
   :alt: Documentation Status

``auditok`` is an **Audio Activity Detection** tool that processes online data
(from an audio device or standard input) and audio files. It can be used via
the command line or through its API.

Full documentation is available on `Read the Docs <https://auditok.readthedocs.io/en/latest/>`_.
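
For live input, the same API can read directly from an audio device. Below is a
minimal sketch; it assumes, per the API documentation, that passing ``None`` as
input makes ``split`` read from the default microphone (PyAudio required) and
that ``max_read`` caps how many seconds are read; verify both against your
installed version:

.. code:: python

    import auditok

    # Read from the default audio device (assumed behavior of input=None)
    # for at most 10 seconds and report events as they are detected.
    for i, region in enumerate(auditok.split(None, max_read=10, energy_threshold=55)):
        print(f"Live event {i}: {region.duration:.3f}s")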

Installation
------------

``auditok`` requires Python 3.7+.

To install the latest stable version, use pip:

.. code:: bash

    sudo pip install auditok

To install the latest development version from GitHub:

.. code:: bash

    pip install git+https://github.com/amsehili/auditok

Alternatively, clone the repository and install it manually:

.. code:: bash

    git clone https://github.com/amsehili/auditok.git
    cd auditok
    python setup.py install

Basic example
-------------

Here's a simple example of using ``auditok`` to detect audio events:

.. code:: python

    import auditok

    # `split` returns a generator of AudioRegion objects
    audio_events = auditok.split(
        "audio.wav",
        min_dur=0.2,          # Minimum duration of a valid audio event in seconds
        max_dur=4,            # Maximum duration of an event
        max_silence=0.3,      # Maximum tolerated silence duration within an event
        energy_threshold=55   # Detection threshold
    )

    for i, r in enumerate(audio_events):
        # AudioRegions returned by `split` have defined 'start' and 'end' attributes
        print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")

        # Play the audio event
        r.play(progress_bar=True)

        # Save the event with start and end times in the filename
        filename = r.save("event_{start:.3f}-{end:.3f}.wav")
        print(f"Event saved as: {filename}")

Example output:

.. code:: bash

    Event 0: 0.700s -- 1.400s
    Event saved as: event_0.700-1.400.wav
    Event 1: 3.800s -- 4.500s
    Event saved as: event_3.800-4.500.wav
    Event 2: 8.750s -- 9.950s
    Event saved as: event_8.750-9.950.wav
    Event 3: 11.700s -- 12.400s
    Event saved as: event_11.700-12.400.wav
    Event 4: 15.050s -- 15.850s
    Event saved as: event_15.050-15.850.wav
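
Since ``split`` returns a generator, events can also be collected into a list
for further processing. A small sketch, assuming each ``AudioRegion`` exposes
its length in seconds through a ``duration`` property:

.. code:: python

    import auditok

    # Materialize the generator so the events can be reused
    events = list(auditok.split("audio.wav", energy_threshold=55))

    # Total duration of detected audio, assuming `duration` is in seconds
    total = sum(event.duration for event in events)
    print(f"Detected {len(events)} events, {total:.3f}s of audio in total")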

Split and plot
--------------

Visualize the audio signal with detected events:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")    # Returns an AudioRegion object
    regions = region.split_and_plot(...)  # Or simply use `region.splitp()`

Example output:

.. image:: doc/figures/example_1.png
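
The ellipsis above stands for detection parameters. A minimal sketch, assuming
``split_and_plot`` accepts the same keyword arguments as ``split``:

.. code:: python

    import auditok

    region = auditok.load("audio.wav")
    # Same detection parameters as in the basic example above (assumed to be
    # forwarded to the underlying split)
    regions = region.split_and_plot(
        min_dur=0.2,
        max_dur=4,
        max_silence=0.3,
        energy_threshold=55,
    )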

Split an audio stream and re-join (glue) audio events with silence
-------------------------------------------------------------------

The following code detects audio events within an audio stream, then inserts
1 second of silence between them to create audio with pauses:

.. code:: python

    # Create a 1-second silent audio region
    # Audio parameters must match the original stream
    from auditok import split, make_silence
    silence = make_silence(duration=1,
                           sampling_rate=16000,
                           sample_width=2,
                           channels=1)
    events = split("audio.wav")
    audio_with_pauses = silence.join(events)

Alternatively, use ``split_and_join_with_silence``:

.. code:: python

    from auditok import split_and_join_with_silence
    audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")
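
Assuming the joined result behaves like the regions in the basic example, it
can be written to disk with ``save``:

.. code:: python

    from auditok import split_and_join_with_silence

    audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")
    # Write the re-joined audio to disk
    audio_with_pauses.save("audio_with_pauses.wav")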

Export an ``AudioRegion`` as a ``numpy`` array
----------------------------------------------

.. code:: python

    from auditok import load, AudioRegion
    audio = load("audio.wav")  # or use `AudioRegion.load("audio.wav")`
    x = audio.numpy()
    assert x.shape[0] == audio.channels
    assert x.shape[1] == len(audio)
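
From there, standard ``numpy`` processing applies. A minimal sketch, assuming
16-bit samples (a sample width of 2 bytes), that scales the signal to floats in
[-1, 1] and computes an RMS level per channel:

.. code:: python

    import numpy as np
    from auditok import load

    audio = load("audio.wav")
    x = audio.numpy()

    # Assumes 16-bit signed integer samples; adjust the scale for other sample widths
    samples = x.astype(np.float32) / 32768.0
    rms = np.sqrt(np.mean(samples ** 2, axis=1))
    print(f"RMS per channel: {rms}")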

Limitations
-----------

The detection algorithm is based on audio signal energy. While it performs well
in low-noise environments (e.g., podcasts, language lessons, or quiet recordings),
performance may drop in noisy settings. Additionally, the algorithm does not
distinguish between speech and other sounds, so it is not suitable for Voice
Activity Detection in multi-sound environments.
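
In practice, the main knob for noisier recordings is the energy threshold; the
value below is illustrative, not a tuned default:

.. code:: python

    import auditok

    # The basic example above uses 55; noisier input typically needs a higher
    # threshold to reject background noise (illustrative value and filename)
    events = auditok.split("noisy_audio.wav", energy_threshold=65)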

License
-------

MIT.