.. image:: doc/figures/auditok-logo.png
    :align: center

.. image:: https://github.com/amsehili/auditok/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/amsehili/auditok/actions/workflows/ci.yml/
    :alt: Build Status

.. image:: https://codecov.io/github/amsehili/auditok/graph/badge.svg?token=0rwAqYBdkf
    :target: https://codecov.io/github/amsehili/auditok

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
    :target: http://auditok.readthedocs.org/en/latest/?badge=latest
    :alt: Documentation Status

``auditok`` is an **Audio Activity Detection** tool that processes online data
(from an audio device or standard input) and audio files. It can be used via the command line or through its API.

Full documentation is available on `Read the Docs <https://auditok.readthedocs.io/en/latest/>`_.

Installation
------------

``auditok`` requires Python 3.7 or higher.

To install the latest stable version, use pip:

.. code:: bash

    sudo pip install auditok

To install the latest development version from GitHub:

.. code:: bash

    pip install git+https://github.com/amsehili/auditok

Alternatively, clone the repository and install it manually:

.. code:: bash

    git clone https://github.com/amsehili/auditok.git
    cd auditok
    python setup.py install

Basic example
-------------

Here's a simple example of using ``auditok`` to detect audio events:

.. code:: python

    import auditok

    # `split` returns a generator of AudioRegion objects
    audio_events = auditok.split(
        "audio.wav",
        min_dur=0.2,  # Minimum duration of a valid audio event in seconds
        max_dur=4,  # Maximum duration of an event in seconds
        max_silence=0.3,  # Maximum tolerated silence duration within an event
        energy_threshold=55,  # Detection threshold
    )

    for i, r in enumerate(audio_events):
        # AudioRegions returned by `split` have defined 'start' and 'end' attributes
        print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")

        # Play the audio event
        r.play(progress_bar=True)

        # Save the event with start and end times in the filename
        filename = r.save("event_{start:.3f}-{end:.3f}.wav")
        print(f"Event saved as: {filename}")

Example output:

.. code:: bash

    Event 0: 0.700s -- 1.400s
    Event saved as: event_0.700-1.400.wav
    Event 1: 3.800s -- 4.500s
    Event saved as: event_3.800-4.500.wav
    Event 2: 8.750s -- 9.950s
    Event saved as: event_8.750-9.950.wav
    Event 3: 11.700s -- 12.400s
    Event saved as: event_11.700-12.400.wav
    Event 4: 15.050s -- 15.850s
    Event saved as: event_15.050-15.850.wav

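To make the roles of ``min_dur``, ``max_dur`` and ``max_silence`` more concrete,
the sketch below groups per-frame activity flags into events. It is a simplified
illustration, not ``auditok``'s actual tokenizer: the ``group_frames`` helper, the
frame-based representation, and the choice to trim trailing silence are all
assumptions made for this example.

.. code:: python

    def group_frames(active, frame_dur, min_dur, max_dur, max_silence):
        """Group per-frame activity flags into (start, end) events in seconds.

        Hypothetical helper for illustration; not part of auditok.
        """
        min_len = round(min_dur / frame_dur)      # shortest valid event, in frames
        max_len = round(max_dur / frame_dur)      # longest allowed event, in frames
        max_gap = round(max_silence / frame_dur)  # longest tolerated silent run

        events = []
        start = None        # first frame of the event being built
        last_active = None  # most recent active frame inside that event

        def finish(first, last):
            # Trim trailing silence, then keep the event only if long enough.
            end = last + 1
            if end - first >= min_len:
                events.append((round(first * frame_dur, 3), round(end * frame_dur, 3)))

        for i, is_active in enumerate(active):
            if start is None:
                if is_active:
                    start = last_active = i
                continue
            if is_active:
                last_active = i
            too_quiet = i - last_active > max_gap  # silence exceeded max_silence
            too_long = i - start + 1 >= max_len    # event reached max_dur
            if too_quiet or too_long:
                finish(start, last_active)
                start = None

        if start is not None:
            finish(start, last_active)
        return events


    # 0.1 s frames: a 0.6 s burst, a 0.3 s pause, a 0.4 s burst,
    # a 1 s pause, then a 0.2 s burst.
    flags = [0] * 2 + [1] * 6 + [0] * 3 + [1] * 4 + [0] * 10 + [1] * 2 + [0] * 5
    events = group_frames(flags, frame_dur=0.1, min_dur=0.3, max_dur=4, max_silence=0.5)
    print(events)  # [(0.2, 1.5)]

In this run the 0.3 s pause is shorter than ``max_silence``, so the first two
bursts are merged into one event, while the final 0.2 s burst is discarded
because it is shorter than ``min_dur``.
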
Split and plot
--------------

Visualize the audio signal with detected events:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")  # Returns an AudioRegion object
    regions = region.split_and_plot(...)  # Or simply use `region.splitp()`

Example output:

.. image:: doc/figures/example_1.png

Split an audio stream and re-join (glue) audio events with silence
------------------------------------------------------------------

The following code detects audio events within an audio stream, then inserts
1 second of silence between them to create an audio stream with pauses:

.. code:: python

    # Create a 1-second silent audio region
    # Audio parameters must match the original stream
    from auditok import split, make_silence
    silence = make_silence(duration=1,
                           sampling_rate=16000,
                           sample_width=2,
                           channels=1)
    events = split("audio.wav")
    audio_with_pauses = silence.join(events)

Alternatively, use ``split_and_join_with_silence``:

.. code:: python

    from auditok import split_and_join_with_silence
    audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")

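As a quick sanity check on the result, the expected duration of the joined audio
can be worked out by hand. The sketch below is plain arithmetic and does not call
``auditok``. It assumes silence is inserted only between consecutive events (as
with ``str.join``) and, purely for illustration, reuses the event times from the
example output in the basic example:

.. code:: python

    # (start, end) times in seconds, taken from the basic example's output
    events = [
        (0.700, 1.400),
        (3.800, 4.500),
        (8.750, 9.950),
        (11.700, 12.400),
        (15.050, 15.850),
    ]
    silence_duration = 1.0

    # Total time covered by the events themselves
    event_time = sum(end - start for start, end in events)

    # One silent gap between each pair of consecutive events
    total = event_time + silence_duration * (len(events) - 1)
    print(f"{total:.3f}s")  # 8.100s

Under these assumptions, 4.1 seconds of detected audio plus four 1-second gaps
gives 8.1 seconds in total.
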
Export an ``AudioRegion`` as a ``numpy`` array
----------------------------------------------

.. code:: python

    from auditok import load, AudioRegion
    audio = load("audio.wav")  # or use `AudioRegion.load("audio.wav")`
    x = audio.numpy()
    assert x.shape[0] == audio.channels
    assert x.shape[1] == len(audio)


Limitations
-----------

The detection algorithm is based on audio signal energy. While it performs well
in low-noise environments (e.g., podcasts, language lessons, or quiet recordings),
performance may drop in noisy settings. Additionally, the algorithm does not
distinguish between speech and other sounds, so it is not suitable for Voice
Activity Detection in multi-sound environments.

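The following sketch illustrates why an energy-based detector cannot tell speech
from other sounds. It is not ``auditok``'s implementation: the ``log_energy``
helper, the formula (10 times the base-10 logarithm of the mean squared sample
value), and the synthetic signals are assumptions made for illustration. Only the
threshold value of 55 is taken from the basic example.

.. code:: python

    import math

    ENERGY_THRESHOLD = 55  # same value as in the basic example


    def log_energy(samples):
        """Return 10 * log10 of the mean squared amplitude of a frame.

        Hypothetical helper; the exact formula used by auditok may differ.
        """
        mean_square = sum(s * s for s in samples) / len(samples)
        return 10 * math.log10(max(mean_square, 1e-10))  # guard against log(0)


    def is_active(samples):
        return log_energy(samples) >= ENERGY_THRESHOLD


    # Three synthetic 10 ms frames of 16-bit samples at 16 kHz
    rate, n = 16000, 160
    faint_hiss = [((i * 37) % 21) - 10 for i in range(n)]  # amplitudes within +/-10
    voice_like = [int(3000 * math.sin(2 * math.pi * 200 * i / rate)) for i in range(n)]
    loud_thump = [int(8000 * math.sin(2 * math.pi * 80 * i / rate)) for i in range(n)]

    for name, frame in [("faint_hiss", faint_hiss),
                        ("voice_like", voice_like),
                        ("loud_thump", loud_thump)]:
        print(f"{name}: active={is_active(frame)}")

Both the voice-like tone and the louder low-frequency thump exceed the threshold,
while the faint hiss does not: the decision depends only on how loud a frame is,
not on what produced the sound.
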
License
-------

MIT.