`auditok` API Tutorial
======================

.. contents:: `Contents`
   :depth: 3


**auditok** is a module that can be used as a generic tool for data
tokenization. Although its core motivation is **Acoustic Activity
Detection** (AAD) and extraction from audio streams (i.e. detect
where an acoustic activity occurs within an audio stream and
extract the corresponding portion of signal), it can easily be
adapted to other tasks.

Broadly speaking, it can be used to extract, from a sequence of
observations, all sub-sequences that meet a certain number of
criteria in terms of:

1. Minimum length of a **valid** token (i.e. sub-sequence)
2. Maximum length of a **valid** token
3. Maximum tolerated number of consecutive **non-valid** observations
   within a valid token

Examples of a non-valid observation are: a non-numeric ASCII symbol
if you are interested in sub-sequences of numeric symbols, or a silent
audio window (of 10, 20 or 100 milliseconds for instance) if what
interests you are audio regions made up of a sequence of "noisy"
windows (whatever kind of noise: speech, a baby's cry, laughter, etc.).
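
The three criteria above can be pictured with a small pure-Python sketch. This is a simplified illustration of the idea only, not auditok's actual implementation, and the function name is made up:

```python
def tokenize(data, is_valid, min_length, max_length, max_continuous_silence):
    """Split `data` into (token, start, end) triples using the three criteria."""
    tokens = []
    token, start, silence = [], None, 0

    def emit():
        nonlocal token, start, silence
        # criterion 1: drop tokens shorter than min_length
        if start is not None and len(token) >= min_length:
            tokens.append((token, start, start + len(token) - 1))
        token, start, silence = [], None, 0

    for i, frame in enumerate(data):
        if is_valid(frame):
            if start is None:
                start = i              # first valid frame opens a token
            token.append(frame)
            silence = 0
            if len(token) == max_length:
                emit()                 # criterion 2: maximum length reached
        elif start is not None:
            silence += 1
            if silence > max_continuous_silence:
                emit()                 # criterion 3: too much consecutive silence
            else:
                token.append(frame)    # tolerated silence is kept in the token
    emit()                             # flush whatever is left at end of stream
    return tokens
```

With `str.isupper` as the validity test, this sketch reproduces the behavior of the string examples shown later in this tutorial.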

The most important component of `auditok` is the :class:`auditok.core.StreamTokenizer`
class. An instance of this class encapsulates a :class:`auditok.util.DataValidator` and can be
configured to detect the desired regions from a stream.
The :func:`auditok.core.StreamTokenizer.tokenize` method accepts a :class:`auditok.util.DataSource`
object that has a `read` method. Read data can be of any type accepted
by the `validator`.


As the main aim of this module is **Audio Activity Detection**,
it provides the :class:`auditok.util.ADSFactory` factory class that makes
it very easy to create an :class:`auditok.util.ADSFactory.AudioDataSource`
(a class that implements :class:`auditok.util.DataSource`) object, be that from:

- A file on the disk
- A buffer of data
- The built-in microphone (requires PyAudio)


The :class:`auditok.util.ADSFactory.AudioDataSource` class inherits from
:class:`auditok.util.DataSource` and supplies a higher abstraction level
than :class:`auditok.io.AudioSource` thanks to a number of handy features:

- Define a fixed-length `block_size` (alias `bs`, i.e. analysis window)
- Alternatively, use `block_dur` (duration in seconds, alias `bd`)
- Allow overlap between two consecutive analysis windows
  (if one of the `hop_size`/`hs` or `hop_dur`/`hd` keywords is used, and its value is > 0 and < `block_size`).
  This can be very important if your validator uses the **spectral** information of audio data
  instead of raw audio samples.
- Limit the amount (i.e. duration) of read data (if the `max_time` or `mt` keyword is used; very useful when reading data from the microphone)
- Record all read data and rewind if necessary (if the `record` or `rec` keyword is used; also useful if you read data from the microphone and
  you want to process it many times offline and/or save it)

See :class:`auditok.util.ADSFactory` documentation for more information.
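
The overlap mechanism described above can be pictured with a short pure-Python sketch. It is an illustration of the idea only, not auditok's implementation, and the function name is made up:

```python
def sliding_windows(samples, block_size, hop_size):
    """Cut `samples` into fixed-size analysis windows, `hop_size` samples apart.

    With hop_size < block_size, two consecutive windows overlap by
    block_size - hop_size samples; with hop_size == block_size there
    is no overlap (the default behavior).
    """
    return [samples[i:i + block_size]
            for i in range(0, len(samples) - block_size + 1, hop_size)]

# 6 samples, windows of 4 with a hop of 2: two windows overlapping by 2 samples
# sliding_windows([0, 1, 2, 3, 4, 5], block_size=4, hop_size=2)
# -> [[0, 1, 2, 3], [2, 3, 4, 5]]
```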

Last but not least, the current version has only one audio window validator, based on
signal energy (:class:`auditok.util.AudioEnergyValidator`).

**********************************
Illustrative examples with strings
**********************************

Let us look at some examples using the :class:`auditok.util.StringDataSource` class,
created for test and illustration purposes. Imagine that each character of
:class:`auditok.util.StringDataSource` data represents an audio slice of 100 ms, for
example. In the following examples we will use upper case letters to represent
noisy audio slices (i.e. analysis windows or frames) and lower case letters for
silent frames.


Extract sub-sequences of consecutive upper case letters
#######################################################


We want to extract sub-sequences of characters that have:

- A minimum length of 1 (`min_length` = 1)
- A maximum length of 9999 (`max_length` = 9999)
- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)

We also create an `UpperCaseChecker` whose `is_valid` method returns `True` if the
checked character is upper case and `False` otherwise.

.. code:: python

   from auditok import StreamTokenizer, StringDataSource, DataValidator

   class UpperCaseChecker(DataValidator):
      def is_valid(self, frame):
         return frame.isupper()

   dsource = StringDataSource("aaaABCDEFbbGHIJKccc")
   tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                min_length=1, max_length=9999, max_continuous_silence=0)

   tokenizer.tokenize(dsource)

The output is a list of two tuples, each containing the extracted sub-sequence and its
start and end position in the original sequence respectively:


.. code:: python

   [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]


Tolerate up to two non-valid (lower case) letters within an extracted sequence
##############################################################################

To do so, we set `max_continuous_silence` to 2:

.. code:: python

   from auditok import StreamTokenizer, StringDataSource, DataValidator

   class UpperCaseChecker(DataValidator):
      def is_valid(self, frame):
         return frame.isupper()

   dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
   tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                min_length=1, max_length=9999, max_continuous_silence=2)

   tokenizer.tokenize(dsource)


output:

.. code:: python

   [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]

Notice the trailing lower case letters "dd" and "ee" at the end of the two
tokens. The default behavior of :class:`auditok.core.StreamTokenizer` is to keep the *trailing
silence* if it does not exceed `max_continuous_silence`. This can be changed
using the `StreamTokenizer.DROP_TRAILING_SILENCE` mode (see next example).

Remove trailing silence
#######################

Trailing silence can be useful for many sound recognition applications, including
speech recognition. Moreover, from the point of view of the human auditory system,
trailing low-energy signal helps smooth out abrupt signal cuts.

If you want to remove it anyway, you can do so by setting `mode` to `StreamTokenizer.DROP_TRAILING_SILENCE`:

.. code:: python

   from auditok import StreamTokenizer, StringDataSource, DataValidator

   class UpperCaseChecker(DataValidator):
      def is_valid(self, frame):
         return frame.isupper()

   dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
   tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                min_length=1, max_length=9999, max_continuous_silence=2,
                mode=StreamTokenizer.DROP_TRAILING_SILENCE)

   tokenizer.tokenize(dsource)

output:

.. code:: python

   [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]


Limit the length of detected tokens
###################################


Imagine that you just want to detect and recognize a small part of a long
acoustic event (e.g. engine noise, water flow, etc.) and avoid having that
event hog the tokenizer, preventing it from feeding the event to the next
processing step (i.e. a sound recognizer). You can do this by:

- limiting the length of a detected token

and

- using a callback function as an argument to :func:`auditok.core.StreamTokenizer.tokenize`
  so that the tokenizer delivers a token as soon as it is detected.

The following code limits the length of a token to 5:

.. code:: python

   from auditok import StreamTokenizer, StringDataSource, DataValidator

   class UpperCaseChecker(DataValidator):
      def is_valid(self, frame):
         return frame.isupper()

   dsource = StringDataSource("aaaABCDEFGHIJKbbb")
   tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                min_length=1, max_length=5, max_continuous_silence=0)

   def print_token(data, start, end):
      print("token = '{0}', starts at {1}, ends at {2}".format(''.join(data), start, end))

   tokenizer.tokenize(dsource, callback=print_token)


output:

.. code:: python

   "token = 'ABCDE', starts at 3, ends at 7"
   "token = 'FGHIJ', starts at 8, ends at 12"
   "token = 'K', starts at 13, ends at 13"


************************
`auditok` and Audio Data
************************

In the rest of this document we will use :class:`auditok.util.ADSFactory`, :class:`auditok.util.AudioEnergyValidator`
and :class:`auditok.core.StreamTokenizer` for Audio Activity Detection demos using audio data. Before we get any
further, it is worth explaining a few points.

The :func:`auditok.util.ADSFactory.ads` method is used to create an :class:`auditok.util.ADSFactory.AudioDataSource`
object, either from a wave file, the built-in microphone or a user-supplied data buffer. Refer to the API reference
for more information and examples on :func:`ADSFactory.ads` and :class:`AudioDataSource`.

The created :class:`AudioDataSource` object is then passed to :func:`StreamTokenizer.tokenize` for tokenization.

:func:`auditok.util.ADSFactory.ads` accepts a number of keyword arguments, none of which is mandatory.
The returned :class:`AudioDataSource` object's features and behavior can however differ greatly
depending on the passed arguments. Further details can be found in the respective method documentation.

Note however the following two calls, which create an :class:`AudioDataSource`
that reads data from an audio file and from the built-in microphone respectively.

.. code:: python

   from auditok import ADSFactory

   # Get an AudioDataSource from a file
   # use the 'filename' keyword argument (alias 'fn')
   file_ads = ADSFactory.ads(filename="path/to/file/")

   # Get an AudioDataSource from the built-in microphone
   # The returned object has the default values for sampling
   # rate, sample width and number of channels. See the method's
   # documentation for customized values
   mic_ads = ADSFactory.ads()

For :class:`StreamTokenizer`, the parameters `min_length`, `max_length` and `max_continuous_silence`
are expressed as a number of frames. Each call to :func:`AudioDataSource.read` returns
one frame of data or None.

If you want a `max_length` of 2 seconds for your detected sound events and your *analysis window*
is *10 ms* long, you have to specify a `max_length` of 200 (`int(2. / (10. / 1000)) == 200`).
For a `max_continuous_silence` of *300 ms* for instance, the value to pass to StreamTokenizer is 30
(`int(0.3 / (10. / 1000)) == 30`).
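
This duration-to-frames arithmetic is easy to get wrong, so it can help to make it explicit in a small helper. The function below is a suggestion, not part of auditok's API:

```python
def duration_to_frames(duration, analysis_window):
    """Convert a duration in seconds into a number of analysis windows (frames).

    `analysis_window` is the window duration in seconds (e.g. 0.01 for 10 ms).
    round() is used rather than bare int() to avoid floating-point
    truncation surprises (e.g. a quotient like 29.999... becoming 29).
    """
    return int(round(duration / analysis_window))

# max_length for 2-second events with a 10 ms window:        200 frames
# max_continuous_silence for 300 ms of tolerated silence:     30 frames
```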

Each time :class:`StreamTokenizer` calls the :func:`read` method (no arguments) of an
:class:`AudioDataSource` object, it gets the same amount of data, except at the end of the
stream (where it gets whatever is left, then None).

This fixed-length amount of data is referred to here as the **analysis window** and is a parameter of
the :func:`ADSFactory.ads` method. By default :func:`ADSFactory.ads` uses an analysis window of 10 ms.

The number of samples that 10 ms of audio data contains varies depending on the sampling
rate of your audio source/data (file, microphone, etc.).
For a sampling rate of 16 kHz (16000 samples per second), we have 160 samples for 10 ms.
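
The relation between sampling rate, window duration and window size in samples is a simple product. The helper name below is made up for illustration:

```python
def window_size_in_samples(sampling_rate, window_dur):
    """Number of samples in one analysis window of `window_dur` seconds."""
    # round() guards against floating-point artifacts such as 160.00000000000003
    return int(round(sampling_rate * window_dur))

# window_size_in_samples(16000, 0.01)  -> 160   (10 ms at 16 kHz)
# window_size_in_samples(44100, 0.1)   -> 4410  (100 ms at 44.1 kHz)
```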

You can use the `block_size` keyword (alias `bs`) to define your analysis window:

.. code:: python

   from auditok import ADSFactory

   # Assume you have an audio file with a sampling rate of 16000

   # file_ads.read() will return blocks of 160 samples
   file_ads = ADSFactory.ads(filename="path/to/file/", block_size=160)

   # file_ads.read() will return blocks of 320 samples
   file_ads = ADSFactory.ads(filename="path/to/file/", bs=320)


Fortunately, you can also specify the size of your analysis window in seconds, thanks to the `block_dur`
keyword (alias `bd`):

.. code:: python

   from auditok import ADSFactory
   # use an analysis window of 20 ms
   file_ads = ADSFactory.ads(filename="path/to/file/", bd=0.02)

For :class:`StreamTokenizer`, each :func:`read` call that does not return `None` is treated as one processing
frame. :class:`StreamTokenizer` has no way to figure out the temporal length of that frame (why should it?). So to
correctly initialize your :class:`StreamTokenizer` based on your analysis window duration, use something like:


.. code:: python

   analysis_win_seconds = 0.01 # 10 ms
   my_ads = ADSFactory.ads(block_dur=analysis_win_seconds)
   analysis_window_ms = analysis_win_seconds * 1000

   # If you want your maximum continuous silence to be 300 ms, use:
   max_continuous_silence = int(300. / analysis_window_ms)

   # which is the same as:
   max_continuous_silence = int(0.3 / (analysis_window_ms / 1000))

   # or simply:
   max_continuous_silence = 30


******************************
Examples using real audio data
******************************


Extract isolated phrases from an utterance
##########################################

We will build an :class:`auditok.util.ADSFactory.AudioDataSource` using a wave file from
the database. The file contains isolated pronunciations of digits from 1 to 6
in Arabic, as well as breath-in/out between 2 and 3. The code will play the
original file, then the detected sounds separately. Note that we use an
`energy_threshold` of 65; this parameter should be chosen carefully. It depends
on microphone quality, background noise and the amplitude of the events you want to
detect.

.. code:: python

   from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

   # We set the `record` argument to True so that we can rewind the source
   asource = ADSFactory.ads(filename=dataset.one_to_six_arabic_16000_mono_bc_noise, record=True)

   validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=65)

   # Default analysis window is 10 ms (float(asource.get_block_size()) / asource.get_sampling_rate())
   # min_length=20 : minimum length of a valid audio activity is 20 * 10 == 200 ms
   # max_length=400 : maximum length of a valid audio activity is 400 * 10 == 4000 ms == 4 seconds
   # max_continuous_silence=30 : maximum tolerated silence within a valid audio activity is 30 * 10 == 300 ms
   tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=400, max_continuous_silence=30)

   asource.open()
   tokens = tokenizer.tokenize(asource)

   # Play detected regions back

   player = player_for(asource)

   # Rewind and read the whole signal
   asource.rewind()
   original_signal = []

   while True:
      w = asource.read()
      if w is None:
         break
      original_signal.append(w)

   original_signal = ''.join(original_signal)

   print("Playing the original file...")
   player.play(original_signal)

   print("playing detected regions...")
   for t in tokens:
      print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
      data = ''.join(t[0])
      player.play(data)

   assert len(tokens) == 8


The tokenizer extracts 8 audio regions from the signal, including all isolated digits
(from 1 to 6) as well as the 2-phase respiration of the subject. You might have noticed
that, in the original file, the last three digits are closer to each other than the
previous ones. If you want them to be extracted as one single phrase, you can do so
by tolerating a larger continuous silence within a detection:

.. code:: python

   tokenizer.max_continuous_silence = 50
   asource.rewind()
   tokens = tokenizer.tokenize(asource)

   for t in tokens:
      print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
      data = ''.join(t[0])
      player.play(data)

   assert len(tokens) == 6


Trim leading and trailing silence
#################################

The tokenizer in the following example is set up to remove the silence
that precedes the first acoustic activity or follows the last activity
in a record. It preserves whatever it finds between the two activities.
In other words, it removes the leading and trailing silence.

The sampling rate is 44100 samples per second; we'll use an analysis window of 100 ms
(i.e. block_size == 4410).

The energy threshold is 50.

The tokenizer will start accumulating windows from the moment it encounters
the first analysis window with an energy >= 50. ALL the following windows will be
kept, regardless of their energy. At the end of the analysis, it will drop trailing
windows with an energy below 50.

This is an interesting example because the audio file we're analyzing contains a very
brief noise that occurs within the leading silence. We certainly do not want our tokenizer
to stop at that point and consider whatever comes after as a useful signal.
To force the tokenizer to ignore that brief event we use two other parameters, `init_min`
and `init_max_silence`. With `init_min` = 3 and `init_max_silence` = 1 we tell the tokenizer
that a valid event must start with at least 3 noisy windows, between which there
is at most 1 silent window.

Even with this configuration, the tokenizer may still detect that noise as a valid event
(if it actually contains 3 consecutive noisy frames). To circumvent this we use a
sufficiently large analysis window (here 100 ms) to ensure that the brief noise is surrounded
by a much longer silence, so that the energy of the overall analysis window stays below 50.

When using a shorter analysis window (of 10 ms for instance, block_size == 441), the brief
noise contributes more to the energy calculation, which yields an energy of over 50 for the window.
Again, we can deal with this situation by using a higher energy threshold (55 for example).

.. code:: python

   from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

   # record = True so that we'll be able to rewind the source.
   asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence,
            record=True, block_size=4410)
   asource.open()

   original_signal = []
   # Read the whole signal
   while True:
      w = asource.read()
      if w is None:
         break
      original_signal.append(w)

   original_signal = ''.join(original_signal)

   # rewind source
   asource.rewind()

   # Create a validator with an energy threshold of 50
   validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)

   # Create a tokenizer with an unlimited token length and unlimited continuous silence within a token
   # Note the DROP_TRAILING_SILENCE mode that ensures removing trailing silence
   trimmer = StreamTokenizer(validator, min_length=20, max_length=99999999,
             init_min=3, init_max_silence=1, max_continuous_silence=9999999,
             mode=StreamTokenizer.DROP_TRAILING_SILENCE)

   tokens = trimmer.tokenize(asource)

   # Make sure we only have one token
   assert len(tokens) == 1, "Should have detected one single token"

   trimmed_signal = ''.join(tokens[0][0])

   player = player_for(asource)

   print("Playing original signal (with leading and trailing silence)...")
   player.play(original_signal)
   print("Playing trimmed signal...")
   player.play(trimmed_signal)


Online audio signal processing
##############################

In the next example, audio data is directly acquired from the built-in microphone.
The :func:`auditok.core.StreamTokenizer.tokenize` method is passed a callback function
so that audio activities are delivered as soon as they are detected. Each detected
activity is played back using the built-in audio output device.

As mentioned before, signal energy is strongly related to many factors such as
microphone sensitivity, background noise (including noise inherent to the hardware),
distance and your operating system's sound settings. Try a lower `energy_threshold`
if your noise does not seem to be detected and a higher threshold if you notice
over-detection (the echo method prints a detection where you have made no noise).

.. code:: python

   from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

   # record = True so that we'll be able to rewind the source.
   # max_time = 10: read 10 seconds from the microphone
   asource = ADSFactory.ads(record=True, max_time=10)

   validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
   tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=250, max_continuous_silence=30)

   player = player_for(asource)

   def echo(data, start, end):
      print("Acoustic activity at: {0}--{1}".format(start, end))
      player.play(''.join(data))

   asource.open()

   tokenizer.tokenize(asource, callback=echo)

If you want to re-run the tokenizer after changing one or more parameters, use the following code:

.. code:: python

   asource.rewind()
   # change the energy threshold, for example
   tokenizer.validator.set_energy_threshold(55)
   tokenizer.tokenize(asource, callback=echo)

In case you want to play the whole recorded signal back, use:

.. code:: python

   player.play(asource.get_audio_source().get_data_buffer())
|
amine@32
|
537
|
amine@32
|
538
|
amine@32
|
539 ************
|
amine@32
|
540 Contributing
|
amine@32
|
541 ************
|
amine@32
|
542
|
amine@32
|
543 **auditok** is on `GitHub <https://github.com/amsehili/auditok>`_. You're welcome to fork it and contribute.
|
amine@32
|
544
|
amine@32
|
545
|
amine@32
|
546 Amine SEHILI <amine.sehili@gmail.com>
|
amine@32
|
547 September 2015
|
amine@32
|
548
|
amine@32
|
549 *******
|
amine@32
|
550 License
|
amine@32
|
551 *******
|
amine@32
|
552
|
amine@32
|
553 This package is published under GNU GPL Version 3.
|