.. auditok documentation.

auditok, an AUDIo TOKenization module
=====================================

.. contents:: `Contents`
   :depth: 3


**auditok** is a module that can be used as a generic tool for data
tokenization. Although its core motivation is **Acoustic Activity
Detection** (AAD) and extraction from audio streams (i.e. detecting
where a noise/an acoustic activity occurs within an audio stream and
extracting the corresponding portion of signal), it can easily be
adapted to other tasks.

Globally speaking, it can be used to extract, from a sequence of
observations, all sub-sequences that meet a certain number of
criteria in terms of:

1. Minimum length of a **valid** token (i.e. sub-sequence)
2. Maximum length of a valid token
3. Maximum tolerated number of consecutive **non-valid** observations
   within a valid token

Examples of a non-valid observation are: a non-numeric ASCII symbol
if you are interested in sub-sequences of numeric symbols, or a silent
audio window (of 10, 20 or 100 milliseconds for instance) if what
interests you are audio regions made up of a sequence of "noisy"
windows (whatever the kind of noise: speech, baby cry, laughter, etc.).
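
As a quick illustration of this generic use, here is a minimal sketch that
extracts runs of numeric symbols from a character stream. It relies on the
`StringDataSource`, `DataValidator` and `StreamTokenizer` classes presented
below; the `DigitChecker` validator is hypothetical, written for this example:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class DigitChecker(DataValidator):
        # an observation (here, a character) is valid if it is a digit
        def is_valid(self, frame):
            return frame.isdigit()

    dsource = StringDataSource("ab12cd345ef")
    tokenizer = StreamTokenizer(validator=DigitChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=0)

    # prints [(['1', '2'], 2, 3), (['3', '4', '5'], 6, 8)]
    print(tokenizer.tokenize(dsource))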

The most important component of `auditok` is the `auditok.core.StreamTokenizer`
class. An instance of this class encapsulates a `DataValidator` and can be
configured to detect the desired regions from a stream.
The `StreamTokenizer.tokenize` method accepts a `DataSource`
object that has a `read` method. Read data can be of any type accepted
by the `validator`.
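
Any object exposing such a `read` method will do. As a minimal sketch, a
`DataSource` can simply walk through a Python list and return `None` once the
data is exhausted, `None` being what tells the tokenizer to stop reading (the
`ListDataSource` class is hypothetical, written for this example):

.. code:: python

    from auditok import DataSource

    class ListDataSource(DataSource):
        # delivers one element of a list per call to `read`
        def __init__(self, data):
            self._data = list(data)
            self._pos = 0

        def read(self):
            if self._pos >= len(self._data):
                return None  # end of stream
            frame = self._data[self._pos]
            self._pos += 1
            return frame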


As the main aim of this module is **Audio Activity Detection**,
it provides the `auditok.util.ADSFactory` factory class that makes
it very easy to create an `AudioDataSource` (a class that implements
`DataSource`) object, be it from:

- A file on the disk
- A buffer of data
- The built-in microphone (requires PyAudio)
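
For example, here is a sketch of the buffer case (it assumes the factory's
`data_buffer`, `sampling_rate`, `sample_width` and `channels` keyword
arguments; the audio parameters are illustrative):

.. code:: python

    from auditok import ADSFactory

    # two seconds of silence: 2 * 16000 samples * 2 bytes, mono
    raw_data = b"\x00" * 64000
    buffer_ads = ADSFactory.ads(data_buffer=raw_data, sampling_rate=16000,
                                sample_width=2, channels=1)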


The `AudioDataSource` class inherits from `DataSource` and supplies
a higher abstraction level than `AudioSource` thanks to a bunch of
handy features:

- Define a fixed-length `block_size` (i.e. analysis window)
- Allow overlap between two consecutive analysis windows (`hop_size` < `block_size`). This can be very important if your validator uses the **spectral** information of audio data instead of raw audio samples
- Limit the amount (i.e. duration) of read data (very useful when reading data from the microphone)
- Record and rewind data (also useful if you read data from the microphone and you want to process it many times offline and/or save it); see the sketch after this list
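
A minimal sketch combining these features (it assumes the keyword names
`block_size`, `hop_size`, `max_time` and `record` introduced above; the
values are illustrative):

.. code:: python

    from auditok import ADSFactory

    # 20 ms analysis windows at the default 16 kHz rate (320 samples),
    # overlapping by 10 ms (hop_size=160), reading at most 5 seconds
    # from the microphone and recording them so the source can be rewound
    ads = ADSFactory.ads(block_size=320, hop_size=160, max_time=5, record=True)

    ads.open()
    while True:
        frame = ads.read()
        if frame is None:
            break
        # ... process frame ...

    ads.rewind()  # process the same recorded data again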


Last but not least, the current version has only one audio window validator,
based on signal energy.

Requirements
============

`auditok` requires `PyAudio <http://people.csail.mit.edu/hubert/pyaudio/>`_
for audio acquisition and playback.


Illustrative examples with strings
==================================

Let us look at some examples using the `auditok.util.StringDataSource` class
created for test and illustration purposes. Imagine that each character of
`auditok.util.StringDataSource` data represents an audio slice of 100 ms, for
example. In the following examples we will use upper case letters to represent
noisy audio slices (i.e. analysis windows or frames) and lower case letters for
silent frames.


Extract sub-sequences of consecutive upper case letters
-------------------------------------------------------

We want to extract sub-sequences of characters that have:

- A minimum length of 1 (`min_length` = 1)
- A maximum length of 9999 (`max_length` = 9999)
- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)

We also create the `UpperCaseChecker` whose `is_valid` method returns `True` if the
checked character is in upper case and `False` otherwise.

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFbbGHIJKccc")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=0)

    tokenizer.tokenize(dsource)

The output is a list of two tuples, each containing the extracted sub-sequence
and its start and end positions in the original sequence, respectively:

.. code:: python

    [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]

Tolerate up to two non-valid (lower case) letters within an extracted sequence
------------------------------------------------------------------------------

To do so, we set `max_continuous_silence` = 2:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]

Notice the trailing lower case letters "dd" and "ee" at the end of the two
tokens. The default behavior of `StreamTokenizer` is to keep the *trailing
silence* if it doesn't exceed `max_continuous_silence`. This can be changed
using the `DROP_TRAILING_SILENCE` mode (see the next example).

Remove trailing silence
-----------------------

Trailing silence can be useful for many sound recognition applications, including
speech recognition. Moreover, from the human auditory system point of view, trailing
low energy signal helps avoid abrupt signal cuts.

If you want to remove it anyway, you can do so by setting `mode` to
`StreamTokenizer.DROP_TRAILING_SILENCE`:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2,
                                mode=StreamTokenizer.DROP_TRAILING_SILENCE)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]


Limit the length of detected tokens
-----------------------------------

Imagine that you just want to detect and recognize a small part of a long
acoustic event (e.g. engine noise, water flow, etc.) and avoid having that
event hog the tokenizer, preventing it from feeding the event to the next
processing step (i.e. a sound recognizer). You can do this by:

- limiting the length of a detected token

and

- using a callback function as an argument to `StreamTokenizer.tokenize`
  so that the tokenizer delivers a token as soon as it is detected.

The following code limits the length of a token to 5:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFGHIJKbbb")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=5,
                                max_continuous_silence=0)

    def print_token(data, start, end):
        print("token = '{0}', starts at {1}, ends at {2}".format(''.join(data), start, end))

    tokenizer.tokenize(dsource, callback=print_token)

output:

.. code:: python

    "token = 'ABCDE', starts at 3, ends at 7"
    "token = 'FGHIJ', starts at 8, ends at 12"
    "token = 'K', starts at 13, ends at 13"


Using real audio data
=====================

In this section we will use `ADSFactory`, `AudioEnergyValidator` and `StreamTokenizer`
for an AAD demonstration using audio data. Before we go any further, it is worth
explaining a certain number of points.

The `ADSFactory.ads` method is called to create an `AudioDataSource` object that can be
passed to `StreamTokenizer.tokenize`. `ADSFactory.ads` accepts a number of keyword
arguments, none of which is mandatory. The returned `AudioDataSource` object can
however greatly differ depending on the passed arguments. Further details can be found
in the respective method documentation. Note however the following two calls, which
create an `AudioDataSource` that reads data from an audio file and from the built-in
microphone respectively.

.. code:: python

    from auditok import ADSFactory

    # Get an AudioDataSource from a file
    file_ads = ADSFactory.ads(filename="path/to/file/")

    # Get an AudioDataSource from the built-in microphone
    # The returned object has the default values for sampling
    # rate, sample width and number of channels. See the method's
    # documentation for customized values
    mic_ads = ADSFactory.ads()

For `StreamTokenizer`, the parameters `min_length`, `max_length` and `max_continuous_silence`
are expressed in terms of number of frames. If you want a `max_length` of *2 seconds* for
your detected sound events and your *analysis window* is *10 ms* long, you have to specify
a `max_length` of 200 (`int(2. / (10. / 1000)) == 200`). For a `max_continuous_silence` of
*300 ms*, for instance, the value to pass to `StreamTokenizer` is 30 (`int(0.3 / (10. / 1000)) == 30`).


Where do you get the size of the **analysis window** from?

Well, this is a parameter you pass to `ADSFactory.ads`. By default, `ADSFactory.ads` uses
an analysis window of 10 ms. The number of samples that 10 ms of signal contains will
vary depending on the sampling rate of your audio source (file, microphone, etc.).
For a sampling rate of 16 kHz (16000 samples per second), we have 160 samples for 10 ms.
Therefore you can use block sizes of 160, 320 and 1600 for analysis windows of 10, 20 and 100
ms respectively.

.. code:: python

    from auditok import ADSFactory

    file_ads = ADSFactory.ads(filename="path/to/file/", block_size=160)

    file_ads = ADSFactory.ads(filename="path/to/file/", block_size=320)

    # If no sampling rate is specified, ADSFactory uses 16 kHz as the default
    # rate for the microphone. If you want to use a window of 100 ms, use
    # a block size of 1600
    mic_ads = ADSFactory.ads(block_size=1600)

So if you are not sure what your analysis window's duration in seconds is, use the following:

.. code:: python

    my_ads = ADSFactory.ads(...)
    analysis_win_seconds = float(my_ads.get_block_size()) / my_ads.get_sampling_rate()
    analysis_window_ms = analysis_win_seconds * 1000

    # For a `max_continuous_silence` of 300 ms use:
    max_continuous_silence = int(300. / analysis_window_ms)

    # Which is the same as:
    max_continuous_silence = int(0.3 / (analysis_window_ms / 1000))


Extract isolated phrases from an utterance
------------------------------------------

We will build an `AudioDataSource` using a wave file from the database.
The file contains isolated pronunciations of the digits from 1 to 6
in Arabic, as well as breath-in/out between 2 and 3. The code will play the
original file then the detected sounds separately. Note that we use an
`energy_threshold` of 65; this parameter should be carefully chosen. It depends
on microphone quality, background noise and the amplitude of events you want to
detect.

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # We set the `record` argument to True so that we can rewind the source
    asource = ADSFactory.ads(filename=dataset.one_to_six_arabic_16000_mono_bc_noise, record=True)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=65)

    # Default analysis window is 10 ms (float(asource.get_block_size()) / asource.get_sampling_rate())
    # min_length=20 : minimum length of a valid audio activity is 20 * 10 == 200 ms
    # max_length=400 : maximum length of a valid audio activity is 400 * 10 == 4000 ms == 4 seconds
    # max_continuous_silence=30 : maximum length of a tolerated silence within a valid audio activity is 30 * 10 == 300 ms
    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=400, max_continuous_silence=30)

    asource.open()
    tokens = tokenizer.tokenize(asource)

    # Play detected regions back
    player = player_for(asource)

    # Rewind and read the whole signal
    asource.rewind()
    original_signal = []

    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    print("Playing the original file...")
    player.play(original_signal)

    print("Playing detected regions...")
    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 8

The tokenizer extracts 8 audio regions from the signal, including all isolated digits
(from 1 to 6) as well as the 2-phase respiration of the subject. You might have noticed
that, in the original file, the last three digits are closer to each other than the
previous ones. If you want them to be extracted as one single phrase, you can do so
by tolerating a larger continuous silence within a detection:

.. code:: python

    tokenizer.max_continuous_silence = 50
    asource.rewind()
    tokens = tokenizer.tokenize(asource)

    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 6


Trim leading and trailing silence
---------------------------------

The tokenizer in the following example is set up to remove the silence
that precedes the first acoustic activity or follows the last activity
in a record. It preserves whatever it finds between the two activities.
In other words, it removes the leading and trailing silence.

The sampling rate is 44100 samples per second, and we'll use an analysis window
of 100 ms (i.e. block_size == 4410).

The energy threshold is 50.

The tokenizer will start accumulating windows from the moment it encounters
the first analysis window with an energy >= 50. ALL the following windows will be
kept, regardless of their energy. At the end of the analysis, it will drop trailing
windows with an energy below 50.

This is an interesting example because the audio file we're analyzing contains a very
brief noise that occurs within the leading silence. We certainly do not want our tokenizer
to trigger on this brief event and treat whatever comes after it as useful signal.
To force the tokenizer to ignore that brief event we use two other parameters, `init_min`
and `init_max_silence`. By setting `init_min` = 3 and `init_max_silence` = 1 we tell the
tokenizer that a valid event must start with at least 3 noisy windows, between which there
is at most 1 silent window.

Even with this configuration, the tokenizer could still detect that noise as a valid event
(if it actually contains 3 consecutive noisy frames). To circumvent this we use a large
enough analysis window (here 100 ms) to ensure that the brief noise is surrounded by a much
longer silence, and hence the energy of the overall analysis window stays below 50.

When using a shorter analysis window (of 10 ms for instance, block_size == 441), the brief
noise contributes more to the energy calculation, which yields an energy over 50 for the window.
Again, we can deal with this situation by using a higher energy threshold (55 for example).
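
That alternative configuration might look like the following sketch (only the
lines that differ from the complete example; the outcome still depends on the
actual recording):

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, dataset

    # 10 ms analysis window at 44100 Hz: block_size == 441
    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence,
                             record=True, block_size=441)

    # compensate for the shorter window with a higher energy threshold
    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(),
                                     energy_threshold=55)

The complete example, using the 100 ms window and the energy threshold of 50: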

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # record = True so that we'll be able to rewind the source.
    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence,
                             record=True, block_size=4410)
    asource.open()

    original_signal = []
    # Read the whole signal
    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    # Rewind the source
    asource.rewind()

    # Create a validator with an energy threshold of 50
    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)

    # Create a tokenizer with an unlimited token length and unlimited continuous silence within a token
    # Note the DROP_TRAILING_SILENCE mode that ensures the trailing silence is removed
    trimmer = StreamTokenizer(validator, min_length=20, max_length=99999999,
                              init_min=3, init_max_silence=1,
                              max_continuous_silence=9999999,
                              mode=StreamTokenizer.DROP_TRAILING_SILENCE)

    tokens = trimmer.tokenize(asource)

    # Make sure we only have one token
    assert len(tokens) == 1, "Should have detected one single token"

    trimmed_signal = ''.join(tokens[0][0])

    player = player_for(asource)

    print("Playing original signal (with leading and trailing silence)...")
    player.play(original_signal)
    print("Playing trimmed signal...")
    player.play(trimmed_signal)


Online audio signal processing
------------------------------

In the next example, audio data is directly acquired from the built-in microphone.
The `tokenize` method is passed a callback function so that audio activities
are delivered as soon as they are detected. Each detected activity is played
back using the built-in audio output device.

As mentioned before, signal energy is strongly related to many factors such as
microphone sensitivity, background noise (including noise inherent to the hardware),
distance and your operating system's sound settings. Try a lower `energy_threshold`
if your noise does not seem to be detected and a higher threshold if you notice
over-detection (the `echo` method prints a detection where you have made no noise).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

    # record = True so that we'll be able to rewind the source.
    # max_time = 10: read 10 seconds from the microphone
    asource = ADSFactory.ads(record=True, max_time=10)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=250, max_continuous_silence=30)

    player = player_for(asource)

    def echo(data, start, end):
        print("Acoustic activity at: {0}--{1}".format(start, end))
        player.play(''.join(data))

    asource.open()

    tokenizer.tokenize(asource, callback=echo)

If you want to re-run the tokenizer after changing one or more parameters, use the following code:

.. code:: python

    asource.rewind()
    # Change the energy threshold, for example
    tokenizer.validator.set_energy_threshold(55)
    tokenizer.tokenize(asource, callback=echo)

In case you want to play the whole recorded signal back, use:

.. code:: python

    player.play(asource.get_audio_source().get_data_buffer())
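
And if you would rather save the recorded signal to disk, here is a minimal
sketch using the standard `wave` module (it assumes the usual `get_channels`
accessor alongside the getters used above; the file name is illustrative):

.. code:: python

    import wave

    # Write the raw recorded data into a standard wave file
    fp = wave.open("recorded.wav", "wb")
    fp.setnchannels(asource.get_channels())
    fp.setsampwidth(asource.get_sample_width())
    fp.setframerate(asource.get_sampling_rate())
    fp.writeframes(asource.get_audio_source().get_data_buffer())
    fp.close()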


Contributing
============

**auditok** is on `GitHub <https://github.com/amsehili/auditok>`_. You're welcome to fork it and contribute.


Amine SEHILI <amine.sehili@gmail.com>
September 2015

License
=======

This package is published under GNU GPL Version 3.
|