.. auditok documentation.

auditok, an AUDIo TOKenization module
======================================


**auditok** is a module that can be used as a generic tool for data
tokenization. Although its core motivation is **Acoustic Activity
Detection** (AAD) and extraction from audio streams (i.e. detecting
where a noise/an acoustic activity occurs within an audio stream and
extracting the corresponding portion of the signal), it can easily be
adapted to other tasks.

Globally speaking, it can be used to extract, from a sequence of
observations, all sub-sequences that meet a certain number of
criteria in terms of:

1. Minimum length of a **valid** token (i.e. sub-sequence)
2. Maximum length of a valid token
3. Maximum tolerated consecutive **non-valid** observations within
   a valid token

Examples of a non-valid observation are: a non-numeric ASCII symbol
if you are interested in sub-sequences of numeric symbols, or a silent
audio window (of 10, 20 or 100 milliseconds for instance) if what
interests you are audio regions made up of a sequence of "noisy"
windows (whatever kind of noise: speech, baby cry, laughter, etc.).

The most important component of `auditok` is the `StreamTokenizer` class.
An instance of this class encapsulates a `DataValidator` and can be
configured to detect the desired regions from a stream.
The `auditok.core.StreamTokenizer.tokenize` method accepts a `DataSource`
object that has a `read` method. Read data can be of any type accepted
by the `validator`.
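
The sketch below illustrates these two pieces: a `DataValidator` subclass
and a minimal `DataSource`-like object. The `ListDataSource` class is
hypothetical (`auditok` ships `StringDataSource` for this kind of use); it
only illustrates the `read` contract assumed throughout this document:
return one observation per call and `None` at the end of the stream.

.. code:: python

    from auditok import StreamTokenizer
    from auditok.util import DataSource, DataValidator

    class DigitChecker(DataValidator):
        # a digit is a valid observation
        def is_valid(self, frame):
            return frame.isdigit()

    class ListDataSource(DataSource):
        # Hypothetical DataSource: deliver one item per `read` call,
        # then None when the sequence is exhausted
        def __init__(self, data):
            self._data = list(data)
            self._position = 0

        def read(self):
            if self._position >= len(self._data):
                return None
            item = self._data[self._position]
            self._position += 1
            return item

    tokenizer = StreamTokenizer(validator=DigitChecker(), min_length=1,
                                max_length=9999, max_continuous_silence=0)
    tokenizer.tokenize(ListDataSource("ab12cd345e"))
    # expected: [(['1', '2'], 2, 3), (['3', '4', '5'], 6, 8)]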


As the main aim of this module is **Audio Activity Detection**,
it provides the `auditok.util.ADSFactory` factory class that makes
it very easy to create an `AudioDataSource` (a class that implements
`DataSource`) object, be that from:

- A file on the disk
- A buffer of data
- The built-in microphone (requires PyAudio)

The `AudioDataSource` class inherits from `DataSource` and supplies
a higher abstraction level than `AudioSource` thanks to a bunch of
handy features:

- Define a fixed-length analysis window (`block_size`)
- Allow overlap between two consecutive analysis windows
  (`hop_size` < `block_size`). This can be very important if your
  validator uses the **spectral** information of audio data instead
  of raw audio samples.
- Limit the amount (i.e. duration) of read data (very useful when reading
  data from the microphone)
- Record and rewind data (also useful if you read data from the microphone
  and you want to process it many times offline and/or save it)
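
The following sketch shows how these features map to keyword arguments of
`ADSFactory.ads`. The values are illustrative, and `hop_size` in particular
is an assumption here; check your version's documentation for the exact
keyword names it supports.

.. code:: python

    from auditok import ADSFactory

    ads = ADSFactory.ads(
        filename="path/to/file",  # omit to read from the microphone
        block_size=160,           # fixed-length analysis window (10 ms at 16 kHz)
        hop_size=80,              # overlap: consecutive windows share 80 samples
        max_time=20,              # read at most 20 seconds of data
        record=True               # keep read data so the source can be rewound
    )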


Last but not least, the current version has only one audio window
validator, based on signal energy.

Requirements
============

`auditok` requires `PyAudio <http://people.csail.mit.edu/hubert/pyaudio/>`_
for audio acquisition and playback.


Illustrative examples with strings
==================================

Let us look at some examples using the `auditok.util.StringDataSource` class,
created for test and illustration purposes. Imagine that each character of
`auditok.util.StringDataSource` data represents an audio slice of 100 ms,
for example. In the following examples we will use upper case letters to
represent noisy audio slices (i.e. analysis windows or frames) and lower
case letters for silent frames.


Extract sub-sequences of consecutive upper case letters
--------------------------------------------------------

We want to extract sub-sequences of characters that have:

- A minimum length of 1 (`min_length` = 1)
- A maximum length of 9999 (`max_length` = 9999)
- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)

We also create the `UpperCaseChecker` whose `is_valid` method returns `True` if the
checked character is in upper case and `False` otherwise.

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFbbGHIJKccc")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=0)

    tokenizer.tokenize(dsource)

The output is a list of two tuples, each containing the extracted sub-sequence
and its start and end position in the original sequence respectively:

.. code:: python

    [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]

Tolerate up to two non-valid (lower case) letters within an extracted sequence
--------------------------------------------------------------------------------

To do so, we set `max_continuous_silence` = 2:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]

Notice the trailing lower case letters "dd" and "ee" at the end of the two
tokens. The default behavior of `StreamTokenizer` is to keep the *trailing
silence* if it doesn't exceed `max_continuous_silence`. This can be changed
using the `DROP_TRAILING_SILENCE` mode (see next example).

Remove trailing silence
-----------------------

Trailing silence can be useful for many sound recognition applications, including
speech recognition. Moreover, from the human auditory system point of view, a
trailing low-energy signal helps avoid abrupt signal cuts.

If you want to remove it anyway, you can do so by setting `mode` to
`StreamTokenizer.DROP_TRAILING_SILENCE`:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2,
                                mode=StreamTokenizer.DROP_TRAILING_SILENCE)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]


Limit the length of detected tokens
-----------------------------------

Imagine that you just want to detect and recognize a small part of a long
acoustic event (e.g. engine noise, water flow, etc.) and prevent that event
from hogging the tokenizer and blocking the delivery of the event to the next
processing step (i.e. a sound recognizer). You can do this by:

- limiting the length of a detected token

and

- using a callback function as an argument to `StreamTokenizer.tokenize`
  so that the tokenizer delivers a token as soon as it is detected.

The following code limits the length of a token to 5:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFGHIJKbbb")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=5,
                                max_continuous_silence=0)

    def print_token(data, start, end):
        print("token = '{0}', starts at {1}, ends at {2}".format(''.join(data), start, end))

    tokenizer.tokenize(dsource, callback=print_token)

output:

.. code:: python

    "token = 'ABCDE', starts at 3, ends at 7"
    "token = 'FGHIJ', starts at 8, ends at 12"
    "token = 'K', starts at 13, ends at 13"


Using real audio data
=====================

In this section we will use `ADSFactory`, `AudioEnergyValidator` and `StreamTokenizer`
for an AAD demonstration using audio data. Before we go any further, it is worth
explaining a certain number of points.

The `ADSFactory.ads` method is called to create an `AudioDataSource` object that can be
passed to `StreamTokenizer.tokenize`. `ADSFactory.ads` accepts a number of keyword
arguments, of which none is mandatory. The returned `AudioDataSource` object can
however greatly differ depending on the passed arguments. Further details can be found
in the respective method documentation. Note however the following two calls that will
create an `AudioDataSource` that reads data from an audio file and from the built-in
microphone respectively.

.. code:: python

    from auditok import ADSFactory

    # Get an AudioDataSource from a file
    file_ads = ADSFactory.ads(filename="path/to/file")

    # Get an AudioDataSource from the built-in microphone
    # The returned object has the default values for sampling
    # rate, sample width and number of channels. See the method's
    # documentation for customized values
    mic_ads = ADSFactory.ads()

For `StreamTokenizer`, the parameters `min_length`, `max_length` and `max_continuous_silence`
are expressed in terms of number of frames. If you want a `max_length` of *2 seconds* for
your detected sound events and your *analysis window* is *10 ms* long, you have to specify
a `max_length` of 200 (`int(2. / (10. / 1000)) == 200`). For a `max_continuous_silence` of *300 ms*
for instance, the value to pass to `StreamTokenizer` is 30 (`int(0.3 / (10. / 1000)) == 30`).
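
This conversion is just a division of durations; here is a hypothetical
helper (not part of `auditok`) that captures it:

.. code:: python

    # Convert a duration in seconds into a number of analysis windows
    # (frames), given the window duration in seconds
    def to_frames(duration_s, window_s=0.01):
        return int(duration_s / window_s)

    to_frames(2.0)   # max_length for 2 seconds -> 200
    to_frames(0.3)   # max_continuous_silence for 300 ms -> 30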


Where do you get the size of the **analysis window** from?

Well, this is a parameter you pass to `ADSFactory.ads`. By default `ADSFactory.ads` uses
an analysis window of 10 ms. The number of samples that 10 ms of signal contains will
vary depending on the sampling rate of your audio source (file, microphone, etc.).
For a sampling rate of 16 kHz (16000 samples per second), we have 160 samples for 10 ms.
Therefore you can use block sizes of 160, 320 and 1600 for analysis windows of 10, 20
and 100 ms respectively.

.. code:: python

    from auditok import ADSFactory

    file_ads = ADSFactory.ads(filename="path/to/file", block_size=160)

    file_ads = ADSFactory.ads(filename="path/to/file", block_size=320)

    # If no sampling rate is specified, ADSFactory uses 16 kHz as the default
    # rate for the microphone. If you want to use a window of 100 ms, use
    # a block size of 1600
    mic_ads = ADSFactory.ads(block_size=1600)

So if you are not sure what your analysis window's duration in seconds is, use the following:

.. code:: python

    my_ads = ADSFactory.ads(...)
    analysis_win_seconds = float(my_ads.get_block_size()) / my_ads.get_sampling_rate()
    analysis_window_ms = analysis_win_seconds * 1000

    # For a `max_continuous_silence` of 300 ms use:
    max_continuous_silence = int(300. / analysis_window_ms)

    # Which is the same as:
    max_continuous_silence = int(0.3 / (analysis_window_ms / 1000))


Examples
--------

Extract isolated phrases from an utterance
-------------------------------------------

We will build an `AudioDataSource` using a wave file from the `dataset` module.
The file contains isolated pronunciations of digits from 1 to 6
in Arabic as well as breath-in/out between digits 2 and 3. The code will play the
original file then the detected sounds separately. Note that we use an
`energy_threshold` of 65; this parameter should be carefully chosen. It depends
on microphone quality, background noise and the amplitude of events you want to
detect.

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # We set the `record` argument to True so that we can rewind the source
    asource = ADSFactory.ads(filename=dataset.one_to_six_arabic_16000_mono_bc_noise, record=True)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=65)

    # Default analysis window is 10 ms: float(asource.get_block_size()) / asource.get_sampling_rate()
    # min_length=20 : minimum length of a valid audio activity is 20 * 10 == 200 ms
    # max_length=400 : maximum length of a valid audio activity is 400 * 10 == 4000 ms == 4 seconds
    # max_continuous_silence=30 : maximum length of a tolerated silence within a valid audio activity is 30 * 10 == 300 ms
    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=400, max_continuous_silence=30)

    asource.open()
    tokens = tokenizer.tokenize(asource)

    # Play detected regions back

    player = player_for(asource)

    # Rewind and read the whole signal
    asource.rewind()
    original_signal = []

    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    print("Playing the original file...")
    player.play(original_signal)

    print("Playing detected regions...")
    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 8


The tokenizer extracts 8 audio regions from the signal, including all isolated digits
(from 1 to 6) as well as the 2-phase respiration of the subject. You might have noticed
that, in the original file, the last three digits are closer to each other than the
previous ones. If you want them to be extracted as one single phrase, you can do so
by tolerating a larger continuous silence within a detection:

.. code:: python

    tokenizer.max_continuous_silence = 50
    asource.rewind()
    tokens = tokenizer.tokenize(asource)

    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 6


Trim leading and trailing silence
---------------------------------

The tokenizer in the following example is set up to remove the silence
that precedes the first acoustic activity or follows the last activity
in a record. It preserves whatever it finds between the two activities.
In other words, it removes the leading and trailing silence.

The sampling rate is 44100 samples per second and we'll use an analysis
window of 100 ms (i.e. block_size == 4410).

The energy threshold is 50.

The tokenizer will start accumulating windows from the moment it encounters
the first analysis window with an energy >= 50. ALL the following windows will be
kept regardless of their energy. At the end of the analysis, it will drop trailing
windows with an energy below 50.

This is an interesting example because the audio file we're analyzing contains a very
brief noise that occurs within the leading silence. We certainly don't want our tokenizer
to stop at this point and consider whatever comes after it as a useful signal.
To force the tokenizer to ignore that brief event we use two other parameters, `init_min`
and `init_max_silence`. By setting `init_min` = 3 and `init_max_silence` = 1 we tell the tokenizer
that a valid event must start with at least 3 noisy windows, between which there
is at most 1 silent window.

Even with this configuration the tokenizer might still detect that noise as a valid event
(if it actually contains 3 consecutive noisy frames). To circumvent this we use a large
enough analysis window (here of 100 ms) to ensure that the brief noise is surrounded by a much
longer silence and hence the energy of the overall analysis window stays below 50.

When using a shorter analysis window (of 10 ms for instance, block_size == 441), the brief
noise contributes more to the energy calculation, which yields an energy of over 50 for the
window. Again, we can deal with this situation by using a higher energy threshold (55 for example).
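
The effect of `init_min` and `init_max_silence` can also be observed on a toy
string stream. The following is a sketch reusing the `UpperCaseChecker` from
the string examples above; with `init_min=3`, the isolated burst "AA" should
not be accepted as a token start, because only two valid frames arrive before
two consecutive silent ones:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aAAbbbCCCCCdd")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=3, max_length=20,
                                max_continuous_silence=2,
                                init_min=3, init_max_silence=1)

    tokenizer.tokenize(dsource)
    # expected: the "AA" burst is skipped and the token starts at the
    # "CCCCC" run: [(['C', 'C', 'C', 'C', 'C', 'd', 'd'], 6, 12)]

Back to the audio example: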

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # record = True so that we'll be able to rewind the source.
    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence,
                             record=True, block_size=4410)
    asource.open()

    original_signal = []
    # Read the whole signal
    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    # Rewind the source
    asource.rewind()

    # Create a validator with an energy threshold of 50
    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)

    # Create a tokenizer with an unlimited token length and an unlimited
    # continuous silence within a token.
    # Note the DROP_TRAILING_SILENCE mode that will ensure removing trailing silence
    trimmer = StreamTokenizer(validator, min_length=20, max_length=99999999,
                              init_min=3, init_max_silence=1,
                              max_continuous_silence=9999999,
                              mode=StreamTokenizer.DROP_TRAILING_SILENCE)

    tokens = trimmer.tokenize(asource)

    # Make sure we only have one token
    assert len(tokens) == 1, "Should have detected one single token"

    trimmed_signal = ''.join(tokens[0][0])

    player = player_for(asource)

    print("Playing original signal (with leading and trailing silence)...")
    player.play(original_signal)
    print("Playing trimmed signal...")
    player.play(trimmed_signal)


Online audio signal processing
------------------------------

In the next example, audio data is directly acquired from the built-in microphone.
The `tokenize` method is passed a callback function so that audio activities
are delivered as soon as they are detected. Each detected activity is played
back using the built-in audio output device.

As mentioned before, signal energy is strongly related to many factors such as
microphone sensitivity, background noise (including noise inherent to the hardware),
distance and your operating system sound settings. Try a lower `energy_threshold`
if your noise does not seem to be detected and a higher threshold if you notice
an over-detection (the `echo` callback prints a detection where you have made no noise).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

    # record = True so that we'll be able to rewind the source.
    # max_time = 10: read 10 seconds from the microphone
    asource = ADSFactory.ads(record=True, max_time=10)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=250, max_continuous_silence=30)

    player = player_for(asource)

    def echo(data, start, end):
        print("Acoustic activity at: {0}--{1}".format(start, end))
        player.play(''.join(data))

    asource.open()

    tokenizer.tokenize(asource, callback=echo)

If you want to re-run the tokenizer after changing one or more parameters, use the following code:

.. code:: python

    asource.rewind()
    # change the energy threshold for example
    tokenizer.validator.set_energy_threshold(55)
    tokenizer.tokenize(asource, callback=echo)

In case you want to play back the whole recorded signal, use:

.. code:: python

    player.play(asource.get_audio_source().get_data_buffer())
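
If you'd rather save the recording to disk, the same buffer can be written
to a WAV file with Python's standard `wave` module. This is a sketch; it
assumes the source exposes a `get_channels` method alongside the
`get_sampling_rate` and `get_sample_width` methods used above:

.. code:: python

    import wave

    fp = wave.open("recording.wav", "wb")
    fp.setnchannels(asource.get_channels())       # assumed getter
    fp.setsampwidth(asource.get_sample_width())   # bytes per sample
    fp.setframerate(asource.get_sampling_rate())  # samples per second
    fp.writeframes(asource.get_audio_source().get_data_buffer())
    fp.close()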


Contributing
============

**auditok** is on `GitHub <https://github.com/amsehili/auditok>`_. You're welcome to fork it and contribute.

Amine SEHILI <amine.sehili[_at_]gmail.com>
September 2015

License
=======

This package is published under the GNU GPL, version 3.
|