.. doc/apitutorial.rst, changeset 331:9741b52f194a ("Reformat code and
   documentation"), Amine Sehili <amine.sehili@gmail.com>,
   Thu, 24 Oct 2019 20:49:51 +0200
.. contents:: `Contents`
   :depth: 3


**auditok** is a module that can be used as a generic tool for data
tokenization. Although its core motivation is **Acoustic Activity
Detection** (AAD) and extraction from audio streams (i.e. detect
where a noise or an acoustic activity occurs within an audio stream and
extract the corresponding portion of the signal), it can easily be
adapted to other tasks.

audio window (of 10, 20 or 100 milliseconds for instance) if you are
interested in audio regions made up of a sequence of "noisy"
windows (whatever the kind of noise: speech, baby cry, laughter, etc.).

The most important component of `auditok` is the :class:`auditok.core.StreamTokenizer`
class. An instance of this class encapsulates a :class:`auditok.util.DataValidator` and can be
configured to detect the desired regions from a stream.
The :func:`auditok.core.StreamTokenizer.tokenize` method accepts a :class:`auditok.util.DataSource`
object that has a `read` method. Read data can be of any type accepted
by the `validator`.

(a class that implements :class:`auditok.util.DataSource`) object, be that from:

- A file on the disk
- A buffer of data
- The built-in microphone (requires PyAudio)

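The `read` contract mentioned above is small enough to sketch with a minimal stand-in (`ListDataSource` below is a hypothetical class for illustration, not part of auditok):

.. code:: python

    class ListDataSource:
        """Minimal DataSource-like object: read() returns one frame per
        call, then None once the data is exhausted."""

        def __init__(self, frames):
            self._frames = iter(frames)

        def read(self):
            return next(self._frames, None)

    src = ListDataSource(["frame1", "frame2"])
    print(src.read())  # frame1
    print(src.read())  # frame2
    print(src.read())  # None

Any object exposing such a `read` method, whatever the type of the frames it yields, can be fed to a tokenizer whose validator understands that type.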

The :class:`auditok.util.ADSFactory.AudioDataSource` class inherits from
:class:`auditok.util.DataSource` and supplies a higher abstraction level
than :class:`auditok.io.AudioSource` thanks to a number of handy features:

  (if one of the `hop_size` (alias `hs`) or `hop_dur` (alias `hd`) keywords is used and is > 0 and < `block_size`).
  This can be very important if your validator uses the **spectral** information of audio data
  instead of raw audio samples.
- Limit the amount (i.e. duration) of read data (if the keyword `max_time` or `mt` is used; very useful when reading data from the microphone)
- Record all read data and rewind if necessary (if the keyword `record` or `rec` is used; also useful if you read data from the microphone and
  you want to process it many times off-line and/or save it)

See the :class:`auditok.util.ADSFactory` documentation for more information.
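The hop-based overlapping windows mentioned above can be pictured with a short sketch (plain Python illustrating the concept, not `ADSFactory`'s actual implementation):

.. code:: python

    def sliding_windows(data, block_size, hop_size):
        """Yield fixed-size windows; with hop_size < block_size,
        consecutive windows overlap by block_size - hop_size items."""
        start = 0
        while start + block_size <= len(data):
            yield data[start:start + block_size]
            start += hop_size

    print(list(sliding_windows("abcdef", 4, 2)))  # ['abcd', 'cdef']

With a hop of 2 and a block of 4, each window shares half its content with the previous one, which is what a spectral validator typically wants.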

Last but not least, the current version has only one audio window validator, based on
signal energy (:class:`auditok.util.AudioEnergyValidator`).
**********************************
Illustrative examples with strings
**********************************

Let us look at some examples using the :class:`auditok.util.StringDataSource` class,
created for test and illustration purposes. Imagine that each character of
:class:`auditok.util.StringDataSource` data represents an audio slice of 100 ms, for
example. In the following examples we will use upper case letters to represent
noisy audio slices (i.e. analysis windows or frames) and lower case letters for
silent frames.

Extract sub-sequences of consecutive upper case letters
#######################################################


We want to extract sub-sequences of characters that have:

- A minimum length of 1 (`min_length` = 1)
- A maximum length of 9999 (`max_length` = 9999)
- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)

We also create the `UpperCaseChecker`, whose `is_valid` method returns `True` if the
checked character is in upper case and `False` otherwise.

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFbbGHIJKccc")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=0)

    tokenizer.tokenize(dsource)

The output is a list of two tuples, each containing the extracted sub-sequence and its
start and end position in the original sequence, respectively:


.. code:: python


    [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]

Tolerate up to two non-valid (lower case) letters within an extracted sequence
##############################################################################

To do so, we set `max_continuous_silence` to 2:

.. code:: python


    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]

Notice the trailing lower case letters "dd" and "ee" at the end of the two
tokens. The default behavior of :class:`auditok.core.StreamTokenizer` is to keep the *trailing
silence* if it does not exceed `max_continuous_silence`. This can be changed
using the `StreamTokenizer.DROP_TRAILING_SILENCE` mode (see the next example).

If you want to remove it anyway, you can do so by setting `mode` to `StreamTokenizer.DROP_TRAILING_SILENCE`:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2,
                                mode=StreamTokenizer.DROP_TRAILING_SILENCE)

    tokenizer.tokenize(dsource)

output:

.. code:: python
Limit the length of detected tokens
###################################


Imagine that you just want to detect and recognize a small part of a long
acoustic event (e.g. engine noise, water flow, etc.) and prevent that
event from hogging the tokenizer, delaying delivery of the event to the next
processing step (i.e. a sound recognizer). You can do this by:

- limiting the length of a detected token

and

- using a callback function as an argument to :func:`auditok.core.StreamTokenizer.tokenize`,
  so that the tokenizer delivers a token as soon as it is detected.

The following code limits the length of a token to 5:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFGHIJKbbb")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=5,
                                max_continuous_silence=0)

    def print_token(data, start, end):
        print("token = '{0}', starts at {1}, ends at {2}".format(''.join(data), start, end))

    tokenizer.tokenize(dsource, callback=print_token)


output:

.. code:: python

************************
`auditok` and Audio Data
************************

In the rest of this document we will use :class:`auditok.util.ADSFactory`, :class:`auditok.util.AudioEnergyValidator`
and :class:`auditok.core.StreamTokenizer` for Audio Activity Detection demos using audio data. Before we get any
further, it is worth explaining a few points.

The :func:`auditok.util.ADSFactory.ads` method is used to create an :class:`auditok.util.ADSFactory.AudioDataSource`
object, either from a wave file, the built-in microphone or a user-supplied data buffer. Refer to the API reference
for more information and examples on :func:`ADSFactory.ads` and :class:`AudioDataSource`.

The created :class:`AudioDataSource` object is then passed to :func:`StreamTokenizer.tokenize` for tokenization.

:func:`auditok.util.ADSFactory.ads` accepts a number of keyword arguments, none of which is mandatory.
The returned :class:`AudioDataSource` object's features and behavior can however greatly differ
depending on the passed arguments. Further details can be found in the respective method documentation.

Note, however, the following two calls, which create an :class:`AudioDataSource`
that reads data from an audio file and from the built-in microphone, respectively.

.. code:: python

    from auditok import ADSFactory

    # Get an AudioDataSource from a file
    # use the 'filename' (alias 'fn') keyword argument
    file_ads = ADSFactory.ads(filename="path/to/file/")

    # Get an AudioDataSource from the built-in microphone
    # The returned object has the default values for sampling
    # rate, sample width and number of channels. See the method's
    # documentation for customized values
    mic_ads = ADSFactory.ads()

For :class:`StreamTokenizer`, the parameters `min_length`, `max_length` and `max_continuous_silence`
are expressed as a number of frames. Each call to :func:`AudioDataSource.read` returns
one frame of data or None.

If you want a `max_length` of 2 seconds for your detected sound events and your *analysis window*
is *10 ms* long, you have to specify a `max_length` of 200 (`int(2. / (10. / 1000)) == 200`).
For a `max_continuous_silence` of *300 ms*, for instance, the value to pass to StreamTokenizer is 30
(`int(0.3 / (10. / 1000)) == 30`).

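This duration-to-frames conversion can be wrapped in a small helper (an illustrative function, not part of the auditok API; `round` guards against floating-point representation issues before truncating):

.. code:: python

    def duration_to_frames(duration_s, window_s):
        """Convert a duration in seconds into a number of analysis windows."""
        return int(round(duration_s / window_s))

    # 2-second max_length with 10 ms analysis windows
    print(duration_to_frames(2.0, 10.0 / 1000))  # 200
    # 300 ms max_continuous_silence with 10 ms analysis windows
    print(duration_to_frames(0.3, 10.0 / 1000))  # 30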
Each time :class:`StreamTokenizer` calls the (argument-free) :func:`read` method of an
:class:`AudioDataSource` object, it gets the same amount of data, except when there are no more
data (the object then returns what is left in the stream, or None).

This fixed-length amount of data is referred to here as the **analysis window** and is a parameter of
the :func:`ADSFactory.ads` method. By default, :func:`ADSFactory.ads` uses an analysis window of 10 ms.

The number of samples that 10 ms of audio data contain will vary, depending on the sampling
rate of your audio source/data (file, microphone, etc.).
For a sampling rate of 16 kHz (16000 samples per second), we have 160 samples per 10 ms.

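The samples-per-window arithmetic can be checked quickly in plain Python (an illustrative helper, independent of auditok):

.. code:: python

    def samples_per_window(sampling_rate, window_s):
        """Number of audio samples contained in one analysis window."""
        return int(round(sampling_rate * window_s))

    print(samples_per_window(16000, 0.01))  # 160 samples for 10 ms at 16 kHz
    print(samples_per_window(44100, 0.1))   # 4410 samples for 100 ms at 44.1 kHz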
You can use the `block_size` keyword (alias `bs`) to define your analysis window:

.. code:: python

    from auditok import ADSFactory

    # Assume you have an audio file with a sampling rate of 16000

    # file_ads.read() will return blocks of 160 samples
    file_ads = ADSFactory.ads(filename="path/to/file/", block_size=160)

    # file_ads.read() will return blocks of 320 samples
    file_ads = ADSFactory.ads(filename="path/to/file/", bs=320)


Fortunately, you can also specify the size of your analysis window in seconds, thanks to the keyword `block_dur`
(alias `bd`):

.. code:: python

    from auditok import ADSFactory

    # use an analysis window of 20 ms
    file_ads = ADSFactory.ads(filename="path/to/file/", bd=0.02)

For :class:`StreamTokenizer`, each :func:`read` call that does not return `None` is treated as one processing
frame. :class:`StreamTokenizer` has no way to figure out the temporal length of that frame (why should it?). So, to
correctly initialize your :class:`StreamTokenizer` based on your analysis window duration, use something like:


.. code:: python

    from auditok import ADSFactory

    analysis_win_seconds = 0.01  # 10 ms
    my_ads = ADSFactory.ads(block_dur=analysis_win_seconds)
    analysis_window_ms = analysis_win_seconds * 1000

    # If you want your maximum continuous silence to be 300 ms, use:
    max_continuous_silence = int(300. / analysis_window_ms)

    # which is the same as:
    max_continuous_silence = int(0.3 / (analysis_window_ms / 1000))

    # or simply:
    max_continuous_silence = 30


******************************
Examples using real audio data
******************************


Extract isolated phrases from an utterance
##########################################

We will build an :class:`auditok.util.ADSFactory.AudioDataSource` using a wave file from
the database. The file contains isolated pronunciations of digits from 1 to 6
in Arabic, as well as breath-in/out between 2 and 3. The code will play the
original file, then the detected sounds separately. Note that we use an
`energy_threshold` of 65; this parameter should be carefully chosen. It depends
on microphone quality, background noise and the amplitude of the events you want to
detect.

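To build intuition for what `energy_threshold` compares against, here is a rough log-energy computation for one frame of 16-bit audio (an illustrative sketch only; the exact formula auditok uses may differ):

.. code:: python

    import math
    import struct

    def frame_log_energy(pcm_bytes):
        """Log energy of a frame of little-endian signed 16-bit samples."""
        n = len(pcm_bytes) // 2
        samples = struct.unpack("<{}h".format(n), pcm_bytes)
        mean_square = sum(s * s for s in samples) / n
        return 10.0 * math.log10(mean_square) if mean_square > 0 else -float("inf")

    quiet = struct.pack("<4h", 10, -10, 10, -10)         # low amplitude
    loud = struct.pack("<4h", 2000, -2000, 2000, -2000)  # high amplitude

    print(frame_log_energy(quiet))  # 20.0 -> well below a threshold of 65
    print(frame_log_energy(loud))   # ~66.0 -> above a threshold of 65

A frame whose energy is at or above the threshold is considered "noisy"; everything below is treated as silence.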
.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # We set the `record` argument to True so that we can rewind the source
    asource = ADSFactory.ads(filename=dataset.one_to_six_arabic_16000_mono_bc_noise, record=True)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=65)

    # Default analysis window is 10 ms (float(asource.get_block_size()) / asource.get_sampling_rate())
    # min_length=20: minimum length of a valid audio activity is 20 * 10 == 200 ms
    # max_length=400: maximum length of a valid audio activity is 400 * 10 == 4000 ms == 4 seconds
    # max_continuous_silence=30: maximum length of a tolerated silence within a valid audio activity is 30 * 10 == 300 ms
    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=400, max_continuous_silence=30)

    asource.open()
    tokens = tokenizer.tokenize(asource)

    # Play the detected regions back

    player = player_for(asource)

    # Rewind and read the whole signal
    asource.rewind()
    original_signal = []

    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    print("Playing the original file...")
    player.play(original_signal)

    print("Playing the detected regions...")
    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 8


The tokenizer extracts 8 audio regions from the signal, including all isolated digits
(from 1 to 6) as well as the 2-phase respiration of the subject. You might have noticed
that, in the original file, the last three digits are closer to each other than the
previous ones. If you want them to be extracted as one single phrase, you can do so
by tolerating a larger continuous silence within a detection:

.. code:: python

    tokenizer.max_continuous_silence = 50
    asource.rewind()
    tokens = tokenizer.tokenize(asource)

    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 6


Trim leading and trailing silence
#################################

The tokenizer in the following example is set up to remove the silence
that precedes the first acoustic activity or follows the last activity
in a record. It preserves whatever it finds between the two activities.
In other words, it removes the leading and trailing silence.

The sampling rate is 44100 samples per second; we'll use an analysis window of 100 ms
(i.e. block_size == 4410).

The energy threshold is 50.

The tokenizer will start accumulating windows from the moment it encounters
the first analysis window with an energy >= 50. ALL the following windows will be
kept regardless of their energy. At the end of the analysis, it will drop trailing
windows with an energy below 50.

This is an interesting example because the audio file we're analyzing contains a very
brief noise that occurs within the leading silence. We certainly do not want our
tokenizer to trigger at this point and consider whatever comes after it as useful
signal. To force the tokenizer to ignore that brief event we use two other parameters,
`init_min` and `init_max_silence`. By setting `init_min` = 3 and `init_max_silence` = 1
we tell the tokenizer that a valid event must start with at least 3 noisy windows,
between which there is at most 1 silent window.
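The start-of-token rule just described can be sketched in plain Python. This is a simplified, hypothetical model of the gating logic; `find_token_start` is not part of auditok, whose tokenizer is a full state machine with more parameters:

```python
def find_token_start(noisy, init_min=3, init_max_silence=1):
    """Return the index where a valid event starts, or None.

    `noisy` holds one boolean per analysis window (True = energy above
    threshold). An event starts once `init_min` noisy windows have been
    gathered without ever seeing more than `init_max_silence` consecutive
    silent windows in between.
    """
    count = 0          # noisy windows gathered so far
    silence = 0        # current run of silent windows
    start = None
    for i, is_noisy in enumerate(noisy):
        if is_noisy:
            if count == 0:
                start = i
            count += 1
            silence = 0
            if count >= init_min:
                return start
        elif count > 0:
            silence += 1
            if silence > init_max_silence:
                count = 0      # too much silence: discard this candidate
                start = None
    return None

# The brief isolated noise at index 2 is ignored; the event starts at 6.
windows = [False, False, True, False, False, False, True, True, False, True]
print(find_token_start(windows))  # 6
```

Under this rule the lone noisy window at index 2 never accumulates 3 noisy windows before 2 consecutive silent ones reset the count, which mirrors how the brief noise in the audio file is skipped.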
    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    # rewind source
    asource.rewind()

    # Create a validator with an energy threshold of 50
    validator = AudioEnergyValidator(
        sample_width=asource.get_sample_width(), energy_threshold=50
    )

    # Create a tokenizer with an unlimited token length and unlimited
    # continuous silence within a token.
    # Note the DROP_TRAILING_SILENCE mode that ensures trailing silence
    # is removed.
    trimmer = StreamTokenizer(
        validator,
        min_length=20,
        max_length=99999999,
        init_min=3,
        init_max_silence=1,
        max_continuous_silence=9999999,
        mode=StreamTokenizer.DROP_TRAILING_SILENCE,
    )

    tokens = trimmer.tokenize(asource)

    # Make sure we only have one token
    assert len(tokens) == 1, "Should have detected one single token"

    trimmed_signal = ''.join(tokens[0][0])

    player = player_for(asource)

    print("Playing original signal (with leading and trailing silence)...")
    player.play(original_signal)
    print("Playing trimmed signal...")
    player.play(trimmed_signal)

Online audio signal processing
##############################

In the next example, audio data is directly acquired from the built-in microphone.
The :func:`auditok.core.StreamTokenizer.tokenize` method is passed a callback function
so that audio activities are delivered as soon as they are detected. Each detected
activity is played back using the built-in audio output device.

As mentioned before, signal energy is strongly related to many factors such as
microphone sensitivity, background noise (including noise inherent to the hardware),
distance and your operating system's sound settings. Try a lower `energy_threshold`
if your noise does not seem to be detected and a higher threshold if you notice
over-detection (i.e. the `echo` function prints a detection although you have made
no noise).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

    # record = True so that we'll be able to rewind the source.
    # max_time = 10: read 10 seconds from the microphone
    asource = ADSFactory.ads(record=True, max_time=10)

    validator = AudioEnergyValidator(
        sample_width=asource.get_sample_width(), energy_threshold=50
    )
    tokenizer = StreamTokenizer(
        validator=validator,
        min_length=20,
        max_length=250,
        max_continuous_silence=30,
    )

    player = player_for(asource)

    def echo(data, start, end):
        print("Acoustic activity at: {0}--{1}".format(start, end))
        player.play(''.join(data))

    asource.open()

    tokenizer.tokenize(asource, callback=echo)

If you want to re-run the tokenizer after changing one or more parameters, rewind
the source and tokenize again:

.. code:: python

    asource.rewind()
    tokenizer.tokenize(asource, callback=echo)
In case you want to play the whole recorded signal back, use:

.. code:: python

    player.play(asource.get_audio_source().get_data_buffer())


************
Contributing
************
