comparison quickstart.rst @ 3:364eeb8e8bd2

README.md, typos fixes
author Amine Sehili <amine.sehili@gmail.com>
date Tue, 22 Sep 2015 10:49:57 +0200
parents edee860b9f61
children 252d698ae642
.. auditok documentation.

auditok, an AUDIo TOKenization module
=====================================

.. contents:: `Contents`
   :depth: 3


**auditok** is a module that can be used as a generic tool for data
tokenization. Although its core motivation is **Acoustic Activity
Detection** (AAD) and extraction from audio streams (i.e. detect
if you are interested in sub-sequences of numeric symbols, or a silent
audio window (of 10, 20 or 100 milliseconds for instance) if what
interests you is audio regions made up of a sequence of "noisy"
windows (whatever the kind of noise: speech, baby cry, laughter, etc.).

The most important component of `auditok` is the `auditok.core.StreamTokenizer`
class. An instance of this class encapsulates a `DataValidator` and can be
configured to detect the desired regions from a stream.
The `StreamTokenizer.tokenize` method accepts a `DataSource`
object that has a `read` method. Read data can be of any type accepted
by the `validator`.
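To make the tokenization semantics concrete, here is a deliberately simplified, pure-Python sketch of the idea. This is not auditok's implementation (the real `StreamTokenizer` API, `min_length` and `max_length` handling are omitted); the input string and the expected result are taken from the examples further down in this document:

```python
def tokenize(data, is_valid, max_continuous_silence=0, drop_trailing_silence=False):
    """Collect runs of "valid" frames, tolerating up to max_continuous_silence
    consecutive invalid frames inside a run (a toy sketch, not auditok's code)."""
    tokens = []
    token, start, silence = [], None, 0

    def close(end):
        # By default the tolerated trailing invalid frames stay in the token;
        # optionally strip them (the DROP_TAILING_SILENCE-like behavior).
        if drop_trailing_silence and silence:
            tokens.append((token[:-silence], start, end - silence))
        else:
            tokens.append((token, start, end))

    for i, frame in enumerate(data):
        if is_valid(frame):
            if start is None:
                start = i            # first valid frame opens a token
            token.append(frame)
            silence = 0
        elif start is not None:
            if silence < max_continuous_silence:
                token.append(frame)  # tolerated "silence" inside the token
                silence += 1
            else:
                close(i - 1)         # too much silence: emit the token
                token, start, silence = [], None, 0
    if start is not None:
        close(len(data) - 1)         # emit the last pending token
    return tokens

# The string and expected result below come from the examples further down.
tokens = tokenize("aaaABCDbbEFcGHIdddJKee", str.isupper, max_continuous_silence=2)
# tokens == [(['A','B','C','D','b','b','E','F','c','G','H','I','d','d'], 3, 16),
#            (['J','K','e','e'], 18, 21)]
```

The same function with `drop_trailing_silence=True` mimics the trailing-silence removal shown later.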


As the main aim of this module is **Audio Activity Detection**,

The `AudioDataSource` class inherits from `DataSource` and supplies
a higher abstraction level than `AudioSource` thanks to a bunch of
handy features:

- Define a fixed-length block_size (i.e. analysis window)
- Allow overlap between two consecutive analysis windows (hop_size < block_size).
  This can be very important if your validator uses the **spectral** information
  of audio data instead of raw audio samples.
- Limit the amount (i.e. duration) of read data (very useful when reading data
  from the microphone)
- Record and rewind data (also useful if you read data from the microphone and
  you want to process it many times offline and/or save it)
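The overlap feature can be pictured with a rough windowing sketch (an illustration of the general hop/block relationship, not `AudioDataSource`'s actual internals): each window holds `block_size` samples, and consecutive windows advance by `hop_size` samples.

```python
def sliding_windows(samples, block_size, hop_size):
    # Yield fixed-length analysis windows; with hop_size < block_size,
    # consecutive windows overlap by (block_size - hop_size) samples.
    for start in range(0, len(samples) - block_size + 1, hop_size):
        yield samples[start:start + block_size]

windows = list(sliding_windows(list(range(10)), block_size=4, hop_size=2))
# windows == [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With `hop_size == block_size` the windows are disjoint, which is the usual setting when the validator only looks at raw sample energy.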


Last but not least, the current version has only one audio window validator,
based on signal energy.



Extract sub-sequences of consecutive upper case letters
-------------------------------------------------------

We want to extract sub-sequences of characters that have:

- A minimum length of 1 (`min_length` = 1)
- A maximum length of 9999 (`max_length` = 9999)
- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)
    tokenizer.tokenize(dsource)

The output is a list of two tuples, each containing the extracted sub-sequence
and its start and end position in the original sequence, respectively:

.. code:: python

    [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]


Tolerate up to two non-valid (lower case) letters within an extracted sequence
-------------------------------------------------------------------------------

To do so, we set `max_continuous_silence` = 2:

.. code:: python


.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]

Notice the trailing lower case letters "dd" and "ee" at the end of the two
tokens. The default behavior of `StreamTokenizer` is to keep the *trailing
silence* if it doesn't exceed `max_continuous_silence`. This can be changed
using the `DROP_TAILING_SILENCE` mode (see next example).

Remove trailing silence
-----------------------

Trailing silence can be useful for many sound recognition applications, including
speech recognition. Moreover, from the point of view of the human auditory system,
trailing low-energy signal helps avoid abrupt signal cuts.

If you want to remove it anyway, you can do so by setting `mode` to `StreamTokenizer.DROP_TAILING_SILENCE`:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999, max_continuous_silence=2,
                                mode=StreamTokenizer.DROP_TAILING_SILENCE)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]


Limit the length of detected tokens
-----------------------------------

Imagine that you just want to detect and recognize a small part of a long
acoustic event (e.g. engine noise, water flow, etc.) and avoid having that
event hog the tokenizer, preventing it from feeding the event to the next
processing step (i.e. a sound recognizer). You can do this by:
    tokenizer.tokenize(dsource, callback=print_token)


output:

.. code:: python

    "token = 'ABCDE', starts at 3, ends at 7"
    "token = 'FGHIJ', starts at 8, ends at 12"
    "token = 'K', starts at 13, ends at 13"
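The splitting effect of `max_length` can be mimicked with a toy re-implementation (this is not auditok's code; silence tolerance and `min_length` are left out, and the input string below is an assumption reconstructed from the offsets reported above):

```python
def tokenize_max_length(data, is_valid, max_length):
    # Emit (token, start, end); close a token as soon as it reaches max_length.
    tokens, token, start = [], [], None
    for i, frame in enumerate(data):
        if is_valid(frame):
            if start is None:
                start = i
            token.append(frame)
            if len(token) == max_length:        # token is full: emit it now
                tokens.append((''.join(token), start, i))
                token, start = [], None
        else:
            if start is not None:               # invalid frame ends the token
                tokens.append((''.join(token), start, i - 1))
            token, start = [], None
    if start is not None:                       # flush the last pending token
        tokens.append((''.join(token), start, len(data) - 1))
    return tokens

tokens = tokenize_max_length("aaaABCDEFGHIJKbb", str.isupper, max_length=5)
# tokens == [('ABCDE', 3, 7), ('FGHIJ', 8, 12), ('K', 13, 13)]
```

Each time a token reaches `max_length` frames it is delivered immediately, so a long event is handed to the next processing step in bounded chunks.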


Using real audio data
=====================

    max_continuous_silence = int(300. / analysis_window_ms)

    # Which is the same as
    max_continuous_silence = int(0.3 / (analysis_window_ms / 1000.))



Extract isolated phrases from an utterance
------------------------------------------

We will build an `AudioDataSource` using a wave file from the database.
        player.play(data)

    assert len(tokens) == 6

Trim leading and trailing silence
---------------------------------

The tokenizer in the following example is set up to remove the silence
that precedes the first acoustic activity or follows the last activity
in a record. It preserves whatever it finds between the two activities.
In other words, it removes the leading and trailing silence.

The sampling rate is 44100 samples per second; we'll use an analysis window of 100 ms
(i.e. block_size == 4410).

The energy threshold is 50.

The tokenizer will start accumulating windows from the moment it encounters
the first analysis window with an energy >= 50. ALL the following windows will be
kept regardless of their energy. At the end of the analysis, it will drop trailing
windows with an energy below 50.
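For intuition, the per-window decision can be sketched as follows (the log-energy formula here is an assumption chosen for illustration; auditok's `AudioEnergyValidator` may scale things differently):

```python
import math

def window_log_energy(samples):
    # 10 * log10 of the mean squared sample value (a dB-like scale)
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10. * math.log10(mean_square) if mean_square > 0 else -float("inf")

def is_noisy(samples, energy_threshold=50.):
    # the validator's decision: keep windows at or above the threshold
    return window_log_energy(samples) >= energy_threshold

# A loud 100 ms window (4410 samples at 44100 Hz) passes the threshold,
# a near-silent one does not.
loud, quiet = [20000] * 4410, [10] * 4410
```

Raising `energy_threshold` makes the detector more conservative, which is exactly the workaround used below for the brief noise in the leading silence.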

This is an interesting example because the audio file we're analyzing contains a very
brief noise that occurs within the leading silence. We certainly do not want our
tokenizer to be triggered at this point and consider whatever comes after it as a useful signal.
Again, we can deal with this situation by using a higher energy threshold (55 for example).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # record = True so that we'll be able to rewind the source.
    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_tail_silence,
                             record=True, block_size=4410)
    asource.open()

    original_signal = []
    # Read the whole signal

    # Create a validator with an energy threshold of 50
    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)

    # Create a tokenizer with an unlimited token length and unlimited continuous silence within a token
    # Note the DROP_TAILING_SILENCE mode that ensures trailing silence is removed
    trimmer = StreamTokenizer(validator, min_length=20, max_length=99999999,
                              init_min=3, init_max_silence=1,
                              max_continuous_silence=9999999,
                              mode=StreamTokenizer.DROP_TAILING_SILENCE)


    tokens = trimmer.tokenize(asource)

    # Make sure we only have one token
    assert len(tokens) == 1

    trimmed_signal = ''.join(tokens[0][0])

    player = player_for(asource)

    print("Playing original signal (with leading and trailing silence)...")
    player.play(original_signal)
    print("Playing trimmed signal...")
    player.play(trimmed_signal)

an over detection (the echo method prints a detection where you have made no noise).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

    # record = True so that we'll be able to rewind the source.
    # max_time = 10: read 10 seconds from the microphone
    asource = ADSFactory.ads(record=True, max_time=10)

Contributing
============
**auditok** is on `GitHub <https://github.com/amsehili/auditok>`_. You're welcome to fork it and contribute.


Amine SEHILI <amine.sehili@gmail.com>
September 2015

License
=======
