comparison quickstart.rst @ 3:364eeb8e8bd2
README.md, typos fixes

author    Amine Sehili <amine.sehili@gmail.com>
date      Tue, 22 Sep 2015 10:49:57 +0200
parents   edee860b9f61
children  252d698ae642
.. auditok documentation.

auditok, an AUDIo TOKenization module
=====================================


**auditok** is a module that can be used as a generic tool for data
tokenization. Although its core motivation is **Acoustic Activity
Detection** (AAD) and extraction from audio streams (i.e. detect

if you are interested in sub-sequences of numeric symbols, or a silent
audio window (of 10, 20 or 100 milliseconds for instance) if what
interests you are audio regions made up of a sequence of "noisy"
windows (whatever kind of noise: speech, baby cry, laughter, etc.).

The most important component of `auditok` is the `auditok.core.StreamTokenizer`
class. An instance of this class encapsulates a `DataValidator` and can be
configured to detect the desired regions from a stream.
The `StreamTokenizer.tokenize` method accepts a `DataSource`
object that has a `read` method. Read data can be of any type accepted
by the `validator`.
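
To make that contract concrete, here is a minimal pure-Python sketch of a `read`-able data source and a validator. Only the method names (`read`, `is_valid`) mirror auditok's API; the `ListDataSource` and `EvenNumberValidator` classes and the manual read loop below are illustrative assumptions, not library code:

```python
class ListDataSource:
    """A DataSource-like object: exposes read(), returns None when exhausted."""
    def __init__(self, frames):
        self._frames = list(frames)
        self._pos = 0

    def read(self):
        if self._pos >= len(self._frames):
            return None
        frame = self._frames[self._pos]
        self._pos += 1
        return frame


class EvenNumberValidator:
    """A DataValidator-like object: is_valid() decides what counts as signal."""
    def is_valid(self, frame):
        return frame % 2 == 0


source = ListDataSource([1, 4, 6, 3, 8])
validator = EvenNumberValidator()
valid_frames = []
while True:
    frame = source.read()
    if frame is None:
        break
    if validator.is_valid(frame):
        valid_frames.append(frame)

print(valid_frames)  # -> [4, 6, 8]
```

The point is simply that the tokenizer only ever calls `read()` on the source and `is_valid()` on the validator, so any data type works as long as the two agree on it.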


As the main aim of this module is **Audio Activity Detection**,

The `AudioDataSource` class inherits from `DataSource` and supplies
a higher abstraction level than `AudioSource` thanks to a bunch of
handy features:

- Define a fixed-length block_size (i.e. analysis window)
- Allow overlap between two consecutive analysis windows (hop_size < block_size). This can be very important if your validator uses the **spectral** information of audio data instead of raw audio samples.
- Limit the amount (i.e. duration) of read data (very useful when reading data from the microphone)
- Record and rewind data (also useful if you read data from the microphone and you want to process it many times offline and/or save it)
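
To illustrate the first two features, here is a small pure-Python sketch of overlapping analysis windows. This is illustrative code, not auditok's implementation; the function name and the 50% overlap are arbitrary choices:

```python
def sliding_windows(samples, block_size, hop_size):
    """Yield fixed-length analysis windows; hop_size < block_size => overlap."""
    windows = []
    start = 0
    while start + block_size <= len(samples):
        windows.append(samples[start:start + block_size])
        start += hop_size
    return windows


samples = list(range(10))
# block_size=4, hop_size=2 -> consecutive windows share half their samples
print(sliding_windows(samples, block_size=4, hop_size=2))
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With `hop_size == block_size` the same function yields non-overlapping, back-to-back windows.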


Last but not least, the current version has only one audio window validator based on
signal energy.
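
The idea behind such a validator can be sketched as follows. The log-energy formula, the threshold, and the sample values below are illustrative assumptions; they are not necessarily the exact computation `AudioEnergyValidator` performs:

```python
import math


def log_energy(samples):
    """Log energy (in dB-like units) of one window of audio samples."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square <= 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


class EnergyValidator:
    """Validator-like object: a window is 'signal' if its energy passes a threshold."""
    def __init__(self, energy_threshold):
        self.energy_threshold = energy_threshold

    def is_valid(self, window):
        return log_energy(window) >= self.energy_threshold


validator = EnergyValidator(energy_threshold=50)
quiet = [10] * 100    # low-amplitude window: 10*log10(100) = 20 < 50
loud = [2000] * 100   # high-amplitude window: 10*log10(4e6) ~ 66 >= 50
print(validator.is_valid(quiet), validator.is_valid(loud))  # -> False True
```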



Extract sub-sequences of consecutive upper case letters
-------------------------------------------------------

We want to extract sub-sequences of characters that have:

- A minimum length of 1 (`min_length` = 1)
- A maximum length of 9999 (`max_length` = 9999)
- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)

    tokenizer.tokenize(dsource)

The output is a list of two tuples, each of which contains the extracted sub-sequence and its
start and end position in the original sequence respectively:

.. code:: python

    [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]

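
Part of the example is elided here, but its behavior for `max_continuous_silence` = 0 can be mimicked by a short pure-Python function. The input string ``"aaaABCDEFbbGHIJKccc"`` is an assumption chosen to be consistent with the output above:

```python
def extract_upper_runs(text):
    """Mimic tokenization with max_continuous_silence=0:
    return maximal runs of upper case letters with start/end positions."""
    tokens = []
    start = None
    for i, ch in enumerate(text):
        if ch.isupper():
            if start is None:
                start = i
        elif start is not None:
            tokens.append((list(text[start:i]), start, i - 1))
            start = None
    if start is not None:  # run reaching the end of the sequence
        tokens.append((list(text[start:]), start, len(text) - 1))
    return tokens


print(extract_upper_runs("aaaABCDEFbbGHIJKccc"))
# -> [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]
```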

Tolerate up to two non-valid (lower case) letters within an extracted sequence
------------------------------------------------------------------------------

To do so, we set `max_continuous_silence` = 2:

.. code:: python


    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999, max_continuous_silence=2)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]

Notice the trailing lower case letters "dd" and "ee" at the end of the two
tokens. The default behavior of `StreamTokenizer` is to keep the *trailing
silence* if it doesn't exceed `max_continuous_silence`. This can be changed
using the `DROP_TAILING_SILENCE` mode (see next example).

Remove trailing silence
-----------------------

Trailing silence can be useful for many sound recognition applications, including
speech recognition. Moreover, from the human auditory system point of view, trailing
low energy signal helps avoid abrupt signal cuts.

If you want to remove it anyway, you can do it by setting `mode` to `StreamTokenizer.DROP_TAILING_SILENCE`:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999, max_continuous_silence=2,
                                mode=StreamTokenizer.DROP_TAILING_SILENCE)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]

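
The silence-tolerance logic behind the last two runs can be made explicit with a toy pure-Python tokenizer (not auditok's implementation; `min_length` and `max_length` are ignored for brevity). It reproduces both outputs above for the same input string:

```python
def tokenize(text, max_continuous_silence=2, drop_trailing_silence=False):
    """Toy tokenizer: upper case = signal, lower case = silence."""
    tokens, token, start, silence = [], [], None, 0

    def close():
        nonlocal token, start, silence
        if token:
            kept = token[:len(token) - silence] if drop_trailing_silence else token
            tokens.append((kept, start, start + len(kept) - 1))
        token, start, silence = [], None, 0

    for i, ch in enumerate(text):
        if ch.isupper():
            if start is None:
                start = i
            token.append(ch)
            silence = 0
        elif start is not None:
            if silence < max_continuous_silence:
                token.append(ch)  # tentatively keep this silent frame
                silence += 1
            else:
                # tolerance exceeded: end the token before this extra silence
                close()
    close()
    return tokens


text = "aaaABCDbbEFcGHIdddJKee"
print(tokenize(text))                               # first output above
print(tokenize(text, drop_trailing_silence=True))   # second output above
```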


Limit the length of detected tokens
-----------------------------------


Imagine that you just want to detect and recognize a small part of a long
acoustic event (e.g. engine noise, water flow, etc.) and avoid having that
event hog the tokenizer, preventing it from feeding the event to the next
processing step (i.e. a sound recognizer). You can do this by:
    tokenizer.tokenize(dsource, callback=print_token)


output:

.. code:: python

    "token = 'ABCDE', starts at 3, ends at 7"
    "token = 'FGHIJ', starts at 8, ends at 12"
    "token = 'K', starts at 13, ends at 13"
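
The effect of `max_length` = 5 can likewise be mimicked in a few lines. The input string ``"aaaABCDEFGHIJKbbb"`` is an assumed reconstruction consistent with the printed tokens:

```python
def tokenize_limited(text, max_length=5):
    """Toy tokenizer: split runs of upper case letters into chunks of max_length."""
    results = []
    i = 0
    while i < len(text):
        if text[i].isupper():
            start = i
            while i < len(text) and text[i].isupper():
                i += 1
            run = text[start:i]
            # emit the run in chunks of at most max_length characters
            for off in range(0, len(run), max_length):
                chunk = run[off:off + max_length]
                results.append("token = '{0}', starts at {1}, ends at {2}".format(
                    chunk, start + off, start + off + len(chunk) - 1))
        else:
            i += 1
    return results


for line in tokenize_limited("aaaABCDEFGHIJKbbb"):
    print(line)
# token = 'ABCDE', starts at 3, ends at 7
# token = 'FGHIJ', starts at 8, ends at 12
# token = 'K', starts at 13, ends at 13
```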


Using real audio data
=====================

    max_continuous_silence = int(300. / analysis_window_ms)

    # Which is the same (up to floating-point rounding) as
    max_continuous_silence = int(0.3 / (analysis_window_ms / 1000.))

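
For instance, assuming a hypothetical 10 ms analysis window, tolerating 300 ms of continuous silence comes down to 30 analysis windows. Note that the seconds-based formula is equivalent on paper but can land just under the integer because of floating-point rounding, so `round` is safer than `int` there:

```python
analysis_window_ms = 10  # hypothetical: 10 ms analysis windows

# 300 ms of tolerated silence expressed as a number of analysis windows
max_continuous_silence = int(300. / analysis_window_ms)
print(max_continuous_silence)  # -> 30

# Starting from durations in seconds: 0.3 / 0.01 evaluates slightly
# below 30 in IEEE floats, so int() would truncate it to 29
print(round(0.3 / (analysis_window_ms / 1000.)))  # -> 30
```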

Extract isolated phrases from an utterance
------------------------------------------

We will build an `AudioDataSource` using a wave file from the database.
        player.play(data)

    assert len(tokens) == 6


Trim leading and trailing silence
---------------------------------

The tokenizer in the following example is set up to remove the silence
that precedes the first acoustic activity or follows the last activity
in a record. It preserves whatever it finds between the two activities.
In other words, it removes the leading and trailing silence.

Sampling rate is 44100 samples per second; we'll use an analysis window of 100 ms
(i.e. block_size == 4410).

Energy threshold is 50.

The tokenizer will start accumulating windows from the moment it encounters
the first analysis window with an energy >= 50. ALL the following windows will be
kept regardless of their energy. At the end of the analysis, it will drop trailing
windows with an energy below 50.

This is an interesting example because the audio file we're analyzing contains a very
brief noise that occurs within the leading silence. We certainly don't want our tokenizer
to be triggered at this point and to consider whatever comes after it as a useful signal.
Again we can deal with this situation by using a higher energy threshold (55 for example).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # record = True so that we'll be able to rewind the source.
    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_tail_silence,
                             record=True, block_size=4410)
    asource.open()

    original_signal = []
    # Read the whole signal
430 # Create a validator with an energy threshold of 50 | 436 # Create a validator with an energy threshold of 50 |
431 validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50) | 437 validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50) |
432 | 438 |
433 # Create a tokenizer with an unlimited token length and continuous silence within a token | 439 # Create a tokenizer with an unlimited token length and continuous silence within a token |
434 # Note the DROP_TRAILING_SILENCE mode that will ensure removing trailing silence | 440 # Note the DROP_TAILING_SILENCE mode that will ensure removing tailing silence |
435 trimmer = StreamTokenizer(validator, min_length = 20, max_length=99999999, init_min=3, init_max_silence=1, max_continuous_silence=9999999, mode=StreamTokenizer.DROP_TRAILING_SILENCE) | 441 trimmer = StreamTokenizer(validator, min_length = 20, max_length=99999999, init_min=3, init_max_silence=1, max_continuous_silence=9999999, mode=StreamTokenizer.DROP_TAILING_SILENCE) |
436 | 442 |
437 | 443 |
438 tokens = trimmer.tokenize(asource) | 444 tokens = trimmer.tokenize(asource) |
439 | 445 |
440 # Make sure we only have one token | 446 # Make sure we only have one token |

    trimmed_signal = ''.join(tokens[0][0])

    player = player_for(asource)

    print("Playing original signal (with leading and trailing silence)...")
    player.play(original_signal)
    print("Playing trimmed signal...")
    player.play(trimmed_signal)


an over detection (the echo method prints a detection where you have made no noise).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

    # record = True so that we'll be able to rewind the source.
    # max_time = 10: read 10 seconds from the microphone
    asource = ADSFactory.ads(record=True, max_time=10)

Contributing
============
**auditok** is on `GitHub <https://github.com/amsehili/auditok>`_. You're welcome to fork it and contribute.


Amine SEHILI <amine.sehili@gmail.com>
September 2015

License
=======
