comparison quickstart.rst @ 23:2beb3fb562f3
doc update
author | Amine Sehili <amine.sehili@gmail.com> |
---|---|
date | Sun, 29 Nov 2015 11:52:56 +0100 |
parents | 6b2cc3ca5b6a |
children |
22:aceb9bc3d74e | 23:2beb3fb562f3 |
---|---|
140 | 140 |
141 .. code:: python | 141 .. code:: python |
142 | 142 |
143 [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)] | 143 [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)] |
144 | 144 |
145 Notice the tailing lower case letters "dd" and "ee" at the end of the two | 145 Notice the trailing lower case letters "dd" and "ee" at the end of the two |
146 tokens. The default behavior of `StreamTokenizer` is to keep the *tailing | 146 tokens. The default behavior of `StreamTokenizer` is to keep the *trailing |
147 silence* if it doesn't exceed `max_continuous_silence`. This can be changed | 147 silence* if it doesn't exceed `max_continuous_silence`. This can be changed |
148 using the `DROP_TAILING_SILENCE` mode (see next example). | 148 using the `DROP_TRAILING_SILENCE` mode (see next example). |
149 | 149 |
150 Remove tailing silence | 150 Remove trailing silence |
151 ----------------------- | 151 ----------------------- |
152 | 152 |
153 Tailing silence can be useful for many sound recognition applications, including | 153 Trailing silence can be useful for many sound recognition applications, including |
154 speech recognition. Moreover, from the human auditory system point of view, tailing | 154 speech recognition. Moreover, from the human auditory system point of view, trailing |
155 low energy signal helps removing abrupt signal cuts. | 155 low energy signal helps removing abrupt signal cuts. |
156 | 156 |
157 If you want to remove it anyway, you can do it by setting `mode` to `StreamTokenizer.DROP_TAILING_SILENCE`: | 157 If you want to remove it anyway, you can do it by setting `mode` to `StreamTokenizer.DROP_TRAILING_SILENCE`: |
158 | 158 |
159 .. code:: python | 159 .. code:: python |
160 | 160 |
161 from auditok import StreamTokenizer, StringDataSource, DataValidator | 161 from auditok import StreamTokenizer, StringDataSource, DataValidator |
162 | 162 |
165 return frame.isupper() | 165 return frame.isupper() |
166 | 166 |
167 dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee") | 167 dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee") |
168 tokenizer = StreamTokenizer(validator=UpperCaseChecker(), | 168 tokenizer = StreamTokenizer(validator=UpperCaseChecker(), |
169 min_length=1, max_length=9999, max_continuous_silence=2, | 169 min_length=1, max_length=9999, max_continuous_silence=2, |
170 mode=StreamTokenizer.DROP_TAILING_SILENCE) | 170 mode=StreamTokenizer.DROP_TRAILING_SILENCE) |
171 | 171 |
172 tokenizer.tokenize(dsource) | 172 tokenizer.tokenize(dsource) |
173 | 173 |
174 output: | 174 output: |
175 | 175 |
374 player.play(data) | 374 player.play(data) |
375 | 375 |
376 assert len(tokens) == 6 | 376 assert len(tokens) == 6 |
377 | 377 |
378 | 378 |
379 Trim leading and tailing silence | 379 Trim leading and trailing silence |
380 --------------------------------- | 380 --------------------------------- |
381 | 381 |
382 The tokenizer in the following example is set up to remove the silence | 382 The tokenizer in the following example is set up to remove the silence |
383 that precedes the first acoustic activity or follows the last activity | 383 that precedes the first acoustic activity or follows the last activity |
384 in a record. It preserves whatever it founds between the two activities. | 384 in a record. It preserves whatever it founds between the two activities. |
385 In other words, it removes the leading and tailing silence. | 385 In other words, it removes the leading and trailing silence. |
386 | 386 |
387 Sampling rate is 44100 sample per second, we'll use an analysis window of 100 ms | 387 Sampling rate is 44100 sample per second, we'll use an analysis window of 100 ms |
388 (i.e. block_size == 4410) | 388 (i.e. block_size == 4410) |
389 | 389 |
390 Energy threshold is 50. | 390 Energy threshold is 50. |
391 | 391 |
392 The tokenizer will start accumulating windows up from the moment it encounters | 392 The tokenizer will start accumulating windows up from the moment it encounters |
393 the first analysis window of an energy >= 50. ALL the following windows will be | 393 the first analysis window of an energy >= 50. ALL the following windows will be |
394 kept regardless of their energy. At the end of the analysis, it will drop tailing | 394 kept regardless of their energy. At the end of the analysis, it will drop trailing |
395 windows with an energy below 50. | 395 windows with an energy below 50. |
396 | 396 |
397 This is an interesting example because the audio file we're analyzing contains a very | 397 This is an interesting example because the audio file we're analyzing contains a very |
398 brief noise that occurs within the leading silence. We certainly do want our tokenizer | 398 brief noise that occurs within the leading silence. We certainly do want our tokenizer |
399 to stop at this point and considers whatever it comes after as a useful signal. | 399 to stop at this point and considers whatever it comes after as a useful signal. |
414 .. code:: python | 414 .. code:: python |
415 | 415 |
416 from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset | 416 from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset |
417 | 417 |
418 # record = True so that we'll be able to rewind the source. | 418 # record = True so that we'll be able to rewind the source. |
419 asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_tail_silence, | 419 asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence, |
420 record=True, block_size=4410) | 420 record=True, block_size=4410) |
421 asource.open() | 421 asource.open() |
422 | 422 |
423 original_signal = [] | 423 original_signal = [] |
424 # Read the whole signal | 424 # Read the whole signal |
435 | 435 |
436 # Create a validator with an energy threshold of 50 | 436 # Create a validator with an energy threshold of 50 |
437 validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50) | 437 validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50) |
438 | 438 |
439 # Create a tokenizer with an unlimited token length and continuous silence within a token | 439 # Create a tokenizer with an unlimited token length and continuous silence within a token |
440 # Note the DROP_TAILING_SILENCE mode that will ensure removing tailing silence | 440 # Note the DROP_TRAILING_SILENCE mode that will ensure removing trailing silence |
441 trimmer = StreamTokenizer(validator, min_length = 20, max_length=99999999, init_min=3, init_max_silence=1, max_continuous_silence=9999999, mode=StreamTokenizer.DROP_TAILING_SILENCE) | 441 trimmer = StreamTokenizer(validator, min_length = 20, max_length=99999999, init_min=3, init_max_silence=1, max_continuous_silence=9999999, mode=StreamTokenizer.DROP_TRAILING_SILENCE) |
442 | 442 |
443 | 443 |
444 tokens = trimmer.tokenize(asource) | 444 tokens = trimmer.tokenize(asource) |
445 | 445 |
446 # Make sure we only have one token | 446 # Make sure we only have one token |
448 | 448 |
449 trimmed_signal = ''.join(tokens[0][0]) | 449 trimmed_signal = ''.join(tokens[0][0]) |
450 | 450 |
451 player = player_for(asource) | 451 player = player_for(asource) |
452 | 452 |
453 print("Playing original signal (with leading and tailing silence)...") | 453 print("Playing original signal (with leading and trailing silence)...") |
454 player.play(original_signal) | 454 player.play(original_signal) |
455 print("Playing trimmed signal...") | 455 print("Playing trimmed signal...") |
456 player.play(trimmed_signal) | 456 player.play(trimmed_signal) |
457 | 457 |
458 | 458 |
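
The renamed `DROP_TRAILING_SILENCE` mode is easier to follow with the string example from the changed section in hand. Below is a minimal pure-Python sketch of the described behavior. It is not auditok's implementation: the `tokenize` function and its `drop_trailing_silence` flag are hypothetical names used only for illustration, and the sketch ignores `min_length`, `max_length`, `init_min` and `init_max_silence`. A frame is treated as valid when it is an upper case letter, as in the documented `UpperCaseChecker`.

```python
def tokenize(frames, is_valid, max_continuous_silence, drop_trailing_silence=False):
    """Simplified illustration of the tokenization described in quickstart.rst.

    Returns a list of (data, start, end) tuples with inclusive indices.
    Hypothetical helper, not part of auditok.
    """
    tokens = []
    current, start, silence = [], None, 0

    def flush():
        # Close the token being built, optionally dropping its trailing silence.
        nonlocal current, start, silence
        if current:
            if drop_trailing_silence:
                data = current[:len(current) - silence]
            else:
                data = current
            tokens.append((data, start, start + len(data) - 1))
        current, start, silence = [], None, 0

    for i, frame in enumerate(frames):
        if is_valid(frame):
            if not current:
                start = i  # first valid frame opens a new token
            current.append(frame)
            silence = 0
        elif current:
            if silence < max_continuous_silence:
                # Tolerated silence stays inside the token for now.
                current.append(frame)
                silence += 1
            else:
                # Too much continuous silence: the token ends here.
                flush()
    flush()  # end of stream
    return tokens


data = "aaaABCDbbEFcGHIdddJKee"
kept = tokenize(data, str.isupper, max_continuous_silence=2)
dropped = tokenize(data, str.isupper, max_continuous_silence=2,
                   drop_trailing_silence=True)

print(kept)     # tokens end with "dd" and "ee", as in the output at line 143
print(dropped)  # same tokens without the trailing lower case letters
```

With the flag off, the sketch reproduces the two tokens shown at line 143 of the file (indices 3 to 16 and 18 to 21). With the flag on, the same tokens end at indices 14 and 19, because the trailing "dd" and "ee" are removed.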