comparison demos/audio_trim_demo.py @ 2:edee860b9f61
First release on Github
author | Amine Sehili <amine.sehili@gmail.com> |
---|---|
date | Thu, 17 Sep 2015 22:01:30 +0200 |
parents | |
children | 364eeb8e8bd2 |
1:78ba0ead5f9f | 2:edee860b9f61 |
---|---|
1 """ | |
2 @author: Amine SEHILI <amine.sehili@gmail.com> | |
3 September, 2015 | |
4 """ | |
5 | |
6 # Trim leading and trailing silence from a recording | |
7 | |
8 from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset | |
9 import pyaudio | |
10 | |
11 """ | |
12 The tokenizer in the following example is set up to remove the silence | |
13 that precedes the first acoustic activity or follows the last activity | |
14 in a recording. It preserves whatever it finds between the two activities. | |
15 In other words, it removes the leading and trailing silence. | |
16 | |
17 Sampling rate is 44100 samples per second; we'll use an analysis window of 100 ms | |
18 (i.e. block_size == 4410) | |
19 | |
20 Energy threshold is 50. | |
21 | |
22 The tokenizer will start accumulating windows from the moment it encounters | |
23 the first analysis window with an energy >= 50. ALL the following windows will be | |
24 kept regardless of their energy. At the end of the analysis, it will drop trailing | |
25 windows with an energy below 50. | |
26 | |
27 This is an interesting example because the audio file we're analyzing contains a very | |
28 brief noise that occurs within the leading silence. We certainly do not want our tokenizer | |
29 to stop at this point and consider whatever comes after it as useful signal. | |
30 To force the tokenizer to ignore that brief event we use two other parameters, `init_min` | |
31 and `init_max_silence`. With `init_min`=3 and `init_max_silence`=1 we tell the tokenizer | |
32 that a valid event must start with at least 3 noisy windows, between which there | |
33 is at most 1 silent window. | |
34 | |
35 Even with this configuration, the tokenizer can still detect that noise as a valid event | |
36 (if it actually contains 3 consecutive noisy frames). To circumvent this, we use a large | |
37 enough analysis window (here 100 ms) to ensure that the brief noise is surrounded by a much | |
38 longer silence, so that the energy of the overall analysis window stays below 50. | |
39 | |
40 When using a shorter analysis window (10 ms for instance, block_size == 441), the brief | |
41 noise contributes more to the energy calculation, which yields an energy above 50 for the window. | |
42 Again, we can deal with this situation by using a higher energy threshold (55 for example). | |
43 | |
44 """ | |
45 | |
46 | |
47 # record = True so that we'll be able to rewind the source. | |
48 asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence, | |
49 record=True, block_size=4410) | |
50 asource.open() | |
51 | |
52 original_signal = [] | |
53 # Read the whole signal | |
54 while True: | |
55     w = asource.read() | |
56     if w is None: | |
57         break | |
58     original_signal.append(w) | |
59 | |
60 original_signal = b''.join(original_signal) | |
61 | |
62 | |
63 # rewind source | |
64 asource.rewind() | |
65 | |
66 # Create a validator with an energy threshold of 50 | |
67 validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50) | |
68 | |
69 # Create a tokenizer with an unlimited token length and continuous silence within a token | |
70 # Note the DROP_TRAILING_SILENCE mode that will ensure removing trailing silence | |
71 trimmer = StreamTokenizer(validator, min_length=20, max_length=99999999, | |
72                           max_continuous_silence=9999999, mode=StreamTokenizer.DROP_TRAILING_SILENCE, init_min=3, init_max_silence=1) | |
73 | |
74 | |
75 tokens = trimmer.tokenize(asource) | |
76 | |
77 # Make sure we only have one token | |
78 assert len(tokens) == 1, "Should have detected one single token" | |
79 | |
80 trimmed_signal = b''.join(tokens[0][0]) | |
81 | |
82 player = player_for(asource) | |
83 | |
84 print("\n ** Playing original signal (with leading and trailing silence)...") | |
85 player.play(original_signal) | |
86 print("\n ** Playing trimmed signal...") | |
87 player.play(trimmed_signal) | |
88 | |
89 player.stop() | |
90 asource.close() |
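
The docstring's argument about window size can be checked on synthetic data. The sketch below does not use auditok. It assumes that the energy of a window is 10 * log10 of its mean squared amplitude, which is an assumption about how `AudioEnergyValidator` measures energy and not something stated in the demo. The helper names, the amplitudes, and the 5 ms click are hypothetical values chosen only for illustration.

```python
import math

SAMPLING_RATE = 44100  # samples per second, as in the demo


def block_size_for(window_ms, sampling_rate=SAMPLING_RATE):
    """Number of samples in an analysis window of `window_ms` milliseconds."""
    return sampling_rate * window_ms // 1000


def log_energy(samples):
    """Assumed energy measure: 10 * log10(mean squared amplitude)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square <= 0:
        return -200.0  # arbitrary floor for an all-zero window
    return 10.0 * math.log10(mean_square)


def window_energies(signal, block_size):
    """Energy of each consecutive, non-overlapping analysis window."""
    return [log_energy(signal[i:i + block_size])
            for i in range(0, len(signal), block_size)]


# Hypothetical signal: 1 s of low-level background (amplitude 10)
# containing a single 5 ms click (amplitude 700) in the middle.
signal = [10] * SAMPLING_RATE
click_start = SAMPLING_RATE // 2
for i in range(click_start, click_start + block_size_for(5)):
    signal[i] = 700

long_windows = window_energies(signal, block_size_for(100))   # block_size == 4410
short_windows = window_energies(signal, block_size_for(10))   # block_size == 441

# With 100 ms windows the click is diluted by the surrounding silence;
# with 10 ms windows it dominates the window that contains it.
print("max energy, 100 ms windows: %.1f" % max(long_windows))
print("max energy,  10 ms windows: %.1f" % max(short_windows))
```

With these made-up values, the loudest 100 ms window stays below the threshold of 50, the loudest 10 ms window exceeds it, and a threshold of 55 rejects the click in both cases. Real recordings will give different numbers, so the window size and threshold still have to be tuned per file.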