doc/apitutorial.rst @ 331:9741b52f194a: Reformat code and documentation

:author: Amine Sehili <amine.sehili@gmail.com>
:date: Thu, 24 Oct 2019 20:49:51 +0200
:parents: 9e9c6b1a25b1
.. contents:: `Contents`
   :depth: 3


**auditok** is a module that can be used as a generic tool for data
tokenization. Although its core motivation is **Acoustic Activity
Detection** (AAD) and extraction from audio streams (i.e. detecting
where a noise or other acoustic activity occurs within an audio stream and
extracting the corresponding portion of the signal), it can easily be
adapted to other tasks.

audio window (of 10, 20 or 100 milliseconds for instance) if what
interests you are audio regions made up of a sequence of "noisy"
windows (whatever the kind of noise: speech, baby cry, laughter, etc.).

The most important component of `auditok` is the :class:`auditok.core.StreamTokenizer`
class. An instance of this class encapsulates a :class:`auditok.util.DataValidator` and can be
configured to detect the desired regions from a stream.
The :func:`auditok.core.StreamTokenizer.tokenize` method accepts a :class:`auditok.util.DataSource`
object that has a `read` method. Read data can be of any type accepted
by the `validator`.

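The contract described above is small: the data source exposes a `read` method that returns the next frame, or `None` when the stream is exhausted, and the validator decides whether a given frame is valid. As a rough sketch of that contract in plain Python (the class names here are made up for illustration and are not part of auditok):

.. code:: python

    class ListDataSource:
        """Hypothetical stand-in for a DataSource: yields frames one by one."""
        def __init__(self, frames):
            self._frames = iter(frames)

        def read(self):
            # Return the next frame, or None when there is no more data
            return next(self._frames, None)


    class PositiveChecker:
        """Hypothetical stand-in for a DataValidator over numeric frames."""
        def is_valid(self, frame):
            return frame > 0


    source = ListDataSource([3, -1, 4])
    checker = PositiveChecker()

    results = []
    while True:
        frame = source.read()
        if frame is None:
            break
        results.append(checker.is_valid(frame))

A real tokenizer consumes such a stream frame by frame and groups runs of valid frames into tokens.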
(a class that implements :class:`auditok.util.DataSource`) object, be that from:

- A file on the disk
- A buffer of data
- The built-in microphone (requires PyAudio)


The :class:`auditok.util.ADSFactory.AudioDataSource` class inherits from
:class:`auditok.util.DataSource` and supplies a higher abstraction level
than :class:`auditok.io.AudioSource` thanks to a bunch of handy features:

  (if one of the `hop_size` / `hs` or `hop_dur` / `hd` keywords is used and is > 0 and < `block_size`).
  This can be very important if your validator uses the **spectral** information of audio data
  instead of raw audio samples.
- Limit the amount (i.e. duration) of read data (if keyword `max_time` or `mt` is used; very useful when reading data from the microphone)
- Record all read data and rewind if necessary (if keyword `record` or `rec` is used; also useful if you read data from the microphone and
  you want to process it many times off-line and/or save it)

See :class:`auditok.util.ADSFactory` documentation for more information.

Last but not least, the current version has only one audio window validator, based on
signal energy (:class:`auditok.util.AudioEnergyValidator`).

**********************************
Illustrative examples with strings
**********************************

Let us look at some examples using the :class:`auditok.util.StringDataSource` class
created for test and illustration purposes. Imagine that each character of
:class:`auditok.util.StringDataSource` data represents an audio slice of 100 ms, for
example. In the following examples we will use upper case letters to represent
noisy audio slices (i.e. analysis windows or frames) and lower case letters for
silent frames.

Extract sub-sequences of consecutive upper case letters
#######################################################


We want to extract sub-sequences of characters that have:

- A minimum length of 1 (`min_length` = 1)
- A maximum length of 9999 (`max_length` = 9999)
- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)

We also create an `UpperCaseChecker` validator whose `is_valid` method returns `True` if the
checked character is upper case and `False` otherwise.

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFbbGHIJKccc")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=0)

    tokenizer.tokenize(dsource)

The output is a list of two tuples, each of which contains the extracted sub-sequence and its
start and end position in the original sequence, respectively:


.. code:: python


    [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]


Tolerate up to two non-valid (lower case) letters within an extracted sequence
##############################################################################

To do so, we set `max_continuous_silence` = 2:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2)

    tokenizer.tokenize(dsource)


output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]

Notice the trailing lower case letters "dd" and "ee" at the end of the two
tokens. The default behavior of :class:`auditok.core.StreamTokenizer` is to keep the *trailing
silence* if it does not exceed `max_continuous_silence`. This can be changed
using the `StreamTokenizer.DROP_TRAILING_SILENCE` mode (see next example).

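Conceptually, dropping trailing silence amounts to stripping the trailing run of non-valid frames from each token before delivering it. A toy, auditok-free sketch of that post-processing step (the helper name is ours, not part of the library):

.. code:: python

    def drop_trailing_silence(token, is_valid):
        """Strip the trailing run of non-valid frames from a token."""
        end = len(token)
        while end > 0 and not is_valid(token[end - 1]):
            end -= 1
        return token[:end]


    # Second token from the example above: it ends with two silent frames
    trimmed = drop_trailing_silence(['J', 'K', 'e', 'e'], str.isupper)

Only the trailing run is affected; non-valid frames in the middle of a token are left in place.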
If you want to remove it anyway, you can do so by setting `mode` to `StreamTokenizer.DROP_TRAILING_SILENCE`:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2,
                                mode=StreamTokenizer.DROP_TRAILING_SILENCE)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]


Limit the length of detected tokens
###################################


Imagine that you just want to detect and recognize a small part of a long
acoustic event (e.g. engine noise, water flow, etc.) and to prevent that
event from hogging the tokenizer, keeping it from feeding the event to the next
processing step (i.e. a sound recognizer). You can do this by:

- limiting the length of a detected token

and

- using a callback function as an argument to :class:`auditok.core.StreamTokenizer.tokenize`
  so that the tokenizer delivers a token as soon as it is detected.

The following code limits the length of a token to 5:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFGHIJKbbb")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=5,
                                max_continuous_silence=0)

    def print_token(data, start, end):
        print("token = '{0}', starts at {1}, ends at {2}".format(''.join(data), start, end))

    tokenizer.tokenize(dsource, callback=print_token)


output:

.. code:: python

    token = 'ABCDE', starts at 3, ends at 7
    token = 'FGHIJ', starts at 8, ends at 12
    token = 'K', starts at 13, ends at 13

************************
`auditok` and Audio Data
************************

In the rest of this document we will use :class:`auditok.util.ADSFactory`, :class:`auditok.util.AudioEnergyValidator`
and :class:`auditok.core.StreamTokenizer` for Audio Activity Detection demos using audio data. Before we get any
further it is worth explaining a certain number of points.


The :func:`auditok.util.ADSFactory.ads` method is used to create an :class:`auditok.util.ADSFactory.AudioDataSource`
object either from a wave file, the built-in microphone or a user-supplied data buffer. Refer to the API reference
for more information and examples on :func:`ADSFactory.ads` and :class:`AudioDataSource`.

The created :class:`AudioDataSource` object is then passed to :func:`StreamTokenizer.tokenize` for tokenization.

:func:`auditok.util.ADSFactory.ads` accepts a number of keyword arguments, none of which is mandatory.
The features and behavior of the returned :class:`AudioDataSource` object can however differ greatly
depending on the passed arguments. Further details can be found in the respective method documentation.

Note, however, the following two calls, which create an :class:`AudioDataSource`
that reads data from an audio file and from the built-in microphone, respectively.

.. code:: python

    from auditok import ADSFactory

    # Get an AudioDataSource from a file
    # (use the 'filename' keyword argument, alias 'fn')
    file_ads = ADSFactory.ads(filename="path/to/file/")

    # Get an AudioDataSource from the built-in microphone.
    # The returned object has the default values for sampling
    # rate, sample width and number of channels. See the method's
    # documentation for customized values.
    mic_ads = ADSFactory.ads()

For :class:`StreamTokenizer`, the parameters `min_length`, `max_length` and `max_continuous_silence`
are expressed in terms of number of frames. Each call to :func:`AudioDataSource.read` returns
one frame of data or None.

If you want a `max_length` of 2 seconds for your detected sound events and your *analysis window*
is *10 ms* long, you have to specify a `max_length` of 200 (`int(2. / (10. / 1000)) == 200`).
For a `max_continuous_silence` of *300 ms*, for instance, the value to pass to StreamTokenizer is 30
(`int(0.3 / (10. / 1000)) == 30`).

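The duration-to-frames conversion above is easy to wrap in a small helper (the function name is ours, not part of the auditok API):

.. code:: python

    def seconds_to_frames(duration, analysis_window):
        """Convert a duration in seconds into a number of analysis windows."""
        return int(duration / analysis_window)


    window = 10. / 1000  # a 10 ms analysis window

    max_length = seconds_to_frames(2., window)               # 2 s    -> 200 frames
    max_continuous_silence = seconds_to_frames(0.3, window)  # 300 ms -> 30 frames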
Each time :class:`StreamTokenizer` calls the (argument-free) :func:`read` method of an
:class:`AudioDataSource` object, it gets the same amount of data, except when there is no more
data (the call returns what's left in the stream, or None).

This fixed-length amount of data is referred to here as the **analysis window** and is a parameter of
the :func:`ADSFactory.ads` method. By default :func:`ADSFactory.ads` uses an analysis window of 10 ms.

The number of samples that 10 ms of audio data contain will vary depending on the sampling
rate of your audio source/data (file, microphone, etc.).
For a sampling rate of 16 kHz (16000 samples per second), we have 160 samples for 10 ms.

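In code, the sample count for one analysis window is just the sampling rate times the window duration; this mirrors the arithmetic above (the helper name is ours):

.. code:: python

    def samples_per_window(sampling_rate, window_dur):
        """Number of audio samples contained in one analysis window."""
        return int(sampling_rate * window_dur)


    n_16k = samples_per_window(16000, 10. / 1000)   # 10 ms at 16 kHz   -> 160 samples
    n_44k = samples_per_window(44100, 100. / 1000)  # 100 ms at 44.1 kHz -> 4410 samples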
You can use the `block_size` keyword (alias `bs`) to define your analysis window:

.. code:: python

    from auditok import ADSFactory

    '''
    Assume you have an audio file with a sampling rate of 16000
    '''

    # file_ads.read() will return blocks of 160 samples
    file_ads = ADSFactory.ads(filename="path/to/file/", block_size=160)

    # file_ads.read() will return blocks of 320 samples
    file_ads = ADSFactory.ads(filename="path/to/file/", bs=320)


Fortunately, you can also specify the size of your analysis window in seconds, thanks to the keyword `block_dur`
(alias `bd`):

.. code:: python

    from auditok import ADSFactory

    # use an analysis window of 20 ms
    file_ads = ADSFactory.ads(filename="path/to/file/", bd=0.02)

For :class:`StreamTokenizer`, each :func:`read` call that does not return `None` is treated as one processing
frame. :class:`StreamTokenizer` has no way to figure out the temporal length of that frame (why should it?). So to
correctly initialize your :class:`StreamTokenizer` based on your analysis window duration, use something like:

.. code:: python

    analysis_win_seconds = 0.01  # 10 ms
    my_ads = ADSFactory.ads(block_dur=analysis_win_seconds)
    analysis_window_ms = analysis_win_seconds * 1000

    # If you want your maximum continuous silence to be 300 ms, use:
    max_continuous_silence = int(300. / analysis_window_ms)

    # which is the same as:
    max_continuous_silence = int(0.3 / (analysis_window_ms / 1000))

    # or simply:
    max_continuous_silence = 30


******************************
Examples using real audio data
******************************


Extract isolated phrases from an utterance
##########################################

We will build an :class:`auditok.util.ADSFactory.AudioDataSource` using a wave file from
the database. The file contains isolated pronunciations of digits from 1 to 6,
in Arabic, as well as breath-in/out between 2 and 3. The code will play the
original file then the detected sounds separately. Note that we use an
`energy_threshold` of 65; this parameter should be carefully chosen. It depends
on microphone quality, background noise and the amplitude of the events you want to
detect.

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # We set the `record` argument to True so that we can rewind the source
    asource = ADSFactory.ads(filename=dataset.one_to_six_arabic_16000_mono_bc_noise, record=True)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=65)

    # Default analysis window is 10 ms (float(asource.get_block_size()) / asource.get_sampling_rate())
    # min_length=20 : minimum length of a valid audio activity is 20 * 10 == 200 ms
    # max_length=400 : maximum length of a valid audio activity is 400 * 10 == 4000 ms == 4 seconds
    # max_continuous_silence=30 : maximum length of a tolerated silence within a valid audio activity is 30 * 10 == 300 ms
    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=400, max_continuous_silence=30)

    asource.open()
    tokens = tokenizer.tokenize(asource)

    # Play detected regions back

    player = player_for(asource)

    # Rewind and read the whole signal
    asource.rewind()
    original_signal = []

    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    print("Playing the original file...")
    player.play(original_signal)

    print("Playing detected regions...")
    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 8

The tokenizer extracts 8 audio regions from the signal, including all isolated digits
(from 1 to 6) as well as the 2-phase respiration of the subject. You might have noticed
that, in the original file, the last three digits are closer to each other than the
previous ones. If you want them to be extracted as one single phrase, you can do so
by tolerating a larger continuous silence within a detection:

.. code:: python

    tokenizer.max_continuous_silence = 50
    asource.rewind()
    tokens = tokenizer.tokenize(asource)

    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 6

Trim leading and trailing silence
#################################

The tokenizer in the following example is set up to remove the silence
that precedes the first acoustic activity or follows the last activity
in a record. It preserves whatever it finds between the two activities.
In other words, it removes the leading and trailing silence.

The sampling rate is 44100 samples per second; we'll use an analysis window of 100 ms
(i.e. block_size == 4410).

The energy threshold is 50.

The tokenizer will start accumulating windows from the moment it encounters the
first analysis window with an energy >= 50. ALL the following windows will be
kept regardless of their energy. At the end of the analysis, it will drop trailing
windows with an energy below 50.

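The trimming rule just described can be pictured with a tiny pure-Python sketch
on symbolic frames ("1" for a window with an energy >= 50, "0" for one below it).
This is only an illustration of the idea, not auditok's actual implementation:

.. code:: python

    def trim_silence(frames, is_valid):
        # Drop invalid frames before the first valid frame and after the
        # last one; keep everything in between, inner silence included
        first = 0
        while first < len(frames) and not is_valid(frames[first]):
            first += 1
        last = len(frames)
        while last > first and not is_valid(frames[last - 1]):
            last -= 1
        return frames[first:last]

    print(trim_silence("0001101100", lambda f: f == "1"))

The inner "0" is kept while the leading and trailing runs of silence are dropped.
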
This is an interesting example because the audio file we're analyzing contains a very
brief noise that occurs within the leading silence. We certainly do not want our
tokenizer to stop at this noise and consider whatever comes after it as useful signal.
To force the tokenizer to ignore that brief event, we use two other parameters,
`init_min` and `init_max_silence`. By setting `init_min` = 3 and `init_max_silence` = 1
we tell the tokenizer that a valid event must start with at least 3 noisy windows,
between which there is at most 1 silent window.
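
A minimal sketch of the start-of-event rule that `init_min` and `init_max_silence`
express, again on symbolic frames (an illustration of the idea, not auditok's
actual code):

.. code:: python

    def find_event_start(frames, is_valid, init_min=3, init_max_silence=1):
        # Return the index of the first frame of a valid event, or None.
        # An event must start with at least init_min valid frames separated
        # by runs of at most init_max_silence consecutive invalid frames.
        start = None      # candidate start of the event
        valid_count = 0   # valid frames seen since the candidate start
        silence_run = 0   # current run of consecutive invalid frames
        for i, frame in enumerate(frames):
            if is_valid(frame):
                if start is None:
                    start = i
                valid_count += 1
                silence_run = 0
                if valid_count >= init_min:
                    return start
            elif start is not None:
                silence_run += 1
                if silence_run > init_max_silence:
                    # Too much silence before init_min was reached: give up
                    start, valid_count, silence_run = None, 0, 0
        return None

    # The lone "1" (a brief noise) is skipped; the event starts at index 8
    print(find_event_start("0001000011101111", lambda f: f == "1"))

With `init_min` = 1 the same call would return 3, i.e. the brief noise would
wrongly be taken as the start of the event.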
    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    # rewind source
    asource.rewind()

    # Create a validator with an energy threshold of 50
    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)

    # Create a tokenizer with an effectively unlimited token length and
    # unlimited continuous silence within a token.
    # Note the DROP_TRAILING_SILENCE mode that ensures trailing silence is removed.
    trimmer = StreamTokenizer(
        validator,
        min_length=20,
        max_length=99999999,
        init_min=3,
        init_max_silence=1,
        max_continuous_silence=9999999,
        mode=StreamTokenizer.DROP_TRAILING_SILENCE,
    )

    tokens = trimmer.tokenize(asource)

    # Make sure we only have one token
    assert len(tokens) == 1, "Should have detected one single token"

    trimmed_signal = ''.join(tokens[0][0])

    player = player_for(asource)

    print("Playing original signal (with leading and trailing silence)...")
    player.play(original_signal)
    print("Playing trimmed signal...")
    player.play(trimmed_signal)


Online audio signal processing
##############################

In the next example, audio data is directly acquired from the built-in microphone.
The :func:`auditok.core.StreamTokenizer.tokenize` method is passed a callback function
so that audio activities are delivered as soon as they are detected. Each detected
activity is played back using the built-in audio output device.

As mentioned before, signal energy is strongly related to many factors such as
microphone sensitivity, background noise (including noise inherent to the hardware),
distance and your operating system's sound settings. Try a lower `energy_threshold`
if your noise does not seem to be detected and a higher threshold if you notice
over-detection (i.e. `echo` prints a detection where you have made no noise).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

    # record = True so that we'll be able to rewind the source.
    # max_time = 10: read 10 seconds from the microphone
    asource = ADSFactory.ads(record=True, max_time=10)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=250, max_continuous_silence=30)

    player = player_for(asource)

    def echo(data, start, end):
        print("Acoustic activity at: {0}--{1}".format(start, end))
        player.play(''.join(data))

    asource.open()

    tokenizer.tokenize(asource, callback=echo)

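Because the right threshold depends on your hardware and environment, it can help
to look at the energy values your own analysis windows actually produce. Here is a
rough, standard-library-only sketch of a log-energy computation for windows of
signed 16-bit little-endian PCM samples; auditok's `AudioEnergyValidator` may use
a different formula and scaling, so these values are not directly comparable to
its `energy_threshold`:

.. code:: python

    import math
    import struct

    def window_log_energy(window):
        # Rough log energy of a window of signed 16-bit PCM samples
        n = len(window) // 2
        samples = struct.unpack("<%dh" % n, window)
        mean_square = sum(s * s for s in samples) / float(n)
        return 10.0 * math.log10(mean_square + 1e-10)

    # A quiet window scores far lower than a loud one
    quiet = struct.pack("<4h", 10, -10, 10, -10)
    loud = struct.pack("<4h", 10000, -10000, 10000, -10000)
    print(window_log_energy(quiet), window_log_energy(loud))

Printing such values while making noise at a normal distance from the microphone
gives a reasonable starting point for choosing a threshold.
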
If you want to re-run the tokenizer after changing one or more parameters, use the
following code:

.. code:: python
In case you want to play back the whole recorded signal, use:

.. code:: python

    player.play(asource.get_audio_source().get_data_buffer())


************
Contributing
************
