`auditok` API Tutorial
======================

.. contents:: `Contents`
   :depth: 3


**auditok** is a module that can be used as a generic tool for data
tokenization. Although its core motivation is **Acoustic Activity
Detection** (AAD) and extraction from audio streams (i.e. detect
where a noise/an acoustic activity occurs within an audio stream and
extract the corresponding portion of the signal), it can easily be
adapted to other tasks.

Generally speaking, it can be used to extract, from a sequence of
observations, all sub-sequences that meet a certain number of
criteria in terms of:

1. Minimum length of a **valid** token (i.e. sub-sequence)
2. Maximum length of a **valid** token
3. Maximum number of tolerated consecutive **non-valid** observations
   within a valid token

Examples of a non-valid observation are: a non-numeric ASCII symbol
if you are interested in sub-sequences of numeric symbols, or a silent
audio window (of 10, 20 or 100 milliseconds for instance) if what
interests you are audio regions made up of a sequence of "noisy"
windows (whatever the kind of noise: speech, baby cry, laughter, etc.).
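For intuition, the first two criteria (with `max_continuous_silence` fixed to 0) can be sketched in a few lines of plain Python. This is only a toy illustration of the idea, not auditok's actual implementation:

```python
def tokenize_runs(data, min_length=1, max_length=9999):
    # Toy tokenizer: extract runs of "valid" (upper case)
    # observations, honoring min/max token length, with no
    # tolerated gaps (max_continuous_silence == 0).
    tokens = []
    run_start = None

    def flush(end):
        # Split the finished run into chunks of at most max_length
        run = list(data[run_start:end])
        for j in range(0, len(run), max_length):
            chunk = run[j:j + max_length]
            if len(chunk) >= min_length:
                first = run_start + j
                tokens.append((chunk, first, first + len(chunk) - 1))

    for i, obs in enumerate(data):
        if obs.isupper():
            if run_start is None:
                run_start = i
        elif run_start is not None:
            flush(i)
            run_start = None
    if run_start is not None:
        flush(len(data))
    return tokens

print(tokenize_runs("aaaABCDEFbbGHIJKccc"))
```

Running this on the string used in the examples later in this tutorial yields the same two tokens with their start and end positions.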

The most important component of `auditok` is the :class:`auditok.core.StreamTokenizer`
class. An instance of this class encapsulates a :class:`auditok.util.DataValidator` and can be
configured to detect the desired regions from a stream.
The :func:`auditok.core.StreamTokenizer.tokenize` method accepts a :class:`auditok.util.DataSource`
object that has a `read` method. Read data can be of any type accepted
by the `validator`.


As the main aim of this module is **Audio Activity Detection**,
it provides the :class:`auditok.util.ADSFactory` factory class that makes
it very easy to create an :class:`auditok.util.ADSFactory.AudioDataSource`
(a class that implements :class:`auditok.util.DataSource`) object, be it from:

- A file on the disk
- A buffer of data
- The built-in microphone (requires PyAudio)


The :class:`auditok.util.ADSFactory.AudioDataSource` class inherits from
:class:`auditok.util.DataSource` and supplies a higher abstraction level
than :class:`auditok.io.AudioSource` thanks to a bunch of handy features:

- Define a fixed-length `block_size` (alias `bs`, i.e. analysis window)
- Alternatively, use `block_dur` (duration in seconds, alias `bd`)
- Allow overlap between two consecutive analysis windows
  (if one of the `hop_size` (alias `hs`) or `hop_dur` (alias `hd`) keywords is used and is > 0 and < `block_size`).
  This can be very important if your validator uses the **spectral** information of audio data
  instead of raw audio samples.
- Limit the amount (i.e. duration) of read data (keyword `max_time` or `mt`; very useful when reading data from the microphone)
- Record all read data and rewind if necessary (keyword `record` or `rec`; also useful if you read data from the microphone and
  you want to process it many times offline and/or save it)

See the :class:`auditok.util.ADSFactory` documentation for more information.
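To visualize what overlapping analysis windows mean, here is a minimal windowing sketch in plain Python (an illustration only; the real `AudioDataSource` reads from an audio stream):

```python
def windows(samples, block_size, hop_size=None):
    # Yield fixed-size analysis windows over a list of samples.
    # A hop_size smaller than block_size makes consecutive
    # windows overlap by (block_size - hop_size) samples.
    if hop_size is None:
        hop_size = block_size  # default: no overlap
    for start in range(0, len(samples) - block_size + 1, hop_size):
        yield samples[start:start + block_size]

# block_size=4, hop_size=2 -> consecutive windows share half their samples
print(list(windows(list(range(8)), block_size=4, hop_size=2)))
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```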

Last but not least, the current version has only one audio window validator, based on
signal energy (:class:`auditok.util.AudioEnergyValidator`).

**********************************
Illustrative examples with strings
**********************************

Let us look at some examples using the :class:`auditok.util.StringDataSource` class
created for test and illustration purposes. Imagine that each character of
:class:`auditok.util.StringDataSource` data represents an audio slice of 100 ms, for
example. In the following examples we will use upper case letters to represent
noisy audio slices (i.e. analysis windows or frames) and lower case letters for
silent frames.


Extract sub-sequences of consecutive upper case letters
#######################################################


We want to extract sub-sequences of characters that have:

- A minimum length of 1 (`min_length` = 1)
- A maximum length of 9999 (`max_length` = 9999)
- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)

We also create an `UpperCaseChecker` whose `is_valid` method returns `True` if the
checked character is upper case and `False` otherwise.

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFbbGHIJKccc")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=0)

    tokenizer.tokenize(dsource)

The output is a list of two tuples; each contains the extracted sub-sequence and its
start and end position in the original sequence, respectively:


.. code:: python

    [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]


Tolerate up to two non-valid (lower case) letters within an extracted sequence
##############################################################################

To do so, we set `max_continuous_silence` = 2:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]

Notice the trailing lower case letters "dd" and "ee" at the end of the two
tokens. The default behavior of :class:`auditok.core.StreamTokenizer` is to keep the *trailing
silence* if it does not exceed `max_continuous_silence`. This can be changed
using the `StreamTokenizer.DROP_TRAILING_SILENCE` mode (see next example).

Remove trailing silence
#######################

Trailing silence can be useful for many sound recognition applications, including
speech recognition. Moreover, from the human auditory system point of view, trailing
low energy signal helps avoiding abrupt signal cuts.

If you want to remove it anyway, you can do so by setting `mode` to `StreamTokenizer.DROP_TRAILING_SILENCE`:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2,
                                mode=StreamTokenizer.DROP_TRAILING_SILENCE)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]



Limit the length of detected tokens
###################################


Imagine that you just want to detect and recognize a small part of a long
acoustic event (e.g. engine noise, water flow, etc.) and prevent that event
from hogging the tokenizer, delaying delivery of the event to the next
processing step (i.e. a sound recognizer). You can do this by:

- limiting the length of a detected token

and

- using a callback function as an argument to :func:`auditok.core.StreamTokenizer.tokenize`
  so that the tokenizer delivers a token as soon as it is detected.

The following code limits the length of a token to 5:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFGHIJKbbb")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=5,
                                max_continuous_silence=0)

    def print_token(data, start, end):
        print("token = '{0}', starts at {1}, ends at {2}".format(''.join(data), start, end))

    tokenizer.tokenize(dsource, callback=print_token)


output:

.. code:: python

    "token = 'ABCDE', starts at 3, ends at 7"
    "token = 'FGHIJ', starts at 8, ends at 12"
    "token = 'K', starts at 13, ends at 13"


************************
`auditok` and Audio Data
************************

In the rest of this document we will use :class:`auditok.util.ADSFactory`, :class:`auditok.util.AudioEnergyValidator`
and :class:`auditok.core.StreamTokenizer` for Audio Activity Detection demos using audio data. Before we get any
further, it is worth explaining a few points.

The :func:`auditok.util.ADSFactory.ads` method is used to create an :class:`auditok.util.ADSFactory.AudioDataSource`
object, either from a wave file, the built-in microphone or a user-supplied data buffer. Refer to the API reference
for more information and examples on :func:`ADSFactory.ads` and :class:`AudioDataSource`.

The created :class:`AudioDataSource` object is then passed to :func:`StreamTokenizer.tokenize` for tokenization.

:func:`auditok.util.ADSFactory.ads` accepts a number of keyword arguments, none of which is mandatory.
The features and behavior of the returned :class:`AudioDataSource` object can however differ greatly
depending on the passed arguments. Further details can be found in the method's documentation.

Note however the following two calls, which create an :class:`AudioDataSource`
that reads data from an audio file and from the built-in microphone, respectively.

.. code:: python

    from auditok import ADSFactory

    # Get an AudioDataSource from a file
    # use the 'filename' (alias 'fn') keyword argument
    file_ads = ADSFactory.ads(filename="path/to/file/")

    # Get an AudioDataSource from the built-in microphone.
    # The returned object has the default values for sampling
    # rate, sample width and number of channels. See the method's
    # documentation for customized values.
    mic_ads = ADSFactory.ads()

For :class:`StreamTokenizer`, the parameters `min_length`, `max_length` and `max_continuous_silence`
are expressed as a number of frames. Each call to :func:`AudioDataSource.read` returns
one frame of data or None.

If you want a `max_length` of 2 seconds for your detected sound events and your *analysis window*
is *10 ms* long, you have to specify a `max_length` of 200 (`round(2. / (10. / 1000)) == 200`).
For a `max_continuous_silence` of *300 ms* for instance, the value to pass to StreamTokenizer is 30
(`round(0.3 / (10. / 1000)) == 30`). Note the use of `round` rather than `int`: truncation combined
with floating point noise can turn an intended 30 into 29.
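This duration-to-frames conversion is easy to get wrong, so it is worth wrapping in a small helper (plain Python; the function name is ours, not part of auditok):

```python
def seconds_to_frames(duration, analysis_window=0.01):
    # Number of analysis windows (frames) that cover `duration`
    # seconds, given the window duration in seconds. round()
    # rather than int(): floating point noise can make a value
    # like 0.3 / 0.01 come out just under 30, and int() truncates.
    return round(duration / analysis_window)

print(seconds_to_frames(2.0))        # max_length for 2 s of 10 ms windows
print(seconds_to_frames(0.3))        # max_continuous_silence for 300 ms
print(seconds_to_frames(0.3, 0.02))  # the same, with 20 ms windows
```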
amine@32 268
amine@35 269 Each time :class:`StreamTkenizer` calls the :func:`read` (has no argument) method of an
amine@35 270 :class:`AudioDataSource` object, it returns the same amount of data, except if there are no more
amine@35 271 data (returns what's left in stream or None).
amine@32 272
amine@35 273 This fixed-length amount of data is referred here to as **analysis window** and is a parameter of
amine@35 274 :func:`ADSFactory.ads` method. By default :func:`ADSFactory.ads` uses an analysis window of 10 ms.
amine@32 275
amine@35 276 The number of samples that 10 ms of audio data contain will vary, depending on the sampling
amine@35 277 rate of your audio source/data (file, microphone, etc.).
amine@32 278 For a sampling rate of 16KHz (16000 samples per second), we have 160 samples for 10 ms.
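The sample count is just `sampling_rate * window_duration`; a quick check in plain Python (the helper name is ours):

```python
def samples_per_window(sampling_rate, window_dur=0.01):
    # Samples contained in one analysis window of `window_dur`
    # seconds at `sampling_rate` samples per second.
    return round(sampling_rate * window_dur)

print(samples_per_window(16000))       # 10 ms at 16 kHz
print(samples_per_window(44100, 0.1))  # 100 ms at 44.1 kHz
print(samples_per_window(8000))        # 10 ms at 8 kHz
```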

You can use the `block_size` keyword (alias `bs`) to define your analysis window:

.. code:: python

    from auditok import ADSFactory

    '''
    Assume you have an audio file with a sampling rate of 16000
    '''

    # file_ads.read() will return blocks of 160 samples
    file_ads = ADSFactory.ads(filename="path/to/file/", block_size=160)

    # file_ads.read() will return blocks of 320 samples
    file_ads = ADSFactory.ads(filename="path/to/file/", bs=320)


Fortunately, you can also specify the size of your analysis window in seconds, thanks to the `block_dur`
keyword (alias `bd`):

.. code:: python

    from auditok import ADSFactory
    # use an analysis window of 20 ms
    file_ads = ADSFactory.ads(filename="path/to/file/", bd=0.02)

For :class:`StreamTokenizer`, each :func:`read` call that does not return `None` is treated as one processing
frame. :class:`StreamTokenizer` has no way to figure out the temporal length of that frame (why should it?). So, to
correctly initialize your :class:`StreamTokenizer` based on your analysis window duration, use something like:


.. code:: python

    analysis_win_seconds = 0.01  # 10 ms
    my_ads = ADSFactory.ads(block_dur=analysis_win_seconds)
    analysis_window_ms = analysis_win_seconds * 1000

    # If you want your maximum continuous silence to be 300 ms, use:
    max_continuous_silence = round(300. / analysis_window_ms)

    # which is the same as:
    max_continuous_silence = round(0.3 / (analysis_window_ms / 1000))

    # or simply:
    max_continuous_silence = 30

******************************
Examples using real audio data
******************************


Extract isolated phrases from an utterance
##########################################

We will build an :class:`auditok.util.ADSFactory.AudioDataSource` using a wave file from
the database. The file contains an isolated pronunciation of the digits from 1 to 6
in Arabic, as well as breath-in/out between 2 and 3. The code will play the
original file then the detected sounds separately. Note that we use an
`energy_threshold` of 65; this parameter should be carefully chosen. It depends
on microphone quality, background noise and the amplitude of the events you want to
detect.

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # We set the `record` argument to True so that we can rewind the source
    asource = ADSFactory.ads(filename=dataset.one_to_six_arabic_16000_mono_bc_noise, record=True)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=65)

    # Default analysis window is 10 ms (float(asource.get_block_size()) / asource.get_sampling_rate())
    # min_length=20 : minimum length of a valid audio activity is 20 * 10 == 200 ms
    # max_length=400 : maximum length of a valid audio activity is 400 * 10 == 4000 ms == 4 seconds
    # max_continuous_silence=30 : maximum length of a tolerated silence within a valid audio activity is 30 * 10 == 300 ms
    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=400, max_continuous_silence=30)

    asource.open()
    tokens = tokenizer.tokenize(asource)

    # Play detected regions back

    player = player_for(asource)

    # Rewind and read the whole signal
    asource.rewind()
    original_signal = []

    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    print("Playing the original file...")
    player.play(original_signal)

    print("playing detected regions...")
    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 8


The tokenizer extracts 8 audio regions from the signal, including all isolated digits
(from 1 to 6) as well as the 2-phase respiration of the subject. You might have noticed
that, in the original file, the last three digits are closer to each other than the
previous ones. If you want them to be extracted as one single phrase, you can do so
by tolerating a larger continuous silence within a detection:

.. code:: python

    tokenizer.max_continuous_silence = 50
    asource.rewind()
    tokens = tokenizer.tokenize(asource)

    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 6


Trim leading and trailing silence
#################################

The tokenizer in the following example is set up to remove the silence
that precedes the first acoustic activity or follows the last activity
in a record. It preserves whatever it finds between the two activities.
In other words, it removes the leading and trailing silence.

The sampling rate is 44100 samples per second; we'll use an analysis window of 100 ms
(i.e. block_size == 4410).

The energy threshold is 50.

The tokenizer will start accumulating windows from the moment it encounters
the first analysis window with an energy >= 50. ALL the following windows will be
kept regardless of their energy. At the end of the analysis, it will drop trailing
windows with an energy below 50.

This is an interesting example because the audio file we're analyzing contains a very
brief noise that occurs within the leading silence. We certainly do not want our tokenizer
to trigger at this point and consider whatever comes after as useful signal.
To force the tokenizer to ignore that brief event we use two other parameters, `init_min`
and `init_max_silence`. With `init_min` = 3 and `init_max_silence` = 1 we tell the tokenizer
that a valid event must start with at least 3 noisy windows, between which there
is at most 1 silent window.

Still, with this configuration, the tokenizer may detect that noise as a valid event
(if it actually contains 3 consecutive noisy frames). To circumvent this, we use a
sufficiently large analysis window (here 100 ms) to ensure that the brief noise is surrounded by a much
longer silence, so that the energy of the overall analysis window remains below 50.

When using a shorter analysis window (of 10 ms for instance, block_size == 441), the brief
noise contributes more to the energy calculation, which yields an energy of over 50 for the window.
Again, we can deal with this situation by using a higher energy threshold (55 for example).
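This averaging effect can be checked numerically. The sketch below uses a made-up energy formula and made-up sample values (auditok's `AudioEnergyValidator` may compute energy differently); it only shows that the same brief spike weighs ten times less in a window ten times longer:

```python
import math

def log_energy(window):
    # Mean-square energy on a dB-like scale (illustrative formula).
    mean_square = sum(s * s for s in window) / len(window)
    return 10 * math.log10(mean_square + 1e-10)

spike = [3000] * 44        # ~1 ms of brief noise at 44100 Hz
quiet = [1]                # near-silent sample

short_win = spike + quiet * (441 - 44)    # 10 ms window with the spike
long_win = spike + quiet * (4410 - 44)    # 100 ms window with the spike

# The spike dominates the short window but is diluted in the long one:
print(log_energy(short_win) - log_energy(long_win))  # ~10 dB
```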

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # record = True so that we'll be able to rewind the source.
    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence,
                             record=True, block_size=4410)
    asource.open()

    original_signal = []
    # Read the whole signal
    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    # rewind source
    asource.rewind()

    # Create a validator with an energy threshold of 50
    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)

    # Create a tokenizer with an unlimited token length and unlimited continuous silence within a token.
    # Note the DROP_TRAILING_SILENCE mode that ensures trailing silence is removed.
    trimmer = StreamTokenizer(validator, min_length=20, max_length=99999999,
                              init_min=3, init_max_silence=1,
                              max_continuous_silence=9999999,
                              mode=StreamTokenizer.DROP_TRAILING_SILENCE)

    tokens = trimmer.tokenize(asource)

    # Make sure we only have one token
    assert len(tokens) == 1, "Should have detected one single token"

    trimmed_signal = ''.join(tokens[0][0])

    player = player_for(asource)

    print("Playing original signal (with leading and trailing silence)...")
    player.play(original_signal)
    print("Playing trimmed signal...")
    player.play(trimmed_signal)


Online audio signal processing
##############################

In the next example, audio data is directly acquired from the built-in microphone.
The :func:`auditok.core.StreamTokenizer.tokenize` method is passed a callback function
so that audio activities are delivered as soon as they are detected. Each detected
activity is played back using the built-in audio output device.

As mentioned before, signal energy is strongly related to many factors such as
microphone sensitivity, background noise (including noise inherent to the hardware),
distance and your operating system's sound settings. Try a lower `energy_threshold`
if your noise does not seem to be detected and a higher threshold if you notice
over-detection (i.e. the `echo` callback prints a detection where you have made no noise).
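If guessing the threshold is tedious, you can calibrate it from a short recording of background noise. The following is a plain-Python sketch of the idea; `log_energy`, the 6 dB margin, and the sample values are our own illustrative choices, not part of auditok's API:

```python
import math

def log_energy(window):
    # dB-like mean-square energy (illustrative formula).
    mean_square = sum(s * s for s in window) / len(window)
    return 10 * math.log10(mean_square + 1e-10)

def estimate_threshold(background_windows, margin=6.0):
    # Loudest background window plus a safety margin (in dB):
    # anything louder than this is treated as acoustic activity.
    return max(log_energy(w) for w in background_windows) + margin

# Hypothetical calibration data: ten windows of low-amplitude noise
background = [[5, -4, 3, -6] * 40 for _ in range(10)]
threshold = estimate_threshold(background)
print(threshold)
```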

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

    # record = True so that we'll be able to rewind the source.
    # max_time = 10: read 10 seconds from the microphone
    asource = ADSFactory.ads(record=True, max_time=10)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
    tokenizer = StreamTokenizer(validator=validator, min_length=20,
                                max_length=250, max_continuous_silence=30)

    player = player_for(asource)

    def echo(data, start, end):
        print("Acoustic activity at: {0}--{1}".format(start, end))
        player.play(''.join(data))

    asource.open()

    tokenizer.tokenize(asource, callback=echo)

If you want to re-run the tokenizer after changing one or more parameters, use the following code:

.. code:: python

    asource.rewind()
    # change the energy threshold, for example
    tokenizer.validator.set_energy_threshold(55)
    tokenizer.tokenize(asource, callback=echo)

In case you want to play the whole recorded signal back, use:

.. code:: python

    player.play(asource.get_audio_source().get_data_buffer())


************
Contributing
************

**auditok** is on `GitHub <https://github.com/amsehili/auditok>`_. You're welcome to fork it and contribute.


Amine SEHILI <amine.sehili@gmail.com>
September 2015

*******
License
*******

This package is published under the GNU GPL, Version 3.