`auditok` API Tutorial
======================

.. contents:: `Contents`
   :depth: 3


**auditok** is a module that can be used as a generic tool for data
tokenization. Although its core motivation is **Acoustic Activity
Detection** (AAD) and extraction from audio streams (i.e. detect
where a noise/an acoustic activity occurs within an audio stream and
extract the corresponding portion of the signal), it can easily be
adapted to other tasks.

Generally speaking, it can be used to extract, from a sequence of
observations, all sub-sequences that meet a certain number of
criteria in terms of:

1. Minimum length of a **valid** token (i.e. sub-sequence)
2. Maximum length of a **valid** token
3. Maximum number of tolerated consecutive **non-valid** observations
   within a valid token

Examples of a non-valid observation are: a non-numeric ASCII symbol
if you are interested in sub-sequences of numeric symbols, or a silent
audio window (of 10, 20 or 100 milliseconds for instance) if what
interests you are audio regions made up of a sequence of "noisy"
windows (whatever the kind of noise: speech, baby cry, laughter, etc.).
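For intuition, the first two criteria (with `max_continuous_silence` fixed to 0) can be sketched in a few lines of plain Python. This is only a toy illustration of the idea, not auditok's actual implementation:

```python
def tokenize_runs(data, min_length=1, max_length=9999):
    # Toy tokenizer: extract runs of "valid" (upper case)
    # observations, honoring min/max token length, with no
    # tolerated gaps (max_continuous_silence == 0).
    tokens = []
    run_start = None

    def flush(end):
        # Split the finished run into chunks of at most max_length
        run = list(data[run_start:end])
        for j in range(0, len(run), max_length):
            chunk = run[j:j + max_length]
            if len(chunk) >= min_length:
                first = run_start + j
                tokens.append((chunk, first, first + len(chunk) - 1))

    for i, obs in enumerate(data):
        if obs.isupper():
            if run_start is None:
                run_start = i
        elif run_start is not None:
            flush(i)
            run_start = None
    if run_start is not None:
        flush(len(data))
    return tokens

print(tokenize_runs("aaaABCDEFbbGHIJKccc"))
```

Running this on the string used in the examples later in this tutorial yields the same two tokens with their start and end positions.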

The most important component of `auditok` is the :class:`auditok.core.StreamTokenizer`
class. An instance of this class encapsulates a :class:`auditok.util.DataValidator` and can be
configured to detect the desired regions from a stream.
The :func:`auditok.core.StreamTokenizer.tokenize` method accepts a :class:`auditok.util.DataSource`
object that has a `read` method. Read data can be of any type accepted
by the `validator`.


As the main aim of this module is **Audio Activity Detection**,
it provides the :class:`auditok.util.ADSFactory` factory class that makes
it very easy to create an :class:`auditok.util.ADSFactory.AudioDataSource`
(a class that implements :class:`auditok.util.DataSource`) object, be it from:

- A file on the disk
- A buffer of data
- The built-in microphone (requires PyAudio)


The :class:`auditok.util.ADSFactory.AudioDataSource` class inherits from
:class:`auditok.util.DataSource` and supplies a higher abstraction level
than :class:`auditok.io.AudioSource` thanks to a bunch of handy features:

- Define a fixed-length `block_size` (alias `bs`, i.e. analysis window)
- Alternatively, use `block_dur` (duration in seconds, alias `bd`)
- Allow overlap between two consecutive analysis windows
  (if one of the `hop_size` (alias `hs`) or `hop_dur` (alias `hd`) keywords is used and is > 0 and < `block_size`).
  This can be very important if your validator uses the **spectral** information of audio data
  instead of raw audio samples.
- Limit the amount (i.e. duration) of read data (keyword `max_time` or `mt`; very useful when reading data from the microphone)
- Record all read data and rewind if necessary (keyword `record` or `rec`; also useful if you read data from the microphone and
  you want to process it many times offline and/or save it)

See the :class:`auditok.util.ADSFactory` documentation for more information.
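To visualize what overlapping analysis windows mean, here is a minimal windowing sketch in plain Python (an illustration only; the real `AudioDataSource` reads from an audio stream):

```python
def windows(samples, block_size, hop_size=None):
    # Yield fixed-size analysis windows over a list of samples.
    # A hop_size smaller than block_size makes consecutive
    # windows overlap by (block_size - hop_size) samples.
    if hop_size is None:
        hop_size = block_size  # default: no overlap
    for start in range(0, len(samples) - block_size + 1, hop_size):
        yield samples[start:start + block_size]

# block_size=4, hop_size=2 -> consecutive windows share half their samples
print(list(windows(list(range(8)), block_size=4, hop_size=2)))
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```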

Last but not least, the current version has only one audio window validator, based on
signal energy (:class:`auditok.util.AudioEnergyValidator`).

**********************************
Illustrative examples with strings
**********************************

Let us look at some examples using the :class:`auditok.util.StringDataSource` class
created for test and illustration purposes. Imagine that each character of
:class:`auditok.util.StringDataSource` data represents an audio slice of 100 ms, for
example. In the following examples we will use upper case letters to represent
noisy audio slices (i.e. analysis windows or frames) and lower case letters for
silent frames.


Extract sub-sequences of consecutive upper case letters
#######################################################


We want to extract sub-sequences of characters that have:

- A minimum length of 1 (`min_length` = 1)
- A maximum length of 9999 (`max_length` = 9999)
- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)

We also create an `UpperCaseChecker` whose `is_valid` method returns `True` if the
checked character is upper case and `False` otherwise.

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFbbGHIJKccc")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=0)

    tokenizer.tokenize(dsource)

The output is a list of two tuples; each contains the extracted sub-sequence and its
start and end position in the original sequence, respectively:


.. code:: python

    [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]


Tolerate up to two non-valid (lower case) letters within an extracted sequence
##############################################################################

To do so, we set `max_continuous_silence` = 2:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]

Notice the trailing lower case letters "dd" and "ee" at the end of the two
tokens. The default behavior of :class:`auditok.core.StreamTokenizer` is to keep the *trailing
silence* if it does not exceed `max_continuous_silence`. This can be changed
using the `StreamTokenizer.DROP_TRAILING_SILENCE` mode (see next example).

Remove trailing silence
#######################

Trailing silence can be useful for many sound recognition applications, including
speech recognition. Moreover, from the human auditory system point of view, trailing
low energy signal helps avoiding abrupt signal cuts.

If you want to remove it anyway, you can do so by setting `mode` to `StreamTokenizer.DROP_TRAILING_SILENCE`:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2,
                                mode=StreamTokenizer.DROP_TRAILING_SILENCE)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]



Limit the length of detected tokens
###################################


Imagine that you just want to detect and recognize a small part of a long
acoustic event (e.g. engine noise, water flow, etc.) and prevent that event
from hogging the tokenizer, delaying delivery of the event to the next
processing step (i.e. a sound recognizer). You can do this by:

- limiting the length of a detected token

and

- using a callback function as an argument to :func:`auditok.core.StreamTokenizer.tokenize`
  so that the tokenizer delivers a token as soon as it is detected.

The following code limits the length of a token to 5:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFGHIJKbbb")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=5,
                                max_continuous_silence=0)

    def print_token(data, start, end):
        print("token = '{0}', starts at {1}, ends at {2}".format(''.join(data), start, end))

    tokenizer.tokenize(dsource, callback=print_token)


output:

.. code:: python

    "token = 'ABCDE', starts at 3, ends at 7"
    "token = 'FGHIJ', starts at 8, ends at 12"
    "token = 'K', starts at 13, ends at 13"


************************
`auditok` and Audio Data
************************

In the rest of this document we will use :class:`auditok.util.ADSFactory`, :class:`auditok.util.AudioEnergyValidator`
and :class:`auditok.core.StreamTokenizer` for Audio Activity Detection demos using audio data. Before we get any
further, it is worth explaining a few points.

The :func:`auditok.util.ADSFactory.ads` method is used to create an :class:`auditok.util.ADSFactory.AudioDataSource`
object, either from a wave file, the built-in microphone or a user-supplied data buffer. Refer to the API reference
for more information and examples on :func:`ADSFactory.ads` and :class:`AudioDataSource`.

The created :class:`AudioDataSource` object is then passed to :func:`StreamTokenizer.tokenize` for tokenization.

:func:`auditok.util.ADSFactory.ads` accepts a number of keyword arguments, none of which is mandatory.
The features and behavior of the returned :class:`AudioDataSource` object can however differ greatly
depending on the passed arguments. Further details can be found in the method's documentation.

Note however the following two calls, which create an :class:`AudioDataSource`
that reads data from an audio file and from the built-in microphone, respectively.

.. code:: python

    from auditok import ADSFactory

    # Get an AudioDataSource from a file
    # use the 'filename' (alias 'fn') keyword argument
    file_ads = ADSFactory.ads(filename="path/to/file/")

    # Get an AudioDataSource from the built-in microphone.
    # The returned object has the default values for sampling
    # rate, sample width and number of channels. See the method's
    # documentation for customized values.
    mic_ads = ADSFactory.ads()

For :class:`StreamTokenizer`, the parameters `min_length`, `max_length` and `max_continuous_silence`
are expressed as a number of frames. Each call to :func:`AudioDataSource.read` returns
one frame of data or None.

If you want a `max_length` of 2 seconds for your detected sound events and your *analysis window*
is *10 ms* long, you have to specify a `max_length` of 200 (`round(2. / (10. / 1000)) == 200`).
For a `max_continuous_silence` of *300 ms* for instance, the value to pass to StreamTokenizer is 30
(`round(0.3 / (10. / 1000)) == 30`). Note the use of `round` rather than `int`: truncation combined
with floating point noise can turn an intended 30 into 29.
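This duration-to-frames conversion is easy to get wrong, so it is worth wrapping in a small helper (plain Python; the function name is ours, not part of auditok):

```python
def seconds_to_frames(duration, analysis_window=0.01):
    # Number of analysis windows (frames) that cover `duration`
    # seconds, given the window duration in seconds. round()
    # rather than int(): floating point noise can make a value
    # like 0.3 / 0.01 come out just under 30, and int() truncates.
    return round(duration / analysis_window)

print(seconds_to_frames(2.0))        # max_length for 2 s of 10 ms windows
print(seconds_to_frames(0.3))        # max_continuous_silence for 300 ms
print(seconds_to_frames(0.3, 0.02))  # the same, with 20 ms windows
```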
amine@32 268
amine@35 269 Each time :class:`StreamTkenizer` calls the :func:`read` (has no argument) method of an
amine@35 270 :class:`AudioDataSource` object, it returns the same amount of data, except if there are no more
amine@35 271 data (returns what's left in stream or None).
amine@32 272
amine@35 273 This fixed-length amount of data is referred here to as **analysis window** and is a parameter of
amine@35 274 :func:`ADSFactory.ads` method. By default :func:`ADSFactory.ads` uses an analysis window of 10 ms.
amine@32 275
amine@35 276 The number of samples that 10 ms of audio data contain will vary, depending on the sampling
amine@35 277 rate of your audio source/data (file, microphone, etc.).
amine@32 278 For a sampling rate of 16KHz (16000 samples per second), we have 160 samples for 10 ms.
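The sample count is just `sampling_rate * window_duration`; a quick check in plain Python (the helper name is ours):

```python
def samples_per_window(sampling_rate, window_dur=0.01):
    # Samples contained in one analysis window of `window_dur`
    # seconds at `sampling_rate` samples per second.
    return round(sampling_rate * window_dur)

print(samples_per_window(16000))       # 10 ms at 16 kHz
print(samples_per_window(44100, 0.1))  # 100 ms at 44.1 kHz
print(samples_per_window(8000))        # 10 ms at 8 kHz
```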

You can use the `block_size` keyword (alias `bs`) to define your analysis window:

.. code:: python

    from auditok import ADSFactory

    '''
    Assume you have an audio file with a sampling rate of 16000
    '''

    # file_ads.read() will return blocks of 160 samples
    file_ads = ADSFactory.ads(filename="path/to/file/", block_size=160)

    # file_ads.read() will return blocks of 320 samples
    file_ads = ADSFactory.ads(filename="path/to/file/", bs=320)


Fortunately, you can also specify the size of your analysis window in seconds, thanks to the `block_dur`
keyword (alias `bd`):

.. code:: python

    from auditok import ADSFactory
    # use an analysis window of 20 ms
    file_ads = ADSFactory.ads(filename="path/to/file/", bd=0.02)

For :class:`StreamTokenizer`, each :func:`read` call that does not return `None` is treated as one processing
frame. :class:`StreamTokenizer` has no way to figure out the temporal length of that frame (why should it?). So, to
correctly initialize your :class:`StreamTokenizer` based on your analysis window duration, use something like:


.. code:: python

    analysis_win_seconds = 0.01  # 10 ms
    my_ads = ADSFactory.ads(block_dur=analysis_win_seconds)
    analysis_window_ms = analysis_win_seconds * 1000

    # If you want your maximum continuous silence to be 300 ms, use:
    max_continuous_silence = round(300. / analysis_window_ms)

    # which is the same as:
    max_continuous_silence = round(0.3 / (analysis_window_ms / 1000))

    # or simply:
    max_continuous_silence = 30

******************************
Examples using real audio data
******************************


Extract isolated phrases from an utterance
##########################################

We will build an :class:`auditok.util.ADSFactory.AudioDataSource` using a wave file from
the database. The file contains an isolated pronunciation of the digits from 1 to 6
in Arabic, as well as breath-in/out between 2 and 3. The code will play the
original file then the detected sounds separately. Note that we use an
`energy_threshold` of 65; this parameter should be carefully chosen. It depends
on microphone quality, background noise and the amplitude of the events you want to
detect.

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # We set the `record` argument to True so that we can rewind the source
    asource = ADSFactory.ads(filename=dataset.one_to_six_arabic_16000_mono_bc_noise, record=True)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=65)

    # Default analysis window is 10 ms (float(asource.get_block_size()) / asource.get_sampling_rate())
    # min_length=20 : minimum length of a valid audio activity is 20 * 10 == 200 ms
    # max_length=400 : maximum length of a valid audio activity is 400 * 10 == 4000 ms == 4 seconds
    # max_continuous_silence=30 : maximum length of a tolerated silence within a valid audio activity is 30 * 10 == 300 ms
    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=400, max_continuous_silence=30)

    asource.open()
    tokens = tokenizer.tokenize(asource)

    # Play detected regions back

    player = player_for(asource)

    # Rewind and read the whole signal
    asource.rewind()
    original_signal = []

    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    print("Playing the original file...")
    player.play(original_signal)

    print("playing detected regions...")
    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 8


The tokenizer extracts 8 audio regions from the signal, including all isolated digits
(from 1 to 6) as well as the 2-phase respiration of the subject. You might have noticed
that, in the original file, the last three digits are closer to each other than the
previous ones. If you want them to be extracted as one single phrase, you can do so
by tolerating a larger continuous silence within a detection:

.. code:: python

    tokenizer.max_continuous_silence = 50
    asource.rewind()
    tokens = tokenizer.tokenize(asource)

    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 6


Trim leading and trailing silence
#################################

The tokenizer in the following example is set up to remove the silence
that precedes the first acoustic activity or follows the last activity
in a record. It preserves whatever it finds between the two activities.
In other words, it removes the leading and trailing silence.

The sampling rate is 44100 samples per second; we'll use an analysis window of 100 ms
(i.e. block_size == 4410).

The energy threshold is 50.

The tokenizer will start accumulating windows from the moment it encounters
the first analysis window with an energy >= 50. ALL the following windows will be
kept regardless of their energy. At the end of the analysis, it will drop trailing
windows with an energy below 50.

This is an interesting example because the audio file we're analyzing contains a very
brief noise that occurs within the leading silence. We certainly do not want our tokenizer
to trigger at this point and consider whatever comes after as useful signal.
To force the tokenizer to ignore that brief event we use two other parameters, `init_min`
and `init_max_silence`. With `init_min` = 3 and `init_max_silence` = 1 we tell the tokenizer
that a valid event must start with at least 3 noisy windows, between which there
is at most 1 silent window.

Still, with this configuration, the tokenizer may detect that noise as a valid event
(if it actually contains 3 consecutive noisy frames). To circumvent this, we use a
sufficiently large analysis window (here 100 ms) to ensure that the brief noise is surrounded by a much
longer silence, so that the energy of the overall analysis window remains below 50.

When using a shorter analysis window (of 10 ms for instance, block_size == 441), the brief
noise contributes more to the energy calculation, which yields an energy of over 50 for the window.
Again, we can deal with this situation by using a higher energy threshold (55 for example).
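This averaging effect can be checked numerically. The sketch below uses a made-up energy formula and made-up sample values (auditok's `AudioEnergyValidator` may compute energy differently); it only shows that the same brief spike weighs ten times less in a window ten times longer:

```python
import math

def log_energy(window):
    # Mean-square energy on a dB-like scale (illustrative formula).
    mean_square = sum(s * s for s in window) / len(window)
    return 10 * math.log10(mean_square + 1e-10)

spike = [3000] * 44        # ~1 ms of brief noise at 44100 Hz
quiet = [1]                # near-silent sample

short_win = spike + quiet * (441 - 44)    # 10 ms window with the spike
long_win = spike + quiet * (4410 - 44)    # 100 ms window with the spike

# The spike dominates the short window but is diluted in the long one:
print(log_energy(short_win) - log_energy(long_win))  # ~10 dB
```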

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # record = True so that we'll be able to rewind the source.
    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence,
                             record=True, block_size=4410)
    asource.open()

    original_signal = []
    # Read the whole signal
    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    # rewind source
    asource.rewind()

    # Create a validator with an energy threshold of 50
    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)

    # Create a tokenizer with an unlimited token length and unlimited continuous silence within a token.
    # Note the DROP_TRAILING_SILENCE mode that ensures trailing silence is removed.
    trimmer = StreamTokenizer(validator, min_length=20, max_length=99999999,
                              init_min=3, init_max_silence=1,
                              max_continuous_silence=9999999,
                              mode=StreamTokenizer.DROP_TRAILING_SILENCE)

    tokens = trimmer.tokenize(asource)

    # Make sure we only have one token
    assert len(tokens) == 1, "Should have detected one single token"

    trimmed_signal = ''.join(tokens[0][0])

    player = player_for(asource)

    print("Playing original signal (with leading and trailing silence)...")
    player.play(original_signal)
    print("Playing trimmed signal...")
    player.play(trimmed_signal)


Online audio signal processing
##############################

In the next example, audio data is directly acquired from the built-in microphone.
The :func:`auditok.core.StreamTokenizer.tokenize` method is passed a callback function
so that audio activities are delivered as soon as they are detected. Each detected
activity is played back using the built-in audio output device.

As mentioned before, signal energy is strongly related to many factors such as
microphone sensitivity, background noise (including noise inherent to the hardware),
distance and your operating system's sound settings. Try a lower `energy_threshold`
if your noise does not seem to be detected and a higher threshold if you notice
over-detection (i.e. the `echo` callback prints a detection where you have made no noise).
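If guessing the threshold is tedious, you can calibrate it from a short recording of background noise. The following is a plain-Python sketch of the idea; `log_energy`, the 6 dB margin, and the sample values are our own illustrative choices, not part of auditok's API:

```python
import math

def log_energy(window):
    # dB-like mean-square energy (illustrative formula).
    mean_square = sum(s * s for s in window) / len(window)
    return 10 * math.log10(mean_square + 1e-10)

def estimate_threshold(background_windows, margin=6.0):
    # Loudest background window plus a safety margin (in dB):
    # anything louder than this is treated as acoustic activity.
    return max(log_energy(w) for w in background_windows) + margin

# Hypothetical calibration data: ten windows of low-amplitude noise
background = [[5, -4, 3, -6] * 40 for _ in range(10)]
threshold = estimate_threshold(background)
print(threshold)
```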

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

    # record = True so that we'll be able to rewind the source.
    # max_time = 10: read 10 seconds from the microphone
    asource = ADSFactory.ads(record=True, max_time=10)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
    tokenizer = StreamTokenizer(validator=validator, min_length=20,
                                max_length=250, max_continuous_silence=30)

    player = player_for(asource)

    def echo(data, start, end):
        print("Acoustic activity at: {0}--{1}".format(start, end))
        player.play(''.join(data))

    asource.open()

    tokenizer.tokenize(asource, callback=echo)

If you want to re-run the tokenizer after changing one or more parameters, use the following code:

.. code:: python

    asource.rewind()
    # change the energy threshold, for example
    tokenizer.validator.set_energy_threshold(55)
    tokenizer.tokenize(asource, callback=echo)

In case you want to play the whole recorded signal back, use:

.. code:: python

    player.play(asource.get_audio_source().get_data_buffer())


************
Contributing
************

**auditok** is on `GitHub <https://github.com/amsehili/auditok>`_. You're welcome to fork it and contribute.


Amine SEHILI <amine.sehili@gmail.com>
September 2015

*******
License
*******

This package is published under the GNU GPL, Version 3.