.. auditok documentation.

auditok, an AUDIo TOKenization module
=====================================

.. contents:: `Contents`
   :depth: 3


**auditok** is a module that can be used as a generic tool for data
tokenization. Although its core motivation is **Acoustic Activity
Detection** (AAD) and extraction from audio streams (i.e. detecting
where an acoustic activity occurs within an audio stream and
extracting the corresponding portion of the signal), it can easily be
adapted to other tasks.

Globally speaking, it can be used to extract, from a sequence of
observations, all sub-sequences that meet a certain number of
criteria in terms of:

1. Minimum length of a **valid** token (i.e. sub-sequence)
2. Maximum length of a valid token
3. Maximum tolerated consecutive **non-valid** observations within
   a valid token

Examples of a non-valid observation are: a non-numeric ASCII symbol
if you are interested in sub-sequences of numeric symbols, or a silent
audio window (of 10, 20 or 100 milliseconds for instance) if what
interests you are audio regions made up of a sequence of "noisy"
windows (whatever kind of noise: speech, baby cry, laughter, etc.).

The most important component of `auditok` is the `auditok.core.StreamTokenizer`
class. An instance of this class encapsulates a `DataValidator` and can be
configured to detect the desired regions from a stream.
The `StreamTokenizer.tokenize` method accepts a `DataSource`
object that has a `read` method. Read data can be of any type accepted
by the `validator`.


As the main aim of this module is **Audio Activity Detection**,
it provides the `auditok.util.ADSFactory` factory class that makes
it very easy to create an `AudioDataSource` (a class that implements `DataSource`)
object, be that from:

- A file on the disk
- A buffer of data
- The built-in microphone (requires PyAudio)
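
For instance, here is a minimal sketch of these three cases. The `filename`
keyword is used later in this guide; the `data_buffer`, `sampling_rate`,
`sample_width` and `channels` keyword names are assumptions here, so check
the `ADSFactory.ads` documentation for the exact signature:

.. code:: python

    from auditok import ADSFactory

    # From a file on disk
    file_ads = ADSFactory.ads(filename="path/to/file.wav")

    # From a buffer of raw data; the format parameters describe the buffer
    # (here: one second of 16-bit mono silence at 16 kHz)
    raw_data = b"\x00\x00" * 16000
    buffer_ads = ADSFactory.ads(data_buffer=raw_data, sampling_rate=16000,
                                sample_width=2, channels=1)

    # From the built-in microphone (requires PyAudio)
    mic_ads = ADSFactory.ads()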


The `AudioDataSource` class inherits from `DataSource` and supplies
a higher abstraction level than `AudioSource` thanks to a bunch of
handy features:

- Define a fixed-length `block_size` (i.e. analysis window)
- Allow overlap between two consecutive analysis windows (`hop_size` < `block_size`). This can be very important if your validator uses the **spectral** information of audio data instead of raw audio samples.
- Limit the amount (i.e. duration) of read data (very useful when reading data from the microphone)
- Record and rewind data (also useful if you read data from the microphone and you want to process it many times offline and/or save it)
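
A hedged example combining these features (`record`, `max_time` and
`block_size` appear later in this guide; `hop_size` is assumed from the
feature list above):

.. code:: python

    from auditok import ADSFactory

    # 20 ms analysis window (320 samples at 16 kHz), 10 ms overlap between
    # two consecutive windows, read at most 5 seconds of data, and keep a
    # record of the read data so that the source can be rewound
    ads = ADSFactory.ads(record=True, max_time=5, block_size=320, hop_size=160)

    ads.open()
    while True:
        window = ads.read()
        if window is None:
            break
        # process `window` here...

    ads.rewind()  # go back to the beginning of the recorded data
    first_window = ads.read()
    ads.close()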


Last but not least, the current version has only one audio window validator, based on
signal energy.

Requirements
============

`auditok` requires `PyAudio <http://people.csail.mit.edu/hubert/pyaudio/>`_
for audio acquisition and playback.


Illustrative examples with strings
==================================

Let us look at some examples using the `auditok.util.StringDataSource` class,
created for test and illustration purposes. Imagine that each character of
`auditok.util.StringDataSource` data represents an audio slice of 100 ms, for
example. In the following examples we will use upper case letters to represent
noisy audio slices (i.e. analysis windows or frames) and lower case letters for
silent frames.


Extract sub-sequences of consecutive upper case letters
-------------------------------------------------------


We want to extract sub-sequences of characters that have:

- A minimum length of 1 (`min_length` = 1)
- A maximum length of 9999 (`max_length` = 9999)
- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)

We also create the `UpperCaseChecker` whose `is_valid` method returns `True` if the
checked character is in upper case and `False` otherwise.

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFbbGHIJKccc")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=0)

    tokenizer.tokenize(dsource)

The output is a list of two tuples; each tuple contains the extracted sub-sequence and its
start and end position in the original sequence respectively:

.. code:: python

    [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]


Tolerate up to two non-valid (lower case) letters within an extracted sequence
------------------------------------------------------------------------------

To do so, we set `max_continuous_silence` = 2:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2)

    tokenizer.tokenize(dsource)


output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]

Notice the trailing lower case letters "dd" and "ee" at the end of the two
tokens. The default behavior of `StreamTokenizer` is to keep *trailing
silence* if it doesn't exceed `max_continuous_silence`. This can be changed
using the `DROP_TAILING_SILENCE` mode (see the next example).

Remove trailing silence
-----------------------

Trailing silence can be useful for many sound recognition applications, including
speech recognition. Moreover, from the human auditory system point of view, trailing
low energy signal helps avoid abrupt signal cuts.

If you want to remove it anyway, you can do so by setting `mode` to `StreamTokenizer.DROP_TAILING_SILENCE`:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999,
                                max_continuous_silence=2,
                                mode=StreamTokenizer.DROP_TAILING_SILENCE)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]



Limit the length of detected tokens
-----------------------------------


Imagine that you just want to detect and recognize a small part of a long
acoustic event (e.g. engine noise, water flow, etc.) and avoid having that
event hog the tokenizer, preventing it from feeding the event to the next
processing step (i.e. a sound recognizer). You can do this by:

- limiting the length of a detected token

and

- using a callback function as an argument to `StreamTokenizer.tokenize`
  so that the tokenizer delivers a token as soon as it is detected.

The following code limits the length of a token to 5:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDEFGHIJKbbb")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=5,
                                max_continuous_silence=0)

    def print_token(data, start, end):
        print("token = '{0}', starts at {1}, ends at {2}".format(''.join(data), start, end))

    tokenizer.tokenize(dsource, callback=print_token)


output:

.. code:: python

    "token = 'ABCDE', starts at 3, ends at 7"
    "token = 'FGHIJ', starts at 8, ends at 12"
    "token = 'K', starts at 13, ends at 13"


Using real audio data
=====================

In this section we will use `ADSFactory`, `AudioEnergyValidator` and `StreamTokenizer`
for an AAD demonstration using audio data. Before we go any further, it is worth
explaining a certain number of points.

The `ADSFactory.ads` method is called to create an `AudioDataSource` object that can be
passed to `StreamTokenizer.tokenize`. `ADSFactory.ads` accepts a number of keyword
arguments, none of which is mandatory. The returned `AudioDataSource` object can
however greatly differ depending on the passed arguments. Further details can be found
in the respective method documentation. Note however the following two calls that will
create an `AudioDataSource` that reads data from an audio file and from the built-in
microphone respectively.

.. code:: python

    from auditok import ADSFactory

    # Get an AudioDataSource from a file
    file_ads = ADSFactory.ads(filename = "path/to/file/")

    # Get an AudioDataSource from the built-in microphone
    # The returned object has the default values for sampling
    # rate, sample width and number of channels. See the method's
    # documentation for customized values
    mic_ads = ADSFactory.ads()

For `StreamTokenizer`, the parameters `min_length`, `max_length` and `max_continuous_silence`
are expressed in terms of number of frames. If you want a `max_length` of *2 seconds* for
your detected sound events and your *analysis window* is *10 ms* long, you have to specify
a `max_length` of 200 (`int(2. / (10. / 1000)) == 200`). For a `max_continuous_silence` of *300 ms*,
for instance, the value to pass to `StreamTokenizer` is 30 (`int(0.3 / (10. / 1000)) == 30`).
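
As a quick sanity check, here are the same conversions done with plain
arithmetic (no auditok API involved):

.. code:: python

    analysis_window_ms = 10   # duration of one analysis window

    # 2 seconds expressed in number of frames
    max_length = int(2000 / analysis_window_ms)              # == 200

    # 300 ms expressed in number of frames
    max_continuous_silence = int(300 / analysis_window_ms)   # == 30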


Where do you get the size of the **analysis window** from?


Well, this is a parameter you pass to `ADSFactory.ads`. By default `ADSFactory.ads` uses
an analysis window of 10 ms. The number of samples that 10 ms of signal contains will
vary depending on the sampling rate of your audio source (file, microphone, etc.).
For a sampling rate of 16 kHz (16000 samples per second), we have 160 samples for 10 ms.
Therefore you can use block sizes of 160, 320 and 1600 for analysis windows of 10, 20 and 100
ms respectively.

.. code:: python

    from auditok import ADSFactory

    file_ads = ADSFactory.ads(filename = "path/to/file/", block_size = 160)

    file_ads = ADSFactory.ads(filename = "path/to/file/", block_size = 320)

    # If no sampling rate is specified, ADSFactory uses 16 kHz as the default
    # rate for the microphone. If you want to use a window of 100 ms, use
    # a block size of 1600
    mic_ads = ADSFactory.ads(block_size = 1600)

So if you are not sure what your analysis window's duration in seconds is, use the following:

.. code:: python

    my_ads = ADSFactory.ads(...)
    analysis_win_seconds = float(my_ads.get_block_size()) / my_ads.get_sampling_rate()
    analysis_window_ms = analysis_win_seconds * 1000

    # For a `max_continuous_silence` of 300 ms use:
    max_continuous_silence = int(300. / analysis_window_ms)

    # Which is the same as:
    max_continuous_silence = int(0.3 / (analysis_window_ms / 1000))



Extract isolated phrases from an utterance
------------------------------------------

We will build an `AudioDataSource` using a wave file from the database.
The file contains isolated pronunciations of digits from 1 to 6
in Arabic, as well as breath-in/out between 2 and 3. The code will play the
original file then the detected sounds separately. Note that we use an
`energy_threshold` of 65; this parameter should be carefully chosen. It depends
on microphone quality, background noise and the amplitude of the events you want to
detect.

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # We set the `record` argument to True so that we can rewind the source
    asource = ADSFactory.ads(filename=dataset.one_to_six_arabic_16000_mono_bc_noise, record=True)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=65)

    # Default analysis window is 10 ms (float(asource.get_block_size()) / asource.get_sampling_rate())
    # min_length=20 : minimum length of a valid audio activity is 20 * 10 == 200 ms
    # max_length=400 : maximum length of a valid audio activity is 400 * 10 == 4000 ms == 4 seconds
    # max_continuous_silence=30 : maximum length of a tolerated silence within a valid audio activity is 30 * 10 == 300 ms
    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=400, max_continuous_silence=30)

    asource.open()
    tokens = tokenizer.tokenize(asource)

    # Play detected regions back

    player = player_for(asource)

    # Rewind and read the whole signal
    asource.rewind()
    original_signal = []

    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    print("Playing the original file...")
    player.play(original_signal)

    print("playing detected regions...")
    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 8


The tokenizer extracts 8 audio regions from the signal, including all isolated digits
(from 1 to 6) as well as the 2-phase respiration of the subject. You might have noticed
that, in the original file, the last three digits are closer to each other than the
previous ones. If you want them to be extracted as one single phrase, you can do so
by tolerating a larger continuous silence within a detection:

.. code:: python

    tokenizer.max_continuous_silence = 50
    asource.rewind()
    tokens = tokenizer.tokenize(asource)

    for t in tokens:
        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
        data = ''.join(t[0])
        player.play(data)

    assert len(tokens) == 6


Trim leading and trailing silence
---------------------------------

The tokenizer in the following example is set up to remove the silence
that precedes the first acoustic activity or follows the last activity
in a record. It preserves whatever it finds between the two activities.
In other words, it removes the leading and trailing silence.

The sampling rate is 44100 samples per second; we'll use an analysis window of 100 ms
(i.e. block_size == 4410).

The energy threshold is 50.

The tokenizer will start accumulating windows from the moment it encounters
the first analysis window with an energy >= 50. ALL the following windows will be
kept regardless of their energy. At the end of the analysis, it will drop trailing
windows with an energy below 50.

This is an interesting example because the audio file we're analyzing contains a very
brief noise that occurs within the leading silence. We certainly do not want our tokenizer
to trigger at this point and consider whatever comes after it as a useful signal.
To force the tokenizer to ignore that brief event we use two other parameters, `init_min`
and `init_max_silence`. By setting `init_min` = 3 and `init_max_silence` = 1 we tell the tokenizer
that a valid event must start with at least 3 noisy windows, between which there
is at most 1 silent window.

Even with this configuration, the tokenizer could still detect that noise as a valid event
(if it actually contains 3 consecutive noisy frames). To circumvent this we use a large
enough analysis window (here of 100 ms) to ensure that the brief noise is surrounded by a much
longer silence, and hence the energy of the overall analysis window stays below 50.

When using a shorter analysis window (of 10 ms for instance, block_size == 441), the brief
noise contributes more to the energy calculation, which yields an energy of over 50 for the window.
Again, we can deal with this situation by using a higher energy threshold (55 for example).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # record = True so that we'll be able to rewind the source.
    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_tail_silence,
                             record=True, block_size=4410)
    asource.open()

    original_signal = []
    # Read the whole signal
    while True:
        w = asource.read()
        if w is None:
            break
        original_signal.append(w)

    original_signal = ''.join(original_signal)

    # Rewind the source
    asource.rewind()

    # Create a validator with an energy threshold of 50
    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)

    # Create a tokenizer with an unlimited token length and unlimited continuous silence within a token
    # Note the DROP_TAILING_SILENCE mode that will ensure removing trailing silence
    trimmer = StreamTokenizer(validator, min_length=20, max_length=99999999,
                              init_min=3, init_max_silence=1,
                              max_continuous_silence=9999999,
                              mode=StreamTokenizer.DROP_TAILING_SILENCE)


    tokens = trimmer.tokenize(asource)

    # Make sure we only have one token
    assert len(tokens) == 1, "Should have detected one single token"

    trimmed_signal = ''.join(tokens[0][0])

    player = player_for(asource)

    print("Playing original signal (with leading and trailing silence)...")
    player.play(original_signal)
    print("Playing trimmed signal...")
    player.play(trimmed_signal)


Online audio signal processing
------------------------------

In the next example, audio data is directly acquired from the built-in microphone.
The `tokenize` method is passed a callback function so that audio activities
are delivered as soon as they are detected. Each detected activity is played
back using the built-in audio output device.

As mentioned before, signal energy is strongly related to many factors such as
microphone sensitivity, background noise (including noise inherent to the hardware),
distance and your operating system's sound settings. Try a lower `energy_threshold`
if your noise does not seem to be detected and a higher threshold if you notice
over-detection (i.e. the `echo` function prints a detection where you have made no noise).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

    # record = True so that we'll be able to rewind the source.
    # max_time = 10: read 10 seconds from the microphone
    asource = ADSFactory.ads(record=True, max_time=10)

    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=250, max_continuous_silence=30)

    player = player_for(asource)

    def echo(data, start, end):
        print("Acoustic activity at: {0}--{1}".format(start, end))
        player.play(''.join(data))

    asource.open()

    tokenizer.tokenize(asource, callback=echo)

If you want to re-run the tokenizer after changing one or more parameters, use the following code:

.. code:: python

    asource.rewind()
    # change the energy threshold for example
    tokenizer.validator.set_energy_threshold(55)
    tokenizer.tokenize(asource, callback=echo)

In case you want to play the whole recorded signal back, use:

.. code:: python

    player.play(asource.get_audio_source().get_data_buffer())
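
If you also want to save the recorded signal to disk, here is a minimal sketch
using the standard `wave` module. The `get_sampling_rate`, `get_sample_width`
and `get_data_buffer` accessors are the ones used above; `get_channels` is an
assumption here, so check it against your auditok version:

.. code:: python

    import wave

    rec_source = asource.get_audio_source()

    wfp = wave.open("recorded.wav", "wb")
    wfp.setframerate(rec_source.get_sampling_rate())
    wfp.setsampwidth(rec_source.get_sample_width())
    wfp.setnchannels(rec_source.get_channels())  # assumed accessor
    wfp.writeframes(rec_source.get_data_buffer())
    wfp.close()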


Contributing
============

**auditok** is on `GitHub <https://github.com/amsehili/auditok>`_. You're welcome to fork it and contribute.


Amine SEHILI <amine.sehili@gmail.com>
September 2015

License
=======

This package is published under GNU GPL Version 3.