auditok: changeset 3:364eeb8e8bd2

README.md, typos fixes

author   | Amine Sehili <amine.sehili@gmail.com>
date     | Tue, 22 Sep 2015 10:49:57 +0200
parents  | edee860b9f61
children | 31c97510b16b
files    | .gitignore README.md auditok/__init__.py auditok/core.py auditok/data/was_der_mensch_saet_das_wir_er_veilfach_enrten_44100Hz_mono_lead_trail_silence.wav auditok/data/was_der_mensch_saet_das_wird_er_vielfach_ernten_44100Hz_mono_lead_tail_silence.wav auditok/dataset.py demos/audio_trim_demo.py quickstart.rst tests/test_StreamTokenizer.py
diffstat | 10 files changed, 232 insertions(+), 230 deletions(-)
--- a/.gitignore	Thu Sep 17 22:01:30 2015 +0200
+++ b/.gitignore	Tue Sep 22 10:49:57 2015 +0200
@@ -1,2 +1,11 @@
-pdoc/
+web/
 *pyc
+auditok/__pycache__
+.*.swp
+tags
+build
+dist
+MANIFEST.in
+*~
+.pydevproject
+.project
--- a/README.md	Thu Sep 17 22:01:30 2015 +0200
+++ b/README.md	Tue Sep 22 10:49:57 2015 +0200
@@ -15,6 +15,7 @@
 Demos
 -----
 This code reads data from the microphone and plays back whatever it detects.
+    python demos/echo.py
 
 `echo.py` accepts two arguments: energy threshold (default=45) and duration
 in seconds (default=10):
--- a/auditok/__init__.py	Thu Sep 17 22:01:30 2015 +0200
+++ b/auditok/__init__.py	Tue Sep 22 10:49:57 2015 +0200
@@ -22,10 +22,10 @@
 interests you are audio regions made up of a sequence of ``noisy''
 windows (whatever kind of noise: speech, baby cry, laughter, etc.).
 
-The most important component of `auditok` is the `StreamTokenizer` class.
+The most important component of `auditok` is the `auditok.core.StreamTokenizer` class.
 An instance of this class encapsulates a `DataValidator` and can be
 configured to detect the desired regions from a stream.
-The `auditok.core.StreamTokenizer.tokenize` method accepts a `DataSource`
+The `StreamTokenizer.tokenize` method accepts a `DataSource`
 object that has a `read` method. Read data can be of any type accepted
 by the `validator`.
@@ -123,18 +123,18 @@
     #!python
     [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]
 
-Notice the trailing lower case letters "dd" and "ee" at the end of the two
-tokens. The default behavior of `StreamTokenizer` is to keep the *trailing
+Notice the tailing lower case letters "dd" and "ee" at the end of the two
+tokens. The default behavior of `StreamTokenizer` is to keep the *tailing
 silence* if it does'nt exceed `max_continuous_silence`. This can be changed
-using the `DROP_TRAILING_SILENCE` mode (see next example).
+using the `DROP_TAILING_SILENCE` mode (see next example).
 
-## Remove trailing silence
+## Remove tailing silence
 
-Trailing silence can be useful for many sound recognition applications, including
-speech recognition. Moreover, from the human auditory system point of view, trailing
+Tailing silence can be useful for many sound recognition applications, including
+speech recognition. Moreover, from the human auditory system point of view, tailing
 low energy signal helps removing abrupt signal cuts.
 
-If you want to remove it anyway, you can do it by setting `mode` to `StreamTokenizer.DROP_TRAILING_SILENCE`:
+If you want to remove it anyway, you can do it by setting `mode` to `StreamTokenizer.DROP_TAILING_SILENCE`:
 
     #!python
@@ -147,7 +147,7 @@
     dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
     tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                 min_length=1, max_length=9999, max_continuous_silence=2,
-                                mode=StreamTokenizer.DROP_TRAILING_SILENCE)
+                                mode=StreamTokenizer.DROP_TAILING_SILENCE)
 
     tokenizer.tokenize(dsource)
@@ -342,12 +342,12 @@
-## Trim leading and trailing silence
+## Trim leading and tailing silence
 
 The tokenizer in the following example is set up to remove the silence
 that precedes the first acoustic activity or follows the last activity in a record.
 It preserves whatever it founds between the two activities.
-In other words, it removes the leading and trailing silence.
+In other words, it removes the leading and tailing silence.
 
 Sampling rate is 44100 sample per second, we'll use an analysis window of 100 ms
 (i.e. bloc_ksize == 4410)
@@ -356,7 +356,7 @@
 The tokenizer will start accumulating windows up from the moment it encounters
 the first analysis window of an energy >= 50. ALL the following windows will be
-kept regardless of their energy. At the end of the analysis, it will drop trailing
+kept regardless of their energy. At the end of the analysis, it will drop tailing
 windows with an energy below 50.
 
 This is an interesting example because the audio file we're analyzing contains a very
@@ -379,10 +379,9 @@
     #!python
     from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset
-    import pyaudio
 
     # record = True so that we'll be able to rewind the source.
-    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence,
+    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_tail_silence,
                              record=True, block_size=4410)
 
     asource.open()
@@ -403,9 +402,9 @@
     validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
 
     # Create a tokenizer with an unlimited token length and continuous silence within a token
-    # Note the DROP_TRAILING_SILENCE mode that will ensure removing trailing silence
+    # Note the DROP_TAILING_SILENCE mode that will ensure removing tailing silence
     trimmer = StreamTokenizer(validator, min_length = 20, max_length=99999999, init_min=3, init_max_silence=1,
-                              max_continuous_silence=9999999, mode=StreamTokenizer.DROP_TRAILING_SILENCE)
+                              max_continuous_silence=9999999, mode=StreamTokenizer.DROP_TAILING_SILENCE)
 
     tokens = trimmer.tokenize(asource)
@@ -417,7 +416,7 @@
     player = player_for(asource)
 
-    print("Playing original signal (with leading and trailing silence)...")
+    print("Playing original signal (with leading and tailing silence)...")
     player.play(original_signal)
     print("Playing trimmed signal...")
     player.play(trimmed_signal)
@@ -439,7 +438,6 @@
     #!python
     from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for
-    import pyaudio
 
     # record = True so that we'll be able to rewind the source.
     # max_time = 10: read 10 seconds from the microphone
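The docstring examples touched above describe how up to `max_continuous_silence` non-valid frames are tolerated inside a token and kept at its tail by default. A deliberately simplified, dependency-free sketch of that behavior (it ignores `min_length`, `max_length`, `init_min` and `init_max_silence`; the function name and signature are illustrative, not auditok's API):

```python
def tokenize(data, is_valid, max_continuous_silence):
    """Toy tokenizer: start a token at the first valid frame, tolerate up to
    max_continuous_silence consecutive non-valid frames inside it, and keep
    the tolerated tail (auditok's default, non-DROP mode, behavior)."""
    tokens, cur, start, silence = [], [], None, 0
    for i, frame in enumerate(data):
        if is_valid(frame):
            if start is None:
                start = i
            cur.append(frame)
            silence = 0
        elif start is not None:
            if silence < max_continuous_silence:
                cur.append(frame)
                silence += 1
            else:  # silence budget exhausted: deliver the current token
                tokens.append((cur, start, start + len(cur) - 1))
                cur, start, silence = [], None, 0
    if start is not None:  # deliver whatever is pending at end of stream
        tokens.append((cur, start, start + len(cur) - 1))
    return tokens

print(tokenize("aaaABCDbbEFcGHIdddJKee", str.isupper, 2))
# [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16),
#  (['J', 'K', 'e', 'e'], 18, 21)]
```

This reproduces the documented output for the same input string: the "dd" and "ee" tails survive because each is within the two-frame silence budget.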
--- a/auditok/core.py	Thu Sep 17 22:01:30 2015 +0200
+++ b/auditok/core.py	Tue Sep 22 10:49:57 2015 +0200
@@ -12,102 +12,79 @@
 class StreamTokenizer():
-    """
     Class for stream tokenizers. It implements a 4-state automata scheme
     for interesting sub-sequences extraction.
-    """
-    SILENCE = 0
-    POSSIBLE_SILENCE = 1
-    POSSIBLE_NOISE = 2
-    NOISE = 3
+    **Parameters:**
 
-    STRICT_MIN_LENGTH = 2
-    DROP_TRAILING_SILENCE = 4
+    `validator` :
+        instance of `DataValidator` that implements `is_valid` method.
 
-    def __init__(self, validator,
-                 min_length, max_length, max_continuous_silence,
-                 init_min=0, init_max_silence=0,
-                 mode=0):
+    `min_length` : *(int)*
+        Minimum number of frames of a valid token. This includes all \
+        tolerated non valid frames within the token.
 
-        """
+    `max_length` : *(int)*
+        Maximum number of frames of a valid token. This includes all \
+        tolerated non valid frames within the token.
+
+    `max_continuous_silence` : *(int)*
+        Maximum number of consecutive non-valid frames within a token.
+        Note that, within a valid token, there may be many tolerated \
+        *silent* regions that contain each a number of non valid frames up to \
+        `max_continuous_silence`
+
+    `init_min` : *(int, default=0)*
+        Minimum number of consecutive valid frames that must be **initially** \
+        gathered before any sequence of non valid frames can be tolerated. This
+        option is not always needed, it can be used to drop non-valid tokens as
+        early as possible. **Default = 0** means that the option is by default
+        ineffective.
+
+    `init_max_silence` : *(int, default=0)*
+        Maximum number of tolerated consecutive non-valid frames if the \
+        number already gathered valid frames has not yet reached 'init_min'.
+        This arguement is normally used if `init_min` is used. **Default = 0**,
+        by default this argument is not taken into consideration.
 
-        Parameters
-        -----------
+    `mode` : *(int, default=0)*
+        `mode` can be:
+
+    1. `StreamTokenizer.STRICT_MIN_LENGTH`:
+       if token *i* is delivered because `max_length`
+       is reatched, and token *i+1* is immedialtely adjacent to
+       token *i* (i.e. token *i* ends at frame *k* and token *i+1* starts
+       at frame *k+1*) then accept toekn *i+1* only of it has a size of at
+       least `min_length`. The default behavior is to accept toekn *i+1*
+       event if it is shorter than `min_length` (given that the above conditions
+       are fullfilled of course).
+
+       ** Example **
+
+       In the following code, without `STRICT_MIN_LENGTH`, the 'BB' token is
+       accepted although it is shorter than `min_length` (3), because it immediatly
+       follows the latest delivered token:
+
+       .. code:: python
+
+           from auditok import StreamTokenizer, StringDataSource, DataValidator
 
-        `validator` :
-            instance of `DataValidator` that implements `is_valid` method.
-
-        `min_length` : *(int)*
-            Minimum number of frames of a valid token. This includes all \
-            tolerated non valid frames within the token.
-
-        `max_length` : *(int)*
-            Maximum number of frames of a valid token. This includes all \
-            tolerated non valid frames within the token.
-
-        `max_continuous_silence` : *(int)*
-            Maximum number of consecutive non-valid frames within a token.
-            Note that, within a valid token, there may be many tolerated \
-            *silent* regions that contain each a number of non valid frames up to \
-            `max_continuous_silence`
-
-        `init_min` : *(int, default=0)*
-            Minimum number of consecutive valid frames that must be **initially** \
-            gathered before any sequence of non valid frames can be tolerated. This
-            option is not always needed, it can be used to drop non-valid tokens as
-            early as possible. **Default = 0** means that the option is by default
-            ineffective.
-
-        `init_max_silence` : *(int, default=0)*
-            Maximum number of tolerated consecutive non-valid frames if the \
-            number already gathered valid frames has not yet reached 'init_min'.
-            This arguement is normally used if `init_min` is used. **Default = 0**,
-            by default this argument is not taken into consideration.
-
-        keep_trailing_silence : boolean, default=False
-            Whether to keep the trailing non valid frames of a valid token
-            This seems to be particularly useful to avoid an abrupt cut-off
-            when tokenizing some kinds of signals (e.g. audio signal)
-
-        `mode` : *(int, default=0)*
-            `mode` can be:
-
-        1. `StreamTokenizer.STRICT_MIN_LENGTH`: if token *i* is delivered because `max_length`
-           is reatched, and token *i+1* is immedialtely adjacent to
-           token *i* (i.e. token *i* ends at frame *k* and token *i+1* starts
-           at frame *k+1*) then accept toekn *i+1* only of it has a size of at
-           least `min_length`. The default behavior is to accept toekn *i+1*
-           event if it is shorter than `min_length` (given that the above conditions
-           are fullfilled of course).
-
-        Example
-        -------
-
-        In the following code, without `STRICT_MIN_LENGTH`, the 'BB' token is
-        accepted although it is shorter than `min_length` (3), because it immediatly
-        follows the latest delivered token:
-
-        #!python
-        from auditok import StreamTokenizer, StringDataSource, DataValidator
-
         class UpperCaseChecker(DataValidator):
-           def is_valid(self, frame):
-                return frame.isupper()
-
+            def is_valid(self, frame):
+                return frame.isupper()
+
         dsource = StringDataSource("aaaAAAABBbbb")
         tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                     min_length=3, max_length=4,
                                     max_continuous_silence=0)
-
+
         tokenizer.tokenize(dsource)
-
-
+
+
         output:
-        #!python
+        .. code:: python
+
         [(['A', 'A', 'A', 'A'], 3, 6), (['B', 'B'], 7, 8)]
 
 The following toknizer will however reject the 'BB' token
@@ -117,62 +94,81 @@
                                     min_length=3, max_length=4,
                                     max_continuous_silence=0,
                                     mode=StreamTokenizer.STRICT_MIN_LENGTH)
         tokenizer.tokenize(dsource)
+
+        output:
+
+        .. code:: python
+
+            [(['A', 'A', 'A', 'A'], 3, 6)]
 
-        output:
+
+    2. `StreamTokenizer.DROP_TAILING_SILENCE`: drop all tailing non-valid frames
+       from a token to be delivered if and only if it is not **truncated**.
+       This can be a bit tricky. A token is actually delivered if:
+
+       a. `max_continuous_silence` is reached
+
+       OR
+
+       b. Its length reaches `max_length`. This is called a **truncated** token
+
+       In the current implementation, a `StreamTokenizer`'s decision is only based on seen
+       data and on incoming data. Thus, if a token is truncated at a non-valid but tolerated
+       frame (`max_length` is reached but `max_continuous_silence` not yet) any tailing
+       silence will be kept because it can potentilly be part of valid token (if `max_length`
+       was bigger). But if `max_continuous_silence` is reched before `max_length`, the delivered
+       token will not be considered as truncted but a result of *normal* end of detection
+       (i.e. no more valid data). In that case the tailing silence can be removed if you use
+       the `StreamTokenizer.DROP_TAILING_SILENCE` mode.
+
+       Take the following example:
 
-        #!python
-        [(['A', 'A', 'A', 'A'], 3, 6)]
-
-
-        2. `StreamTokenizer.DROP_TRAILING_SILENCE`: drop all trailing non-valid frames
-           from a token to be delivered if and only if it is not **truncated**.
-           This can be a bit tricky. A token is actually delivered if:
-
-           a. `max_continuous_silence` is reached
-
-           OR
-
-           b. Its length reaches `max_length`. This is called a **truncated** token
-
-           In the current implementation, a `StreamTokenizer`'s decision is only based on seen
-           data and on incoming data. Thus, if a token is truncated at a non-valid but tolerated
-           frame (`max_length` is reached but `max_continuous_silence` not yet) any trailing
-           silence will be kept because it can potentilly be part of valid token (if `max_length`
-           was bigger). But if `max_continuous_silence` is reched before `max_length`, the delivered
-           token will not be considered as truncted but a result of *normal* end of detection
-           (i.e. no more valid data). In that case the trailing silence can be removed if you use
-           the `StreamTokenizer.DROP_TRAILING_SILENCE` mode.
-
-        Take the following example:
-
-        #!python
-        tokenizer = StreamTokenizer(validator=UpperCaseChecker(), min_length=3,
-                                    max_length=6, max_continuous_silence=3,
-                                    mode=StreamTokenizer.DROP_TRAILING_SILENCE)
-
-        dsource = StringDataSource("aaaAAAaaaBBbbbb")
-        tokenizer.tokenize(dsource)
-
-        output:
+       .. code:: python
+
+           tokenizer = StreamTokenizer(validator=UpperCaseChecker(), min_length=3,
+                                       max_length=6, max_continuous_silence=3,
+                                       mode=StreamTokenizer.DROP_TAILING_SILENCE)
 
-        #!python
-        [(['A', 'A', 'A', 'a', 'a', 'a'], 3, 8), (['B', 'B'], 9, 10)]
-
-        The first troken is delivered with its trailing silence because it is truncated
-        while the second one has its trailing frames removed.
+           dsource = StringDataSource("aaaAAAaaaBBbbbb")
+           tokenizer.tokenize(dsource)
 
-        Without `StreamTokenizer.DROP_TRAILING_SILENCE` the output whould be:
-
-        #!python
-        [(['A', 'A', 'A', 'a', 'a', 'a'], 3, 8), (['B', 'B', 'b', 'b', 'b'], 9, 13)]
+       output:
+
+       .. code:: python
+
+           [(['A', 'A', 'A', 'a', 'a', 'a'], 3, 8), (['B', 'B'], 9, 10)]
+
+       The first troken is delivered with its tailing silence because it is truncated
+       while the second one has its tailing frames removed.
+
+       Without `StreamTokenizer.DROP_TAILING_SILENCE` the output whould be:
+
+       .. code:: python
+
+           [(['A', 'A', 'A', 'a', 'a', 'a'], 3, 8), (['B', 'B', 'b', 'b', 'b'], 9, 13)]
 
-
-
-        3. `StreamTokenizer.STRICT_MIN_LENGTH | StreamTokenizer.DROP_TRAILING_SILENCE`:
-           use both options. That means: first remove trailing silence, then ckeck if the
-           token still has at least a length of `min_length`.
-        """
+
+    3. `StreamTokenizer.STRICT_MIN_LENGTH | StreamTokenizer.DROP_TAILING_SILENCE`:
+       use both options. That means: first remove tailing silence, then ckeck if the
+       token still has at least a length of `min_length`.
+
+    """
+
+    SILENCE = 0
+    POSSIBLE_SILENCE = 1
+    POSSIBLE_NOISE = 2
+    NOISE = 3
+
+    STRICT_MIN_LENGTH = 2
+    DROP_TAILING_SILENCE = 4
+
+    def __init__(self, validator,
+                 min_length, max_length, max_continuous_silence,
+                 init_min=0, init_max_silence=0,
+                 mode=0):
+
         if not isinstance(validator, DataValidator):
             raise TypeError("'validator' must be an instance of 'DataValidator'")
@@ -200,7 +196,7 @@
         self._mode = None
         self.set_mode(mode)
         self._strict_min_length = (mode & self.STRICT_MIN_LENGTH) != 0
-        self._drop_trailing_silence = (mode & self.DROP_TRAILING_SILENCE) != 0
+        self._drop_tailing_silence = (mode & self.DROP_TAILING_SILENCE) != 0
 
         self._deliver = None
         self._tokens = None
@@ -215,34 +211,27 @@
     def set_mode(self, mode):
         """
-        Set this tokenizer's mode.
-
-        Paramerters
-        ------------
+        **Parameters:**
 
         `mode` : *(int)*
-            New mode, must be one of:
-
-            a. `StreamTokenizer.STRICT_MIN_LENGTH`
-
-            b. `StreamTokenizer.DROP_TRAILING_SILENCE`
-
-            c. `StreamTokenizer.STRICT_MIN_LENGTH | StreamTokenizer.DROP_TRAILING_SILENCE`
-
-            d. 0
-
-
-        See `StreamTokenizer.__init__` for more information about the mode.
+            New mode, must be one of:
+
+            a. `StreamTokenizer.STRICT_MIN_LENGTH`
+            b. `StreamTokenizer.DROP_TAILING_SILENCE`
+            c. `StreamTokenizer.STRICT_MIN_LENGTH | StreamTokenizer.DROP_TAILING_SILENCE`
+            d. 0
+
+        See `StreamTokenizer.__init__` for more information about the mode.
         """
 
-        if not mode in [self.STRICT_MIN_LENGTH, self.DROP_TRAILING_SILENCE,
-                        self.STRICT_MIN_LENGTH | self.DROP_TRAILING_SILENCE, 0]:
+        if not mode in [self.STRICT_MIN_LENGTH, self.DROP_TAILING_SILENCE,
+                        self.STRICT_MIN_LENGTH | self.DROP_TAILING_SILENCE, 0]:
             raise ValueError("Wrong value for mode")
 
         self._mode = mode
         self._strict_min_length = (mode & self.STRICT_MIN_LENGTH) != 0
-        self._drop_trailing_silence = (mode & self.DROP_TRAILING_SILENCE) != 0
+        self._drop_tailing_silence = (mode & self.DROP_TAILING_SILENCE) != 0
 
     def get_mode(self):
@@ -250,10 +239,10 @@
         Return the current mode.
 
         To check whether a specific mode is activated use the bitwise 'and'
         operator `&`. Example:
-        #!python
+        .. code:: python
+
         if mode & self.STRICT_MIN_LENGTH != 0:
             ...
-
         """
         return self._mode
@@ -271,25 +260,25 @@
         Read data from `data_source`, one frame a time, and process the read frames
         in order to detect sequences of frames that make up valid tokens.
 
-        Parameters
-        ----------
+        **Parameters:**
+
         `data_source` : instance of the `DataSource` class that implements a 'read' method.
-            'read' should return a slice of signal, i.e. frame (of whatever \
-            type as long as it can be processed by validator) and None if \
-            there is no more signal.
+           'read' should return a slice of signal, i.e. frame (of whatever \
+           type as long as it can be processed by validator) and None if \
+           there is no more signal.
 
         `callback` : an optional 3-argument function.
-            If a `callback` function is given, it will be called each time a valid token
-            is found.
+           If a `callback` function is given, it will be called each time a valid token
+           is found.
 
-        Returns
-        -------
+        **Returns:**
 
         A list of tokens if `callback` is None. Each token is tuple with the following elemnts:
 
-        #!python
+        .. code:: python
+
         (data, start, end)
 
         where `data` is a list of read frames, `start`: index of the first frame in the
@@ -422,7 +411,7 @@
     def _process_end_of_detection(self, truncated=False):
 
-        if not truncated and self._drop_trailing_silence and self._silence_length > 0:
+        if not truncated and self._drop_tailing_silence and self._silence_length > 0:
             # happens if max_continuous_silence is reached
             # or max_length is reached at a silent frame
             self._data = self._data[0: - self._silence_length]
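The last hunk above renames `_drop_trailing_silence` to `_drop_tailing_silence` but keeps the slicing logic intact: when a token ends normally (not truncated) and the drop mode is active, the tolerated silent tail is cut off. A standalone sketch of just that slice (function name and arguments are illustrative, not auditok's API):

```python
def trim_tail(data, silence_length, truncated, drop_tailing_silence):
    # Mirrors _process_end_of_detection: only non-truncated tokens
    # lose their tailing silent frames.
    if not truncated and drop_tailing_silence and silence_length > 0:
        return data[0:-silence_length]
    return data

# Token ended because max_continuous_silence was reached: tail is dropped.
print(trim_tail(list("BBbb"), 2, False, True))   # ['B', 'B']
# Token was truncated at max_length: tolerated silence is kept.
print(trim_tail(list("AAAaaa"), 3, True, True))  # ['A', 'A', 'A', 'a', 'a', 'a']
```

This matches the docstring's "aaaAAAaaaBBbbbb" example: the truncated 'AAAaaa' token keeps its tail while 'BBbb' is trimmed to 'BB'.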
Binary file auditok/data/was_der_mensch_saet_das_wir_er_veilfach_enrten_44100Hz_mono_lead_trail_silence.wav has changed
Binary file auditok/data/was_der_mensch_saet_das_wird_er_vielfach_ernten_44100Hz_mono_lead_tail_silence.wav has changed
--- a/auditok/dataset.py	Thu Sep 17 22:01:30 2015 +0200
+++ b/auditok/dataset.py	Tue Sep 22 10:49:57 2015 +0200
@@ -7,7 +7,7 @@
 
 import os
 
-__all__ = ["one_to_six_arabic_16000_mono_bc_noise", "was_der_mensch_saet_mono_44100_lead_trail_silence"]
+__all__ = ["one_to_six_arabic_16000_mono_bc_noise", "was_der_mensch_saet_mono_44100_lead_tail_silence"]
 
 _current_dir = os.path.dirname(os.path.realpath(__file__))
 
@@ -15,6 +15,6 @@
 16000_mono_bc_noise.wav".format(cd=_current_dir, sep=os.path.sep)
 
-was_der_mensch_saet_mono_44100_lead_trail_silence = "{cd}{sep}data{sep}was_\
-der_mensch_saet_das_wir_er_veilfach_enrten_44100Hz_mono_lead_trail_\
+was_der_mensch_saet_mono_44100_lead_tail_silence = "{cd}{sep}data{sep}was_\
+der_mensch_saet_das_wird_er_vielfach_ernten_44100Hz_mono_lead_tail_\
 silence.wav".format(cd=_current_dir, sep=os.path.sep)
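This hunk renames both the module-level constant and the WAV file it points to. The path is built with `str.format` plus backslash line continuations, so the long file name stays one logical string; a small sketch of the same pattern (`base_dir` is an illustrative stand-in for the module's `_current_dir`):

```python
import os

def sample_path(base_dir):
    # Backslash continuations keep the long literal on one logical line,
    # so no newline characters leak into the file name.
    return "{cd}{sep}data{sep}was_\
der_mensch_saet_das_wird_er_vielfach_ernten_44100Hz_mono_lead_tail_\
silence.wav".format(cd=base_dir, sep=os.path.sep)

print(sample_path("."))
```

On a POSIX system this yields `./data/was_der_mensch_saet_..._mono_lead_tail_silence.wav`; `os.path.sep` keeps the same code portable to Windows.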
--- a/demos/audio_trim_demo.py	Thu Sep 17 22:01:30 2015 +0200
+++ b/demos/audio_trim_demo.py	Tue Sep 22 10:49:57 2015 +0200
@@ -3,7 +3,7 @@
 September, 2015
 """
 
-# Trim leading and trailing silence from a record
+# Trim leading and tailing silence from a record
 
 from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset
 import pyaudio
@@ -12,7 +12,7 @@
 The tokenizer in the following example is set up to remove the silence
 that precedes the first acoustic activity or follows the last activity in a record.
 It preserves whatever it founds between the two activities.
-In other words, it removes the leading and trailing silence.
+In other words, it removes the leading and tailing silence.
 
 Sampling rate is 44100 sample per second, we'll use an analysis window of 100 ms
 (i.e. bloc_ksize == 4410)
@@ -21,7 +21,7 @@
 The tokenizer will start accumulating windows up from the moment it encounters
 the first analysis window of an energy >= 50. ALL the following windows will be
-kept regardless of their energy. At the end of the analysis, it will drop trailing
+kept regardless of their energy. At the end of the analysis, it will drop tailing
 windows with an energy below 50.
 
 This is an interesting example because the audio file we're analyzing contains a very
@@ -45,7 +45,7 @@
 # record = True so that we'll be able to rewind the source.
-asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence,
+asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_tail_silence,
                          record=True, block_size=4410)
 
 asource.open()
@@ -67,7 +67,7 @@
 validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
 
 # Create a tokenizer with an unlimited token length and continuous silence within a token
-# Note the DROP_TRAILING_SILENCE mode that will ensure removing trailing silence
+# Note the DROP_TRAILING_SILENCE mode that will ensure removing tailing silence
 trimmer = StreamTokenizer(validator, min_length = 20, max_length=99999999, max_continuous_silence=9999999,
                           mode=StreamTokenizer.DROP_TRAILING_SILENCE, init_min=3, init_max_silence=1)
 
@@ -81,7 +81,7 @@
 player = player_for(asource)
 
-print("\n ** Playing original signal (with leading and trailing silence)...")
+print("\n ** Playing original signal (with leading and tailing silence)...")
 player.play(original_signal)
 print("\n ** Playing trimmed signal...")
 player.play(trimmed_signal)
--- a/quickstart.rst	Thu Sep 17 22:01:30 2015 +0200
+++ b/quickstart.rst	Tue Sep 22 10:49:57 2015 +0200
@@ -3,6 +3,9 @@
 auditok, an AUDIo TOKenization module
 =====================================
 
+.. contents:: `Contents`
+   :depth: 3
+
 **auditok** is a module that can be used as a generic tool for data
 tokenization. Although its core motivation is **Acoustic Activity
@@ -26,10 +29,10 @@
 interests you are audio regions made up of a sequence of "noisy" windows
 (whatever kind of noise: speech, baby cry, laughter, etc.).
 
-The most important component of `auditok` is the `StreamTokenizer` class.
-An instance of this class encapsulates a `DataValidator` and can be
+The most important component of `auditok` is the `auditok.core.StreamTokenizer`
+class. An instance of this class encapsulates a `DataValidator` and can be
 configured to detect the desired regions from a stream.
-The `auditok.core.StreamTokenizer.tokenize` method accepts a `DataSource`
+The `StreamTokenizer.tokenize` method accepts a `DataSource`
 object that has a `read` method. Read data can be of any type accepted
 by the `validator`.
@@ -48,14 +51,10 @@
 a higher abstraction level than `AudioSource` thanks to a bunch of
 handy features:
 
-- Define a fixed-length of block_size (i.e. analysis window)
-- Allow overlap between two consecutive analysis windows (hop_size < block_size).
-  This can be very important if your validator use the **spectral**
-  information of audio data instead of raw audio samples.
-- Limit the amount (i.e. duration) of read data (very useful when reading
-  data from the microphone)
-- Record and rewind data (also useful if you read data from the microphone
-  and you want to process it many times offline and/or save it)
+- Define a fixed-length block_size (i.e. analysis window)
+- Allow overlap between two consecutive analysis windows (hop_size < block_size). This can be very important if your validator use the **spectral** information of audio data instead of raw audio samples.
+- Limit the amount (i.e. duration) of read data (very useful when reading data from the microphone)
+- Record and rewind data (also useful if you read data from the microphone and you want to process it many times offline and/or save it)
 
 Last but not least, the current version has only one audio window validator based on
@@ -82,6 +81,7 @@
 Extract sub-sequences of consecutive upper case letters
 -------------------------------------------------------
 
+
 We want to extract sub-sequences of characters that have:
 
 - A minimu length of 1 (`min_length` = 1)
@@ -108,11 +108,15 @@
 The output is a list of two tuples, each contains the extracted sub-sequence
 and its start and end position in the original sequence respectively:
 
+
+.. code:: python
+
     [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]
 
-Tolerate up to two non-valid (lower case) letter within an extracted sequence
------------------------------------------------------------------------------
+
+Tolerate up to two non-valid (lower case) letters within an extracted sequence
+------------------------------------------------------------------------------
 
 To do so, we set `max_continuous_silence` =2:
@@ -138,19 +142,19 @@
     [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]
 
-Notice the trailing lower case letters "dd" and "ee" at the end of the two
-tokens. The default behavior of `StreamTokenizer` is to keep the *trailing
+Notice the tailing lower case letters "dd" and "ee" at the end of the two
+tokens. The default behavior of `StreamTokenizer` is to keep the *tailing
 silence* if it does'nt exceed `max_continuous_silence`. This can be changed
-using the `DROP_TRAILING_SILENCE` mode (see next example).
+using the `DROP_TAILING_SILENCE` mode (see next example).
 
-Remove trailing silence
+Remove tailing silence
 -----------------------
 
-Trailing silence can be useful for many sound recognition applications, including
-speech recognition. Moreover, from the human auditory system point of view, trailing
+Tailing silence can be useful for many sound recognition applications, including
+speech recognition. Moreover, from the human auditory system point of view, tailing
 low energy signal helps removing abrupt signal cuts.
 
-If you want to remove it anyway, you can do it by setting `mode` to `StreamTokenizer.DROP_TRAILING_SILENCE`:
+If you want to remove it anyway, you can do it by setting `mode` to `StreamTokenizer.DROP_TAILING_SILENCE`:
 
 .. code:: python
@@ -163,7 +167,7 @@
     dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
     tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                 min_length=1, max_length=9999, max_continuous_silence=2,
-                                mode=StreamTokenizer.DROP_TRAILING_SILENCE)
+                                mode=StreamTokenizer.DROP_TAILING_SILENCE)
 
     tokenizer.tokenize(dsource)
@@ -174,9 +178,11 @@
     [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]
 
+
 Limit the length of detected tokens
 -----------------------------------
 
+
 Imagine that you just want to detect and recognize a small part of a long
 acoustic event (e.g. engine noise, water flow, etc.) and avoid that that
 event hogs the tokenizer and prevent it from feeding the event to the next
@@ -211,11 +217,14 @@
 output:
 
+.. code:: python
+
     "token = 'ABCDE', starts at 3, ends at 7"
     "token = 'FGHIJ', starts at 8, ends at 12"
     "token = 'K', starts at 13, ends at 13"
 
+
 Using real audio data
 =====================
@@ -288,9 +297,7 @@
 # Which is the same as
 max_continuous_silence = int(0.3 / (analysis_window_ms / 1000))
 
-
-Examples
---------
+
 Extract isolated phrases from an utterance
 ------------------------------------------
@@ -369,13 +376,13 @@
     assert len(tokens) == 6
 
-Trim leading and trailing silence
+Trim leading and tailing silence
 ---------------------------------
 
 The tokenizer in the following example is set up to remove the silence
 that precedes the first acoustic activity or follows the last activity in a record.
 It preserves whatever it founds between the two activities.
-In other words, it removes the leading and trailing silence.
+In other words, it removes the leading and tailing silence.
 
 Sampling rate is 44100 sample per second, we'll use an analysis window of 100 ms
 (i.e. bloc_ksize == 4410)
@@ -384,7 +391,7 @@
 The tokenizer will start accumulating windows up from the moment it encounters
 the first analysis window of an energy >= 50. ALL the following windows will be
-kept regardless of their energy. At the end of the analysis, it will drop trailing
+kept regardless of their energy. At the end of the analysis, it will drop tailing
 windows with an energy below 50.
 
 This is an interesting example because the audio file we're analyzing contains a very
@@ -407,10 +414,9 @@
 .. code:: python
 
     from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset
-    import pyaudio
 
     # record = True so that we'll be able to rewind the source.
-    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence,
+    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_tail_silence,
                              record=True, block_size=4410)
 
     asource.open()
@@ -431,8 +437,8 @@
     validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
 
     # Create a tokenizer with an unlimited token length and continuous silence within a token
-    # Note the DROP_TRAILING_SILENCE mode that will ensure removing trailing silence
-    trimmer = StreamTokenizer(validator, min_length = 20, max_length=99999999, init_min=3, init_max_silence=1, max_continuous_silence=9999999, mode=StreamTokenizer.DROP_TRAILING_SILENCE)
+    # Note the DROP_TAILING_SILENCE mode that will ensure removing tailing silence
+    trimmer = StreamTokenizer(validator, min_length = 20, max_length=99999999, init_min=3, init_max_silence=1, max_continuous_silence=9999999, mode=StreamTokenizer.DROP_TAILING_SILENCE)
 
     tokens = trimmer.tokenize(asource)
@@ -444,7 +450,7 @@
     player = player_for(asource)
 
-    print("Playing original signal (with leading and trailing silence)...")
+    print("Playing original signal (with leading and tailing silence)...")
     player.play(original_signal)
     print("Playing trimmed signal...")
     player.play(trimmed_signal)
@@ -467,7 +473,6 @@
 .. code:: python
 
     from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for
-    import pyaudio
 
     # record = True so that we'll be able to rewind the source.
     # max_time = 10: read 10 seconds from the microphone
@@ -507,10 +512,10 @@
 **auditok** is on `GitHub <https://github.com/amsehili/auditok>`_. You're welcome
 to fork it and contribute.
 
-Amine SEHILI <amine.sehili[_at_]gmail.com>
+Amine SEHILI <amine.sehili@gmail.com>
 September 2015
 
 License
 =======
 
-This package is published under GNU GPL Version 3.
+This package is published under GNU GPL Version 3.
\ No newline at end of file
--- a/tests/test_StreamTokenizer.py	Thu Sep 17 22:01:30 2015 +0200
+++ b/tests/test_StreamTokenizer.py	Tue Sep 22 10:49:57 2015 +0200
@@ -421,11 +421,11 @@
         self.assertEqual(end, 9, msg="wrong end frame for token 1, expected: 9, found: {0} ".format(end))
 
-    def test_DROP_TRAILING_SILENCE(self):
+    def test_DROP_TAILING_SILENCE(self):
 
         tokenizer = StreamTokenizer(self.A_validator, min_length = 5, max_length=10,
                                     max_continuous_silence=2, init_min = 3,
-                                    init_max_silence = 3, mode=StreamTokenizer.DROP_TRAILING_SILENCE)
+                                    init_max_silence = 3, mode=StreamTokenizer.DROP_TAILING_SILENCE)
 
         data_source = StringDataSource("aaAAAAAaaaaa")
         #                                ^   ^
@@ -446,11 +446,11 @@
         self.assertEqual(end, 6, msg="wrong end frame for token 1, expected: 6, found: {0} ".format(end))
 
-    def test_STRICT_MIN_LENGTH_and_DROP_TRAILING_SILENCE(self):
+    def test_STRICT_MIN_LENGTH_and_DROP_TAILING_SILENCE(self):
 
         tokenizer = StreamTokenizer(self.A_validator, min_length = 5, max_length=8,
                                     max_continuous_silence=3, init_min = 3,
-                                    init_max_silence = 3, mode=StreamTokenizer.STRICT_MIN_LENGTH | StreamTokenizer.DROP_TRAILING_SILENCE)
+                                    init_max_silence = 3, mode=StreamTokenizer.STRICT_MIN_LENGTH | StreamTokenizer.DROP_TAILING_SILENCE)
 
         data_source = StringDataSource("aaAAAAAAAAAAAAaa")
         #                                ^            ^
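Both renamed tests combine modes with bitwise OR, and `set_mode` in core.py decodes the result with bitwise AND. A minimal sketch of that flag arithmetic (constant values copied from the class attributes in the core.py diff):

```python
# Flag values as defined on StreamTokenizer in the diff above.
STRICT_MIN_LENGTH = 2
DROP_TAILING_SILENCE = 4

# Combine the two modes exactly as the second test does.
mode = STRICT_MIN_LENGTH | DROP_TAILING_SILENCE

# set_mode() derives two independent booleans from the combined value.
strict_min_length = (mode & STRICT_MIN_LENGTH) != 0
drop_tailing_silence = (mode & DROP_TAILING_SILENCE) != 0

print(mode, strict_min_length, drop_tailing_silence)  # 6 True True
```

Because the flag values are distinct powers of two, each `&` test reads one mode independently of the other, which is why `mode=0`, either flag alone, or their OR are the only values `set_mode` accepts.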