comparison quickstart.rst @ 3:364eeb8e8bd2
README.md, typos fixes

author    Amine Sehili <amine.sehili@gmail.com>
date      Tue, 22 Sep 2015 10:49:57 +0200
parents   edee860b9f61
children  252d698ae642
.. auditok documentation.

auditok, an AUDIo TOKenization module
=====================================


**auditok** is a module that can be used as a generic tool for data
tokenization. Although its core motivation is **Acoustic Activity
Detection** (AAD) and extraction from audio streams (i.e. detect

if you are interested in sub-sequences of numeric symbols, or a silent
audio window (of 10, 20 or 100 milliseconds for instance) if what
interests you are audio regions made up of a sequence of "noisy"
windows (whatever kind of noise: speech, baby cry, laughter, etc.).

The most important component of `auditok` is the `auditok.core.StreamTokenizer`
class. An instance of this class encapsulates a `DataValidator` and can be
configured to detect the desired regions from a stream.
The `StreamTokenizer.tokenize` method accepts a `DataSource`
object that has a `read` method. Read data can be of any type accepted
by the `validator`.
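
To make that contract concrete, here is a minimal pure-Python sketch of a `read`-able data source and a validator. Only the method names (`read`, `is_valid`) mirror auditok's API; the `ListDataSource` and `EvenNumberValidator` classes and the manual read loop below are illustrative assumptions, not library code:

```python
class ListDataSource:
    """A DataSource-like object: exposes read(), returns None when exhausted."""
    def __init__(self, frames):
        self._frames = list(frames)
        self._pos = 0

    def read(self):
        if self._pos >= len(self._frames):
            return None
        frame = self._frames[self._pos]
        self._pos += 1
        return frame


class EvenNumberValidator:
    """A DataValidator-like object: is_valid() decides what counts as signal."""
    def is_valid(self, frame):
        return frame % 2 == 0


source = ListDataSource([1, 4, 6, 3, 8])
validator = EvenNumberValidator()
valid_frames = []
while True:
    frame = source.read()
    if frame is None:
        break
    if validator.is_valid(frame):
        valid_frames.append(frame)

print(valid_frames)  # -> [4, 6, 8]
```

The point is simply that the tokenizer only ever calls `read()` on the source and `is_valid()` on the validator, so any data type works as long as the two agree on it.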


As the main aim of this module is **Audio Activity Detection**,

The `AudioDataSource` class inherits from `DataSource` and supplies
a higher abstraction level than `AudioSource` thanks to a bunch of
handy features:

- Define a fixed-length block_size (i.e. analysis window)
- Allow overlap between two consecutive analysis windows (hop_size < block_size). This can be very important if your validator uses the **spectral** information of audio data instead of raw audio samples.
- Limit the amount (i.e. duration) of read data (very useful when reading data from the microphone)
- Record and rewind data (also useful if you read data from the microphone and you want to process it many times offline and/or save it)
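
To illustrate the first two features, here is a small pure-Python sketch of overlapping analysis windows. This is illustrative code, not auditok's implementation; the function name and the 50% overlap are arbitrary choices:

```python
def sliding_windows(samples, block_size, hop_size):
    """Yield fixed-length analysis windows; hop_size < block_size => overlap."""
    windows = []
    start = 0
    while start + block_size <= len(samples):
        windows.append(samples[start:start + block_size])
        start += hop_size
    return windows


samples = list(range(10))
# block_size=4, hop_size=2 -> consecutive windows share half their samples
print(sliding_windows(samples, block_size=4, hop_size=2))
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With `hop_size == block_size` the same function yields non-overlapping, back-to-back windows.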


Last but not least, the current version has only one audio window validator based on
signal energy.
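
The idea behind such a validator can be sketched as follows. The log-energy formula, the threshold, and the sample values below are illustrative assumptions; they are not necessarily the exact computation `AudioEnergyValidator` performs:

```python
import math


def log_energy(samples):
    """Log energy (in dB-like units) of one window of audio samples."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square <= 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


class EnergyValidator:
    """Validator-like object: a window is 'signal' if its energy passes a threshold."""
    def __init__(self, energy_threshold):
        self.energy_threshold = energy_threshold

    def is_valid(self, window):
        return log_energy(window) >= self.energy_threshold


validator = EnergyValidator(energy_threshold=50)
quiet = [10] * 100    # low-amplitude window: 10*log10(100) = 20 < 50
loud = [2000] * 100   # high-amplitude window: 10*log10(4e6) ~ 66 >= 50
print(validator.is_valid(quiet), validator.is_valid(loud))  # -> False True
```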



Extract sub-sequences of consecutive upper case letters
-------------------------------------------------------

We want to extract sub-sequences of characters that have:

- A minimum length of 1 (`min_length` = 1)
- A maximum length of 9999 (`max_length` = 9999)
- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)

    tokenizer.tokenize(dsource)

The output is a list of two tuples, each of which contains the extracted sub-sequence and its
start and end position in the original sequence respectively:

.. code:: python

    [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]

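
Part of the example is elided here, but its behavior for `max_continuous_silence` = 0 can be mimicked by a short pure-Python function. The input string ``"aaaABCDEFbbGHIJKccc"`` is an assumption chosen to be consistent with the output above:

```python
def extract_upper_runs(text):
    """Mimic tokenization with max_continuous_silence=0:
    return maximal runs of upper case letters with start/end positions."""
    tokens = []
    start = None
    for i, ch in enumerate(text):
        if ch.isupper():
            if start is None:
                start = i
        elif start is not None:
            tokens.append((list(text[start:i]), start, i - 1))
            start = None
    if start is not None:  # run reaching the end of the sequence
        tokens.append((list(text[start:]), start, len(text) - 1))
    return tokens


print(extract_upper_runs("aaaABCDEFbbGHIJKccc"))
# -> [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]
```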

Tolerate up to two non-valid (lower case) letters within an extracted sequence
------------------------------------------------------------------------------

To do so, we set `max_continuous_silence` = 2:

.. code:: python


    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999, max_continuous_silence=2)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]

Notice the trailing lower case letters "dd" and "ee" at the end of the two
tokens. The default behavior of `StreamTokenizer` is to keep the *trailing
silence* if it doesn't exceed `max_continuous_silence`. This can be changed
using the `DROP_TAILING_SILENCE` mode (see next example).

Remove trailing silence
-----------------------

Trailing silence can be useful for many sound recognition applications, including
speech recognition. Moreover, from the human auditory system point of view, trailing
low energy signal helps avoid abrupt signal cuts.

If you want to remove it anyway, you can do it by setting `mode` to `StreamTokenizer.DROP_TAILING_SILENCE`:

.. code:: python

    from auditok import StreamTokenizer, StringDataSource, DataValidator

    class UpperCaseChecker(DataValidator):
        def is_valid(self, frame):
            return frame.isupper()

    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
                                min_length=1, max_length=9999, max_continuous_silence=2,
                                mode=StreamTokenizer.DROP_TAILING_SILENCE)

    tokenizer.tokenize(dsource)

output:

.. code:: python

    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]

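
The silence-tolerance logic behind the last two runs can be made explicit with a toy pure-Python tokenizer (not auditok's implementation; `min_length` and `max_length` are ignored for brevity). It reproduces both outputs above for the same input string:

```python
def tokenize(text, max_continuous_silence=2, drop_trailing_silence=False):
    """Toy tokenizer: upper case = signal, lower case = silence."""
    tokens, token, start, silence = [], [], None, 0

    def close():
        nonlocal token, start, silence
        if token:
            kept = token[:len(token) - silence] if drop_trailing_silence else token
            tokens.append((kept, start, start + len(kept) - 1))
        token, start, silence = [], None, 0

    for i, ch in enumerate(text):
        if ch.isupper():
            if start is None:
                start = i
            token.append(ch)
            silence = 0
        elif start is not None:
            if silence < max_continuous_silence:
                token.append(ch)  # tentatively keep this silent frame
                silence += 1
            else:
                # tolerance exceeded: end the token before this extra silence
                close()
    close()
    return tokens


text = "aaaABCDbbEFcGHIdddJKee"
print(tokenize(text))                               # first output above
print(tokenize(text, drop_trailing_silence=True))   # second output above
```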


Limit the length of detected tokens
-----------------------------------


Imagine that you just want to detect and recognize a small part of a long
acoustic event (e.g. engine noise, water flow, etc.) and avoid having that
event hog the tokenizer, preventing it from feeding the event to the next
processing step (i.e. a sound recognizer). You can do this by:
    tokenizer.tokenize(dsource, callback=print_token)


output:

.. code:: python

    "token = 'ABCDE', starts at 3, ends at 7"
    "token = 'FGHIJ', starts at 8, ends at 12"
    "token = 'K', starts at 13, ends at 13"
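
The effect of `max_length` = 5 can likewise be mimicked in a few lines. The input string ``"aaaABCDEFGHIJKbbb"`` is an assumed reconstruction consistent with the printed tokens:

```python
def tokenize_limited(text, max_length=5):
    """Toy tokenizer: split runs of upper case letters into chunks of max_length."""
    results = []
    i = 0
    while i < len(text):
        if text[i].isupper():
            start = i
            while i < len(text) and text[i].isupper():
                i += 1
            run = text[start:i]
            # emit the run in chunks of at most max_length characters
            for off in range(0, len(run), max_length):
                chunk = run[off:off + max_length]
                results.append("token = '{0}', starts at {1}, ends at {2}".format(
                    chunk, start + off, start + off + len(chunk) - 1))
        else:
            i += 1
    return results


for line in tokenize_limited("aaaABCDEFGHIJKbbb"):
    print(line)
# token = 'ABCDE', starts at 3, ends at 7
# token = 'FGHIJ', starts at 8, ends at 12
# token = 'K', starts at 13, ends at 13
```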


Using real audio data
=====================

    max_continuous_silence = int(300. / analysis_window_ms)

    # Which is the same (up to floating-point rounding) as
    max_continuous_silence = int(0.3 / (analysis_window_ms / 1000.))

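
For instance, assuming a hypothetical 10 ms analysis window, tolerating 300 ms of continuous silence comes down to 30 analysis windows. Note that the seconds-based formula is equivalent on paper but can land just under the integer because of floating-point rounding, so `round` is safer than `int` there:

```python
analysis_window_ms = 10  # hypothetical: 10 ms analysis windows

# 300 ms of tolerated silence expressed as a number of analysis windows
max_continuous_silence = int(300. / analysis_window_ms)
print(max_continuous_silence)  # -> 30

# Starting from durations in seconds: 0.3 / 0.01 evaluates slightly
# below 30 in IEEE floats, so int() would truncate it to 29
print(round(0.3 / (analysis_window_ms / 1000.)))  # -> 30
```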

Extract isolated phrases from an utterance
------------------------------------------

We will build an `AudioDataSource` using a wave file from the database.
        player.play(data)

    assert len(tokens) == 6


Trim leading and trailing silence
---------------------------------

The tokenizer in the following example is set up to remove the silence
that precedes the first acoustic activity or follows the last activity
in a record. It preserves whatever it finds between the two activities.
In other words, it removes the leading and trailing silence.

Sampling rate is 44100 samples per second; we'll use an analysis window of 100 ms
(i.e. block_size == 4410).

Energy threshold is 50.

The tokenizer will start accumulating windows from the moment it encounters
the first analysis window with an energy >= 50. ALL the following windows will be
kept regardless of their energy. At the end of the analysis, it will drop trailing
windows with an energy below 50.

This is an interesting example because the audio file we're analyzing contains a very
brief noise that occurs within the leading silence. We certainly don't want our tokenizer
to be triggered at this point and to consider whatever comes after it as a useful signal.
Again we can deal with this situation by using a higher energy threshold (55 for example).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset

    # record = True so that we'll be able to rewind the source.
    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_tail_silence,
                             record=True, block_size=4410)
    asource.open()

    original_signal = []
    # Read the whole signal
430 # Create a validator with an energy threshold of 50 | 436 # Create a validator with an energy threshold of 50 |
431 validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50) | 437 validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50) |
432 | 438 |
433 # Create a tokenizer with an unlimited token length and continuous silence within a token | 439 # Create a tokenizer with an unlimited token length and continuous silence within a token |
434 # Note the DROP_TRAILING_SILENCE mode that will ensure removing trailing silence | 440 # Note the DROP_TAILING_SILENCE mode that will ensure removing tailing silence |
435 trimmer = StreamTokenizer(validator, min_length = 20, max_length=99999999, init_min=3, init_max_silence=1, max_continuous_silence=9999999, mode=StreamTokenizer.DROP_TRAILING_SILENCE) | 441 trimmer = StreamTokenizer(validator, min_length = 20, max_length=99999999, init_min=3, init_max_silence=1, max_continuous_silence=9999999, mode=StreamTokenizer.DROP_TAILING_SILENCE) |
436 | 442 |
437 | 443 |
438 tokens = trimmer.tokenize(asource) | 444 tokens = trimmer.tokenize(asource) |
439 | 445 |
440 # Make sure we only have one token | 446 # Make sure we only have one token |

    trimmed_signal = ''.join(tokens[0][0])

    player = player_for(asource)

    print("Playing original signal (with leading and trailing silence)...")
    player.play(original_signal)
    print("Playing trimmed signal...")
    player.play(trimmed_signal)


an over detection (the echo method prints a detection where you have made no noise).

.. code:: python

    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

    # record = True so that we'll be able to rewind the source.
    # max_time = 10: read 10 seconds from the microphone
    asource = ADSFactory.ads(record=True, max_time=10)

Contributing
============
**auditok** is on `GitHub <https://github.com/amsehili/auditok>`_. You're welcome to fork it and contribute.


Amine SEHILI <amine.sehili@gmail.com>
September 2015

License
=======
