changeset 40:23dbe3bacdf7

doc update
author Amine Sehili <amine.sehili@gmail.com>
date Thu, 03 Dec 2015 01:21:44 +0100
parents 755ff17eb2bf
children ee6c9924df75
files README.md README.rst quickstart.rst
diffstat 3 files changed, 8 insertions(+), 573 deletions(-)
--- a/README.md	Thu Dec 03 01:10:21 2015 +0100
+++ b/README.md	Thu Dec 03 01:21:44 2015 +0100
@@ -41,17 +41,21 @@
 
 Requirements
 ------------
-`auditok` can be used with standard Python! 
-However if you want more features, the following packages are needed:
+`auditok` can be used with standard Python!
+
+However, if you want more features, the following packages are needed:
 - [pydub](https://github.com/jiaaro/pydub): read audio files of popular audio formats (ogg, mp3, etc.) or extract audio from a video file
 - [PyAudio](http://people.csail.mit.edu/hubert/pyaudio/): read audio data from the microphone and play back detections
-- `matplotlib`: plot audio signal and detections (see figures above)
-- `numpy`: required by matplotlib. Also used for math operations instead of standard python if available
+- [matplotlib](http://matplotlib.org/): plot audio signal and detections (see figures above)
+- [numpy](http://www.numpy.org): required by matplotlib. Also used for math operations instead of standard python if available
 - Optionally, you can use `sox` or `parecord` for data acquisition and feed `auditok` through a pipe.
 
 
 Installation
 ------------
+
+    git clone https://github.com/amsehili/auditok.git
+    cd auditok
     python setup.py install
 
 Command line usage
--- a/README.rst	Thu Dec 03 01:10:21 2015 +0100
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,48 +0,0 @@
-auditok, an AUDIo TOKenization tool
-===================================
-
-.. image:: https://travis-ci.org/amsehili/auditok.svg?branch=master
-    :target: https://travis-ci.org/amsehili/auditok
-    
-.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
-    :target: http://auditok.readthedocs.org/en/latest/?badge=latest
-    :alt: Documentation Status
-
-**auditok** is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program and offers an easy to use API.
-
-The latest version of this documentation can be found at `Readthedocs <http://auditok.readthedocs.org/en/latest/>`_.
-
-Requirements
-------------
-
-`auditok` can be used with standard Python!
-
-However if you want more features, the following packages are needed:
-
-- `Pydub <https://github.com/jiaaro/pydub>`_ : read audio files of popular audio formats (ogg, mp3, etc.) or extract audio from a video file
-
-- `PyAudio <http://people.csail.mit.edu/hubert/pyaudio/>`_ : read audio data from the microphone and play back detections
-
-- `matplotlib <http://matplotlib.org/>`_ : plot audio signal and detections (see figures above)
-
-- `numpy <http://www.numpy.org>`_ : required by matplotlib. Also used for math operations instead of standard python if available
-
-- Optionally, you can use **sox** or **parecord** for data acquisition and feed **auditok** using a pipe.
-
-Installation
-------------
-
-.. code:: bash
-
-    git clone https://github.com/amsehili/auditok.git
-    cd auditok
-    sudo python setup.py install
-
-
-
-Getting started
----------------
-
-- `Command-line Usage Guide <http://auditok.readthedocs.org/en/latest/cmdline.html>`_ 
-- `API Tutorial <http://auditok.readthedocs.org/en/latest/apitutorial.html>`_
-- `API Reference <http://auditok.readthedocs.org/en/latest/index.html>`_
\ No newline at end of file
--- a/quickstart.rst	Thu Dec 03 01:10:21 2015 +0100
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,521 +0,0 @@
-.. auditok documentation.
-
-auditok, an AUDIo TOKenization module
-=====================================
-
-.. contents:: `Contents`
-   :depth: 3
-
-
-**auditok** is a module that can be used as a generic tool for data
-tokenization. Although its core motivation is **Acoustic Activity
-Detection** (AAD) and extraction from audio streams (i.e. detecting
-where an acoustic activity occurs within an audio stream and
-extracting the corresponding portion of the signal), it can easily be
-adapted to other tasks.
-
-Globally speaking, it can be used to extract, from a sequence of
-observations, all sub-sequences that meet a certain number of
-criteria in terms of:
-
-1. Minimum length of a **valid** token (i.e. sub-sequence)
-2. Maximum length of a valid token
-3. Maximum tolerated consecutive **non-valid** observations within
-   a valid token
-
-Examples of a non-valid observation are: a non-numeric ASCII symbol
-if you are interested in sub-sequences of numeric symbols, or a silent
-audio window (of 10, 20 or 100 milliseconds for instance) if you are
-interested in audio regions made up of a sequence of "noisy"
-windows (whatever the kind of noise: speech, baby cry, laughter, etc.).
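As a plain-Python sketch (deliberately naive, and not auditok's actual implementation), the three criteria above could be applied to a string like this, treating upper-case letters as valid observations:

```python
def tokenize(seq, is_valid, min_length=1, max_length=9999,
             max_continuous_silence=0):
    """Naive single-pass tokenizer illustrating the three criteria above.

    Illustrative sketch only, not auditok's implementation. Tokens are
    returned as (joined_token, start_index, end_index) tuples.
    """
    tokens, token, silence_run, start = [], [], 0, 0

    def close():
        # Emit the current token if it is long enough
        if len(token) >= min_length:
            tokens.append((''.join(token), start, start + len(token) - 1))

    for i, obs in enumerate(seq):
        if is_valid(obs):
            if not token:
                start = i
            token.append(obs)
            silence_run = 0
        elif token:
            if silence_run < max_continuous_silence:
                # Tolerate this non-valid observation inside the token
                token.append(obs)
                silence_run += 1
            else:
                # Silence limit exceeded: close the current token
                close()
                token, silence_run = [], 0
        if len(token) == max_length:
            # Length limit reached: close the current token
            close()
            token, silence_run = [], 0
    if token:
        close()
    return tokens

tokens = tokenize("aaaABCDEFbbGHIJKccc", str.isupper)
# each tuple: (token, start, end) -> [('ABCDEF', 3, 8), ('GHIJK', 11, 15)]
```

Like auditok's `StreamTokenizer` in its default mode, this sketch keeps trailing non-valid observations as long as they do not exceed `max_continuous_silence`.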
-
-The most important component of `auditok` is the `auditok.core.StreamTokenizer`
-class. An instance of this class encapsulates a `DataValidator` and can be 
-configured to detect the desired regions from a stream.
-The `StreamTokenizer.tokenize` method accepts a `DataSource`
-object that has a `read` method. Read data can be of any type accepted
-by the `validator`.
-
-
-As the main aim of this module is **Audio Activity Detection**,
-it provides the `auditok.util.ADSFactory` factory class that makes
-it very easy to create an `AudioDataSource` (a class that implements `DataSource`)
-object, whether from:
-
-- A file on the disk
-- A buffer of data
-- The built-in microphone (requires PyAudio)
- 
-
-The `AudioDataSource` class inherits from `DataSource` and supplies
-a higher abstraction level than `AudioSource` thanks to a bunch of
-handy features:
-
-- Define a fixed-length `block_size` (i.e. analysis window)
-- Allow overlap between two consecutive analysis windows (`hop_size` < `block_size`). This can be very important if your validator uses the **spectral** information of audio data instead of raw audio samples.
-- Limit the amount (i.e. duration) of read data (very useful when reading data from the microphone)
-- Record and rewind data (also useful if you read data from the microphone and you want to process it many times off-line and/or save it)
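Conceptually, overlap just means the read position advances by the hop size instead of the window size between two consecutive reads. A plain-Python sketch of the idea (not auditok code):

```python
def sliding_windows(samples, block_size, hop_size):
    """Yield fixed-length analysis windows over `samples`.

    Windows overlap whenever hop_size < block_size.
    """
    for start in range(0, len(samples) - block_size + 1, hop_size):
        yield samples[start:start + block_size]

# block_size=4, hop_size=2: each window shares 2 samples with the previous one
windows = list(sliding_windows(list(range(8)), block_size=4, hop_size=2))
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```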
-
-
-Last but not least, the current version has only one audio window validator based on
-signal energy.
-
-Requirements
-============
-
-`auditok` requires `PyAudio <http://people.csail.mit.edu/hubert/pyaudio/>`_
-for audio acquisition and playback.
-
-
-Illustrative examples with strings
-==================================
-
-Let us look at some examples using the `auditok.util.StringDataSource` class
-created for test and illustration purposes. Imagine that each character of
-`auditok.util.StringDataSource` data represents an audio slice of 100 ms, for
-example. In the following examples we will use upper case letters to represent
-noisy audio slices (i.e. analysis windows or frames) and lower case letters for
-silent frames.
-
-
-Extract sub-sequences of consecutive upper case letters
--------------------------------------------------------
-
-
-We want to extract sub-sequences of characters that have:
-    
-- A minimum length of 1 (`min_length` = 1)
-- A maximum length of 9999 (`max_length` = 9999)
-- Zero consecutive lower case characters within them (`max_continuous_silence` = 0)
-
-We also create the `UpperCaseChecker` whose `is_valid` method returns `True` if the
-checked character is in upper case and `False` otherwise.
-
-.. code:: python
-      
-    from auditok import StreamTokenizer, StringDataSource, DataValidator
-    
-    class UpperCaseChecker(DataValidator):
-       def is_valid(self, frame):
-          return frame.isupper()
-    
-    dsource = StringDataSource("aaaABCDEFbbGHIJKccc")
-    tokenizer = StreamTokenizer(validator=UpperCaseChecker(), 
-                 min_length=1, max_length=9999, max_continuous_silence=0)
-                 
-    tokenizer.tokenize(dsource)
-
-The output is a list of two tuples; each contains the extracted sub-sequence and its
-start and end position in the original sequence, respectively:
-
-
-.. code:: python
-
-    
-    [(['A', 'B', 'C', 'D', 'E', 'F'], 3, 8), (['G', 'H', 'I', 'J', 'K'], 11, 15)]
-    
-
-Tolerate up to two non-valid (lower case) letters within an extracted sequence
-------------------------------------------------------------------------------
-
-To do so, we set `max_continuous_silence` = 2:
-
-.. code:: python
-
-
-    from auditok import StreamTokenizer, StringDataSource, DataValidator
-    
-    class UpperCaseChecker(DataValidator):
-       def is_valid(self, frame):
-          return frame.isupper()
-    
-    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
-    tokenizer = StreamTokenizer(validator=UpperCaseChecker(), 
-                 min_length=1, max_length=9999, max_continuous_silence=2)
-                 
-    tokenizer.tokenize(dsource)
-
-
-output:
-
-.. code:: python
-  
-    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I', 'd', 'd'], 3, 16), (['J', 'K', 'e', 'e'], 18, 21)]
-    
-Notice the trailing lower case letters "dd" and "ee" at the end of the two
-tokens. The default behavior of `StreamTokenizer` is to keep the *trailing
-silence* if it doesn't exceed `max_continuous_silence`. This can be changed
-using the `DROP_TRAILING_SILENCE` mode (see next example).
-
-Remove trailing silence
------------------------
-
-Trailing silence can be useful for many sound recognition applications, including
-speech recognition. Moreover, from the human auditory system point of view, trailing
-low-energy signal helps avoid abrupt signal cuts.
-
-If you want to remove it anyway, you can do it by setting `mode` to `StreamTokenizer.DROP_TRAILING_SILENCE`:
-
-.. code:: python
-
-    from auditok import StreamTokenizer, StringDataSource, DataValidator
-    
-    class UpperCaseChecker(DataValidator):
-       def is_valid(self, frame):
-          return frame.isupper()
-    
-    dsource = StringDataSource("aaaABCDbbEFcGHIdddJKee")
-    tokenizer = StreamTokenizer(validator=UpperCaseChecker(), 
-                 min_length=1, max_length=9999, max_continuous_silence=2,
-                 mode=StreamTokenizer.DROP_TRAILING_SILENCE)
-                 
-    tokenizer.tokenize(dsource)
-
-output:
-
-.. code:: python
-
-    [(['A', 'B', 'C', 'D', 'b', 'b', 'E', 'F', 'c', 'G', 'H', 'I'], 3, 14), (['J', 'K'], 18, 19)]
-
-
-
-Limit the length of detected tokens
------------------------------------
-
-
-Imagine that you just want to detect and recognize a small part of a long
-acoustic event (e.g. engine noise, water flow, etc.) and prevent that
-event from hogging the tokenizer, keeping it from feeding the event to the next
-processing step (i.e. a sound recognizer). You can do this by:
-
- - limiting the length of a detected token.
- 
- and
- 
- - using a callback function as an argument to `StreamTokenizer.tokenize`
-   so that the tokenizer delivers a token as soon as it is detected.
-
-The following code limits the length of a token to 5:
-
-.. code:: python
-    
-    from auditok import StreamTokenizer, StringDataSource, DataValidator
-    
-    class UpperCaseChecker(DataValidator):
-       def is_valid(self, frame):
-          return frame.isupper()
-    
-    dsource = StringDataSource("aaaABCDEFGHIJKbbb")
-    tokenizer = StreamTokenizer(validator=UpperCaseChecker(),
-                 min_length=1, max_length=5, max_continuous_silence=0)
-                 
-    def print_token(data, start, end):
-        print("token = '{0}', starts at {1}, ends at {2}".format(''.join(data), start, end))
-                 
-    tokenizer.tokenize(dsource, callback=print_token)
-    
-
-output:
-
-.. code:: python
-
-    "token = 'ABCDE', starts at 3, ends at 7"
-    "token = 'FGHIJ', starts at 8, ends at 12"
-    "token = 'K', starts at 13, ends at 13"
-
-
-
-Using real audio data
-=====================
-
-In this section we will use `ADSFactory`, `AudioEnergyValidator` and `StreamTokenizer`
-for an AAD demonstration using audio data. Before we go any further, it is worth
-explaining a certain number of points.
-
-The `ADSFactory.ads` method is called to create an `AudioDataSource` object that can be
-passed to `StreamTokenizer.tokenize`. `ADSFactory.ads` accepts a number of keyword
-arguments, none of which is mandatory. The returned `AudioDataSource` object can
-however differ greatly depending on the passed arguments. Further details can be found
-in the respective method documentation. Note however the following two calls, which
-create an `AudioDataSource` that reads data from an audio file and from the built-in
-microphone respectively.
-
-.. code:: python
-    
-    from auditok import ADSFactory
-    
-    # Get an AudioDataSource from a file
-    file_ads = ADSFactory.ads(filename = "path/to/file/")
-    
-    # Get an AudioDataSource from the built-in microphone
-    # The returned object has the default values for sampling
-    # rate, sample width and number of channels. See the method's
-    # documentation for customized values
-    mic_ads = ADSFactory.ads()
-    
-For `StreamTokenizer`, the parameters `min_length`, `max_length` and `max_continuous_silence`
-are expressed in terms of number of frames. If you want a `max_length` of *2 seconds* for
-your detected sound events and your *analysis window* is *10 ms* long, you have to specify
-a `max_length` of 200 (`int(2. / (10. / 1000)) == 200`). For a `max_continuous_silence` of *300 ms*,
-for instance, the value to pass to `StreamTokenizer` is 30 (`int(0.3 / (10. / 1000)) == 30`).
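These conversions are plain arithmetic; a small helper (the name is mine, not part of auditok) keeps them in one place:

```python
def seconds_to_frames(duration_s, analysis_window_s=0.01):
    """Convert a duration in seconds into a number of analysis windows.

    The 10 ms default matches ADSFactory's default analysis window.
    round() guards against float-division truncation surprises with int().
    """
    return int(round(duration_s / analysis_window_s))

# max_length for 2-second events with a 10 ms window
assert seconds_to_frames(2.0) == 200
# max_continuous_silence of 300 ms with a 10 ms window
assert seconds_to_frames(0.3) == 30
```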
-
-
-Where do you get the size of the **analysis window** from?
-
-
-Well, this is a parameter you pass to `ADSFactory.ads`. By default, `ADSFactory.ads` uses
-an analysis window of 10 ms. The number of samples that 10 ms of signal contain will
-vary depending on the sampling rate of your audio source (file, microphone, etc.).
-For a sampling rate of 16 kHz (16000 samples per second), we have 160 samples for 10 ms.
-Therefore you can use block sizes of 160, 320 and 1600 for analysis windows of 10, 20 and 100
-ms respectively.
-
-.. code:: python
-    
-    from auditok import ADSFactory
-    
-    file_ads = ADSFactory.ads(filename = "path/to/file/", block_size = 160)
-    
-    file_ads = ADSFactory.ads(filename = "path/to/file/", block_size = 320)
-    
-    # If no sampling rate is specified, ADSFactory uses 16 kHz as the default
-    # rate for the microphone. If you want to use a window of 100 ms, use
-    # a block size of 1600
-    mic_ads = ADSFactory.ads(block_size = 1600)
-    
-So if you are not sure what your analysis window in seconds is, use the following:
-
-.. code:: python
-    
-    my_ads = ADSFactory.ads(...)
-    analysis_win_seconds = float(my_ads.get_block_size()) / my_ads.get_sampling_rate()
-    analysis_window_ms = analysis_win_seconds * 1000
-    
-    # For a `max_continuous_silence` of 300 ms use:
-    max_continuous_silence = int(300. / analysis_window_ms)
-    
-    # Which is the same as
-    max_continuous_silence = int(0.3 / (analysis_window_ms / 1000))
-    
-
-
-Extract isolated phrases from an utterance
-------------------------------------------
-
-We will build an `AudioDataSource` using a wave file from the `dataset` module.
-The file contains isolated pronunciations of digits from 1 to 6
-in Arabic, as well as a breath-in/out between 2 and 3. The code will play the
-original file then the detected sounds separately. Note that we use an
-`energy_threshold` of 65; this parameter should be carefully chosen. It depends
-on microphone quality, background noise and the amplitude of the events you want to
-detect.
-
-.. code:: python
-
-    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset
-     
-    # We set the `record` argument to True so that we can rewind the source
-    asource = ADSFactory.ads(filename=dataset.one_to_six_arabic_16000_mono_bc_noise, record=True)
-     
-    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=65)
-    
-    # Default analysis window is 10 ms (float(asource.get_block_size()) / asource.get_sampling_rate())
-    # min_length=20 : minimum length of a valid audio activity is 20 * 10 == 200 ms
-    # max_length=400 : maximum length of a valid audio activity is 400 * 10 == 4000 ms == 4 seconds
-    # max_continuous_silence=30 : maximum tolerated silence within a valid audio activity is 30 * 10 == 300 ms
-    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=400, max_continuous_silence=30)
-    
-    asource.open()
-    tokens = tokenizer.tokenize(asource)
-    
-    # Play detected regions back
-    
-    player = player_for(asource)
-    
-    # Rewind and read the whole signal
-    asource.rewind()
-    original_signal = []
-
-    while True:
-       w = asource.read()
-       if w is None:
-          break
-       original_signal.append(w)
-       
-    original_signal = ''.join(original_signal)
-    
-    print("Playing the original file...")
-    player.play(original_signal)
-    
-    print("playing detected regions...")
-    for t in tokens:
-        print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
-        data = ''.join(t[0])
-        player.play(data)
-        
-    assert len(tokens) == 8
-    
-
-The tokenizer extracts 8 audio regions from the signal, including all isolated digits
-(from 1 to 6) as well as the 2-phase respiration of the subject. You might have noticed
-that, in the original file, the last three digits are closer to each other than the
-previous ones. If you want them to be extracted as one single phrase, you can do so
-by tolerating a larger continuous silence within a detection:
- 
-.. code:: python
-    
-    tokenizer.max_continuous_silence = 50
-    asource.rewind()
-    tokens = tokenizer.tokenize(asource)
-    
-    for t in tokens:
-       print("Token starts at {0} and ends at {1}".format(t[1], t[2]))
-       data = ''.join(t[0])
-       player.play(data)
-    
-    assert len(tokens) == 6
-        
-         
-Trim leading and trailing silence
----------------------------------
- 
-The tokenizer in the following example is set up to remove the silence
-that precedes the first acoustic activity or follows the last activity
-in a record. It preserves whatever it finds between the two activities.
-In other words, it removes the leading and trailing silence.
-
-The sampling rate is 44100 samples per second; we'll use an analysis window of 100 ms
-(i.e. block_size == 4410).
-
-Energy threshold is 50.
-
-The tokenizer will start accumulating windows from the moment it encounters
-the first analysis window with an energy >= 50. ALL the following windows will be
-kept regardless of their energy. At the end of the analysis, it will drop trailing
-windows with an energy below 50.
-
-This is an interesting example because the audio file we're analyzing contains a very
-brief noise that occurs within the leading silence. We certainly don't want our tokenizer
-to trigger at this point and consider whatever comes after as useful signal.
-To force the tokenizer to ignore that brief event, we use two other parameters, `init_min`
-and `init_max_silence`. With `init_min` = 3 and `init_max_silence` = 1 we tell the tokenizer
-that a valid event must start with at least 3 noisy windows, between which there
-is at most 1 silent window.
-
-Even with this configuration, the tokenizer could still detect that noise as a valid event
-(if it actually contained 3 consecutive noisy windows). To circumvent this, we use a
-sufficiently large analysis window (here of 100 ms) to ensure that the brief noise is surrounded
-by a much longer silence, and hence the energy of the overall analysis window stays below 50.
-
-When using a shorter analysis window (of 10 ms for instance, block_size == 441), the brief
-noise contributes more to the energy calculation, which yields an energy of over 50 for the window.
-Again, we can deal with this situation by using a higher energy threshold (55 for example).
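This dilution effect can be checked numerically. The sketch below uses a generic log-energy measure (10·log10 of the mean squared amplitude); auditok's exact energy formula may differ, but the relative behaviour is the same: the longer the window, the less a brief click contributes to its energy.

```python
import math

def log_energy(samples):
    """10*log10 of the mean squared amplitude (a generic energy measure;
    not necessarily the exact formula AudioEnergyValidator uses)."""
    mean_sq = sum(s * s for s in samples) / float(len(samples))
    return 10 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")

RATE = 44100
click = [5000] * (RATE // 1000)  # a 1 ms click of amplitude 5000

def window_with_click(ms):
    """The click padded with near-silence up to an `ms`-long window."""
    n = RATE * ms // 1000
    return click + [1] * (n - len(click))

# The same click yields about 10 dB less energy in a 100 ms window than in
# a 10 ms one, which is why the longer window can stay below the threshold.
assert log_energy(window_with_click(10)) > log_energy(window_with_click(100))
```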
-
-.. code:: python
-
-    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for, dataset
-
-    # record = True so that we'll be able to rewind the source.
-    asource = ADSFactory.ads(filename=dataset.was_der_mensch_saet_mono_44100_lead_trail_silence,
-             record=True, block_size=4410)
-    asource.open()
-
-    original_signal = []
-    # Read the whole signal
-    while True:
-       w = asource.read()
-       if w is None:
-          break
-       original_signal.append(w)
-    
-    original_signal = ''.join(original_signal)
-    
-    # rewind source
-    asource.rewind()
-    
-    # Create a validator with an energy threshold of 50
-    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
-    
-    # Create a tokenizer with an unlimited token length and unlimited continuous silence within a token
-    # Note the DROP_TRAILING_SILENCE mode that will ensure removing trailing silence
-    trimmer = StreamTokenizer(validator, min_length=20, max_length=99999999,
-                              init_min=3, init_max_silence=1,
-                              max_continuous_silence=9999999,
-                              mode=StreamTokenizer.DROP_TRAILING_SILENCE)
-    
-    
-    tokens = trimmer.tokenize(asource)
-    
-    # Make sure we only have one token
-    assert len(tokens) == 1, "Should have detected one single token"
-    
-    trimmed_signal = ''.join(tokens[0][0])
-    
-    player = player_for(asource)
-    
-    print("Playing original signal (with leading and trailing silence)...")
-    player.play(original_signal)
-    print("Playing trimmed signal...")
-    player.play(trimmed_signal)
-    
-
-Online audio signal processing
-------------------------------
-
-In the next example, audio data is directly acquired from the built-in microphone.
-The `tokenize` method is passed a callback function so that audio activities
-are delivered as soon as they are detected. Each detected activity is played
-back using the built-in audio output device.
-
-As mentioned before, signal energy is strongly related to many factors such as
-microphone sensitivity, background noise (including noise inherent to the hardware),
-distance and your operating system's sound settings. Try a lower `energy_threshold`
-if your noise does not seem to be detected and a higher threshold if you notice
-over-detection (the `echo` method prints a detection where you have made no noise).
-
-.. code:: python
-
-    from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for
-     
-    # record = True so that we'll be able to rewind the source.
-    # max_time = 10: read 10 seconds from the microphone
-    asource = ADSFactory.ads(record=True, max_time=10)
-    
-    validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), energy_threshold=50)
-    tokenizer = StreamTokenizer(validator=validator, min_length=20, max_length=250, max_continuous_silence=30)
-    
-    player = player_for(asource)
-    
-    def echo(data, start, end):
-       print("Acoustic activity at: {0}--{1}".format(start, end))
-       player.play(''.join(data))
-       
-    asource.open()
-    
-    tokenizer.tokenize(asource, callback=echo)
-
-If you want to re-run the tokenizer after changing one or more parameters, use the following code:
-
-.. code:: python
-
-    asource.rewind()
-    # change energy threshold for example
-    tokenizer.validator.set_energy_threshold(55)
-    tokenizer.tokenize(asource, callback=echo)
-
-In case you want to play the whole recorded signal back use:
-
-.. code:: python
-
-    player.play(asource.get_audio_source().get_data_buffer())
-    
-
-Contributing
-============
-**auditok** is on `GitHub <https://github.com/amsehili/auditok>`_. You're welcome to fork it and contribute.
-
-
-Amine SEHILI <amine.sehili@gmail.com>
-September 2015
-
-License
-=======
-
-This package is published under GNU GPL Version 3.