
diff doc/examples.rst @ 377:c6308873f239
Improve documentation, add more examples
author: Amine Sehili <amine.sehili@gmail.com>
date: Wed, 17 Feb 2021 21:18:05 +0100
parents: 0106c4799906
children: df2a320e10d5
--- a/doc/examples.rst	Fri Feb 05 21:44:08 2021 +0100
+++ b/doc/examples.rst	Wed Feb 17 21:18:05 2021 +0100
@@ -1,52 +1,242 @@
-Basic example
--------------
+Loading audio data
+------------------
+
+From a file
+===========
+
+If the first argument of `load` is a string, it should be a path to an audio
+file.
 
 .. code:: python
 
-    from auditok import split
+    import auditok
+    region = auditok.load("audio.ogg")
 
-    # split returns a generator of AudioRegion objects
-    audio_regions = split("audio.wav")
-    for region in audio_regions:
-        region.play(progress_bar=True)
-        filename = region.save("/tmp/region_{meta.start:.3f}.wav")
-        print("region saved as: {}".format(filename))
-
-Example using `AudioRegion`
----------------------------
+If the input file contains raw (headerless) audio data, passing `audio_format="raw"`
+and the other audio parameters (`sampling_rate`, `sample_width` and `channels`) is
+mandatory. In the following example, audio parameters are passed using their
+short names:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
-    regions = region.split_and_plot() # or just region.splitp()
+    region = auditok.load("audio.dat",
+                          audio_format="raw",
+                          sr=44100,
+                          sw=2,
+                          ch=1)
+
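Since raw data has no header, `sr`, `sw` and `ch` are the only information available to interpret its bytes: one second of audio occupies `sr * sw * ch` bytes. The following stdlib-only sketch (the helper `raw_duration` is hypothetical, not part of auditok) computes the duration a raw buffer would have under a given set of parameters, which can serve as a quick sanity check before calling `load`:

```python
def raw_duration(num_bytes, sr, sw, ch):
    """Duration in seconds of headerless PCM data (hypothetical helper)."""
    frame_size = sw * ch  # bytes per sampling instant, all channels included
    if num_bytes % frame_size:
        raise ValueError("byte count is not a whole number of frames")
    return num_bytes / (sr * frame_size)


# e.g. pass os.path.getsize("audio.dat") as num_bytes for a file on disk
print(raw_duration(176400, sr=44100, sw=2, ch=1))  # 2.0
```

A byte count that is not a multiple of `sw * ch` usually means that `sw` or `ch` is wrong.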
+From a `bytes` object
+=====================
+
+If the first argument is of type `bytes`, it is interpreted as raw audio data, so
+audio parameters must also be passed:
+
+.. code:: python
+
+    import auditok
+
+    sr = 16000
+    sw = 2
+    ch = 1
+    data = b"\0" * sr * sw * ch  # one second of silence
+    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
+    print(region)
+
+output:
+
+.. code:: bash
+
+    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
+
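A buffer of zeros is only silence. To try `load` on `bytes` input that contains an actual signal, the following sketch builds one second of a 440 Hz tone as 16-bit mono PCM (the helper `sine_bytes` is hypothetical). It uses the `array` module, whose `tobytes` output is in the machine's native byte order (little-endian on most machines); passing that data to `load` assumes auditok interprets raw samples the same way.

```python
import array
import math


def sine_bytes(freq, duration, sr=16000, amplitude=0.5):
    """Mono 16-bit PCM bytes of a sine tone (hypothetical helper)."""
    n_samples = int(duration * sr)
    peak = amplitude * 32767  # 16-bit signed samples range from -32768 to 32767
    samples = array.array(
        "h",
        (int(peak * math.sin(2 * math.pi * freq * i / sr)) for i in range(n_samples)),
    )
    return samples.tobytes()  # native byte order


data = sine_bytes(440, 1.0)
print(len(data))  # 32000, i.e. 16000 samples of 2 bytes
# region = auditok.load(data, sr=16000, sw=2, ch=1)
```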
+From the microphone
+===================
+
+If the first argument is `None`, `load` will try to read data from the microphone.
+Audio parameters, as well as the `max_read` parameter, are mandatory:
+
+.. code:: python
+
+    import auditok
+
+    sr = 16000
+    sw = 2
+    ch = 1
+    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
+    print(five_sec_audio)
+
+output:
+
+.. code:: bash
+
+    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)
+
+
+Skip part of audio data
+=======================
+
+If the `skip` parameter is > 0, `load` skips that many seconds of audio data from
+the beginning:
+
+.. code:: python
+
+    import auditok
+    region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds
+
+This argument must be 0 when reading from the microphone.
+
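For raw PCM data, skipping `skip` seconds amounts to dropping `int(skip * sr) * sw * ch` leading bytes, rounded to whole frames so that samples stay aligned. A minimal sketch of that arithmetic (the helper `skip_raw` is hypothetical; auditok's own handling, especially for compressed formats, may differ):

```python
def skip_raw(data, skip, sr, sw, ch):
    """Drop the first `skip` seconds of headerless PCM data (hypothetical helper)."""
    frame_size = sw * ch
    # whole number of frames, so that the remaining samples stay aligned
    n_bytes = int(skip * sr) * frame_size
    return data[n_bytes:]


data = bytes(range(8)) * 4000  # 32000 bytes: 1 s of 16 kHz, 16-bit mono audio
rest = skip_raw(data, 0.25, sr=16000, sw=2, ch=1)
print(len(rest))  # 24000
```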
+
+Basic split example
+-------------------
+
+.. code:: python
+
+    import auditok
+
+    # split returns a generator of AudioRegion objects
+    audio_regions = auditok.split(
+        "audio.wav",
+        min_dur=0.2,     # minimum duration of a valid audio event in seconds
+        max_dur=4,       # maximum duration of an event
+        max_silence=0.3, # maximum duration of tolerated continuous silence within an event
+        energy_threshold=55 # threshold of detection
+    )
+
+    for i, r in enumerate(audio_regions):
+
+        # Regions returned by `split` have 'start' and 'end' metadata fields
+        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))
+
+        # play detection
+        # r.play(progress_bar=True)
+
+        # a region's metadata fields can be used directly as placeholders in the
+        # file name passed to `save` (no need to call `str.format` yourself)
+        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
+        print("region saved as: {}".format(filename))
+
+output example:
+
+.. code:: bash
+
+    Region 0: 0.700s -- 1.400s
+    region saved as: region_0.700-1.400.wav
+    Region 1: 3.800s -- 4.500s
+    region saved as: region_3.800-4.500.wav
+    Region 2: 8.750s -- 9.950s
+    region saved as: region_8.750-9.950.wav
+    Region 3: 11.700s -- 12.400s
+    region saved as: region_11.700-12.400.wav
+    Region 4: 15.050s -- 15.850s
+    region saved as: region_15.050-15.850.wav
+
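The parameters above interact roughly as follows: the signal is cut into short analysis windows, a window whose energy reaches `energy_threshold` is considered valid, an event may contain at most `max_silence` seconds of consecutive invalid windows, an event is cut once it reaches `max_dur`, and events shorter than `min_dur` are dropped. The sketch below is a simplified pure-Python illustration of that logic, not auditok's implementation. Assumptions: a 50 ms analysis window, energy computed as `20 * log10(RMS)` in dB, trailing silence trimmed from each event, and durations rounded to whole windows.

```python
import math


def window_energies(samples, sr, window=0.05):
    """Energy in dB of consecutive, non-overlapping analysis windows.

    Assumes energy = 20 * log10(RMS); illustration only.
    """
    size = round(window * sr)
    energies = []
    for start in range(0, len(samples) - size + 1, size):
        chunk = samples[start:start + size]
        rms = math.sqrt(sum(s * s for s in chunk) / size)
        energies.append(20 * math.log10(max(rms, 1e-10)))  # avoid log10(0)
    return energies


def detect(energies, window, energy_threshold, min_dur, max_dur, max_silence):
    """Return (start, end) times in seconds of detected events (simplified)."""
    # express durations as whole numbers of analysis windows
    min_w = round(min_dur / window)
    max_w = round(max_dur / window)
    sil_w = round(max_silence / window)
    events, start, last_valid = [], None, None
    for i, energy in enumerate(energies):
        if energy >= energy_threshold:
            if start is None:
                start = i  # a new event begins
            last_valid = i
            if i - start + 1 >= max_w:  # event reached max_dur: close it
                events.append((start, i + 1))
                start = None
        elif start is not None and i - last_valid > sil_w:
            events.append((start, last_valid + 1))  # silence too long: close it
            start = None
    if start is not None:  # event still open when the data ends
        events.append((start, last_valid + 1))
    # drop events shorter than min_dur and convert window indices to seconds
    return [
        (round(s * window, 3), round(e * window, 3))
        for s, e in events
        if e - s >= min_w
    ]


sr = 1000  # a low sampling rate keeps the example small
signal = [0] * 500 + [1000] * 500 + [0] * 1000 + [1000] * 100 + [0] * 500
energies = window_energies(signal, sr)
print(detect(energies, 0.05, energy_threshold=55, min_dur=0.2, max_dur=4, max_silence=0.3))
# [(0.5, 1.0)]: the 0.1 s burst is shorter than min_dur and is dropped
```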
+
+Split and plot
+--------------
+
+Visualize the audio signal and detections:
+
+.. code:: python
+
+    import auditok
+    region = auditok.load("audio.wav") # returns an AudioRegion object
+    regions = region.split_and_plot(...) # or just region.splitp()
 
 output figure:
 
 .. image:: figures/example_1.png
 
+
+Read and split data from the microphone
+---------------------------------------
+
+If the first argument of `split` is `None`, audio data is read from the microphone
+(requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):
+
+.. code:: python
+
+    import auditok
+
+    sr = 16000
+    sw = 2
+    ch = 1
+    eth = 55 # alias for energy_threshold, default value is 50
+
+    try:
+        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
+            print(region)
+            region.play(progress_bar=True) # progress bar requires `tqdm`
+    except KeyboardInterrupt:
+        pass
+
+
+`split` will continue reading audio data until you press ``Ctrl-C``. If you want
+to read a specific amount of audio data, pass the desired number of seconds with
+the `max_read` argument.
+
+
+Accessing recorded data after split
+-----------------------------------
+
+Using a `Recorder` object, you can get hold of the acquired audio data once
+splitting is over:
+
+.. code:: python
+
+    import auditok
+
+    sr = 16000
+    sw = 2
+    ch = 1
+    eth = 55 # alias for energy_threshold, default value is 50
+
+    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)
+
+    try:
+        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
+            print(region)
+            region.play(progress_bar=True) # progress bar requires `tqdm`
+    except KeyboardInterrupt:
+        pass
+
+    rec.rewind()
+    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
+
+
+`Recorder` also accepts a `max_read` argument.
+
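The point of `Recorder` is that audio consumed by `split` is not lost: data is cached as it is read, and after `rewind()` it can be read again or accessed as a whole. The following sketch illustrates this record-and-rewind idea with a hypothetical `CachingReader` class wrapping any file-like object; it is not auditok's `Recorder` and only borrows the `rewind` and `data` names used above.

```python
import io


class CachingReader:
    """Hypothetical file-like wrapper that keeps a copy of everything it reads."""

    def __init__(self, stream):
        self._stream = stream
        self._chunks = []    # everything read so far, in order
        self._replay = None  # in-memory buffer used after rewind()

    def read(self, size):
        if self._replay is not None:
            return self._replay.read(size)  # after rewind(), serve cached data only
        chunk = self._stream.read(size)
        self._chunks.append(chunk)
        return chunk

    def rewind(self):
        self._replay = io.BytesIO(self.data)

    @property
    def data(self):
        return b"".join(self._chunks)


reader = CachingReader(io.BytesIO(b"\x00\x01" * 8))
while reader.read(4):  # consume the whole stream, as a splitter would
    pass
reader.rewind()
print(len(reader.data))  # 16
```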
 Working with AudioRegions
 -------------------------
 
 Beyond splitting, there are a couple of interesting operations you can do with
 `AudioRegion` objects.
 
+
+Basic region information
+========================
+
+.. code:: python
+
+    import auditok
+    region = auditok.load("audio.wav")
+    len(region) # number of audio samples in the region (per channel)
+    region.duration # duration in seconds
+    region.sampling_rate # alias `sr`
+    region.sample_width # alias `sw`
+    region.channels # alias `ch`
+
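For PCM audio these attributes are related: `len(region)` counts samples per channel, i.e., the number of data bytes divided by `sw * ch`, and `duration` is `len(region) / sr`. A small stdlib sketch of this arithmetic on raw bytes (the function `region_info` is hypothetical):

```python
def region_info(num_bytes, sr, sw, ch):
    """Length (samples per channel) and duration of raw PCM data (hypothetical helper)."""
    n_samples = num_bytes // (sw * ch)  # each sample index covers ch samples of sw bytes
    return {"len": n_samples, "duration": n_samples / sr}


print(region_info(64000, sr=16000, sw=2, ch=2))  # {'len': 16000, 'duration': 1.0}
```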
+
 Concatenate regions
 ===================
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region_1 = AudioRegion.load("audio_1.wav")
-    region_2 = AudioRegion.load("audio_2.wav")
+    import auditok
+    region_1 = auditok.load("audio_1.wav")
+    region_2 = auditok.load("audio_2.wav")
     region_3 = region_1 + region_2
 
-Particularly useful if you want to join regions returned by ``split``:
+Particularly useful if you want to join regions returned by `split`:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    regions = AudioRegion.load("audio.wav").split()
+    import auditok
+    regions = auditok.load("audio.wav").split()
     gapless_region = sum(regions)
 
 Repeat a region
@@ -56,92 +246,105 @@
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     region_x3 = region * 3
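
Both `sum(regions)` and `region * 3` rely on Python's operator protocol. `sum` starts from the integer `0`, so `0 + first_region` must be supported, which is what `__radd__` is for. The following sketch shows the pattern on a hypothetical `Clip` class holding raw bytes; it is not auditok's `AudioRegion`, and the parameter check in `__add__` is an assumption of the sketch.

```python
class Clip:
    """Hypothetical stand-in for an audio region holding raw PCM bytes."""

    def __init__(self, data, sr=16000, sw=2, ch=1):
        self.data, self.sr, self.sw, self.ch = data, sr, sw, ch

    def __add__(self, other):
        if not isinstance(other, Clip):
            return NotImplemented
        if (self.sr, self.sw, self.ch) != (other.sr, other.sw, other.ch):
            raise ValueError("cannot concatenate clips with different audio parameters")
        return Clip(self.data + other.data, self.sr, self.sw, self.ch)

    def __radd__(self, other):
        # sum() evaluates 0 + first_item, so treat 0 as the neutral element
        if other == 0:
            return self
        return NotImplemented

    def __mul__(self, n):
        # repetition, as in `region * 3`
        return Clip(self.data * n, self.sr, self.sw, self.ch)


clips = [Clip(b"\x01\x00"), Clip(b"\x02\x00"), Clip(b"\x03\x00")]
print(sum(clips).data)       # b'\x01\x00\x02\x00\x03\x00'
print((clips[0] * 3).data)   # b'\x01\x00\x01\x00\x01\x00'
```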
 
-Make slices of equal size out of a region
-=========================================
+Split one region into N regions of equal size
+=============================================
 
 Divide by a positive integer:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     regions = region / 5
     assert sum(regions) == region
 
-Make audio slices of arbitrary size
-===================================
+Note that if perfect division is not possible, the last region might be a bit shorter
+than the previous N-1 regions.
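One way to obtain N consecutive parts whose lengths differ by at most one sample is to compute `divmod(len, N)` and give the first `remainder` parts one extra sample. The following sketch does this on a plain Python list (the function `split_equal` is hypothetical and may not match exactly how auditok distributes the remainder):

```python
def split_equal(samples, n):
    """Split a sequence into n consecutive parts whose lengths differ by at most one."""
    if not isinstance(n, int) or n <= 0:
        raise ValueError("n must be a positive integer")
    size, rest = divmod(len(samples), n)
    parts, start = [], 0
    for i in range(n):
        stop = start + size + (1 if i < rest else 0)  # first `rest` parts get one extra item
        parts.append(samples[start:stop])
        start = stop
    return parts


parts = split_equal(list(range(11)), 4)
print([len(p) for p in parts])  # [3, 3, 3, 2]
```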
 
-Slicing an ``AudioRegion`` can be interesting in many situations. You can for
-example remove a fixed-size portion of audio data from the beginning or the end
-of a region or crop a region by an arbitrary amount as a data augmentation
+Slice a region by samples, seconds or milliseconds
+==================================================
+
+Slicing an `AudioRegion` can be interesting in many situations. You can for
+example remove a fixed-size portion of audio data from the beginning or from the
+end of a region or crop a region by an arbitrary amount as a data augmentation
 strategy, etc.
 
-The most accurate way to slice an ``AudioRegion`` is to use indices that
+The most accurate way to slice an `AudioRegion` is to use indices that
 directly refer to raw audio samples. In the following example, assuming that the
 sampling rate of the audio data is 16000, you can extract a 5-second region from
 the main region, starting at second 20, as follows:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     start = 20 * 16000
     stop = 25 * 16000
     five_second_region = region[start:stop]
 
-This allows you to practically start and stop at any sample within the region.
+This allows you to start and stop at any audio sample of the region.
 Just as with a `list`, you can omit `start`, `stop`, or both. You can
 also use negative indices:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     start = -3 * region.sr # `sr` is an alias of `sampling_rate`
     three_last_seconds = region[start:]
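
Slicing by samples maps directly onto the raw bytes: sample `i` of a region with `ch` interleaved channels starts at byte `i * sw * ch`. This sketch (the helper `slice_samples` is hypothetical) applies that mapping, using the built-in `slice.indices` to resolve omitted and negative indices the same way list slicing does:

```python
def slice_samples(data, start, stop, sw, ch):
    """Slice interleaved PCM bytes by per-channel sample indices (hypothetical helper)."""
    frame = sw * ch  # bytes per sample index, all channels included
    n_samples = len(data) // frame
    # resolve None and negative indices exactly like list slicing
    start, stop, _ = slice(start, stop).indices(n_samples)
    return data[start * frame:stop * frame]


two_seconds = b"\x00" * 64000  # 2 s of 16 kHz, 16-bit mono silence
print(len(slice_samples(two_seconds, 16000, 32000, sw=2, ch=1)))  # 32000
print(len(slice_samples(two_seconds, -16000, None, sw=2, ch=1)))  # 32000 (last second)
```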
 
 While slicing by raw samples is accurate, slicing with temporal indices is more
-intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of
-``AudioRegion`` (or their shortcut alias ``ms`` and ``sec``/``s``).
+intuitive. You can do so by accessing the `millis` or `seconds` views of an
+`AudioRegion` (or their shortcut aliases `ms` and `sec` or `s`).
 
-With the ``millis`` view:
+With the `millis` view:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     five_second_region = region.millis[5000:10000]
 
-or with the ``seconds`` view:
+or with the `seconds` view:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     five_second_region = region.seconds[5:10]
 
-Get an array of audio samples
-=============================
+`seconds` indices can also be floats:
 
 .. code:: python
 
-    from auditok import AudioRegion
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
+    five_second_region = region.seconds[2.5:7.5]
+
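Conceptually, the `millis` and `seconds` views convert time indices to sample indices by multiplying them by the number of samples per unit of time. A minimal sketch with a hypothetical `TimeView` class (rounding to the nearest sample is an assumption; auditok may round differently):

```python
class TimeView:
    """Hypothetical view that turns time slices into sample slices."""

    def __init__(self, samples, sr, unit=1.0):
        self._samples = samples
        self._per_unit = sr * unit  # samples per time unit (unit=0.001 for ms)

    def _to_index(self, t):
        # None keeps its usual meaning; negative times count from the end
        return None if t is None else round(t * self._per_unit)

    def __getitem__(self, index):
        if not isinstance(index, slice) or index.step is not None:
            raise TypeError("only slices without a step are supported")
        return self._samples[self._to_index(index.start):self._to_index(index.stop)]


samples = list(range(16000 * 10))  # 10 s of dummy samples at 16 kHz
seconds = TimeView(samples, sr=16000)
millis = TimeView(samples, sr=16000, unit=0.001)
print(len(seconds[2.5:7.5]), len(millis[5000:10000]), len(seconds[-3:]))  # 80000 80000 48000
```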
+Get arrays of audio samples
+===========================
+
+If `numpy` is not installed, the `samples` attribute is a list of audio sample
+arrays (standard `array.array` objects), one per channel. If `numpy` is installed,
+`samples` is a 2-D `numpy.ndarray` where the first dimension is the channel
+and the second is the sample.
+
+.. code:: python
+
+    import auditok
+    region = auditok.load("audio.wav")
     samples = region.samples
 
-If ``numpy`` is installed, this will return a ``numpy.ndarray``. If audio data
-is mono the returned array is 1D, otherwise it's 2D. If ``numpy`` is not
-installed this will return a standard ``array.array`` for mono data, and a list
-of ``array.array`` for multichannel data.
 
-Alternatively you can use:
+If `numpy` is installed, you can also use:
 
 .. code:: python
 
     import numpy as np
-    region = AudioRegion.load("audio.wav")
+    import auditok
+    region = auditok.load("audio.wav")
     samples = np.asarray(region)
+    assert len(samples.shape) == 2
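
Multichannel PCM data is interleaved (one sample of each channel per frame), so building one array per channel is a matter of taking every `ch`-th sample. A stdlib sketch (the helper `deinterleave` is hypothetical; typecode `"h"` assumes 16-bit signed samples in native byte order):

```python
import array


def deinterleave(data, ch, typecode="h"):
    """Split interleaved PCM bytes into one array.array per channel (hypothetical helper).

    typecode "h" assumes 16-bit signed samples in native byte order.
    """
    samples = array.array(typecode)
    samples.frombytes(data)
    # channel c holds every ch-th sample, starting at offset c
    return [samples[c::ch] for c in range(ch)]


stereo = array.array("h", [1, -1, 2, -2, 3, -3]).tobytes()
left, right = deinterleave(stereo, ch=2)
print(left.tolist(), right.tolist())  # [1, 2, 3] [-1, -2, -3]
```

With `numpy`, `np.frombuffer(data, dtype=np.int16).reshape(-1, ch).T` produces the same channels-by-samples layout as a 2-D array.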