Loading audio data
------------------

Audio data is loaded with the :func:`load` function, which can read from audio
files, from the microphone or from raw audio data.

From a file
===========

If the first argument of :func:`load` is a string, it should be a path to an
audio file.

.. code:: python

    import auditok
    region = auditok.load("audio.ogg")

If the input file contains raw (headerless) audio data, passing
`audio_format="raw"` and the other audio parameters (`sampling_rate`,
`sample_width` and `channels`) is mandatory. In the following example, we pass
the audio parameters using their short names:

.. code:: python

    import auditok
    region = auditok.load("audio.dat",
                          audio_format="raw",
                          sr=44100, # alias for `sampling_rate`
                          sw=2,     # alias for `sample_width`
                          ch=1,     # alias for `channels`
                          )
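
The audio parameters are mandatory here because a headerless file is nothing
but a sequence of samples: the same bytes describe a different signal under a
different sampling rate, sample width or number of channels. The following
stdlib-only sketch (not part of auditok; `raw_duration` is a hypothetical
helper) computes the duration a raw file would have under several parameter
sets, assuming interleaved PCM frames of `sw * ch` bytes:

.. code:: python

    import os
    import tempfile

    def raw_duration(n_bytes, sr, sw, ch):
        """Seconds of audio held in `n_bytes` of headerless PCM (hypothetical helper)."""
        frame_size = sw * ch  # bytes per frame: one sample for each channel
        if n_bytes % frame_size:
            raise ValueError("size is not a multiple of the frame size")
        return n_bytes / (sr * frame_size)

    # Write 88200 zero bytes to a temporary raw file
    with tempfile.NamedTemporaryFile(suffix=".dat", delete=False) as f:
        f.write(b"\0" * 88200)
    size = os.path.getsize(f.name)
    os.remove(f.name)

    # The same bytes mean different durations under different parameters
    print(raw_duration(size, sr=44100, sw=2, ch=1))  # 1.0
    print(raw_duration(size, sr=22050, sw=2, ch=1))  # 2.0
    print(raw_duration(size, sr=44100, sw=2, ch=2))  # 0.5

Passing the wrong parameters would therefore not necessarily raise an error; it
would silently produce a region with the wrong duration and content.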

From a `bytes` object
=====================

If the first argument is of type `bytes`, it is interpreted as raw audio data:

.. code:: python

    import auditok
    sr = 16000
    sw = 2
    ch = 1
    data = b"\0" * sr * sw * ch  # one second of silence
    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
    print(region)

output:

.. code:: bash

    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
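
The silent buffer above is handy for testing, but raw data can come from
anywhere. As a sketch, the snippet below builds one second of a 440 Hz sine
tone as 16-bit mono PCM using only the standard library. `sine_bytes` is a
hypothetical helper, and the little-endian byte order is an assumption (it
matches the layout of WAV files):

.. code:: python

    import math
    import struct

    def sine_bytes(freq, duration, sr=16000, amplitude=0.5):
        """16-bit little-endian mono PCM of a sine tone (hypothetical helper)."""
        n = int(round(duration * sr))   # number of samples
        peak = int(amplitude * 32767)   # stay within the signed 16-bit range
        samples = [int(peak * math.sin(2 * math.pi * freq * i / sr))
                   for i in range(n)]
        return struct.pack("<%dh" % n, *samples)  # "<": little-endian, "h": int16

    data = sine_bytes(440, 1.0)
    print(len(data))  # 32000 = 16000 samples * 2 bytes * 1 channel

The resulting `data` has the same layout as the silent buffer above, so it
could be passed to :func:`load` with `sr=16000, sw=2, ch=1` in the same way.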

From the microphone
===================

If the first argument is `None`, :func:`load` will try to read data from the
microphone. Audio parameters, as well as the `max_read` parameter (in seconds),
are mandatory:

.. code:: python

    import auditok
    sr = 16000
    sw = 2
    ch = 1
    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5) # read 5 seconds
    print(five_sec_audio)

output:

.. code:: bash

    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)


Skip part of audio data
=======================

If the `skip` parameter is > 0, :func:`load` will skip that many seconds of
leading audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds

This argument must be 0 when reading from the microphone.
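
Conceptually, skipping means moving the read position forward by
`skip * sampling_rate` frames before reading. For WAV files, a comparable effect
can be sketched with Python's standard `wave` module. `read_wav_skipping` is a
hypothetical helper, and rounding the frame position is an assumption, not
necessarily what :func:`load` does internally:

.. code:: python

    import os
    import tempfile
    import wave

    def read_wav_skipping(path, skip):
        """Return (params, frames) of a WAV file minus its first `skip` seconds."""
        with wave.open(path, "rb") as wf:
            # convert seconds to a frame position (rounding is an assumption)
            first = min(round(skip * wf.getframerate()), wf.getnframes())
            wf.setpos(first)  # the position is counted in frames
            frames = wf.readframes(wf.getnframes() - first)
            return wf.getparams(), frames

    # Build a 3-second silent WAV file (16 kHz, 16-bit, mono) to try it on
    path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(16000)
        wf.writeframes(b"\0" * 16000 * 2 * 3)

    params, frames = read_wav_skipping(path, skip=2)
    # remaining duration in seconds: 1.0
    print(len(frames) / (params.framerate * params.sampwidth * params.nchannels))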


Basic split example
-------------------

In the following example, we use the :func:`split` function to tokenize an
audio file, requiring that valid audio events be at least 0.2 seconds long, at
most 4 seconds long and contain a maximum of 0.3 seconds of continuous silence.
Limiting the size of detected events to 4 seconds means that an event of, say,
9.5 seconds will be returned as two 4-second events plus a third 1.5-second
event. Moreover, a valid event might contain many *silences*, as long as none of
them exceeds 0.3 seconds.

:func:`split` returns a generator of :class:`AudioRegion` objects. An
:class:`AudioRegion` can be played, saved, repeated (i.e., multiplied by an
integer) and concatenated with another region (see examples below). Notice that
:class:`AudioRegion` objects returned by :func:`split` have ``start`` and
``end`` information stored in their metadata, which can be accessed like
``region.meta.start``.

.. code:: python

    import auditok

    # split returns a generator of AudioRegion objects
    audio_regions = auditok.split(
        "audio.wav",
        min_dur=0.2,     # minimum duration of a valid audio event in seconds
        max_dur=4,       # maximum duration of an event
        max_silence=0.3, # maximum duration of tolerated continuous silence within an event
        energy_threshold=55 # threshold of detection
    )

    for i, r in enumerate(audio_regions):

        # Regions returned by `split` have 'start' and 'end' metadata fields
        print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        # play the detection (uncomment to listen)
        # r.play(progress_bar=True)

        # region's metadata can also be used with the `save` method, which
        # fills in the placeholders itself (no need to call `str.format`)
        filename = r.save("region_{meta.start:.3f}-{meta.end:.3f}.wav")
        print("region saved as: {}".format(filename))

output example:

.. code:: bash

    Region 0: 0.700s -- 1.400s
    region saved as: region_0.700-1.400.wav
    Region 1: 3.800s -- 4.500s
    region saved as: region_3.800-4.500.wav
    Region 2: 8.750s -- 9.950s
    region saved as: region_8.750-9.950.wav
    Region 3: 11.700s -- 12.400s
    region saved as: region_11.700-12.400.wav
    Region 4: 15.050s -- 15.850s
    region saved as: region_15.050-15.850.wav
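
To make the roles of `min_dur`, `max_dur` and `max_silence` more concrete, here
is a deliberately simplified tokenizer. It is *not* auditok's implementation: it
assumes that each analysis frame has already been labeled active or silent (for
instance by comparing its energy with a threshold), uses a hypothetical fixed
`frame_dur`, and drops trailing silence when an event is closed by a long
silence or by the end of the input:

.. code:: python

    def detect_events(active, frame_dur, min_dur, max_dur, max_silence):
        """Group per-frame activity flags into (start, end) events, in seconds.

        Simplified illustration only, not auditok's actual algorithm.
        """
        min_frames = round(min_dur / frame_dur)
        max_frames = round(max_dur / frame_dur)
        max_silent_frames = round(max_silence / frame_dur)
        events = []

        def emit(first, last):
            # keep the event only if it is at least min_dur long
            if last - first + 1 >= min_frames:
                events.append((round(first * frame_dur, 6),
                               round((last + 1) * frame_dur, 6)))

        start = last_active = None
        for i, is_active in enumerate(active):
            if start is None:
                if not is_active:
                    continue  # still outside any event
                start = last_active = i
            elif is_active:
                last_active = i
            elif i - last_active > max_silent_frames:
                # silence lasted longer than max_silence: close the event at
                # its last active frame, dropping the trailing silence
                emit(start, last_active)
                start = None
                continue
            if i - start + 1 == max_frames:
                emit(start, i)  # event reached max_dur: cut it here
                start = None
        if start is not None:
            emit(start, last_active)  # input ended inside an event
        return events

    # 0.1-second frames: 1 = active, 0 = silent
    flags = [0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0] + [1] * 25
    print(detect_events(flags, frame_dur=0.1, min_dur=0.2, max_dur=1.0,
                        max_silence=0.2))
    # [(0.1, 0.5), (1.2, 2.2), (2.2, 3.2), (3.2, 3.7)]

The first event tolerates two short silences. The single active frame at 0.8 s
is discarded by `min_dur`, and the final 2.5-second burst is cut by
`max_dur=1.0` into two 1-second events plus a 0.5-second one, just as a
9.5-second event would be cut into 4 + 4 + 1.5 seconds with `max_dur=4`.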


Split and plot
--------------

Visualize the audio signal and the detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav") # returns an AudioRegion object
    # split parameters such as min_dur, max_dur or max_silence can be passed here
    regions = region.split_and_plot() # or just region.splitp()

output figure:

.. image:: figures/example_1.png

Read and split data from the microphone
---------------------------------------

If the first argument of :func:`split` is ``None``, audio data is read from the
microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    try:
        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass

:func:`split` will continue reading audio data until you press ``Ctrl-C``. If
you want to read a specific amount of audio data, pass the desired number of
seconds with the `max_read` argument.


Accessing recorded data after split
-----------------------------------

Using a :class:`Recorder` object, you can get hold of the acquired audio data:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)

    try:
        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass

    rec.rewind()
    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
    # alternatively you can use
    full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)

:class:`Recorder` also accepts a `max_read` argument.

Working with AudioRegions
-------------------------

The following are a few useful operations you can perform on
:class:`AudioRegion` objects.


Basic region information
========================

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    len(region) # number of audio samples in the region, one channel considered
    region.duration # duration in seconds
    region.sampling_rate # alias `sr`
    region.sample_width # alias `sw`
    region.channels # alias `ch`
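
These quantities are tied together: for PCM audio, the duration is the number
of samples per channel divided by the sampling rate. As an illustration outside
auditok, the standard `wave` module exposes the same information from a WAV
header. `wav_info` is a hypothetical helper whose keys mirror the attributes
listed above, and the in-memory file only keeps the sketch self-contained:

.. code:: python

    import io
    import wave

    def wav_info(fileobj):
        """Basic parameters of a WAV file (hypothetical helper)."""
        with wave.open(fileobj, "rb") as wf:
            n_samples = wf.getnframes()  # samples per channel (frames)
            sr = wf.getframerate()
            return {"len": n_samples, "duration": n_samples / sr, "sr": sr,
                    "sw": wf.getsampwidth(), "ch": wf.getnchannels()}

    # Half a second of 16-bit stereo silence at 16 kHz, written to memory
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)
        wf.setframerate(16000)
        wf.writeframes(b"\0" * 8000 * 2 * 2)  # 8000 frames * 2 bytes * 2 channels

    buf.seek(0)
    print(wav_info(buf))
    # {'len': 8000, 'duration': 0.5, 'sr': 16000, 'sw': 2, 'ch': 2}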


Concatenate regions
===================

.. code:: python

    import auditok
    region_1 = auditok.load("audio_1.wav")
    region_2 = auditok.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful to join the regions returned by :func:`split`:

.. code:: python

    import auditok
    regions = auditok.load("audio.wav").split()
    gapless_region = sum(regions)
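
At the byte level, joining two pieces of audio that share the same sampling
rate, sample width and number of channels amounts to joining their raw data.
The sketch below illustrates this with a hypothetical `(data, sr, sw, ch)`
tuple instead of an :class:`AudioRegion`, refusing mismatched parameters since
the joined bytes could not be interpreted consistently otherwise;
`functools.reduce` plays the role of `sum`:

.. code:: python

    from functools import reduce

    def concat(a, b):
        """Concatenate two hypothetical (data, sr, sw, ch) tuples."""
        data_a, *params_a = a
        data_b, *params_b = b
        if params_a != params_b:
            raise ValueError("cannot join audio with different parameters: "
                             "{} vs {}".format(params_a, params_b))
        return (data_a + data_b, *params_a)

    parts = [
        (b"\x01\x00" * 3, 16000, 2, 1),  # 3 samples of 2 bytes
        (b"\x02\x00" * 2, 16000, 2, 1),  # 2 samples of 2 bytes
    ]
    joined = reduce(concat, parts)
    print(len(joined[0]) // 2)  # 5 samples in the joined data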

Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    region_x3 = region * 3

Split one region into N regions of equal size
=============================================

Divide by a positive integer (this has nothing to do with silence-based
tokenization):

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

Note that if no perfect division is possible, the last region might be a bit
shorter than the previous N-1 regions.

Slice a region by samples, seconds or milliseconds
==================================================

Slicing an :class:`AudioRegion` can be useful in many situations. You can, for
example, remove a fixed-size portion of audio data from the beginning or the
end of a region, or crop a region by an arbitrary amount as a data augmentation
strategy.

The most accurate way to slice an `AudioRegion` is to use indices that
directly refer to raw audio samples. In the following example, assuming that the
sampling rate of the audio data is 16000, you can extract a 5-second region from
the main region, starting at the 20th second, as follows:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This lets you start and stop at any audio sample within the region.
Just as with a `list`, you can omit `start`, `stop`, or both. You can
also use negative indices:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples is flexible, slicing with temporal indices is more
intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of an
`AudioRegion` (or their shortcut aliases ``ms``, and ``sec`` or ``s``).

With the ``millis`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.millis[5000:10000]

or with the ``seconds`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[5:10]

``seconds`` indices can also be floats:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[2.5:7.5]
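
To see what a temporal index amounts to, the following sketch converts times in
seconds into sample indices and then into byte offsets of interleaved PCM data.
It works on plain `bytes` rather than an :class:`AudioRegion`; `seconds_slice`
is a hypothetical helper, and rounding to the nearest sample is an assumption
that may differ from auditok's own conversion:

.. code:: python

    def seconds_slice(data, sr, sw, ch, start=None, stop=None):
        """Slice interleaved PCM bytes with times in seconds (hypothetical helper).

        As with list slicing, `start`/`stop` may be omitted or negative.
        """
        frame_size = sw * ch  # bytes per sample frame (all channels)

        def to_offset(t):
            if t is None:
                return None
            return round(t * sr) * frame_size  # seconds -> samples -> bytes

        return data[to_offset(start):to_offset(stop)]

    sr, sw, ch = 10, 2, 1    # tiny sampling rate keeps the numbers readable
    data = bytes(range(40))  # 20 samples of 2 bytes each, i.e. 2 seconds
    middle = seconds_slice(data, sr, sw, ch, 0.5, 1.5)
    print(len(middle) // (sw * ch) / sr)  # 1.0 (second)

A millisecond index `ms` corresponds to `ms / 1000` seconds, and a negative
time such as `start=-0.5` selects the last half second, mirroring the
sample-based examples above.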

Get arrays of audio samples
===========================

If `numpy` is not installed, the `samples` attribute is a list of audio sample
arrays (standard `array.array` objects), one per channel. If `numpy` is
installed, `samples` is a 2-D `numpy.ndarray` where the first dimension is the
channel and the second is the sample.

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    samples = region.samples
    assert len(samples) == region.channels
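
For reference, the sketch below shows how interleaved PCM bytes can be turned
into one `array.array` per channel with the standard library alone, which is
the shape described above when `numpy` is not available. It is not auditok's
code: `split_channels` is a hypothetical helper that assumes 16-bit
little-endian samples:

.. code:: python

    import array
    import struct
    import sys

    def split_channels(data, ch):
        """De-interleave 16-bit little-endian PCM bytes into per-channel arrays."""
        samples = array.array("h", data)  # "h": signed 2-byte integers
        if sys.byteorder == "big":
            samples.byteswap()            # the input data is little-endian
        # frames are interleaved as L R L R ..., so channel c is every ch-th value
        return [samples[c::ch] for c in range(ch)]

    stereo = struct.pack("<6h", 1, -1, 2, -2, 3, -3)  # 3 frames: left, right
    left, right = split_channels(stereo, ch=2)
    print(left, right)  # array('h', [1, 2, 3]) array('h', [-1, -2, -3])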

If `numpy` is installed, you can also convert a region directly into an array:

.. code:: python

    import auditok
    import numpy as np
    region = auditok.load("audio.wav")
    samples = np.asarray(region)
    assert len(samples.shape) == 2