amine@387: Load audio data
amine@387: ---------------
amine@377: 
amine@432: Audio data is loaded using the :func:`load` function, which can read from
amine@432: audio files, capture from the microphone, or accept raw audio data
amine@432: (as a ``bytes`` object).
amine@379: 
amine@377: From a file
amine@377: ===========
amine@377: 
amine@441: If the first argument of :func:`load` is a string or a ``Path``, it should
amine@432: refer to an existing audio file.
amine@369: 
amine@369: .. code:: python
amine@369: 
amine@377:     import auditok
amine@377:     region = auditok.load("audio.ogg")
amine@369: 
amine@432: If the input file contains raw (headerless) audio data, specifying audio
amine@432: parameters (``sampling_rate``, ``sample_width``, and ``channels``) is required.
amine@432: Additionally, if the file name does not end with 'raw', you should explicitly
amine@441: pass ``audio_format="raw"`` to the function.
amine@432: 
amine@432: In the example below, we provide audio parameters using their abbreviated names:
amine@369: 
amine@369: .. code:: python
amine@369: 
amine@377:     region = auditok.load("audio.dat",
amine@377:                           audio_format="raw",
amine@379:                           sr=44100, # alias for `sampling_rate`
amine@432:                           sw=2,     # alias for `sample_width`
amine@379:                           ch=1      # alias for `channels`
amine@379:                           )
amine@377: 
amine@432: Alternatively, you can use :class:`AudioRegion` to load audio data:
amine@432: 
amine@432: .. code:: python
amine@432: 
amine@432:     from auditok import AudioRegion
amine@432:     region = AudioRegion.load("audio.dat",
amine@432:                               audio_format="raw",
amine@432:                               sr=44100, # alias for `sampling_rate`
amine@441:                               sw=2,     # alias for `sample_width`
amine@432:                               ch=1      # alias for `channels`
amine@432:                               )
amine@432: 
amine@432: 
amine@441: From a ``bytes`` object
amine@441: =======================
amine@377: 
amine@441: If the first argument is of type ``bytes``, it is interpreted as raw audio data:
amine@377: 
amine@377: .. code:: python
amine@377: 
amine@377:     sr = 16000
amine@377:     sw = 2
amine@377:     ch = 1
amine@377:     data = b"\0" * sr * sw * ch
amine@379:     region = auditok.load(data, sr=sr, sw=sw, ch=ch)
amine@377:     print(region)
amine@387:     # alternatively you can use
amine@432:     region = auditok.AudioRegion(data, sr, sw, ch)
amine@377: 
amine@377: output:
amine@377: 
amine@377: .. code:: bash
amine@377: 
amine@377:     AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)
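
The one-second duration reported above follows from the size of the buffer: raw
audio occupies ``sampling_rate * sample_width * channels`` bytes per second. The
sketch below reproduces that arithmetic in plain Python. ``duration_in_seconds``
is a hypothetical helper written for this illustration, not an auditok function.

```python
def duration_in_seconds(data, sampling_rate, sample_width, channels):
    """Duration of raw PCM `data` (hypothetical helper, not part of auditok)."""
    bytes_per_second = sampling_rate * sample_width * channels
    return len(data) / bytes_per_second


sr, sw, ch = 16000, 2, 1
data = b"\0" * sr * sw * ch  # 32000 bytes of silence
print(duration_in_seconds(data, sr, sw, ch))  # 1.0
```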
amine@377: 
amine@377: From the microphone
amine@377: ===================
amine@377: 
amine@441: If the first argument is ``None``, :func:`load` will attempt to read data from the
amine@441: microphone. In this case, audio parameters, along with the ``max_read`` parameter,
amine@432: are required.
amine@377: 
amine@377: .. code:: python
amine@377: 
amine@377:     sr = 16000
amine@377:     sw = 2
amine@377:     ch = 1
amine@377:     five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
amine@377:     print(five_sec_audio)
amine@377: 
amine@377: output:
amine@377: 
amine@377: .. code:: bash
amine@377: 
amine@377:     AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)
amine@377: 
amine@377: 
amine@377: Skip part of audio data
amine@377: =======================
amine@377: 
amine@432: If the ``skip`` parameter is greater than 0, :func:`load` will skip that specified
amine@432: amount of leading audio data, measured in seconds:
amine@377: 
amine@377: .. code:: python
amine@377: 
amine@377:     import auditok
amine@377:     region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds
amine@377: 
amine@387: This argument must be 0 when reading data from the microphone.
amine@387: 
amine@387: 
amine@387: Limit the amount of read audio
amine@387: ==============================
amine@387: 
amine@432: If the ``max_read`` parameter is greater than 0, :func:`load` will read at most
amine@387: that many seconds of audio data:
amine@387: 
amine@387: .. code:: python
amine@387: 
amine@387:     import auditok
amine@387:     region = auditok.load("audio.ogg", max_read=5)
amine@387:     assert region.duration <= 5
amine@387: 
amine@432: This argument is required when reading data from the microphone.
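
For raw data, ``skip`` and ``max_read`` can be thought of as offsets into the
byte stream. The following is a minimal sketch of that idea in plain Python.
``read_window`` is a hypothetical helper, and truncating to whole samples with
``int`` is an assumption of this sketch, so the library's own handling may differ.

```python
def read_window(data, sampling_rate, sample_width, channels, skip=0, max_read=None):
    """Drop `skip` seconds from raw PCM `data`, then keep at most `max_read`
    seconds (hypothetical helper, not part of auditok)."""
    frame_size = sample_width * channels  # bytes per multi-channel sample
    start = int(skip * sampling_rate) * frame_size
    if max_read is None:
        return data[start:]
    stop = start + int(max_read * sampling_rate) * frame_size
    return data[start:stop]


sr, sw, ch = 16000, 2, 1
ten_seconds = b"\0" * (10 * sr * sw * ch)
window = read_window(ten_seconds, sr, sw, ch, skip=2, max_read=5)
print(len(window) / (sr * sw * ch))  # 5.0
```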
amine@377: 
amine@377: 
amine@377: Basic split example
amine@377: -------------------
amine@377: 
amine@432: In the following example, we'll use the :func:`split` function to tokenize an
amine@432: audio file. We'll specify that valid audio events must be at least 0.2 seconds
amine@432: long, no longer than 4 seconds, and contain no more than 0.3 seconds of continuous
amine@432: silence. By setting a 4-second limit, an event lasting 9.5 seconds, for instance,
amine@432: will be returned as two 4-second events plus a final 1.5-second event. Additionally,
amine@432: a valid event may contain multiple silences, as long as none exceed 0.3 seconds.
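
The effect of the maximum duration on a long event can be checked with a few
lines of arithmetic. ``truncate_event`` below is a hypothetical helper that only
illustrates the rule described above. It is not how :func:`split` is implemented.

```python
def truncate_event(duration, max_dur):
    """Split one long event into consecutive pieces of at most `max_dur`
    seconds (hypothetical helper illustrating the rule, not auditok code)."""
    pieces = []
    remaining = duration
    while remaining > max_dur:
        pieces.append(max_dur)
        remaining -= max_dur
    if remaining > 0:
        pieces.append(remaining)
    return pieces


print(truncate_event(9.5, 4))  # [4, 4, 1.5]
```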
amine@379: 
amine@432: :func:`split` returns a generator of :class:`AudioRegion` objects. Each
amine@432: :class:`AudioRegion` can be played, saved, repeated (multiplied by an integer),
amine@432: and concatenated with another region (see examples below). Note that
amine@441: :class:`AudioRegion` objects returned by :func:`split` include ``start`` and ``end``
amine@432: attributes, which mark the beginning and end of the audio event relative to the
amine@432: input audio stream.
amine@379: 
amine@377: .. code:: python
amine@377: 
amine@377:     import auditok
amine@377: 
amine@432:     # `split` returns a generator of AudioRegion objects
amine@432:     audio_events = auditok.split(
amine@377:         "audio.wav",
amine@432:         min_dur=0.2,     # Minimum duration of a valid audio event in seconds
amine@432:         max_dur=4,       # Maximum duration of an event
amine@432:         max_silence=0.3, # Maximum tolerated silence duration within an event
amine@432:         energy_threshold=55 # Detection threshold
amine@377:     )
amine@377: 
amine@432:     for i, r in enumerate(audio_events):
amine@432:         # AudioRegions returned by `split` have defined 'start' and 'end' attributes
amine@432:         print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")
amine@377: 
amine@432:         # Play the audio event
amine@432:         r.play(progress_bar=True)
amine@377: 
amine@432:         # Save the event with start and end times in the filename
amine@432:         filename = r.save("event_{start:.3f}-{end:.3f}.wav")
amine@441:         print(f"event saved as: {filename}")
amine@377: 
amine@432: Example output:
amine@377: 
amine@377: .. code:: bash
amine@377: 
amine@432:     Event 0: 0.700s -- 1.400s
amine@441:     event saved as: event_0.700-1.400.wav
amine@432:     Event 1: 3.800s -- 4.500s
amine@441:     event saved as: event_3.800-4.500.wav
amine@432:     Event 2: 8.750s -- 9.950s
amine@441:     event saved as: event_8.750-9.950.wav
amine@432:     Event 3: 11.700s -- 12.400s
amine@441:     event saved as: event_11.700-12.400.wav
amine@432:     Event 4: 15.050s -- 15.850s
amine@441:     event saved as: event_15.050-15.850.wav
amine@377: 
amine@377: Split and plot
amine@377: --------------
amine@377: 
amine@377: Visualize the audio signal and detections:
amine@377: 
amine@377: .. code:: python
amine@377: 
amine@377:     import auditok
amine@377:     region = auditok.load("audio.wav") # returns an AudioRegion object
amine@377:     regions = region.split_and_plot(...) # or just region.splitp()
amine@369: 
amine@369: output figure:
amine@369: 
amine@369: .. image:: figures/example_1.png
amine@369: 
amine@432: Split an audio stream and re-join (glue) audio events with silence
amine@432: ------------------------------------------------------------------
amine@432: 
amine@432: The following code detects audio events within an audio stream, then inserts
amine@432: 1 second of silence between them to create an audio stream with pauses:
amine@432: 
amine@432: .. code:: python
amine@432: 
amine@432:     # Create a 1-second silent audio region
amine@432:     # Audio parameters must match the original stream
amine@432:     from auditok import split, make_silence
amine@432:     silence = make_silence(duration=1,
amine@432:                            sampling_rate=16000,
amine@432:                            sample_width=2,
amine@432:                            channels=1)
amine@432:     events = split("audio.wav")
amine@432:     audio_with_pauses = silence.join(events)
amine@432: 
amine@432: Alternatively, use ``split_and_join_with_silence``:
amine@432: 
amine@432: .. code:: python
amine@432: 
amine@432:     from auditok import split_and_join_with_silence
amine@432:     audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")
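
The expected length of the result can be estimated on raw bytes. The sketch
below assumes that ``silence.join(events)`` behaves like ``bytes.join``, placing
the silence between consecutive events but not before the first or after the
last one. The fake events and their durations are made up for illustration.

```python
sr, sw, ch = 16000, 2, 1
bytes_per_second = sr * sw * ch

silence = b"\0" * bytes_per_second  # 1 second of silence

# Three fake "events" of 0.5 s each (non-zero bytes stand in for real audio)
events = [b"\x01" * (bytes_per_second // 2) for _ in range(3)]

# Same idea as `silence.join(events)`, applied to raw bytes
audio_with_pauses = silence.join(events)
print(len(audio_with_pauses) / bytes_per_second)  # 3.5
```

Three half-second events separated by two one-second pauses give 3.5 seconds.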
amine@432: 
amine@377: 
amine@441: Read audio data from the microphone and perform real-time event detection
amine@441: -------------------------------------------------------------------------
amine@377: 
amine@432: If the first argument of :func:`split` is ``None``, audio data is read from the
amine@379: microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_):
amine@377: 
amine@377: .. code:: python
amine@377: 
amine@377:     import auditok
amine@377: 
amine@377:     sr = 16000
amine@377:     sw = 2
amine@377:     ch = 1
amine@377:     eth = 55 # alias for energy_threshold, default value is 50
amine@377: 
amine@377:     try:
amine@377:         for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
amine@377:             print(region)
amine@377:             region.play(progress_bar=True) # progress bar requires `tqdm`
amine@377:     except KeyboardInterrupt:
amine@377:         pass
amine@377: 
amine@377: 
amine@432: :func:`split` will continue reading audio data until you press ``Ctrl-C``. To read
amine@432: a specific amount of audio data, pass the desired number of seconds using the
amine@441: ``max_read`` argument.
amine@377: 
amine@377: 
amine@387: Access recorded data after split
amine@387: --------------------------------
amine@377: 
amine@432: Using a :class:`Recorder` object, you can access the audio data read from a file
amine@432: or from the microphone. With the following code, press ``Ctrl-C`` to stop recording:
amine@377: 
amine@377: 
amine@377: .. code:: python
amine@377: 
amine@377:     import auditok
amine@377: 
amine@377:     sr = 16000
amine@377:     sw = 2
amine@377:     ch = 1
amine@377:     eth = 55 # alias for energy_threshold, default value is 50
amine@377: 
amine@377:     rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)
amine@432:     events = []
amine@377: 
amine@377:     try:
amine@377:         for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
amine@377:             print(region)
amine@432:             region.play(progress_bar=True)
amine@432:             events.append(region)
amine@377:     except KeyboardInterrupt:
amine@377:         pass
amine@377: 
amine@377:     rec.rewind()
amine@454:     full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
amine@379:     # alternatively you can use
amine@379:     full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)
amine@432:     full_audio.play(progress_bar=True)
amine@377: 
amine@377: 
amine@441: :class:`Recorder` also accepts a ``max_read`` argument.
amine@377: 
amine@369: Working with AudioRegions
amine@369: -------------------------
amine@369: 
amine@432: In the following sections, we will review several operations
amine@441: that can be performed with :class:`AudioRegion` objects.
amine@377: 
amine@377: Basic region information
amine@377: ========================
amine@377: 
amine@377: .. code:: python
amine@377: 
amine@377:     import auditok
amine@377:     region = auditok.load("audio.wav")
amine@377:     len(region) # number of audio samples in the region, one channel considered
amine@377:     region.duration # duration in seconds
amine@377:     region.sampling_rate # alias `sr`
amine@377:     region.sample_width # alias `sw`
amine@377:     region.channels # alias `ch`
amine@377: 
amine@432: When an audio region is returned by the :func:`split` function, it includes defined
amine@432: ``start`` and ``end`` attributes that refer to the beginning and end of the audio
amine@432: event relative to the input audio stream.
amine@377: 
amine@369: Concatenate regions
amine@369: ===================
amine@369: 
amine@369: .. code:: python
amine@369: 
amine@377:     import auditok
amine@377:     region_1 = auditok.load("audio_1.wav")
amine@377:     region_2 = auditok.load("audio_2.wav")
amine@369:     region_3 = region_1 + region_2
amine@369: 
amine@432: This is particularly useful when you want to join regions returned by the
amine@432: :func:`split` function:
amine@369: 
amine@369: .. code:: python
amine@369: 
amine@377:     import auditok
amine@377:     regions = auditok.load("audio.wav").split()
amine@369:     gapless_region = sum(regions)
amine@369: 
amine@369: Repeat a region
amine@369: ===============
amine@369: 
amine@369: Multiply by a positive integer:
amine@369: 
amine@369: .. code:: python
amine@369: 
amine@377:     import auditok
amine@377:     region = auditok.load("audio.wav")
amine@369:     region_x3 = region * 3
amine@369: 
amine@377: Split one region into N regions of equal size
amine@377: =============================================
amine@369: 
amine@432: Divide by a positive integer (this is unrelated to silence-based tokenization!):
amine@369: 
amine@369: .. code:: python
amine@369: 
amine@377:     import auditok
amine@377:     region = auditok.load("audio.wav")
amine@369:     regions = region / 5
amine@369:     assert sum(regions) == region
amine@369: 
amine@432: Note that if an exact split is not possible, the last region may be shorter
amine@432: than the preceding N-1 regions.
amine@369: 
amine@377: Slice a region by samples, seconds or milliseconds
amine@377: ==================================================
amine@377: 
amine@432: Slicing an :class:`AudioRegion` can be useful in various situations.
amine@432: For example, you can remove a fixed-length portion of audio data from
amine@432: the beginning or end of a region, or crop a region by an arbitrary amount
amine@432: as a data augmentation strategy.
amine@369: 
amine@441: The most accurate way to slice an :class:`AudioRegion` is by using indices
amine@441: that directly refer to raw audio samples. In the following example, assuming
amine@432: the audio data has a sampling rate of 16000, you can extract a 5-second
amine@432: segment from the main region, starting at the 20th second, as follows:
amine@369: 
amine@369: .. code:: python
amine@369: 
amine@377:     import auditok
amine@377:     region = auditok.load("audio.wav")
amine@369:     start = 20 * 16000
amine@369:     stop = 25 * 16000
amine@369:     five_second_region = region[start:stop]
amine@369: 
amine@432: This allows you to start and stop at any audio sample within the region. Similar
amine@432: to a ``list``, you can omit either ``start`` or ``stop``, or both. Negative
amine@432: indices are also supported:
amine@369: 
amine@369: .. code:: python
amine@369: 
amine@377:     import auditok
amine@377:     region = auditok.load("audio.wav")
amine@369:     start = -3 * region.sr # `sr` is an alias of `sampling_rate`
amine@369:     three_last_seconds = region[start:]
amine@369: 
amine@432: While slicing by raw samples offers flexibility, using temporal indices is
amine@432: often more intuitive. You can achieve this by accessing the ``millis`` or ``seconds``
amine@432: *views* of an :class:`AudioRegion` (or using their shortcut aliases ``ms``, ``sec``, or ``s``).
amine@369: 
amine@379: With the ``millis`` view:
amine@369: 
amine@369: .. code:: python
amine@369: 
amine@377:     import auditok
amine@377:     region = auditok.load("audio.wav")
amine@369:     five_second_region = region.millis[5000:10000]
amine@432:     # or
amine@432:     five_second_region = region.ms[5000:10000]
amine@369: 
amine@379: or with the ``seconds`` view:
amine@369: 
amine@369: .. code:: python
amine@369: 
amine@377:     import auditok
amine@377:     region = auditok.load("audio.wav")
amine@369:     five_second_region = region.seconds[5:10]
amine@432:     # or
amine@432:     five_second_region = region.sec[5:10]
amine@432:     # or
amine@432:     five_second_region = region.s[5:10]
amine@369: 
amine@379: ``seconds`` indices can also be floats:
amine@369: 
amine@369: .. code:: python
amine@369: 
amine@377:     import auditok
amine@377:     region = auditok.load("audio.wav")
amine@377:     five_second_region = region.seconds[2.5:7.5]
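
A time-based slice ultimately maps to sample indices. The sketch below shows one
way to do that conversion on raw bytes. ``seconds_to_samples`` and
``slice_seconds`` are hypothetical helpers, and truncating with ``int`` is an
assumption of this sketch rather than documented library behavior.

```python
def seconds_to_samples(t, sampling_rate):
    """Convert a time in seconds to a sample index (truncation is assumed)."""
    return int(t * sampling_rate)


def slice_seconds(data, start, stop, sampling_rate, sample_width, channels):
    """Slice raw PCM `data` between `start` and `stop` seconds
    (hypothetical helper, not part of auditok)."""
    frame_size = sample_width * channels  # bytes per multi-channel sample
    first = seconds_to_samples(start, sampling_rate) * frame_size
    last = seconds_to_samples(stop, sampling_rate) * frame_size
    return data[first:last]


sr, sw, ch = 16000, 2, 1
ten_seconds = b"\0" * (10 * sr * sw * ch)
segment = slice_seconds(ten_seconds, 2.5, 7.5, sr, sw, ch)
print(len(segment) // (sw * ch))  # 80000 samples, i.e. 5 seconds at 16 kHz
```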
amine@377: 
amine@432: Export an ``AudioRegion`` as a ``numpy`` array
amine@432: ==============================================
amine@377: 
amine@377: .. code:: python
amine@377: 
amine@432:     from auditok import load, AudioRegion
amine@432:     audio = load("audio.wav") # or use `AudioRegion.load("audio.wav")`
amine@432:     x = audio.numpy()
amine@432:     assert x.shape[0] == audio.channels
amine@432:     assert x.shape[1] == len(audio)
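
The assertions above imply one row per channel and one column per sample. The
following sketch shows that layout using only the standard library, assuming
16-bit samples stored in interleaved order, as is usual for multi-channel PCM.
The sample values are made up, and this is not how ``numpy()`` is implemented.

```python
from array import array

channels = 2
# Interleaved 16-bit samples: L0, R0, L1, R1, L2, R2
interleaved = array("h", [1, -1, 2, -2, 3, -3])

# Row i holds every sample of channel i
per_channel = [list(interleaved[c::channels]) for c in range(channels)]
print(per_channel)  # [[1, 2, 3], [-1, -2, -3]]
```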