Load audio data
---------------

Audio data is loaded using the :func:`load` function, which can read from
audio files, capture from the microphone, or accept raw audio data
(as a ``bytes`` object).

From a file
===========

If the first argument of :func:`load` is a string or a ``Path``, it should
refer to an existing audio file.

.. code:: python

    import auditok
    region = auditok.load("audio.ogg")

If the input file contains raw (headerless) audio data, specifying audio
parameters (``sampling_rate``, ``sample_width``, and ``channels``) is required.
Additionally, if the file name does not end with 'raw', you should explicitly
pass ``audio_format="raw"`` to the function.

In the example below, we provide audio parameters using their abbreviated names:

.. code:: python

    import auditok
    region = auditok.load("audio.dat",
                          audio_format="raw",
                          sr=44100, # alias for `sampling_rate`
                          sw=2,     # alias for `sample_width`
                          ch=1      # alias for `channels`
                          )

Alternatively, you can use :class:`AudioRegion` to load audio data:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.dat",
                              audio_format="raw",
                              sr=44100, # alias for `sampling_rate`
                              sw=2,     # alias for `sample_width`
                              ch=1      # alias for `channels`
                              )


From a ``bytes`` object
=======================

If the first argument is of type ``bytes``, it is interpreted as raw audio data:

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    data = b"\0" * sr * sw * ch
    region = auditok.load(data, sr=sr, sw=sw, ch=ch)
    print(region)
    # alternatively you can use
    region = auditok.AudioRegion(data, sr, sw, ch)

output:

.. code:: bash

    AudioRegion(duration=1.000, sampling_rate=16000, sample_width=2, channels=1)

From the microphone
===================

If the first argument is ``None``, :func:`load` will attempt to read data from the
microphone. In this case, audio parameters, along with the ``max_read`` parameter,
are required.

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    five_sec_audio = auditok.load(None, sr=sr, sw=sw, ch=ch, max_read=5)
    print(five_sec_audio)

output:

.. code:: bash

    AudioRegion(duration=5.000, sampling_rate=16000, sample_width=2, channels=1)


Skip part of audio data
=======================

If the ``skip`` parameter is greater than 0, :func:`load` will skip that
amount of leading audio data, measured in seconds:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", skip=2) # skip the first 2 seconds

This argument must be 0 when reading data from the microphone.


Limit the amount of read audio
==============================

If the ``max_read`` parameter is greater than 0, :func:`load` will read at most
that many seconds of audio data:

.. code:: python

    import auditok
    region = auditok.load("audio.ogg", max_read=5)
    assert region.duration <= 5

This argument is required when reading data from the microphone.


Basic split example
-------------------

In the following example, we'll use the :func:`split` function to tokenize an
audio file. We'll specify that valid audio events must be at least 0.2 seconds
long, no longer than 4 seconds, and contain no more than 0.3 seconds of continuous
silence. By setting a 4-second limit, an event lasting 9.5 seconds, for instance,
will be returned as two 4-second events plus a final 1.5-second event. Additionally,
a valid event may contain multiple silences, as long as none exceed 0.3 seconds.

:func:`split` returns a generator of :class:`AudioRegion` objects. Each
:class:`AudioRegion` can be played, saved, repeated (multiplied by an integer),
and concatenated with another region (see examples below). Note that
:class:`AudioRegion` objects returned by :func:`split` include ``start`` and ``end``
attributes, which mark the beginning and end of the audio event relative to the
input audio stream.

.. code:: python

    import auditok

    # `split` returns a generator of AudioRegion objects
    audio_events = auditok.split(
        "audio.wav",
        min_dur=0.2,          # Minimum duration of a valid audio event in seconds
        max_dur=4,            # Maximum duration of an event
        max_silence=0.3,      # Maximum tolerated silence duration within an event
        energy_threshold=55   # Detection threshold
    )

    for i, r in enumerate(audio_events):
        # AudioRegions returned by `split` have defined 'start' and 'end' attributes
        print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")

        # Play the audio event
        r.play(progress_bar=True)

        # Save the event with start and end times in the filename
        filename = r.save("event_{start:.3f}-{end:.3f}.wav")
        print(f"event saved as: {filename}")

Example output:

.. code:: bash

    Event 0: 0.700s -- 1.400s
    event saved as: event_0.700-1.400.wav
    Event 1: 3.800s -- 4.500s
    event saved as: event_3.800-4.500.wav
    Event 2: 8.750s -- 9.950s
    event saved as: event_8.750-9.950.wav
    Event 3: 11.700s -- 12.400s
    event saved as: event_11.700-12.400.wav
    Event 4: 15.050s -- 15.850s
    event saved as: event_15.050-15.850.wav

Split and plot
--------------

Visualize audio signal and detections:

.. code:: python

    import auditok
    region = auditok.load("audio.wav") # returns an AudioRegion object
    regions = region.split_and_plot(...) # or just region.splitp()

output figure:

.. image:: figures/example_1.png

Split an audio stream and re-join (glue) audio events with silence
------------------------------------------------------------------

The following code detects audio events within an audio stream, then inserts
1 second of silence between them to create an audio region with pauses:

.. code:: python

    from auditok import split, make_silence

    # Create a 1-second silent audio region
    # Audio parameters must match the original stream
    silence = make_silence(duration=1,
                           sampling_rate=16000,
                           sample_width=2,
                           channels=1)
    events = split("audio.wav")
    audio_with_pauses = silence.join(events)

Alternatively, use ``split_and_join_with_silence``:

.. code:: python

    from auditok import split_and_join_with_silence
    audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")


Read audio data from the microphone and perform real-time event detection
-------------------------------------------------------------------------

If the first argument of :func:`split` is ``None``, audio data is read from the
microphone (requires ``pyaudio``):

.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    try:
        for region in auditok.split(input=None, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True) # progress bar requires `tqdm`
    except KeyboardInterrupt:
        pass


:func:`split` will continue reading audio data until you press ``Ctrl-C``. To read
a specific amount of audio data, pass the desired number of seconds using the
``max_read`` argument.


Access recorded data after split
--------------------------------

Using a :class:`Recorder` object, you can access the audio data read from a file
or from the microphone. With the following code, press ``Ctrl-C`` to stop recording:


.. code:: python

    import auditok

    sr = 16000
    sw = 2
    ch = 1
    eth = 55 # alias for energy_threshold, default value is 50

    rec = auditok.Recorder(input=None, sr=sr, sw=sw, ch=ch)
    events = []

    try:
        for region in auditok.split(rec, sr=sr, sw=sw, ch=ch, eth=eth):
            print(region)
            region.play(progress_bar=True)
            events.append(region)
    except KeyboardInterrupt:
        pass

    rec.rewind()
    full_audio = auditok.load(rec.data, sr=sr, sw=sw, ch=ch)
    # alternatively you can use
    full_audio = auditok.AudioRegion(rec.data, sr, sw, ch)
    full_audio.play(progress_bar=True)


:class:`Recorder` also accepts a ``max_read`` argument.

Working with AudioRegions
-------------------------

In the following sections, we will review several operations
that can be performed with :class:`AudioRegion` objects.

Basic region information
========================

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    len(region) # number of audio samples in the region, one channel considered
    region.duration # duration in seconds
    region.sampling_rate # alias `sr`
    region.sample_width # alias `sw`
    region.channels # alias `ch`

When an audio region is returned by the :func:`split` function, it includes defined
``start`` and ``end`` attributes that refer to the beginning and end of the audio
event relative to the input audio stream.

Concatenate regions
===================

.. code:: python

    import auditok
    region_1 = auditok.load("audio_1.wav")
    region_2 = auditok.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful when you want to join regions returned by the
:func:`split` function:

.. code:: python

    import auditok
    regions = auditok.load("audio.wav").split()
    gapless_region = sum(regions)

Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    region_x3 = region * 3

Split one region into N regions of equal size
=============================================

Divide by a positive integer (this is unrelated to silence-based tokenization!):

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

Note that if an exact split is not possible, the last region may be shorter
than the preceding N-1 regions.

Slice a region by samples, seconds or milliseconds
==================================================

Slicing an :class:`AudioRegion` can be useful in various situations.
For example, you can remove a fixed-length portion of audio data from
the beginning or end of a region, or crop a region by an arbitrary amount
as a data augmentation strategy.

The most accurate way to slice an :class:`AudioRegion` is by using indices
that directly refer to raw audio samples. In the following example, assuming
the audio data has a sampling rate of 16000, you can extract a 5-second
segment from the main region, starting at the 20th second, as follows:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This allows you to start and stop at any audio sample within the region. Similar
to a ``list``, you can omit either ``start`` or ``stop``, or both. Negative
indices are also supported:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    start = -3 * region.sr # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples offers flexibility, using temporal indices is
often more intuitive. You can achieve this by accessing the ``millis`` or ``seconds``
*views* of an :class:`AudioRegion` (or using their shortcut aliases ``ms``, ``sec``, or ``s``).

With the ``millis`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.millis[5000:10000]
    # or
    five_second_region = region.ms[5000:10000]

or with the ``seconds`` view:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[5:10]
    # or
    five_second_region = region.sec[5:10]
    # or
    five_second_region = region.s[5:10]

``seconds`` indices can also be floats:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")
    five_second_region = region.seconds[2.5:7.5]

Export an ``AudioRegion`` as a ``numpy`` array
==============================================

.. code:: python

    from auditok import load, AudioRegion
    audio = load("audio.wav") # or use `AudioRegion.load("audio.wav")`
    x = audio.numpy()
    assert x.shape[0] == audio.channels
    assert x.shape[1] == len(audio)
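
The two assertions above say that the exported array has one row per channel and
one column per audio sample. The sketch below illustrates that layout using only
the standard library. It is not auditok's implementation: the ``deinterleave``
helper is hypothetical, and it assumes the raw data is interleaved, little-endian,
signed integer PCM.

```python
import struct

# Standard-size struct codes for signed integer samples,
# keyed by sample width in bytes (assumption: signed PCM).
_CODES = {1: "b", 2: "h", 4: "i"}


def deinterleave(data, sample_width, channels):
    """Split interleaved raw PCM bytes into one list of samples per channel.

    Hypothetical helper for illustration only, not part of auditok.
    """
    count = len(data) // sample_width
    samples = struct.unpack(f"<{count}{_CODES[sample_width]}", data)
    # Channel c owns every `channels`-th sample, starting at offset c
    return [list(samples[c::channels]) for c in range(channels)]


# Two stereo frames of 16-bit samples: (L=1, R=-1), (L=2, R=-2)
raw = struct.pack("<4h", 1, -1, 2, -2)
per_channel = deinterleave(raw, sample_width=2, channels=2)
print(per_channel)  # [[1, 2], [-1, -2]]
```

Here ``len(per_channel)`` plays the role of ``x.shape[0]`` (the number of channels)
and ``len(per_channel[0])`` the role of ``x.shape[1]`` (the number of samples per
channel).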