Mercurial > hg > auditok
view README.rst @ 455:7dae98b84cdd tip master
Merge branch 'master' of https://github.com/amsehili/auditok
author | www-data <www-data@c4dm-xenserv-virt2.eecs.qmul.ac.uk> |
---|---|
date | Tue, 03 Dec 2024 09:18:01 +0000 |
parents | 3911ff1d719d |
children |
line wrap: on
line source
.. image:: doc/figures/auditok-logo.png :align: center .. image:: https://github.com/amsehili/auditok/actions/workflows/ci.yml/badge.svg :target: https://github.com/amsehili/auditok/actions/workflows/ci.yml/ :alt: Build Status .. image:: https://codecov.io/github/amsehili/auditok/graph/badge.svg?token=0rwAqYBdkf :target: https://codecov.io/github/amsehili/auditok .. image:: https://readthedocs.org/projects/auditok/badge/?version=latest :target: http://auditok.readthedocs.org/en/latest/?badge=latest :alt: Documentation Status ``auditok`` is an **Audio Activity Detection** tool that processes online data (from an audio device or standard input) and audio files. It can be used via the command line or through its API. Full documentation is available on `Read the Docs <https://auditok.readthedocs.io/en/latest/>`_. Installation ------------ ``auditok`` requires Python 3.7 or higher. To install the latest stable version, use pip: .. code:: bash sudo pip install auditok To install the latest development version from GitHub: .. code:: bash pip install git+https://github.com/amsehili/auditok Alternatively, clone the repository and install it manually: .. code:: bash pip install git+https://github.com/amsehili/auditok or .. code:: bash git clone https://github.com/amsehili/auditok.git cd auditok python setup.py install Basic example ------------- Here's a simple example of using ``auditok`` to detect audio events: .. code:: python import auditok # `split` returns a generator of AudioRegion objects audio_events = auditok.split( "audio.wav", min_dur=0.2, # Minimum duration of a valid audio event in seconds max_dur=4, # Maximum duration of an event max_silence=0.3, # Maximum tolerated silence duration within an event energy_threshold=55 # Detection threshold ) for i, r in enumerate(audio_events): # AudioRegions returned by `split` have defined 'start' and 'end' attributes print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}") # Play the audio event r.play(progress_bar=True) # Save the event with start and end times in the filename filename = r.save("event_{start:.3f}-{end:.3f}.wav") print(f"Event saved as: {filename}") Example output: .. code:: bash Event 0: 0.700s -- 1.400s Event saved as: event_0.700-1.400.wav Event 1: 3.800s -- 4.500s Event saved as: event_3.800-4.500.wav Event 2: 8.750s -- 9.950s Event saved as: event_8.750-9.950.wav Event 3: 11.700s -- 12.400s Event saved as: event_11.700-12.400.wav Event 4: 15.050s -- 15.850s Event saved as: event_15.050-15.850.wav Split and plot -------------- Visualize the audio signal with detected events: .. code:: python import auditok region = auditok.load("audio.wav") # Returns an AudioRegion object regions = region.split_and_plot(...) # Or simply use `region.splitp()` Example output: .. image:: doc/figures/example_1.png Split an audio stream and re-join (glue) audio events with silence ------------------------------------------------------------------ The following code detects audio events within an audio stream, then insert 1 second of silence between them to create an audio with pauses: .. code:: python # Create a 1-second silent audio region # Audio parameters must match the original stream from auditok import split, make_silence silence = make_silence(duration=1, sampling_rate=16000, sample_width=2, channels=1) events = split("audio.wav") audio_with_pauses = silence.join(events) Alternatively, use ``split_and_join_with_silence``: .. code:: python from auditok import split_and_join_with_silence audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav") Export an ``AudioRegion`` as a ``numpy`` array ---------------------------------------------- .. code:: python from auditok import load, AudioRegion audio = load("audio.wav") # or use `AudioRegion.load("audio.wav")` x = audio.numpy() assert x.shape[0] == audio.channels assert x.shape[1] == len(audio) Limitations ----------- The detection algorithm is based on audio signal energy. While it performs well in low-noise environments (e.g., podcasts, language lessons, or quiet recordings), performance may drop in noisy settings. Additionally, the algorithm does not distinguish between speech and other sounds, so it is not suitable for Voice Activity Detection in multi-sound environments. License ------- MIT.