diff README.rst @ 428:1baa80ec22c3 (Update README)
author:   Amine Sehili <amine.sehili@gmail.com>
date:     Wed, 30 Oct 2024 10:47:56 +0100
parents:  c89c0977db47
children: 97eff033c8f8

.. image:: doc/figures/auditok-logo.png
   :align: center

.. image:: https://github.com/amsehili/auditok/actions/workflows/ci.yml/badge.svg
   :target: https://github.com/amsehili/auditok/actions/workflows/ci.yml/
   :alt: Build Status

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
   :target: http://auditok.readthedocs.org/en/latest/?badge=latest
   :alt: Documentation Status

``auditok`` is an **Audio Activity Detection** tool that processes online data
(from an audio device or standard input) and audio files. It can be used via
the command line or through its API.

Full documentation is available on `Read the Docs <https://auditok.readthedocs.io/en/latest/>`_.

Installation
------------

``auditok`` requires Python 3.7+.
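The examples in this README read a file named ``audio.wav``. If you just want something to experiment with, Python's standard-library ``wave`` module can generate a file with the same audio parameters the examples assume (16 kHz, 16-bit, mono); this is only a convenience sketch, and note that pure silence yields no detectable audio events, so use a real recording for meaningful output:

```python
import wave

# Parameters matching those used throughout the examples below
sampling_rate = 16000  # samples per second
sample_width = 2       # bytes per sample (16 bit)
channels = 1           # mono

# Write 1 second of silence as "audio.wav"
with wave.open("audio.wav", "wb") as out:
    out.setnchannels(channels)
    out.setsampwidth(sample_width)
    out.setframerate(sampling_rate)
    out.writeframes(b"\x00\x00" * sampling_rate)

# Read the parameters back to confirm the file is well-formed
with wave.open("audio.wav", "rb") as inp:
    print(inp.getframerate(), inp.getsampwidth(), inp.getnchannels())
```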

To install the latest stable version, use pip:

.. code:: bash

    sudo pip install auditok

To install the latest development version from GitHub:

.. code:: bash

    pip install git+https://github.com/amsehili/auditok

Alternatively, clone the repository and install it manually:

.. code:: bash

    git clone https://github.com/amsehili/auditok.git
    cd auditok
    python setup.py install

Basic example
-------------

Here's a simple example of using ``auditok`` to detect audio events:

.. code:: python

    import auditok

    # `split` returns a generator of AudioRegion objects
    audio_events = auditok.split(
        "audio.wav",
        min_dur=0.2,          # Minimum duration of a valid audio event in seconds
        max_dur=4,            # Maximum duration of an event
        max_silence=0.3,      # Maximum tolerated silence duration within an event
        energy_threshold=55,  # Detection threshold
    )

    for i, r in enumerate(audio_events):
        # AudioRegions returned by `split` have defined 'start' and 'end' attributes
        print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")

        # Play the audio event
        r.play(progress_bar=True)

        # Save the event with start and end times in the filename
        filename = r.save("event_{start:.3f}-{end:.3f}.wav")
        print(f"Event saved as: {filename}")

Example output:

.. code:: bash

    Event 0: 0.700s -- 1.400s
    Event saved as: event_0.700-1.400.wav
    Event 1: 3.800s -- 4.500s
    Event saved as: event_3.800-4.500.wav
    Event 2: 8.750s -- 9.950s
    Event saved as: event_8.750-9.950.wav
    Event 3: 11.700s -- 12.400s
    Event saved as: event_11.700-12.400.wav
    Event 4: 15.050s -- 15.850s
    Event saved as: event_15.050-15.850.wav

Split and plot
--------------

Visualize the audio signal with detected events:

.. code:: python

    import auditok

    region = auditok.load("audio.wav")    # Returns an AudioRegion object
    regions = region.split_and_plot(...)  # Or simply use `region.splitp()`

Example output:

.. image:: doc/figures/example_1.png

Split an audio stream and re-join (glue) audio events with silence
------------------------------------------------------------------

The following example detects audio events within an audio stream, then inserts
1 second of silence between them to produce audio with pauses:

.. code:: python

    # Create a 1-second silent audio region
    # Audio parameters must match the original stream
    from auditok import split, make_silence

    silence = make_silence(duration=1,
                           sampling_rate=16000,
                           sample_width=2,
                           channels=1)
    events = split("audio.wav")
    audio_with_pauses = silence.join(events)

Alternatively, use ``split_and_join_with_silence``:

.. code:: python

    from auditok import split_and_join_with_silence

    audio_with_pauses = split_and_join_with_silence(silence_duration=1,
                                                    input="audio.wav")

Limitations
-----------

The detection algorithm is based on audio signal energy. While it performs well
in low-noise environments (e.g., podcasts, language lessons, or quiet recordings),
performance may drop in noisy settings. Additionally, the algorithm does not
distinguish between speech and other sounds, so it is not suitable for Voice
Activity Detection in multi-sound environments.

License
-------

MIT.
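As a concrete footnote to the Limitations section: energy-based detection ultimately comes down to comparing each analysis frame's log energy against a threshold. The sketch below is illustrative only and is not auditok's actual implementation; the frame size, the dB formula, and the reuse of the threshold value 55 from the basic example are assumptions made for the demonstration.

```python
import math

def frame_energy_db(samples):
    """Log energy in dB of a frame of 16-bit PCM samples (illustrative)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)

# A loud frame and a near-silent frame (160 samples = 10 ms at 16 kHz)
loud = [8000] * 160
quiet = [5] * 160

threshold = 55  # same value as `energy_threshold` in the basic example
print(frame_energy_db(loud) > threshold)   # True: frame counts as activity
print(frame_energy_db(quiet) > threshold)  # False: frame counts as silence
```

A full detector additionally applies duration constraints (``min_dur``, ``max_dur``, ``max_silence``) over runs of consecutive frames; this snippet shows only the per-frame energy test, which is why background noise with high energy defeats it.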