annotate README.rst @ 455:7dae98b84cdd tip master

Merge branch 'master' of https://github.com/amsehili/auditok
author www-data <www-data@c4dm-xenserv-virt2.eecs.qmul.ac.uk>
date Tue, 03 Dec 2024 09:18:01 +0000
parents 3911ff1d719d
.. image:: doc/figures/auditok-logo.png
    :align: center

.. image:: https://github.com/amsehili/auditok/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/amsehili/auditok/actions/workflows/ci.yml/
    :alt: Build Status

.. image:: https://codecov.io/github/amsehili/auditok/graph/badge.svg?token=0rwAqYBdkf
    :target: https://codecov.io/github/amsehili/auditok

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
    :target: http://auditok.readthedocs.org/en/latest/?badge=latest
    :alt: Documentation Status

``auditok`` is an **Audio Activity Detection** tool that processes online data
(from an audio device or standard input) and audio files. It can be used via the
command line or through its API.

Full documentation is available on `Read the Docs <https://auditok.readthedocs.io/en/latest/>`_.

Installation
------------

``auditok`` requires Python 3.7 or higher.

To install the latest stable version, use pip:

.. code:: bash

    sudo pip install auditok

To install the latest development version from GitHub:

.. code:: bash

    pip install git+https://github.com/amsehili/auditok

Alternatively, clone the repository and install it manually:

.. code:: bash

    git clone https://github.com/amsehili/auditok.git
    cd auditok
    python setup.py install

Basic example
-------------

Here's a simple example of using ``auditok`` to detect audio events:

.. code:: python

    import auditok

    # `split` returns a generator of AudioRegion objects
    audio_events = auditok.split(
        "audio.wav",
        min_dur=0.2,  # Minimum duration of a valid audio event in seconds
        max_dur=4,  # Maximum duration of an event
        max_silence=0.3,  # Maximum tolerated silence duration within an event
        energy_threshold=55  # Detection threshold
    )

    for i, r in enumerate(audio_events):
        # AudioRegions returned by `split` have defined 'start' and 'end' attributes
        print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")

        # Play the audio event
        r.play(progress_bar=True)

        # Save the event with start and end times in the filename
        filename = r.save("event_{start:.3f}-{end:.3f}.wav")
        print(f"Event saved as: {filename}")

Example output:

.. code:: bash

    Event 0: 0.700s -- 1.400s
    Event saved as: event_0.700-1.400.wav
    Event 1: 3.800s -- 4.500s
    Event saved as: event_3.800-4.500.wav
    Event 2: 8.750s -- 9.950s
    Event saved as: event_8.750-9.950.wav
    Event 3: 11.700s -- 12.400s
    Event saved as: event_11.700-12.400.wav
    Event 4: 15.050s -- 15.850s
    Event saved as: event_15.050-15.850.wav

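To make the roles of ``min_dur``, ``max_dur`` and ``max_silence`` more concrete,
here is a minimal sketch of a frame-grouping loop. It is a simplified
illustration and not ``auditok``'s actual tokenizer: the function
``group_frames`` is hypothetical, and its limits are expressed in frames rather
than seconds.

.. code:: python

    def group_frames(valid, min_len, max_len, max_silence):
        """Group per-frame activity flags into (start, end) ranges, end exclusive.

        Hypothetical helper for illustration; not part of auditok.
        """
        events = []
        start = None  # first frame of the event being built, or None
        silence = 0   # trailing inactive frames inside the current event

        def close(end):
            # Keep the event only if it reaches the minimum length
            if end - start >= min_len:
                events.append((start, end))

        for i, is_valid in enumerate(valid):
            if start is None:
                if not is_valid:
                    continue
                start, silence = i, 0
            elif is_valid:
                silence = 0
            else:
                silence += 1
                if silence > max_silence:
                    # Too much silence: end the event before the silent run
                    close(i + 1 - silence)
                    start = None
                    continue
            if i + 1 - start == max_len:
                # Maximum length reached: cut here, dropping trailing silence
                close(i + 1 - silence)
                start = None
        if start is not None:
            close(len(valid) - silence)
        return events

    flags = [0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
    events = group_frames(flags, min_len=2, max_len=10, max_silence=1)

With these flags, the one-frame gap at index 4 is bridged, the isolated active
frame at index 10 is discarded as too short, and ``events`` is ``[(1, 7)]``.
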
Split and plot
--------------

Visualize the audio signal with detected events:

.. code:: python

    import auditok
    region = auditok.load("audio.wav")  # Returns an AudioRegion object
    regions = region.split_and_plot(...)  # Or simply use `region.splitp()`

Example output:

.. image:: doc/figures/example_1.png

Split an audio stream and re-join (glue) audio events with silence
------------------------------------------------------------------

The following code detects audio events within an audio stream, then inserts
1 second of silence between them to create audio with pauses:

.. code:: python

    # Create a 1-second silent audio region
    # Audio parameters must match the original stream
    from auditok import split, make_silence
    silence = make_silence(duration=1,
                           sampling_rate=16000,
                           sample_width=2,
                           channels=1)
    events = split("audio.wav")
    audio_with_pauses = silence.join(events)

Alternatively, use ``split_and_join_with_silence``:

.. code:: python

    from auditok import split_and_join_with_silence
    audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")

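Why the silence parameters must match the stream can be seen by working on raw
PCM bytes. The sketch below does not use ``auditok``: the helper
``silence_bytes`` and the two fake events are assumptions made for
illustration, and the joining is done with plain ``bytes.join``.

.. code:: python

    def silence_bytes(duration, sampling_rate, sample_width, channels):
        """Return `duration` seconds of digital silence as zeroed PCM bytes.

        Hypothetical helper for illustration; not part of auditok.
        """
        n_samples = int(duration * sampling_rate)
        return bytes(n_samples * sample_width * channels)

    # 1 second of 16 kHz, 16-bit, mono silence
    silence = silence_bytes(1, sampling_rate=16000, sample_width=2, channels=1)

    # Two fake events with the same format: 0.5 s and 0.25 s of constant samples
    events = [b"\x01\x00" * 8000, b"\x02\x00" * 4000]

    # Same idea as `str.join`: the separator goes between consecutive events
    audio_with_pauses = silence.join(events)
    total_seconds = len(audio_with_pauses) / (16000 * 2 * 1)

Here ``total_seconds`` is 1.75 (0.5 + 1 + 0.25). If the silence had been
generated with a different sampling rate or sample width, its byte length would
no longer correspond to 1 second when played back in the stream's format.
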
Export an ``AudioRegion`` as a ``numpy`` array
----------------------------------------------

.. code:: python

    from auditok import load, AudioRegion
    audio = load("audio.wav")  # or use `AudioRegion.load("audio.wav")`
    x = audio.numpy()
    assert x.shape[0] == audio.channels
    assert x.shape[1] == len(audio)

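The ``(channels, samples)`` layout asserted above can be illustrated without
``auditok`` or ``numpy``. The sketch below assumes interleaved 16-bit PCM in
native byte order, which is a common layout but an assumption about the input;
``deinterleave`` is a hypothetical helper.

.. code:: python

    from array import array

    def deinterleave(pcm_bytes, channels):
        """Split interleaved 16-bit PCM bytes into one list of samples per channel.

        Hypothetical helper for illustration; not part of auditok.
        """
        samples = array("h")  # signed 16-bit integers, native byte order
        samples.frombytes(pcm_bytes)
        # Channel c owns every `channels`-th sample, starting at offset c
        return [list(samples[c::channels]) for c in range(channels)]

    # Three stereo sample frames: (left, right) pairs stored one after another
    stereo = array("h", [1, -1, 2, -2, 3, -3]).tobytes()
    rows = deinterleave(stereo, channels=2)

``rows`` has one entry per channel and each entry has one value per sample
frame, mirroring the two shape assertions in the example above.
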
Limitations
-----------

The detection algorithm is based on audio signal energy. While it performs well
in low-noise environments (e.g., podcasts, language lessons, or quiet recordings),
performance may drop in noisy settings. Additionally, the algorithm does not
distinguish between speech and other sounds, so it is not suitable for Voice
Activity Detection in multi-sound environments.

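A minimal sketch of energy-based detection shows where this limitation comes
from. It assumes a log-RMS energy measure compared against a fixed threshold;
``auditok``'s exact formula and comparison may differ, and the names
``frame_energy_db`` and ``is_active`` are hypothetical.

.. code:: python

    import math

    def frame_energy_db(samples):
        """Log energy of one frame: 20 * log10(RMS), floored to avoid log(0)."""
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        return 20 * math.log10(max(rms, 1e-10))

    def is_active(samples, energy_threshold=55):
        # A frame counts as activity when its energy reaches the threshold
        return frame_energy_db(samples) >= energy_threshold

    loud = [1000, -1000] * 80  # RMS 1000, i.e. about 60 dB
    quiet = [10, -10] * 80     # RMS 10, i.e. about 20 dB

    loud_is_active = is_active(loud)
    quiet_is_active = is_active(quiet)

The decision depends only on how loud a frame is, so any sufficiently loud
sound (a door slam, music, background chatter) is treated the same way as
speech.
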
License
-------

MIT.