changeset 443:0fec1621aa3f
Merge branch 'master' of https://github.com/amsehili/auditok
author | www-data <www-data@c4dm-xenserv-virt2.eecs.qmul.ac.uk> |
---|---|
date | Thu, 31 Oct 2024 08:17:59 +0000 |
parents | ca8090edca2f (current diff) 44dcd2c4d860 (diff) |
children | 3911ff1d719d |
diffstat | 5 files changed, 46 insertions(+), 39 deletions(-) |
--- a/INSTALL Wed Oct 30 22:17:58 2024 +0000
+++ b/INSTALL Thu Oct 31 08:17:59 2024 +0000
@@ -4,6 +4,10 @@
 ### Install the latest version on Github
+    pip install git+https://github.com/amsehili/auditok
+
+or:
+
     git clone https://github.com/amsehili/auditok.git
     cd auditok
-    sudo python setup.py install
+    python setup.py install
--- a/auditok/signal.py Wed Oct 30 22:17:58 2024 +0000
+++ b/auditok/signal.py Thu Oct 31 08:17:59 2024 +0000
@@ -90,7 +90,7 @@
 The energy is calculated as:
 .. math::
- \text{energy} = 20 \log\left(\sqrt{\frac{1}{N} \sum_{i=1}^{N} a_i^2}\right) % # noqa: W605
+ \\text{energy} = 20 \\log(\\sqrt({1}/{N} \\sum_{i=1}^{N} {a_i}^2)) % # noqa: W605
 where `a_i` is the i-th audio sample and `N` is the total number of samples in `x`.
--- a/auditok/util.py Wed Oct 30 22:17:58 2024 +0000
+++ b/auditok/util.py Thu Oct 31 08:17:59 2024 +0000
@@ -260,7 +260,7 @@
 as:
 .. math::
- energy = 20 \log(\sqrt({1}/{N} \sum_{i=1}^{N} {a_i}^2)) % # noqa: W605
+ \\text{energy} = 20 \\log(\\sqrt({1}/{N} \\sum_{i=1}^{N} {a_i}^2)) % # noqa: W605
 where `a_i` represents the i-th audio sample.
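As a reading aid for the two docstring hunks above, the energy formula can be sketched in plain Python. This is a minimal sketch, not auditok's implementation: the function name `energy_db` is hypothetical, the logarithm is assumed to be base 10 (the docstrings only write `\log`), and returning negative infinity for an all-zero signal is a choice made here for illustration.

```python
import math


def energy_db(samples):
    """Return 20 * log10(sqrt((1/N) * sum(a_i ** 2))) for a sequence of samples.

    Hypothetical helper written for illustration; base-10 log is an assumption.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    n = len(samples)
    mean_square = sum(a * a for a in samples) / n  # (1/N) * sum of a_i^2
    rms = math.sqrt(mean_square)
    if rms == 0:
        # log of zero is undefined; report silence as negative infinity
        return float("-inf")
    return 20 * math.log10(rms)


print(energy_db([10, -10, 10, -10]))
```

Both the old and new versions of the formula in the diff describe the same quantity; only the LaTeX escaping and notation differ.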
--- a/doc/command_line_usage.rst Wed Oct 30 22:17:58 2024 +0000 +++ b/doc/command_line_usage.rst Thu Oct 31 08:17:59 2024 +0000 @@ -11,8 +11,8 @@ Below, we provide several examples covering the most common use cases. -Read audio data and detect audio events online ----------------------------------------------- +Real-Time audio acquisition and event detection +----------------------------------------------- To try ``auditok`` from the command line with your own voice, you’ll need to either install `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_ so @@ -96,8 +96,8 @@ program but you can use it to run any other command. -Print out detection information -------------------------------- +Output detection details +------------------------ By default, ``auditok`` outputs the **id**, **start**, and **end** times for each detected audio event. The start and end values indicate the beginning and end of @@ -139,12 +139,14 @@ To completely disable printing detection information use ``-q``. + Save detections --------------- You can save audio events to disk as they're detected using ``-o`` or -``--save-detections-as``. To create a uniq file name for each event, you can use -``{id}``, ``{start}``, ``{end}`` and ``{duration}`` placeholders. Example: +``--save-detections-as`` followed by a file name with placeholders. To create +a uniq file name for each event, you can use ``{id}``, ``{start}``, ``{end}`` +and ``{duration}`` placeholders as in this example: .. code:: bash @@ -160,8 +162,8 @@ auditok -o "{id}_{start:.3f}_{end:.3f}.wav" -Record the full audio stream ----------------------------- +Save the full audio stream +-------------------------- When reading audio data from the microphone, you may want to save it to disk. 
To do this, use the ``-O`` or ``--save-stream`` option: @@ -176,25 +178,26 @@ Join detected audio events, inserting a silence between them ------------------------------------------------------------ -Sometimes, you may want to detect audio events while also -creating a file that contains the same events with modified -pause durations. +Sometimes, you may want to detect audio events and create a new file containing +these events with pauses of a specific duration between them. This is useful if +you wish to preserve your original audio data while adjusting the length of pauses +(either shortening or extending them). To achieve this, use the ``-j`` or ``--join-detections`` option together with the ``-O`` / ``--save-stream`` option. In the example below, we -read data from `input.wav` and save audio events to `output.wav`, adding +read data from ``input.wav`` and save audio events to ``output.wav``, adding 1-second pauses between them: - .. code:: bash auditok input.wav --join-detections 1 -O output.wav + Plot detections --------------- Audio signal and detections can be plotted using the ``-p`` or ``--plot`` option. -You can also save plot to disk using ``--save-image``. The following example +You can also save the plot to disk using ``--save-image``. The following example demonstrates both: .. code:: bash
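The `-o "{id}_{start:.3f}_{end:.3f}.wav"` pattern in the "Save detections" hunk above uses placeholders that look like Python format fields. Assuming they are expanded with `str.format` semantics (an assumption; the diff does not show how auditok fills them in), the resulting file names can be previewed as below. The event values are made up for illustration.

```python
# Hypothetical event values, used only to preview the file-name pattern.
pattern = "{id}_{start:.3f}_{end:.3f}.wav"
events = [
    {"id": 1, "start": 0.7, "end": 1.4},
    {"id": 2, "start": 3.8, "end": 4.5},
]

# str.format fills each placeholder; ":.3f" keeps three decimal places.
filenames = [pattern.format(**event) for event in events]
for name in filenames:
    print(name)
```

Including `{id}` or the start and end times in the pattern is what keeps each saved event from overwriting the previous one.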
--- a/doc/examples.rst Wed Oct 30 22:17:58 2024 +0000 +++ b/doc/examples.rst Thu Oct 31 08:17:59 2024 +0000 @@ -8,7 +8,7 @@ From a file =========== -If the first argument of :func:`load` is a string or a `Path`, it should +If the first argument of :func:`load` is a string or a ``Path``, it should refer to an existing audio file. .. code:: python @@ -19,7 +19,7 @@ If the input file contains raw (headerless) audio data, specifying audio parameters (``sampling_rate``, ``sample_width``, and ``channels``) is required. Additionally, if the file name does not end with 'raw', you should explicitly -pass `audio_format="raw"` to the function. +pass ``audio_format="raw"`` to the function. In the example below, we provide audio parameters using their abbreviated names: @@ -40,15 +40,15 @@ region = AudioRegion.load("audio.dat", audio_format="raw", sr=44100, # alias for `sampling_rate` - sw=2, # alias for `sample_width` + sw=2, # alias for `sample_width` ch=1 # alias for `channels` ) -From a `bytes` object -===================== +From a ``bytes`` object +======================= -If the first argument is of type `bytes`, it is interpreted as raw audio data: +If the first argument is of type ``bytes``, it is interpreted as raw audio data: .. code:: python @@ -70,8 +70,8 @@ From the microphone =================== -If the first argument is `None`, :func:`load` will attempt to read data from the -microphone. In this case, audio parameters, along with the `max_read` parameter, +If the first argument is ``None``, :func:`load` will attempt to read data from the +microphone. In this case, audio parameters, along with the ``max_read`` parameter, are required. .. code:: python @@ -131,7 +131,7 @@ :func:`split` returns a generator of :class:`AudioRegion` objects. Each :class:`AudioRegion` can be played, saved, repeated (multiplied by an integer), and concatenated with another region (see examples below). 
Note that -:class:`AudioRegion` objects returned by :func:`split` include `start` and `stop` +:class:`AudioRegion` objects returned by :func:`split` include ``start`` and ``stop`` attributes, which mark the beginning and end of the audio event relative to the input audio stream. @@ -157,22 +157,22 @@ # Save the event with start and end times in the filename filename = r.save("event_{start:.3f}-{end:.3f}.wav") - print(f"Event saved as: {filename}") + print(f"event saved as: {filename}") Example output: .. code:: bash Event 0: 0.700s -- 1.400s - Event saved as: event_0.700-1.400.wav + event saved as: event_0.700-1.400.wav Event 1: 3.800s -- 4.500s - Event saved as: event_3.800-4.500.wav + event saved as: event_3.800-4.500.wav Event 2: 8.750s -- 9.950s - Event saved as: event_8.750-9.950.wav + event saved as: event_8.750-9.950.wav Event 3: 11.700s -- 12.400s - Event saved as: event_11.700-12.400.wav + event saved as: event_11.700-12.400.wav Event 4: 15.050s -- 15.850s - Event saved as: event_15.050-15.850.wav + event saved as: event_15.050-15.850.wav Split and plot -------------- @@ -215,8 +215,8 @@ audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav") -Read and split data from the microphone ---------------------------------------- +Read audio data from the microphone and perform real-time event detection +------------------------------------------------------------------------- If the first argument of :func:`split` is ``None``, audio data is read from the microphone (requires `pyaudio <https://people.csail.mit.edu/hubert/pyaudio>`_): @@ -240,7 +240,7 @@ :func:`split` will continue reading audio data until you press ``Ctrl-C``. To read a specific amount of audio data, pass the desired number of seconds using the -`max_read` argument. +``max_read`` argument. Access recorded data after split @@ -277,13 +277,13 @@ full_audio.play(progress_bar=True) -:class:`Recorder` also accepts a `max_read` argument. 
+:class:`Recorder` also accepts a ``max_read`` argument. Working with AudioRegions ------------------------- In the following sections, we will review several operations -that can be performed with :class:AudioRegion objects. +that can be performed with :class:`AudioRegion` objects. Basic region information ======================== @@ -355,8 +355,8 @@ the beginning or end of a region, or crop a region by an arbitrary amount as a data augmentation strategy. -The most accurate way to slice an `AudioRegion` is by using indices that -directly refer to raw audio samples. In the following example, assuming +The most accurate way to slice an :class:`AudioRegion` is by using indices +that directly refer to raw audio samples. In the following example, assuming the audio data has a sampling rate of 16000, you can extract a 5-second segment from the main region, starting at the 20th second, as follows:
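The closing context paragraph of this hunk describes a sample-index slice, but the corresponding code lies outside the hunk. The index arithmetic can be sketched as follows; `sample_range` is a hypothetical helper, and a plain list stands in for an `AudioRegion` so that the snippet runs without auditok installed.

```python
def sample_range(start_sec, duration_sec, sampling_rate):
    """Convert a start time and a duration (seconds) to (start, stop) sample indices."""
    start = int(start_sec * sampling_rate)
    stop = start + int(duration_sec * sampling_rate)
    return start, stop


sampling_rate = 16000
# 5-second segment starting at the 20th second
start, stop = sample_range(20, 5, sampling_rate)

# Stand-in for 30 seconds of mono audio; a real region would hold actual samples.
samples = [0] * (30 * sampling_rate)
segment = samples[start:stop]
print(start, stop, len(segment) / sampling_rate)
```

Working in sample indices avoids the rounding that can occur when slice boundaries are given in seconds or milliseconds.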