.. image:: doc/figures/auditok-logo.png
    :align: center

.. image:: https://travis-ci.org/amsehili/auditok.svg?branch=master
    :target: https://travis-ci.org/amsehili/auditok

.. image:: https://readthedocs.org/projects/auditok/badge/?version=latest
    :target: http://auditok.readthedocs.org/en/latest/?badge=latest
    :alt: Documentation Status

``auditok`` is an **Audio Activity Detection** tool that can process online data (read from an audio device or from standard input) as well as audio files. It can be used as a command line program or by calling its API.

A basic version of ``auditok`` will run with standard Python (>=3.4). Without installing additional dependencies, ``auditok`` can only deal with audio files in *wav* or *raw* formats. If you want more features, the following packages are needed:

- ``pydub``: read audio files in popular audio formats (ogg, mp3, etc.) or extract audio from a video file.

- ``pyaudio``: read audio data from the microphone and play back detections.

- ``tqdm``: show a progress bar while playing audio clips.

- ``matplotlib``: plot the audio signal and detections (see the output figure below).

- ``numpy``: required by matplotlib. If available, it is also used for some math operations instead of standard Python.

Installation
------------

.. code:: bash

    git clone https://github.com/amsehili/auditok.git
    cd auditok
    python setup.py install

Basic example
-------------

.. code:: python

    from auditok import split

    # split returns a generator of AudioRegion objects
    audio_regions = split("audio.wav")
    for region in audio_regions:
        region.play(progress_bar=True)
        filename = region.save("/tmp/region_{meta.start:.3f}.wav")
        print("region saved as: {}".format(filename))

Example using `AudioRegion`
---------------------------

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    regions = region.split_and_plot()  # or just region.splitp()

Output figure:

.. image:: doc/figures/example_1.png

Working with AudioRegions
-------------------------

Beyond splitting, there are a few other useful operations you can perform on ``AudioRegion`` objects.

Concatenate regions
===================

.. code:: python

    from auditok import AudioRegion
    region_1 = AudioRegion.load("audio_1.wav")
    region_2 = AudioRegion.load("audio_2.wav")
    region_3 = region_1 + region_2

This is particularly useful if you want to join regions returned by ``split``:

.. code:: python

    from auditok import AudioRegion
    regions = AudioRegion.load("audio.wav").split()
    gapless_region = sum(regions)

Repeat a region
===============

Multiply by a positive integer:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    region_x3 = region * 3

Make slices of equal size out of a region
=========================================

Divide by a positive integer:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    regions = region / 5
    assert sum(regions) == region

Make audio slices of arbitrary size
===================================

Slicing an ``AudioRegion`` is useful in many situations. For example, you can remove a fixed-size portion of audio data from the beginning or the end of a region, or crop a region by an arbitrary amount as a data augmentation strategy.

The most accurate way to slice an ``AudioRegion`` is to use indices that refer directly to raw audio samples. In the following example, assuming that the sampling rate of the audio data is 16000, you can extract a 5-second region from the main region, starting from the 20th second, as follows:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    start = 20 * 16000
    stop = 25 * 16000
    five_second_region = region[start:stop]

This lets you start and stop at any sample within the region. Just as with a ``list``, you can omit one of ``start`` and ``stop``, or both. You can also use negative indices:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    start = -3 * region.sr  # `sr` is an alias of `sampling_rate`
    three_last_seconds = region[start:]

While slicing by raw samples is accurate, slicing with temporal indices is more intuitive. You can do so by accessing the ``millis`` or ``seconds`` views of ``AudioRegion`` (or their shortcut aliases ``ms`` and ``sec``/``s``).

With the ``millis`` view:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    five_second_region = region.millis[5000:10000]

or with the ``seconds`` view:

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    five_second_region = region.seconds[5:10]

Get an array of audio samples
=============================

.. code:: python

    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    samples = region.samples

If ``numpy`` is installed, this will return a ``numpy.ndarray``. If the audio data is mono, the returned array is 1D; otherwise it is 2D. If ``numpy`` is not installed, this will return a standard ``array.array`` for mono data and a list of ``array.array`` for multichannel data.

Alternatively you can use:

.. code:: python

    import numpy as np
    from auditok import AudioRegion
    region = AudioRegion.load("audio.wav")
    samples = np.asarray(region)

License
-------
MIT.