Class AudioProcessor

  • java.lang.Object
    • at.ofai.music.beatroot.AudioProcessor


  • public class AudioProcessor
    extends java.lang.Object
    Audio processing class (adapted from PerformanceMatcher).
    • Field Summary

      Fields 
      Modifier and Type Field and Description
      protected java.lang.String audioFileName
      Source of input data.
      protected javax.sound.sampled.AudioFormat audioFormat
      Format of the audio data in pcmInputStream
      protected javax.sound.sampled.SourceDataLine audioOut
      Line for audio output (not used, since output is done by AudioPlayer)
      static boolean batchMode
      Flag for batch mode.
      protected int cbIndex
      The index of the next position to write in the circular buffer.
      protected int channels
      Number of channels of audio in audioFormat
      protected double[] circBuffer
      Audio data is scaled to the range [-1,1], averaged to one channel, and stored in a circular buffer for reuse (if hopTime < fftTime).
      static boolean debug
      Flag for enabling or disabling debugging output
      static boolean doOnsetPlot
      Flag for plotting onset detection function.
      protected double[] energy
      The RMS energy of all frames.
      static int energyOversampleFactor
      Ratio between rate of sampling the signal energy (for the amplitude envelope) and the hop size
      protected int fftSize
      The size of an FFT frame in samples (see fftTime)
      protected double fftTime
      The approximate size of an FFT frame in seconds, as set by the command line option -f FFTTime.
      protected int frameCount
      The number of overlapping frames of audio data which have been read.
      protected double frameRMS
      RMS amplitude of the current frame.
      protected double[][] frames
      The magnitude spectra of all frames, used for plotting the spectrogram.
      protected int[] freqMap
      A mapping function for mapping FFT bins to final frequency bins.
      protected int freqMapSize
      The number of entries in freqMap.
      protected int hopSize
      Spacing of audio frames in samples (see hopTime)
      protected double hopTime
      Spacing of audio frames (determines the amount of overlap or skip between frames).
      protected double[] imBuffer
      The imaginary part of the data for the in-place FFT computation.
      protected byte[] inputBuffer
      Audio data is initially read in PCM format into this buffer.
      static int liveInputBufferSize
      Size of the audio buffer for live input.
      protected double ltAverage
      Long term average frame energy (in frequency domain representation).
      static int MAX_LENGTH
      Maximum file length in seconds.
      protected double[] newFrame
      The magnitude spectrum of the current frame.
      static int normaliseMode
      Determines method of normalisation.
      protected EventList onsetList
      The estimated onset times and their saliences.
      protected double[] onsets
      The estimated onset times from peak-picking the onset detection function(s).
      protected javax.sound.sampled.AudioInputStream pcmInputStream
      Uncompressed version of rawInputStream.
      protected double[] phaseDeviation
      Phase deviation onset detection function, indexed by frame.
      protected double[] prevFrame
      The magnitude spectrum of the most recent frame.
      protected double[] prevPhase
      Phase of the previous frame, for calculating an onset function based on spectral phase deviation.
      protected double[] prevPrevPhase
      Phase of the frame before the previous frame, for calculating an onset function based on spectral phase deviation.
      protected ProgressIndicator progressCallback
      GUI component which shows progress of audio processing.
      static double rangeThreshold
      For dynamic range compression, this value is added to the log magnitude in each frequency bin and any remaining negative values are then set to zero.
      protected javax.sound.sampled.AudioInputStream rawInputStream
      Input data stream for this performance (possibly in compressed format)
      protected double[] reBuffer
      The real part of the data for the in-place FFT computation.
      protected float sampleRate
      Sample rate of audio in audioFormat
      static double silenceThreshold
      RMS frame energy below this value results in the frame being set to zero, so that normalisation does not have undesired side-effects.
      protected static boolean silent
      Flag for suppressing all standard output messages except results.
      protected double[] spectralFlux
      Spectral flux onset detection function, indexed by frame.
      protected int totalFrames
      Total number of audio frames if known, or -1 for live or compressed input.
      protected double[] window
      The window function for the STFT, currently a Hamming window.
      protected double[] y2Onsets
      The y-coordinates of the onsets for plotting.
    • Constructor Summary

      Constructors 
      Constructor and Description
      AudioProcessor()
      Constructor: note that streams are not opened until the input file is set (see setInputFile()).
    • Method Summary

      Modifier and Type Method and Description
      void closeStreams()
      Closes the input stream(s) associated with this object.
      void findOnsets(double p1, double p2) 
      static double[] getFeatures(java.lang.String fileName)
      Reads a text file containing a list of whitespace-separated feature values.
      boolean getFrame()
      Reads a frame of input data, averages the channels to mono, scales to a maximum possible absolute value of 1, and stores the audio data in a circular input buffer.
      protected void init()
      Allocates memory for arrays, based on parameter settings
      protected void makeFreqMap(int fftSize, float sampleRate)
      Creates a map of FFT frequency bins to comparison bins.
      void print()
      For debugging, outputs information about the AudioProcessor to standard error.
      void processFeatures(java.lang.String fileName, double hopTime)
      Reads a file of feature values, treated as an onset detection function, and finds peaks, which are stored in onsetList and onsets.
      void processFile()
      Processes a complete file of audio data.
      protected void processFrame()
      Processes a frame of audio data by first computing the STFT with a Hamming window, then mapping the frequency bins into a part-linear part-logarithmic array, then computing the spectral flux, then (optionally) normalising and calculating onsets.
      java.lang.String readLine()
      For interactive pause - wait for user to hit Enter
      void setDisplay(BeatTrackDisplay btd)
      Copies output of audio processing to the display panel.
      void setInputFile(java.lang.String fileName)
      Sets up the streams and buffers for audio file input.
      void setLiveInput()
      Sets up the streams and buffers for live audio input (CD quality).
      void setProgressCallback(ProgressIndicator c)
      Adds a link to the GUI component which shows the progress of audio processing.
      java.lang.String toString()
      Gives some basic information about the audio being processed.
      protected void weightedPhaseDeviation()
      Calculates the weighted phase deviation onset detection function.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • audioFileName

        protected java.lang.String audioFileName
        Source of input data. Could be extended to include live input from the sound card.
      • audioFormat

        protected javax.sound.sampled.AudioFormat audioFormat
        Format of the audio data in pcmInputStream
      • audioOut

        protected javax.sound.sampled.SourceDataLine audioOut
        Line for audio output (not used, since output is done by AudioPlayer)
      • batchMode

        public static boolean batchMode
        Flag for batch mode.
      • cbIndex

        protected int cbIndex
        The index of the next position to write in the circular buffer.
      • channels

        protected int channels
        Number of channels of audio in audioFormat
      • circBuffer

        protected double[] circBuffer
        Audio data is scaled to the range [-1,1], averaged to one channel, and stored in a circular buffer for reuse (if hopTime < fftTime).
      • debug

        public static boolean debug
        Flag for enabling or disabling debugging output
      • doOnsetPlot

        public static boolean doOnsetPlot
        Flag for plotting onset detection function.
      • energy

        protected double[] energy
        The RMS energy of all frames.
      • energyOversampleFactor

        public static int energyOversampleFactor
        Ratio between rate of sampling the signal energy (for the amplitude envelope) and the hop size
      • fftSize

        protected int fftSize
        The size of an FFT frame in samples (see fftTime)
      • fftTime

        protected double fftTime
        The approximate size of an FFT frame in seconds, as set by the command line option -f FFTTime. (Default = 0.04644s). The value is adjusted so that fftSize is always a power of 2.
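        As an illustration of the power-of-2 adjustment, here is a minimal sketch. The class FrameSizeSketch and its methods are hypothetical, and rounding to the nearest power of 2 in the log domain is an assumption; the documentation only states that fftSize is always a power of 2.

```java
class FrameSizeSketch {
    /** Nearest power of 2 to fftTime * sampleRate (log-domain rounding is an assumption). */
    static int fftSizeFor(double fftTime, float sampleRate) {
        double exponent = Math.log(fftTime * sampleRate) / Math.log(2);
        return 1 << (int) Math.round(exponent);
    }

    /** Frame length in seconds implied by the adjusted fftSize. */
    static double adjustedFftTime(int fftSize, float sampleRate) {
        return fftSize / (double) sampleRate;
    }

    public static void main(String[] args) {
        int size = fftSizeFor(0.04644, 44100f);
        System.out.println(size + " samples, " + adjustedFftTime(size, 44100f) + " s");
    }
}
```

        With the default fftTime and CD-quality audio, the product fftTime * sampleRate is already very close to 2048, so the adjustment leaves the frame length essentially unchanged.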
      • frameCount

        protected int frameCount
        The number of overlapping frames of audio data which have been read.
      • frameRMS

        protected double frameRMS
        RMS amplitude of the current frame.
      • frames

        protected double[][] frames
        The magnitude spectra of all frames, used for plotting the spectrogram.
      • freqMap

        protected int[] freqMap
        A mapping function for mapping FFT bins to final frequency bins. The mapping is linear (1-1) until the resolution reaches 2 points per semitone, then logarithmic with a semitone resolution. For example, for a 44.1 kHz sampling rate and an fftSize of 2048 (46 ms), the bin spacing is 21.5 Hz; bins 0-34 (0 to 732 Hz) are mapped linearly and the remaining bins are mapped logarithmically (MIDI notes 79 to 127, bins 35 to 83), with all energy above note 127 mapped into the final bin.
      • freqMapSize

        protected int freqMapSize
        The number of entries in freqMap. Note that the length of the array is greater, because its size is not known at creation time.
      • hopSize

        protected int hopSize
        Spacing of audio frames in samples (see hopTime)
      • hopTime

        protected double hopTime
        Spacing of audio frames (determines the amount of overlap or skip between frames). This value is expressed in seconds and can be set by the command line option -h hopTime. (Default = 0.020s)
      • imBuffer

        protected double[] imBuffer
        The imaginary part of the data for the in-place FFT computation. Since input data is real, this initially contains zeros.
      • inputBuffer

        protected byte[] inputBuffer
        Audio data is initially read in PCM format into this buffer.
      • liveInputBufferSize

        public static final int liveInputBufferSize
        Size of the audio buffer for live input. (Not used yet)
        See Also:
        Constant Field Values
      • ltAverage

        protected double ltAverage
        Long term average frame energy (in frequency domain representation).
      • MAX_LENGTH

        public static final int MAX_LENGTH
        Maximum file length in seconds. Used for static allocation of arrays.
        See Also:
        Constant Field Values
      • newFrame

        protected double[] newFrame
        The magnitude spectrum of the current frame.
      • normaliseMode

        public static int normaliseMode
        Determines method of normalisation. Values can be:
        • 0: no normalisation
        • 1: normalisation by current frame energy
        • 2: normalisation by exponential average of frame energy
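        The three modes can be pictured with the following minimal sketch. NormaliseSketch, its normalise method and the decay parameter are hypothetical names chosen for illustration; the actual smoothing constant and update rule used by AudioProcessor are not stated in this documentation.

```java
class NormaliseSketch {
    /** Running (exponential) average of frame energy, used only by mode 2. */
    double ltAverage = 0;

    /**
     * Normalises frame in place according to mode (0, 1 or 2).
     * decay in [0,1) is a hypothetical smoothing constant for mode 2.
     */
    void normalise(double[] frame, double frameRMS, int mode, double decay) {
        // Update the average on every frame so that mode 2 has some history.
        ltAverage = ltAverage * decay + frameRMS * (1 - decay);
        double divisor;
        switch (mode) {
            case 1: divisor = frameRMS; break;   // current frame energy
            case 2: divisor = ltAverage; break;  // exponential average
            default: return;                     // 0: no normalisation
        }
        if (divisor <= 0) return;                // avoid dividing by zero on silence
        for (int i = 0; i < frame.length; i++) frame[i] /= divisor;
    }

    public static void main(String[] args) {
        double[] frame = {2, 4};
        new NormaliseSketch().normalise(frame, 2, 1, 0.5);
        System.out.println(java.util.Arrays.toString(frame));
    }
}
```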
      • onsetList

        protected EventList onsetList
        The estimated onset times and their saliences.
      • onsets

        protected double[] onsets
        The estimated onset times from peak-picking the onset detection function(s).
      • pcmInputStream

        protected javax.sound.sampled.AudioInputStream pcmInputStream
        Uncompressed version of rawInputStream. In the (normal) case where the input is already PCM data, rawInputStream == pcmInputStream
      • phaseDeviation

        protected double[] phaseDeviation
        Phase deviation onset detection function, indexed by frame.
      • prevFrame

        protected double[] prevFrame
        The magnitude spectrum of the most recent frame. Used for calculating the spectral flux.
      • prevPhase

        protected double[] prevPhase
        Phase of the previous frame, for calculating an onset function based on spectral phase deviation.
      • prevPrevPhase

        protected double[] prevPrevPhase
        Phase of the frame before the previous frame, for calculating an onset function based on spectral phase deviation.
      • progressCallback

        protected ProgressIndicator progressCallback
        GUI component which shows progress of audio processing.
      • rangeThreshold

        public static double rangeThreshold
        For dynamic range compression, this value is added to the log magnitude in each frequency bin and any remaining negative values are then set to zero.
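        A minimal sketch of this compression step follows. The class name RangeCompressionSketch is hypothetical, and the use of the natural logarithm is an assumption, since the base of the logarithm is not specified here.

```java
class RangeCompressionSketch {
    /** Replaces each magnitude by max(0, log(magnitude) + rangeThreshold), in place. */
    static void compress(double[] frame, double rangeThreshold) {
        for (int i = 0; i < frame.length; i++) {
            double v = Math.log(frame[i]) + rangeThreshold;
            // Math.log(0) is -Infinity, so empty bins end up at zero as well.
            frame[i] = v > 0 ? v : 0;
        }
    }

    public static void main(String[] args) {
        double[] frame = {1.0, 0.0, 1e-9};
        compress(frame, 10);
        System.out.println(java.util.Arrays.toString(frame));
    }
}
```

        A larger rangeThreshold keeps more of the quiet bins above zero; a smaller one discards them.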
      • rawInputStream

        protected javax.sound.sampled.AudioInputStream rawInputStream
        Input data stream for this performance (possibly in compressed format)
      • reBuffer

        protected double[] reBuffer
        The real part of the data for the in-place FFT computation. Since input data is real, this initially contains the input data.
      • sampleRate

        protected float sampleRate
        Sample rate of audio in audioFormat
      • silenceThreshold

        public static double silenceThreshold
        RMS frame energy below this value results in the frame being set to zero, so that normalisation does not have undesired side-effects.
      • silent

        protected static boolean silent
        Flag for suppressing all standard output messages except results.
      • spectralFlux

        protected double[] spectralFlux
        Spectral flux onset detection function, indexed by frame.
      • totalFrames

        protected int totalFrames
        Total number of audio frames if known, or -1 for live or compressed input.
      • window

        protected double[] window
        The window function for the STFT, currently a Hamming window.
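        For reference, a Hamming window can be generated as in the sketch below. HammingSketch is a hypothetical class, and the textbook coefficients 0.54 and 0.46 with no further scaling are assumed; AudioProcessor may apply an additional scale factor.

```java
class HammingSketch {
    /** Returns a symmetric Hamming window of the given size. */
    static double[] hamming(int size) {
        double[] w = new double[size];
        if (size == 1) {          // avoid 0/0 for a single-sample window
            w[0] = 1.0;
            return w;
        }
        for (int i = 0; i < size; i++)
            w[i] = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (size - 1));
        return w;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(hamming(5)));
    }
}
```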
      • y2Onsets

        protected double[] y2Onsets
        The y-coordinates of the onsets for plotting. Only used if doOnsetPlot is true
    • Constructor Detail

      • AudioProcessor

        public AudioProcessor()
        Constructor: note that streams are not opened until the input file is set (see setInputFile()).
    • Method Detail

      • print

        public void print()
        For debugging, outputs information about the AudioProcessor to standard error.
      • readLine

        public java.lang.String readLine()
        For interactive pause - wait for user to hit Enter
      • toString

        public java.lang.String toString()
        Gives some basic information about the audio being processed.
        Overrides:
        toString in class java.lang.Object
      • setProgressCallback

        public void setProgressCallback(ProgressIndicator c)
        Adds a link to the GUI component which shows the progress of audio processing.
        Parameters:
        c - the GUI component (a ProgressIndicator) which shows the progress of audio processing
      • setLiveInput

        public void setLiveInput()
        Sets up the streams and buffers for live audio input (CD quality). If any Exception is thrown within this method, it is caught, and any opened streams are closed, and pcmInputStream is set to null, indicating that the method did not complete successfully.
      • setInputFile

        public void setInputFile(java.lang.String fileName)
        Sets up the streams and buffers for audio file input. If any Exception is thrown within this method, it is caught, and any opened streams are closed, and pcmInputStream is set to null, indicating that the method did not complete successfully.
        Parameters:
        fileName - The path name of the input audio file.
      • init

        protected void init()
        Allocates memory for arrays, based on parameter settings
      • closeStreams

        public void closeStreams()
        Closes the input stream(s) associated with this object.
      • makeFreqMap

        protected void makeFreqMap(int fftSize,
                                   float sampleRate)
        Creates a map of FFT frequency bins to comparison bins. Where the spacing of FFT bins is greater than 0.5 semitones (at low frequencies), the mapping is one to one. Where the spacing is less than 0.5 semitones, the FFT energy is mapped into semitone-wide bins. No scaling is performed; that is, the energy is summed into the comparison bins. See also processFrame().
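        The part-linear, part-logarithmic mapping can be sketched as follows. FreqMapSketch and its methods are hypothetical; the crossover rule (the last bin whose spacing to its neighbour is at least half a semitone) and the MIDI formula with A4 = 440 Hz are assumptions chosen to reproduce the 44.1 kHz example given for freqMap, not a copy of the actual implementation.

```java
class FreqMapSketch {
    /** MIDI note number of a frequency in Hz, assuming A4 = 440 Hz = note 69. */
    static double midi(double hz) {
        return 12 * Math.log(hz / 440) / Math.log(2) + 69;
    }

    /** Maps each FFT bin 0..fftSize/2 to a comparison bin. */
    static int[] makeFreqMap(int fftSize, float sampleRate) {
        int[] map = new int[fftSize / 2 + 1];
        double binWidth = sampleRate / fftSize;
        // Above this bin there are more than 2 FFT bins per semitone.
        int crossoverBin = (int) (2 / (Math.pow(2, 1 / 12.0) - 1));
        int crossoverMidi = (int) Math.round(midi(crossoverBin * binWidth));
        for (int i = 0; i <= crossoverBin && i < map.length; i++)
            map[i] = i;                                   // linear part
        for (int i = crossoverBin + 1; i < map.length; i++) {
            double note = Math.min(midi(i * binWidth), 127); // clamp at note 127
            map[i] = crossoverBin + (int) Math.round(note) - crossoverMidi;
        }
        return map;
    }

    /** Sums FFT magnitudes into the comparison bins, without scaling. */
    static double[] sumIntoBins(double[] magnitudes, int[] map) {
        double[] out = new double[map[map.length - 1] + 1];
        for (int i = 0; i < map.length; i++) out[map[i]] += magnitudes[i];
        return out;
    }

    public static void main(String[] args) {
        int[] map = makeFreqMap(2048, 44100f);
        System.out.println("comparison bins: " + (map[map.length - 1] + 1));
    }
}
```

        Because several high-frequency FFT bins share one comparison bin and their energy is summed, the comparison bins at the top of the range collect energy from many more FFT bins than those at the bottom.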
      • weightedPhaseDeviation

        protected void weightedPhaseDeviation()
        Calculates the weighted phase deviation onset detection function. Not used. TODO: Test the change to WPD fn
      • getFrame

        public boolean getFrame()
        Reads a frame of input data, averages the channels to mono, scales to a maximum possible absolute value of 1, and stores the audio data in a circular input buffer.
        Returns:
        true if a frame (or part of a frame, if it is the final frame) is read. If a complete frame cannot be read, the InputStream is set to null.
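        The channel averaging, scaling and circular-buffer write can be illustrated with the sketch below. MonoFrameSketch and writeMono are hypothetical names, and the input is assumed to be interleaved 16-bit little-endian signed PCM; the real method reads from pcmInputStream and handles whatever audioFormat describes.

```java
class MonoFrameSketch {
    /**
     * Decodes interleaved 16-bit little-endian signed PCM (an assumed format),
     * averages the channels, scales to a maximum absolute value of 1 and
     * writes the result into circBuffer. Returns the next write index.
     */
    static int writeMono(byte[] pcm, int channels, double[] circBuffer, int cbIndex) {
        int bytesPerFrame = 2 * channels;
        for (int pos = 0; pos + bytesPerFrame <= pcm.length; pos += bytesPerFrame) {
            int sum = 0;
            for (int c = 0; c < channels; c++) {
                int lo = pcm[pos + 2 * c] & 0xFF;  // low byte, treated as unsigned
                int hi = pcm[pos + 2 * c + 1];     // high byte, keeps the sign
                sum += (hi << 8) | lo;
            }
            circBuffer[cbIndex] = sum / (32768.0 * channels);
            cbIndex = (cbIndex + 1) % circBuffer.length; // wrap around
        }
        return cbIndex;
    }

    public static void main(String[] args) {
        byte[] pcm = {0x00, 0x40, 0x00, 0x40};   // one stereo sample frame
        double[] circ = new double[4];
        int next = writeMono(pcm, 2, circ, 0);
        System.out.println(circ[0] + ", next index " + next);
    }
}
```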
      • processFrame

        protected void processFrame()
        Processes a frame of audio data by first computing the STFT with a Hamming window, then mapping the frequency bins into a part-linear part-logarithmic array, then computing the spectral flux, then (optionally) normalising and calculating onsets.
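        The spectral flux step can be sketched as below, using one common definition: the sum of the increases in magnitude from the previous frame to the current one, ignoring decreases. SpectralFluxSketch is a hypothetical class, and this definition is an assumption; the exact formula used by processFrame() is not given in this documentation.

```java
class SpectralFluxSketch {
    /** Sum of the positive differences newFrame[i] - prevFrame[i]. */
    static double flux(double[] prevFrame, double[] newFrame) {
        double sum = 0;
        for (int i = 0; i < newFrame.length; i++) {
            double rise = newFrame[i] - prevFrame[i];
            if (rise > 0) sum += rise;   // half-wave rectification
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] prev = {1, 2, 3};
        double[] cur = {2, 1, 5};
        System.out.println(flux(prev, cur));
    }
}
```

        Ignoring decreases makes the function respond to note onsets (energy appearing) rather than offsets (energy decaying).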
      • processFile

        public void processFile()
        Processes a complete file of audio data.
      • findOnsets

        public void findOnsets(double p1,
                               double p2)
      • getFeatures

        public static double[] getFeatures(java.lang.String fileName)
        Reads a text file containing a list of whitespace-separated feature values. Created for paper submitted to ICASSP'07.
        Parameters:
        fileName - File containing the data
        Returns:
        An array containing the feature values
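        Reading whitespace-separated values might look like the following sketch. FeatureFileSketch and its methods are hypothetical, and UTF-8 encoding is assumed; error handling in the real getFeatures is not documented here.

```java
class FeatureFileSketch {
    /** Parses whitespace-separated numbers (spaces, tabs or newlines). */
    static double[] parseFeatures(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return new double[0];
        String[] tokens = trimmed.split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++)
            values[i] = Double.parseDouble(tokens[i]);
        return values;
    }

    /** Reads the whole file and parses it; UTF-8 is an assumption. */
    static double[] readFeatures(String fileName) throws java.io.IOException {
        byte[] bytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(fileName));
        return parseFeatures(new String(bytes, java.nio.charset.StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(parseFeatures("0.1 0.5\n0.25").length);
    }
}
```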
      • processFeatures

        public void processFeatures(java.lang.String fileName,
                                    double hopTime)
        Reads a file of feature values, treated as an onset detection function, and finds peaks, which are stored in onsetList and onsets.
        Parameters:
        fileName - The file of feature values
        hopTime - The spacing of feature values in time
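        To show how feature indices relate to onset times, the sketch below picks simple local maxima above a threshold and converts each index to seconds using hopTime. PeakPickSketch, peakTimes and the threshold parameter are hypothetical; the peak-picking criteria actually used by processFeatures are not described in this documentation and are likely to be more elaborate.

```java
class PeakPickSketch {
    /** Returns the times (index * hopTime) of local maxima above threshold. */
    static double[] peakTimes(double[] odf, double hopTime, double threshold) {
        java.util.List<Double> times = new java.util.ArrayList<>();
        for (int i = 1; i + 1 < odf.length; i++) {
            boolean localMax = odf[i] > odf[i - 1] && odf[i] >= odf[i + 1];
            if (localMax && odf[i] > threshold) times.add(i * hopTime);
        }
        double[] out = new double[times.size()];
        for (int i = 0; i < out.length; i++) out[i] = times.get(i);
        return out;
    }

    public static void main(String[] args) {
        double[] odf = {0, 1, 0, 2, 0};
        System.out.println(java.util.Arrays.toString(peakTimes(odf, 0.5, 0.5)));
    }
}
```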
      • setDisplay

        public void setDisplay(BeatTrackDisplay btd)
        Copies output of audio processing to the display panel.