Package at.ofai.music.beatroot
Class AudioProcessor
java.lang.Object
    at.ofai.music.beatroot.AudioProcessor
public class AudioProcessor extends java.lang.Object
Audio processing class (adapted from PerformanceMatcher).
-
Field Summary
Modifier and Type | Field | Description
protected java.lang.String | audioFileName | Source of input data.
protected javax.sound.sampled.AudioFormat | audioFormat | Format of the audio data in pcmInputStream.
protected javax.sound.sampled.SourceDataLine | audioOut | Line for audio output (not used, since output is done by AudioPlayer).
static boolean | batchMode | Flag for batch mode.
protected int | cbIndex | The index of the next position to write in the circular buffer.
protected int | channels | Number of channels of audio in audioFormat.
protected double[] | circBuffer | Audio data, scaled to the range [-1,1] and averaged to one channel, stored in a circular buffer for reuse (if hopTime < fftTime).
static boolean | debug | Flag for enabling or disabling debugging output.
static boolean | doOnsetPlot | Flag for plotting the onset detection function.
protected double[] | energy | The RMS energy of all frames.
static int | energyOversampleFactor | Ratio between the rate of sampling the signal energy (for the amplitude envelope) and the hop size.
protected int | fftSize | The size of an FFT frame in samples (see fftTime).
protected double | fftTime | The approximate size of an FFT frame in seconds, as set by the command line option -f FFTTime.
protected int | frameCount | The number of overlapping frames of audio data which have been read.
protected double | frameRMS | RMS amplitude of the current frame.
protected double[][] | frames | The magnitude spectra of all frames, used for plotting the spectrogram.
protected int[] | freqMap | A mapping function for mapping FFT bins to final frequency bins.
protected int | freqMapSize | The number of entries in freqMap.
protected int | hopSize | Spacing of audio frames in samples (see hopTime).
protected double | hopTime | Spacing of audio frames (determines the amount of overlap or skip between frames).
protected double[] | imBuffer | The imaginary part of the data for the in-place FFT computation.
protected byte[] | inputBuffer | Audio data is initially read in PCM format into this buffer.
static int | liveInputBufferSize | Size of the audio buffer for live input.
protected double | ltAverage | Long-term average frame energy (in frequency domain representation).
static int | MAX_LENGTH | Maximum file length in seconds.
protected double[] | newFrame | The magnitude spectrum of the current frame.
static int | normaliseMode | Determines the method of normalisation.
protected EventList | onsetList | The estimated onset times and their saliences.
protected double[] | onsets | The estimated onset times from peak-picking the onset detection function(s).
protected javax.sound.sampled.AudioInputStream | pcmInputStream | Uncompressed version of rawInputStream.
protected double[] | phaseDeviation | Phase deviation onset detection function, indexed by frame.
protected double[] | prevFrame | The magnitude spectrum of the previous frame.
protected double[] | prevPhase | Phase of the previous frame, for calculating an onset function based on spectral phase deviation.
protected double[] | prevPrevPhase | Phase of the frame before the previous frame, for calculating an onset function based on spectral phase deviation.
protected ProgressIndicator | progressCallback | GUI component which shows the progress of audio processing.
static double | rangeThreshold | For dynamic range compression, this value is added to the log magnitude in each frequency bin and any remaining negative values are then set to zero.
protected javax.sound.sampled.AudioInputStream | rawInputStream | Input data stream for this performance (possibly in compressed format).
protected double[] | reBuffer | The real part of the data for the in-place FFT computation.
protected float | sampleRate | Sample rate of audio in audioFormat.
static double | silenceThreshold | RMS frame energy below this value results in the frame being set to zero, so that normalisation does not have undesired side-effects.
protected static boolean | silent | Flag for suppressing all standard output messages except results.
protected double[] | spectralFlux | Spectral flux onset detection function, indexed by frame.
protected int | totalFrames | Total number of audio frames if known, or -1 for live or compressed input.
protected double[] | window | The window function for the STFT, currently a Hamming window.
protected double[] | y2Onsets | The y-coordinates of the onsets for plotting.
-
Constructor Summary
Constructor | Description
AudioProcessor() | Constructor: note that streams are not opened until the input file is set (see setInputFile()).
-
Method Summary
Modifier and Type | Method | Description
void | closeStreams() | Closes the input stream(s) associated with this object.
void | findOnsets(double p1, double p2) | (no description)
static double[] | getFeatures(java.lang.String fileName) | Reads a text file containing a list of whitespace-separated feature values.
boolean | getFrame() | Reads a frame of input data, averages the channels to mono, scales to a maximum possible absolute value of 1, and stores the audio data in a circular input buffer.
protected void | init() | Allocates memory for arrays, based on parameter settings.
protected void | makeFreqMap(int fftSize, float sampleRate) | Creates a map of FFT frequency bins to comparison bins.
void | print() | For debugging, outputs information about the AudioProcessor to standard error.
void | processFeatures(java.lang.String fileName, double hopTime) | Reads a file of feature values, treated as an onset detection function, and finds peaks, which are stored in onsetList and onsets.
void | processFile() | Processes a complete file of audio data.
protected void | processFrame() | Processes a frame of audio data by first computing the STFT with a Hamming window, then mapping the frequency bins into a part-linear, part-logarithmic array, then computing the spectral flux, and finally (optionally) normalising and calculating onsets.
java.lang.String | readLine() | For an interactive pause: waits for the user to press Enter.
void | setDisplay(BeatTrackDisplay btd) | Copies the output of audio processing to the display panel.
void | setInputFile(java.lang.String fileName) | Sets up the streams and buffers for audio file input.
void | setLiveInput() | Sets up the streams and buffers for live audio input (CD quality).
void | setProgressCallback(ProgressIndicator c) | Adds a link to the GUI component which shows the progress of audio processing.
java.lang.String | toString() | Gives some basic information about the audio being processed.
protected void | weightedPhaseDeviation() | Calculates the weighted phase deviation onset detection function.
-
Field Detail
-
audioFileName
protected java.lang.String audioFileName
Source of input data. Could be extended to include live input from the sound card.
-
audioFormat
protected javax.sound.sampled.AudioFormat audioFormat
Format of the audio data in pcmInputStream.
-
audioOut
protected javax.sound.sampled.SourceDataLine audioOut
Line for audio output (not used, since output is done by AudioPlayer).
-
batchMode
public static boolean batchMode
Flag for batch mode.
-
cbIndex
protected int cbIndex
The index of the next position to write in the circular buffer.
-
channels
protected int channels
Number of channels of audio in audioFormat.
-
circBuffer
protected double[] circBuffer
Audio data, scaled to the range [-1,1] and averaged to one channel, is stored in a circular buffer for reuse (if hopTime < fftTime).
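The reuse idea can be sketched as follows. This is a hypothetical illustration, not BeatRoot code: the class and method names are invented, and it assumes the buffer length equals fftSize and that each write supplies one hop of mono samples. Because cbIndex always points at the next slot to overwrite, once the buffer is full it also points at the oldest sample, so a frame is read out starting there.

```java
/**
 * Hypothetical illustration (not BeatRoot code) of a circular buffer that
 * always holds the most recent fftSize mono samples.
 */
final class CircularFrameBuffer {
    private final double[] circBuffer;
    private int cbIndex = 0; // next position to write; also the oldest sample once full

    CircularFrameBuffer(int fftSize) {
        circBuffer = new double[fftSize];
    }

    /** Appends one hop of new samples, overwriting the oldest ones. */
    void write(double[] hop) {
        for (double sample : hop) {
            circBuffer[cbIndex] = sample;
            cbIndex = (cbIndex + 1) % circBuffer.length;
        }
    }

    /** Copies out the current frame in chronological order (oldest sample first). */
    double[] frame() {
        int n = circBuffer.length;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = circBuffer[(cbIndex + i) % n];
        }
        return out;
    }
}
```

With hopSize = fftSize / 2, for example, each write adds half a frame and every sample appears in two consecutive FFT frames without being read from the stream twice.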
-
debug
public static boolean debug
Flag for enabling or disabling debugging output.
-
doOnsetPlot
public static boolean doOnsetPlot
Flag for plotting the onset detection function.
-
energy
protected double[] energy
The RMS energy of all frames.
-
energyOversampleFactor
public static int energyOversampleFactor
Ratio between the rate of sampling the signal energy (for the amplitude envelope) and the hop size.
-
fftSize
protected int fftSize
The size of an FFT frame in samples (see fftTime).
-
fftTime
protected double fftTime
The approximate size of an FFT frame in seconds, as set by the command line option -f FFTTime (default = 0.04644 s). The value is adjusted so that fftSize is always a power of 2.
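At 44.1 kHz the default of 0.04644 s corresponds to 0.04644 × 44100 ≈ 2048 samples, which is already a power of 2. The adjustment for other values can be sketched as below; the helper name is hypothetical, and rounding to the nearest power of 2 in the log domain is an assumption about how BeatRoot adjusts the size.

```java
/** Hypothetical helper (not BeatRoot code): converts fftTime in seconds to a power-of-2 FFT size. */
final class FrameSizes {
    static int fftSize(double fftTime, float sampleRate) {
        double samples = fftTime * sampleRate;                              // e.g. 0.04644 * 44100 ≈ 2048.0
        int exponent = (int) Math.round(Math.log(samples) / Math.log(2));  // nearest power of 2 (log domain)
        return 1 << exponent;
    }
}
```

For instance, fftTime = 0.02 s at 44.1 kHz gives 882 samples, which this rule rounds up to 1024.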
-
frameCount
protected int frameCount
The number of overlapping frames of audio data which have been read.
-
frameRMS
protected double frameRMS
RMS amplitude of the current frame.
-
frames
protected double[][] frames
The magnitude spectra of all frames, used for plotting the spectrogram.
-
freqMap
protected int[] freqMap
A mapping function for mapping FFT bins to final frequency bins. The mapping is linear (1-1) until the resolution reaches 2 points per semitone, then logarithmic with a semitone resolution. For example, for a 44.1 kHz sampling rate and an fftSize of 2048 (46 ms), the bin spacing is 21.5 Hz; bins 0-34 (0 to 732 Hz) are mapped linearly, and the remaining bins are mapped logarithmically (MIDI notes 79 to 127, output bins 35 to 83), with all energy above note 127 mapped into the final bin.
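A sketch of how such a map can be built, under stated assumptions. At FFT bin k the width of a semitone is roughly k·Δf·(2^(1/12) - 1), so two FFT bins per semitone requires k ≥ 2/(2^(1/12) - 1) ≈ 33.6, independent of the sample rate. The sketch keeps bins up to that crossover one-to-one and maps higher bins by rounded MIDI note, clamped at 127. The class and method names are hypothetical, and this is not claimed to be BeatRoot's own makeFreqMap. For 44.1 kHz and fftSize 2048 it maps bins 0-34 to themselves and the top FFT bin to output bin 83, matching the example above.

```java
/** Hypothetical sketch of a part-linear, part-logarithmic FFT bin map. */
final class FreqMapSketch {
    /** Fractional MIDI note number of a frequency in Hz (A4 = 440 Hz = note 69). */
    static double midi(double freq) {
        return 69 + 12 * Math.log(freq / 440.0) / Math.log(2);
    }

    static int[] makeFreqMap(int fftSize, float sampleRate) {
        int[] map = new int[fftSize / 2 + 1];
        double binWidth = (double) sampleRate / fftSize;
        // Bins above k = 2 / (2^(1/12) - 1) ≈ 33.6 are at most half a semitone apart
        int crossoverBin = (int) (2 / (Math.pow(2, 1 / 12.0) - 1));
        int crossoverMidi = (int) Math.round(midi(crossoverBin * binWidth));
        for (int i = 0; i < map.length; i++) {
            if (i <= crossoverBin) {
                map[i] = i; // linear region: one output bin per FFT bin
            } else {
                // logarithmic region: one output bin per semitone, clamped at MIDI note 127
                double note = Math.min(midi(i * binWidth), 127);
                map[i] = crossoverBin + (int) Math.round(note) - crossoverMidi;
            }
        }
        return map;
    }
}
```

Energy from all FFT bins sharing an output index would then be summed into that comparison bin, and freqMapSize is one more than the largest output index.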
-
freqMapSize
protected int freqMapSize
The number of entries in freqMap. Note that the length of the array is greater, because its size is not known at creation time.
-
hopSize
protected int hopSize
Spacing of audio frames in samples (see hopTime).
-
hopTime
protected double hopTime
Spacing of audio frames (determines the amount of overlap or skip between frames). This value is expressed in seconds and can be set by the command line option -h hopTime (default = 0.020 s).
-
imBuffer
protected double[] imBuffer
The imaginary part of the data for the in-place FFT computation. Since input data is real, this initially contains zeros.
-
inputBuffer
protected byte[] inputBuffer
Audio data is initially read in PCM format into this buffer.
-
liveInputBufferSize
public static final int liveInputBufferSize
Size of the audio buffer for live input (not used yet).
See Also: Constant Field Values
-
ltAverage
protected double ltAverage
Long-term average frame energy (in frequency domain representation).
-
MAX_LENGTH
public static final int MAX_LENGTH
Maximum file length in seconds. Used for static allocation of arrays.
See Also: Constant Field Values
-
newFrame
protected double[] newFrame
The magnitude spectrum of the current frame.
-
normaliseMode
public static int normaliseMode
Determines the method of normalisation. Values can be:
- 0: no normalisation
- 1: normalisation by current frame energy
- 2: normalisation by exponential average of frame energy
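The three modes can be sketched for a single onset-function value as follows. This assumes that normalisation divides the spectral flux by an energy estimate; the class name and the smoothing constant `decay` used for mode 2 are hypothetical, and BeatRoot's actual constants are not stated here.

```java
/** Hypothetical sketch of normaliseMode 0, 1 and 2 applied to one spectral flux value. */
final class FluxNormaliser {
    private final int normaliseMode;
    private final double decay;     // hypothetical smoothing constant for mode 2, in [0, 1)
    private double ltAverage = 0;   // long-term exponential average of frame energy
    private boolean first = true;

    FluxNormaliser(int normaliseMode, double decay) {
        this.normaliseMode = normaliseMode;
        this.decay = decay;
    }

    double normalise(double flux, double frameEnergy) {
        switch (normaliseMode) {
            case 1: // divide by the current frame energy
                return frameEnergy > 0 ? flux / frameEnergy : 0;
            case 2: // divide by an exponential moving average of frame energy
                ltAverage = first ? frameEnergy : decay * ltAverage + (1 - decay) * frameEnergy;
                first = false;
                return ltAverage > 0 ? flux / ltAverage : 0;
            default: // mode 0: no normalisation
                return flux;
        }
    }
}
```

Mode 2 reacts more slowly than mode 1, so a sudden loud note still produces a large normalised value instead of being divided by its own energy.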
-
onsetList
protected EventList onsetList
The estimated onset times and their saliences.
-
onsets
protected double[] onsets
The estimated onset times from peak-picking the onset detection function(s).
-
pcmInputStream
protected javax.sound.sampled.AudioInputStream pcmInputStream
Uncompressed version of rawInputStream. In the (normal) case where the input is already PCM data, rawInputStream == pcmInputStream.
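The relationship between the two streams can be sketched with the standard javax.sound.sampled API. The helper name `toPcm` is hypothetical, and treating only `PCM_SIGNED` input as already decoded is an assumption; decoding compressed formats also depends on which codecs are installed.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

/** Hypothetical helper: obtains a PCM stream from a possibly compressed raw stream. */
final class PcmConversion {
    static AudioInputStream toPcm(AudioInputStream raw) {
        AudioFormat.Encoding encoding = raw.getFormat().getEncoding();
        if (AudioFormat.Encoding.PCM_SIGNED.equals(encoding)) {
            return raw; // normal case: the raw stream is used directly
        }
        // Ask the installed codecs for a signed-PCM version; throws IllegalArgumentException if unsupported
        return AudioSystem.getAudioInputStream(AudioFormat.Encoding.PCM_SIGNED, raw);
    }
}
```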
-
phaseDeviation
protected double[] phaseDeviation
Phase deviation onset detection function, indexed by frame.
-
prevFrame
protected double[] prevFrame
The magnitude spectrum of the previous frame. Used for calculating the spectral flux.
-
prevPhase
protected double[] prevPhase
Phase of the previous frame, for calculating an onset function based on spectral phase deviation.
-
prevPrevPhase
protected double[] prevPrevPhase
Phase of the frame before the previous frame, for calculating an onset function based on spectral phase deviation.
-
progressCallback
protected ProgressIndicator progressCallback
GUI component which shows progress of audio processing.
-
rangeThreshold
public static double rangeThreshold
For dynamic range compression, this value is added to the log magnitude in each frequency bin and any remaining negative values are then set to zero.
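A minimal sketch of this compression step on one magnitude spectrum. The class name is hypothetical, and the use of the natural logarithm is an assumption (the base is not stated here).

```java
/** Hypothetical sketch: log compression with a floor, applied in place to a magnitude spectrum. */
final class RangeCompression {
    static void compress(double[] magnitude, double rangeThreshold) {
        for (int i = 0; i < magnitude.length; i++) {
            // Math.log(0) is -Infinity, which is clipped to zero below
            double v = Math.log(magnitude[i]) + rangeThreshold;
            magnitude[i] = v > 0 ? v : 0;
        }
    }
}
```

With a natural logarithm and rangeThreshold = 2, any magnitude below e^-2 ≈ 0.135 becomes zero, which limits the dynamic range passed on to the spectral flux.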
-
rawInputStream
protected javax.sound.sampled.AudioInputStream rawInputStream
Input data stream for this performance (possibly in compressed format).
-
reBuffer
protected double[] reBuffer
The real part of the data for the in-place FFT computation. Since input data is real, this initially contains the input data.
-
sampleRate
protected float sampleRate
Sample rate of audio in audioFormat.
-
silenceThreshold
public static double silenceThreshold
RMS frame energy below this value results in the frame being set to zero, so that normalisation does not have undesired side-effects.
-
silent
protected static boolean silent
Flag for suppressing all standard output messages except results.
-
spectralFlux
protected double[] spectralFlux
Spectral flux onset detection function, indexed by frame.
-
totalFrames
protected int totalFrames
Total number of audio frames if known, or -1 for live or compressed input.
-
window
protected double[] window
The window function for the STFT, currently a Hamming window.
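For reference, the textbook symmetric Hamming window is w[n] = 0.54 - 0.46·cos(2πn / (N - 1)) for n = 0 … N - 1. Whether BeatRoot uses this symmetric form or the periodic variant (dividing by N) is not stated here; the class name below is hypothetical.

```java
/** Hypothetical helper: symmetric Hamming window of the given length. */
final class HammingWindow {
    static double[] hamming(int size) {
        double[] w = new double[size];
        if (size == 1) {
            w[0] = 1.0; // avoid dividing by zero for a single-point window
            return w;
        }
        for (int n = 0; n < size; n++) {
            w[n] = 0.54 - 0.46 * Math.cos(2 * Math.PI * n / (size - 1));
        }
        return w;
    }
}
```

Each frame is multiplied sample by sample by this window before the FFT, which reduces spectral leakage compared with a rectangular window.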
-
y2Onsets
protected double[] y2Onsets
The y-coordinates of the onsets for plotting. Only used if doOnsetPlot is true.
-
Constructor Detail
-
AudioProcessor
public AudioProcessor()
Constructor: note that streams are not opened until the input file is set (see setInputFile()).
-
Method Detail
-
print
public void print()
For debugging, outputs information about the AudioProcessor to standard error.
-
readLine
public java.lang.String readLine()
For an interactive pause: waits for the user to press Enter.
-
toString
public java.lang.String toString()
Gives some basic information about the audio being processed.
Overrides: toString in class java.lang.Object
-
setProgressCallback
public void setProgressCallback(ProgressIndicator c)
Adds a link to the GUI component which shows the progress of audio processing.
Parameters:
c - the GUI component which displays the progress of audio processing
-
setLiveInput
public void setLiveInput()
Sets up the streams and buffers for live audio input (CD quality). If any Exception is thrown within this method, it is caught, any opened streams are closed, and pcmInputStream is set to null, indicating that the method did not complete successfully.
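"CD quality" conventionally means 44.1 kHz, 16-bit, stereo, signed PCM. A format with those properties can be described with the standard AudioFormat constructor as sketched below; the class name is hypothetical and the little-endian byte order is an assumption.

```java
import javax.sound.sampled.AudioFormat;

/** Hypothetical helper: a CD-quality PCM format description. */
final class LiveInputFormat {
    static AudioFormat cdQuality() {
        // sample rate, bits per sample, channels, signed, bigEndian (false = little-endian, assumed)
        return new AudioFormat(44100f, 16, 2, true, false);
    }
}
```

A capture line for such a format can then be requested with AudioSystem.getTargetDataLine(format); each sample frame occupies 4 bytes (2 bytes × 2 channels).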
-
setInputFile
public void setInputFile(java.lang.String fileName)
Sets up the streams and buffers for audio file input. If any Exception is thrown within this method, it is caught, any opened streams are closed, and pcmInputStream is set to null, indicating that the method did not complete successfully.
Parameters:
fileName - The path name of the input audio file.
-
init
protected void init()
Allocates memory for arrays, based on parameter settings.
-
closeStreams
public void closeStreams()
Closes the input stream(s) associated with this object.
-
makeFreqMap
protected void makeFreqMap(int fftSize, float sampleRate)
Creates a map of FFT frequency bins to comparison bins. Where the spacing of FFT bins is greater than 0.5 semitones (at low frequencies), the mapping is one to one. Where the spacing is less than 0.5 semitones, the FFT energy is mapped into semitone-wide bins. No scaling is performed; that is, the energy is summed into the comparison bins. See also processFrame().
-
weightedPhaseDeviation
protected void weightedPhaseDeviation()
Calculates the weighted phase deviation onset detection function. Not currently used. TODO: test the change to the WPD function.
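One common formulation of weighted phase deviation uses the second difference of phase per bin, φ(n) - 2φ(n-1) + φ(n-2), wrapped into [-π, π], weighted by the bin magnitude and averaged over bins; this is what the fields prevPhase and prevPrevPhase make possible. The sketch below follows that formulation. The class name is hypothetical, and the averaging (division by the number of bins) is an assumption about scaling.

```java
/** Hypothetical sketch of a weighted phase deviation value for one frame. */
final class PhaseDeviation {
    /** Wraps an angle into the range [-pi, pi] (the "princarg" operator). */
    static double princarg(double phi) {
        return phi - 2 * Math.PI * Math.round(phi / (2 * Math.PI));
    }

    /** Mean over bins of |X_k| * |princarg(phase - 2 * prevPhase + prevPrevPhase)|. */
    static double weightedPhaseDeviation(double[] mag, double[] phase,
                                         double[] prevPhase, double[] prevPrevPhase) {
        double sum = 0;
        for (int k = 0; k < mag.length; k++) {
            // A steady sinusoid advances its phase by a constant step, so this is near zero
            double dd = princarg(phase[k] - 2 * prevPhase[k] + prevPrevPhase[k]);
            sum += mag[k] * Math.abs(dd);
        }
        return sum / mag.length;
    }
}
```

Weighting by magnitude keeps noisy phase values in near-silent bins from dominating the detection function.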
-
getFrame
public boolean getFrame()
Reads a frame of input data, averages the channels to mono, scales to a maximum possible absolute value of 1, and stores the audio data in a circular input buffer.
Returns:
true if a frame (or part of a frame, if it is the final frame) is read. If a complete frame cannot be read, the InputStream is set to null.
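The conversion step can be sketched for one common input format. This assumes interleaved 16-bit signed little-endian PCM; the class and method names are hypothetical, and other sample formats would need different decoding.

```java
/** Hypothetical sketch: interleaved 16-bit signed little-endian PCM bytes to mono samples. */
final class PcmToMono {
    static double[] toMono(byte[] pcm, int channels) {
        int frames = pcm.length / (2 * channels);
        double[] mono = new double[frames];
        for (int f = 0; f < frames; f++) {
            double sum = 0;
            for (int c = 0; c < channels; c++) {
                int i = 2 * (f * channels + c);
                // High byte keeps its sign; low byte is treated as unsigned
                int sample = (pcm[i + 1] << 8) | (pcm[i] & 0xff);
                sum += sample / 32768.0; // scale into [-1, 1)
            }
            mono[f] = sum / channels;    // average the channels
        }
        return mono;
    }
}
```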
-
processFrame
protected void processFrame()
Processes a frame of audio data by first computing the STFT with a Hamming window, then mapping the frequency bins into a part-linear, part-logarithmic array, then computing the spectral flux, and finally (optionally) normalising and calculating onsets.
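Spectral flux is commonly defined as the sum over frequency bins of the positive (half-wave rectified) increases in magnitude from one frame to the next. The sketch below applies that definition to two mapped magnitude spectra such as newFrame and prevFrame. The class name is hypothetical, and whether BeatRoot rectifies exactly this way is an assumption.

```java
/** Hypothetical sketch of half-wave rectified spectral flux between two frames. */
final class SpectralFlux {
    static double flux(double[] newFrame, double[] prevFrame) {
        double sum = 0;
        for (int i = 0; i < newFrame.length; i++) {
            double d = newFrame[i] - prevFrame[i];
            if (d > 0) {
                sum += d; // only energy increases count towards an onset
            }
        }
        return sum;
    }
}
```

Ignoring decreases makes the function respond to note onsets rather than to note releases.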
-
processFile
public void processFile()
Processes a complete file of audio data.
-
findOnsets
public void findOnsets(double p1, double p2)
-
getFeatures
public static double[] getFeatures(java.lang.String fileName)
Reads a text file containing a list of whitespace-separated feature values. Created for a paper submitted to ICASSP'07.
Parameters:
fileName - File containing the data
Returns:
An array containing the feature values
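A minimal sketch of reading such a file with java.nio (Java 11 or later). The class name is hypothetical, and unlike the documented method this sketch takes a Path and propagates IOException instead of handling it internally.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: reads whitespace-separated numbers from a text file. */
final class FeatureReader {
    static double[] getFeatures(Path file) throws IOException {
        String text = Files.readString(file).trim();
        if (text.isEmpty()) {
            return new double[0];
        }
        String[] tokens = text.split("\\s+"); // spaces, tabs and newlines all separate values
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }
}
```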
-
processFeatures
public void processFeatures(java.lang.String fileName, double hopTime)
Reads a file of feature values, treated as an onset detection function, and finds peaks, which are stored in onsetList and onsets.
Parameters:
fileName - The file of feature values
hopTime - The spacing of feature values in time
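The conversion from peak positions to onset times can be sketched with a deliberately simple peak picker: a value is a peak if it is at least a threshold and strictly greater than both neighbours, and its time is its index multiplied by hopTime. This is a hypothetical illustration only; it does not reproduce findOnsets(p1, p2), whose parameters are not documented here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical peak picker for an onset detection function sampled every hopTime seconds. */
final class FeaturePeaks {
    static List<Double> onsetTimes(double[] features, double hopTime, double threshold) {
        List<Double> times = new ArrayList<>();
        for (int i = 1; i + 1 < features.length; i++) {
            double v = features[i];
            // Local maximum that also clears the threshold
            if (v >= threshold && v > features[i - 1] && v > features[i + 1]) {
                times.add(i * hopTime);
            }
        }
        return times;
    }
}
```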
-
setDisplay
public void setDisplay(BeatTrackDisplay btd)
Copies the output of audio processing to the display panel.