function [f,p,m,fe] = mirsegment(x,varargin)
%   f = mirsegment(a) segments an audio signal. a can also be the name of an
%       audio file or 'Folder', for the analysis of the audio files in the
%       current folder. The segmentation of an audio signal already decomposed
%       into frames is not available for the moment.
%   f = mirsegment(...,'Novelty') segments using a self-similarity matrix
%           (Foote & Cooper, 2003) (by default)
%       f = mirsegment(...,feature) bases the segmentation strategy on a
%           specific feature.
%           'Spectrum': from FFT spectrum (by default)
%           'MFCC': from MFCCs
%           'Keystrength': from the key strength profile
%           'AutocorPitch': from the autocorrelation function computed as
%               for pitch extraction.
%           The options related to this feature extraction can be specified.
%           Example: mirsegment(...,'Spectrum','Window','bartlett')
%                    mirsegment(...,'MFCC','Rank',1:10)
%                    mirsegment(...,'Keystrength','Weight',.5)
%           These features need to be frame-based, in order to appreciate
%           their temporal evolution. Therefore, the audio signal x is first
%           decomposed into frames. This decomposition can be controlled
%           using the 'Frame' keyword.
%       The options available for the chosen strategies can be specified
%           directly as options of the segment function.
%           Example: mirsegment(a,'Novelty','KernelSize',10)
%   f = mirsegment(...,'HCDF') segments using the Harmonic Change Detection
%           Function (Harte & Sandler, 2006)
%   f = mirsegment(...,'RMS') segments at positions of long silences. A
%       frame-decomposed RMS is computed using mirrms (with default
%       options), and segments are selected from temporal positions
%       where the RMS rises to a given 'On' threshold, until temporal
%       positions where the RMS drops back to a given 'Off' threshold.
%       f = mirsegment(...,'Off',t1) specifies the RMS 'Off' threshold.
%           Default value: t1 = .01
%       f = mirsegment(...,'On',t2) specifies the RMS 'On' threshold.
%           Default value: t2 = .02
%
%   f = mirsegment(a,s) segments a using the results of a segmentation
%       analysis s. s can be, for instance, the peaks detected on an
%       analysis of the audio.
%
%   f = mirsegment(a,v), where v is an array of numbers, segments a using
%       the temporal positions specified in v (in seconds).
%
%   Foote, J. & Cooper, M. (2003). Media Segmentation using Self-Similarity
%       Decomposition. In Proc. SPIE Storage and Retrieval for Multimedia
%       Databases, Vol. 5021, pp. 167-75.
%   Harte, C. A. & Sandler, M. B. (2006). Detecting harmonic change in
%       musical audio. In Proceedings of Audio and Music Computing for
%       Multimedia Workshop, Santa Barbara, CA.


%   [f,p] = mirsegment(...) also displays the analysis produced by the chosen
%   strategy.
%       For 'Novelty', p is the novelty curve.
%       For 'HCDF', p is the Harmonic Change Detection Function.
%   [f,p,m] = mirsegment(...) also displays the preliminary analysis
%   undertaken in the chosen strategy.
%       For 'Novelty', m is the similarity matrix.
%       For 'HCDF', m is the tonal centroid.
%   [f,p,m,fe] = mirsegment(...) also displays the temporal evolution of the
%   feature used for the analysis.

%   f = mirsegment(...,'Novelty')

    mfc.key = {'Rank','MFCC'};
    mfc.type = 'Integers';
    mfc.default = 0;
    mfc.keydefault = 1:13;
option.mfc = mfc;

    K.key = 'KernelSize';
    K.type = 'Integer';
    K.default = 128;
option.K = K;

    distance.key = 'Distance';
    distance.type = 'String';
    distance.default = 'cosine';
option.distance = distance;

    measure.key = {'Measure','Similarity'};
    measure.type = 'String';
    measure.default = 'exponential';
option.measure = measure;

    tot.key = 'Total';
    tot.type = 'Integer';
    tot.default = Inf;
option.tot = tot;

    cthr.key = 'Contrast';
    cthr.type = 'Integer';
    cthr.default = .1;
option.cthr = cthr;

    frame.key = 'Frame';
    frame.type = 'Integer';
    frame.number = 2;
    frame.default = [0 0];
    frame.keydefault = [3 .1];
option.frame = frame;

    ana.type = 'String';
    ana.choice = {'Spectrum','Keystrength','AutocorPitch','Pitch'};
    ana.default = 0;
option.ana = ana;

%   f = mirsegment(...,'Spectrum')

    band.choice = {'Mel','Bark','Freq'};
    band.type = 'String';
    band.default = 'Freq';
option.band = band;

    mi.key = 'Min';
    mi.type = 'Integer';
    mi.default = 0;
option.mi = mi;

    ma.key = 'Max';
    ma.type = 'Integer';
    ma.default = 0;
option.ma = ma;

    norm.key = 'Normal';
    norm.type = 'Boolean';
    norm.default = 0;
option.norm = norm;

    win.key = 'Window';
    win.type = 'String';
    win.default = 'hamming';
option.win = win;

%   f = mirsegment(...,'RMS')

    throff.key = 'Off';
    throff.type = 'Integer';
    throff.default = .01;
option.throff = throff;

    thron.key = 'On';
    thron.type = 'Integer';
    thron.default = .02;
option.thron = thron;

    strat.choice = {'Novelty','HCDF','RMS'}; % should remain as last field
    strat.default = 'Novelty';
    strat.position = 2;
option.strat = strat;

specif.option = option;


p = {};
m = {};
fe = {};

if isa(x,'mirdesign')
    if not(get(x,'Eval'))
        % During bottom-up construction of the general design

        [unused option] = miroptions(@mirframe,x,specif,varargin);
        type = get(x,'Type');
        f = mirdesign(@mirsegment,x,option,{},struct,type);

        sg = get(x,'Segment');
        if not(isempty(sg))
            f = set(f,'Segment',sg);
        else
            f = set(f,'Segment',option.strat);
        end

    else
        % During top-down evaluation initiation

        f = evaleach(x);
        if iscell(f)
            f = f{1};
        end
        p = x;
    end
elseif isa(x,'mirdata')
    [unused option] = miroptions(@mirframe,x,specif,varargin);
    if ischar(option.strat)
        dx = get(x,'Data');
        if size(dx{1},2) > 1
            error('ERROR IN MIRSEGMENT: The segmentation of audio signal already decomposed into frames is not available for the moment.');
        end
        if strcmpi(option.strat,'Novelty')
            if not(option.frame.length.val)
                if strcmpi(option.ana,'Keystrength')
                    option.frame.length.val = .5;
                    option.frame.hop.val = .2;
                elseif strcmpi(option.ana,'AutocorPitch') ...
                        || strcmpi(option.ana,'Pitch')
                    option.frame.length.val = .05;
                    option.frame.hop.val = .01;
                else
                    option.frame.length.val = .05;
                    option.frame.hop.val = 1;
                end
            end
            fr = mirframenow(x,option);
            if not(isequal(option.mfc,0))
                fe = mirmfcc(fr,'Rank',option.mfc);
            elseif strcmpi(option.ana,'Spectrum')
                fe = mirspectrum(fr,'Min',option.mi,'Max',option.ma,...
                                    'Normal',option.norm,option.band,...
                                    'Window',option.win);
            elseif strcmpi(option.ana,'Keystrength')
                fe = mirkeystrength(fr);
            elseif strcmpi(option.ana,'AutocorPitch') ...
                    || strcmpi(option.ana,'Pitch')
                [unused,fe] = mirpitch(x,'Frame');
            else
                fe = fr;
            end
            [n m] = mirnovelty(fe,'Distance',option.distance,...
                                  'Measure',option.measure,...
                                  'KernelSize',option.K);
            p = mirpeaks(n,'Total',option.tot,...
                           'Contrast',option.cthr,...
                           'Chrono','NoBegin','NoEnd');
        elseif strcmpi(option.strat,'HCDF')
            if not(option.frame.length.val)
                option.frame.length.val = .743;
                option.frame.hop.val = 1/8;
            end
            fr = mirframenow(x,option);
            %[df m fe] = mirhcdf(fr);
            df = mirhcdf(fr);
            p = mirpeaks(df);
        elseif strcmpi(option.strat,'RMS')
            if not(option.frame.length.val)
                option.frame.length.val = .05;
                option.frame.hop.val = .5;
            end
            fr = mirframenow(x,option);
            %[df m fe] = mirhcdf(fr);
            df = mirrms(fr);
            fp = get(df,'FramePos');
            p = mircompute(@findsilence,df,fp,option.throff,option.thron);
        end
        f = mirsegment(x,p);
    else
        dx = get(x,'Data');
        dt = get(x,'Time');

        if isa(option.strat,'mirscalar')
            ds = get(option.strat,'PeakPos');
            fp = get(option.strat,'FramePos');
        elseif isa(option.strat,'mirdata')
            ds = get(option.strat,'AttackPos');
            if isempty(ds) || isempty(ds{1})
                ds = get(option.strat,'PeakPos');
            end
            xx = get(option.strat,'Pos');
        else
            ds = option.strat;
            fp = cell(1,length(dx));
        end
        st = cell(1,length(dx));
        sx = cell(1,length(dx));
        cl = cell(1,length(dx));
        for k = 1:length(dx)
            dxk = dx{k}{1}; % values in kth audio file
            dtk = dt{k}{1}; % time positions in kth audio file
            if isa(option.strat,'mirdata')
                dsk = ds{k}{1}; % segmentation times in kth audio file
            else
                dsk = {ds};
            end
            fsk = [];       % the structured array of segmentation times
                            % needs to be flattened
            for j = 1:length(dsk)
                if isa(option.strat,'mirdata')
                    dsj = dsk{j}; % segmentation times in jth segment
                else
                    dsj = ds;
                end
                if not(iscell(dsj))
                    dsj = {dsj};
                end
                for m = 1:length(dsj)
                    % segmentation times in mth bank channel
                    if isa(option.strat,'mirscalar')
                        dsm = fp{k}{m}(1,dsj{m});
                    elseif isa(option.strat,'mirdata')
                        dsm = xx{k}{m}(dsj{m});
                    else
                        dsm = dsj{m};
                    end
                    if iscell(dsm)
                        dsm = dsm{1};
                    end
                    dsm(:,find(dsm(1,:) < dtk(1))) = [];
                    dsm(:,find(dsm(end,:) > dtk(end))) = [];
                    % It is presupposed here that the segmentation times
                    % for a given channel are not decomposed per frame,
                    % because the segmentation of a frame decomposition
                    % is not clearly defined.
                    % In practice, peak picking, for instance, is based
                    % on a frame analysis (such as novelty), and the
                    % segmentations are inferred between these frames.
                    if size(dsm,2) == 1
                        dsm = dsm';
                    end
                    fsk = [fsk dsm];
                end
            end

            fsk = sort(fsk); % Here is the chronological ordering

            if isempty(fsk)
                ffsk = {[0;dtk(end)]};
                sxk = {dxk};
                stk = {dtk};
                n = 1;
            elseif size(fsk,1) == 1
                ffsk = cell(1,length(fsk)+1);
                ffsk{1} = [dtk(1);fsk(1)];
                for h = 1:length(fsk)-1
                    ffsk{h+1} = [fsk(h);fsk(h+1)];
                end
                ffsk{end} = [fsk(end);dtk(end)];

                n = length(ffsk);

                crd = zeros(1,n+1); % the sample positions of the
                                    % segmentations in the channel
                crd0 = 0;
                for i = 1:n
                    crd0 = crd0 + find(dtk(crd0+1:end)>=ffsk{i}(1),1);
                    crd(i) = crd0;
                end
                crd(n+1) = size(dxk,1)+1;

                sxk = cell(1,n); % each cell contains a segment
                stk = cell(1,n); % each cell contains
                                 % the corresponding time positions

                for i = 1:n
                    sxk{i} = dxk(crd(i):crd(i+1)-1,1,:);
                    stk{i} = dtk(crd(i):crd(i+1)-1);
                end

            elseif size(fsk,1) == 2
                % Each column of fsk holds the start and end time of a segment.
                ffsk = cell(1,size(fsk,2));
                for h = 1:size(fsk,2)
                    ffsk{h} = [fsk(1,h);fsk(2,h)];
                end
                n = length(ffsk);
                crd = zeros(n,2); % the sample positions of the
                                  % segmentations in the channel
                crd0 = 0;
                for i = 1:n
                    crd0 = crd0 + find(dtk(crd0+1:end)>=ffsk{i}(1),1);
                    crd(i,1) = crd0;
                    crd0 = crd0 + find(dtk(crd0+1:end)>=ffsk{i}(2),1);
                    crd(i,2) = crd0;
                end
                sxk = cell(1,n); % each cell contains a segment
                stk = cell(1,n); % each cell contains
                                 % the corresponding time positions
                for i = 1:n
                    sxk{i} = dxk(crd(i,1):crd(i,2),1,:);
                    stk{i} = dtk(crd(i,1):crd(i,2));
                end
            end
            sx{k} = sxk;
            st{k} = stk;
            fp{k} = ffsk;
            cl{k} = 1:n;
        end
        f = set(x,'Data',sx,'Time',st,'FramePos',fp,'Clusters',cl);
        p = strat;
        m = {};
        fe = {};
    end
else
    [f p] = mirsegment(miraudio(x),varargin{:});
end


function p = findsilence(d,fp,throff,thron)
d = [0 d 0];
% Segment onsets: frames where the RMS rises from below the 'On' threshold
% to at or above it.
begseg = find(d(1:end-1) < thron & d(2:end) >= thron);
nseg = length(begseg);
endseg = zeros(1,nseg);
removed = [];
for i = 1:nseg
    % First frame after the onset where the RMS drops to the 'Off' threshold
    endseg(i) = begseg(i) + find(d(begseg(i)+1:end)<=throff, 1)-1;
    if i>1 && endseg(i) == endseg(i-1)
        % Same ending as the previous onset: not a new segment
        removed = [removed i];
    end
end
begseg(removed) = [];
%endseg(removed) = [];
%endseg(end) = min(endseg(end),length(d)+1);
p = fp(1,begseg); %; fp(2,endseg-1)];
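

% -------------------------------------------------------------------------
% Usage sketch, added for illustration only; it is not part of the original
% toolbox code. It combines calls described in the help text at the top of
% this file and assumes that MIRtoolbox is on the MATLAB path. The file name
% 'song.wav' and the boundary times [1 2.5 4] are hypothetical placeholders.
%
%   a = miraudio('song.wav');                   % hypothetical audio file
%   [f,p,m,fe] = mirsegment(a,'Novelty','KernelSize',10);
%                                               % novelty-based segmentation
%   g = mirsegment(a,'RMS','On',.02,'Off',.01); % split at long silences
%   h = mirsegment(a,[1 2.5 4]);                % explicit boundaries, in s
%
% The three calls correspond to the main entry points handled above: a
% strategy keyword, the RMS thresholds, and an explicit array of times.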