function [ph pz sumY] = transcriptionMultipleTemplates(filename,iter,sz,su)


% Load note templates
load('noteTemplatesBassoon');   W(:,:,1) = noteTemplatesBassoon;
load('noteTemplatesCello');     W(:,:,2) = noteTemplatesCello;
load('noteTemplatesClarinet');  W(:,:,3) = noteTemplatesClarinet;
load('noteTemplatesFlute');     W(:,:,4) = noteTemplatesFlute;
load('noteTemplatesGuitar');    W(:,:,5) = noteTemplatesGuitar;
load('noteTemplatesHorn');      W(:,:,6) = noteTemplatesHorn;
load('noteTemplatesOboe');      W(:,:,7) = noteTemplatesOboe;
load('noteTemplatesTenorSax');  W(:,:,8) = noteTemplatesTenorSax;
load('noteTemplatesViolin');    W(:,:,9) = noteTemplatesViolin;

%% SptkBGCl -> piano (it stands for Sampletek Steinway "Black Grand").
%% It took me a while to figure this out! (It is documented in the
%% MAPS database)
load('noteTemplatesSptkBGCl');  W(:,:,10) = noteTemplatesSptkBGCl;

%pitchActivity = [14 16 30 40 20 21 38 24 35 1; 52 61 69 76 56 57 71 55 80 88]';
pitchActivity = [16 16 30 40 20 21 38 24 35 16; 52 61 69 73 56 57 71 55 73 73]';


%% This turns W into W0, a 10x88 cell array in which
%% W0{instrument,note} is the 545x1 template for the given instrument
%% and note number.
W = permute(W,[2 1 3]);
W0 = squeeze(num2cell(W,1))';

clear('noteTemplatesBassoon','noteTemplatesCello','noteTemplatesClarinet','noteTemplatesFlute','noteTemplatesGuitar',...
    'noteTemplatesHorn','noteTemplatesOboe','noteTemplatesTenorSax','noteTemplatesViolin','noteTemplatesSptkBGCl','W');


% Compute CQT

%% The CQT parameters are hardcoded in computeCQT. It has frequency
%% range 27.5 -> samplerate/3, 60 bins per octave, a 'q' of 0.8 (lower
%% than the maximum, and default, value of 1), 'atomHopFactor' 0.3
%% rather than the default 0.25 (why?), Hann window, default sparsity
%% threshold. The CQT obtained is the interpolated real-valued
%% magnitude spectrogram rather than the complex output.

%% The audio is always resampled to 44100Hz (if it isn't at that rate
%% already) and mixed down to mono.

%% The computed CQT parameters actually obtained are:
%%   10 octaves
%%   highest frequency 14700Hz
%%   lowest frequency 14.5223Hz
%%   column height 600 (60 bpo * 10 oct)
%% But only bins 56:600 are used, the first 55 are dropped, leaving
%% 545 bins per column. I *think* the spectrogram is the "right" way
%% up at this point so those first 55 bins are the lowest-frequency
%% ones, meaning the frequency range actually returned is 27.4144Hz
%% to 14700Hz.

%% For a 43.5 second 44.1 kHz audio file, intCQT will be a 545x30941
%% array, one column every 0.0014 seconds.
[intCQT] = computeCQT(filename);

%% X is sampled from intCQT at 7.1128-column intervals (so clearly 100
%% columns per second) and then transposed, giving 4350x545 in this
%% case.
X = intCQT(:,round(1:7.1128:size(intCQT,2)))';

%% Median filter to remove broadband noise (i.e. we filter across
%% frequency rather than time)
noiseLevel1 = medfilt1(X',40);
noiseLevel2 = medfilt1(min(X',noiseLevel1),40);
X = max(X-noiseLevel2',0);

%% Take every 4th row. We had 100 per second (10ms) so this is 40ms as
%% the comment says. It's not clear to me why we denoise before doing
%% this rather than after? Y is now 1088x545 in our example and looks
%% pretty clean as a contour plot.
Y = X(1:4:size(X,1),:); % 40ms step

%% A 1x1088 array containing the sum of each column of Y' (that is, one
%% total per time frame). Doesn't appear to be used in here, but it is
%% returned to the caller.
sumY = sum(Y');

clear('intCQT','X','noiseLevel1','noiseLevel2');

fprintf('%s','done');
fprintf('\n');
fprintf('%s',['Estimating F0s...........']);

% For each 100-frame segment (4 s at the 40 ms step), perform SIPLCA with fixed W0
ph = zeros(440,size(Y,1));
pz = zeros(88,size(Y,1));

%% Number of complete 100-frame segments. The final-segment code below
%% uses nSeg rather than the loop variable j, because j is left empty
%% when the loop never runs (audio shorter than one full segment) and
%% the final segment would then be skipped.
nSeg = floor(size(Y,1)/100);

for j=1:nSeg

    x=[zeros(2,100); Y(1+(j-1)*100:j*100,:)'; zeros(2,100)];
    [w,h,z,u,xa] = cplcaMT( x, 88, [545 1], 10, W0, [], [], [], iter, 1, 1, sz, su, 0, 1, 1, 1, pitchActivity);

    H=[]; for i=1:88 H=[H; h{i}]; end;
    ph(:,1+(j-1)*100:j*100) = H;
    Z=[]; for i=1:88 Z=[Z z{i}]; end;
    pz(:,1+(j-1)*100:j*100) = Z';
    perc = 100*(j/(nSeg+1));
    fprintf('\n');
    fprintf('%.2f%% complete',perc);
end;

len=size(Y,1)-nSeg*100; % Final segment

if (len >0)
    x=[zeros(2,len); Y(1+nSeg*100:end,:)'; zeros(2,len)];
    [w,h,z,u,xa] = cplcaMT( x, 88, [545 1], 10, W0, [], [], [], iter, 1, 1, sz, su, 0, 1, 1, 1, pitchActivity);
    fprintf('\n');
    fprintf('100%% complete');

    H=[]; for i=1:88 H=[H; h{i}]; end;
    ph(:,1+nSeg*100:end) = H;
    Z=[]; for i=1:88 Z=[Z z{i}]; end;
    pz(:,1+nSeg*100:end) = Z';
end;
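

% ----------------------------------------------------------------------
% Illustrative sketch only, not part of the original function. It repeats
% the permute/num2cell step used to build W0 above on a small random
% array, so that the resulting cell layout can be checked by hand.
% The name demoTemplateLayout and the toy sizes nNotes, nBins and nInstr
% are assumptions made for this example; the real templates are taken to
% be 88 notes x 545 bins for each of 10 instruments, as the comments
% above describe.
function W0 = demoTemplateLayout()

nNotes = 4; nBins = 6; nInstr = 3;     % toy sizes, chosen for illustration
W = rand(nNotes, nBins, nInstr);       % same layout as W(:,:,k) = noteTemplatesX

Wp = permute(W, [2 1 3]);              % nBins x nNotes x nInstr
W0 = squeeze(num2cell(Wp, 1))';        % nInstr x nNotes cell array

% Each cell holds one nBins x 1 template, indexed as W0{instrument,note}.
assert(isequal(size(W0), [nInstr nNotes]));
assert(isequal(size(W0{2,3}), [nBins 1]));
assert(isequal(W0{2,3}, W(3,:,2)'));   % instrument 2, note 3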