annotate notes/transcriptionMultipleTemplates-annotated.m @ 116:91bb029a847a timing

Reorder the calculations to match the series of vector operations in the most recent bqvec code, just in case it's the order of vector calculations that is saving the time rather than the avoidance of std::vector
author Chris Cannam
date Wed, 07 May 2014 09:57:19 +0100
parents 1a4cab304d68
children
rev   line source
Chris@2 1 function [ph pz sumY] = transcriptionMultipleTemplates(filename,iter,sz,su)
Chris@2 2
Chris@2 3
Chris@2 4 % Load note templates
Chris@2 5 load('noteTemplatesBassoon'); W(:,:,1) = noteTemplatesBassoon;
Chris@2 6 load('noteTemplatesCello'); W(:,:,2) = noteTemplatesCello;
Chris@2 7 load('noteTemplatesClarinet'); W(:,:,3) = noteTemplatesClarinet;
Chris@2 8 load('noteTemplatesFlute'); W(:,:,4) = noteTemplatesFlute;
Chris@2 9 load('noteTemplatesGuitar'); W(:,:,5) = noteTemplatesGuitar;
Chris@2 10 load('noteTemplatesHorn'); W(:,:,6) = noteTemplatesHorn;
Chris@2 11 load('noteTemplatesOboe'); W(:,:,7) = noteTemplatesOboe;
Chris@2 12 load('noteTemplatesTenorSax'); W(:,:,8) = noteTemplatesTenorSax;
Chris@2 13 load('noteTemplatesViolin'); W(:,:,9) = noteTemplatesViolin;
Chris@9 14
Chris@9 15 %% SptkBGCl -> piano (it stands for Sampletek Steinway "Black Grand").
Chris@9 16 %% It took me a while to figure this out! (It is documented in the
Chris@9 17 %% MAPS database)
Chris@2 18 load('noteTemplatesSptkBGCl'); W(:,:,10) = noteTemplatesSptkBGCl;
Chris@2 19
Chris@2 20 %pitchActivity = [14 16 30 40 20 21 38 24 35 1; 52 61 69 76 56 57 71 55 80 88]';
Chris@2 21 pitchActivity = [16 16 30 40 20 21 38 24 35 16; 52 61 69 73 56 57 71 55 73 73]';
Chris@2 22
Chris@2 23
Chris@5 24 %% this turns W0 into a 10x88 cell array in which W0{instrument}{note}
Chris@5 25 %% is the 545x1 template for the given instrument and note number.
Chris@2 26 W = permute(W,[2 1 3]);
Chris@2 27 W0 = squeeze(num2cell(W,1))';
Chris@5 28
Chris@2 29 clear('noteTemplatesBassoon','noteTemplatesCello','noteTemplatesClarinet','noteTemplatesFlute','noteTemplatesGuitar',...
Chris@2 30 'noteTemplatesHorn','noteTemplatesOboe','noteTemplatesTenorSax','noteTemplatesViolin','noteTemplatesSptkBGCl','W');
Chris@2 31
Chris@2 32
Chris@2 33 % Compute CQT
Chris@5 34
Chris@5 35 %% The CQT parameters are hardcoded in computeCQT. It has frequency
Chris@5 36 %% range 27.5 -> samplerate/3, 60 bins per octave, a 'q' of 0.8 (lower
Chris@5 37 %% than the maximum, and default, value of 1), 'atomHopFactor' 0.3
Chris@5 38 %% rather than the default 0.25 (why?), Hann window, default sparsity
Chris@10 39 %% threshold. The CQT obtained is the interpolated real-valued
Chris@10 40 %% magnitude spectrogram rather than the complex output.
Chris@10 41
Chris@10 42 %% The audio is always resampled to 44100Hz (if it isn't at that rate
Chris@10 43 %% already) and mixed down to mono.
Chris@10 44
Chris@10 45 %% The computed CQT parameters actually obtained are:
Chris@10 46 %% 10 octaves
Chris@10 47 %% highest frequency 14700Hz
Chris@10 48 %% lowest frequency 14.5223Hz
Chris@10 49 %% column height 600 (60 bpo * 10 oct)
Chris@10 50 %% But only bins 56:600 are used, the first 55 are dropped, leaving
Chris@10 51 %% 545 bins per column. I *think* the spectrogram is the "right" way
Chris@10 52 %% up at this point so those first 55 bins are the lowest-frequency
Chris@10 53 %% ones, meaning the frequency range actually returned is 27.4144Hz
Chris@10 54 %% to 14700Hz.
Chris@5 55
Chris@5 56 %% for a 43.5 second 44.1 KHz audio file, intCQT will be a 545x30941
Chris@5 57 %% array, one column every 0.0014 seconds.
Chris@2 58 [intCQT] = computeCQT(filename);
Chris@5 59
Chris@5 60 %% X is sampled from intCQT at 7.1128-column intervals, giving
Chris@5 61 %% 4350x545 in this case, so clearly 100 columns per second; then
Chris@5 62 %% transposed
Chris@2 63 X = intCQT(:,round(1:7.1128:size(intCQT,2)))';
Chris@5 64
Chris@48 65 %% median filter to remove broadband noise (i.e we filter across
Chris@48 66 %% frequency rather than time)
Chris@2 67 noiseLevel1 = medfilt1(X',40);
Chris@2 68 noiseLevel2 = medfilt1(min(X',noiseLevel1),40);
Chris@2 69 X = max(X-noiseLevel2',0);
Chris@5 70
Chris@5 71 %% take every 4th row. We had 100 per second (10ms) so this is 40ms as
Chris@48 72 %% the comment says. It's not clear to me why we denoise before doing
Chris@48 73 %% this rather than after? Y is now 1088x545 in our example and looks
Chris@48 74 %% pretty clean as a contour plot.
Chris@2 75 Y = X(1:4:size(X,1),:); % 40ms step
Chris@5 76
Chris@5 77 %% a 1x1088 array containing the sum of each column. Doesn't appear to
Chris@8 78 %% be used in here, but it is returned to the caller.
Chris@2 79 sumY = sum(Y');
Chris@5 80
Chris@2 81 clear('intCQT','X','noiseLevel1','noiseLevel2');
Chris@2 82
Chris@2 83 fprintf('%s','done');
Chris@2 84 fprintf('\n');
Chris@2 85 fprintf('%s',['Estimating F0s...........']);
Chris@2 86
Chris@2 87 % For each 2sec segment, perform SIPLCA with fixed W0
Chris@2 88 ph = zeros(440,size(Y,1));
Chris@2 89 pz = zeros(88,size(Y,1));
Chris@2 90
Chris@2 91 for j=1:floor(size(Y,1)/100)
Chris@2 92
Chris@2 93 x=[zeros(2,100); Y(1+(j-1)*100:j*100,:)'; zeros(2,100)];
Chris@2 94 [w,h,z,u,xa] = cplcaMT( x, 88, [545 1], 10, W0, [], [], [], iter, 1, 1, sz, su, 0, 1, 1, 1, pitchActivity);
Chris@2 95
Chris@2 96 H=[]; for i=1:88 H=[H; h{i}]; end;
Chris@2 97 ph(:,1+(j-1)*100:j*100) = H;
Chris@2 98 Z=[]; for i=1:88 Z=[Z z{i}]; end;
Chris@2 99 pz(:,1+(j-1)*100:j*100) = Z';
Chris@2 100 perc = 100*(j/(floor(size(Y,1)/100)+1));
Chris@2 101 fprintf('\n');
Chris@2 102 fprintf('%.2f%% complete',perc);
Chris@2 103 end;
Chris@2 104
Chris@2 105 len=size(Y,1)-j*100; % Final segment
Chris@2 106
Chris@2 107 if (len >0)
Chris@2 108 x=[zeros(2,len); Y(1+j*100:end,:)'; zeros(2,len)];
Chris@2 109 [w,h,z,u,xa] = cplcaMT( x, 88, [545 1], 10, W0, [], [], [], iter, 1, 1, sz, su, 0, 1, 1, 1, pitchActivity);
Chris@2 110 fprintf('\n');
Chris@2 111 fprintf('100%% complete');
Chris@2 112
Chris@2 113 H=[]; for i=1:88 H=[H; h{i}]; end;
Chris@2 114 ph(:,1+j*100:end) = H;
Chris@2 115 Z=[]; for i=1:88 Z=[Z z{i}]; end;
Chris@2 116 pz(:,1+j*100:end) = Z';
Chris@5 117 end;