Chris@2
|
1 function [ph pz sumY] = transcriptionMultipleTemplates(filename,iter,sz,su)
|
Chris@2
|
2
|
Chris@2
|
3
|
Chris@2
|
4 % Load note templates
|
Chris@2
|
5 load('noteTemplatesBassoon'); W(:,:,1) = noteTemplatesBassoon;
|
Chris@2
|
6 load('noteTemplatesCello'); W(:,:,2) = noteTemplatesCello;
|
Chris@2
|
7 load('noteTemplatesClarinet'); W(:,:,3) = noteTemplatesClarinet;
|
Chris@2
|
8 load('noteTemplatesFlute'); W(:,:,4) = noteTemplatesFlute;
|
Chris@2
|
9 load('noteTemplatesGuitar'); W(:,:,5) = noteTemplatesGuitar;
|
Chris@2
|
10 load('noteTemplatesHorn'); W(:,:,6) = noteTemplatesHorn;
|
Chris@2
|
11 load('noteTemplatesOboe'); W(:,:,7) = noteTemplatesOboe;
|
Chris@2
|
12 load('noteTemplatesTenorSax'); W(:,:,8) = noteTemplatesTenorSax;
|
Chris@2
|
13 load('noteTemplatesViolin'); W(:,:,9) = noteTemplatesViolin;
|
Chris@9
|
14
|
Chris@9
|
15 %% SptkBGCl -> piano (it stands for Sampletek Steinway "Black Grand").
|
Chris@9
|
16 %% It took me a while to figure this out! (It is documented in the
|
Chris@9
|
17 %% MAPS database)
|
Chris@2
|
18 load('noteTemplatesSptkBGCl'); W(:,:,10) = noteTemplatesSptkBGCl;
|
Chris@2
|
19
|
Chris@2
|
20 %pitchActivity = [14 16 30 40 20 21 38 24 35 1; 52 61 69 76 56 57 71 55 80 88]';
|
Chris@2
|
21 pitchActivity = [16 16 30 40 20 21 38 24 35 16; 52 61 69 73 56 57 71 55 73 73]';
|
Chris@2
|
22
|
Chris@2
|
23
|
Chris@5
|
24 %% this turns W0 into a 10x88 cell array in which W0{instrument}{note}
|
Chris@5
|
25 %% is the 545x1 template for the given instrument and note number.
|
Chris@2
|
26 W = permute(W,[2 1 3]);
|
Chris@2
|
27 W0 = squeeze(num2cell(W,1))';
|
Chris@5
|
28
|
Chris@2
|
29 clear('noteTemplatesBassoon','noteTemplatesCello','noteTemplatesClarinet','noteTemplatesFlute','noteTemplatesGuitar',...
|
Chris@2
|
30 'noteTemplatesHorn','noteTemplatesOboe','noteTemplatesTenorSax','noteTemplatesViolin','noteTemplatesSptkBGCl','W');
|
Chris@2
|
31
|
Chris@2
|
32
|
Chris@2
|
33 % Compute CQT
|
Chris@5
|
34
|
Chris@5
|
35 %% The CQT parameters are hardcoded in computeCQT. It has frequency
|
Chris@5
|
36 %% range 27.5 -> samplerate/3, 60 bins per octave, a 'q' of 0.8 (lower
|
Chris@5
|
37 %% than the maximum, and default, value of 1), 'atomHopFactor' 0.3
|
Chris@5
|
38 %% rather than the default 0.25 (why?), Hann window, default sparsity
|
Chris@10
|
39 %% threshold. The CQT obtained is the interpolated real-valued
|
Chris@10
|
40 %% magnitude spectrogram rather than the complex output.
|
Chris@10
|
41
|
Chris@10
|
42 %% The audio is always resampled to 44100Hz (if it isn't at that rate
|
Chris@10
|
43 %% already) and mixed down to mono.
|
Chris@10
|
44
|
Chris@10
|
45 %% The computed CQT parameters actually obtained are:
|
Chris@10
|
46 %% 10 octaves
|
Chris@10
|
47 %% highest frequency 14700Hz
|
Chris@10
|
48 %% lowest frequency 14.5223Hz
|
Chris@10
|
49 %% column height 600 (60 bpo * 10 oct)
|
Chris@10
|
50 %% But only bins 56:600 are used, the first 55 are dropped, leaving
|
Chris@10
|
51 %% 545 bins per column. I *think* the spectrogram is the "right" way
|
Chris@10
|
52 %% up at this point so those first 55 bins are the lowest-frequency
|
Chris@10
|
53 %% ones, meaning the frequency range actually returned is 27.4144Hz
|
Chris@10
|
54 %% to 14700Hz.
|
Chris@5
|
55
|
Chris@5
|
56 %% for a 43.5 second 44.1 KHz audio file, intCQT will be a 545x30941
|
Chris@5
|
57 %% array, one column every 0.0014 seconds.
|
Chris@2
|
58 [intCQT] = computeCQT(filename);
|
Chris@5
|
59
|
Chris@5
|
60 %% X is sampled from intCQT at 7.1128-column intervals, giving
|
Chris@5
|
61 %% 4350x545 in this case, so clearly 100 columns per second; then
|
Chris@5
|
62 %% transposed
|
Chris@2
|
63 X = intCQT(:,round(1:7.1128:size(intCQT,2)))';
|
Chris@5
|
64
|
Chris@48
|
65 %% median filter to remove broadband noise (i.e we filter across
|
Chris@48
|
66 %% frequency rather than time)
|
Chris@2
|
67 noiseLevel1 = medfilt1(X',40);
|
Chris@2
|
68 noiseLevel2 = medfilt1(min(X',noiseLevel1),40);
|
Chris@2
|
69 X = max(X-noiseLevel2',0);
|
Chris@5
|
70
|
Chris@5
|
71 %% take every 4th row. We had 100 per second (10ms) so this is 40ms as
|
Chris@48
|
72 %% the comment says. It's not clear to me why we denoise before doing
|
Chris@48
|
73 %% this rather than after? Y is now 1088x545 in our example and looks
|
Chris@48
|
74 %% pretty clean as a contour plot.
|
Chris@2
|
75 Y = X(1:4:size(X,1),:); % 40ms step
|
Chris@5
|
76
|
Chris@5
|
77 %% a 1x1088 array containing the sum of each column. Doesn't appear to
|
Chris@8
|
78 %% be used in here, but it is returned to the caller.
|
Chris@2
|
79 sumY = sum(Y');
|
Chris@5
|
80
|
Chris@2
|
81 clear('intCQT','X','noiseLevel1','noiseLevel2');
|
Chris@2
|
82
|
Chris@2
|
83 fprintf('%s','done');
|
Chris@2
|
84 fprintf('\n');
|
Chris@2
|
85 fprintf('%s',['Estimating F0s...........']);
|
Chris@2
|
86
|
Chris@2
|
87 % For each 2sec segment, perform SIPLCA with fixed W0
|
Chris@2
|
88 ph = zeros(440,size(Y,1));
|
Chris@2
|
89 pz = zeros(88,size(Y,1));
|
Chris@2
|
90
|
Chris@2
|
91 for j=1:floor(size(Y,1)/100)
|
Chris@2
|
92
|
Chris@2
|
93 x=[zeros(2,100); Y(1+(j-1)*100:j*100,:)'; zeros(2,100)];
|
Chris@2
|
94 [w,h,z,u,xa] = cplcaMT( x, 88, [545 1], 10, W0, [], [], [], iter, 1, 1, sz, su, 0, 1, 1, 1, pitchActivity);
|
Chris@2
|
95
|
Chris@2
|
96 H=[]; for i=1:88 H=[H; h{i}]; end;
|
Chris@2
|
97 ph(:,1+(j-1)*100:j*100) = H;
|
Chris@2
|
98 Z=[]; for i=1:88 Z=[Z z{i}]; end;
|
Chris@2
|
99 pz(:,1+(j-1)*100:j*100) = Z';
|
Chris@2
|
100 perc = 100*(j/(floor(size(Y,1)/100)+1));
|
Chris@2
|
101 fprintf('\n');
|
Chris@2
|
102 fprintf('%.2f%% complete',perc);
|
Chris@2
|
103 end;
|
Chris@2
|
104
|
Chris@2
|
105 len=size(Y,1)-j*100; % Final segment
|
Chris@2
|
106
|
Chris@2
|
107 if (len >0)
|
Chris@2
|
108 x=[zeros(2,len); Y(1+j*100:end,:)'; zeros(2,len)];
|
Chris@2
|
109 [w,h,z,u,xa] = cplcaMT( x, 88, [545 1], 10, W0, [], [], [], iter, 1, 1, sz, su, 0, 1, 1, 1, pitchActivity);
|
Chris@2
|
110 fprintf('\n');
|
Chris@2
|
111 fprintf('100%% complete');
|
Chris@2
|
112
|
Chris@2
|
113 H=[]; for i=1:88 H=[H; h{i}]; end;
|
Chris@2
|
114 ph(:,1+j*100:end) = H;
|
Chris@2
|
115 Z=[]; for i=1:88 Z=[Z z{i}]; end;
|
Chris@2
|
116 pz(:,1+j*100:end) = Z';
|
Chris@5
|
117 end;
|