mirex-meta-analysis: match_mirex_to_public

annotate match_mirex_to_public_data.m @ 6:e2337cd691b1 tip

Finished writing the MATLAB code to replicate all observations made in the article. Added the article to the repository. Renamed the two main scripts ("1-get_mirex_estimates.rb" and "2-generate_smith2013_ismir.m") to remove the dashes, since these were annoying within MATLAB. Added a new Michael Jackson figure.

author	Jordan Smith <jordan.smith@eecs.qmul.ac.uk>
date	Wed, 05 Mar 2014 01:02:26 +0000
parents	92b5a46bc67b
children

rev	line source
jordan@1	1 function [pub2mir, mir2pub, P] = match_mirex_to_public_data(mirex_truth, public_truth, mirex_dset_origin, public_dset_origin, mir2pub_relevance)
jordan@1	2 % function [pub2mir, mir2pub, P] = match_mirex_to_public_data(mirex_truth,
jordan@1	3 % public_truth, mirex_dset_origin, public_dset_origin, mir2pub_relevance)
jordan@1	4 %
jordan@1	5 % This function looks through the collections of MIREX and PUBLIC annotations
jordan@1	6 % and attempts to find matches between them, i.e., possible public annotations
jordan@1	7 % that could be the same as the MIREX annotations.
jordan@1	8 %
jordan@1	9 % MIREX_TRUTH and PUBLIC_TRUTH are Nx1 and Mx1 structures where N and M are the number
jordan@1	10 % of MIREX and PUBLIC annotations available, respectively. Each element contains fields
jordan@1	11 % TIM, LAB and DSET which give the time points and labels of the annotation, as well as
jordan@1	12 % the index of the dataset.
jordan@1	13 %
jordan@1	14 % MIREX_DSET_ORIGIN and PUBLIC_DSET_ORIGIN contain the same information in the DSET
jordan@1	15 % field, but in an array.
jordan@1	16 %
jordan@1	17 % The output vectors PUB2MIR and MIR2PUB work in the following way. If the nth MIREX and
jordan@1	18 % the mth PUBLIC annotations are found to match, then PUB2MIR(m) = n and MIR2PUB(n) = m.
jordan@1	19 %
jordan@1	20 % MIR2PUB_RELEVANCE contains a simple Px2 array where each row contains (1) the index of
jordan@1	21 % a MIREX dataset and (2) the index of a PUBLIC dataset that are hypothesized to contain
jordan@1	22 % some of the same songs. Including this cuts down on the number of datasets that are
jordan@1	23 % searched for matches.
jordan@1	24 %
jordan@1	25 % For example, if the default values are kept in all the other scripts, then the correct
jordan@1	26 % relevance matches are:
jordan@1	27 %
jordan@1	28 % mir2pub_relevance = [1 2; 1 3; 1 5; 2 1; 2 4; 3 1; 3 4; 4 1; 4 2; 4 3; 4 4; 4 5; 4 6];
jordan@1	29 %
jordan@1	30 % That was based on the following assumptions:
jordan@1	31 % public_dset_origin: 1 = RWC [AIST], 2 = Isophonics, 3 = EP, 4 = IRISA [Euro and RWC], 5 = TUT, 6 = SALAMI
jordan@1	32 % mirex_dset_origin: 1 = 09 [Isophonics, Beatles, EP], 2 = 10a [RWC, boundaries only], 3 = 10b [RWC AIST], 4 = 12 [salami]
jordan@1	33
jordan@1	34
jordan@1	35 pub2mir = zeros(length(public_truth),1);
jordan@1	36 mir2pub = zeros(length(mirex_truth),1);
jordan@1	37
jordan@1	38 % public_dset_origin: 1 = RWC [AIST], 2 = Isophonics, 3 = EP, 4 = IRISA [Euro and RWC], 5 = TUT, 6 = SALAMI
jordan@1	39 % mirex_dset_origin: 1 = 09 [Isophonics, Beatles, EP], 2 = 10a [RWC, boundaries only], 3 = 10b [RWC AIST], 4 = 12 [salami]
jordan@1	40
jordan@1	41 % Look through all the MIREX annotations. For each one, look at the public annotations available.
jordan@1	42 % When you find a song which has roughly the same length (within LENGTH_THRESH seconds, set below), compare the structures and save the output.
jordan@1	43
jordan@1	44 % We shall do this dataset by dataset. Starting with 2009, we first select the relevant datasets from the MIREX and public domains.
jordan@1	45 rel(1).rel_mir = find(mirex_dset_origin==1);
jordan@1	46 rel(1).rel_pub = find(public_dset_origin(:,1)==2 | public_dset_origin(:,1)==3 | public_dset_origin(:,1)==5);
jordan@1	47 rel(2).rel_mir = find(mirex_dset_origin==2);
jordan@1	48 rel(2).rel_pub = find(public_dset_origin(:,1)==4);
jordan@1	49 rel(3).rel_mir = find(mirex_dset_origin==3);
jordan@1	50 rel(3).rel_pub = find(public_dset_origin(:,1)==1);
jordan@1	51 rel(4).rel_mir = find(mirex_dset_origin==4);
jordan@1	52 rel(4).rel_pub = find(public_dset_origin(:,1)==6);
jordan@1	53
jordan@4	54 % The metric is the boundary f-measure. The quality threshold is the minimum value of this metric that we consider to indicate a match. 0.99 is really high!
jordan@1	55 quality_threshes = [.99 0.99 0.99 0.99];
jordan@1	56
jordan@4	57 fprintf('OK, we are going to look through each dataset 3 times, each time with a different length threshold. This is because the matching algorithm is slow and brute-force, and we want to speed it up.\n')
jordan@4	58 fprintf('The first look, we consider every song within 5 seconds of the same length as the target song, and compare the structures.\n')
jordan@4	59 fprintf('The second and third passes consider deviations of 10 and 15 seconds, respectively. But we ignore songs that have already been matched, which speeds things up, see?\n')
jordan@1	60 for K=1:4,
jordan@1	61 rel_mir = rel(K).rel_mir;
jordan@1	62 rel_pub = rel(K).rel_pub;
jordan@1	63 quality_thresh = quality_threshes(K);
jordan@1	64
jordan@1	65 % We maintain a list of the songs that have not been matched yet:
jordan@1	66 unmatched_mirdata = rel_mir;
jordan@1	67 unmatched_pubdata = rel_pub;
jordan@1	68
jordan@1	69 % We also make a matrix to hold the match between all the songs.
jordan@1	70 pwf = zeros(length(rel_mir),length(rel_pub));
jordan@1	71
jordan@1	72 % Run the following script, optionally several times with increasing values of length_thresh to search more widely.
jordan@1	73 % (We reduce the search space each time, so using a longer threshold becomes more and more feasible on later iterations.)
jordan@4	74 fprintf('Looking at dataset %i. First pass.\n',K)
jordan@1	75 length_thresh = 5;
jordan@1	76 [mir2pub pub2mir pwf] = match_mirex_to_public_data_macro(mir2pub, pub2mir, pwf, mirex_truth, public_truth, rel_mir, rel_pub, length_thresh, quality_thresh);
jordan@4	77 fprintf('Looking at dataset %i. Second pass.\n',K)
jordan@1	78 length_thresh = 10;
jordan@1	79 [mir2pub pub2mir pwf] = match_mirex_to_public_data_macro(mir2pub, pub2mir, pwf, mirex_truth, public_truth, rel_mir, rel_pub, length_thresh, quality_thresh);
jordan@4	80 fprintf('Looking at dataset %i. Third pass.\n',K)
jordan@1	81 length_thresh = 15;
jordan@1	82 [mir2pub pub2mir pwf] = match_mirex_to_public_data_macro(mir2pub, pub2mir, pwf, mirex_truth, public_truth, rel_mir, rel_pub, length_thresh, quality_thresh);
jordan@1	83
jordan@1	84 % The variable P will contain the quality of the matches between all the songs tested.
jordan@1	85 P(K).pwf = pwf;
jordan@1	86 end
jordan@4	87 fprintf('\nOK, done matching! Phew.\n')
jordan@1	88
jordan@1	89 % That was a lot of searching... We do not want to do it twice! Save the output.
jordan@4	90 fprintf('Saving the output to ./match_mirex_to_public_data_results so that you do not have to repeat this step again.\n\n')
jordan@1	91 save('./match_mirex_to_public_data_results','pub2mir','mir2pub','P');
jordan@1	92
jordan@1	93
jordan@4	94 fprintf('Here is the first thing reported in the article: a table of how many matches you obtained.\n\n')
jordan@1	95 % % Bonus work for Table 2:
jordan@1	96 % How many MIREX songs did I find a match for in each category?
jordan@1	97 fprintf('MIREX dataset......number of pieces.....number identified\n\n')
jordan@1	98 for K=1:4,
jordan@1	99 % This is the number of MIREX songs identified with public annotations.
jordan@1	100 tmp = sum(mir2pub(find(mirex_dset_origin==K))>0);
jordan@1	101 fprintf('Dataset %i .. %i .. %i\n',K,length(find(mirex_dset_origin==K)),tmp)
jordan@1	102 end
jordan@1	103 % Also, how many public annotations did I find a match for?
jordan@1	104 for K=1:6,
jordan@1	105 % This is the number of public songs that occurred in MIREX.
jordan@1	106 sum(pub2mir(find(public_dset_origin(:,1)==K))>0)
jordan@1	107 end
jordan@1	108 %
jordan@1	109 % mir2pub(find(mirex_dset_origin==2))
jordan@1	110 %
jordan@1	111 % % Confirm that the songs we are matching are actually the same:
jordan@1	112 % mir_id = find(pub2mir,1); % find the first matching public song
jordan@1	113 % pub_id = mir2pub(mir_id);
jordan@1	114 % mirex_truth(mir_id).tim
jordan@1	115 % public_truth(pub_id).tim
jordan@1	116 % % Are they the same? If not, something is going wrong!
jordan@1	117 %
jordan@1	118 % % How to identify a MIREX song based on its public match:
jordan@1	119 % % Thankfully, we retained the filenames of the public data.
jordan@1	120 % mir_id = find(pub2mir,1); % find the first matching public song
jordan@1	121 % pub_id = mir2pub(mir_id);
jordan@1	122 % mirex_truth(mir_id).file
jordan@1	123 % public_truth(pub_id).file
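
The matching itself happens in match_mirex_to_public_data_macro, which is not shown on this page. As a rough illustration of the idea described in the comments (screen candidates by song length, then accept a pair only if the boundary f-measure reaches the quality threshold), here is a minimal sketch. It is not the author's implementation. The toy annotations, the tolerance `tol`, and the simple hit-counting f-measure (no one-to-one boundary matching) are all assumptions made for illustration.

```matlab
% Hypothetical sketch only; the real logic lives in
% match_mirex_to_public_data_macro, which may differ.

% Toy boundary times in seconds (made up for this example).
% The last boundary is taken to be the end of the song.
mir_tim = [0 10.0 25.5 40.2 60.0];
pub_tim = [0 10.1 25.4 40.0 60.3];

tol            = 0.5;   % assumed boundary tolerance, in seconds
length_thresh  = 5;     % same role as length_thresh in the function above
quality_thresh = 0.99;  % same role as quality_thresh in the function above

% Step 1: cheap screen. Only compare songs of similar length.
same_length = abs(mir_tim(end) - pub_tim(end)) <= length_thresh;

% Step 2: boundary f-measure with simple hit counting.
% D(i,j) is the distance between MIREX boundary i and public boundary j.
D = abs(bsxfun(@minus, mir_tim(:), pub_tim(:)'));
frac_pub_hit = mean(any(D <= tol, 1));  % public boundaries near a MIREX boundary
frac_mir_hit = mean(any(D <= tol, 2));  % MIREX boundaries near a public boundary
f = 2 * frac_pub_hit * frac_mir_hit / max(frac_pub_hit + frac_mir_hit, eps);

% Step 3: accept the pair only if both tests pass.
is_match = same_length && (f >= quality_thresh);
fprintf('f-measure = %.3f, match = %d\n', f, is_match);
```

With a threshold as strict as 0.99, essentially every boundary in one annotation must have a counterpart in the other, so a pair that passes is very likely the same annotation of the same recording.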
