annotate query.txt @ 770:c54bc2ffbf92 tip

update tags
author convert-repo
date Fri, 16 Dec 2011 11:34:01 +0000
parents 342822c2d49a
rev   line source
mas01cr@498 1 Currently supported query types:
mas01cr@498 2
mas01cr@498 3 O2_POINT_QUERY
mas01cr@498 4 * dot_product
mas01cr@498 5
mas01cr@498 6 Find and report, from the database, up to "pointNN"
mas01cr@498 7 near-neighbours of length-1 query sequences.
mas01cr@498 8
mas01cr@498 9 O2_TRACK_QUERY
mas01cr@498 10 * dot_product
mas01cr@498 11
mas01cr@498 12 Find, in each track, up to "pointNN" near-neighbours of length-1
mas01cr@498 13 query sequences, reporting the top "trackNN" tracks, ordered by
mas01cr@498 14 the average distance of the pairwise matches.
mas01cr@498 15
mas01cr@498 16 O2_SEQUENCE_QUERY
mas01cr@498 17 - radius, + radius
mas01cr@498 18 * euclidean_normed, euclidean
mas01cr@498 19
mas01cr@498 20 O2_N_SEQUENCE_QUERY
mas01cr@498 21 - radius, + radius
mas01cr@498 22 * euclidean_normed, euclidean
mas01cr@498 23
mas01cr@498 24 Find, in each track, up to "pointNN" near-neighbours of query
mas01cr@498 25 sequences. Report the results from the "trackNN" top tracks,
mas01cr@498 26 where the tracks are ordered by the average distance of the
mas01cr@498 27 retrieved pairwise matches. The difference between SEQUENCE and
mas01cr@498 28 N_SEQUENCE is that the SEQUENCE case reports only the average,
mas01cr@498 29 while the N_SEQUENCE reports the individual points too.
mas01cr@498 30
mas01cr@498 31 (Ordering by average is arbitrary, and it's not hard to construct
mas01cr@498 32 cases where it is suboptimal. The two cases where it is not
mas01cr@498 33 arbitrary are when pointNN is 1, and when trackNN is equal to the
mas01cr@498 34 number of files in the database.)
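
A rough sketch of the ranking described above, to make the remark about averages concrete. Everything here is an assumption for illustration: the Match struct, the rank_tracks() name and the "smaller distance is closer" convention are not taken from the audioDB sources.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical match record; field names are illustrative only.
struct Match {
    std::string track;
    double dist;  // assumed: smaller means closer
};

typedef std::pair<std::string, double> TrackScore;  // (track, average distance)

static bool lower_average(const TrackScore& a, const TrackScore& b) {
    return a.second < b.second;
}

// Keep the point_nn closest matches in each track, then return the
// track_nn tracks with the smallest average distance over those matches.
std::vector<TrackScore> rank_tracks(const std::vector<Match>& matches,
                                    std::size_t point_nn,
                                    std::size_t track_nn) {
    std::map<std::string, std::vector<double> > per_track;
    for (std::size_t i = 0; i < matches.size(); ++i)
        per_track[matches[i].track].push_back(matches[i].dist);

    std::vector<TrackScore> ranked;
    for (std::map<std::string, std::vector<double> >::iterator it = per_track.begin();
         it != per_track.end(); ++it) {
        std::vector<double>& d = it->second;
        std::sort(d.begin(), d.end());
        if (d.size() > point_nn) d.resize(point_nn);  // retain only the closest
        if (d.empty()) continue;                      // point_nn == 0: nothing to average
        double avg = std::accumulate(d.begin(), d.end(), 0.0) / d.size();
        ranked.push_back(TrackScore(it->first, avg));
    }
    std::stable_sort(ranked.begin(), ranked.end(), lower_average);
    if (ranked.size() > track_nn) ranked.resize(track_nn);
    return ranked;
}
```

With track A holding distances 0.1 and 0.9 and track B holding 0.45 and 0.45, pointNN = 2 ranks B first (average 0.45 against 0.5) even though A contains the single closest match; with pointNN = 1 the order flips and A comes first.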
mas01cr@498 35
mas01cr@498 36 O2_ONE_TO_ONE_N_SEQUENCE_QUERY
mas01cr@498 37 + radius
mas01cr@498 38 * euclidean_normed
mas01cr@498 39
mas01cr@498 40 For all applicable query sequences, find and report the closest
mas01cr@498 41 target instance point. Each query sequence is responsible for
mas01cr@498 42 exactly one result.
mas01cr@498 43
mas01cr@498 44 (This feels like it should be more orthogonal than a separate
mas01cr@498 45 query type; the restriction on using a target instance point only
mas01cr@498 46 once in a match seems like it should compose with the sequencing
mas01cr@498 47 query above.)
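
A minimal sketch of the "exactly one result per query sequence" part, assuming candidates arrive as (query position, target position, distance) triples. The Result struct and closest_per_query() are hypothetical names, and the restriction on using a target instance point only once is deliberately not modelled here.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical candidate record; not the libaudiodb result type.
struct Result {
    unsigned qpos;  // query sequence position
    unsigned ipos;  // target instance position
    double dist;    // assumed: smaller means closer
};

// For each query position, keep only the closest candidate seen.
// Output is ordered by query position.
std::vector<Result> closest_per_query(const std::vector<Result>& candidates) {
    std::map<unsigned, Result> best;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const Result& c = candidates[i];
        std::map<unsigned, Result>::iterator it = best.find(c.qpos);
        if (it == best.end())
            best.insert(std::make_pair(c.qpos, c));
        else if (c.dist < it->second.dist)
            it->second = c;
    }
    std::vector<Result> out;
    for (std::map<unsigned, Result>::const_iterator it = best.begin();
         it != best.end(); ++it)
        out.push_back(it->second);
    return out;
}
```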
mas01cr@498 48
mas01cr@498 49 Plan:
mas01cr@498 50
mas01cr@498 51 We have
mas01cr@498 52
mas01cr@498 53 reporter->add_point(),
mas01cr@498 54 reporter->report().
mas01cr@498 55
mas01cr@498 56 Insert into the whole shebang a new class Accumulator, with methods
mas01cr@498 57
mas01cr@498 58 void accumulator->add_point()
mas01cr@498 59 adb_query_results *accumulator->get_points()
mas01cr@498 60
mas01cr@498 61 The accumulator has to be responsible for keeping track of how many
mas01cr@498 62 points (total, or per track) there are so far; ->get_points() has to
mas01cr@498 63 make the final decision about which points to preserve. So sadly we
mas01cr@498 64 can't be completely on the side of the angels and have only one single
mas01cr@498 65 accumulator class, as POINT_QUERY is different from all the others.
mas01cr@498 66 (Though maybe we can, with a suitably careful use of the "if"
mas01cr@498 67 construct.)
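
One way the accumulator split might look, as a sketch only. The Result struct is a hypothetical stand-in for an entry of adb_query_results, the class names are invented, and the sketch assumes smaller distances are better. Selecting the top "trackNN" tracks is left out to keep it short.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the query results.
struct Result {
    std::string track;
    unsigned qpos, ipos;
    double dist;  // assumed: smaller is better
};

inline bool closer(const Result& a, const Result& b) { return a.dist < b.dist; }

// Common interface: collect points, decide at the end which to preserve.
class Accumulator {
public:
    virtual ~Accumulator() {}
    virtual void add_point(const Result& r) = 0;
    virtual std::vector<Result> get_points() = 0;
};

// POINT_QUERY flavour: at most point_nn results over the whole database.
class GlobalAccumulator : public Accumulator {
public:
    explicit GlobalAccumulator(std::size_t point_nn) : point_nn_(point_nn) {}
    void add_point(const Result& r) { points_.push_back(r); }
    std::vector<Result> get_points() {
        std::vector<Result> out(points_);
        std::sort(out.begin(), out.end(), closer);
        if (out.size() > point_nn_) out.resize(point_nn_);
        return out;
    }
private:
    std::size_t point_nn_;
    std::vector<Result> points_;
};

// Flavour for the other query types: at most point_nn results per track.
class PerTrackAccumulator : public Accumulator {
public:
    explicit PerTrackAccumulator(std::size_t point_nn) : point_nn_(point_nn) {}
    void add_point(const Result& r) { by_track_[r.track].push_back(r); }
    std::vector<Result> get_points() {
        std::vector<Result> out;
        for (std::map<std::string, std::vector<Result> >::iterator it = by_track_.begin();
             it != by_track_.end(); ++it) {
            std::vector<Result> pts(it->second);
            std::sort(pts.begin(), pts.end(), closer);
            if (pts.size() > point_nn_) pts.resize(point_nn_);
            out.insert(out.end(), pts.begin(), pts.end());
        }
        return out;
    }
private:
    std::size_t point_nn_;
    std::map<std::string, std::vector<Result> > by_track_;
};
```

The two classes differ only in whether the pointNN limit applies globally or per track, which is the distinction a single class with an "if" would have to carry.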
mas01cr@498 68
mas01cr@498 69 We don't have to alter the Reporter class at all. The query loop goes
mas01cr@498 70 roughly
mas01cr@498 71
mas01cr@498 72 choose point pair
mas01cr@498 73 if(everything OK with point pair)
mas01cr@498 74 accumulator->add_point()
mas01cr@498 75 loop
mas01cr@498 76
mas01cr@498 77 results = accumulator->get_points()
mas01cr@498 78
mas01cr@498 79 for matches in results
mas01cr@498 80 reporter->add_point(match)
mas01cr@498 81 loop
mas01cr@498 82
mas01cr@498 83 reporter->report()
mas01cr@498 84
mas01cr@498 85 This separation is engineered (ha) such that everything after the last
mas01cr@498 86 use of the accumulator doesn't need to be in libaudiodb; the return
mas01cr@498 87 value from audiodb_query() can be "results" in the above, and then the
mas01cr@498 88 command-line binary and SOAP server can do whatever weird mangling of
mas01cr@498 89 the results they want.
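
The split between library and front end could be sketched roughly as follows. All types here are hypothetical toys: the real Accumulator and Reporter would do much more, and the radius test merely stands in for "everything OK with point pair".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result record.
struct Result {
    std::string track;
    double dist;
};

// Toy accumulator: keeps every point it is given.
struct Accumulator {
    std::vector<Result> kept;
    void add_point(const Result& r) { kept.push_back(r); }
    std::vector<Result> get_points() const { return kept; }
};

// Toy reporter: collects formatted lines rather than printing them.
struct Reporter {
    std::vector<std::string> lines;
    void add_point(const Result& r) { lines.push_back(r.track); }
    std::size_t report() const { return lines.size(); }
};

// Library side: everything up to the last use of the accumulator.
// The return value plays the role of "results" in the loop above.
std::vector<Result> query(const std::vector<Result>& candidates, double radius) {
    Accumulator acc;
    for (std::size_t i = 0; i < candidates.size(); ++i)
        if (candidates[i].dist <= radius)  // assumed acceptance test
            acc.add_point(candidates[i]);
    return acc.get_points();
}

// Front-end side: the command-line binary or server feeds the results
// to whatever reporter it likes.
std::size_t report_results(const std::vector<Result>& results, Reporter& rep) {
    for (std::size_t i = 0; i < results.size(); ++i)
        rep.add_point(results[i]);
    return rep.report();
}
```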
mas01cr@498 90
mas01cr@498 91 We still need to be careful in the accumulator to defend against some
mas01cr@498 92 of the weird things that our query implementation might choose to do,
mas01cr@498 93 such as inserting the same hit multiple times.
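
A sketch of one possible defence against repeated hits, assuming a hit is identified by its (track, query position, target position) triple. The Result struct and DedupAccumulator are hypothetical, and the real notion of "the same hit" may well differ.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical result record.
struct Result {
    std::string track;
    unsigned qpos;
    unsigned ipos;
    double dist;
};

class DedupAccumulator {
public:
    // Returns true if the point was stored, false if a point with the
    // same (track, qpos, ipos) key had already been added.
    bool add_point(const Result& r) {
        if (!seen_.insert(std::make_tuple(r.track, r.qpos, r.ipos)).second)
            return false;
        points_.push_back(r);
        return true;
    }
    const std::vector<Result>& get_points() const { return points_; }
private:
    std::set<std::tuple<std::string, unsigned, unsigned> > seen_;
    std::vector<Result> points_;
};
```

Keying on positions rather than on the distance avoids comparing floating-point values for equality.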