annotate query.txt @ 497:9d8aee621afb api-inversion

More libtests fixups. Include audiodb_close() calls everywhere (whoops). Add the facility to run tests under valgrind. Unfortunately the error-exitcode flag doesn't actually cause an error exit if the only thing wrong is memory leaks, but it will if there are actual memory errors, which is a start.
author mas01cr
date Sat, 10 Jan 2009 16:07:43 +0000
parents c52561457dcd
children
rev   line source
mas01cr@417 1 Currently supported query types:
mas01cr@417 2
mas01cr@417 3 O2_POINT_QUERY
mas01cr@417 4 * dot_product
mas01cr@417 5
mas01cr@417 6 Find and report, from the database, up to "pointNN"
mas01cr@417 7 near-neighbours of length-1 query sequences.
mas01cr@417 8
mas01cr@417 9 O2_TRACK_QUERY
mas01cr@417 10 * dot_product
mas01cr@417 11
mas01cr@417 12 Find, in each track, up to "pointNN" near-neighbours of length-1
mas01cr@417 13 query sequences, reporting the top "trackNN" tracks, ordered by
mas01cr@417 14 the average distance of the pairwise matches.
mas01cr@417 15
mas01cr@417 16 O2_SEQUENCE_QUERY
mas01cr@417 17 - radius, + radius
mas01cr@417 18 * euclidean_normed, euclidean
mas01cr@417 19
mas01cr@417 20 O2_N_SEQUENCE_QUERY
mas01cr@417 21 - radius, + radius
mas01cr@417 22 * euclidean_normed, euclidean
mas01cr@417 23
mas01cr@417 24 Find, in each track, up to "pointNN" near-neighbours of query
mas01cr@417 25 sequences. Report the results from the "trackNN" top tracks,
mas01cr@417 26 where the tracks are ordered by the average distance of the
mas01cr@417 27 retrieved pairwise matches. The difference between SEQUENCE and
mas01cr@417 28 N_SEQUENCE is that the SEQUENCE case reports only the average,
mas01cr@417 29 while the N_SEQUENCE case reports the individual points too.
mas01cr@417 30
mas01cr@417 31 (Ordering by average is arbitrary, and it's not hard to construct
mas01cr@417 32 cases where it is suboptimal. The two cases where it is not
mas01cr@417 33 arbitrary are when pointNN is 1, and when trackNN is equal to the
mas01cr@417 34 number of files in the database.)
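
To make the ordering concrete, here is a minimal sketch of "best pointNN matches per track, then rank tracks by average". The Match struct and rank_tracks() are hypothetical names for illustration, not libaudiodb types, and the sketch assumes that a smaller distance is a better match (which would not hold for dot_product).

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical match record; not the libaudiodb type.
struct Match {
  std::string track;
  double dist;  // assumption: smaller means closer
};

typedef std::pair<std::string, double> TrackScore;

static bool lower_average(const TrackScore &a, const TrackScore &b) {
  return a.second < b.second;
}

// For each track, average its point_nn smallest distances, then return
// the track_nn tracks with the smallest averages, best first.
std::vector<TrackScore> rank_tracks(const std::vector<Match> &matches,
                                    std::size_t point_nn,
                                    std::size_t track_nn) {
  std::map<std::string, std::vector<double> > per_track;
  for (const Match &m : matches)
    per_track[m.track].push_back(m.dist);

  std::vector<TrackScore> ranked;
  for (auto &entry : per_track) {
    std::vector<double> &d = entry.second;
    std::size_t keep = std::min(point_nn, d.size());
    if (keep == 0)
      continue;  // point_nn == 0: nothing to average
    std::ptrdiff_t k = static_cast<std::ptrdiff_t>(keep);
    std::partial_sort(d.begin(), d.begin() + k, d.end());
    double avg = std::accumulate(d.begin(), d.begin() + k, 0.0) /
                 static_cast<double>(keep);
    ranked.push_back(TrackScore(entry.first, avg));
  }
  std::stable_sort(ranked.begin(), ranked.end(), lower_average);
  if (ranked.size() > track_nn)
    ranked.resize(track_nn);
  return ranked;
}
```

With matches a:{1.0, 3.0, 100.0} and b:{1.5, 1.5}, point_nn = 2 ranks b ahead of a, while point_nn = 1 ranks a ahead of b, which is the kind of sensitivity the parenthetical above is pointing at.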
mas01cr@417 35
mas01cr@417 36 O2_ONE_TO_ONE_N_SEQUENCE_QUERY
mas01cr@417 37 + radius
mas01cr@417 38 * euclidean_normed
mas01cr@417 39
mas01cr@417 40 For all applicable query sequences, find and report the closest
mas01cr@417 41 target instance point. Each query sequence is responsible for
mas01cr@417 42 exactly one result.
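
A rough sketch of the one-to-one behaviour, under assumptions: Candidate and nearest_per_query() are hypothetical names, each query sequence is identified by its position (qpos), and the radius acts as an upper bound on acceptable distance, so a query sequence with nothing inside the radius yields no result here.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical candidate pair: query position, target position, distance.
struct Candidate {
  unsigned qpos;
  unsigned ipos;
  double dist;
};

// Keep, for each query position, only the closest candidate within radius.
std::vector<Candidate> nearest_per_query(const std::vector<Candidate> &cands,
                                         double radius) {
  std::map<unsigned, Candidate> best;
  for (const Candidate &c : cands) {
    if (c.dist > radius)
      continue;  // outside the radius: not applicable
    std::map<unsigned, Candidate>::iterator it = best.find(c.qpos);
    if (it == best.end())
      best.insert(std::make_pair(c.qpos, c));
    else if (c.dist < it->second.dist)
      it->second = c;  // strictly closer than what we had
  }
  std::vector<Candidate> out;  // ordered by query position
  for (const auto &entry : best)
    out.push_back(entry.second);
  return out;
}
```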
mas01cr@417 43
mas01cr@417 44 (This feels like it should be more orthogonal than a separate
mas01cr@417 45 query type; the restriction on using a target instance point only
mas01cr@417 46 once in a match seems like it should compose with the sequencing
mas01cr@417 47 query above.)
mas01cr@417 48
mas01cr@417 49 Plan:
mas01cr@417 50
mas01cr@417 51 We have
mas01cr@417 52
mas01cr@417 53 reporter->add_point(),
mas01cr@417 54 reporter->report().
mas01cr@417 55
mas01cr@417 56 Insert into the whole shebang a new class Accumulator, with methods
mas01cr@417 57
mas01cr@417 58 void accumulator->add_point()
mas01cr@417 59 adb_query_results *accumulator->get_points()
mas01cr@417 60
mas01cr@417 61 The accumulator has to be responsible for keeping track of how many
mas01cr@417 62 points (total, or per track) there are so far; ->get_points() has to
mas01cr@417 63 make the final decision about which points to preserve. So sadly we
mas01cr@417 64 can't be completely on the side of the angels and have a single
mas01cr@417 65 accumulator class, as POINT_QUERY keeps points across the whole
mas01cr@417 66 database rather than per track, unlike all the others. (Though maybe
mas01cr@417 67 we can with a suitably careful use of the "if" construct.)
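
One possible shape for this, as a sketch only: an abstract Accumulator with one subclass that keeps the closest points overall (the POINT_QUERY case) and one that keeps them per track. The Result struct stands in for an entry of adb_query_results, whose layout is not given here; the subclass names and the "smaller distance is better" ordering are likewise assumptions. The trackNN selection is left out to keep the sketch short.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of adb_query_results.
struct Result {
  std::string track;
  double dist;  // assumption: smaller means closer
  unsigned qpos;
  unsigned ipos;
};

static bool closer(const Result &a, const Result &b) { return a.dist < b.dist; }

class Accumulator {
public:
  virtual ~Accumulator() {}
  virtual void add_point(const Result &r) = 0;
  // Makes the final decision about which points to preserve.
  virtual std::vector<Result> get_points() = 0;
};

// POINT_QUERY-like: keep the point_nn closest points over the whole database.
class OverallAccumulator : public Accumulator {
public:
  explicit OverallAccumulator(std::size_t point_nn) : point_nn_(point_nn) {}
  void add_point(const Result &r) override { points_.push_back(r); }
  std::vector<Result> get_points() override {
    std::vector<Result> out(points_);
    std::stable_sort(out.begin(), out.end(), closer);
    if (out.size() > point_nn_)
      out.resize(point_nn_);
    return out;
  }

private:
  std::size_t point_nn_;
  std::vector<Result> points_;
};

// Everything else: keep the point_nn closest points in each track.
class PerTrackAccumulator : public Accumulator {
public:
  explicit PerTrackAccumulator(std::size_t point_nn) : point_nn_(point_nn) {}
  void add_point(const Result &r) override { by_track_[r.track].push_back(r); }
  std::vector<Result> get_points() override {
    std::vector<Result> out;
    for (const auto &entry : by_track_) {
      std::vector<Result> pts(entry.second);
      std::stable_sort(pts.begin(), pts.end(), closer);
      if (pts.size() > point_nn_)
        pts.resize(point_nn_);
      out.insert(out.end(), pts.begin(), pts.end());
    }
    return out;
  }

private:
  std::size_t point_nn_;
  std::map<std::string, std::vector<Result> > by_track_;
};
```

This sketch stores everything and prunes in get_points(); a real accumulator would more likely prune as it goes (a bounded heap, say) so that memory does not grow with the number of candidate pairs.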
mas01cr@417 68
mas01cr@417 69 We don't have to alter the Reporter class at all. The query loop goes
mas01cr@417 70 roughly
mas01cr@417 71
mas01cr@417 72 choose point pair
mas01cr@417 73 if(everything OK with point pair)
mas01cr@417 74   accumulator->add_point()
mas01cr@417 75 loop
mas01cr@417 76
mas01cr@417 77 results = accumulator->get_points()
mas01cr@417 78
mas01cr@417 79 for match in results
mas01cr@417 80   reporter->add_point(match)
mas01cr@417 81 loop
mas01cr@417 82
mas01cr@417 83 reporter->report()
mas01cr@417 84
mas01cr@417 85 This separation is engineered (ha) such that everything after the last
mas01cr@417 86 use of the accumulator doesn't need to be in libaudiodb; the return
mas01cr@417 87 value from audiodb_query() can be "results" in the above, and then the
mas01cr@417 88 command-line binary and SOAP server can do whatever weird mangling to
mas01cr@417 89 the results they want to.
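
A small sketch of where that boundary could sit. Everything named here is hypothetical: run_query() stands in for the library side (it is not the real audiodb_query() signature), TextReporter stands in for whatever the command-line binary or SOAP server does with the results, and the radius test is just a placeholder for "everything OK with point pair".

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result entry.
struct Result {
  std::string track;
  double dist;
};

static bool closer(const Result &a, const Result &b) { return a.dist < b.dist; }

// Library side: accumulate acceptable pairs, then return the survivors.
std::vector<Result> run_query(const std::vector<Result> &candidates,
                              double radius, std::size_t point_nn) {
  std::vector<Result> kept;
  for (const Result &c : candidates) {
    if (c.dist <= radius)   // "everything OK with point pair"
      kept.push_back(c);    // accumulator->add_point()
  }
  // accumulator->get_points(): final decision on what to preserve.
  std::stable_sort(kept.begin(), kept.end(), closer);
  if (kept.size() > point_nn)
    kept.resize(point_nn);
  return kept;
}

// Client side: formats results however it likes.
class TextReporter {
public:
  void add_point(const Result &r) { rows_.push_back(r); }
  std::string report() const {
    std::ostringstream out;
    for (const Result &r : rows_)
      out << r.track << ' ' << r.dist << '\n';
    return out.str();
  }

private:
  std::vector<Result> rows_;
};

// Client code: nothing below this line needs to live in the library.
std::string query_and_report(const std::vector<Result> &candidates,
                             double radius, std::size_t point_nn) {
  TextReporter reporter;
  for (const Result &match : run_query(candidates, radius, point_nn))
    reporter.add_point(match);
  return reporter.report();
}
```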
mas01cr@417 90
mas01cr@417 91 We still need to be careful in the accumulator to defend against some
mas01cr@417 92 of the weird things that our query implementation might choose to do:
mas01cr@417 93 insert the same hit multiple times or some such.
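
One way to defend against repeated hits, sketched under the assumption that a hit is identified by its (track, query position, target position) triple; Hit and DedupAccumulator are hypothetical names, and the choice of key is mine rather than anything fixed above.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical hit record.
struct Hit {
  std::string track;
  unsigned qpos;
  unsigned ipos;
  double dist;
};

class DedupAccumulator {
public:
  // Records the hit unless the same (track, qpos, ipos) was already seen.
  // Returns true if the hit was new.
  bool add_point(const Hit &h) {
    bool fresh = seen_.insert(std::make_tuple(h.track, h.qpos, h.ipos)).second;
    if (fresh)
      hits_.push_back(h);
    return fresh;
  }
  const std::vector<Hit> &get_points() const { return hits_; }

private:
  std::set<std::tuple<std::string, unsigned, unsigned> > seen_;
  std::vector<Hit> hits_;
};
```

The first insertion wins here; if a later duplicate could carry a different distance, the accumulator would need a policy for that too.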