annotate docs/TODO.txt @ 249:1da9a9ed55a3

Slightly refactored the new trackSequenceQueryNNReporter so that it is a derived class of trackAveragingReporter. This reduces code duplication significantly. The reporter is still accessed via the nsequence QUERY directive from the command line.
author mas01mc
date Sun, 17 Feb 2008 16:39:57 +0000
parents 10bcea4e5c40
children
rev   line source
mas01cr@93 1 * development of functionality
mas01cr@93 2
mas01cr@103 3 ** exposure of all non-write functions over Web Services
mas01cr@103 4
mas01cr@103 5 At present, the radius / counting query type isn't supported over
mas01cr@103 6 SOAP. Supporting it involves changing the adb__query() exported
mas01cr@168 7 function, so we need to be careful; or at least defining a new
mas01cr@168 8 exported interface, at which point perhaps it should be designed
mas01cr@168 9 rather than accreted... (probably this will come naturally after the
mas01cr@168 10 following TODO item)
mas01cr@103 11
mas01cr@93 12 ** matrix of possible queries
mas01cr@93 13
mas01cr@93 14 At the moment, there are four content-based query types, each of which
mas01cr@93 15 does something slightly different from what you might expect from its
mas01cr@93 16 name. I think that the space of possible (sensible?) queries is
mas01cr@93 17 larger than this -- though working out the sensible abstraction might
mas01cr@93 18 have to wait for more use cases -- and also that the orthogonality of
mas01cr@93 19 various parameters is missing. (e.g. a silence threshold should be
mas01cr@93 20 applied to all queries or none, if it makes sense at all.)
mas01cr@93 21
mas01cr@93 22 Additionally, query by key (filename) might be important.
mas01cr@93 23
mas01cr@93 24 ** results
mas01cr@93 25
mas01cr@93 26 Need to sort out what the results mean; is it a similarity or a
mas01cr@93 27 distance score, etc. Also, is it possible to support NN queries in a
mas01cr@93 28 non-Euclidean space?
mas01cr@93 29
mas01cr@93 30 ** SOAP / URIs
mas01cr@93 31
mas01cr@93 32 At the moment, the query and database are referred to by paths naming
mas01cr@93 33 files on the SOAP server's filesystem. This makes a limited amount of
mas01cr@93 34 sense for the database (though exposing implementation details of
mas01cr@93 35 ISMS's file system is not a great idea) but makes no sense at all for
mas01cr@93 36 the query. So we need to define a query data structure that can be
mas01cr@93 37 serialised (preferably automatically) by SOAP for use in queries.
mas01cr@93 38
mas01cr@93 39 If we ever support inserting or other write functionality over SOAP,
mas01cr@93 40 this will need doing for feature files (the same as queries) and for
mas01cr@93 41 key lists too.
mas01cr@93 42
mas01cr@93 43 ** Memory management tricks
mas01cr@93 44
mas01cr@93 45 We have a friendly memory access pattern (at least on Unixoids;
mas01cr@93 46 Win32's API isn't a great match for mmap(), so it is significantly
mas01cr@93 47 slower there). Investigate whether madvise() tricks improve
mas01cr@93 48 performance on any OSes. Also, maybe investigate a specialized use of
mas01cr@93 49 MapViewOfFile on win32 to make it tolerable on that platform.
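As a starting point for the madvise() investigation, here is a minimal sketch of a sequential read through a read-only mapping with an access-pattern hint. The helper name sum_file_sequential is hypothetical and is not part of audioDB; whether MADV_SEQUENTIAL (or any other hint) actually helps is exactly what would need benchmarking.

```c
#define _DEFAULT_SOURCE   /* exposes madvise() under strict -std= modes on glibc */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sum every byte of a file through a read-only
   mapping, telling the kernel that access will be sequential.
   Returns the sum, or -1 on error (including an empty file). */
long sum_file_sequential(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    /* The hint is advisory only, so a failure here is deliberately ignored. */
    (void)madvise(p, len, MADV_SEQUENTIAL);

    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap(p, len);
    return sum;
}
```

Timing the same loop with and without the madvise() call, on a file larger than RAM, would give a first indication of whether the hint is worth keeping on a given OS.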
mas01cr@93 50
mas01cr@93 51 ** LSH
mas01cr@93 52
mas01cr@93 53 Integrate the LSH indexing with the database. Can it be done as a
mas01cr@93 54 separate index file, created on demand? What are we trying to
mas01cr@93 55 optimize our on-disk format for, and can it be better optimized by
mas01cr@93 56 having multiple files?
mas01cr@93 57
mas01cr@93 58 ** RDF (not necessarily related to audioDB)
mas01cr@93 59
mas01cr@93 60 Export the results of our experiments (kept in an SQL database) as
mas01cr@93 61 RDF, so that people can infer stuff if they know enough about our
mas01cr@93 62 methods.
mas01cr@93 63
mas01cr@93 64 Possibly also write an export routine for exporting an audioDB as RDF.
mas01cr@93 65 And laugh hollowly as XML parsers fail completely to ingest such a
mas01cr@93 66 monstrous file.
mas01cr@93 67
mas01cr@93 68 * architectural issues
mas01cr@93 69
mas01cr@139 70 ** more safety
mas01cr@139 71
mas01cr@170 72 A couple of areas are not yet safe against runtime faults.
mas01cr@170 73
mas01cr@170 74 *** Large databases might well end up writing off the end of the
mas01cr@170 75 various tables (e.g. track, l2norm).
mas01cr@170 76
mas01cr@170 77 *** transactionality is important; the last things to be
mas01cr@170 78 updated on insert should be the free pointers (dbH->length,
mas01cr@170 79 dbH->numFiles, maybe others), so that if something goes wrong in
mas01cr@170 80 the meantime the database is not in an inconsistent state.
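The two points above can be illustrated with a toy, in-memory sketch: check bounds before writing, and advance the free pointers only after the payload is in place. Only the field names length and numFiles come from the text above; the structs, the 64-byte table and toy_insert are assumptions made for illustration and do not reflect the real audioDB layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified header. */
struct toy_header {
    size_t length;     /* bytes of the data table in use */
    size_t numFiles;   /* number of tracks stored */
};

struct toy_db {
    struct toy_header hdr;
    unsigned char data[64];   /* stands in for the data table */
};

/* Append one track. Returns 0 on success, -1 if it would not fit.
   The header is touched only after the payload has been copied, so a
   failure part-way leaves length/numFiles describing the old state. */
int toy_insert(struct toy_db *db, const void *track, size_t n)
{
    /* Bounds check first: never write off the end of the table. */
    if (n > sizeof db->data - db->hdr.length)
        return -1;

    /* Step 1: copy the payload into space the header does not yet cover. */
    memcpy(db->data + db->hdr.length, track, n);

    /* Step 2 (last): publish the new state by advancing the free pointers. */
    db->hdr.length += n;
    db->hdr.numFiles += 1;
    return 0;
}
```

On a real memory-mapped file the ordering would also have to survive write-back, so something like an msync() between the two steps would probably be needed; that part is not shown here.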
mas01cr@139 81
mas01cr@93 82 ** API vs command-line
mas01cr@93 83
mas01cr@93 84 While having a command line interface is nice, it is less nice that
mas01cr@93 85 the only way to initialize a new audioDB instance is to fake up
mas01cr@93 86 enough of a command line to call our wacky constructors.
mas01cr@93 87 Furthermore, having the "business logic" run by the constructor is
mas01cr@93 88 also a little bit weird.
mas01cr@93 89
mas01cr@93 90 * regression (and other) tests
mas01cr@93 91
mas01cr@93 92 ** Command line interface
mas01cr@93 93
mas01cr@93 94 There is now broad coverage of the audioDB logic, with the major
mas01cr@93 95 exceptions of the batch insert command, and the specifying of
mas01cr@93 96 different keys on import.
mas01cr@93 97
mas01cr@93 98 ** SOAP
mas01cr@93 99
mas01cr@93 100 The shell's support for wait() and equivalents is limited, so there
mas01cr@93 101 are "sleep 1"s dotted around to attempt to avoid race conditions.
mas01cr@93 102 Find a better way. Similarly, using SO_REUSEADDR in bind() is a hack
mas01cr@93 103 that ought not to be necessary just to run the same test twice...
mas01cr@93 104
mas01cr@93 105 ** Locking
mas01cr@93 106
mas01cr@93 107 The fcntl() locking should be good enough for our uses. Investigate
mas01cr@93 108 whether it is in fact robust enough (including that EAGAIN workaround
mas01cr@93 109 for OS X; read the kernel source to find out where that's coming from
mas01cr@93 110 and report it if possible).
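For reference while investigating, a minimal sketch of whole-file fcntl() locking with a bounded retry. The helper name lock_whole_file is hypothetical, and the retry on EAGAIN is only a guess at what a workaround might look like; the actual audioDB workaround is not reproduced here.

```c
#define _DEFAULT_SOURCE   /* exposes nanosleep() under strict -std= modes on glibc */
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: apply a whole-file lock of the given type
   (F_RDLCK, F_WRLCK or F_UNLCK), waiting until it is granted.
   Returns 0 on success, -1 on error or after too many retries. */
int lock_whole_file(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                 /* 0 means "to end of file" */

    const struct timespec delay = {0, 10 * 1000 * 1000};   /* 10 ms */
    for (int tries = 0; tries < 100; tries++) {
        if (fcntl(fd, F_SETLKW, &fl) == 0)
            return 0;
        /* EINTR: interrupted by a signal. EAGAIN is not normally expected
           from F_SETLKW; retrying on it is an assumption, see above. */
        if (errno != EINTR && errno != EAGAIN)
            return -1;
        nanosleep(&delay, NULL);
    }
    return -1;
}
```

Note that fcntl() locks are per-process, so a robustness test needs at least two processes (and, for the NFS case, two hosts) contending for the same file.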
mas01cr@93 111
mas01cr@93 112 ** Benchmarks
mas01cr@93 113
mas01cr@93 114 Get together a realistic set of usage cases, preferably testing each
mas01cr@93 115 of the query types, and benchmark them automatically. This is
mas01cr@93 116 basically a prerequisite of any performance work.
mas01cr@93 117
mas01cr@93 118 * Michael's old TODO list
mas01cr@14 119
mas01cr@14 120 audioDB FIXME:
mas01cr@14 121
mas01cr@14 122 o fix segfault when query is zero-length
mas01mc@20 123 :-) DONE use periodic memunmap on batch insert
mas01cr@14 124 o allow keys to be passed as queries
mas01mc@20 125 :-) DONE rename 'segments' to 'tracks' in code and help files.
mas01cr@14 126 o test suite
mas01cr@14 127 o SOAP to serialize queryFile and keyList
mas01cr@14 128 o SOAP to serialize files on insert / batch insert ?
mas01cr@33 129 :-) DONE don't overwrite existing files on db create
mas01cr@34 130 :-) DONE implement fcntl()-based locking.
mas01cr@35 131 o test locking discipline (particularly over NFS between heterogeneous clients)
mas01cr@14 132
mas01mc@20 133 M. Casey 13/08/07
mas01cr@14 134