annotate docs/TODO.txt @ 162:c47ef2b74c10 powertable

Power searches with non-trivial sequences
author mas01cr
date Thu, 01 Nov 2007 16:54:28 +0000
parents 7c7072a8626a
children ecfa25f72b7e
rev   line source
mas01cr@93 1 * development of functionality
mas01cr@93 2
mas01cr@103 3 ** exposure of all non-write functions over Web Services
mas01cr@103 4
mas01cr@103 5 At present, the radius / counting query type isn't supported over
mas01cr@103 6 SOAP. Supporting it involves changing the adb__query() exported
mas01cr@103 7 function, so we need to be careful.
mas01cr@103 8
mas01cr@93 9 ** matrix of possible queries
mas01cr@93 10
mas01cr@93 11 At the moment, there are four content-based query types, each of which
mas01cr@93 12 does something slightly different from what you might expect from its
mas01cr@93 13 name. I think that the space of possible (sensible?) queries is
mas01cr@93 14 larger than this -- though working out the sensible abstraction might
mas01cr@93 15 have to wait for more use cases -- and also that the orthogonality of
mas01cr@93 16 various parameters is missing. (e.g. a silence threshold should be
mas01cr@93 17 applied to all queries or none, if it makes sense at all.)
mas01cr@93 18
mas01cr@93 19 Additionally, query by key (filename) might be important.
mas01cr@93 20
mas01cr@93 21 ** results
mas01cr@93 22
mas01cr@93 23 Need to sort out what the results mean; is it a similarity or a
mas01cr@93 24 distance score, etc. Also, is it possible to support NN queries in a
mas01cr@93 25 non-Euclidean space?
mas01cr@93 26
mas01cr@93 27 ** SOAP / URIs
mas01cr@93 28
mas01cr@93 29 At the moment, the query and database are referred to by paths naming
mas01cr@93 30 files on the SOAP server's filesystem. This makes a limited amount of
mas01cr@93 31 sense for the database (though exposing implementation details of
mas01cr@93 32 ISMS's file system is not a great idea) but makes no sense at all for
mas01cr@93 33 the query. So we need to define a query data structure that can be
mas01cr@93 34 serialised (preferably automatically) by SOAP for use in queries.
mas01cr@93 35
mas01cr@93 36 If we ever support inserting or other write functionality over SOAP,
mas01cr@93 37 this will need doing for feature files (the same as queries) and for
mas01cr@93 38 key lists too.
mas01cr@93 39
mas01cr@93 40 ** Memory management tricks
mas01cr@93 41
mas01cr@93 42 We have a friendly memory access pattern (at least on Unixoids;
mas01cr@93 43 Win32's API isn't a great match for mmap(), so it is significantly
mas01cr@93 44 slower there). Investigate whether madvise() tricks improve
mas01cr@93 45 performance on any OSes. Also, maybe investigate a specialized use of
mas01cr@93 46 MapViewOfFile on Win32 to make it tolerable on that platform.
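
A minimal sketch of the kind of madvise() experiment meant here, assuming a
plain front-to-back read of a mapped file. The helper name
sum_file_sequential is hypothetical and is not part of audioDB; whether the
hint helps at all is exactly what the benchmark would need to show.

```c
#define _DEFAULT_SOURCE   /* exposes madvise() and MADV_* on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only, hint sequential access,
   and sum its bytes in one front-to-back pass.
   Returns 0 and stores the sum in *out on success, -1 on error. */
int sum_file_sequential(const char *path, unsigned long *out)
{
    assert(path != NULL && out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;            /* mmap() rejects zero-length mappings */
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    /* Advisory only: the kernel may ignore it, so failure is not fatal. */
    (void)madvise(p, len, MADV_SEQUENTIAL);

    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap(p, len);
    *out = sum;
    return 0;
}
```

Timing the same pass with and without the madvise() call, on each OS of
interest, would give a first answer.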
mas01cr@93 47
mas01cr@93 48 ** LSH
mas01cr@93 49
mas01cr@93 50 Integrate the LSH indexing with the database. Can it be done as a
mas01cr@93 51 separate index file, created on demand? What are we trying to
mas01cr@93 52 optimize our on-disk format for, and can it be better optimized by
mas01cr@93 53 having multiple files?
mas01cr@93 54
mas01cr@93 55 ** RDF (not necessarily related to audioDB)
mas01cr@93 56
mas01cr@93 57 Export the results of our experiments (kept in an SQL database) as
mas01cr@93 58 RDF, so that people can infer stuff if they know enough about our
mas01cr@93 59 methods.
mas01cr@93 60
mas01cr@93 61 Possibly also write an export routine for exporting an audioDB as RDF.
mas01cr@93 62 And laugh hollowly as XML parsers fail completely to ingest such a
mas01cr@93 63 monstrous file.
mas01cr@93 64
mas01cr@93 65 * architectural issues
mas01cr@93 66
mas01cr@139 67 ** more safety
mas01cr@139 68
mas01cr@139 69 A couple of areas are not yet safe against runtime faults. The simple
mas01cr@139 70 case is zero-length features, which will lead to division by zero
mas01cr@139 71 errors; more pressingly, large databases might well end up writing off
mas01cr@139 72 the end of the various tables (e.g. track, l2norm).
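
The two guards described above could look roughly like this. It is a sketch
only: struct table, table_append and feature_mean are hypothetical names, and
the real track and l2norm tables are laid out differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-capacity table standing in for the track or
   l2norm tables. */
struct table {
    double *data;
    size_t  used;
    size_t  capacity;
};

/* Append one value, refusing rather than writing past the end.
   Returns 0 on success, -1 if the table is full. */
int table_append(struct table *t, double value)
{
    assert(t != NULL);
    if (t->used >= t->capacity)
        return -1;
    t->data[t->used++] = value;
    return 0;
}

/* Mean of a feature vector. A zero-length feature is reported as an
   error instead of dividing by zero. */
int feature_mean(const double *v, size_t n, double *out)
{
    assert(out != NULL);
    if (v == NULL || n == 0)
        return -1;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    *out = sum / (double)n;
    return 0;
}
```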
mas01cr@139 73
mas01cr@93 74 ** API vs command-line
mas01cr@93 75
mas01cr@93 76 While having a command line interface is nice, the only way to
mas01cr@93 77 initialize a new audioDB instance is to fake up enough of a command
mas01cr@93 78 line to call our wacky constructors, which is less nice.
mas01cr@93 79 Furthermore, having the "business logic" run by the constructor is
mas01cr@93 80 also a little bit weird.
mas01cr@93 81
mas01cr@93 82 * regression (and other) tests
mas01cr@93 83
mas01cr@93 84 ** Command line interface
mas01cr@93 85
mas01cr@93 86 There is now broad coverage of the audioDB logic, with the major
mas01cr@93 87 exceptions of the batch insert command, and the specifying of
mas01cr@93 88 different keys on import.
mas01cr@93 89
mas01cr@93 90 ** SOAP
mas01cr@93 91
mas01cr@93 92 The shell's support for wait() and equivalents is limited, so there
mas01cr@93 93 are "sleep 1"s dotted around to attempt to avoid race conditions.
mas01cr@93 94 Find a better way. Similarly, using SO_REUSEADDR in bind() is a hack
mas01cr@93 95 that ought not to be necessary just to run the same test twice...
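
One possible better way, offered as an assumption rather than a plan: have
the test server bind to port 0 so the kernel picks a free port, then report
that port to the test script. The function name listen_on_free_port is
hypothetical. Once listen() has returned, the kernel queues incoming
connections, so a client that learns the port this way should not need a
"sleep 1" first.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on a kernel-chosen loopback TCP port.
   Returns the listening fd and stores the port in *port, or -1 on error. */
int listen_on_free_port(unsigned short *port)
{
    assert(port != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);        /* 0 asks the kernel for a free port */

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    *port = ntohs(addr.sin_port);    /* the port actually assigned */
    return fd;
}
```

Because each run gets its own port, running the same test twice in a row
would not depend on SO_REUSEADDR.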
mas01cr@93 96
mas01cr@93 97 ** Locking
mas01cr@93 98
mas01cr@93 99 The fcntl() locking should be good enough for our uses. Investigate
mas01cr@93 100 whether it is in fact robust enough (including that EAGAIN workaround
mas01cr@93 101 for OS X; read the kernel source to find out where that's coming from
mas01cr@93 102 and report it if possible).
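
For reference while investigating, a sketch of whole-file fcntl() locking
with a bounded retry when the lock is busy. The existing EAGAIN workaround is
not reproduced here; this assumes it amounts to retrying, and the name
lock_whole_file is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: apply a whole-file lock of `type` (F_RDLCK,
   F_WRLCK or F_UNLCK) to fd, retrying up to `attempts` times while
   another process holds a conflicting lock.
   Returns 0 on success, -1 on error or when the attempts run out. */
int lock_whole_file(int fd, short type, int attempts)
{
    assert(attempts > 0);

    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                        /* 0 means "to end of file" */

    struct timespec pause = { 0, 10 * 1000 * 1000 };   /* 10 ms */

    for (int i = 0; i < attempts; i++) {
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return 0;
        /* POSIX allows either errno for a lock held elsewhere. */
        if (errno != EAGAIN && errno != EACCES)
            return -1;                   /* a real error: give up */
        nanosleep(&pause, NULL);
    }
    return -1;
}
```

A robustness test would run this from two processes (and, for the NFS case,
from two hosts) and check that a write lock is never granted to both at once.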
mas01cr@93 103
mas01cr@93 104 ** Benchmarks
mas01cr@93 105
mas01cr@93 106 Get together a realistic set of usage cases, preferably testing each
mas01cr@93 107 of the query types, and benchmark them automatically. This is
mas01cr@93 108 basically a prerequisite of any performance work.
mas01cr@93 109
mas01cr@93 110 * Michael's old TODO list
mas01cr@14 111
mas01cr@14 112 audioDB FIXME:
mas01cr@14 113
mas01cr@14 114 o fix segfault when query is zero-length
mas01mc@20 115 :-) DONE use periodic memunmap on batch insert
mas01cr@14 116 o allow keys to be passed as queries
mas01mc@20 117 :-) DONE rename 'segments' to 'tracks' in code and help files.
mas01cr@14 118 o test suite
mas01cr@14 119 o SOAP to serialize queryFile and keyList
mas01cr@14 120 o SOAP to serialize files on insert / batch insert ?
mas01cr@33 121 :-) DONE don't overwrite existing files on db create
mas01cr@34 122 :-) DONE implement fcntl()-based locking.
mas01cr@35 123 o test locking discipline (particularly over NFS between heterogeneous clients)
mas01cr@14 124
mas01mc@20 125 M. Casey 13/08/07
mas01cr@14 126