annotate docs/TODO.txt @ 130:63ca70f2bf37

Add an exhaustive search test.
author mas01cr
date Fri, 19 Oct 2007 17:03:12 +0000
parents 2c58d5193287
children 7c7072a8626a
rev   line source
mas01cr@93 1 * development of functionality
mas01cr@93 2
mas01cr@103 3 ** exposure of all non-write functions over Web Services
mas01cr@103 4
mas01cr@103 5 At present, the radius / counting query type isn't supported over
mas01cr@103 6 SOAP. Supporting it involves changing the adb__query() exported
mas01cr@103 7 function, so we need to be careful.
mas01cr@103 8
mas01cr@93 9 ** matrix of possible queries
mas01cr@93 10
mas01cr@93 11 At the moment, there are four content-based query types, each of which
mas01cr@93 12 does something slightly different from what you might expect from its
mas01cr@93 13 name. I think that the space of possible (sensible?) queries is
mas01cr@93 14 larger than this -- though working out the sensible abstraction might
mas01cr@93 15 have to wait for more use cases -- and also that the orthogonality of
mas01cr@93 16 various parameters is missing. (e.g. a silence threshold should be
mas01cr@93 17 applied to all queries or none, if it makes sense at all.)
mas01cr@93 18
mas01cr@93 19 Additionally, query by key (filename) might be important.
mas01cr@93 20
mas01cr@93 21 ** results
mas01cr@93 22
mas01cr@93 23 Need to sort out what the results mean; is it a similarity or a
mas01cr@93 24 distance score, etc. Also, is it possible to support NN queries in a
mas01cr@93 25 non-Euclidean space?
mas01cr@93 26
mas01cr@93 27 ** SOAP / URIs
mas01cr@93 28
mas01cr@93 29 At the moment, the query and database are referred to by paths naming
mas01cr@93 30 files on the SOAP server's filesystem. This makes a limited amount of
mas01cr@93 31 sense for the database (though exposing implementation details of
mas01cr@93 32 ISMS's file system is not a great idea) but makes no sense at all for
mas01cr@93 33 the query. So we need to define a query data structure that can be
mas01cr@93 34 serialised (preferably automatically) by SOAP for use in queries.
mas01cr@93 35
mas01cr@93 36 If we ever support inserting or other write functionality over SOAP,
mas01cr@93 37 this will need doing for feature files (the same as queries) and for
mas01cr@93 38 key lists too.
mas01cr@93 39
mas01cr@93 40 ** Memory management tricks
mas01cr@93 41
mas01cr@93 42 We have a friendly memory access pattern (at least on Unixoids;
mas01cr@93 43 Win32's API isn't a great match for mmap(), so it is significantly
mas01cr@93 44 slower there). Investigate whether madvise() tricks improve
mas01cr@93 45 performance on any OSes. Also, maybe investigate a specialized use of
mas01cr@93 46 MapViewOfFile on Win32 to make it tolerable on that platform.
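
A minimal sketch of the kind of madvise() experiment meant above, written in Python for brevity (audioDB itself is not Python). The function name sequential_sum and the page-at-a-time scan are hypothetical stand-ins for a real query pass; the only point is to show where the hint goes so that runs with and without it can be timed against each other.

```python
import mmap


def sequential_sum(path, use_madvise=True):
    """Map a non-empty file read-only and scan it front to back.

    Hypothetical stand-in for a linear pass over feature data.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # mmap.madvise and the MADV_* constants need Python 3.8+ and
            # are only present on platforms that provide madvise().
            if (use_madvise and hasattr(m, "madvise")
                    and hasattr(mmap, "MADV_SEQUENTIAL")):
                m.madvise(mmap.MADV_SEQUENTIAL)
            total = 0
            page = mmap.PAGESIZE
            # Touch the mapping one page at a time, in order.
            for off in range(0, len(m), page):
                total += sum(m[off:off + page])
            return total
```

Whether the hint helps at all is exactly the open question; the result has to be identical either way, so only the timings are of interest.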
mas01cr@93 47
mas01cr@93 48 ** LSH
mas01cr@93 49
mas01cr@93 50 Integrate the LSH indexing with the database. Can it be done as a
mas01cr@93 51 separate index file, created on demand? What are we trying to
mas01cr@93 52 optimize our on-disk format for, and can it be better optimized by
mas01cr@93 53 having multiple files?
mas01cr@93 54
mas01cr@93 55 ** RDF (not necessarily related to audioDB)
mas01cr@93 56
mas01cr@93 57 Export the results of our experiments (kept in an SQL database) as
mas01cr@93 58 RDF, so that people can infer stuff if they know enough about our
mas01cr@93 59 methods.
mas01cr@93 60
mas01cr@93 61 Possibly also write an export routine for exporting an audioDB as RDF.
mas01cr@93 62 And laugh hollowly as XML parsers fail completely to ingest such a
mas01cr@93 63 monstrous file.
mas01cr@93 64
mas01cr@93 65 * architectural issues
mas01cr@93 66
mas01cr@93 67 ** API vs command-line
mas01cr@93 68
mas01cr@93 69 While having a command line interface is nice, it is less nice that the
mas01cr@93 70 only way to initialize a new audioDB instance is to fake up enough of
mas01cr@93 71 a command line to call our wacky constructors.
mas01cr@93 72 Furthermore, having the "business logic" run by the constructor is
mas01cr@93 73 also a little bit weird.
mas01cr@93 74
mas01cr@93 75 * regression (and other) tests
mas01cr@93 76
mas01cr@93 77 ** Command line interface
mas01cr@93 78
mas01cr@93 79 There is now broad coverage of the audioDB logic, with the major
mas01cr@93 80 exceptions of the batch insert command and the specification of
mas01cr@93 81 different keys on import.
mas01cr@93 82
mas01cr@93 83 ** SOAP
mas01cr@93 84
mas01cr@93 85 The shell's support for wait() and equivalents is limited, so there
mas01cr@93 86 are "sleep 1"s dotted around to attempt to avoid race conditions.
mas01cr@93 87 Find a better way. Similarly, using SO_REUSEADDR in bind() is a hack
mas01cr@93 88 that ought not to be necessary just to run the same test twice...
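
One possible replacement for the "sleep 1"s, sketched in Python under the assumption that the test driver can run something other than plain shell: poll until the server's port accepts a connection, with a deadline. The name wait_for_port and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.05):
    """Return True once host:port accepts a TCP connection, or False
    if it still does not by the time `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or connect timed out); retry shortly.
            time.sleep(interval)
    return False
```

Having the server under test bind to port 0 and report the port the kernel picked might also remove the need for SO_REUSEADDR between runs, though that would need support in the server itself.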
mas01cr@93 89
mas01cr@93 90 ** Locking
mas01cr@93 91
mas01cr@93 92 The fcntl() locking should be good enough for our uses. Investigate
mas01cr@93 93 whether it is in fact robust enough (including that EAGAIN workaround
mas01cr@93 94 for OS X; read the kernel source to find out where that's coming from
mas01cr@93 95 and report it if possible).
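
For reference while investigating, a minimal sketch of non-blocking fcntl()-style locking with a bounded retry on EAGAIN / EACCES, written in Python (whose fcntl.lockf wraps the fcntl() locking calls). This is not the audioDB implementation; acquire_lock, release_lock and the retry policy are assumptions made for illustration.

```python
import errno
import fcntl
import time


def acquire_lock(fd, exclusive, retries=5, delay=0.01):
    """Try to lock the whole file behind fd without blocking.

    Retries a few times if the lock is reported as unavailable
    (EAGAIN or EACCES); returns True on success, False otherwise.
    """
    op = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    for _ in range(retries):
        try:
            fcntl.lockf(fd, op | fcntl.LOCK_NB)
            return True
        except OSError as e:
            if e.errno not in (errno.EAGAIN, errno.EACCES):
                raise  # a real error, not just contention
            time.sleep(delay)
    return False


def release_lock(fd):
    """Drop any lock this process holds on the file behind fd."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
```

Note that these locks belong to the process, so a single process re-acquiring its own lock always succeeds; a robustness test needs at least two processes contending for the same file.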
mas01cr@93 96
mas01cr@93 97 ** Benchmarks
mas01cr@93 98
mas01cr@93 99 Get together a realistic set of usage cases, preferably testing each
mas01cr@93 100 of the query types, and benchmark them automatically. This is
mas01cr@93 101 basically a prerequisite of any performance work.
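
A minimal sketch of an automatic timing harness, in Python; the name benchmark and the choice of statistics are assumptions, and the callable passed in would wrap whatever invokes one realistic query.

```python
import statistics
import time


def benchmark(fn, repeats=5):
    """Call fn() `repeats` times and summarise the wall-clock timings."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

Reporting the median alongside the extremes makes a single slow run (cold cache, busy machine) easier to spot than an average would.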
mas01cr@93 102
mas01cr@93 103 * Michael's old TODO list
mas01cr@14 104
mas01cr@14 105 audioDB FIXME:
mas01cr@14 106
mas01cr@14 107 o fix segfault when query is zero-length
mas01mc@20 108 :-) DONE use periodic memunmap on batch insert
mas01cr@14 109 o allow keys to be passed as queries
mas01mc@20 110 :-) DONE rename 'segments' to 'tracks' in code and help files.
mas01cr@14 111 o test suite
mas01cr@14 112 o SOAP to serialize queryFile and keyList
mas01cr@14 113 o SOAP to serialize files on insert / batch insert ?
mas01cr@33 114 :-) DONE don't overwrite existing files on db create
mas01cr@34 115 :-) DONE implement fcntl()-based locking.
mas01cr@35 116 o test locking discipline (particularly over NFS between heterogeneous clients)
mas01cr@14 117
mas01mc@20 118 M. Casey 13/08/07
mas01cr@14 119