annotate docs/TODO.txt @ 770:c54bc2ffbf92 tip

update tags
author convert-repo
date Fri, 16 Dec 2011 11:34:01 +0000
parents 10bcea4e5c40
rev   line source
mas01cr@93 1 * development of functionality
mas01cr@93 2
mas01cr@103 3 ** exposure of all non-write functions over Web Services
mas01cr@103 4
mas01cr@103 5 At present, the radius / counting query type isn't supported over
mas01cr@103 6 SOAP. Supporting it involves either changing the adb__query()
mas01cr@168 7 exported function, so we need to be careful, or defining a new
mas01cr@168 8 exported interface, at which point perhaps it should be designed
mas01cr@168 9 rather than accreted. (Probably this will come naturally after the
mas01cr@168 10 following TODO item.)
mas01cr@103 11
mas01cr@93 12 ** matrix of possible queries
mas01cr@93 13
mas01cr@93 14 At the moment, there are four content-based query types, each of which
mas01cr@93 15 does something slightly different from what you might expect from its
mas01cr@93 16 name. I think that the space of possible (sensible?) queries is
mas01cr@93 17 larger than this -- though working out the sensible abstraction might
mas01cr@93 18 have to wait for more use cases -- and also that the various
mas01cr@93 19 parameters are not orthogonal. (e.g. a silence threshold should be
mas01cr@93 20 applied to all queries or none, if it makes sense at all.)
mas01cr@93 21
mas01cr@93 22 Additionally, query by key (filename) might be important.
mas01cr@93 23
mas01cr@93 24 ** results
mas01cr@93 25
mas01cr@93 26 Need to sort out what the results mean; is it a similarity or a
mas01cr@93 27 distance score, etc. Also, is it possible to support NN queries in a
mas01cr@93 28 non-Euclidean space?
mas01cr@93 29
mas01cr@93 30 ** SOAP / URIs
mas01cr@93 31
mas01cr@93 32 At the moment, the query and database are referred to by paths naming
mas01cr@93 33 files on the SOAP server's filesystem. This makes a limited amount of
mas01cr@93 34 sense for the database (though exposing implementation details of
mas01cr@93 35 ISMS's file system is not a great idea) but makes no sense at all for
mas01cr@93 36 the query. So we need to define a query data structure that can be
mas01cr@93 37 serialised (preferably automatically) by SOAP for use in queries.
mas01cr@93 38
mas01cr@93 39 If we ever support inserting or other write functionality over SOAP,
mas01cr@93 40 this will need doing for feature files (the same as queries) and for
mas01cr@93 41 key lists too.
mas01cr@93 42
mas01cr@93 43 ** Memory management tricks
mas01cr@93 44
mas01cr@93 45 We have a friendly memory access pattern (at least on Unixoids;
mas01cr@93 46 Win32's API isn't a great match for mmap(), so it is significantly
mas01cr@93 47 slower there). Investigate whether madvise() tricks improve
mas01cr@93 48 performance on any OSes. Also, maybe investigate a specialized use of
mas01cr@93 49 MapViewOfFile() on Win32 to make it tolerable on that platform.
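A minimal sketch of the kind of madvise() experiment meant above, for
Unixoids only. The helper name map_sequential and the choice of
MADV_SEQUENTIAL are assumptions made for illustration; audioDB's real
access pattern may call for different advice, and whether any of this
helps is exactly what needs measuring.

```c
#define _DEFAULT_SOURCE           /* expose madvise() on glibc in strict modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not audioDB code): map a file read-only and hint
 * to the kernel that it will be read front to back.  Returns the mapping,
 * or NULL on failure; on success *len_out receives the mapped length. */
void *map_sequential(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);                /* empty files cannot be mapped */
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return NULL;

    /* Advisory only: a failure here is not fatal, so the result is ignored. */
    (void)madvise(p, (size_t)st.st_size, MADV_SEQUENTIAL);

    *len_out = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with munmap(p, len). Comparing query
times with and without the madvise() call (and with other advice values
such as MADV_WILLNEED) would need the benchmarks listed further down.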
mas01cr@93 50
mas01cr@93 51 ** LSH
mas01cr@93 52
mas01cr@93 53 Integrate the LSH indexing with the database. Can it be done as a
mas01cr@93 54 separate index file, created on demand? What are we trying to
mas01cr@93 55 optimize our on-disk format for, and can it be better optimized by
mas01cr@93 56 having multiple files?
mas01cr@93 57
mas01cr@93 58 ** RDF (not necessarily related to audioDB)
mas01cr@93 59
mas01cr@93 60 Export the results of our experiments (kept in an SQL database) as
mas01cr@93 61 RDF, so that people can infer stuff if they know enough about our
mas01cr@93 62 methods.
mas01cr@93 63
mas01cr@93 64 Possibly also write an export routine for exporting an audioDB as RDF.
mas01cr@93 65 And laugh hollowly as XML parsers fail completely to ingest such a
mas01cr@93 66 monstrous file.
mas01cr@93 67
mas01cr@93 68 * architectural issues
mas01cr@93 69
mas01cr@139 70 ** more safety
mas01cr@139 71
mas01cr@170 72 A couple of areas are not yet safe against runtime faults.
mas01cr@170 73
mas01cr@170 74 *** Large databases might well end up writing off the end of the
mas01cr@170 75 various tables (e.g. track, l2norm).
mas01cr@170 76
mas01cr@170 77 *** Transactionality is important; the last things updated on
mas01cr@170 78 insert should be the free pointers (dbH->length,
mas01cr@170 79 dbH->numFiles, maybe others), so that if something goes wrong in
mas01cr@170 80 the meantime the database is not in an inconsistent state.
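The two points above can be sketched together: check the table bounds
before touching anything, write the new data into space beyond the
current free pointers, and advance the free pointers only as the final
step. Everything here (toy_header, toy_db, toy_insert, the table
layout) is a hypothetical simplification, not the real dbH structure
or insert code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for the database header. */
struct toy_header {
    size_t length;     /* bytes of feature data in use (free pointer) */
    size_t numFiles;   /* number of tracks in use (free pointer) */
};

struct toy_db {
    struct toy_header hdr;
    unsigned char *data;   /* feature data region */
    size_t data_cap;       /* capacity of data, in bytes */
    size_t *track_len;     /* per-track table */
    size_t track_cap;      /* capacity of track_len, in entries */
};

/* Returns 0 on success, -1 if the insert would run off the end of a
 * table.  On failure the header is left exactly as it was. */
int toy_insert(struct toy_db *db, const void *buf, size_t n)
{
    assert(db != NULL && buf != NULL);

    /* 1. Bounds checks first (hdr.length <= data_cap is an invariant). */
    if (db->hdr.numFiles >= db->track_cap)
        return -1;
    if (n > db->data_cap - db->hdr.length)
        return -1;

    /* 2. Write into space that no reader considers valid yet. */
    memcpy(db->data + db->hdr.length, buf, n);
    db->track_len[db->hdr.numFiles] = n;

    /* 3. Commit: advance the free pointers last. */
    db->hdr.length += n;
    db->hdr.numFiles += 1;
    return 0;
}
```

If step 2 is interrupted, the header still describes the old, consistent
contents. On a real on-disk database the data would presumably also need
flushing (e.g. msync()) before step 3, and the two counters in step 3
are still not updated atomically with respect to each other.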
mas01cr@139 81
mas01cr@93 82 ** API vs command-line
mas01cr@93 83
mas01cr@93 84 While having a command line interface is nice, it is less nice that
mas01cr@93 85 the only way to initialize a new audioDB instance is to fake up
mas01cr@93 86 enough of a command line to call our wacky constructors.
mas01cr@93 87 Furthermore, having the "business logic" run by the constructor is
mas01cr@93 88 also a little bit weird.
mas01cr@93 89
mas01cr@93 90 * regression (and other) tests
mas01cr@93 91
mas01cr@93 92 ** Command line interface
mas01cr@93 93
mas01cr@93 94 There is now broad coverage of the audioDB logic, with the major
mas01cr@93 95 exceptions of the batch insert command and of specifying
mas01cr@93 96 different keys on import.
mas01cr@93 97
mas01cr@93 98 ** SOAP
mas01cr@93 99
mas01cr@93 100 The shell's support for wait() and equivalents is limited, so there
mas01cr@93 101 are "sleep 1"s dotted around to attempt to avoid race conditions.
mas01cr@93 102 Find a better way. Similarly, using SO_REUSEADDR in bind() is a hack
mas01cr@93 103 that ought not to be necessary just to run the same test twice...
mas01cr@93 104
mas01cr@93 105 ** Locking
mas01cr@93 106
mas01cr@93 107 The fcntl() locking should be good enough for our uses. Investigate
mas01cr@93 108 whether it is in fact robust enough (including that EAGAIN workaround
mas01cr@93 109 for OS X; read the kernel source to find out where that's coming from
mas01cr@93 110 and report it if possible).
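For reference while investigating, a minimal sketch of whole-file
fcntl() locking with a bounded retry. It assumes the OS X workaround
amounts to retrying when fcntl() fails with EAGAIN; the helper name
lock_whole_file, the retry limit and the one-second delay are all
hypothetical and are not taken from the audioDB source.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: apply a lock of the given type (F_RDLCK, F_WRLCK
 * or F_UNLCK) to the whole file.  Blocks until the lock is granted, but
 * retries up to max_retries times if fcntl() fails with EAGAIN or is
 * interrupted by a signal.  Returns 0 on success, -1 on failure. */
int lock_whole_file(int fd, short type, int max_retries)
{
    assert(max_retries >= 0);

    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                 /* zero length means "to end of file" */

    for (int attempt = 0; ; attempt++) {
        if (fcntl(fd, F_SETLKW, &fl) == 0)
            return 0;
        if ((errno == EAGAIN || errno == EINTR) && attempt < max_retries) {
            sleep(1);             /* back off before trying again */
            continue;
        }
        return -1;                /* errno describes the failure */
    }
}
```

A robustness test would need at least two processes contending for the
same file, since fcntl() locks are held per process and a single process
never conflicts with itself.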
mas01cr@93 111
mas01cr@93 112 ** Benchmarks
mas01cr@93 113
mas01cr@93 114 Get together a realistic set of use cases, preferably testing each
mas01cr@93 115 of the query types, and benchmark them automatically. This is
mas01cr@93 116 basically a prerequisite of any performance work.
mas01cr@93 117
mas01cr@93 118 * Michael's old TODO list
mas01cr@14 119
mas01cr@14 120 audioDB FIXME:
mas01cr@14 121
mas01cr@14 122 o fix segfault when query is zero-length
mas01mc@20 123 :-) DONE use periodic memunmap on batch insert
mas01cr@14 124 o allow keys to be passed as queries
mas01mc@20 125 :-) DONE rename 'segments' to 'tracks' in code and help files.
mas01cr@14 126 o test suite
mas01cr@14 127 o SOAP to serialize queryFile and keyList
mas01cr@14 128 o SOAP to serialize files on insert / batch insert ?
mas01cr@33 129 :-) DONE don't overwrite existing files on db create
mas01cr@34 130 :-) DONE implement fcntl()-based locking.
mas01cr@35 131 o test locking discipline (particularly over NFS between heterogeneous clients)
mas01cr@14 132
mas01mc@20 133 M. Casey 13/08/07
mas01cr@14 134