* development of functionality

** exposure of all non-write functions over Web Services

At present, the radius / counting query type isn't supported over
SOAP. Supporting it involves either changing the exported adb__query()
function, which needs care, or defining a new exported interface, at
which point perhaps it should be designed rather than accreted...
(probably this will come naturally after the following TODO item)

** matrix of possible queries

At the moment, there are four content-based query types, each of which
does something slightly different from what you might expect from its
name. I think that the space of possible (sensible?) queries is
larger than this -- though working out the sensible abstraction might
have to wait for more use cases -- and also that the various
parameters are not orthogonal. (e.g. a silence threshold should be
applied to all queries or none, if it makes sense at all.)

Additionally, query by key (filename) might be important.

** results

Need to sort out what the results mean: is the reported score a
similarity or a distance, and so on. Also, is it possible to support
NN queries in a non-Euclidean space?

** SOAP / URIs

At the moment, the query and database are referred to by paths naming
files on the SOAP server's filesystem. This makes a limited amount of
sense for the database (though exposing implementation details of
ISMS's filesystem is not a great idea) but makes no sense at all for
the query.
So we need to define a query data structure that can be
serialised (preferably automatically) by SOAP for use in queries.

If we ever support inserting or other write functionality over SOAP,
this will need doing for feature files (the same as queries) and for
key lists too.

** Memory management tricks

We have a friendly memory access pattern (at least on Unixoids;
Win32's API isn't a great match for mmap(), so it is significantly
slower there). Investigate whether madvise() hints (e.g.
MADV_SEQUENTIAL, MADV_WILLNEED) improve performance on any OSes.
Also, maybe investigate a specialized use of MapViewOfFile on Win32
to make it tolerable on that platform.

** LSH

Integrate the LSH indexing with the database. Can it be done as a
separate index file, created on demand? What are we trying to
optimize our on-disk format for, and can it be better optimized by
having multiple files?

** RDF (not necessarily related to audioDB)

Export the results of our experiments (kept in an SQL database) as
RDF, so that people can infer stuff if they know enough about our
methods.

Possibly also write an export routine for exporting an audioDB as RDF.
And laugh hollowly as XML parsers fail completely to ingest such a
monstrous file.

* architectural issues

** more safety

A couple of areas are not yet safe against runtime faults.

*** Large databases might well end up writing off the end of the
    various tables (e.g. track, l2norm).
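A minimal C++ sketch of the kind of capacity check that could guard each insert. The TableHeader struct and its maxFiles / numVectors / maxVectors fields are hypothetical stand-ins, not audioDB's real dbH layout (only numFiles echoes a name used elsewhere in this file), and treating l2norm as a per-vector table is an assumption. The idea is simply to check the remaining space before writing, comparing against what is left rather than adding, so the check itself can't overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical header: NOT audioDB's actual dbH layout.
struct TableHeader {
  std::uint32_t numFiles;    // tracks inserted so far
  std::uint32_t maxFiles;    // capacity of per-track tables (e.g. track)
  std::uint64_t numVectors;  // feature vectors inserted so far
  std::uint64_t maxVectors;  // capacity of per-vector tables (assumed: l2norm)
};

// Returns true only if one more track of `vectors` feature vectors fits
// in every table; a caller would refuse the insert otherwise.
bool insertFits(const TableHeader& h, std::uint64_t vectors) {
  if (h.numFiles >= h.maxFiles) return false;     // track table is full
  if (h.numVectors > h.maxVectors) return false;  // header already inconsistent
  // Compare against the remaining space instead of computing
  // numVectors + vectors, which could wrap around for huge inputs.
  return vectors <= h.maxVectors - h.numVectors;
}
```

Whether a full table should fail the insert or trigger growing the file is a separate design question; the check is needed either way.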

*** transactionality is important; the last things that should be
    updated on insert are the free pointers (dbH->length,
    dbH->numFiles, maybe others), so that if something goes wrong in
    the meantime the database is not left in an inconsistent state.

** API vs command-line

While having a command line interface is nice, having to fake up
enough of a command line to call our wacky constructors as the only
way to initialize a new audioDB instance is less nice. Furthermore,
having the "business logic" run by the constructor is also a little
bit weird.

* regression (and other) tests

** Command line interface

There is now broad coverage of the audioDB logic, with the major
exceptions of the batch insert command and the specifying of
different keys on import.

** SOAP

The shell's support for wait() and equivalents is limited, so there
are "sleep 1"s dotted around to attempt to avoid race conditions.
Find a better way. Similarly, setting SO_REUSEADDR before bind() is a
hack that ought not to be necessary just to run the same test twice...

** Locking

The fcntl() locking should be good enough for our uses. Investigate
whether it is in fact robust enough (including that EAGAIN workaround
for OS X; read the kernel source to find out where that's coming from
and report it if possible).

** Benchmarks

Get together a realistic set of usage cases, preferably testing each
of the query types, and benchmark them automatically. This is
basically a prerequisite of any performance work.
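Something along these lines could serve as the core of an automatic benchmark run: a C++ sketch that times a callable repeatedly and reports the median. medianMicros is a hypothetical helper, and the callable stands in for whatever actually issues one of the query types. The median is used rather than the mean so that a single slow run (e.g. a cold page cache) doesn't dominate the figure.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs `work` `iterations` times and returns the median wall-clock
// time of one run, in microseconds. steady_clock is monotonic, so
// samples are never negative even if the system clock is adjusted.
double medianMicros(const std::function<void()>& work, std::size_t iterations) {
  assert(iterations > 0);  // a median of zero samples is undefined
  std::vector<double> samples;
  samples.reserve(iterations);
  for (std::size_t i = 0; i < iterations; ++i) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::micro>(stop - start).count());
  }
  std::sort(samples.begin(), samples.end());
  const std::size_t mid = samples.size() / 2;
  // Odd count: middle element; even count: mean of the two middle ones.
  return samples.size() % 2 == 1 ? samples[mid]
                                 : (samples[mid - 1] + samples[mid]) / 2.0;
}
```

A real run would probably also want to report cold-cache and warm-cache timings separately, since the mmap()-based access pattern makes the two very different.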

* Michael's old TODO list

audioDB FIXME:

o fix segfault when query is zero-length
:-) DONE use periodic memunmap on batch insert
o allow keys to be passed as queries
:-) DONE rename 'segments' to 'tracks' in code and help files.
o test suite
o SOAP to serialize queryFile and keyList
o SOAP to serialize files on insert / batch insert?
:-) DONE don't overwrite existing files on db create
:-) DONE implement fcntl()-based locking.
o test locking discipline (particularly over NFS between heterogeneous clients)

M. Casey 13/08/07
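For the locking items (here and under ** Locking above), a minimal C++ sketch of a non-blocking whole-file fcntl() lock. tryLockWhole and unlockWhole are hypothetical names, not audioDB's actual locking code. One detail worth knowing: POSIX allows F_SETLK to fail with either EACCES or EAGAIN when another process holds a conflicting lock, so the sketch treats both as "busy".

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Outcome of a single non-blocking lock attempt.
enum class LockResult { Acquired, Busy, Error };

// Tries to lock the whole of `fd` without blocking. `exclusive` selects
// a write lock (F_WRLCK, needs fd open for writing) rather than a read
// lock (F_RDLCK, needs fd open for reading).
LockResult tryLockWhole(int fd, bool exclusive) {
  struct flock fl = {};
  fl.l_type = exclusive ? F_WRLCK : F_RDLCK;
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;  // 0 means "to end of file, however far it grows"
  if (fcntl(fd, F_SETLK, &fl) == 0) return LockResult::Acquired;
  // Either errno may signal a conflicting lock held by another process.
  if (errno == EACCES || errno == EAGAIN) return LockResult::Busy;
  return LockResult::Error;
}

// Releases whatever lock this process holds on the whole of `fd`.
bool unlockWhole(int fd) {
  struct flock fl = {};
  fl.l_type = F_UNLCK;
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;
  return fcntl(fd, F_SETLK, &fl) == 0;
}
```

When testing the locking discipline, note that fcntl() locks belong to the process: a second lock attempt from the same process never conflicts with its own lock, and closing any descriptor for the file drops all of that process's locks on it. Tests therefore need a second process (or host, for the NFS case) to observe contention.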