view docs/TODO.txt @ 770:c54bc2ffbf92 tip
update tags
author | convert-repo |
---|---|
date | Fri, 16 Dec 2011 11:34:01 +0000 |
parents | 10bcea4e5c40 |
* development of functionality

** exposure of all non-write functions over Web Services

At present, the radius / counting query type isn't supported over
SOAP.  Supporting it involves changing the adb__query() exported
function, so we need to be careful; or at least defining a new
exported interface, at which point perhaps it should be designed
rather than accreted...

(probably this will come naturally after the following TODO item)

** matrix of possible queries

At the moment, there are four content-based query types, each of
which does something slightly different from what you might expect
from its name.  I think that the space of possible (sensible?)
queries is larger than this -- though working out the sensible
abstraction might have to wait for more use cases -- and also that
the orthogonality of various parameters is missing.  (e.g. a silence
threshold should be applied to all queries or none, if it makes sense
at all.)

Additionally, query by key (filename) might be important.

** results

Need to sort out what the results mean; is it a similarity or a
distance score, etc.  Also, is it possible to support NN queries in a
non-Euclidean space?

** SOAP / URIs

At the moment, the query and database are referred to by paths naming
files on the SOAP server's filesystem.  This makes a limited amount
of sense for the database (though exposing implementation details of
ISMS's file system is not a great idea) but makes no sense at all for
the query.

So we need to define a query data structure that can be serialised
(preferably automatically) by SOAP for use in queries.

If we ever support inserting or other write functionality over SOAP,
this will need doing for feature files (the same as queries) and for
key lists too.

** Memory management tricks

We have a friendly memory access pattern (at least on Unixoids;
Win32's API isn't a great match for mmap(), so it is significantly
slower there).  Investigate whether madvise() tricks improve
performance on any OSes.
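As a possible starting point for the madvise() investigation, here is
a minimal sketch in C, assuming a POSIX system.  The helper name
map_readonly_sequential() is hypothetical and not part of audioDB, and
MADV_SEQUENTIAL is only one plausible hint to try; whether it helps on
any given OS is exactly what would need measuring.

```c
#define _DEFAULT_SOURCE        /* for madvise() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not audioDB code.
 * Map a whole file read-only and hint that it will be read front to
 * back.  Returns the mapping (length via *len_out), or NULL on failure. */
static void *map_readonly_sequential(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    /* Advisory only: if the hint is rejected, the mapping still works. */
    (void)madvise(p, len, MADV_SEQUENTIAL);

    *len_out = len;
    return p;
}
```

The caller releases the mapping with munmap(p, len).  Other hints
(MADV_RANDOM, MADV_WILLNEED) can be swapped in at the single madvise()
call, which keeps the benchmark comparison simple.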
Also, maybe investigate a specialized use of MapViewOfFile on win32 to
make memory-mapped access tolerable on that platform.

** LSH

Integrate the LSH indexing with the database.  Can it be done as a
separate index file, created on demand?  What are we trying to
optimize our on-disk format for, and can it be better optimized by
having multiple files?

** RDF (not necessarily related to audioDB)

Export the results of our experiments (kept in an SQL database) as
RDF, so that people can infer stuff if they know enough about our
methods.

Possibly also write an export routine for exporting an audioDB as
RDF.  And laugh hollowly as XML parsers fail completely to ingest
such a monstrous file.

* architectural issues

** more safety

A couple of areas are not yet safe against runtime faults.

*** Large databases might well end up writing off the end of the
    various tables (e.g. track, l2norm).

*** transactionality is important; the last things that should be
    updated on insert are the free pointers (dbH->length,
    dbH->numFiles, maybe others), so that if something goes wrong in
    the meantime the database is not in an inconsistent state.

** API vs command-line

While having a command line interface is nice, having the only way to
initialize a new audioDB instance being by faking up enough of a
command line to call our wacky constructors is less nice.
Furthermore, having the "business logic" run by the constructor is
also a little bit weird.

* regression (and other) tests

** Command line interface

There is now broad coverage of the audioDB logic, with the major
exceptions of the batch insert command, and the specifying of
different keys on import.

** SOAP

The shell's support for wait() and equivalents is limited, so there
are "sleep 1"s dotted around to attempt to avoid race conditions.
Find a better way.

Similarly, using SO_REUSEADDR in bind() is a hack that ought not to
be necessary just to run the same test twice...

** Locking

The fcntl() locking should be good enough for our uses.
Investigate whether it is in fact robust enough (including that
EAGAIN workaround for OS X; read the kernel source to find out where
that's coming from and report it if possible).

** Benchmarks

Get together a realistic set of use cases, preferably testing each of
the query types, and benchmark them automatically.  This is basically
a prerequisite of any performance work.

* Michael's old TODO list

audioDB FIXME:

o fix segfault when query is zero-length
:-) DONE use periodic memunmap on batch insert
o allow keys to be passed as queries
:-) DONE rename 'segments' to 'tracks' in code and help files.
o test suite
o SOAP to serialize queryFile and keyList
o SOAP to serialize files on insert / batch insert ?
:-) DONE don't overwrite existing files on db create
:-) DONE implement fcntl()-based locking.
o test locking discipline (particularly over NFS between
  heterogeneous clients)

M. Casey 13/08/07
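
For the locking items above, a test of the locking discipline could
start from a small helper like the following.  This is a sketch in C,
assuming POSIX fcntl() record locks; lock_whole_file() and
unlock_whole_file() are hypothetical names, and this is not audioDB's
actual locking code or its OS X workaround.  It only illustrates a
non-blocking lock attempt with a bounded retry when the lock is
reported as busy.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not audioDB code.
 * Try to take a whole-file lock (F_RDLCK or F_WRLCK) without blocking.
 * Retries up to max_tries times when the lock is held elsewhere (POSIX
 * allows either EACCES or EAGAIN for that case) or the call is
 * interrupted.  Returns 0 on success, -1 on failure. */
static int lock_whole_file(int fd, short type, int max_tries)
{
    assert(type == F_RDLCK || type == F_WRLCK);

    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "to end of file" */

    for (int i = 0; i < max_tries; i++) {
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return 0;
        if (errno != EAGAIN && errno != EACCES && errno != EINTR)
            return -1;          /* a real error, not just contention */

        struct timespec pause = {0, 10 * 1000 * 1000};  /* 10 ms */
        nanosleep(&pause, NULL);
    }
    errno = EAGAIN;             /* still busy after all attempts */
    return -1;
}

/* Release a lock taken with lock_whole_file(). */
static int unlock_whole_file(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl);
}
```

Because fcntl() locks are held per process, exercising contention
needs two processes (or two hosts, for the NFS case); a single process
re-locking its own file will always succeed.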