audiodb: docs/TODO.txt annotate

annotate docs/TODO.txt @ 770:c54bc2ffbf92 tip

update tags

author	convert-repo
date	Fri, 16 Dec 2011 11:34:01 +0000
parents	10bcea4e5c40
children

rev	line source
mas01cr@93	1 * development of functionality
mas01cr@93	2
mas01cr@103	3 ** exposure of all non-write functions over Web Services
mas01cr@103	4
mas01cr@103	5 At present, the radius / counting query type isn't supported over
mas01cr@103	6 SOAP. Supporting it involves changing the adb__query() exported
mas01cr@168	7 function, so we need to be careful; or at least defining a new
mas01cr@168	8 exported interface, at which point perhaps it should be designed
mas01cr@168	9 rather than accreted... (probably this will come naturally after the
mas01cr@168	10 following TODO item)
mas01cr@103	11
mas01cr@93	12 ** matrix of possible queries
mas01cr@93	13
mas01cr@93	14 At the moment, there are four content-based query types, each of which
mas01cr@93	15 does something slightly different from what you might expect from its
mas01cr@93	16 name. I think that the space of possible (sensible?) queries is
mas01cr@93	17 larger than this -- though working out the sensible abstraction might
mas01cr@93	18 have to wait for more use cases -- and also that the orthogonality of
mas01cr@93	19 various parameters is missing. (e.g. a silence threshold should be
mas01cr@93	20 applied to all queries or none, if it makes sense at all.)
mas01cr@93	21
mas01cr@93	22 Additionally, query by key (filename) might be important.
mas01cr@93	23
mas01cr@93	24 ** results
mas01cr@93	25
mas01cr@93	26 Need to sort out what the results mean; is it a similarity or a
mas01cr@93	27 distance score, etc. Also, is it possible to support NN queries in a
mas01cr@93	28 non-Euclidean space?
mas01cr@93	29
mas01cr@93	30 ** SOAP / URIs
mas01cr@93	31
mas01cr@93	32 At the moment, the query and database are referred to by paths naming
mas01cr@93	33 files on the SOAP server's filesystem. This makes a limited amount of
mas01cr@93	34 sense for the database (though exposing implementation details of
mas01cr@93	35 ISMS's file system is not a great idea) but makes no sense at all for
mas01cr@93	36 the query. So we need to define a query data structure that can be
mas01cr@93	37 serialised (preferably automatically) by SOAP for use in queries.
mas01cr@93	38
mas01cr@93	39 If we ever support inserting or other write functionality over SOAP,
mas01cr@93	40 this will need doing for feature files (the same as queries) and for
mas01cr@93	41 key lists too.
mas01cr@93	42
mas01cr@93	43 ** Memory management tricks
mas01cr@93	44
mas01cr@93	45 We have a friendly memory access pattern (at least on Unixoids;
mas01cr@93	46 Win32's API isn't a great match for mmap(), so it is significantly
mas01cr@93	47 slower there). Investigate whether madvise() tricks improve
mas01cr@93	48 performance on any OSes. Also, maybe investigate a specialized use of
mas01cr@93	49 MapViewOfFile on Win32 to make it tolerable on that platform.
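
A minimal sketch of the kind of madvise() hint meant above, assuming a POSIX system and a file that is read front to back. The helper name map_sequential is hypothetical and not part of audioDB; whether the hint helps at all is exactly what would need measuring.

```c
#define _DEFAULT_SOURCE /* expose madvise() under strict -std= modes on glibc */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and hint that it will be
 * read sequentially. Returns the mapping (length via *len) or NULL on
 * failure. madvise() is only a hint, so its result is deliberately ignored. */
static void *map_sequential(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL; /* zero-length files cannot be mapped */
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    (void)madvise(p, (size_t)st.st_size, MADV_SEQUENTIAL);
    *len = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with munmap(p, len). Swapping MADV_SEQUENTIAL for MADV_WILLNEED or MADV_RANDOM is a one-line change, which makes this easy to benchmark per OS.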
mas01cr@93	50
mas01cr@93	51 ** LSH
mas01cr@93	52
mas01cr@93	53 Integrate the LSH indexing with the database. Can it be done as a
mas01cr@93	54 separate index file, created on demand? What are we trying to
mas01cr@93	55 optimize our on-disk format for, and can it be better optimized by
mas01cr@93	56 having multiple files?
mas01cr@93	57
mas01cr@93	58 ** RDF (not necessarily related to audioDB)
mas01cr@93	59
mas01cr@93	60 Export the results of our experiments (kept in an SQL database) as
mas01cr@93	61 RDF, so that people can infer stuff if they know enough about our
mas01cr@93	62 methods.
mas01cr@93	63
mas01cr@93	64 Possibly also write an export routine for exporting an audioDB as RDF.
mas01cr@93	65 And laugh hollowly as XML parsers fail completely to ingest such a
mas01cr@93	66 monstrous file.
mas01cr@93	67
mas01cr@93	68 * architectural issues
mas01cr@93	69
mas01cr@139	70 ** more safety
mas01cr@139	71
mas01cr@170	72 A couple of areas are not yet safe against runtime faults.
mas01cr@170	73
mas01cr@170	74 *** Large databases might well end up writing off the end of the
mas01cr@170	75 various tables (e.g. track, l2norm).
mas01cr@170	76
mas01cr@170	77 *** transactionality is important; the free pointers (dbH->length,
mas01cr@170	78 dbH->numFiles, maybe others) should be the last things updated on
mas01cr@170	79 insert, so that if something goes wrong in the meantime the
mas01cr@170	80 database is not left in an inconsistent state.
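
A sketch of both safety points on a toy file layout, assuming a fixed-size header followed by payload bytes. The struct toy_header, the function toy_insert and the capacity parameter are all hypothetical; only the field names length and numFiles come from the note above, and the real dbH layout is not reproduced here.

```c
#define _DEFAULT_SOURCE /* expose pread()/pwrite() under strict -std= modes */
#include <fcntl.h>      /* open() flags, for callers */
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical header: just the two free pointers named in the note. */
struct toy_header {
    uint32_t numFiles;
    uint32_t length; /* payload bytes in use after the header */
};

/* Append one record of n bytes. The payload is written and flushed first;
 * the header, which is what makes the record visible, is updated last.
 * capacity is the maximum payload size, checked to avoid writing off the
 * end of the table. Returns 0 on success, -1 on any failure. */
static int toy_insert(int fd, uint32_t capacity, const void *rec, uint32_t n)
{
    struct toy_header h;
    if (pread(fd, &h, sizeof h, 0) != (ssize_t)sizeof h)
        return -1;
    if (n > capacity || h.length > capacity - n)
        return -1; /* bounds check: the record would not fit */

    off_t where = (off_t)sizeof h + (off_t)h.length;
    if (pwrite(fd, rec, n, where) != (ssize_t)n)
        return -1;
    if (fsync(fd) < 0)
        return -1; /* payload reaches disk before the header changes */

    h.length += n;
    h.numFiles += 1;
    if (pwrite(fd, &h, sizeof h, 0) != (ssize_t)sizeof h)
        return -1;
    return fsync(fd);
}
```

If the process dies before the header write, the old header still describes a consistent database and the orphaned payload is simply overwritten by the next insert. This sketch does not protect against a torn header write; that would need a checksum or a two-slot header.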
mas01cr@139	81
mas01cr@93	82 ** API vs command-line
mas01cr@93	83
mas01cr@93	84 While having a command line interface is nice, having the only way to
mas01cr@93	85 initialize a new audioDB instance being by faking up enough of a
mas01cr@93	86 command line to call our wacky constructors is less nice.
mas01cr@93	87 Furthermore, having the "business logic" run by the constructor is
mas01cr@93	88 also a little bit weird.
mas01cr@93	89
mas01cr@93	90 * regression (and other) tests
mas01cr@93	91
mas01cr@93	92 ** Command line interface
mas01cr@93	93
mas01cr@93	94 There is now broad coverage of the audioDB logic, with the major
mas01cr@93	95 exceptions of the batch insert command and of specifying
mas01cr@93	96 different keys on import.
mas01cr@93	97
mas01cr@93	98 ** SOAP
mas01cr@93	99
mas01cr@93	100 The shell's support for wait() and equivalents is limited, so there
mas01cr@93	101 are "sleep 1"s dotted around to attempt to avoid race conditions.
mas01cr@93	102 Find a better way. Similarly, using SO_REUSEADDR in bind() is a hack
mas01cr@93	103 that ought not to be necessary just to run the same test twice...
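
One possible replacement for the "sleep 1"s, sketched under the assumption that the test only needs to know when the server is accepting TCP connections on the loopback interface. The helper wait_for_port is hypothetical; a test script could call a tiny program built around it instead of sleeping for a fixed time.

```c
#define _DEFAULT_SOURCE /* expose nanosleep() under strict -std= modes */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: poll 127.0.0.1:port until connect() succeeds, making
 * at most `tries` attempts with a 50 ms pause after each failure.
 * Returns 0 once the server accepts a connection, -1 otherwise. */
static int wait_for_port(uint16_t port, int tries)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    for (int i = 0; i < tries; i++) {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0)
            return -1;
        int ok = connect(s, (struct sockaddr *)&addr, sizeof addr);
        close(s);
        if (ok == 0)
            return 0;

        struct timespec pause = {0, 50 * 1000 * 1000};
        nanosleep(&pause, NULL);
    }
    return -1;
}
```

This bounds the wait instead of guessing it, but it also opens and drops one connection on the server, which the server under test has to tolerate.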
mas01cr@93	104
mas01cr@93	105 ** Locking
mas01cr@93	106
mas01cr@93	107 The fcntl() locking should be good enough for our uses. Investigate
mas01cr@93	108 whether it is in fact robust enough (including that EAGAIN workaround
mas01cr@93	109 for OS X; read the kernel source to find out where that's coming from
mas01cr@93	110 and report it if possible).
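
For reference, a sketch of whole-file fcntl() locking with a bounded retry. This is one possible shape for an EAGAIN workaround, not a description of what audioDB actually does; the names lock_whole_file and unlock_whole_file are hypothetical.

```c
#define _DEFAULT_SOURCE /* expose nanosleep() under strict -std= modes */
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: take a whole-file lock (type is F_RDLCK or F_WRLCK)
 * without blocking forever. F_SETLK fails with EAGAIN or EACCES when
 * another process holds a conflicting lock, so retry up to `tries` times
 * with a 10 ms pause. Returns 0 on success, -1 on failure. */
static int lock_whole_file(int fd, short type, int tries)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* zero length means "to end of file" */

    for (int i = 0; i < tries; i++) {
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return 0;
        if (errno != EAGAIN && errno != EACCES && errno != EINTR)
            return -1; /* a real error, e.g. EBADF: retrying will not help */

        struct timespec pause = {0, 10 * 1000 * 1000};
        nanosleep(&pause, NULL);
    }
    return -1;
}

/* Release the lock taken by lock_whole_file(). */
static int unlock_whole_file(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl);
}
```

Note that fcntl() locks belong to the process, not the descriptor: closing any descriptor for the file drops all of the process's locks on it, which is worth covering in a robustness test.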
mas01cr@93	111
mas01cr@93	112 ** Benchmarks
mas01cr@93	113
mas01cr@93	114 Get together a realistic set of usage cases, preferably testing each
mas01cr@93	115 of the query types, and benchmark them automatically. This is
mas01cr@93	116 basically a prerequisite of any performance work.
mas01cr@93	117
mas01cr@93	118 * Michael's old TODO list
mas01cr@14	119
mas01cr@14	120 audioDB FIXME:
mas01cr@14	121
mas01cr@14	122 o fix segfault when query is zero-length
mas01mc@20	123 :-) DONE use periodic memunmap on batch insert
mas01cr@14	124 o allow keys to be passed as queries
mas01mc@20	125 :-) DONE rename 'segments' to 'tracks' in code and help files.
mas01cr@14	126 o test suite
mas01cr@14	127 o SOAP to serialize queryFile and keyList
mas01cr@14	128 o SOAP to serialize files on insert / batch insert ?
mas01cr@33	129 :-) DONE don't overwrite existing files on db create
mas01cr@34	130 :-) DONE implement fcntl()-based locking.
mas01cr@35	131 o test locking discipline (particularly over NFS between heterogeneous clients)
mas01cr@14	132
mas01mc@20	133 M. Casey 13/08/07
mas01cr@14	134
