* development of functionality

** exposure of all non-write functions over Web Services

At present, the radius / counting query type isn't supported over
SOAP.  Supporting it involves either changing the adb__query()
exported function, in which case we need to be careful, or defining a
new exported interface, at which point perhaps it should be designed
rather than accreted... (probably this will come naturally after the
following TODO item)

** matrix of possible queries

At the moment, there are four content-based query types, each of which
does something slightly different from what you might expect from its
name.  I think that the space of possible (sensible?) queries is
larger than this -- though working out the sensible abstraction might
have to wait for more use cases -- and also that the orthogonality of
various parameters is missing.  (e.g. a silence threshold should be
applied to all queries or none, if it makes sense at all.)

Additionally, query by key (filename) might be important.

** results

Need to sort out what the results mean; is it a similarity or a
distance score, etc.  Also, is it possible to support NN queries in a
non-Euclidean space?

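One reason the distinction matters less than it might seem in the
Euclidean case: for unit-norm vectors, squared Euclidean distance and
dot-product similarity carry the same information, because
||a - b||^2 = 2 - 2 (a . b).  A small numerical check, sketched in
Python for brevity (audioDB itself is C++; the function names are made
up for illustration, and whether audioDB's scores are actually
computed this way is exactly the open question above):

```python
import math

def normalise(v):
    """Scale v to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    """Dot-product similarity."""
    return sum(x * y for x, y in zip(a, b))

def sq_euclidean(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def similarity_to_sq_distance(s):
    """For unit vectors only: ||a - b||^2 = 2 - 2 * (a . b)."""
    return 2.0 - 2.0 * s

a = normalise([1.0, 2.0, 3.0])
b = normalise([3.0, 1.0, 0.5])
```

Under that assumption a ranking by increasing distance is the same as
a ranking by decreasing similarity; in a non-Euclidean space no such
identity is guaranteed.
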
** SOAP / URIs

At the moment, the query and database are referred to by paths naming
files on the SOAP server's filesystem.  This makes a limited amount of
sense for the database (though exposing implementation details of
ISMS's file system is not a great idea) but makes no sense at all for
the query.  So we need to define a query data structure that can be
serialised (preferably automatically) by SOAP for use in queries.

If we ever support inserting or other write functionality over SOAP,
this will need doing for feature files (the same as queries) and for
key lists too.

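A minimal sketch of what a self-contained query structure could look
like, in Python for brevity (audioDB itself is C++).  Everything here
is an assumption made for illustration: the Query class, its field
names, and the base64-of-doubles payload are not the existing SOAP
interface or the audioDB feature file format.  The point is only that
the feature data travels with the request instead of being named by a
server-side path.

```python
import base64
import struct
from dataclasses import dataclass
from typing import List

@dataclass
class Query:
    """Hypothetical query that carries its own feature data."""
    database: str               # opaque database name, not a filesystem path
    dim: int                    # feature dimensionality
    vectors: List[List[float]]  # one list of `dim` floats per frame

    def __post_init__(self):
        if self.dim <= 0 or any(len(v) != self.dim for v in self.vectors):
            raise ValueError("every frame must have exactly `dim` values")

    def to_wire(self) -> dict:
        """Flatten to plain types that a SOAP toolkit could serialise."""
        flat = [x for v in self.vectors for x in v]
        payload = struct.pack("<%dd" % len(flat), *flat)
        return {
            "database": self.database,
            "dim": self.dim,
            "data": base64.b64encode(payload).decode("ascii"),
        }

    @classmethod
    def from_wire(cls, msg: dict) -> "Query":
        """Rebuild a Query, rejecting payloads with partial frames."""
        raw = base64.b64decode(msg["data"])
        dim = msg["dim"]
        if dim <= 0 or len(raw) % (8 * dim) != 0:
            raise ValueError("payload is not a whole number of frames")
        flat = struct.unpack("<%dd" % (len(raw) // 8), raw)
        vectors = [list(flat[i:i + dim]) for i in range(0, len(flat), dim)]
        return cls(msg["database"], dim, vectors)
```

The same shape would presumably do for feature files on insert; key
lists are just lists of strings and need no special encoding.
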
** Memory management tricks

We have a friendly memory access pattern (at least on Unixoids;
Win32's API isn't a great match for mmap(), so it is significantly
slower there).  Investigate whether madvise() tricks improve
performance on any OSes.  Also, maybe investigate a specialized use of
MapViewOfFile on win32 to make it tolerable on that platform.

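As a starting point for the madvise() investigation, a sketch of the
kind of experiment meant, in Python for brevity (audioDB itself is
C++, where this would be a direct madvise() call on the mapping).  The
function name and chunk size are made up; whether MADV_SEQUENTIAL is
the right hint for audioDB's access pattern is the thing to measure,
not something this sketch establishes.

```python
import mmap

def sum_bytes_sequentially(path, chunk_size=1 << 20):
    """Map a file read-only, hint sequential access, touch every byte once."""
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # madvise() and the MADV_* constants are only present on
            # Python 3.8+ and on platforms that provide the system call,
            # so the hint is treated as optional.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            for offset in range(0, len(m), chunk_size):
                total += sum(m[offset:offset + chunk_size])
    return total
```

Timing this with and without the hint, on a file larger than RAM,
would give a first idea of whether the trick is worth pursuing.
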
** LSH

Integrate the LSH indexing with the database.  Can it be done as a
separate index file, created on demand?  What are we trying to
optimize our on-disk format for, and can it be better optimized by
having multiple files?

** RDF (not necessarily related to audioDB)

Export the results of our experiments (kept in an SQL database) as
RDF, so that people can infer stuff if they know enough about our
methods.

Possibly also write an export routine for exporting an audioDB as RDF.
And laugh hollowly as XML parsers fail completely to ingest such a
monstrous file.

* architectural issues

** more safety

A couple of areas are not yet safe against runtime faults.

*** Large databases might well end up writing off the end of the
    various tables (e.g. track, l2norm).

*** transactionality is important; the last things to be updated on
    insert should be the free pointers (dbH->length, dbH->numFiles,
    maybe others), so that if something goes wrong in the meantime
    the database is not in an inconsistent state.

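Both points can be sketched together, in Python for brevity (audioDB
itself is C++).  The two-field header below is a made-up stand-in for
dbH->length and dbH->numFiles, not the real on-disk layout: the insert
first checks that the data fits, then writes the data, and only then
updates the header.

```python
import mmap
import struct

# Hypothetical header: (length in bytes of data used, number of files).
HEADER = struct.Struct("<QQ")

def read_header(m):
    """Return (length, num_files) from the start of the mapping."""
    return HEADER.unpack_from(m, 0)

def insert(m, payload):
    """Append payload to the data region, committing via the header last."""
    length, num_files = read_header(m)
    start = HEADER.size + length
    if start + len(payload) > len(m):
        # Refuse rather than write off the end of the table.
        raise ValueError("insert would overflow the data region")
    m[start:start + len(payload)] = payload   # 1. write the new data
    m.flush()                                 # 2. ask the OS to write it out
    HEADER.pack_into(m, 0, length + len(payload), num_files + 1)
    m.flush()                                 # 3. only now commit the header
```

If the process dies between steps 1 and 3, the header still describes
the old contents and the half-written data is simply ignored.  Real
durability also depends on what the OS and filesystem guarantee about
write ordering, which this sketch does not address.
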
** API vs command-line

While having a command line interface is nice, having the only way to
initialize a new audioDB instance be to fake up enough of a command
line to call our wacky constructors is less nice.  Furthermore, having
the "business logic" run by the constructor is also a little bit
weird.

* regression (and other) tests

** Command line interface

There is now broad coverage of the audioDB logic, with the major
exceptions of the batch insert command, and the specifying of
different keys on import.

** SOAP

The shell's support for wait() and equivalents is limited, so there
are "sleep 1"s dotted around to attempt to avoid race conditions.
Find a better way.  Similarly, using SO_REUSEADDR in bind() is a hack
that ought not to be necessary just to run the same test twice...

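One possible better way, sketched in Python (the test scripts are
shell and audioDB is C++, so this is only an illustration of the
idea): poll the server's port until it accepts a connection instead of
sleeping for a fixed time, and use a fresh port per run so that a
lingering socket from the previous run doesn't matter.  The helper
names are made up, and this assumes the server can be told which port
to listen on.

```python
import socket
import time

def pick_free_port():
    """Ask the kernel for an unused TCP port by binding to port 0.

    There is a small window in which another process could grab the
    port after we release it, so this is best-effort.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def wait_for_port(host, port, timeout=5.0, interval=0.05):
    """Poll until something accepts connections on (host, port)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A test would start the server on pick_free_port(), call
wait_for_port() and fail fast if it returns False, rather than hoping
that one second was enough.
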
** Locking

The fcntl() locking should be good enough for our uses.  Investigate
whether it is in fact robust enough (including that EAGAIN workaround
for OS X; read the kernel source to find out where that's coming from
and report it if possible).

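For reference while investigating, a sketch of a lock acquisition loop
that treats EAGAIN as transient, in Python for brevity (audioDB itself
is C++ and calls fcntl() directly).  This is a guess at the general
shape of such a workaround, not a copy of the one in the code; the
function names, retry count and delay are made up.

```python
import errno
import fcntl
import time

def acquire_lock(f, exclusive=True, retries=50, delay=0.1):
    """Take a whole-file fcntl() lock, retrying while the kernel says EAGAIN."""
    kind = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    for _ in range(retries):
        try:
            # Non-blocking request, so contention shows up as an error
            # we can inspect rather than an indefinite wait.
            fcntl.lockf(f, kind | fcntl.LOCK_NB)
            return True
        except OSError as e:
            if e.errno not in (errno.EAGAIN, errno.EACCES):
                raise
            time.sleep(delay)
    return False

def release_lock(f):
    """Drop the whole-file lock taken by acquire_lock()."""
    fcntl.lockf(f, fcntl.LOCK_UN)
```

A robustness test would run several such processes against the same
file and check that no two ever hold the exclusive lock at once.
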
** Benchmarks

Get together a realistic set of use cases, preferably testing each
of the query types, and benchmark them automatically.  This is
basically a prerequisite of any performance work.

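A minimal sketch of what "benchmark them automatically" could mean, in
Python: a harness that times a set of named cases several times each
and reports the minimum and median.  The benchmark() function and the
shape of its report are made up for illustration; in practice each
case would run an audioDB query of one type against a fixed database.

```python
import statistics
import time

def benchmark(cases, repeats=5):
    """Run each named callable `repeats` times; report min and median seconds."""
    report = {}
    for name, fn in cases.items():
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        report[name] = {"min": min(times), "median": statistics.median(times)}
    return report
```

Keeping the reports from each revision makes it possible to see
whether a change sped anything up or slowed anything down.
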
* Michael's old TODO list

audioDB FIXME:

o fix segfault when query is zero-length
:-) DONE use periodic memunmap on batch insert
o allow keys to be passed as queries
:-) DONE rename 'segments' to 'tracks' in code and help files.
o test suite
o SOAP to serialize queryFile and keyList
o SOAP to serialize files on insert / batch insert ?
:-) DONE don't overwrite existing files on db create
:-) DONE implement fcntl()-based locking.
o test locking discipline (particularly over NFS between heterogeneous clients)

M. Casey 13/08/07