* development of functionality

** exposure of all non-write functions over Web Services

At present, the radius / counting query type isn't supported over
SOAP. Supporting it involves changing the adb__query() exported
function, so we need to be careful.

** matrix of possible queries

At the moment, there are four content-based query types, each of which
does something slightly different from what you might expect from its
name. I think that the space of possible (sensible?) queries is
larger than this -- though working out the sensible abstraction might
have to wait for more use cases -- and also that the orthogonality of
various parameters is missing. (e.g. a silence threshold should be
applied to all queries or none, if it makes sense at all.)

Additionally, query by key (filename) might be important.

** results

Need to sort out what the results mean; is it a similarity or a
distance score, etc. Also, is it possible to support NN queries in a
non-Euclidean space?

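On the non-Euclidean question, a rough sketch (Python for brevity, not
audioDB code): exhaustive search works with any dissimilarity function
as long as smaller means closer, so the open issue is mostly about
indexing and about what the score means. All names and the toy data
below are made up for illustration.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Ordinary L2 distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query: Vector,
            items: List[Tuple[str, Vector]],
            distance: Callable[[Vector, Vector], float],
            k: int = 1) -> List[Tuple[float, str]]:
    """Brute-force k-NN: score every item, keep the k smallest distances."""
    scored = ((distance(query, vec), key) for key, vec in items)
    return heapq.nsmallest(k, scored)


# Hypothetical toy data: three keyed two-dimensional feature vectors.
ITEMS = [("a", (1.0, 0.0)), ("b", (0.0, 1.0)), ("c", (10.0, 0.5))]
```

With these definitions the query (5.0, 0.25) is closest to "a" under
euclidean() but closest to "c" under cosine_distance(), which is why the
meaning of the returned score needs pinning down.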
** SOAP / URIs

At the moment, the query and database are referred to by paths naming
files on the SOAP server's filesystem. This makes a limited amount of
sense for the database (though exposing implementation details of
ISMS's file system is not a great idea) but makes no sense at all for
the query. So we need to define a query data structure that can be
serialised (preferably automatically) by SOAP for use in queries.

If we ever support inserting or other write functionality over SOAP,
this will need doing for feature files (the same as queries) and for
key lists too.

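A minimal sketch of what a self-contained query structure might look
like, in Python for brevity. Everything here is an assumption: the field
names are invented, and JSON stands in for whatever XML the SOAP toolkit
would generate. The point is only that the feature data travels inside
the message rather than as a path on the server.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Query:
    """Hypothetical query message that carries its feature data inline."""
    database: str      # logical database name, not a server-side path
    query_type: str    # e.g. "sequence"; the set of valid types is assumed
    dimension: int     # number of values per feature frame
    features: List[List[float]] = field(default_factory=list)

    def validate(self) -> None:
        """Reject frames whose length does not match the declared dimension."""
        for i, frame in enumerate(self.features):
            if len(frame) != self.dimension:
                raise ValueError(
                    f"frame {i} has {len(frame)} values, "
                    f"expected {self.dimension}")


def serialise(query: Query) -> str:
    """Turn a validated query into a text payload (JSON as a stand-in)."""
    query.validate()
    return json.dumps(asdict(query), sort_keys=True)


def deserialise(payload: str) -> Query:
    """Rebuild and validate a query from a payload made by serialise()."""
    query = Query(**json.loads(payload))
    query.validate()
    return query
```

The same shape would probably do for feature files on insert; a key
list would just be a list of strings.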
** Memory management tricks

We have a friendly memory access pattern (at least on Unixoids;
Win32's API isn't a great match for mmap(), so it is significantly
slower there). Investigate whether madvise() tricks improve
performance on any OSes. Also, maybe investigate a specialized use of
MapViewOfFile on win32 to make it tolerable on that platform.

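For reference, the kind of madvise() hint meant here, sketched through
Python's mmap module (which wraps the same system calls) rather than in
audioDB's own code. The function name and the byte-summing workload are
placeholders; whether the hint helps at all is exactly what needs
measuring.

```python
import mmap
import os


def sum_bytes_sequential(path: str, chunk_size: int = 1 << 20) -> int:
    """Map a file read-only, hint sequential access, and scan it once."""
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be mapped
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # madvise() and MADV_SEQUENTIAL only exist where the platform
            # (and Python >= 3.8) provides them, so guard the call.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            # Walk the mapping front to back in fixed-size chunks.
            for offset in range(0, len(m), chunk_size):
                total += sum(m[offset:offset + chunk_size])
    return total
```

Timing this with and without the madvise() call on a large file, per
OS, would be a cheap first experiment.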
** LSH

Integrate the LSH indexing with the database. Can it be done as a
separate index file, created on demand? What are we trying to
optimize our on-disk format for, and can it be better optimized by
having multiple files?

** RDF (not necessarily related to audioDB)

Export the results of our experiments (kept in an SQL database) as
RDF, so that people can infer stuff if they know enough about our
methods.

Possibly also write an export routine for exporting an audioDB as RDF.
And laugh hollowly as XML parsers fail completely to ingest such a
monstrous file.

* architectural issues

** API vs command-line

While having a command line interface is nice, it is less nice that
the only way to initialize a new audioDB instance is to fake up enough
of a command line to call our wacky constructors. Furthermore, having
the "business logic" run by the constructor is also a little bit
weird.

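A sketch of the separation being asked for, in Python for brevity: a
plain object whose constructor only records state, plus a thin
command-line wrapper that parses arguments and then calls it. The class,
the option names and the status() method are all hypothetical and do not
correspond to audioDB's real interface.

```python
import argparse
from typing import List, Optional


class Database:
    """Hypothetical library-facing handle; the constructor does no work."""

    def __init__(self, path: str, read_only: bool = True):
        self.path = path
        self.read_only = read_only

    def status(self) -> dict:
        """Stand-in for a real operation, called explicitly by the user."""
        return {"path": self.path, "read_only": self.read_only}


def main(argv: Optional[List[str]] = None) -> dict:
    """Thin CLI layer: parse options, build a Database, run one operation."""
    parser = argparse.ArgumentParser(prog="adb-sketch")
    parser.add_argument("--database", required=True)
    parser.add_argument("--write", action="store_true")
    args = parser.parse_args(argv)
    db = Database(args.database, read_only=not args.write)
    return db.status()
```

Library users would call Database("x.adb").status() directly; only the
command-line tool would go through main().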
* regression (and other) tests

** Command line interface

There is now broad coverage of the audioDB logic, with the major
exceptions of the batch insert command, and the specifying of
different keys on import.

** SOAP

The shell's support for wait() and equivalents is limited, so there
are "sleep 1"s dotted around to attempt to avoid race conditions.
Find a better way. Similarly, using SO_REUSEADDR in bind() is a hack
that ought not to be necessary just to run the same test twice...

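One possible better way, sketched in Python: instead of sleeping for a
fixed second, poll until the server actually accepts a connection, and
give up after a deadline. The helper name and the default timings are
invented; the tests themselves are shell scripts, so this would have to
live in a small helper program.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Return True once host:port accepts a TCP connection, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server has reached listen().
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not up yet (refused or timed out): back off and retry.
            time.sleep(interval)
    return False
```

A test would start the server in the background, call
wait_for_port("localhost", port), and fail straight away on False rather
than racing.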
** Locking

The fcntl() locking should be good enough for our uses. Investigate
whether it is in fact robust enough (including that EAGAIN workaround
for OS X; read the kernel source to find out where that's coming from
and report it if possible).

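For experimenting with the locking outside audioDB, a small Unix-only
sketch using Python's fcntl module, which wraps the same record locks.
The retry-on-EAGAIN loop is only a guess at the general shape of such a
workaround, not a description of what the audioDB code does; the
function names and retry counts are made up.

```python
import errno
import fcntl
import time


def acquire_lock(fileobj, exclusive: bool,
                 retries: int = 50, delay: float = 0.01) -> None:
    """Take a whole-file fcntl() lock, retrying while it is unavailable."""
    kind = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    for _ in range(retries):
        try:
            # LOCK_NB makes the call fail immediately instead of blocking.
            fcntl.lockf(fileobj, kind | fcntl.LOCK_NB)
            return
        except OSError as e:
            # EAGAIN / EACCES mean "held by someone else"; anything
            # different is a real error and is passed on.
            if e.errno not in (errno.EAGAIN, errno.EACCES):
                raise
            time.sleep(delay)
    raise TimeoutError("could not acquire lock")


def release_lock(fileobj) -> None:
    """Drop the lock taken by acquire_lock()."""
    fcntl.lockf(fileobj, fcntl.LOCK_UN)
```

Note that these locks are per-process, so a conflict can only be
provoked from a second process (or a second machine, for the NFS case).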
** Benchmarks

Get together a realistic set of use cases, preferably testing each
of the query types, and benchmark them automatically. This is
basically a prerequisite for any performance work.

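A bare-bones shape for such a harness, in Python: a table of named
cases, each run several times, reporting the minimum and median
wall-clock time. The case names and the callables are placeholders; real
cases would invoke audioDB with each query type against a fixed
database.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(cases: Dict[str, Callable[[], object]],
              repeats: int = 5) -> Dict[str, Dict[str, float]]:
    """Time each named case `repeats` times; report min and median seconds."""
    results = {}
    for name, run in cases.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            timings.append(time.perf_counter() - start)
        results[name] = {
            "min": min(timings),
            "median": statistics.median(timings),
        }
    return results
```

Keeping the output as plain numbers per case makes it easy to store
results per revision and spot regressions later.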
* Michael's old TODO list

audioDB FIXME:

o fix segfault when query is zero-length
:-) DONE use periodic memunmap on batch insert
o allow keys to be passed as queries
:-) DONE rename 'segments' to 'tracks' in code and help files.
o test suite
o SOAP to serialize queryFile and keyList
o SOAP to serialize files on insert / batch insert ?
:-) DONE don't overwrite existing files on db create
:-) DONE implement fcntl()-based locking.
o test locking discipline (particularly over NFS between heterogeneous clients)

M. Casey 13/08/07