* development of functionality

** exposure of all non-write functions over Web Services

At present, the radius / counting query type isn't supported over
SOAP. Supporting it involves changing the adb__query() exported
function, so we need to be careful.

** matrix of possible queries

At the moment, there are four content-based query types, each of which
does something slightly different from what you might expect from its
name. I think that the space of possible (sensible?) queries is
larger than this -- though working out the sensible abstraction might
have to wait for more use cases -- and also that the orthogonality of
various parameters is missing. (e.g. a silence threshold should be
applied to all queries or none, if it makes sense at all.)

Additionally, query by key (filename) might be important.

** results

Need to sort out what the results mean; is it a similarity or a
distance score, etc. Also, is it possible to support NN queries in a
non-Euclidean space?

** SOAP / URIs

At the moment, the query and database are referred to by paths naming
files on the SOAP server's filesystem. This makes a limited amount of
sense for the database (though exposing implementation details of
ISMS's file system is not a great idea) but makes no sense at all for
the query. So we need to define a query data structure that can be
serialised (preferably automatically) by SOAP for use in queries.

If we ever support inserting or other write functionality over SOAP,
this will need doing for feature files (the same as queries) and for
key lists too.

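A rough sketch of what an in-band query structure might look like,
with the feature vectors travelling in the request rather than as a
path on the server. Everything named here (Query, to_xml, from_xml,
the element and attribute names) is made up for illustration and is
not part of audioDB or its SOAP interface; a real SOAP toolkit would
presumably generate the serialisation from a schema instead.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Query:
    """Hypothetical in-band query: feature vectors instead of a file path."""
    dim: int                    # dimensionality of each feature vector
    vectors: List[List[float]]  # one entry per frame

    def to_xml(self) -> str:
        """Serialise to a small XML document (illustrative format only)."""
        root = ET.Element("query", dim=str(self.dim))
        for vec in self.vectors:
            if len(vec) != self.dim:
                raise ValueError("vector length does not match dim")
            frame = ET.SubElement(root, "frame")
            # repr() round-trips Python floats exactly
            frame.text = " ".join(repr(x) for x in vec)
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "Query":
        """Rebuild a Query from the output of to_xml()."""
        root = ET.fromstring(text)
        dim = int(root.get("dim"))
        vectors = [[float(x) for x in (f.text or "").split()]
                   for f in root.findall("frame")]
        return cls(dim=dim, vectors=vectors)
```

The same shape would do for feature files on insert; key lists would
only need a list of strings.
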
** Memory management tricks

We have a friendly memory access pattern (at least on Unixoids;
Win32's API isn't a great match for mmap(), so it is significantly
slower there). Investigate whether madvise() tricks improve
performance on any OSes. Also, maybe investigate a specialized use of
MapViewOfFile on Win32 to make it tolerable on that platform.

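For reference, a minimal sketch of where an madvise() hint would go,
assuming the access pattern is a front-to-back scan of the mapped
file. The function name is invented, and whether the hint actually
helps on a given OS is exactly the open question; this only shows the
mechanics.

```python
import mmap
import os


def map_for_sequential_scan(path):
    """Map a file read-only and hint that it will be read front to back."""
    fd = os.open(path, os.O_RDONLY)
    try:
        m = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed
    # madvise() and the MADV_* constants only exist where the OS provides
    # them (and on Python 3.8+), so treat the hint as optional.
    if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
        m.madvise(mmap.MADV_SEQUENTIAL)
    return m
```

Swapping in other advice values (MADV_RANDOM, MADV_WILLNEED) and
timing each against the benchmarks would be the actual investigation.
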
** LSH

Integrate the LSH indexing with the database. Can it be done as a
separate index file, created on demand? What are we trying to
optimize our on-disk format for, and can it be better optimized by
having multiple files?

** RDF (not necessarily related to audioDB)

Export the results of our experiments (kept in an SQL database) as
RDF, so that people can infer stuff if they know enough about our
methods.

Possibly also write an export routine for exporting an audioDB as RDF.
And laugh hollowly as XML parsers fail completely to ingest such a
monstrous file.

* architectural issues

** more safety

A couple of areas are not yet safe against runtime faults. The simple
case is zero-length features, which will lead to division by zero
errors; more pressingly, large databases might well end up writing off
the end of the various tables (e.g. track, l2norm).

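The two guards could look roughly like this. This is a sketch only:
the notes don't say where the division happens, so averaging over the
frames of a feature is assumed here, and the names (mean_l2_norm,
checked_append, TableFull) are hypothetical rather than anything in
the audioDB source.

```python
import math


class TableFull(Exception):
    """Raised instead of writing past the end of a fixed-size table."""


def mean_l2_norm(frames):
    """Average L2 norm over the frames of a feature, refusing empty input."""
    if not frames:
        # Without this check, the division below would be by zero.
        raise ValueError("zero-length feature: no frames to average over")
    total = sum(math.sqrt(sum(x * x for x in frame)) for frame in frames)
    return total / len(frames)


def checked_append(table, used, capacity, value):
    """Store value in slot `used` only if it fits; return the new count."""
    if used >= capacity:
        raise TableFull(f"table holds {capacity} entries and is full")
    table[used] = value
    return used + 1
```

The point in both cases is to fail with a reportable error before the
fault, not to recover from it.
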
** API vs command-line

While having a command line interface is nice, having the only way to
initialize a new audioDB instance being by faking up enough of a
command line to call our wacky constructors is less nice.
Furthermore, having the "business logic" run by the constructor is
also a little bit weird.

* regression (and other) tests

** Command line interface

There is now broad coverage of the audioDB logic, with the major
exceptions of the batch insert command, and the specifying of
different keys on import.

** SOAP

The shell's support for wait() and equivalents is limited, so there
are "sleep 1"s dotted around to attempt to avoid race conditions.
Find a better way. Similarly, using SO_REUSEADDR in bind() is a hack
that ought not to be necessary just to run the same test twice...

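One candidate for the "better way", sketched under the assumption
that the race is between starting the server and the first client
request: poll the port until it accepts a connection, with a
deadline, instead of sleeping for a fixed second. The helper name and
its defaults are invented here, and the tests themselves are shell
scripts, so this would have to be called out to or reimplemented.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.05):
    """Return True once host:port accepts a TCP connection, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server has reached listen().
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

This is still polling, but the test proceeds as soon as the server is
up and fails cleanly if it never comes up.
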
** Locking

The fcntl() locking should be good enough for our uses. Investigate
whether it is in fact robust enough (including that EAGAIN workaround
for OS X; read the kernel source to find out where that's coming from
and report it if possible).

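For the record, the general shape of an fcntl() lock with an EAGAIN
retry, as a sketch. This uses a non-blocking request and retries on
EAGAIN / EACCES, which may not match what the existing workaround
does; the function names and the retry counts are assumptions.

```python
import errno
import fcntl
import time


def lock_with_retry(fileobj, exclusive=True, attempts=50, delay=0.1):
    """Take an fcntl()-style lock on the whole file, retrying while busy."""
    kind = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    for _ in range(attempts):
        try:
            # lockf() wraps fcntl() record locking; LOCK_NB makes it fail
            # immediately rather than block when the lock is held elsewhere.
            fcntl.lockf(fileobj, kind | fcntl.LOCK_NB)
            return True
        except OSError as e:
            if e.errno not in (errno.EAGAIN, errno.EACCES):
                raise
            time.sleep(delay)
    return False


def unlock(fileobj):
    """Release the lock taken by lock_with_retry()."""
    fcntl.lockf(fileobj, fcntl.LOCK_UN)
```

Note that fcntl() locks are per-process, so a robustness test needs
at least two processes contending for the same file.
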
** Benchmarks

Get together a realistic set of use cases, preferably testing each
of the query types, and benchmark them automatically. This is
basically a prerequisite of any performance work.

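A minimal harness for the "automatically" part might look like the
following sketch. The benchmark() helper is hypothetical; each case
would wrap one query type run against a fixed database, and those
invocations are deliberately left out here rather than guessed at.

```python
import statistics
import time


def benchmark(cases, repeats=5):
    """Time each named callable; return {name: (best, median)} in seconds."""
    results = {}
    for name, fn in cases.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            timings.append(time.perf_counter() - start)
        results[name] = (min(timings), statistics.median(timings))
    return results
```

Reporting both the best and the median time gives some idea of how
noisy a run was, which matters when comparing before and after a
change.
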
* Michael's old TODO list

audioDB FIXME:

o fix segfault when query is zero-length
:-) DONE use periodic memunmap on batch insert
o allow keys to be passed as queries
:-) DONE rename 'segments' to 'tracks' in code and help files.
o test suite
o SOAP to serialize queryFile and keyList
o SOAP to serialize files on insert / batch insert ?
:-) DONE don't overwrite existing files on db create
:-) DONE implement fcntl()-based locking.
o test locking discipline (particularly over NFS between heterogeneous clients)

M. Casey 13/08/07