
Luis Figueira, 2011-05-13 12:44 PM
initial paste of notes from openoffice

h1. Wiki

h2. AudioDB as Software Sustainability Challenge

Meeting notes: 11/05/2011
People: Chris Cannam, Christophe Rhodes, Tim Crawford, Benjamin Fields, Luis Figueira
Where: QMUL, 11 May 2011

h3. Current state of AudioDB

* AudioDB is some way from being ready for general use; it may be classified as research-grade code.
* Christophe has some pending fixes which need to be committed to the new repository.

What works/state of tools and documentation

C API (Christophe)
	The design is acceptable and it works correctly

Command-line interface (Christophe)
	Pre-dates the current C API.  Deprecated; should be redesigned
	Although designed as an introspection tool, it is now sometimes used to generate real results. This means parsing output that was never intended to be parseable, using e.g. awk, which is extremely brittle

Indexing code using locality-sensitive hashing (LSH) (Michael Casey)
	Implementation is hurried, provisional and unreliable
	Code is currently omitted entirely from our repository in order to avoid conflict with potential commercial applications
	Useful for large collections of songs (more than 1 million) but not typically necessary for small-collection use cases where an exhaustive search is practicable
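
To illustrate why exhaustive search remains practicable for small collections, here is a minimal Python sketch of a brute-force nearest-sequence search. It is not audioDB's implementation or API: the function names, the dictionary-of-lists "database" and the toy two-dimensional features are all assumptions made for illustration.

```python
import math


def sequence_distance(query, window):
    """Euclidean distance between two equal-length sequences of feature vectors."""
    return math.sqrt(sum(
        (q - w) ** 2
        for q_vec, w_vec in zip(query, window)
        for q, w in zip(q_vec, w_vec)
    ))


def exhaustive_search(database, query):
    """Return (track_id, best_distance, best_offset) tuples, nearest first.

    `database` is a hypothetical mapping from track id to a list of
    feature vectors (one per frame). Every window of every track is
    compared with the query.
    """
    n = len(query)
    results = []
    for track_id, frames in database.items():
        best = None
        for offset in range(len(frames) - n + 1):
            d = sequence_distance(query, frames[offset:offset + n])
            if best is None or d < best[0]:
                best = (d, offset)
        if best is not None:  # tracks shorter than the query are skipped
            results.append((track_id, best[0], best[1]))
    return sorted(results, key=lambda r: r[1])


# Toy data: two short "tracks" and a two-frame query (illustrative values only).
database = {
    "track_a": [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.0, 0.0]],
    "track_b": [[1.0, 1.0], [1.0, 1.0], [0.9, 0.1]],
}
query = [[1.0, 0.0], [0.5, 0.5]]
results = exhaustive_search(database, query)
```

Every window of every track is visited, so query time grows with the total number of stored frames. That is the cost an index such as LSH aims to avoid at the million-song scale.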

Unit tests (Christophe)
	Tests exist for most of the library API, and a parallel set covers the same ground through the command-line interface

Language bindings
	Common Lisp (Christophe): complete, with unit tests
	Python (Ben): incomplete, with some unit tests
	pd (Ian Knopke, rewritten by Christophe): suspect
	Java (Mike Jewell): probably incomplete, some unit tests using JUnit
	ActionScript (Mike Jewell): probably incomplete

Tools
	“Fake” RDF triple store that invents triples on the fly (Mike Jewell): working, with limitations
	Cocoa app (Mike Jewell, Ben): the only example of an application that covers the complete range from low-level API to end-user interface, though the interface is not very pleasant
	Demonstration Web interface using PHP and command-line scripts: probably broken
	SOAP service (Michael Casey): deprecated and probably broken

Feature extraction
	By far the most resource-intensive part of populating a database
	Recommended default method is to use Sonic Annotator with the NNLS Chroma plugin
	Shell script exists (populate.sh) to load databases from Sonic Annotator output
	Former audioDB feature extractor (fftExtract) is deprecated and broken
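
As a rough idea of what the first stage of a loader might look like, the Python sketch below reads per-frame features from CSV text. It assumes each row holds a timestamp followed by the feature values; the real layout depends on the options passed to Sonic Annotator, and `read_feature_rows` is a hypothetical helper, not part of populate.sh or of any audioDB binding. Inserting the vectors into a database is deliberately left out.

```python
import csv
import io


def read_feature_rows(stream):
    """Parse rows of `timestamp,v1,v2,...` into (times, vectors).

    Assumed layout: one frame per row, timestamp in the first column.
    Blank rows are skipped.
    """
    times, vectors = [], []
    for row in csv.reader(stream):
        if not row:
            continue
        times.append(float(row[0]))
        vectors.append([float(v) for v in row[1:]])
    # All frames of one track should have the same dimensionality.
    dims = {len(v) for v in vectors}
    if len(dims) > 1:
        raise ValueError(f"inconsistent feature dimensions: {sorted(dims)}")
    return times, vectors


# Made-up sample data standing in for an extractor's output file.
sample = io.StringIO("0.0,0.1,0.2,0.3\n0.5,0.4,0.5,0.6\n")
times, vectors = read_feature_rows(sample)
```

Checking the dimensionality up front means a malformed file is rejected before anything is written to a database.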

Documentation
	Very little:
		Obsolete database population tutorial: http://omras2.org/audioDB/tutorial1
		Tiny and not very helpful query tutorial: http://omras2.org/audioDB/tutorial2
		Not especially successful hip-hop example: http://omras2.org/audioDB/tutorial3
		Man page included with software
	Probably the best current overview of the purpose and design of audioDB is in “Investigating Music Collections at Different Scales with AudioDB” (Christophe Rhodes, Tim Crawford, Michael Casey, Mark d’Inverno, JNMR 2010).  A number of earlier publications also refer to audioDB
	Christophe is working on a formal specification for the database semantics, but this is of little use to end users

Database filesystem notes
	Currently the database is just one big sparse file
	Portability problems with this: for example, HFS+ (Mac default filesystem) doesn't support sparse files
	Also maintenance problems: the database needs to be sized in advance and resized as needed, since data cannot simply be appended; resizing is certainly doable but not very convenient
	Christophe would like to change this so as to have one file per column, append-only
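
The sparse-file point can be seen with a small Python sketch that pre-sizes a file and then compares its logical size with the space actually allocated. The file name and sizes are arbitrary, and this is not how audioDB itself creates its database; `st_blocks` is also not reported on every platform, so the code treats it as optional.

```python
import os
import tempfile


def create_presized_file(path, size_bytes):
    """Create a file of `size_bytes` without writing any data to it."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def allocation_report(path):
    """Return (logical_size, allocated_bytes or None if unavailable)."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # POSIX only, in 512-byte units
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.db")  # hypothetical file name
    create_presized_file(path, 64 * 1024 * 1024)
    # Write a little data in the middle, leaving a hole on either side.
    with open(path, "r+b") as f:
        f.seek(32 * 1024 * 1024)
        f.write(b"feature data")
    logical_size, allocated = allocation_report(path)
```

On a filesystem with sparse-file support the allocated figure is typically far smaller than the logical size; where sparse files are not supported, the whole pre-sized extent may have to be written to disk.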

Aspirational use cases

	Christophe: Intelligent classical music player – find examples of Bach reusing this particular theme
	Ben: Treating playlists (rather than songs) as sequences of features where social tags constitute a feature; he published on this in “Using Song Social Tags and Topic Models to Describe and Compare Playlists” (WOMRAD 2010)
	Christophe: Observes that other Goldsmiths researchers are interested in handling multimedia data as well (e.g. streetview-like scene data)
	Tim: Studying Wagner's use of leitmotif
	Tim: Live querying from musical input device
	Tim: Categorising musical samples for performance use
	Tim: Film music analysis – is all John Williams the same?
	Chris: Is the start of the old Red Dwarf opening credit a quote from Mahler or Bruckner, or does it just sound like one?
	Also noted: Not just music databases: bird songs, street sounds, etc

Possible easy gains

	(But for which target user?  Consumer-level or API-level?)
	Christophe: Suggests aiming at the Red Dwarf example: improve data loading for a “standard” data set and query by example
	Ben: Build a standard database using either the Million Song Dataset or the subset distributed alongside it

Next steps

	Christophe: Commit pending bug fixes to new repository
	Chris and Luis: Build the code and run the unit tests, possibly under coverage analysis
	Chris and Luis: Test and improve the Java and Python bindings
	Chris and Luis: Produce a more user-friendly import tool using the Python bindings (but still running Sonic Annotator behind the scenes)

Note: there's a project grant submission on indexing and PostgreSQL integration, so e.g. re-adding indexing is not currently a priority.