
Version 1 (Matthias Mauch, 2012-11-05 03:10 PM) → Version 2/63 (Matthias Mauch, 2012-11-05 04:10 PM)

h1. Wiki

h2. Specification

The Tony tool will be a very simple user interface for the precise annotation of notes, note pitches and performance details in recordings of human singing and other monophonic instruments.
The reason it can be kept very simple is that it exports beautifully to other applications such as Sonic Visualiser, which have different strengths.


h3. Users

We expect users to be unfamiliar with programming, but familiar with music and research in general. To illustrate this, here are some possible users:

# Friedrich (56) is a musicologist at a university in northern Germany. Sample publication: "Zur Ästhetik der Stimme bei Wagner - Eine vergleichende Analyse historischer und moderner Aufnahmen". He uses a Windows Vista laptop, mainly for email correspondence and for keeping lists of audio recordings in an Excel spreadsheet. He listens to music from CDs, noting down singing characteristics on a paper notepad. He plays the cello and organises outings to the Berlin Philharmonic with his seminar students.
# Sam (31) is an ethnomusicologist at the University of Essex studying the music of Brazilian immigrants in Portugal. She records the immigrants' music on location in Lisbon and compares the rhythmic and melodic variations of the music between Brazil and Portugal. Sam uses a MacBook from 2007. She organises the field recordings and edits them using Audacity. She regularly practises capoeira with the local group in Colchester, Essex.
# Etienne (25) is a graduate student in the psychology department of a Belgian university and part of a project that aims at replicating the "Levitin effect". He's a Windows user with a bit of a knack for writing Python scripts. He's frustrated with the way his boss asks him to annotate melodies in Praat, but doesn't quite have the experience or support to build a tool that does it better. Etienne is also associated with a computer science guy who's looked into African rhythms.
# David (25) is a super-bright statistics graduate who's also a clarinet player and just started his PhD on music transcription of wind instruments at a Canadian university. He realises that there's not enough training data out there to learn his models, so he would like to annotate lots of clarinet recordings.



h3. Components

* automatic pitch and note transcription methods
* GUI for correction, and additional note-based or phrase-based annotations
* export to RDF and graphics

h4. Automatic Transcription Methods

* Pitch extraction
** monophonic via Yin
** interface for other algorithms, potentially polyphonic
** output of salience function, or at least N candidate pitches (even if pitch estimation thinks there's no pitch)
* Note alignment: align notes to MIDI-like representation given a pitch file
* Non-alignment note detection
* Methods should be easily re-executable with time-dependent parameter settings
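To make the "N candidate pitches" requirement concrete, here is a minimal sketch of a Yin-style candidate extractor for a single audio frame. It is not Tony's implementation: the function name @yin_candidates@, the parameters and the threshold of 0.1 are all assumptions made for illustration, and a real version would add parabolic interpolation and frame-by-frame processing. The point is only that candidates are returned with their aperiodicity value whether or not any of them would count as "voiced".

```python
import math


def yin_candidates(frame, sample_rate, fmin=80.0, fmax=1000.0,
                   n_candidates=3, threshold=0.1):
    """Return up to n_candidates (frequency_hz, aperiodicity) pairs.

    Hypothetical helper, not part of any existing library. Lower
    aperiodicity means a more convincing pitch. Candidates below
    `threshold` come first, ordered by lag (the usual Yin choice of the
    first dip); the remaining ones follow, ordered by aperiodicity.
    A silent or constant frame has no dips and yields an empty list.
    """
    tau_min = max(2, int(sample_rate / fmax))
    tau_max = min(int(sample_rate / fmin), len(frame) // 2)
    window = len(frame) - tau_max  # samples compared at every lag

    # Difference function: d(tau) = sum_j (x[j] - x[j + tau])^2
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2
                        for j in range(window))

    # Cumulative mean normalised difference, used here as aperiodicity.
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Every local minimum of the normalised difference is a candidate lag.
    minima = [tau for tau in range(tau_min, tau_max)
              if cmnd[tau] < cmnd[tau - 1] and cmnd[tau] <= cmnd[tau + 1]]

    below = [tau for tau in minima if cmnd[tau] < threshold]
    above = sorted((tau for tau in minima if cmnd[tau] >= threshold),
                   key=lambda tau: cmnd[tau])
    ranked = (below + above)[:n_candidates]
    return [(sample_rate / tau, cmnd[tau]) for tau in ranked]


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 220.0 * n / sr) for n in range(1024)]
    for freq, aperiodicity in yin_candidates(tone, sr):
        print(f"{freq:7.1f} Hz  aperiodicity {aperiodicity:.4f}")
```

Keeping the aperiodicity next to each candidate is what would let the GUI offer "other pitch track candidates" later without re-running the analysis.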

h4. GUI


The most important aspect of the GUI is simplicity, because the users are unlikely to be familiar with complicated programs. Above all, the layout should be kept simple. Hence, first some *forbidden things*:



* polyphonic melody processing (diverts attention from main use, prevents easy interaction with monophonic interface)
* audio editing (this is an analysis and annotation program, leave that to Audacity)
* more than one central pane (too confusing, this is a one-task application; audio and visual are already two dimensions, let's not add more)
* loading more than one audio file (we leave that to Sonic Visualiser)
* more than one "instrument" (pointer, scissors, eraser, move tool should be superfluous in a simple monophonic environment)
* additional structural annotation: beats, bars... that can be done post-hoc in Sonic Visualiser

So here are some requirements:

* One *main pane* that always stays in roughly the same, central place.
** displays a fixed set of features: pitch track, overlaid note track; hidden: note salience, other pitch track candidates.
** note track is editable
** pitch track is not editable separately (why should it be?), only via the note track
* 2 permanent *single-purpose strips* across the length of the main pane
** select (notes or note ranges)
** annotate note (double click for simple annotation, drop-down for specific fields)
* Note editing
** edit note start and end times
** edit continuous note pitch
** choose note pitch estimate (mean, median, manual)
** turn note to rest (unpitched)
** impossible to delete notes (notes can only be made into rests, since we assume a contiguous, monophonic melody)
** lock notes (i.e. prevent re-estimation)

* Audio Playback
** playback original audio
** sonify notes
** sonify pitch track
** loop around current note
* Undo note and pitch track changes

* Export notes, pitch track and annotations in the following formats:
** as CSV files (very important)
** as graphics (quite important): PDF, possibly self-contained R script
** as SV session file
** as RDF
** as MIDI
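As a rough illustration of how the note-editing and CSV-export requirements could fit together, the sketch below models a note with its pitch estimate (mean, median or manual), a rest flag and a lock flag, and writes notes to CSV. Everything here is an assumption made for the example: the class name @Note@, the field names and the column layout are hypothetical and not a decided file format.

```python
import csv
import io
import statistics
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Note:
    """Hypothetical note record; field names are illustrative only."""
    start: float                      # seconds
    end: float                        # seconds
    pitch_track: List[float] = field(default_factory=list)  # Hz inside the note
    estimator: str = "median"         # "mean", "median" or "manual"
    manual_pitch: Optional[float] = None
    is_rest: bool = False             # rests are unpitched
    locked: bool = False              # locked notes are skipped by re-estimation
    annotation: str = ""

    def pitch(self) -> Optional[float]:
        """Return the note pitch in Hz, or None for rests and empty notes."""
        if self.is_rest:
            return None
        if self.estimator == "manual":
            return self.manual_pitch
        if not self.pitch_track:
            return None
        if self.estimator == "mean":
            return statistics.fmean(self.pitch_track)
        if self.estimator == "median":
            return statistics.median(self.pitch_track)
        raise ValueError(f"unknown estimator: {self.estimator}")


def notes_to_csv(notes: List[Note]) -> str:
    """Serialise notes as CSV text with one row per note."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["start", "end", "pitch_hz", "annotation"])
    for note in notes:
        pitch = note.pitch()
        writer.writerow([
            f"{note.start:.3f}",
            f"{note.end:.3f}",
            "" if pitch is None else f"{pitch:.2f}",  # empty cell for rests
            note.annotation,
        ])
    return buffer.getvalue()


if __name__ == "__main__":
    melody = [
        Note(0.0, 0.5, [219.0, 220.0, 224.0]),
        Note(0.5, 1.0, is_rest=True),
        Note(1.0, 1.5, [330.0, 332.0], estimator="mean", annotation="vibrato"),
    ]
    print(notes_to_csv(melody), end="")
```

Writing rests as rows with an empty pitch cell keeps the exported melody contiguous in time, which matches the rule that notes are never deleted.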

Some explicit differences to Sonic Visualiser:

* Tony has only one pane (I know I mentioned that before, can't ever mention it too often)
* arrow keys should only change the playback position, and only via that the visualisation position
* zoom sliders should be very large -- if possible along the whole sides of the main pane
* only one instrument (i.e. no separate selection tool or scissors tool)
** necessitates a permanent selection strip (at the top of the main pane) in order to select contiguous notes
** double click splits/joins

Some explicit differences to Praat:

* Tony has only one window
* Tony has a concept of a contiguous melody
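The "contiguous melody" idea can be sketched as a list of segments in which every segment starts where the previous one ends. The sketch below is only an illustration under assumptions: the names @Segment@, @to_rest@, @split@ and @join@ are hypothetical, and the rule that a joined segment is a rest only if both halves were rests is a guess, not a decision from this page. The edits return new lists rather than modifying in place, so an undo history could simply be a stack of earlier melodies.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    """A stretch of the melody: a pitched note or a rest (hypothetical type)."""
    start: float
    end: float
    is_rest: bool = False


def is_contiguous(melody: List[Segment]) -> bool:
    """True if every segment begins exactly where its predecessor ends."""
    return all(a.end == b.start for a, b in zip(melody, melody[1:]))


def to_rest(melody: List[Segment], index: int) -> List[Segment]:
    """Stand-in for deletion: the segment stays, but becomes unpitched."""
    edited = list(melody)
    edited[index] = replace(melody[index], is_rest=True)
    return edited


def split(melody: List[Segment], index: int, time: float) -> List[Segment]:
    """Split one segment in two at `time`, which must lie strictly inside it."""
    segment = melody[index]
    if not segment.start < time < segment.end:
        raise ValueError("split time must lie strictly inside the segment")
    halves = [replace(segment, end=time), replace(segment, start=time)]
    return melody[:index] + halves + melody[index + 1:]


def join(melody: List[Segment], index: int) -> List[Segment]:
    """Merge a segment with its right-hand neighbour."""
    left, right = melody[index], melody[index + 1]
    # Assumption: the result is a rest only if both parts were rests.
    merged = Segment(left.start, right.end, left.is_rest and right.is_rest)
    return melody[:index] + [merged] + melody[index + 2:]


if __name__ == "__main__":
    history = [[Segment(0.0, 1.0), Segment(1.0, 2.0), Segment(2.0, 3.0)]]
    history.append(to_rest(history[-1], 1))
    history.append(split(history[-1], 0, 0.4))
    print(is_contiguous(history[-1]), len(history[-1]))
    history.pop()  # undo the split
    print(len(history[-1]))
```

None of the three operations can open a gap or create an overlap, which is the property that distinguishes this model from free-form interval annotation.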

h4. Further possible GUI features

* automatic re-estimation after edits (kind of Maximum A Posteriori or something)
* export as MIDI

h4. Explicitly unwanted features

* delete note (notes can only be made into rests, since we assume a contiguous, monophonic melody)
* horizontal zoom
* polyphonic processing