Paper title.
Matthias Mauch and Chris Cannam: Efficient Computer-Aided Pitch Track and Note Estimation for Scientific Applications

Abstract.
We present **Tony**, a free, open-source software tool for 
computer-aided pitch track and note annotation of melodic audio content.
The accurate annotation of fundamental frequencies and notes
is essential to the scientific study of 
intonation in singing and other instruments.
Unlike commercial applications for singers and producers, 
or other academic tools for generic music annotation and visualisation,
**Tony** has been designed for the scientific study of monophonic music:
a) it implements state-of-the-art algorithms for pitch and note estimation from audio,
b) it provides visual and auditory feedback of the extracted pitches 
for the identification of detection errors,
c) it provides an intelligent graphical user interface 
through which the user can identify and rapidly correct estimation errors, and
d) it provides functions for exporting the pitch track and note track, 
enabling further processing in spreadsheets or other applications.
Software versions for Windows, OSX and Linux platforms can be downloaded from
http://code.soundsoftware.ac.uk/projects/tony

Keyword 1.
Pitch/Note Analysis

Keyword 2.
Software

Keyword 3.
Singing

Aims.
We aim to make the annotation of melodic content for scientific purposes more efficient.

Music psychologists interested in the analysis of pitch and intonation 
usually use software programs originally aimed at the analysis of speech
(e.g. Praat http://www.fon.hum.uva.nl/praat/) or generic audio annotation
tools (e.g. Sonic Visualiser http://www.sonicvisualiser.org/)
to extract pitches of notes from audio recordings. 
Since these programs were not conceived for musical pitch analysis, 
the process of extracting note frequencies remains laborious and can take
many times the duration of the recording.

Commercial tools such as Melodyne (http://www.celemony.com/), 
Songs2See (http://www.songs2see.com/) or 
Sing&See (http://www.singandsee.com/) also exist for these purposes; however, 
their frequency estimation procedures are typically not public (proprietary code),
and they do not provide export formats suitable for scientific analysis.


A note annotation system [1] developed for academic purposes exists, 
but it does not feature note extraction, and it is not openly available.

This is why, during our own research on intonation [2], 
we decided to develop our own pitch extraction tool that avoids the above shortcomings.


Methods.
For automatic pitch and note estimation we use the pYIN method [3]. 
The method provides precise pitch and note estimates and 
automatically determines which parts of the recording are voiced.
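pYIN builds on the YIN algorithm, whose core periodicity measure is the cumulative mean normalized difference function. A minimal sketch of that core is given below; it deliberately omits pYIN's probabilistic thresholding and HMM-based tracking, and the function name and parameter defaults are our own illustration, not Tony's actual code.

```python
import numpy as np

def yin_f0(frame, sr, fmin=80.0, fmax=1000.0, threshold=0.15):
    """Estimate the f0 of one audio frame using YIN's cumulative mean
    normalized difference function (the basis that pYIN builds on)."""
    tau_min = int(sr / fmax)
    tau_max = int(sr / fmin)
    n = len(frame)
    # Difference function: d(tau) = sum_j (x[j] - x[j + tau])^2
    d = np.array([np.sum((frame[:n - tau] - frame[tau:]) ** 2)
                  for tau in range(tau_max + 1)])
    # Cumulative mean normalized difference: d'(tau) = d(tau) * tau / sum_{1..tau} d
    cmnd = np.ones_like(d)
    cumsum = np.cumsum(d[1:])
    cmnd[1:] = d[1:] * np.arange(1, len(d)) / np.maximum(cumsum, 1e-12)
    # Absolute-threshold step: take the first lag dipping below the threshold
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            # Walk down to the local minimum for a slightly better lag
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sr / tau
    return None  # no clear periodicity: treat the frame as unvoiced

# Usage: a pure 440 Hz sine should come out close to 440 Hz
sr = 44100
t = np.arange(2048) / sr
f0 = yin_f0(np.sin(2 * np.pi * 440.0 * t), sr)
```

Note that integer lags limit the resolution; production implementations refine the estimate with parabolic interpolation around the chosen lag.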

The graphical user interface is built upon the 
open-source software libraries originally developed for Sonic Visualiser.

It features the audio waveform, a spectrogram representation, 
the pitch track and notes. Users can scroll and zoom in time.
**Tony** not only plays back the original audio, 
but can also, optionally, sonify the pitch track (melody line) 
and the note track (discrete pitches with durations).
Each note's pitch is estimated as the median of the pitch track 
over the duration of the note, which makes the estimate 
robust to brief outliers in the track.
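The median-based note pitch estimate can be sketched as follows. The data layout here (a list of time/frequency pairs, with None for unvoiced frames) is a hypothetical illustration for clarity, not Tony's actual internal representation.

```python
from statistics import median

def note_pitch_hz(pitch_track, note_start, note_end):
    """Pitch of a note, taken as the median of the voiced pitch-track
    frames that fall within the note's duration.

    pitch_track: list of (time_secs, f0_hz) pairs, f0_hz is None
    where the frame is unvoiced.
    """
    f0s = [f0 for t, f0 in pitch_track
           if note_start <= t < note_end and f0 is not None]
    return median(f0s) if f0s else None

# Usage: a single octave glitch barely shifts the median,
# whereas it would badly skew a mean
track = [(0.00, 220.1), (0.01, 219.8), (0.02, 440.0),  # octave error
         (0.03, 220.3), (0.04, None), (0.05, 219.9)]
pitch = note_pitch_hz(track, 0.0, 0.06)  # -> 220.1
```

The median's insensitivity to a few outlying frames is what motivates calling the estimate robust: octave errors and onset glides contribute single frames that a mean would absorb but a median largely ignores.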

The user can delete, move, cut, merge, crop and extend notes, 
and the note frequencies are adapted accordingly.
The user can delete spurious parts of the pitch track 
and shift the pitch track in frequency.
In order to efficiently correct erroneous pitch tracks, the user can select 
a time interval, and **Tony** will provide various alternative 
pitch tracks. The user can then pick the correct one.

Outcomes.
The system is currently being used in two projects:
1) the generation of new training and test data for Music Informatics research, 
and 2) a research project on intonation in unaccompanied solo singing.

Preliminary feedback from users suggests that 
the system does indeed facilitate pitch annotation 
and provides vital features not found in other tools.


Title for final section.
Conclusions

[Q37].
We presented **Tony**, a new software tool for computer-assisted
annotation of melodic audio content for scientific analysis.
No other existing program combines pitch and note estimation, 
a graphical user interface with auditory feedback,
rapid, computer-aided correction of pitches, and
extensive exporting facilities.
**Tony** is freely available for use on Windows, OSX and Linux platforms
from http://code.soundsoftware.ac.uk/projects/tony/.

Acknowledgements.
Matthias Mauch is funded by the Royal Academy of Engineering. 
We would like to thank Justin Salamon, Rachel Bittner and Juan Bello 
for their comments and coding help.

Three key references. (APA v6)
[1] Pant, S., Rao, V., & Rao, P. (2010). A melody detection user interface for polyphonic music. In Proceedings of the 2010 National Conference on Communications (NCC).
[2] Mauch, M., Frieler, K., & Dixon, S. (under review). Intonation in unaccompanied singing: Accuracy, drift and a model of intonation memory.
[3] Mauch, M., & Dixon, S. (2014). pYIN: A fundamental frequency estimator using probabilistic threshold distributions. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014).

Comments/queries to organisers.