webaudioevaluationtool: docs/SMC15/smc2015template.tex comparison

comparison docs/SMC15/smc2015template.tex @ 1733:2d4688fa1eab

Paper: comments Josh, extra section

author	Brecht De Man <b.deman@qmul.ac.uk>
date	Mon, 27 Apr 2015 22:18:20 +0100
parents	7435309fd918
children	ffeef0ac7a5f

-:7435309fd918
+:2d4688fa1eab
 \section{Introduction}\label{sec:introduction}
 %NICK: examples of what kind of audio applications HTML5 has made possible, with references to publications (or website)\\
-Perceptual evaluation of audio plays an important role in a wide range of research on audio quality \cite{schoeffler2013impact,repp}, sound synthesis \cite{de2013real,durr2015implementation}, audio effect design \cite{deman2014a}, source separation \cite{mushram,uhlereiss}, music and emotion analysis \cite{song2013b,song2013a}, and many others \cite{friberg2011comparison}.  % codec design?
+Perceptual evaluation of audio plays an important role in a wide range of research on audio quality \cite{schoeffler2013impact,repp}, sound synthesis \cite{de2013real,durr2015implementation}, audio effect design \cite{deman2014a}, source separation \cite{mushram,uhlereiss}, music and emotion analysis \cite{song2013a,eerola2009prediction}, and many others \cite{friberg2011comparison}.  % codec design?
 %This work is based in part on the APE audio perceptual evaluation interface for MATLAB \cite{deman2014b}. An important drawback of this toolbox is the need to have MATLAB to create a test and even to run (barring the use of an executable generated by MATLAB), and limited compatibility with both earlier and newer versions of MATLAB, which makes it hard to maintain. On the other hand, a web application generally has the advantage of running in most browsers on most applications.
 % IMPORTANT
 %[TO ADD: other interfaces for perceptual evaluation of audio, browser-based or not!] \\
 At this point, we have implemented the interface of the MATLAB-based APE (Audio Perceptual Evaluation) toolbox \cite{deman2014b}. This interface shows one marker for each simultaneously evaluated audio fragment on one or more horizontal axes. Each marker can be moved to rate or rank the respective fragment in terms of any subjective property. The interface also provides a comment box for every marker, and optionally further text boxes for additional comments.
 The reason for such an interface, where all stimuli are presented on a single rating axis (or multiple axes if multiple subjective qualities need to be evaluated), is that it encourages the subject to consider the rating and/or ranking of the stimuli relative to one another. This is opposed to comparing each individual stimulus to a given reference, as is the case with, e.g., a MUSHRA test \cite{mushra}. As such, it is ideal for any type of test where the goal is to carefully compare samples against each other, such as perceptual evaluation of different mixes of music recordings \cite{deman2015a} or of sound synthesis models \cite{durr2015implementation}. It is less suited to comparing results of source separation algorithms \cite{mushram} or low-bit-rate audio \cite{mushra} to a high-quality reference signal.
 The markers on the slider at the top of the page are initially positioned at random, to minimise the bias that may be introduced when the initial positions are near the beginning, end or middle of the slider. Another approach is to place the markers outside of the slider bar at first and have the subject drag them in. However, the authors believe this does not encourage careful consideration and comparison of the different fragments: the implicit goal of the test then becomes to audition and drag in each fragment just once, rather than to compare all fragments rigorously.
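To make the randomised starting positions concrete, the following TypeScript sketch draws one uniformly distributed position in $[0, 1)$ per marker, with 0 and 1 denoting the two ends of the slider. It is a hypothetical helper, not the tool's own code. The random number generator is passed in as a parameter, an assumption made here so that the function can be tested deterministically.

```typescript
// Hypothetical helper (not the tool's actual code): one uniformly distributed
// starting position in [0, 1) per marker, where 0 and 1 are the slider ends.
function randomInitialPositions(
  markerCount: number,
  rng: () => number = Math.random,
): number[] {
  const positions: number[] = [];
  for (let i = 0; i < markerCount; i++) {
    positions.push(rng());
  }
  return positions;
}

// Usage: starting positions for six markers on one axis.
const startPositions = randomInitialPositions(6);
console.log(startPositions.map((p) => p.toFixed(2)).join(", "));
```

The normalised positions can be scaled to the slider width when the markers are drawn. They also serve as the initial positions that are stored with the results.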
-See Figure \ref{fig:interface} for an example of the interface, with eleven fragments and one axis. %? change if a new interface is shown
+See Figure \ref{fig:interface} for an example of the interface, with six fragments and one axis. %? change if a new interface is shown
 %Most of these functions are specific to the APE interface design, for instance the AB test will need a different structure for the audio engine and loading of files, since multiple instances of the same file are required. % more generally these pertain to any typeof multi-stimulus test - not quite useful for AB tests, method of adjustment, ABX, and so on.
 %There are some areas of the design where certain design choices had to be made such as with the markers.
 %For instance, the option to provide free-text comment fields allows for tests with individual vocabulary methods, as opposed to only allowing quantitative scales associated to a fixed set of descriptors.
 When an \textit{audioObject} is created, it is given the URL of the audio sample to load. This file is downloaded into the browser asynchronously using the \textit{XMLHttpRequest} object, which can load any file into the JavaScript environment for further processing. This is particularly useful for the Web Audio API, because \textit{XMLHttpRequest} supports downloading files in their binary form for decoding. Once downloaded, the file is decoded using the Web Audio API offline decoder. This decoder uses the decoding schemes available in the browser to convert the audio file into raw 32-bit floating-point (float32) arrays, which are in turn passed to the relevant \textit{audioObject} for playback.
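The download-and-decode step can be sketched in TypeScript as follows. This is an illustrative sketch rather than the tool's implementation. The function name \textit{loadAudio} and the file name in the usage comment are hypothetical. The promise-returning form of \textit{decodeAudioData} is assumed, since some older browsers only offered a callback form. The code only runs in a browser, where \textit{XMLHttpRequest} and the Web Audio API are available.

```typescript
// Minimal browser-only sketch: download one audio file in binary form with
// XMLHttpRequest, then decode it with the Web Audio API.
function loadAudio(url: string, context: BaseAudioContext): Promise<AudioBuffer> {
  return new Promise<AudioBuffer>((resolve, reject) => {
    const request = new XMLHttpRequest();
    request.open("GET", url, true); // true = asynchronous request
    request.responseType = "arraybuffer"; // keep the file as raw bytes, not text
    request.onload = () => {
      if (request.status < 200 || request.status >= 300) {
        reject(new Error(`HTTP ${request.status} while loading ${url}`));
        return;
      }
      // The browser's own decoders turn the file into 32-bit float samples.
      context.decodeAudioData(request.response as ArrayBuffer).then(resolve, reject);
    };
    request.onerror = () => reject(new Error(`network error while loading ${url}`));
    request.send();
  });
}

// Usage (browser only; "fragment1.wav" is a hypothetical file name):
// const ctx = new AudioContext();
// loadAudio("fragment1.wav", ctx).then((buffer) => {
//   const left: Float32Array = buffer.getChannelData(0);
//   console.log(buffer.sampleRate, left.length);
// });
```

Requesting an \textit{arraybuffer} response is what keeps the file in binary form. The decoded \textit{AudioBuffer} then exposes one \textit{Float32Array} per channel through \textit{getChannelData}.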
 When a page of the test is completed, which is indicated by pressing the Submit button, the function \textit{pageXMLSave(testId)} is called. It stores all of the collected data until all pages of the test are completed. After the final page and any post-test questions are completed, the \textit{interfaceXMLSave()} function is called. This function generates the final XML file for submission, as outlined in Section \ref{sec:setupresultsformats}.
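A minimal sketch of this two-stage saving follows, assuming that each page's results have already been serialised to an XML string. The methods \textit{savePage} and \textit{buildDocument} are hypothetical stand-ins for \textit{pageXMLSave} and \textit{interfaceXMLSave}. The \textit{results} root element is an assumption, not the tool's actual output format.

```typescript
// Hypothetical sketch, not the tool's own code: keep one XML string per
// submitted page, then wrap all of them in a single results document.
class ResultsCollector {
  private readonly testIds: string[] = [];
  private readonly pageXml: string[] = [];

  // Stand-in for pageXMLSave(testId), called when a page is submitted.
  // Saving the same ID twice replaces the earlier entry, so repeated
  // pages need distinct IDs.
  savePage(testId: string, xml: string): void {
    const existing = this.testIds.indexOf(testId);
    if (existing >= 0) {
      this.pageXml[existing] = xml;
    } else {
      this.testIds.push(testId);
      this.pageXml.push(xml);
    }
  }

  // Stand-in for interfaceXMLSave(), called after the last page and any
  // post-test questions; the <results> root element is an assumption.
  buildDocument(): string {
    return ['<?xml version="1.0" encoding="UTF-8"?>', "<results>"]
      .concat(this.pageXml, ["</results>"])
      .join("\n");
  }
}

// Usage: two submitted pages, assembled in submission order.
const collector = new ResultsCollector();
collector.savePage("page1", '<page id="page1"/>');
collector.savePage("page2", '<page id="page2"/>');
console.log(collector.buildDocument());
```

Keeping the pages in memory until the end means nothing is submitted half-finished. The trade-off is that a closed browser tab loses all unsaved pages.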
+\section{Support and limitations}\label{sec:support}
 Browsers support various audio file formats, and support for any particular format is not consistent across browsers. Currently the Web Audio API is best supported in Chrome, Firefox, Opera and Safari. All of these support the uncompressed WAV format. Although WAV is not a compact, web-friendly format, most transport systems have enough bandwidth for this not to be a problem. Ogg Vorbis is another format that is well supported across the four major desktop browsers listed above, as is MP3 (although Firefox may not support all MP3 types)\footnote{https://developer.mozilla.org/en-US/docs/Web/HTML/\\Supported\_media\_formats}. %https://developer.mozilla.org/en-US/docs/Web/HTML/Supported_media_formats
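Which of these formats a given browser can play can be probed at run time. The sketch below assumes that the answer of \textit{canPlayType} on an \textit{audio} element is a reasonable proxy for what the Web Audio API decoder accepts; this is not guaranteed. The list of candidate MIME types and the helper \textit{playableFormats} are illustrative.

```typescript
// Illustrative candidate formats, in an assumed order of preference.
const CANDIDATE_FORMATS: ReadonlyArray<[string, string]> = [
  ["wav", 'audio/wav; codecs="1"'], // uncompressed PCM
  ["ogg", 'audio/ogg; codecs="vorbis"'],
  ["mp3", "audio/mpeg"],
];

// Anything with a canPlayType method, e.g. an <audio> element.
interface TypeProbe {
  canPlayType(mimeType: string): string;
}

// canPlayType answers "", "maybe" or "probably"; keep every non-empty answer.
function playableFormats(probe: TypeProbe): string[] {
  return CANDIDATE_FORMATS
    .filter(([, mime]) => probe.canPlayType(mime) !== "")
    .map(([extension]) => extension);
}

// Usage in a browser: playableFormats(document.createElement("audio"))
```

Passing the probe in as an argument, rather than creating the \textit{audio} element inside the function, keeps the helper testable outside a browser.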
 One limitation of the Web Audio API is that the sample rate is assigned by the system sound device rather than requested by the application, and a different rate cannot be requested. % Does this make sense? The problem is across all audio files.
 As the sampling rate and the effect of resampling may be critical for some listening tests, the default operation when an audio file is loaded with a sample rate different from that of the system is to resample it to the system rate. To guard against this, the desired sample rate can be specified in the setup XML and checked against the system sample rate. If the two do not match, a browser alert window is shown asking the subject to adjust the system sample rate accordingly.
 This check happens before any audio files are loaded or decoded. The browser is therefore only instructed to fetch files if the system sample rate meets the requirements, which avoids requests for large files until they are actually needed.
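The order of operations described above (check first, fetch afterwards) can be sketched as follows, including the `Resampling' option from the setup. The function \textit{checkRateThenLoad} and its callbacks are hypothetical. In a browser, the \textit{context} argument would be an \textit{AudioContext}, whose read-only \textit{sampleRate} property reports the rate of the system sound device, and \textit{warn} could wrap \textit{window.alert}.

```typescript
// Anything exposing the output sample rate, e.g. an AudioContext.
interface RateSource {
  readonly sampleRate: number;
}

// Hypothetical sketch: only start loading if the system rate is acceptable.
// Returns true if loading was started.
function checkRateThenLoad(
  context: RateSource,
  requiredRate: number | undefined, // from the setup XML; undefined = no requirement
  allowResampling: boolean, // the "Resampling" option
  loadAllFiles: () => void,
  warn: (message: string) => void,
): boolean {
  const ok =
    allowResampling || requiredRate === undefined || context.sampleRate === requiredRate;
  if (!ok) {
    warn(`Please set your system sample rate to ${requiredRate} Hz (currently ${context.sampleRate} Hz).`);
    return false; // nothing has been requested from the server yet
  }
  loadAllFiles();
  return true;
}

// Browser usage (startLoading is a hypothetical function):
// checkRateThenLoad(new AudioContext(), 44100, false, startLoading, (m) => window.alert(m));
```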
 \item \textbf{Loop fragments}: Repeat the current fragment when its end is reached, until the `Stop audio' or `Submit' button is clicked.
 \item \textbf{Comments}: Displays a separate comment box for each fragment in the page.
 \item \textbf{General comment}: One comment box, additional to the individual comment boxes, to comment on the test or a feature that some or all of the fragments share.
 \item \textbf{Resampling}: When this option is enabled, fragments are resampled to match the sample rate of the subject's system (the default behaviour of the Web Audio API). When it is disabled, an error is shown if the system sample rate does not match the requested one.
 \item \textbf{Randomise page order}: Randomises the order in which different `pages' are presented. % are we calling this 'pages'?
-\item \textbf{Randomise fragment order}: Randomises the order and numbering of the markers and comment boxes corresponding with the fragments. This permutation is stored as well, to be able to interpret references to the numbers in the comments (such as `this is much [brighter] then 4').
+\item \textbf{Randomise fragment order}: Randomises the order and numbering of the markers and comment boxes corresponding to the fragments. This permutation is stored as well, so that references to fragment numbers in the comments (such as `this is much [brighter] than 4') can be interpreted.
 \item \textbf{Require playback}: Require that each fragment has been played at least once, if not in full.
 \item \textbf{Require full playback}: If `Require playback' is active, require that each fragment has been played in full.
 \item \textbf{Require moving}: Require that each marker is moved (dragged) at least once.
 \item \textbf{Require comments}: Require the subject to enter a comment for each fragment.
 \item \textbf{Repeat test}: Number of times each page in the test should be repeated (none by default). This allows familiarisation with the content and the experiment, and makes it possible to investigate the consistency of subjects and the variability due to familiarity. In the setup, each `page' can be given a repeat count. All repetitions are gathered before the page order is shuffled, so that repeated pages are not presented back-to-back where possible.
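The randomisation options above can be sketched with a Fisher-Yates shuffle. The sketch below is illustrative and not taken from the tool. \textit{shuffle} produces the display order of the fragments, and \textit{fragmentForLabel} uses the stored permutation to map a number quoted in a comment back to a fragment. \textit{buildPageOrder} gathers all repetitions of all pages before shuffling. Since the text only promises to avoid back-to-back repeats `if possible', this sketch simply reshuffles up to a fixed number of times (an assumption) and otherwise keeps the last attempt.

```typescript
// Fisher-Yates shuffle; returns a new array. rng is injectable for testing.
function shuffle<T>(items: readonly T[], rng: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Marker number k (1-based) shows fragment displayOrder[k - 1]; storing
// displayOrder lets a comment such as "brighter than 4" be traced back.
function fragmentForLabel(displayOrder: readonly string[], label: number): string | undefined {
  return displayOrder[label - 1];
}

interface PageSpec {
  id: string;
  repeatCount: number; // extra presentations (0 = shown once)
}

function hasBackToBack(order: readonly string[]): boolean {
  return order.some((id, i) => i > 0 && order[i - 1] === id);
}

// Gather every presentation first, then reshuffle (up to maxAttempts times)
// until no page directly follows a copy of itself; otherwise keep the last attempt.
function buildPageOrder(
  pages: readonly PageSpec[],
  maxAttempts = 100,
  rng: () => number = Math.random,
): string[] {
  const all: string[] = [];
  for (const page of pages) {
    for (let r = 0; r <= page.repeatCount; r++) all.push(page.id);
  }
  let order = shuffle(all, rng);
  for (let attempt = 1; attempt < maxAttempts && hasBackToBack(order); attempt++) {
    order = shuffle(all, rng);
  }
  return order;
}

// Usage
const displayOrder = shuffle(["fragA", "fragB", "fragC", "fragD"]);
console.log(fragmentForLabel(displayOrder, 4)); // fragment shown as "4"
console.log(buildPageOrder([
  { id: "p1", repeatCount: 1 },
  { id: "p2", repeatCount: 1 },
  { id: "p3", repeatCount: 0 },
]));
```

Reshuffling is simple but offers no guarantee. A constructive ordering that always separates repeats whenever this is possible would be more robust.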
 The results also contain the information collected by any defined pre- and post-test questions. These responses are referenced against the setup XML using the same ID, so that readable responses can be obtained. Following the earlier example of a pre-test question, an example response can be seen in Figure \ref{fig:xmlOut}.
 Each page of testing is returned with the results of the entire page included in the structure. One `audioElement' node is created per audio fragment per page, along with its ID. Each such node has several child nodes, including the rating between 0 and 1, the comment, and any other collected metrics. These include how long the fragment was listened to, its initial position, and boolean flags indicating whether the fragment was listened to, whether it was moved, and whether its comment box contains a comment. Furthermore, each user action (manipulation of any interface element, such as playback or moving a marker) can be logged along with the corresponding time code.
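As an illustration of the per-fragment structure, the following sketch serialises one fragment's data to an \textit{audioElement} node. Only the node name and the kinds of data come from the text above. The field names, the child element names and the use of seconds for listening time are assumptions.

```typescript
// Hypothetical record of what is collected for one fragment on one page.
interface FragmentResult {
  id: string;
  rating: number; // final marker position, between 0 and 1
  initialPosition: number; // marker position when the page was loaded
  comment: string;
  listenTime: number; // total playback time, assumed to be in seconds
  wasListenedTo: boolean;
  wasMoved: boolean;
}

function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Child element names are illustrative only, not the tool's actual schema.
function audioElementXml(r: FragmentResult): string {
  const hasComment = r.comment.trim().length > 0;
  return [
    `<audioElement id="${escapeXml(r.id)}">`,
    `  <value>${r.rating}</value>`,
    `  <comment>${escapeXml(r.comment)}</comment>`,
    `  <listenTime>${r.listenTime}</listenTime>`,
    `  <initialPosition>${r.initialPosition}</initialPosition>`,
    `  <wasListenedTo>${r.wasListenedTo}</wasListenedTo>`,
    `  <wasMoved>${r.wasMoved}</wasMoved>`,
    `  <hasComment>${hasComment}</hasComment>`,
    `</audioElement>`,
  ].join("\n");
}

console.log(audioElementXml({
  id: "mix3", rating: 0.72, initialPosition: 0.31, comment: "Vocals too loud & dull",
  listenTime: 41.5, wasListenedTo: true, wasMoved: true,
}));
```

Escaping the comment text matters because subjects can type characters such as \& or < that would otherwise produce malformed XML.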
 We also store session data such as the browser the tool was used in.
 We provide the option to store the results locally, and/or to have them sent to a server.
 %Here is an example of the set up XML and the results XML: % perhaps best to refer to each XML after each section (set up <> results)
 % Should we include an Example of the input and output XML structure?? --> Sure.
 In this paper we have presented an approach to creating a browser-based listening test environment that can be used for a variety of types of perceptual evaluation of audio.
 Specifically, we discussed the use of the toolbox in the context of assessment of preference for different production practices, with identical source material.
 The purpose of this paper is to outline the design of this tool, to describe our implementation using basic HTML5 functionality, and to discuss design challenges and limitations of our approach. This tool differs from other perceptual audio evaluation tools in that it uses web technologies, allowing multiple participants to take the test without the need for proprietary software such as MATLAB. The tool also allows any interface to be built from HTML5 elements, making it possible to create a variety of dynamic, multiple-stimulus listening test interfaces. It enables quick setup of simple tests, while complex tests can still be managed through a single file. Finally, it uses the XML document format to store the results, allowing for processing and analysis in various third-party software such as MATLAB or Python.
 % future work
-Further work may include the development of other common test designs, such as MUSHRA \cite{mushra}, 2D valence and arousal rating, and others. We will add functionality to assist with setting up large-scale tests with remote subjects, so this becomes straightforward and intuitive.
+Further work may include the development of other common test designs, such as MUSHRA \cite{mushra}, 2D valence and arousal/activity rating \cite{eerola2009prediction}, and others. We will add functionality to assist with setting up large-scale tests with remote subjects, so this becomes straightforward and intuitive.
 In addition, we will keep on improving and expanding the tool, and highly welcome feedback and contributions from the community.
 The source code of this tool can be found on \\ \texttt{code.soundsoftware.ac.uk/projects/}\\ \texttt{webaudioevaluationtool}.