comparison docs/SMC15/smc2015template.tex @ 1733:2d4688fa1eab

Paper: comments Josh, extra section
author Brecht De Man <b.deman@qmul.ac.uk>
date Mon, 27 Apr 2015 22:18:20 +0100
parents 7435309fd918
children ffeef0ac7a5f
comparison
equal deleted inserted replaced
1732:7435309fd918 1733:2d4688fa1eab
150 150
151 \section{Introduction}\label{sec:introduction} 151 \section{Introduction}\label{sec:introduction}
152 152
153 %NICK: examples of what kind of audio applications HTML5 has made possible, with references to publications (or website)\\ 153 %NICK: examples of what kind of audio applications HTML5 has made possible, with references to publications (or website)\\
154 154
155 Perceptual evaluation of audio plays an important role in a wide range of research on audio quality \cite{schoeffler2013impact,repp}, sound synthesis \cite{de2013real,durr2015implementation}, audio effect design \cite{deman2014a}, source separation \cite{mushram,uhlereiss}, music and emotion analysis \cite{song2013b,song2013a}, and many others \cite{friberg2011comparison}. % codec design? 155 Perceptual evaluation of audio plays an important role in a wide range of research on audio quality \cite{schoeffler2013impact,repp}, sound synthesis \cite{de2013real,durr2015implementation}, audio effect design \cite{deman2014a}, source separation \cite{mushram,uhlereiss}, music and emotion analysis \cite{song2013a,eerola2009prediction}, and many others \cite{friberg2011comparison}. % codec design?
156 156
157 %This work is based in part on the APE audio perceptual evaluation interface for MATLAB \cite{deman2014b}. An important drawback of this toolbox is the need to have MATLAB to create a test and even to run (barring the use of an executable generated by MATLAB), and limited compatibility with both earlier and newer versions of MATLAB, which makes it hard to maintain. On the other hand, a web application generally has the advantage of running in most browsers on most applications. 157 %This work is based in part on the APE audio perceptual evaluation interface for MATLAB \cite{deman2014b}. An important drawback of this toolbox is the need to have MATLAB to create a test and even to run (barring the use of an executable generated by MATLAB), and limited compatibility with both earlier and newer versions of MATLAB, which makes it hard to maintain. On the other hand, a web application generally has the advantage of running in most browsers on most applications.
158 158
159 % IMPORTANT 159 % IMPORTANT
160 %[TO ADD: other interfaces for perceptual evaluation of audio, browser-based or not!] \\ 160 %[TO ADD: other interfaces for perceptual evaluation of audio, browser-based or not!] \\
227 At this point, we have implemented the interface of the MATLAB-based APE (Audio Perceptual Evaluation) toolbox \cite{deman2014b}. This shows one marker for each simultaneously evaluated audio fragment on one or more horizontal axes, that can be moved to rate or rank the respective fragments in terms of any subjective property, as well as a comment box for every marker, and any extra text boxes for extra comments. 227 At this point, we have implemented the interface of the MATLAB-based APE (Audio Perceptual Evaluation) toolbox \cite{deman2014b}. This shows one marker for each simultaneously evaluated audio fragment on one or more horizontal axes, that can be moved to rate or rank the respective fragments in terms of any subjective property, as well as a comment box for every marker, and any extra text boxes for extra comments.
228 The reason for such an interface, where all stimuli are presented on a single rating axis (or multiple axes if multiple subjective qualities need to be evaluated), is that it urges the subject to consider the rating and/or ranking of the stimuli relative to one another, as opposed to comparing each individual stimulus to a given reference, as is the case with e.g. a MUSHRA test \cite{mushra}. As such, it is ideal for any type of test where the goal is to carefully compare samples against each other, like perceptual evaluation of different mixes of music recordings \cite{deman2015a} or sound synthesis models \cite{durr2015implementation}, as opposed to comparing results of source separation algorithms \cite{mushram} or audio with lower data rate \cite{mushra} to a high quality reference signal. 228 The reason for such an interface, where all stimuli are presented on a single rating axis (or multiple axes if multiple subjective qualities need to be evaluated), is that it urges the subject to consider the rating and/or ranking of the stimuli relative to one another, as opposed to comparing each individual stimulus to a given reference, as is the case with e.g. a MUSHRA test \cite{mushra}. As such, it is ideal for any type of test where the goal is to carefully compare samples against each other, like perceptual evaluation of different mixes of music recordings \cite{deman2015a} or sound synthesis models \cite{durr2015implementation}, as opposed to comparing results of source separation algorithms \cite{mushram} or audio with lower data rate \cite{mushra} to a high quality reference signal.
229 229
230 The markers on the slider at the top of the page are positioned randomly, to minimise the bias that may be introduced when the initial positions are near the beginning, end or middle of the slider. Another approach is to place the markers outside of the slider bar at first and have the subject drag them in, but the authors believe this doesn't encourage careful consideration and comparison of the different fragments as the implicit goal of the test becomes to audition and drag each fragment in just once, rather than to compare all fragments rigorously. 230 The markers on the slider at the top of the page are positioned randomly, to minimise the bias that may be introduced when the initial positions are near the beginning, end or middle of the slider. Another approach is to place the markers outside of the slider bar at first and have the subject drag them in, but the authors believe this doesn't encourage careful consideration and comparison of the different fragments as the implicit goal of the test becomes to audition and drag each fragment in just once, rather than to compare all fragments rigorously.
231 231
232 See Figure \ref{fig:interface} for an example of the interface, with eleven fragments and one axis. %? change if a new interface is shown 232 See Figure \ref{fig:interface} for an example of the interface, with six fragments and one axis. %? change if a new interface is shown
233 233
234 %Most of these functions are specific to the APE interface design, for instance the AB test will need a different structure for the audio engine and loading of files, since multiple instances of the same file are required. % more generally these pertain to any typeof multi-stimulus test - not quite useful for AB tests, method of adjustment, ABX, and so on. 234 %Most of these functions are specific to the APE interface design, for instance the AB test will need a different structure for the audio engine and loading of files, since multiple instances of the same file are required. % more generally these pertain to any typeof multi-stimulus test - not quite useful for AB tests, method of adjustment, ABX, and so on.
235 %There are some areas of the design where certain design choices had to be made such as with the markers. 235 %There are some areas of the design where certain design choices had to be made such as with the markers.
236 236
237 %For instance, the option to provide free-text comment fields allows for tests with individual vocabulary methods, as opposed to only allowing quantitative scales associated to a fixed set of descriptors. 237 %For instance, the option to provide free-text comment fields allows for tests with individual vocabulary methods, as opposed to only allowing quantitative scales associated to a fixed set of descriptors.
267 267
268 When an \textit{audioObject} is created, it is given the URL of the audio sample to load. This is downloaded into the browser asynchronously using the \textit{XMLHttpRequest} object, which downloads any file into the JavaScript environment for further processing. This is particularly useful for the Web Audio API because it supports downloading of files in their binary form for decoding. Once downloaded the file is decoded using the Web Audio API offline decoder. This uses the browser available decoding schemes to decode the audio files into raw float32 arrays, which are in turn passed to the relevant \textit{audioObject} for playback. 268 When an \textit{audioObject} is created, it is given the URL of the audio sample to load. This is downloaded into the browser asynchronously using the \textit{XMLHttpRequest} object, which downloads any file into the JavaScript environment for further processing. This is particularly useful for the Web Audio API because it supports downloading of files in their binary form for decoding. Once downloaded the file is decoded using the Web Audio API offline decoder. This uses the browser available decoding schemes to decode the audio files into raw float32 arrays, which are in turn passed to the relevant \textit{audioObject} for playback.
269 269
270 Once each page of the test is completed, identified by pressing the Submit button, the \textit{pageXMLSave(testId)} is called to store all of the collected data until all pages of the test are completed. After the final test and any post-test questions are completed, the \textit{interfaceXMLSave()} function is called. This function generates the final XML file for submission as outlined in Section \ref{sec:setupresultsformats}. 270 Once each page of the test is completed, identified by pressing the Submit button, the \textit{pageXMLSave(testId)} is called to store all of the collected data until all pages of the test are completed. After the final test and any post-test questions are completed, the \textit{interfaceXMLSave()} function is called. This function generates the final XML file for submission as outlined in Section \ref{sec:setupresultsformats}.
271 271
272 272 \section{Support and limitations}\label{sec:support}
273 273
274 Browsers support various audio file formats and are not consistent in any format. Currently the Web Audio API is best supported in Chrome, Firefox, Opera and Safari. All of these support the use of the uncompressed WAV format. Although not a compact, web friendly format, most transport systems are of a high enough bandwidth this should not be a problem. Ogg Vorbis is another well supported format across the four supported major desktop browsers, as well as MP3 (although Firefox may not support all MP3 types) \footnote{https://developer.mozilla.org/en-US/docs/Web/HTML/\\Supported\_media\_formats}. %https://developer.mozilla.org/en-US/docs/Web/HTML/Supported_media_formats 274 Browsers support various audio file formats and are not consistent in any format. Currently the Web Audio API is best supported in Chrome, Firefox, Opera and Safari. All of these support the use of the uncompressed WAV format. Although not a compact, web friendly format, most transport systems are of a high enough bandwidth this should not be a problem. Ogg Vorbis is another well supported format across the four supported major desktop browsers, as well as MP3 (although Firefox may not support all MP3 types) \footnote{https://developer.mozilla.org/en-US/docs/Web/HTML/\\Supported\_media\_formats}. %https://developer.mozilla.org/en-US/docs/Web/HTML/Supported_media_formats
275 One issue of the Web Audio API is that the sample rate is assigned by the system sound device, rather than requested and does not have the ability to request a different one. % Does this make sense? The problem is across all audio files. 275 One issue of the Web Audio API is that the sample rate is assigned by the system sound device, rather than requested and does not have the ability to request a different one. % Does this make sense? The problem is across all audio files.
276 As the sampling rate and the effect of resampling may be critical for some listening tests, the default operation when an audio file is loaded with a different sample rate to that of the system is to convert the sample rate. To provide a check for this, the desired sample rate can be supplied with the setup XML and checked against. If the sample rates do not match, a browser alert window is shown asking for the sample rate to be correctly adjusted. 276 As the sampling rate and the effect of resampling may be critical for some listening tests, the default operation when an audio file is loaded with a different sample rate to that of the system is to convert the sample rate. To provide a check for this, the desired sample rate can be supplied with the setup XML and checked against. If the sample rates do not match, a browser alert window is shown asking for the sample rate to be correctly adjusted.
277 This happens before any loading or decoding of audio files so the browser will only be instructed to fetch files if the system sample rate meets the requirements, avoiding multiple requests for large files until they are actually needed. 277 This happens before any loading or decoding of audio files so the browser will only be instructed to fetch files if the system sample rate meets the requirements, avoiding multiple requests for large files until they are actually needed.
303 \item \textbf{Loop fragments}: Repeat current fragment when end is reached, until the `Stop audio' or `Submit' button is clicked. 303 \item \textbf{Loop fragments}: Repeat current fragment when end is reached, until the `Stop audio' or `Submit' button is clicked.
304 \item \textbf{Comments}: Displays a separate comment box for each fragment in the page. 304 \item \textbf{Comments}: Displays a separate comment box for each fragment in the page.
305 \item \textbf{General comment}: One comment box, additional to the individual comment boxes, to comment on the test or a feature that some or all of the fragments share. 305 \item \textbf{General comment}: One comment box, additional to the individual comment boxes, to comment on the test or a feature that some or all of the fragments share.
306 \item \textbf{Resampling}: When this is enabled, tracks are resampled to match the subject's system's sample rate (a default feature of the Web Audio API). When it is not, an error is shown when the system does not match the requested sample rate. 306 \item \textbf{Resampling}: When this is enabled, tracks are resampled to match the subject's system's sample rate (a default feature of the Web Audio API). When it is not, an error is shown when the system does not match the requested sample rate.
307 \item \textbf{Randomise page order}: Randomises the order in which different `pages' are presented. % are we calling this 'pages'? 307 \item \textbf{Randomise page order}: Randomises the order in which different `pages' are presented. % are we calling this 'pages'?
308 \item \textbf{Randomise fragment order}: Randomises the order and numbering of the markers and comment boxes corresponding with the fragments. This permutation is stored as well, to be able to interpret references to the numbers in the comments (such as `this is much [brighter] then 4'). 308 \item \textbf{Randomise fragment order}: Randomises the order and numbering of the markers and comment boxes corresponding to the fragments. This permutation is stored as well, to be able to interpret references to the numbers in the comments (such as `this is much [brighter] then 4').
309 \item \textbf{Require playback}: Require that each fragment has been played at least once, if not in full. 309 \item \textbf{Require playback}: Require that each fragment has been played at least once, if not in full.
310 \item \textbf{Require full playback}: If `Require playback' is active, require that each fragment has been played in full. 310 \item \textbf{Require full playback}: If `Require playback' is active, require that each fragment has been played in full.
311 \item \textbf{Require moving}: Require that each marker is moved (dragged) at least once. 311 \item \textbf{Require moving}: Require that each marker is moved (dragged) at least once.
312 \item \textbf{Require comments}: This option allows requiring the subject to require a comment for each track. 312 \item \textbf{Require comments}: This option allows requiring the subject to require a comment for each track.
313 \item \textbf{Repeat test}: Number of times each page in the test should be repeated (none by default), to allow familiarisation with the content and experiment, and to investigate consistency of user and variability due to familiarity. In the setup, each 'page' can be given a repeat count. These are all gathered before shuffling the order so repeated tests are not back-to-back if possible. 313 \item \textbf{Repeat test}: Number of times each page in the test should be repeated (none by default), to allow familiarisation with the content and experiment, and to investigate consistency of user and variability due to familiarity. In the setup, each 'page' can be given a repeat count. These are all gathered before shuffling the order so repeated tests are not back-to-back if possible.
339 339
340 The results also contain information collected by any defined pre/post questions. These are referenced against the setup XML by using the same ID so readable responses can be obtained. Taking from the earlier example of setting up a pre-test question, an example response can be seen in Figure \ref{fig:xmlOut}. 340 The results also contain information collected by any defined pre/post questions. These are referenced against the setup XML by using the same ID so readable responses can be obtained. Taking from the earlier example of setting up a pre-test question, an example response can be seen in Figure \ref{fig:xmlOut}.
341 341
342 Each page of testing is returned with the results of the entire page included in the structure. One `audioElement' node is created per audio fragment per page, along with its ID. This includes several child nodes including the rating between 0 and 1, the comment, and any other collected metrics including how long the element was listened for, the initial position, boolean flags if the element was listened to, if the element was moved and if the element comment box had any comment. Furthermore, each user action (manipulation of any interface element, such as playback or moving a marker) can be logged along with a the corresponding time code. 342 Each page of testing is returned with the results of the entire page included in the structure. One `audioElement' node is created per audio fragment per page, along with its ID. This includes several child nodes including the rating between 0 and 1, the comment, and any other collected metrics including how long the element was listened for, the initial position, boolean flags if the element was listened to, if the element was moved and if the element comment box had any comment. Furthermore, each user action (manipulation of any interface element, such as playback or moving a marker) can be logged along with a the corresponding time code.
343 We also store session data such as the browser the tool was used in. 343 We also store session data such as the browser the tool was used in.
344
345 We provide the option to store the results locally, and/or to have them sent to a server. 344 We provide the option to store the results locally, and/or to have them sent to a server.
346 345
347 %Here is an example of the set up XML and the results XML: % perhaps best to refer to each XML after each section (set up <> results) 346 %Here is an example of the set up XML and the results XML: % perhaps best to refer to each XML after each section (set up <> results)
348 % Should we include an Example of the input and output XML structure?? --> Sure. 347 % Should we include an Example of the input and output XML structure?? --> Sure.
349 348
381 In this paper we have presented an approach to creating a browser-based listening test environment that can be used for a variety of types of perceptual evaluation of audio. 380 In this paper we have presented an approach to creating a browser-based listening test environment that can be used for a variety of types of perceptual evaluation of audio.
382 Specifically, we discussed the use of the toolbox in the context of assessment of preference for different production practices, with identical source material. 381 Specifically, we discussed the use of the toolbox in the context of assessment of preference for different production practices, with identical source material.
383 The purpose of this paper is to outline the design of this tool, to describe our implementation using basic HTML5 functionality, and to discuss design challenges and limitations of our approach. This tool differentiates itself from other perceptual audio tools by enabling web technologies for multiple participants to perform the test without the need for proprietary software such as MATLAB. The tool also allows for any interface to be built using HTML5 elements to create a variety of dynamic, multiple-stimulus listening test interfaces. It enables quick setup of simple tests with the ability to manage complex tests through a single file. Finally it uses the XML document format to store the results allowing for processing and analysis of results in various third party software such as MATLAB or Python. 382 The purpose of this paper is to outline the design of this tool, to describe our implementation using basic HTML5 functionality, and to discuss design challenges and limitations of our approach. This tool differentiates itself from other perceptual audio tools by enabling web technologies for multiple participants to perform the test without the need for proprietary software such as MATLAB. The tool also allows for any interface to be built using HTML5 elements to create a variety of dynamic, multiple-stimulus listening test interfaces. It enables quick setup of simple tests with the ability to manage complex tests through a single file. Finally it uses the XML document format to store the results allowing for processing and analysis of results in various third party software such as MATLAB or Python.
384 383
385 % future work 384 % future work
386 Further work may include the development of other common test designs, such as MUSHRA \cite{mushra}, 2D valence and arousal rating, and others. We will add functionality to assist with setting up large-scale tests with remote subjects, so this becomes straightforward and intuitive. 385 Further work may include the development of other common test designs, such as MUSHRA \cite{mushra}, 2D valence and arousal/activity \cite{ratingeerola2009prediction}, and others. We will add functionality to assist with setting up large-scale tests with remote subjects, so this becomes straightforward and intuitive.
387 In addition, we will keep on improving and expanding the tool, and highly welcome feedback and contributions from the community. 386 In addition, we will keep on improving and expanding the tool, and highly welcome feedback and contributions from the community.
388 387
389 The source code of this tool can be found on \\ \texttt{code.soundsoftware.ac.uk/projects/}\\ \texttt{webaudioevaluationtool}. 388 The source code of this tool can be found on \\ \texttt{code.soundsoftware.ac.uk/projects/}\\ \texttt{webaudioevaluationtool}.
390 389
391 390