Source-filter models and deep learning models for voice synthesis

Devansh Zurale

Abstract

The relationship, and under certain conditions the equivalence, between linear predictive coding (LPC) and a piecewise cylindrical waveguide (or Kelly-Lochbaum) model of the vocal tract is found to be largely tied to the lip reflection and transmission. Herein, I will review the elements of a piecewise waveguide model. The corresponding transfer functions are presented for two cases: one in which the boundary losses are scalar, which shows a more obvious relationship to the all-pole LPC estimate, and one in which the lip reflection is frequency dependent, which introduces a zero in the transfer function and obscures the relationship to LPC. Furthermore, I will review some state-of-the-art glottal flow inversion methods and highlight the importance of the lip radiation filter to the accuracy of these methods.
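
As a rough illustration of the scalar case only, the sketch below builds the all-pole denominator of a piecewise cylindrical tube from its section areas with the step-up recursion, which is also used to convert LPC reflection coefficients into predictor coefficients. It is not taken from the talk. The function names, the area values, the fully reflective glottis, and the treatment of a scalar `lip_reflection` as the final junction are all assumptions made for the example, and sign conventions for the reflection coefficients differ between texts and between pressure and volume-velocity formulations.

```python
def reflection_coeffs(areas):
    """Junction reflection coefficients from adjacent section areas.

    Assumed convention: k = (A_next - A) / (A_next + A).
    Other texts use the opposite sign.
    """
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]


def step_up(ks):
    """Step-up recursion: reflection coefficients -> coefficients of A(z).

    Implements A_m(z) = A_{m-1}(z) + k_m * z^{-m} * A_{m-1}(1/z),
    starting from A_0(z) = 1.
    """
    a = [1.0]
    for k in ks:
        padded = a + [0.0]      # A_{m-1} extended to order m
        rev = padded[::-1]      # coefficients of z^{-m} * A_{m-1}(1/z)
        a = [x + k * y for x, y in zip(padded, rev)]
    return a


def tube_denominator(areas, lip_reflection):
    """Denominator of the all-pole tube model (hypothetical helper).

    Assumes a fully reflective glottis and a scalar lip reflection that
    acts as one extra junction at the end of the tube.
    """
    return step_up(reflection_coeffs(areas) + [lip_reflection])


if __name__ == "__main__":
    areas = [1.0, 3.0, 1.0]     # made-up section areas, arbitrary units
    print(tube_denominator(areas, lip_reflection=0.7))
```

Because the lip term enters as just another reflection coefficient, the result is a polynomial in z^{-1} of the same form that LPC estimates. A frequency-dependent lip reflection cannot be folded in this way, which is the second case described above.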

Bio

I am currently pursuing a Ph.D. in Computer Music at the University of California, San Diego. I previously completed a Master's in Music Technology at Carnegie Mellon University. My primary research interest has been singing voice analysis and synthesis through a deeper understanding of source-filter theory. I am now also delving into spatial audio, specifically exploring deep learning methods for HRTF personalization. Outside of my academic work, I am a trained Hindustani Classical vocalist. I also play the keyboard and the drums, and I am interested in a wide variety of music genres. I find that jazz and Indian classical music go really well together, and I enjoy performing and listening to music at the intersection of the two.

singing_voice_presentation.pptx (13.3 MB), Polina Proutskova, 2021-07-24 01:08 AM