Latent Timbre Synthesis
S.I.: Neural Networks in Art, Sound and Design
Audio-based variational autoencoders for music composition and sound design applications

Kıvanç Tatar¹ · Daniel Bisig² · Philippe Pasquier¹

¹ Simon Fraser University, Vancouver, BC, Canada
² Zurich University of the Arts, Zurich, Switzerland

Received: 26 June 2020 / Accepted: 5 October 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020
Abstract
We present Latent Timbre Synthesis, a new audio synthesis method using deep learning. The method allows composers and sound designers to interpolate and extrapolate between the timbres of multiple sounds using the latent space of audio frames. We detail two variational autoencoder architectures for Latent Timbre Synthesis and compare their advantages and drawbacks. The implementation includes a fully working application with a graphical user interface, called interpolate_two, which enables practitioners to generate timbres between two audio excerpts of their choice using interpolation and extrapolation in the latent space of audio frames. Our implementation is open source, and we aim to improve the accessibility of this technology by providing a guide for users of any technical background. Our study includes a qualitative analysis in which nine composers evaluated Latent Timbre Synthesis and the interpolate_two application within their practices.

Keywords: Audio synthesis · Neural networks · Signal processing · Computer-assisted music composition
1 Introduction

Modern sound synthesizers come loaded with many parameters, forming very large, nonlinear, non-modal search spaces. This richness comes at the expense of searchability: one cannot easily or efficiently find a particular sound or sonic texture, or generate a transition between two textures. Consequently, sound designers and musicians most often rely on audio samples (of instruments or sound effects) and their manipulation, rather than the more flexible approach of synthesizing these sounds and their sonic surroundings. In previous work on synthesizer preset generation [35], we demonstrated how, given a target sample, PresetGen can find the preset that generates the sound closest to that sample. In this work, we investigate a new method based on deep learning (DL) in which a synthesizer model is trained on selected audio textures, allowing musicians and sound designers to achieve their synthesis goals by exploring a sonic space through interpolation and extrapolation between sonic textures.

The rise in popularity of DL architectures has led to promising new research applying DL to musical applications, audio transformation, and sound synthesis [2]. The demand for sound synthesizers is projected to grow at an accelerated rate over the next five years [39], and in parallel there is increasing interest in flexible, versatile, yet controllable sound synthesis methods.
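To make the core idea concrete, the following is a minimal sketch, not the paper's actual implementation, of frame-wise interpolation and extrapolation between two sequences of latent vectors. The function name latent_mix, the array shapes, and the encoder/decoder pipeline referenced in the comments are illustrative assumptions:

    import numpy as np

    def latent_mix(z_a, z_b, alpha):
        """Frame-wise linear mix of two latent sequences.

        alpha in [0, 1] interpolates between excerpt A and excerpt B;
        values outside that range extrapolate past either excerpt.
        """
        return (1.0 - alpha) * z_a + alpha * z_b

    # Toy stand-ins for encoded audio frames; in an actual pipeline these
    # would come from the VAE encoder, and the mixed sequence would be fed
    # through the decoder and a phase-reconstruction step to produce audio.
    rng = np.random.default_rng(0)
    z_a = rng.normal(size=(128, 8))   # (num_frames, latent_dim), assumed shapes
    z_b = rng.normal(size=(128, 8))

    z_half = latent_mix(z_a, z_b, 0.5)    # interpolation: a halfway timbre
    z_beyond = latent_mix(z_a, z_b, 1.5)  # extrapolation: beyond excerpt B

A single scalar alpha thus gives practitioners one intuitive control over where the generated timbre sits relative to the two source excerpts.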