Show and Tell Demonstration

Title: High quality real time sound transformation using advanced sinusoidal and noise modeling techniques

Date and Location:

Wednesday, April 22, 15:00 - 17:30, Location: Show and Tell Area A

Presented by

A. Roebel, Analysis/Synthesis Team

Description

The sinusoids-plus-noise signal model is one of the methods of choice when it comes to high-quality transformation of musical sounds. Recent enhancements of the sinusoidal signal model, notably the inclusion of the shape-invariant transformation of speech described by Quatieri and McAulay for the sinusoidal model in 1992, make it possible to use sinusoidal sound transformation technology for high-quality transformation of speech as well.

Timbre modification for speech or musical instruments requires more than basic signal transposition: a precise estimation and modification of the spectral envelope is required at the same time. To avoid the fundamental problems related to estimating autoregressive vocal tract filter models from the speech signal, an efficient cepstrum-based technique has been developed that reliably estimates the vocal tract filter transfer function. A particular advantage of the method is that it allows a nearly optimal filter model order to be derived from the fundamental frequency of the speaker.
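To make the idea of cepstrum-based envelope estimation concrete, the following is a minimal sketch of plain cepstral smoothing, not the technique developed for the prototype. The order rule used here (half the pitch period in samples) is an assumption chosen for illustration; the description above only states that the order is derived from the fundamental frequency. All function names are hypothetical.

```python
import cmath
import math


def cepstral_order(sample_rate_hz, f0_hz):
    """Assumed heuristic: keep quefrencies below half the pitch period (in samples)."""
    return int(sample_rate_hz / (2.0 * f0_hz))


def dft(x):
    """Naive O(n^2) discrete Fourier transform, sufficient for short illustrative frames."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]


def cepstral_envelope(log_amp, order):
    """Smooth a full-length, symmetric log-amplitude spectrum by low-pass liftering.

    log_amp must satisfy log_amp[m] == log_amp[n - m], as the log-amplitude
    spectrum of a real signal does.
    """
    n = len(log_amp)
    # For a real symmetric sequence the inverse DFT equals the forward DFT divided by n.
    cepstrum = [c.real / n for c in dft(log_amp)]
    # Keep only the low quefrencies and their mirrored counterparts.
    liftered = [
        c if (q <= order or q >= n - order) else 0.0
        for q, c in enumerate(cepstrum)
    ]
    # Transform back to the (smoothed) log-amplitude spectrum.
    return [v.real for v in dft(liftered)]
```

Under this assumed rule, a speaker with a higher fundamental frequency gets a lower cepstral order, so the smoothing follows the widely spaced harmonics less closely and avoids resolving individual partials.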

In the show-and-tell event we will present a prototype system that combines algorithms for shape-invariant signal processing with efficient envelope estimation and modification. The system achieves high quality for advanced speech transformations such as gender changes, together with changes of the perceived age of the speaker, from child through adult to very old voices. These transformations require pitch changes of 1 octave (man to woman) to 2 octaves (man to girl) and produce a voice quality that is often hard to distinguish from a natural recording (man to woman).
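As a small worked example of the pitch changes mentioned above, the sketch below converts a shift in octaves into a frequency ratio and into semitones. The source fundamental frequency of 110 Hz is a hypothetical value chosen for illustration, and the function names are not taken from the prototype.

```python
def transposition_ratio(octaves):
    """Frequency ratio for a pitch shift in octaves (one octave doubles the frequency)."""
    return 2.0 ** octaves


def octaves_to_semitones(octaves):
    """Equal-tempered semitones corresponding to a shift in octaves."""
    return 12.0 * octaves


if __name__ == "__main__":
    source_f0_hz = 110.0  # hypothetical male fundamental frequency
    for label, octaves in (("man to woman", 1.0), ("man to girl", 2.0)):
        target_f0_hz = source_f0_hz * transposition_ratio(octaves)
        print(label, octaves_to_semitones(octaves), "semitones,", target_f0_hz, "Hz")
```

A shift of 1 octave corresponds to a factor of 2 in fundamental frequency, and a shift of 2 octaves to a factor of 4.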

The implementation of the envelope modification algorithms allows gradual cross-fades between the spectral and time envelopes of different sound sources, which can be used to generate personalities with ambiguous timbre characteristics. Because the underlying sound transformation is based on a sinusoidal model, high-quality sound transformation can be obtained for non-speech sounds as well. Accordingly, the gradual mixing of the spectral and time envelopes can be applied to arbitrary sound sources and, notably, allows us to reproduce vocoder effects with significantly improved quality.
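One simple way to picture a gradual cross-fade between two spectral envelopes is a linear interpolation of their values in decibels, controlled by a single mix parameter. The sketch below illustrates that idea only; interpolating in the dB domain is an assumption, and the prototype may combine envelopes differently. The function name and the sample values are hypothetical.

```python
def crossfade_envelopes(env_a_db, env_b_db, mix):
    """Linearly interpolate two spectral envelopes given in dB on the same frequency grid.

    mix = 0.0 returns envelope A, mix = 1.0 returns envelope B.
    """
    if len(env_a_db) != len(env_b_db):
        raise ValueError("envelopes must be sampled on the same frequency grid")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return [(1.0 - mix) * a + mix * b for a, b in zip(env_a_db, env_b_db)]


if __name__ == "__main__":
    env_a_db = [0.0, -6.0, -12.0]   # hypothetical envelope of source A
    env_b_db = [-12.0, -6.0, 0.0]   # hypothetical envelope of source B
    print(crossfade_envelopes(env_a_db, env_b_db, 0.5))
```

Sweeping mix slowly from 0 to 1 moves the timbre gradually from one source to the other; intermediate values give the ambiguous characteristics described above.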

The implementation is highly optimized using SIMD technology, so that all transformations can be performed in real time on recent desktop computers. Most of the parameters that control the voice transformation algorithms are derived from high-level source and target descriptions (man, woman, old, young), so that the application can be used without a background in signal processing.


©2016 Conference Management Services, Inc. -||- email: webmaster@icassp09.com -||- Last updated Thursday, February 12, 2009