Show and Tell Demonstration

Title: Easy Prosody - a GUI for Modifying Text-to-Speech Prosody

Date and Location:

Thursday, April 23, 15:30 - 18:00, Location: Show and Tell Area A

Presented by

Yao Qian, Chang Guo, Frank Soong

Description

Synthesized voice is increasingly used in applications that require spoken responses, e.g. phone-based information queries, reservation and ordering via a speech user interface, speech playback of email/SMS, and spoken appointment reminders. Speech generated by a state-of-the-art Text-to-Speech (TTS) system generally meets the quality requirements of such applications. Occasionally, however, a sentence is still generated with inappropriate prosody, so a user-friendly interface for easily modifying that prosody is highly desirable. In other applications, such as Computer-Assisted Language Learning (CALL), synthesizing a stylized prosody demonstrated by the teacher but in the student’s own voice can be very useful.

This demo will show a speech analysis, modification and synthesis system in which a graphical user interface displays different attributes of the synthesized speech, including waveform, spectrogram, loudness and pitch, along with the segmentation boundaries of speech units (phones, syllables, words, etc.). The user can conveniently modify prosody-related parameters such as duration, pitch and loudness with a pen or mouse; the corresponding speech is then re-synthesized and played back. Modifications can be made at different unit levels: phoneme, syllable, word, phrase or sentence. The applications of this technology are:

Acquisition and learning of natural prosody to improve TTS quality;
Computer-assisted language learning (CALL);
Speech synthesis with transplanted or manually specified prosody;
Singing voice synthesis;
Speech perception studies.
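
The unit-level edits described above (changing the pitch or duration of one phone, syllable or word) can be illustrated with a minimal Python sketch. It is not the demo system's implementation: the names `Unit`, `scale_pitch`, `stretch_unit` and the 5 ms `FRAME_SHIFT` are hypothetical, the pitch contour is assumed to be a per-frame F0 list in which 0 marks unvoiced frames, and the re-synthesis step is left out entirely.

```python
from dataclasses import dataclass, replace
from typing import List

FRAME_SHIFT = 0.005  # seconds per F0 frame (assumed value, not from the demo)


@dataclass(frozen=True)
class Unit:
    """A segmented speech unit (hypothetical structure for illustration)."""
    label: str    # e.g. a phone, syllable or word
    start: float  # start time in seconds
    end: float    # end time in seconds


def scale_pitch(f0: List[float], unit: Unit, factor: float) -> List[float]:
    """Scale voiced F0 values (f0 > 0) in the frames that fall inside `unit`."""
    first = round(unit.start / FRAME_SHIFT)
    last = round(unit.end / FRAME_SHIFT)
    return [
        value * factor if first <= i < last and value > 0 else value
        for i, value in enumerate(f0)
    ]


def stretch_unit(units: List[Unit], index: int, factor: float) -> List[Unit]:
    """Scale one unit's duration and shift every later unit by the difference."""
    target = units[index]
    delta = (target.end - target.start) * (factor - 1.0)
    result = []
    for i, unit in enumerate(units):
        if i < index:
            result.append(unit)  # earlier units are unchanged
        elif i == index:
            result.append(replace(unit, end=unit.end + delta))
        else:
            result.append(replace(unit, start=unit.start + delta,
                                  end=unit.end + delta))
    return result
```

In this sketch, raising the pitch of the second syllable by 50% would be `scale_pitch(f0, units[1], 1.5)`, and doubling the length of the first would be `stretch_unit(units, 0, 2.0)`. Applying the same functions to word-level or phrase-level units only requires a different segmentation list.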